JOURNAL OF CLINICAL MICROBIOLOGY, May 2004, p. 1923–1932, Vol. 42, No. 5
0095-1137/04/$08.00+0 DOI: 10.1128/JCM.42.5.1923–1932.2004
Sequencing and Comparative Analysis of Flagellin Genes fliC, fljB, and flpA from Salmonella

J. R. McQuiston,* R. Parrenas, M. Ortiz-Rivera, L. Gheesling, F. Brenner, and P. I. Fields

Foodborne and Diarrheal Diseases Branch, Division of Bacterial and Mycotic Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia 30333

Received 1 August 2003/Returned for modification 16 December 2003/Accepted 16 February 2004
Salmonella isolates have traditionally been classified by serotyping, the serologic identification of two surface antigens, O-polysaccharide and flagellin protein. Serotyping has been of great value in understanding the epidemiology of Salmonella and investigating disease outbreaks; however, production and quality control of the hundreds of antisera required for serotyping are difficult and time-consuming. To circumvent the problems associated with antiserum production, we began the development of a system for determination of serotype in Salmonella based on DNA markers. To identify flagellar antigen-specific sequences, we sequenced 280 alleles of the three genes that are known to encode flagellin in Salmonella, fliC, fljB, and flpA, representing 67 flagellar antigen types. Analysis of the data indicated that the sequences from fliC, fljB, and flpA clustered by the antigen(s) they encode, not by locus. The sequences grouped into four clusters based on their conserved regions. Three of the four clusters included multiple flagellar antigen types and were designated the G complex, the Z4 complex, and the α cluster. The fourth cluster contained a single antigen type, H:z29. The amino acid sequences of the conserved regions within each cluster have greater than 95% amino acid identity, whereas the conserved regions differ substantially between clusters (75 to 85% identity). Substantial sequence heterogeneity existed between alleles encoding different flagellar antigens, while alleles encoding the same flagellar antigen were homologous, suggesting that flagellin genes may be useful targets for the molecular determination of flagellar antigen type.
* Corresponding author. Mailing address: National Salmonella Reference Laboratory, Foodborne and Diarrheal Diseases Branch, Division of Bacterial and Mycotic Diseases, MS-C03, Centers for Disease Control and Prevention, 1600 Clifton Rd., Atlanta, GA 30333. Phone: (404) 639-0270. Fax: (404) 639-3333. E-mail: [email protected]

Salmonella causes an estimated 1.4 million illnesses and 600 deaths in the United States each year (18). Understanding the epidemiology of these organisms and the investigation of outbreaks caused by Salmonella have been greatly facilitated by the characterization of isolates by serotyping. In the United States, serotyping is the basis for the National Salmonella Surveillance System, which collects reports of isolates of Salmonella from human sources from all 50 states. These data are reported to the Foodborne and Diarrheal Diseases Branch and Biostatistics and Information Management Branch at the Centers for Disease Control and Prevention (CDC) in Atlanta, Ga. (4).

Serotyping consists of the immunologic classification of two surface structures, O-polysaccharide (O antigen) and flagellin protein (H antigen). Salmonella is unique among the Enterobacteriaceae in that it commonly has two distinct H antigens, the phase 1 and phase 2 flagellar antigens, that are coordinately regulated such that only one flagellar antigen is expressed at a time in a single cell (23). Rarely, Salmonella isolates express additional flagellar antigens. Some of these additional antigens have been unstable, while others behave as typical flagellar antigens or are thought to be variants of common flagellar antigen types (10, 12, 19). They have been termed phase 3 and R phases; for simplicity, we will refer to all of these additional flagellar antigens as phase 3/R antigens.

The Kauffmann-White serotyping scheme for designation of Salmonella serotypes, which is used by most laboratories for the characterization of Salmonella isolates, recognizes 46 O serogroups and 114 H antigens that, in various combinations, make up the 2,523 characterized serotypes (19, 20). Some H antigens are composed of multiple antigens, termed factors; for example, H:e,n,x is the designation for a flagellar antigen that consists of three separate factors, e, n, and x, that occur together in one flagellum. The 114 H antigens are composed of combinations of 99 distinct antigenic factors. Flagellar antigens that are immunologically related are known as complexes. For example, the G complex includes all flagellar antigen types that contain antigenic factor g (e.g., g,m; f,g; g,z51), plus flagellar antigen m,t. Flagellar antigen types that include antigen H:z4 are considered the Z4 complex.

The Kauffmann-White scheme for serotype designation includes subspecies identification, which is typically determined by biochemical characterization. Phenotypic and genetic characterization of Salmonella identified two species, Salmonella enterica and Salmonella bongori (3). S. enterica was further subdivided into six subspecies (I, II, IIIa, IIIb, IV, and VI) (5, 8, 9). A seventh subspecies has also been described based on multilocus enzyme electrophoresis (2), but it is not used for the purpose of serotype determination. Subspecies IIIa and IIIb comprise what was formerly referred to as the genus Arizona (7). S. bongori was initially described as subspecies V, but subsequent studies showed that it was sufficiently divergent to be considered a separate species (22). However, for simplicity, S. bongori is still commonly referred to as subspecies V for the purpose of serotype designation. The name S. enterica does not have taxonomic standing with the Judicial Commission of the International Committee of Systematic Bacteriology; the nomenclature used here is based on the recommendations of the World Health Organization Collaborating Center for Reference and Research on Salmonella and is used by most laboratories worldwide (3).

Serotyping by traditional methods has several drawbacks. The complexity of the serotyping scheme makes it difficult to maintain. It requires more than 250 different typing sera as well as 350 different antigens for preparation and quality control of the antisera. Commercial antisera often are unavailable for less common antigens or, if available, are of variable quality. According to the current serotyping method, a minimum of 3 days is required to determine the serotype of an isolate; depending on the complexity of the serotype, it can take much longer.

To circumvent the problems associated with traditional serotyping, we have begun the development of a system for the molecular identification of serotype based on the genes responsible for expression of serotype antigens. There are many advantages to this approach. DNA probes can be chemically rather than biologically synthesized, making them easier to reproduce and quality control than antisera. The technology for DNA-based assays is fairly universal, making it available to more laboratories than traditional serotyping. Also, DNA-based methods have the potential to be faster and to be automated, and generally, they are more precise than traditional serological typing.

In most isolates of Salmonella, two genes encode flagellar antigens. fliC encodes the phase 1 antigens, and fljB encodes the phase 2 antigens (30). These genes are coordinately expressed by a phase-variation mechanism (23). fliC is located in one of the flagellar biosynthesis operons, is present in all Salmonellae, and has a homologue in Escherichia coli (16). fljB is located in a region of the genome that is unique to Salmonella and is present in four of the six subspecies. Isolates of S.
bongori have been reported to have a gene homologous to fljB, although this species is typically monophasic (1). A triphasic isolate that was genetically described possessed the third flagellin gene, flpA, on a plasmid (24).

Genes that encode bacterial flagellin are typically highly conserved at their 5′ and 3′ ends while the middle region is generally quite variable. The conserved regions encode the flagellar filament backbone and are critical for the assembly of the filament. The central region, corresponding approximately to amino acids 181 to 390, encodes the surface-exposed and antigenically variable portion of the filament (13–15, 29). Several studies have reported DNA sequences for Salmonella flagellin genes (6, 11, 13, 15, 17, 24, 25, 27, 29). As of June 2003, 74 complete or partial Salmonella fliC alleles and 25 complete or partial Salmonella fljB allele sequences had been reported in GenBank release no. 132, excluding complete genome sequences.

Here we report an analysis of 280 flagellin alleles from Salmonella. We sequenced complete fliC, fljB, and flpA alleles that represented 67 of the 114 known flagellar antigenic types. The 67 flagellar antigen genes that were characterized include all flagellar antigens found in the 100 most common serotypes in the United States; the 100 most common serotypes are responsible for 98% of culture-confirmed human infections (4). We characterized common and unique features of the flagellin alleles as a group, and we performed comparative DNA sequence analysis to determine the amount of genetic
TABLE 1. PCR and sequencing primers used in this study
Primer designation
Primer type and target
PCR primers fliC
Sense 56 (27) Flj4 R
Flp F2 Flp R
Sequencing primers External primers fliC alleles
F14s F R
Internal primers All alleles
1 2 3 4 5 6
G3 G5 G6 G7
GCGCGGAATAATGAGG CATAAAGC GCTTTCGCTGCCTTGAT TGTGT TGTCGATAACCTGGATG ACACAGG GGCATCCAGTGTAGTA CCATTATC AGTCCCGTGGAGCCTT CCGGATTAACGTATCA GAGACAGC
GAAATTCAGGTGCCGA TACAAGGG CGCTGCCTTGATTGTGT TCGATAACCTGGATGA CACAG CATTTACAGCTATACAT TCCATAAAGA (CT)GAAAT(CT)AACAA CAACCTG GTTAGCAATCGCCTGAC CTG CAGATCAACTCTCAGAC CCTGG AACTGGGTGGCGTAGA( CT)GG(CT)AA TTTGCTGGCATTGTAGG TTTTAC TGGATGGTCAGGGTGT TGTC GTCTGCGCCACCCAGTT GTTCAACGGCGTGAAA GTCC CCATTAAAGGTGGTAA GGAAGGAG ATCACCTTAGCAGGCA AAACC GTAACATTCTTGAAGCT GGATTTC AGTTTCGCACTCTCAT TTTTGG TTAGTTTTGATTCGGA TAAAGATGT GGCTTTTAGATCTGCT CCATC
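Two of the internal sequencing primers in Table 1 contain degenerate positions written as (CT), meaning that either C or T may occupy that site. As a minimal sketch, not part of the original study, the following Python function expands such a primer into the individual sequences it represents. The function name `expand_degenerate` is hypothetical, and the example assumes that the wrapped fragments "(CT)GAAAT(CT)AACAA" and "CAACCTG" in the table belong to a single oligonucleotide.

```python
import re
from itertools import product


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a primer with (XY) groups."""
    # Tokenize into single bases and parenthesized alternatives such as "(CT)".
    tokens = re.findall(r"\([ACGT]+\)|[ACGT]", primer.replace(" ", ""))
    # Each token becomes a string of allowed bases at that position.
    choices = [tok.strip("()") for tok in tokens]
    return ["".join(combo) for combo in product(*choices)]


# Assumed single primer, joined from two wrapped fragments of Table 1.
variants = expand_degenerate("(CT)GAAAT(CT)AACAACAACCTG")
for v in variants:
    print(v)
```

With two two-fold degenerate positions, the primer corresponds to four distinct 19-mers.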
diversity within flagellin alleles and to determine whether or not flagellin sequences might serve as a useful target for molecular determination of flagellar antigen type.

MATERIALS AND METHODS

Bacterial strains. All Salmonella strains were obtained from the National Salmonella Reference Laboratory, CDC, with the exception of one isolate, S. enterica serotype Moscow strain number 27 from Statens Serum Institute, Copenhagen, Denmark. The panel included 270 strains; in some strains, both fliC and fljB were sequenced. A strain list is available upon request.

Genomic DNA extraction. Genomic DNA for PCRs was prepared by using the QIAamp DNeasy kit (Qiagen) and the procedure for bacterial DNA isolation supplied by the manufacturer. Approximately 40 μg of genomic DNA was isolated from a 10-μl loop of bacteria.

Primers. Primers for amplification of the fliC, fljB, and flpA genes and for DNA sequencing are listed in Table 1. Because of their different genomic locations, the sequences that flank fliC, fljB, and flpA are distinct. Primers for the amplification of each gene correspond to the 5′ and 3′ noncoding sequences. Two nested, external primers were used to sequence each amplicon; an additional six internal primers were used to sequence alleles from the alpha cluster.
TABLE 2. fliC alleles sequenced^a

Columns: flagellar antigen; no. of alleles sequenced; serotype(s)^b

1,2         1     I 4,12:1,2:1,2^c
1,5         2^d   II 40:z6:1,5; II 11:-:1,5
1,5,7       4^d   II 9,46:e,n,x:1,5,7; II 16:e,n,x:1,(5),7; II 17:e,n,x,z15:1,5,7; IIIb 53:z:1,5,(7)
1,6         4^d   II 16:z6:1,6; II 17:e,n,x,z15:1,6; II 42:e,n,x:1,6; IIIb 40:z39:1,6
a           2     Paratyphi A (2)
b           6     Niederoderwitz; Paratyphi B (2); I 6,7:b:z33; IIIa 47:b:-; IIIb (6), 14:b:e,n,x,z15
c           7     Choleraesuis (2); Goeteborg; Jericho; IIIb 41:c:e,n,x,z15; IIIb 57:c:e,n,x,z15; IIIb 57:c:z:z60
d           6     Isangi; Muenchen; Schwarzengrund; Typhi (2); Virginia
e,h         3     Newport; Saintpaul; Sandiego v. d+
f,g         2     Derby (2)
f,g,m,t     1     II 6,8:f,g,m,t:e,n,x
f,g,s       2     Agona (2)
f,g,t       5     Berta (3); II 17:f,g,t:e,n,x,z15; II 40:g,m,s,t:z42
g,m         5     Enteritidis (5)
g,m,p,s     2     Montevideo (2)
g,m,q       1     Blegdam
g,m,s       2     Amsterdam (2)
g,m,s,t     2     II 43:g,m,s,t:z42; II 50:g,m,s,t:1,5
g,m,t       2     II 6,8:g,m,t:1,7; II 28:g,m,t:e,n,x
g,p         2     Dublin (2)
g,p,s       1     Naestved
g,p,u       1     Rostock
g,q         2     Moscow (2)
g,s,t       2     Missouri; Senftenberg
g,t         2     Agodi; II 16:g,t:1,5
g,z51       2     Travis; IIIa 45:g,z51:-
g,z62       2     II 9,46:g,z62:-; II 50:g,z62:e,n,x
i           16    Agama; Augustenborg (2); Bandia; Gloucester; Idikan; Kentucky; Lindenburg; Typhimurium (3); II 4,12,27:i:z35; IIIb 21:i:e,n,x,z15; IIIb 48:i:z; IIIb 57:i:e,n,x,z15; IIIb 61:i:z
j^e         2     Typhi v. j+ (2)
k           7     Blockley; Inverness; IIIb 17:k:z; IIIb 42:k:e,n,x,z15; IIIb 50:k:z; IIIb 50:k:z53; IIIb 61:k:1,5,(7)
(k)         4     IIIb 16:(k):e,n,x,z15; IIIb 38:(k):z35; IIIb 42:(k):z35; IIIb 65:(k):z
l,v         5     Brandenburg; Give; Kimberley; Potsdam; IIIb 38:l,v:z54
l,w         2     Ayton; Glidji
l,z13       2     Kenya; Westerstede
l,z13,z28   2     Connecticut; Fallowfield
l,z28       2     Javiana (2)
m,p,t,u     2     Haelsingborg (2)
m,t         2     Oranienburg (2)
r           5     Heidelberg; Rubislaw; IIIb 35:r:z61; IIIb 38:r:e,n,x,z15; IIIb 58:r:z53:z47
r,i         2     Bovismorbificans; Hidalgo
y           2     Freetown; Giza
z           4     Indiana; Poona; II 16:z:z42; II 40:z:z42
z4,z23      13    Ajiobo; Ekotedo; Gera; I 47:z4,z23:-; I 8,20:z4,z23:-; IIIa 18:z4,z23:-; IIIa 40:z4,z23:-; IIIa 41:z4,z23:-; IIIa 43:z4,z23:-; IIIa 44:z4,z23:-; IIIa 62:z4,z23:-; IV 38:z4,z23:-; IV 40:z4,z23:-
z4,z24      3     Romanby; II 9,12,46,27:z4,z24:1,5; IIIa 53:z4,z24:-
z4,z32      6     Tallahassee; IIIa 18:z4,z32:-; IIIa 41:z4,z32:-; IIIa 63:z4,z32:-; IV 44:z4,z32:-; IV 48:z4,z32:-
z4,z23,z32  3     IIIa 13,23:z4,z23,z32:-; IIIa 44:z4,z23,z32:-; IIIa 56:z4,z23,z32:-
z6          1^d   II 48:e,n,x,z15:z6
z10         3     Harrisonburg; Istanbul; IIIb 28:z10:z
z29         6     Cubana; Tennessee; Mundubbera (2):-; IIIa 41:z29:-; IIIa 62:z29:-
z35         2     Tema; Tienba
z36         3     Potosi; Weslaco; IIIa 43:z36:-
z36,z38     2     IV 43:z36,z38:-; IV 53:z36,z38:-
z38         2     Fresno; Lille
z39         3     Grancanaria (2); IIIb 6,7:z39:1,2
z40^e       2     Bredeney v. z40+; Give v. z40+
z41         2     Maska; Ottawa
z48^e       1     I 39:z48:1,5
z52         2     IIIb 50:z52:z35; IIIb 65:z52:z35
z65         1     S. bongori ser. 66:z65:-
z69         1     Pietersburg
z81         2     S. bongori ser. 40:z81:-; S. bongori ser. 66:z81:-

^a Flagellar antigens that are expected to be encoded by fliC but not tested were H:g,s,q; H:g,z63; and H:g,z85.
^b All serotypes belong to S. enterica unless otherwise noted. Subspecies I serotypes are denoted by name wherever possible; serotypes for all other subspecies are denoted by formula.
^c This isolate appeared to be monophasic by traditional serotyping, expressing only flagellar antigen 1,2. DNA sequence analysis of fliC and fljB revealed an H:1,2 allele at both loci.
^d Alleles were predicted to be at fljB based on the Kauffmann-White scheme.
^e Flagellar antigens j, z40, and z48 are considered R phases in the Kauffmann-White scheme (18). H:j and H:z40 alleles contained deletions in what were otherwise typical H:d and H:l,v alleles. The H:z48 allele was most related to H:y alleles but distinct (Fig. 1). This isolate is considered to be a variant of serotype Champaign (formula I 38:k:1,5) in the Kauffmann-White scheme.
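Serotypes outside subspecies I are listed in Table 2 as antigenic formulas. The sketch below is an illustration rather than part of the original methods. It splits such a formula into its parts, assuming the layout "subspecies O antigens:phase 1:phase 2" with optional further fields and a dash for an antigen that is not expressed. The helper name `parse_formula` and the returned keys are hypothetical, and named serotypes or entries such as "S. bongori ser. 66:z65:-" are outside its scope.

```python
def parse_formula(formula):
    """Split 'subspecies O:phase1:phase2[:other...]' into lists of factors."""
    subspecies, antigens = formula.split(" ", 1)
    fields = [f.strip() for f in antigens.split(":")]
    absent = {"-", "\u2212", "\u2013"}  # hyphen, minus sign, en dash

    def factors(field):
        # A dash marks a phase that is not expressed.
        if field in absent:
            return []
        return [x.strip() for x in field.split(",")]

    return {
        "subspecies": subspecies,
        "O": factors(fields[0]),
        "phase1": factors(fields[1]) if len(fields) > 1 else [],
        "phase2": factors(fields[2]) if len(fields) > 2 else [],
        # Any additional fields (e.g., a third phase) are kept separately.
        "other": [factors(f) for f in fields[3:]],
    }


example = parse_formula("II 9,46:e,n,x:1,5,7")
print(example["phase1"])  # ['e', 'n', 'x']
```

Keeping each antigen as a list of factors mirrors the way the Kauffmann-White scheme describes H antigens such as e,n,x as combinations of separate factors.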
TABLE 3. fljB and other flagellin alleles sequenced^a,b

Columns: flagellar antigen; no. of alleles sequenced; serotype(s)^c

1,2         7     Derby; Heidelberg; Litchfield; Muenchen; Stanley; Typhimurium (2)
1,2,7       3     Eingedi; Kambole (2)
1,5         7     Bovismorbificans; Infantis; Thompson (3); I Rough:r:1,5; I 43:e,h:1,5
1,5^d       1     Houston
1,5,7       8     Hisingen; IIIb Rough:z10:1,5,7; IIIb 6,7:l,v:1,5,7; IIIb 47:r:1,5,7; IIIb 48:l,v:1,5,(7); IIIb 48:l,w:1,5,7:z50; IIIb 61:k:1,5,(7); IIIb 61:l,v:1,5,7
1,6         3     Agama; Poona (2)
1,7         5     Beaudesert; Bredeney; Give; Nola; Pomona
1,...^e     3     Bulovka v. 1,...+; II 48:d:1,...; II 60:b:1,...
a           1     II 45:a:z10^g
d           1     Houston
d^f         1     Sandiego v. H:d+
e,n,x       13    Bessi; Bonn; Chester (2); Gatuni; Istanbul; Kokomlemle; Singapore; Tambacounda; Tiergarten; II 9,12:l,w:e,n,x; II 9,46:e,n,x:1,5,7^g; II 56:e,n,x:1,7^g
e,n,x,z15   5     II 17:e,n,x,z15:1,6^g; IIIb 16:z10:e,n,x,z15; IIIb 42:k:e,n,x,z15; IIIb 47:i:e,n,x,z15; IIIb 57:i:e,n,x,z15
e,n,z15     5     Braenderup; Brandenburg; Sandiego v. d+; Sanktgeorg; Uno
k           3     II 13,23:k:z41^g; II 52:c:k; IIIb 52:c:k
l,w         2     Gloucester; Ohio
l,z13,z28   2     Lutetia; Poano
z^h         5     IIIb 48:i:z; IIIb 48:z:1,5,(7)^g; IIIb 50:k:z; IIIb 53:z:1,5,(7)^g; IIIb 61:i:z
z6          3     Bere; Franken^g; Weltevreden
z35         6     Kolar; Tamilnadu; IIIb 35:z52:z35; IIIb 38:(k):z35; IIIb 38:z52:z35; IIIb 65:z52:z35
z39         6     II 4,12:g,m,t:z39; II 43:d:z39; II 51:l,z28:z39; II 52:z39:1,5,7^g; II 6,7:z39:1,5,7^g; IIIb 40:z39:1,6^g
z88^h       1     VI 6,14:l,v:z88

^a All alleles are fljB except where noted.
^b Phase 2 and phase 3 flagellin antigens that were not tested were 1,2,5; 1,6,7; z34; z37; z46; z47; z50; z56; z58; z64; z66; z67; z68; z70; z72; z73; z74; z75; z76; z77; z78; z80; z82; z83; z84; z86; z87; and z89.
^c All serotypes belong to S. enterica unless otherwise noted. Subspecies I serotypes are denoted by name wherever possible; serotypes for all other subspecies are denoted by formula.
^d The genomic location of this allele was not determined. A sequencing template was generated by using primers corresponding to sequences within the conserved regions.
^e Flagellar antigen 1,... indicates an antigen that reacts with 1 complex antisera but not with antisera specific for the secondary antigens of the 1 complex. H:1,... represents multiple antigenic specificities and is considered R phase in the Kauffmann-White scheme.
^f This allele was sequenced from the flpA locus.
^g Alleles in these serotypes were in the locus opposite from that predicted by the Kauffmann-White scheme, except that the location of the H:z67 allele in serotype Franken (formula I 9,12:z6:z67) was not determined.
^h The fljB H:z and H:z88 alleles were homologous, with 99.4% nucleotide identity and 100% amino acid identity. These antigens have been noted to be immunologically related.
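The row counts in Tables 2 and 3 can be checked against the totals reported in the text. The short sketch below is a reader's consistency check, not part of the original analysis. It sums the "no. of alleles sequenced" columns as transcribed here and separates the single partial H:1,5 allele whose genomic location was not determined. Variable names are arbitrary.

```python
# Allele counts per row, transcribed in table order from Table 2 (fliC).
table2_counts = [
    1, 2, 4, 4, 2, 6, 7, 6, 3, 2, 1, 2, 5, 5, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 16,
    2, 7, 4, 5, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4, 13,
    3, 6, 3, 1, 3, 6, 2, 3, 2, 2, 3, 2, 2, 1, 2, 1, 1, 2,
]
# Allele counts per row from Table 3 (fljB and other loci).
table3_counts = [7, 3, 7, 1, 8, 3, 5, 3, 1, 1, 1, 13, 5, 5, 3, 2, 2, 5, 3, 6, 6, 1]

total = sum(table2_counts) + sum(table3_counts)
partial = 1  # H:1,5 allele of undetermined location (Table 3, footnote d)
complete = total - partial

print(sum(table2_counts), sum(table3_counts), total, complete)
```

Under this transcription, the two tables list 281 alleles, which matches the 280 complete coding sequences plus the one partial sequence described in Results.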
To sequence alleles from the G and Z4 complexes, six and four additional internal primers were used for each complex, respectively. Additional primers were needed to complete the sequences of the more divergent alleles (b; d; j; k; l,v; z29; z36; z38; and z81). These primer sequences are available upon request.

PCR. Ready-to-Go PCR beads (Amersham Biosciences Corp., Piscataway, N.J.) were used with 1 μl (200 ng) of genomic DNA, 1 μl of forward and reverse primer mix (5 pmol of each primer), and 23 μl of deionized H2O. Amplification parameters were a preheat step of 96°C for 4 min, followed by 35 cycles of 96°C for 30 s, 58°C for 30 s, and 72°C for 1 min. The products were visualized on a 1% agarose gel stained with ethidium bromide under UV light. The PCR products were desalted by using the QIAquick PCR extraction kit (Qiagen) following the protocol of the manufacturer, except that two additional wash steps were added and an additional 3-min centrifugation step was used to ensure complete removal of the wash buffer. The eluent was diluted 1:10 for use in DNA sequencing reactions.

DNA sequencing. All sequencing was performed on an ABI 377 by using the Big Dye sequencing kit (Applied Biosystems). Sequencing reactions were performed according to the protocol supplied with the kit, except that 3 μl of Big Dye mix was used with 3.3 pmol of sequencing primer and 11 μl of template in a 15-μl reaction. Dye terminators were removed with Centri-sep spin columns (Princeton Separations). Sequences were determined for both strands, resulting in twofold redundancy. Unique variable bases were confirmed by sequencing a second sequencing template.

DNA sequence analysis. DNA sequences were assembled and analyzed by using Lasergene 5.0 software (DNAstar) and the Wisconsin Package, version 10.1 (Genetics Computer Group, Madison, Wis.).

Nucleotide sequence accession numbers. All sequences from this study are available from GenBank under accession numbers AY353258 to AY353269, AY353271 to AY353287, AY353289 to AY353296, AY353298 to AY353303, AY353305 to AY353309, AY353311 to AY353389, AY353391 to AY353434, and AY353436 to AY353549.
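For readers who want to retrieve the deposited sequences, the accession ranges above can be expanded into individual identifiers. The following is a minimal sketch, assuming accessions of the form two-letter prefix plus a fixed-width number, as in the ranges listed. The helper `expand_range` is hypothetical and does not query GenBank.

```python
def expand_range(first, last):
    """List accessions from first to last inclusive, e.g. AY353258..AY353269."""
    prefix = first[:2]          # assumed two-letter prefix ("AY")
    width = len(first) - 2      # preserve zero padding of the numeric part
    start, stop = int(first[2:]), int(last[2:])
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


# Ranges as given under "Nucleotide sequence accession numbers".
ranges = [
    ("AY353258", "AY353269"), ("AY353271", "AY353287"),
    ("AY353289", "AY353296"), ("AY353298", "AY353303"),
    ("AY353305", "AY353309"), ("AY353311", "AY353389"),
    ("AY353391", "AY353434"), ("AY353436", "AY353549"),
]
accessions = [acc for first, last in ranges for acc in expand_range(first, last)]
print(len(accessions), accessions[0], accessions[-1])
```

The gaps between ranges (for example, AY353270) are preserved, so identifiers that the text does not list are never generated.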
RESULTS

General characteristics of flagellin alleles. The entire coding sequence was determined for 280 fliC, fljB, and flpA alleles (Tables 2 and 3). In addition, a partial sequence was determined for one H:1,5 allele that was not located at fliC, fljB, or flpA. It was amplified with primers within the coding sequences for flagellin; its genomic location was not determined (Table 3). Most of the coding sequences were 1,488 to 1,536 nucleotides in length; sequences encoding antigens of the Z4 complex and related antigens (see below) ranged from 1,266 to 1,280 nucleotides. All sequences were highly conserved at the extreme 5′ and 3′ ends and diverged, with respect to other alleles, as they approached the middle, variable portion of the gene. Conserved and variable regions were consistent with previous designations (15, 29). Typically, alleles expected to encode the same flagellar antigen clustered together. Alleles encoding the same antigen from the same subspecies were more closely related than alleles encoding the same antigen from different subspecies; this was particularly noticeable for subspecies IIIa and IIIb alleles.
FIG. 1. Ninety representative Salmonella fliC, fljB, and flpA alleles. Amino acid sequences of representative alleles encoding the 67 flagellar antigens were aligned with the program Clustal V (DNAstar). The figure is a cluster analysis and does not imply any phylogenetic relationships between the sequences. The tree was generated in DNAstar. E. coli sequences are from Wang and colleagues (28); GenBank accession numbers for these sequences are shown on the figure.
FIG. 2. Diagram of 5′ and 3′ ends of fliC, fljB, and flpA. Sequences on the left are nucleic acid sequences; the corresponding amino acid sequences are on the right. Start and stop (Term.) codons are indicated. fliC is the reference sequence; bases or amino acids that differ from this consensus are indicated by substitutions. Numbers in parentheses are numbers of alleles with the sequence.
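The kind of comparison summarized in Fig. 2, in which bases that differ from a reference are reported and judged as synonymous or not, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function name `substitutions` is hypothetical, the two 12-nucleotide sequences are made-up examples rather than sequences from the figure, and both inputs are assumed to be in frame and of equal length.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order ('*' marks stop codons).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitutions(reference, query):
    """Return (position, ref_base, query_base, synonymous) for each difference."""
    if len(reference) != len(query) or len(reference) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    found = []
    for i in range(0, len(reference), 3):
        ref_codon, qry_codon = reference[i:i + 3], query[i:i + 3]
        # Synonymy is judged per codon: same amino acid before and after.
        synonymous = CODON_TABLE[ref_codon] == CODON_TABLE[qry_codon]
        for offset, (r, q) in enumerate(zip(ref_codon, qry_codon)):
            if r != q:
                found.append((i + offset + 1, r, q, synonymous))
    return found


# Hypothetical toy sequences (not taken from Fig. 2).
toy_reference = "ATGGCACAAGTC"
toy_query = "ATGGCGCAAATC"
print(substitutions(toy_reference, toy_query))
```

In the toy example, the change at position 6 leaves alanine unchanged (synonymous), whereas the change at position 10 replaces valine with isoleucine.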
In a few instances, alleles expected to encode a rare flagellar antigen clustered closely with more common flagellar antigen types (Fig. 1). Two fliC alleles expected to encode flagellar antigen z52 were related to alleles encoding H:l,v. A fliC allele expected to encode flagellar antigen z48 was related to alleles encoding H:y; fliC and fljB alleles expected to encode flagellar antigens z69 and z88 were related to alleles encoding H:z6 and H:z, respectively. With the exception of H:z and H:z88, none of these flagellar antigen pairs has been noted to be antigenically related. Whether these antigens were actually encoded by fliC or fljB, or whether they were encoded elsewhere in the genome and the fliC or fljB was not expressed in these isolates, was not determined.

Alignment of the full-length flagellin sequences revealed that they clustered into four main groups (Fig. 1). Three of the four groups contained multiple antigenic types and were designated the G complex, the Z4 complex, and the α cluster; the fourth group contained a single antigenic type, z29. The G complex contained all alleles encoding flagellar antigen g plus alleles encoding the immunologically related antigen m,t. The Z4 complex contained all alleles encoding flagellar antigen z4 in addition to antigens z36; z38; and z36,z38. Flagellar antigens z36; z38; and z36,z38 had not previously been noted to be related to the Z4 complex. The α cluster contained the remainder of the flagellar antigen types, with the exception of H:z29, which separated into a group by itself.

Comparison of the conserved region. Overall, the 5′ and 3′ ends of the genes between the fliC, fljB, and flpA alleles were highly conserved, particularly at the extreme ends, where they approached 100% identity. However, with a few exceptions, there were 6 and 8 synonymous nucleotide substitutions within
the first 37 and last 30 nucleotides, respectively, when comparing fliC alleles to fljB alleles (Fig. 2). The flpA alleles were closely related to fljB alleles in the conserved regions, as noted by Smith (24), although they encoded flagellar antigen d, which is typically encoded by fliC. We designated these unique sequences at the extreme 5′ and 3′ ends of the gene as the fliC and fljB signature sequences. There were 20 exceptions to the signature sequences among the 280 alleles sequenced (Fig. 2).

Inspection of the sequence alignments suggested that sequences within a cluster had a high degree of identity within their conserved regions. Comparative analysis of 181 amino acids from the N terminus and 106 amino acids from the C terminus from 74 unique sequences that represented all of the flagellar antigen alleles was performed. There was greater than 97% amino acid sequence identity within the conserved region of the Z4 complex, the G complex, and the H:z29 sequences. Sequences within the α cluster had >89% amino acid sequence identity, but this number rose to greater than 94% amino acid sequence identity when the most divergent antigens (flagellar antigens b and d) were removed from the analysis. Li and colleagues (15) noted that flagellar antigen g,z51 clustered separately from the rest of the G complex; when g,z51 alleles were removed from the analysis, the G complex alleles had greater than 98% amino acid identity in the conserved region. In contrast, there was 73 to 87% amino acid identity between the groups, which is on the order of the sequence identity with representative E. coli fliC sequences from GenBank release 132.

Comparison of the variable region. Alleles in the four clusters exhibited different levels of diversity within the variable region depending on the cluster. H:z29 alleles from multiple
FIG. 3. Characterization of H:k alleles. (a) Dendrogram of H:k and H:i alleles. The serotype from which the sequence was obtained is listed to the right of each branch. Named serotypes are from subspecies I. Other subspecies are indicated as part of the serotype designation. (b) Alignment of amino acids 250 to 300 of the variable region. Amino acids that differ from the consensus are indicated (boxed).
subspecies had greater than 98% amino acid identity within the variable region. With the exception of the most divergent antigens, H:m,t and H:g,z51, alleles in the G complex had at least 90% amino acid identity within the variable region; some antigens, such as H:g,p and H:g,p,s, differed by a single amino acid. Alleles in the Z4 complex that encoded an H:z4 epitope had greater than 84% amino acid identity in the variable region. Alleles encoding H:z36 and H:z38 antigens shared very little identity with the rest of the Z4 complex in the variable region. The α cluster, which is composed of the largest number of antigens, was also the most diverse. Alleles encoding immunologically related antigens, such as those of the L or EN complexes, typically had greater than 90% amino acid identity in the variable region. Alleles encoding immunologically unrelated antigens had up to about 70% amino acid identity in the variable region, although most alleles had no identifiable homology in the variable region.

Alleles encoding antigen H:k were noted to be particularly diverse. Sequence comparison of the variable region from H:k alleles from subspecies I, II, IIIa, and IIIb revealed three groups of alleles that differed by about 20 to 25% of their amino acids (Fig. 3). In contrast, alleles encoding H:i from subspecies I, II, and IIIb had greater than 97% amino acid identity in the variable region and greater
TABLE 4. Flagellar antigens not encoded by fliC

Columns: flagellar antigen; serotype tested (formula); sequence of fliC

z44       Bulovka (I 6,7:z44:-)       H:k homolog with a premature stop codon
z44       Quinhon (I 47:z44:-)        H:z homolog with a premature stop codon
z53       IIIb 38:z53:z50             H:l,v homolog with a 1,400-bp insertion^a
z53       IIIb 38:z53:-               H:l,v homolog with a 1,400-bp insertion^a
z60       Aesch (I 6,8:z60:1,2)       H:e,h homolog with a premature stop codon
z61       IIIa 38:z61:-               H:l,v homolog with a premature stop codon
z61       IIIb 38:z61:z53             H:l,v homolog with a premature stop codon
z71       Delmenhorst (I 18:z71:-)    H:z4,z23 homolog with an IS30 homolog insert

^a Blast searches of GenBank release number 132 with the 1,400-nucleotide inserted sequence did not produce any significant matches, but a 152-amino-acid open reading frame from the insert had similarity to many putative transposases in GenBank. The best match was with a putative transposase from Pseudomonas putida (81% amino acid similarity; accession number AJ318529 [14a]).
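Several fliC sequences in Table 4 are described as homologs carrying a premature stop codon. A minimal sketch of how such a stop might be flagged in a coding sequence is shown below. It is not the analysis pipeline used in the study (which relied on Lasergene and the Wisconsin Package); the function names are hypothetical, the input is assumed to start at the first base of the start codon, and the two short sequences are made-up examples.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(coding_sequence):
    """Return the 1-based codon index of the first in-frame stop, or None."""
    seq = coding_sequence.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def has_premature_stop(coding_sequence):
    """True when an in-frame stop occurs before the final complete codon."""
    stop = first_stop(coding_sequence)
    return stop is not None and stop < len(coding_sequence) // 3


# Hypothetical toy sequences (five codons each), not taken from the study.
intact = "ATGGCACAAGTCTAA"     # stop only at the last codon
truncated = "ATGGCATAAGTCTAA"  # in-frame stop at codon 3
print(has_premature_stop(intact), has_premature_stop(truncated))
```

A check of this kind only detects in-frame stops; insertions such as the 1,400-bp element noted in Table 4 would need a separate length or alignment comparison.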
than 94% amino acid identity when compared to H:i alleles from S. bongori (Fig. 3).

Alleles encoding several flagellar antigens were noted to contain deletions in what were otherwise genes encoding related flagellar antigens, including H:j, which has previously been reported to be encoded by an H:d allele carrying an 86-amino-acid deletion (10). Two alleles encoding H:z40 were identical to alleles encoding H:l,v, except for a 19- or 7-amino-acid deletion in the variable region. Flagellar antigen r,i alleles were closely related to H:r alleles, but they contained a 2-amino-acid deletion adjacent to a 1-amino-acid substitution at position 301 (Ala→Thr). The deletion and substitution were identical to the corresponding residues in H:i alleles. An allele encoding flagellar antigen 1,... (a phase 3/R antigen related to 1 complex antigens) contained a typical H:1,2 allele but with a 40-amino-acid deletion. Two other H:1,... alleles were the same length as 1 complex alleles but were distinct from any other 1 complex alleles (Table 3; Fig. 1).

Genetic location of 1 complex, EN complex, and other alleles. The Kauffmann-White serotyping scheme lists most 1 complex flagellar antigens as phase 2 antigens; a few 1 complex antigens are considered phase 3/R. Flagellar antigens of the EN complex are listed in both phase 1 and phase 2. Ten of 34 1 complex alleles that were sequenced were found to be at the fliC locus (Tables 2 and 3). Most of these isolates were from subspecies II, and many had an EN complex allele at the fljB locus. The identity of the antigens encoded by fliC and fljB in these isolates was inferred from their homology to other 1 complex alleles that were encoded by fljB and to other EN complex alleles encoding H:e,h and H:e,n,x. To confirm that 1 complex antigens were encoded by fliC, primers corresponding to sequences in fliD and fliB, which flank fliC in the genome, were designed and used to generate sequencing template.
The fliD-fliB PCR fragment was sequenced by using both the fliC and fliD/fliB primers and was shown by sequence homology to encode a 1 complex allele in the fliC locus.

Alleles encoding flagellar antigens a, k, z6, z10, and z41 were also found in loci different from that predicted by the Kauffmann-White scheme and different from where they were typically found. H:a is always listed in phase 1 in the Kauffmann-White scheme but was identified in fljB in serotype II 45:a:z10 (Table 3). H:z6 is listed primarily in phase 2 in the Kauffmann-White scheme; however, an H:z6 allele was found at fliC in a subspecies II isolate with an H:e,n,x allele at fljB. Flagellar antigen d is commonly found in fliC but was identified at flpA and at fljB in the newly described Salmonella serotype Houston (20).
Flagellar antigens not sequenced. The genes encoding most of the phase 1 flagellar antigens, predicted to be encoded by fliC, were found and sequenced (Table 2). However, genes encoding five flagellar antigens that are listed in phase 1 of the Kauffmann-White scheme and expected to be encoded by fliC were not found. The fliC allele found in isolates expressing these flagellar antigens contained a premature stop codon or an insertion sequence that inactivated the fliC gene (Table 4). These sequences, with the insertion or stop codon deleted, had greater than 99% nucleic acid identity to alleles that were determined for other flagellar antigens. For example, an isolate of serotype Aesch, expressing flagellar antigens z60 and 1,2, contained a fliC H:e,h allele with a premature stop codon. An isolate of serotype Delmenhorst, expressing flagellar antigen H:z71, had an H:z4,z23 fliC allele with an insertion that was 99.9% identical to the insertion sequence 30B family of insertion elements (26). The locations of the stop codons were confirmed by sequencing three independently amplified templates.

Attempts were made to amplify phase 3/R antigen alleles by PCR. Since many of the phase 3/R antigens are listed in phase 2 in some serotypes, diphasic isolates expressing phase 3/R antigens were tested in the fljB PCR. No amplification product was obtained with the fljB primers for diphasic strains expressing flagellar antigens z27, z33, z42, z43, z45, z49, z54, z55, z57, z59, and z79. To determine whether additional flagellar antigens were encoded by the one phase 3/R locus that has been identified (flpA), three different forward primers corresponding to the sequences 5′ to that which has been sequenced were designed (data not shown). Sequence information for the 3′ end of this allele was not available at the time of the experiments, so reverse primers were based on the sequences overlapping the stop codon specific to phase 2 or to the fljB signature sequence.
This sequence was used because the homology between flpA and fljB at the 5′ end suggested that their 3′ sequences may also be related. Using these primers, we were able to amplify a flagellin gene from four isolates expressing a phase 3/R flagellar antigen d. We tested nine isolates expressing five other phase 3/R antigens and were unable to amplify alleles encoding any other antigens with these primers.
DISCUSSION
We performed comparative DNA sequence analysis of flagellin alleles as a first step in the development of a molecular assay for the determination of flagellar antigens in Salmonella. Comparison of flagellin alleles has identified similarities and differences between sequences that typically correspond with
the flagellar antigen that they encode and indicates that molecular determination of flagellar antigen type should be technically feasible. We determined the sequences of 280 flagellin alleles from Salmonella, representing approximately 90% of the phase 1 and phase 2 antigenic types. Comparative analysis of fliC and fljB alleles showed that flagellin alleles that are encoded by fliC, fljB, and flpA were homologous, particularly in the conserved region, and upon alignment, they clustered together irrespective of their genomic location. However, most fliC and fljB/flpA alleles could be distinguished by unique, phase-specific sequences at the extreme 5′ and 3′ ends of the gene. Smith also noted these conserved sequences, but as yet, no biological role has been ascribed to them (24). Most fliC and fljB alleles reported here and in the literature were fairly homogeneous in size, ranging from 1,488 to 1,536 bp, except for fliC alleles from isolates of the Z4 complex, which were 200 to 300 nucleotides shorter. Genes encoding flagellin are typically conserved, particularly at the 5′ and 3′ ends, across most bacterial species. This was also true of the Salmonella flagellin alleles. All flagellin sequences from Salmonella fell into one of four groups based on amino acid identity in the conserved region. Three groups contained multiple flagellar antigen types; these were designated the α cluster, the G complex, and the Z4 complex. Flagellar antigens z36; z38; and z36,z38 were included in the Z4 complex. Although these flagellar antigens appear to be genetically related to the Z4 complex, no antigenic cross-reactivity between these groups has been noted. The fourth group contained a single flagellar antigen type, z29. Within the four clusters, the sequences typically grouped by the antigens they encoded; antigens that were immunologically related were also genetically related.
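The grouping criterion described above (conserved regions sharing greater than 95% amino acid identity within a group, and substantially less between groups) can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study. It assumes the conserved regions have already been aligned to equal length, and the toy sequences, the helper names `substitute`, `percent_identity`, and `group_by_identity`, and the single-linkage rule are all hypothetical choices made for illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def substitute(seq: str, positions: Iterable[int], residue: str = "X") -> str:
    """Return a copy of seq with the given 0-based positions replaced (toy mutation helper)."""
    chars = list(seq)
    for p in positions:
        chars[p] = residue
    return "".join(chars)


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, ignoring columns that are gaps in both sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def group_by_identity(seqs: Dict[str, str], threshold: float = 95.0) -> List[Set[str]]:
    """Single-linkage grouping: two sequences are linked when identity exceeds the threshold."""
    parent = {name: name for name in seqs}

    def find(n: str) -> str:
        # Follow parent pointers to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) > threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Invented 40-residue toy sequences (NOT real flagellin data).
BASE = "ACDEFGHIKLMNPQRSTVWY" * 2
TOY = {
    "toy_1": BASE,
    "toy_2": substitute(BASE, [5]),               # 1 difference from toy_1
    "toy_3": substitute(BASE, range(0, 40, 5)),   # 8 differences from toy_1
}

if __name__ == "__main__":
    for a, b in combinations(TOY, 2):
        print(a, b, percent_identity(TOY[a], TOY[b]))
    print(group_by_identity(TOY))
```

With these toy sequences, `toy_1` and `toy_2` (97.5% identity) fall into one group, whereas `toy_3` (80% identity to `toy_1`) forms its own group. Single linkage was chosen only for brevity; a real analysis would work from a proper multiple alignment and a tree-building method.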
The genus Arizona was originally described as distinct from Salmonella, and an independent serotyping scheme was developed for these organisms. Subsequently, DNA-DNA hybridization studies showed that arizonae belonged in the genus Salmonella (3), and Arizona serotypes were merged into the Kauffmann-White scheme (5). Most Arizona antigens fit well into the Kauffmann-White scheme; however, a few antigens were not completely compatible. For example, flagellar antigen k is weakly expressed in some serotypes, represented by “(k)” in the Kauffmann-White scheme, where the parentheses indicate a weak seroagglutination reaction. Sequence comparisons of H:k alleles from subspecies I, II, and IIIb revealed three distinct clusters of alleles with 60 to 80% amino acid identity in the variable region (Fig. 3a). In contrast, alleles encoding other flagellar antigens, such as H:i, were highly conserved across the subspecies. This observation suggests that flagellar antigen k could be considered multiple flagellar antigen types. The Kauffmann-White scheme places most 1 complex antigens in phase 2, a few are listed as a third phase, and none are listed in phase 1. Ten of 34 of the 1 complex antigens sequenced were encoded by the fliC locus. Other antigens were also found at a locus not predicted by the Kauffmann-White scheme. Most of these instances were in less-common serotypes, and only one isolate of each serotype was characterized. It is unknown whether the location of the alleles encoding those antigens is unique to the serotype or to the isolates
tested; however, it may be of interest that most of the isolates (14 of 18) belonged to subspecies II. The observation that the genetic location of the antigen-encoding gene does not always correspond to that predicted from the Kauffmann-White scheme should be considered when using DNA sequence data to determine flagellar antigen type. Genes encoding several of the Salmonella flagellar antigens were not found at either the fliC or fljB locus; either the fliC locus contained an inactivated flagellin allele or the fljB locus was absent in a diphasic strain. The Salmonella serotype Aesch isolate, antigenic formula I 6,8:z60:1,2, had an inactivated H:e,h allele at fliC, suggesting the possibility that it is a variant of the more common Salmonella serotype Anatum (antigenic formula I 6,8:e,h:1,2) with the H:z60 antigen expressed from an uncharacterized genetic locus. Similarly, the Salmonella serotype Delmenhorst isolate, antigenic formula I 18:z71:-, has an inactivated z4,z23 allele at fliC and may be a variant of Salmonella serotype Cerro, antigenic formula I 18:z4,z23:-. In contrast, H:z48, predicted to be an R phase based on the Kauffmann-White scheme, was found at fliC. We found the gene for only one flagellar antigen that was not located at fliC or fljB, a previously characterized flagellar antigen d allele in a triphasic isolate of Salmonella serotype Rubislaw that was carried on a plasmid (24). Alternate genomic locations for flagellar antigen genes have also been described for E. coli, where several flagellar antigens have been shown to be encoded at loci other than fliC (21, 28). Sequencing flagellin genes that are in undetermined loci in Salmonella may prove difficult; amplifying these alleles with primers to the conserved regions will result in multiple amplicons in diphasic and triphasic strains.
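The inactivated fliC alleles discussed in this section were recognized by an in-frame stop codon located before the normal end of the gene. A minimal sketch of such a check follows. It assumes that the sequence is supplied in frame starting at the start codon and that the standard bacterial stop codons (TAA, TAG, TGA) apply; the function name `first_premature_stop` and the toy sequences are hypothetical. In this study the stop codon locations were confirmed by sequencing three independently amplified templates, not by a script of this kind.

```python
from typing import Optional

# Stop codons of the standard bacterial genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(orf: str) -> Optional[int]:
    """Return the 0-based nucleotide offset of the first in-frame stop codon
    that occurs before the final codon, or None if there is none.

    Assumes `orf` starts at the first base of the start codon (frame 0).
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    last_codon_start = usable - 3         # the final codon is the expected stop
    for i in range(0, last_codon_start, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Invented toy reading frames (NOT real fliC sequences).
INTACT = "ATG" + "GCA" * 5 + "TAA"
TRUNCATED = "ATG" + "GCA" * 2 + "TAG" + "GCA" * 2 + "TAA"

if __name__ == "__main__":
    print(first_premature_stop(INTACT))
    print(first_premature_stop(TRUNCATED))
```

For the toy frames, the intact sequence yields no premature stop, and the truncated one reports offset 9, the position of the internal TAG codon. Detecting an insertion element would instead require comparison against a reference allele and is not covered by this sketch.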
Cloning the intact flagellin allele may be required; this approach may have the added benefit of providing information regarding the genomic location of the flagellin allele, but the possibility exists that these genes are located in regions of the genome that are unique to a particular Salmonella serotype(s).
Comparative DNA and amino acid sequence analysis of our sequences and those available in GenBank identified regions that appear to be unique to specific flagellar antigen groups and types. It remains to be determined whether these amino acid substitutions are simply markers for a particular antigen or if they are responsible for the antigenic differences that are detected by serotyping. In either case, the sequence differences will be useful for the molecular identification of alleles encoding different flagellar antigen types. These unique sequences are being targeted in the development of probes that are specific for a particular antigen or group of antigens. We are developing a PCR combined with a DNA enzyme immunoassay to differentiate phase, complex, and specific antigen types. Combination of this approach with the molecular identification for O antigen type may prove to be a useful method for determination of serotype of Salmonella and can complement or largely replace traditional serotyping methods.
ACKNOWLEDGMENTS
The use of trade names is for identification purposes only and does not imply endorsement by the CDC or by the U.S. Department of Health and Human Services.
REFERENCES
1. Baumler, A. J., and F. Heffron. 1998. Mosaic structure of the smpB-nrdE intergenic region of Salmonella enterica. J. Bacteriol. 180:2220–2223.
2. Boyd, E. F., F. S. Wang, T. S. Whittam, and R. K. Selander. 1996. Molecular genetic relationships of the salmonellae. Appl. Environ. Microbiol. 62:804–808.
3. Brenner, F. W., R. G. Villar, F. J. Angulo, R. Tauxe, and B. Swaminathan. 2000. Salmonella nomenclature. J. Clin. Microbiol. 38:2465–2467.
4. Centers for Disease Control and Prevention. 2003. 2002 Salmonella surveillance summary. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, Atlanta, Ga.
5. Crosa, J. H., D. J. Brenner, W. H. Ewing, and S. Falkow. 1973. Molecular relationships among the Salmonelleae. J. Bacteriol. 115:307–315.
6. de Vries, N. 1998. Production of monoclonal antibodies specific for the i and 1,2 flagellar antigens of Salmonella typhimurium and characterization of their respective epitopes. J. Mol. Biol. 284:521–530.
7. Ewing, W. H. 1986. Edwards and Ewing’s identification of Enterobacteriaceae, 4th ed. Elsevier, New York, N.Y.
8. Ewing, W. H. 1972. The nomenclature of Salmonella, its usage, and definitions for the three species. Can. J. Microbiol. 18:1629–1637.
9. Ewing, W. H., M. M. Ball, S. F. Bartes, and A. C. McWhorter. 1970. The biochemical reactions of certain species and bioserotypes of Salmonella. J. Infect. Dis. 121:288–294.
10. Frankel, G. 1989. Intragenic recombination in a flagellin gene: characterization of the H1-j gene of Salmonella typhi. EMBO J. 8:3149–3152.
11. Frankel, G. 1989. Unique sequences in region VI of the flagellin gene of Salmonella typhi. Mol. Microbiol. 3:1379–1383.
12. Guinee, P. A., W. H. Jansen, H. M. Maas, L. Le Minor, and R. Beaud. 1981. An unusual H antigen (Z66) in strains of Salmonella typhi. Ann. Microbiol. (Paris) 132:331–334.
13. Joys, T. M. 1985. The covalent structure of the phase-1 flagellar filament protein of Salmonella typhimurium and its comparison with other flagellins. J. Biol. Chem. 260:15758–15761.
14. Kanto, S. 1991. Amino acids responsible for flagellar shape are distributed in terminal regions of flagellin. Science 252:1544–1546.
14a. Kholodii, G., Z. Gorlenko, S. Mindlin, J. Hobman, and V. Nikiforov. 2002. Tn5041-like transposons: molecular diversity, evolutionary relationships and distribution of distinct variants in environmental bacteria. Microbiology 148:3569–3582.
15. Li, J., K. Nelson, A. C. McWhorter, T. S. Whittam, and R. K. Selander. 1994. Recombinational basis of serovar diversity in Salmonella enterica. Proc. Natl. Acad. Sci. USA 91:2552–2556.
16. Macnab, R. M. 1992. Genetics and biogenesis of bacterial flagella. Annu. Rev. Genet. 26:131–158.
17. Masten, B. J. 1993. Molecular analyses of the Salmonella g flagellar antigen complex. J. Bacteriol. 175:5359–5365.
18. Mead, P. S., L. Slutsker, V. Dietz, L. F. McCaig, J. S. Bresee, C. Shapiro, P. M. Griffin, and R. V. Tauxe. 1999. Food-related illness and death in the United States. Emerg. Infect. Dis. 5:607–625.
19. Popoff, M. Y. 2001. Antigenic formulas of the Salmonella serovars, 8th ed. W. H. O. Collaborating Centre for Reference and Research on Salmonella, Institut Pasteur, Paris, France.
20. Popoff, M. Y., J. Bockemuhl, and L. L. Gheesling. 2003. Supplement 2001 (no. 45) to the Kauffmann-White scheme. Res. Microbiol. 154:173–174.
21. Ratiner, Y. A. 1998. New flagellin-specifying genes in some Escherichia coli strains. J. Bacteriol. 180:979–984.
22. Reeves, M. W., G. M. Evins, A. A. Heiba, B. D. Plikaytis, and J. J. Farmer III. 1989. Clonal nature of Salmonella typhi and its genetic relatedness to other salmonellae as shown by multilocus enzyme electrophoresis, and proposal of Salmonella bongori comb. nov. J. Clin. Microbiol. 27:313–320.
23. Silverman, M. 1979. Phase variation in Salmonella: genetic analysis of a recombinational switch. Proc. Natl. Acad. Sci. USA 76:391–395.
24. Smith, N. H. 1991. Molecular genetic basis for complex flagellar antigen expression in a triphasic serovar of Salmonella. Proc. Natl. Acad. Sci. USA 88:956–960.
25. Smith, N. H., P. Beltran, and R. K. Selander. 1990. Recombination of Salmonella phase 1 flagellin genes generates new serovars. J. Bacteriol. 172:2209–2216.
26. Umeda, M., and E. Ohtsubo. 1990. Mapping of insertion element IS30 in the Escherichia coli K12 chromosome. Mol. Gen. Genet. 222:317–322.
27. Vanegas, R. A. 1995. Molecular analyses of the phase-2 antigen complex 1,2,.. of Salmonella spp. J. Bacteriol. 177:3863–3864.
28. Wang, L., D. Rothemund, H. Curd, and P. R. Reeves. 2003. Species-wide variation in the Escherichia coli flagellin (H-antigen) gene. J. Bacteriol. 185:2936–2943.
29. Wei, L. N. 1985. Covalent structure of three phase-1 flagellar filament proteins of Salmonella. J. Mol. Biol. 186:791–803.
30. Zieg, J., M. Silverman, M. Hilmen, and M. Simon. 1977. Recombinational switch for gene expression. Science 196:170–172.