Advances in Understanding Bacterial Pathogenesis Gained from ...

149 downloads 621 Views 2MB Size Report
May 11, 2016 - Lower-cost, high-throughput NGS methods have made it possible to ... Advances in web-based, automated phylogeny building from sequence to ... sity in terms of host range and pathogenicity (Valdez et al., 2009). Historically ...
Cell Host & Microbe

Review Advances in Understanding Bacterial Pathogenesis Gained from Whole-Genome Sequencing and Phylogenetics Elizabeth Klemm1 and Gordon Dougan1,* 1Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK *Correspondence: [email protected] http://dx.doi.org/10.1016/j.chom.2016.04.015

The development of next-generation sequencing as a cost-effective technology has facilitated the analysis of bacterial population structure at a whole-genome level and at scale. From these data, phylogenic trees have been constructed that define population structures at a local, national, and global level, providing a framework for genetic analysis. Although still at an early stage, these approaches have yielded progress in several areas, including pathogen transmission mapping, the genetics of niche colonization and host adaptation, as well as gene-to-phenotype association studies. Antibiotic resistance has proven to be a major challenge in the early 21st century, and phylogenetic analyses have uncovered the dramatic effect that the use of antibiotics has had on shaping bacterial population structures. An update on insights into bacterial evolution from comparative genomics is provided in this review. Introduction As bacteria grow and replicate DNA, sequence errors and damage inevitably introduce changes into their genomes. Bacteria are also often promiscuous and can regularly exchange DNA using legitimate or illegitimate recombination mediated by mechanisms including transduction, transformation, or conjugation, bringing further changes. The accumulation of genomic alterations generates a phylogenetic signature over time within a bacterial population that can be used to interrogate recent evolution and decipher older evolutionary signatures representative of ancestral bacteria (Achtman, 2008; Pearson et al., 2009). The stability of the phylogenetic signature over time is influenced by selective pressures, and some DNA changes will be maintained in the population while others will disappear. Environmental and geographical isolation facilitates the independent evolution of distinct lineages that appear as branches on the phylogeny, facilitating comparative analysis within and between populations. Thus, phylogeny presents an opportunity to identify favorable characteristics within populations and to design hypothesis-driven experiments to determine mechanism and function. In the 1990s, Sanger sequencing was used to build high-quality, complete reference genomes for many important bacterial pathogens, with Haemophilus influenzae being the first (Fleischmann et al., 1995). These complete reference genomes were invaluable resources for further experimental work as they were often carefully annotated on a gene-by-gene basis using trained annotators. This work gave an indication of average genome size and organization and provided useful blueprints for functional analysis. The ability to build reference genomes has been accelerated by next-generation sequencing (NGS) approaches, such as the long-read Pacific Biosciences (PacBio) technology, and more automated annotation tools (Jeong et al., 2016). Lower-cost, high-throughput NGS methods have made it possible to sequence large numbers of isolates. These platforms, including Illumina, Roche 454, and Ion Torrent, produce short reads with high coverage of the genome. Down-

stream analyses use the raw, unassembled reads or draft genome assemblies. As costs decrease, sequencing entire collections of historical, global, and clinical isolates is now feasible and becoming routine. Comparative genome analysis based on larger numbers of isolates led to the recognition that bacterial genomes are organized into so-called ‘‘core’’ regions significantly conserved at a gene level within members of a species or lineage and an ‘‘accessory’’ genome that is more variable, highlighting regions of more frequent genetic exchange (Medini et al., 2005; Vernikos et al., 2015). Before whole-genome sequencing (WGS) data were readily available, our knowledge of bacterial phylogeny was low resolution, resulting in a fragmented and incomplete view of the relationships between pathogens. Early studies on microbial populations relied on the analysis of DNA fragments that were poorly representative of the whole phylogenetic signature. Nevertheless, DNA sequence-based approaches such as multilocus sequence typing (MLST) proved to be valuable tools for building preliminary phylogeny and for determining the genetic organization of complex microbial species (Maiden et al., 1998). However, many circulating bacterial pathogens are monophyletic in the sense that they are clonal and relatively highly conserved in terms of sequence diversity. This is because they emerged recently or were genetically isolated. Such pathogens are thus relatively intractable to MLST as an approach for building useful phylogenies. To overcome this barrier, an approach was required to identify rare mutations including single nucleotide polymorphisms (SNPs) and insertions or deletions (Indels) at a whole-genome level (Achtman, 2008; Pearson et al., 2009). Currently there are several different approaches to building phylogenies using WGS data. The first step is determining variation between the genomes, which can be achieved using whole-genome MLST (wgMLST) and SNP- or k-mer-based approaches. Each of these approaches has pros and cons in terms of resolution, data requirements (curated database and/or alignment to reference genome), and ability to deal with known Cell Host & Microbe 19, May 11, 2016 ª 2016 Elsevier Inc. 599

Cell Host & Microbe

Review confounders of phylogenetic analysis. Such confounders include repetitive regions, mobile elements, and recombination regions, which can introduce multiple linked mutations in a single event. wgMLST involves a gene-by-gene analysis, using a curated database to assign an allele designation to each gene (Maiden et al., 2013), and is much higher resolution than traditional MLST. However, SNP-based variant calling is currently the most widely used technique. Here, the reads are aligned to a closely related reference genome, and then the SNPs are detected. Tools have been developed for recognizing and stripping out putative recombination sites (Croucher et al., 2013b, 2015; Marttinen et al., 2012). SNP-based methods have the power to distinguish between closely related genomes that differ by only a few SNPs; this ability is important in facilitating local transmission studies or for detecting in-host evolution. In cases where a reference genome is unavailable or the dataset is very diverse, it is possible to perform SNP analysis on a core genome alignment (Page et al., 2015). Another reference-free method is the k-merbased approach, which does not require an alignment. In this method, short nucleotide ‘‘words’’ of sub-read length are extracted and compared to compute a distance matrix based on the quantity of shared k-mers (Chan and Ragan, 2013; Sims et al., 2009). Once the genetic variation between genomes has been determined, the next step is to construct a phylogenetic tree illustrating their evolutionary relationships. This can be done using several different tree-building methods, including maximum likelihood, Bayesian inference, or neighbor-joining. Advances in web-based, automated phylogeny building from sequence to tree are also being made (http://wgsa.net and Leekitcharoenphon et al., 2012). Improved phylogenetic analysis based on WGS has enabled studies on the geographical distribution of bacterial lineages and has facilitated the identification of emerging strains with distinct genotypes. We can now follow the evolution and transmission of bacteria globally (through human travel and migration), locally (within a hospital or community), and even within a single host (Croucher and Didelot, 2015; Didelot et al., 2016). In the modern era, antibiotics have had a big impact on bacterial populations, driving the emergence of dominant strains, and this evolution will have important implications for how we will manage future epidemics and design vaccines. In this review, we will highlight some of the different types of analyses that have been made feasible by NGS and whole-genome-based analysis and provide relevant examples covering different bacterial species (Figure 1). Bacterial Diversity Reflected in Phylogenies WGS has improved our ability to accurately assign phylogenetic relationships between the different clades within a species. We will use S. enterica as a case study in these advances because it is a genetically and phenotypically broad species that can show adaptation to multiple environments and hosts. This species has been of interest to evolutionary biologists for many years, as it displays considerable genetic and phenotypic diversity in terms of host range and pathogenicity (Valdez et al., 2009). Historically, isolates of the species were assigned to thousands of serovars using serological typing that exploited antigenically diverse surface markers such as flagella and lipopolysaccharide side chains (Grimont and Weill, 2007). This typing both facilitated 600 Cell Host & Microbe 19, May 11, 2016

the diagnosis of disease and provided some limited insight into the epidemiology of outbreaks. Analysis within subspecies enterica using MLST has been proposed as a DNA sequencebased alternative to serotyping (Achtman et al., 2012). However, some sequence types harbor more than one serovar, and serovars can be polyphyletic, distributed across dispersed sequence types. This discrepancy between serovar and phylogeny was previously underappreciated and compromised approaches to interrogate the epidemiology and pathogenicity of S. enterica. Thus, MLST analysis provided an early evolutionary framework, lacking in serotyping, as well as a basis for more detailed sequence-based phylogenetic analysis within particular lineages or serovars. Salmonella enterica serovar Typhi (S. Typhi) is a medically important serotype of S. enterica that is a cause of human typhoid or enteric fever. Early phenotypic and genetic analysis of S. Typhi indicated the serovar was monophyletic; this was later proven by MLST analysis and a sub-genomic SNP-based study on multiple S. Typhi collected from different geographical locations (Roumagnac et al., 2006). This study reported the first SNP-based phylogenetic tree of S. Typhi, defining a series of haplotypes harboring distinct SNP profiles. Although it had limited genomic resolution, the study identified a potentially expanding haplotype, H58, associated with multiple drug resistance (MDR) and provided evidence for a single common ancestor for all S. Typhi, indicating the disease crossed into humans once, perhaps at a time when we started to live in stable communities. Other phenotypically distinct S. Typhi, such as those that express the rare flagella variant z66 and are predominantly associated with the Indonesian archipelago, nicely clustered into haplotype H52, further validating the approach (Baker et al., 2007; Hatta et al., 2011). The Roumagnac et al., 2006 work prompted one of the very first studies on multiple pathogen isolates using a combination of Solexa/Illumina and 454 NGS technologies (Holt et al., 2008) that largely confirmed the early phylogeny but provided many more SNPs through which to type other S. Typhi isolates. Comparative analysis of the 20 sequenced genomes in this study provided limited evidence for purifying selection or recombination within the sequenced isolates, suggesting that an insular, small population was causing typhoid across the world. A number of homoplasies, mutations present on different terminal branches but not in the common ancestor, were identified, indicating that such mutations may offer a selective advantage, for example in antibiotic resistance. The analysis also supported a role for asymptomatic carriers as the reservoir of this pathogen, potentially by harboring largely non-replicating forms over time and geographical space. The lack of widespread antigenic variation in S. Typhi was not anticipated and was indicative that the pathogen invades its host using immune-evasion strategies, since further defined (Valdez et al., 2009). S. Typhi is not the only serovar of S. enterica associated with enteric fever. S. Paratyphi A causes a clinically similar disease and also exhibits strict restriction to the human host (McClelland et al., 2004). Recent phylogenetic analysis of S. Paratyphi A confirmed that these two host-restricted S. enterica serovars evolved independently from within the species S. enterica and that S. Paratyphi A, like S. Typhi, also emerged when a distinct common ancestor crossed into the human population, likely

Cell Host & Microbe

Review

Figure 1. Flow of Genetic Information from Whole-Genome Sequencing Data Complete reference genome for a representative isolate of a clade or species of bacteria provides the blueprint for general genome organization and a DNA sequence for comparison. Whole-genome sequencing data from multiple bacterial isolates allows comparative genomic analyses. The isolates can be from a restricted geographical region (hospital, country, etc.) or from a global dataset. Numerous analyses are possible, including phylogeny construction, pangenome analysis, determination of the effective population size, transmission mapping, identification of recombination hotspots, and genome-wide association studies (GWAS). Circular genome image adapted by permission from Macmillan Publishers Ltd (Parkhill et al., 2001); world map via Wikimedia Commons.

around 450 years ago (Zhou et al., 2014). However, phylogenetic interrogation of the population yielded little evidence of the emergence of a dominant MDR lineage, such as H58, within S. Paratyphi A. Indeed, MDR appears to be sporadic across the serovar, although it is often associated with the same IncHI1 plasmid type that is linked to MDR in S. Typhi, suggesting genetic exchange is occurring between the two serovars (Phan et al., 2009). Interestingly, comparative analysis of the genomes of S. Paratyphi A and S. Typhi indicates that approximately a quarter of their genomes may have originated from a common ancestor (Didelot et al., 2007). This is likely significant as proteins such as typhoid toxin are encoded on this shared region (Haghjoo and Gala´n, 2004). Analysis of the S. Paratyphi A genomes using different evolutionary models suggests some genes are under Darwinian selection but that most genetic changes are transient or offer a shorter-term advantage for the pathogen (Zhou et al., 2014). Thus, S. Paratyphi A may be persisting in the human population without significantly adapting virulence over a longer time period. In contrast to S. Typhi and S. Paratyphi A, isolates of serovar S. Typhimurium are regarded as exhibiting more genotypic and phenotypic diversity. Nevertheless, recent analysis has identified distinct phylogenetic lineages of S. Typhimurium that exhibit geographical restriction, host restriction, or host adaptation. In the past two decades, certain S. enterica lineages,

in particular isolates of S. Typhimurium and S. Enteritidis, have emerged as a major cause of invasive bacterial disease in sub-Saharan Africa, with multiple reports from many countries (Feasey et al., 2012; Kariuki et al., 2006). Since S. Typhimurium and S. Enteritidis are classically associated with self-limiting gastroenteritis, these data were significant. WGS and phylogenetic analysis identified two related lineages of a then relatively novel ST313 S. Typhimurium, not common outside of Africa, that were responsible for waves of disease across sub-Sahara (Kingsley et al., 2009; Okoro et al., 2012). The arrival of ST313 in a particular geographical region was associated temporally with the previous spread of HIV to the same region, linking the two diseases. Both ST313 lineages are MDR, but one wave is associated with resistance to chloramphenicol, a clinical drug of choice when the clade was spreading. Genome analysis of ST313 isolates provided evidence of partial genome degradation, and many genes degraded in ST313 are also degraded in S. Typhi and S. Paratyphi A, a finding that may support some degree of human adaptation. However, although ST313 was dominant in different parts of the region at different times, the ability of other S. enterica to cause invasive disease in the same populations indicates host factors, such as malnutrition and HIV and malaria infections, are likely the dominant drivers of susceptibility, not bacterial genotype. Cell Host & Microbe 19, May 11, 2016 601

Cell Host & Microbe

Review Lessons in Host Adaptation from Comparative Genomics Host adaptation or restriction is a classical phenotype of many human and animal pathogens. Host restriction is often accompanied by a niche change where the focus of infection moves from one tissue, e.g., the intestine, to another systemic tissue, such as the reticuloendothelial system. Host-restricted/adapted monophyletic clones can originate from within broad, generally promiscuous species or families. S. Typhi and S. Paratyphi A are examples of such clones. However, there are many others, including Mycobacterium leprae, an ancient clone originating from within the Mycobacterium (Singh et al., 2015); Bordetella pertussis, originating from within Bordetella bronchiseptica (Parkhill et al., 2003); and Yersinia pestis, originating from within Yersinia pseudotuberculosis (Achtman et al., 1999). The origins of these monophyletic clones can be traced phylogenetically, with clear evidence for gene acquisition and loss throughout the evolution of the clades. Gene acquisition can be pivotal in defining the pathogenic potential of the host-adapted clade. Examples include the acquisition of the virulence-associated Vi capsule and typhoidal toxin by S. Typhi. However, dramatic gene loss in the form of gene inactivation or pseudogene formation is now regarded as a classical signature of adaptation. Host- or niche-adapted pathogens such as Y. pestis, M. leprae, B. pertussis, Shigella, and S. Typhi all display wide-scale gene disruption that can involve hundreds of genes. Phylogenetic analysis indicates that gene loss is often ongoing in these populations, generating distinct lineages with different pseudogene composition. Particular classes of genes are often statistically more likely to be inactivated, such as genes associated with intestinal colonization and persistence in the case of host-adapted enteric bacteria. Three anaerobic metabolism pathways linked to cobalamin biosynthesis, tetrathionate respiration, and 1,2-propanediol metabolism are degraded in host-restricted Salmonella and Y. pestis. These pathways are known to confer a competitive advantage in the inflamed gut over commensal bacteria. Outside of the gut, these pathways are dispensable, and their loss indicates metabolic streamlining representative of niche restriction (Dougan and Baker, 2014; Winter et al., 2010). Interestingly, multiple genes classically associated with virulence can be inactivated, for example the type III secretion system effectors SseJ and SopA in S. Typhi. Although the inactivation of virulence-associated genes in a pathogen may appear counter-intuitive, such genes may normally contribute to survival in a broader range of hosts or in tissue systems that do not favor survival within the targeted host. Thus, specialization may favor targeted survival through refined pathogenesis. Y. pestis exhibits aspects of niche change, although it infects a range of hosts including fleas, rodents, and humans. A comprehensive study using whole-genome sequences for multiple Yersinia species revealed that Y. enterocolitica and Y. pseudotuberculosis acquired two key virulence determinants, the ail attachment and invasion locus and a type III secretion system encoded on the pYV plasmid in independent events (McNally et al., 2016; Reuter et al., 2014). Subsequently, Y. pestis emerged from the Y. pseudotuberculosis lineage. Comparison of the genomes of Y. pestis and other Yersinia revealed that the emergence of Y. pestis had involved gene acquisition as well as gene degradation (Chain et al., 2004). Approximately 602 Cell Host & Microbe 19, May 11, 2016

10% of the Y. pestis genome is functionally inactivated by ISI insertions, gene deletions, and SNP-based gene disruptions. Like S. Typhi, many genes in the metabolic pathways mentioned above are impaired, reflecting the shift from the human gut to the lymphatic system as the primary site of residence. Additional gene loss events facilitate the insect stage of the life cycle, including SNPs in transcription factors that promote biofilm formation in the flea foregut and SNPs that reversibly inactivate a gene that produces an insect toxin. The genomes of the human-restricted Shigella genus also harbor many more pseudogenes compared to their ancestor, Escherichia coli. These pseudogenes have impacted characteristics such as flagellin biosynthesis, phagosomal escape, and resistance to oxidative stress (Jin et al., 2002; Wei et al., 2003). A founding event of Shigella speciation was the acquisition of a Shigella virulence plasmid (The et al., 2016). However, the plasmid is variable and mosaic and appears to have been transmitted into E. coli of different backgrounds on different occasions, resulting in four phylogenetically distinct Shigella species (Yang et al., 2005). In addition to the acquisition of genes on the plasmid, Shigella species also harbor novel pathogenicity islands, likely acquired by horizontal gene transfer events, which encode toxins and other virulence-associated factors. Notably, S. dysenteriae I acquired the Stx Shiga toxin via a prophage. Similar phages have shaped the evolution of Enterohemorrhagic E. coli such as E. coli O157. Thus, a combination of gene acquisition and convergent genome degradation shaped the evolution of Shigella (Feng et al., 2011). Emergence of Dominant Antibiotic-Resistant Clades Arguably, the most significant phenotypic change in bacteria over the past 50 years has been the emergence of MDR, which has been relentless, often appearing relatively quickly after the introduction of a new antibiotic. Resistance determinants are often encoded on highly mobile DNA elements such as transposons and plasmids, and elements such as integrons have evolved that can capture resistance determinants and build multiple resistance loci (Phan et al., 2009). The level of resistance in clinically important bacteria has recently led to calls for new approaches to be taken for both the use and development of antibiotics (A˚rdal et al., 2016; Stanton, 2013). The appearance of multiple MDR in a particular microbial species or in a geographical location can appear to be random and spontaneous. However, phylogenetic analysis of microbial populations can be used to develop an understanding of how the MDR phenotype is evolving and spreading. As discussed above, early phylogenetic analysis identified the S. Typhi haplotype H58, which appeared to be expanding in comparison to other lineages, a finding confirmed by a recent comprehensive phylogenetic analysis of 2,000 global S. Typhi (Wong et al., 2015) (Figure 2A). Over 800 isolates within the study were H58, and many of these had been isolated within the past two decades. Although the MDR phenotype emerged elsewhere in the S. Typhi tree, there was no evidence that these lineages were spreading beyond relatively restricted geographical regions. In contrast, the H58 lineage has spread widely within Asia and has entered Africa on at least three separate occasions, subsequently spreading widely within East and Southern Africa (Figure 2B). Thus, MDR within S. Typhi is being driven by a single

Cell Host & Microbe

Review

Figure 2. Phylogenetic and Geographical Mapping of Dominant Lineages Associated with MDR (A) A global collection of S. Typhi isolates shows that H58 has become the dominant lineage. Outer-circle colors represent regions within continents (Wong et al., 2015; reprinted by permission from Macmillan Publishers Ltd: Nature Genetics, copyright 2015). (B) Inter-region transfers were inferred from metadata combined with maximum-likelihood phylogeny and suggest that South Asia was an early hub for H58 (Wong et al., 2015). (C) A global collection of M. tuberculosis isolates shows that the Beijing sub-lineage has become a dominant family (Coll et al., 2014). (D) Circle size indicates percentage of tuberculosis cases due to Beijing sub-lineage, and red color indicates associated drug resistance (European Concerted Action, 2006).

lineage rather than emerging across the tree. Interestingly, the 800 H58 isolates are separated from the nearest neighboring non-H58 cluster by only 151 SNPs, and the individual H58 isolates differed from each other by a median of just six SNPs, indicating rapid expansion of a genetically clonal population. Further analysis of MDR within H58 indicated that integrons are driving the MDR phenotype and that these are able to move from the classical IncHI1 plasmids normally associated with S. Typhi onto the chromosome. Fluoroquinolones have been the antibi-

otic of choice for S. Typhi in many areas over the past two decades, and resistance has emerged within H58 through mutations in gyrA and gyrB, targets for this class of antibiotic, and parA, which is involved in transport (Chau et al., 2007; Holt et al., 2012). Highly quinolone-resistant lineages of H58 S. Typhi have now emerged in India and Nepal that are compromising treatment and recovery (Pham Thanh et al., 2016). MDR was also associated with the emergence of the ST313 S. Typhimurium that spread across sub-Saharan Africa following Cell Host & Microbe 19, May 11, 2016 603

Cell Host & Microbe

Review the HIV epidemic. In these cases, the MDR was encoded by integrons on the so-called virulence-associated plasmid (Kingsley et al., 2009). This virulence plasmid encodes the spv operon that plays a key role in the systemic persistence of S. Typhimurium in experimental animals. The co-location of MDR and virulence determinants on the same replicon is particularly concerning as MDR could be potentially be indirectly selected for by the virulence phenotype and vice versa. MDR tuberculosis has reached epidemic levels in South Africa and parts of Eurasia, and phylogenetic analysis is being used to gain insight into this phenomenon. The dominant phylogenetic lineage that harbors MDR isolates in Eurasia belongs to the socalled Beijing sub-lineage, which can be divided into seven clonal complexes (Coll et al., 2014; European Concerted Action, 2006) (Figures 2C and 2D). The Beijing lineage emerged between 6,600 and 8,000 years ago (Comas et al., 2013; Merker et al., 2015) during the introduction of farming in Asia (European Concerted Action, 2006). Sharp population growth of the Beijing sub-lineage occurred at key periods in recent history, e.g., during the industrial revolution and the First World War (Merker et al., 2015). Interestingly, this analysis did not identify any obvious decrease in the population size of the Beijing lineage corresponding to the introduction of the BCG vaccine, but did see a reduction coinciding to the widespread usage of antibiotics in the 1960s. However, there has been a recent increase in population size, which is thought to reflect a combination of the rise in antibiotic-resistant tuberculosis, the HIV epidemic, and the collapse of the Soviet Union health system. It is important to note that the two clonal complexes associated with MDR have been expanding over thousands of years, but these have only started spreading epidemically in the last few decades. It has been proposed that the Beijing lineage has an increased capacity to acquire drug resistance determinants or compensatory mutations to mitigate the reduced fitness linked to resistance, which could have led to its emergence as a dominant MDR lineage (Casali et al., 2014; Ford et al., 2013). Phylogenetic analysis has shown that the seventh cholera pandemic, caused by Vibrio cholerae El Tor, can be mapped back to a common ancestor that emerged some 60 years ago (Chun et al., 2009; Spagnoletti et al., 2014; Orata et al., 2014). These V. cholerae include both O1 and O139 serotypes, are monophyletic, and have acquired the so-called pandemic genomic islands I or II, which may enhance intestinal colonization (Davies et al., 2012; Dziejman et al., 2002). Isolates within this lineage can be phylogenetically mapped back to waves of cholera that have spread from the Bay of Bengal, a region that thus serves as a reservoir (Mutreja et al., 2011). Interestingly, most recent isolates map into wave three, and these predominantly harbor the SXT/R391 integrative and conjugative element associated with MDR (Didelot et al., 2015; Dutilh et al., 2014; Eppinger et al., 2014; Marin et al., 2014; Spagnoletti et al., 2014). These SXT elements appear to have moved into V. cholerae on multiple occasions, indicating resistance as a driver of the disease. Importantly, antibiotics are not usually used to treat cholera so this selection is likely indirect, perhaps through exposure to environmental antibiotics rather than through targeted treatment. Hence the MDR phenotype is likely facilitating the persistence of V. cholerae rather than evasion of therapy. 604 Cell Host & Microbe 19, May 11, 2016

Clostridium difficile has emerged in the past 40 years as a serious problem in patients entering the heathcare systems of multiple countries (Didelot et al., 2012; Eyre et al., 2013). C. difficile can cause serious intestinal infections following prior treatment with antibiotics, and relapse of the disease is common. C. difficile isolates are often MDR and can persist as spores within the intestine and the environment, compounding the threat. Unlike MDR S. Typhi and V. cholerae, multiple C. difficile lineages have recently emerged with the potential to cause disease and resist treatment. Evidence is accumulating that C. difficile can move relatively readily between humans and animals without displaying obvious signatures of host adaptation. Nevertheless, different lineages of C. difficile may have different pathogenic potential. For example, isolates of lineage 027/B1 are more pathogenic for the mouse than other lineages, such as 017, and furthermore, 027 isolates are highly transmissible in comparison to many other lineages (Lawley et al., 2012). Phylogeographical analysis of C. difficile 027 indicates at least two major introductions into the human population, both in North America (He et al., 2013). These introductions were followed by a dramatic global spread of each lineage that can be traced through phylogenetic analysis across continents. Interestingly, the expansion of each of these lineages can be linked to the introduction of mutations mediating fluoroquinolone resistance in the early expansion phase, indicating antibiotic use as a key driver of spread. Evidence for Co-transmission with Human Migration The global distribution of different bacterial lineages can theoretically be mapped from the location of origin through paths of subsequent dissemination. As we begin to understand more about the evolution and migration of modern humans, the origin of particular diseases can potentially be mapped onto these data. Historically, human movement was much more limited, and thus early pathogen spread may be reflected in major migrations. Helicobacter pylori is a Gram-negative bacterium that infects the stomachs of 50% of the human population and is a risk factor for gastric cancer (Cover, 2016). Given that H. pylori tends to be transmitted within families and has high sequence diversity that is primarily due to recombination, it is has been used as a tool to trace human migration. The first human acquisition of H. pylori is predicted to have occurred 100,000 years ago, predating the first migration of humans out of Africa (Moodley et al., 2012). Based on MLST analysis of H. pylori isolates across different continents, six ancestral sources were identified that potentially gave rise to modern populations (Falush et al., 2003; Linz et al., 2007). Remarkably, human mtDNA haplotype distributions are generally replicated by the phylogeographical distribution of H. pylori, indicating an ancient relationship between pathogen and host (Moodley et al., 2012). The European subpopulation resulted from admixture between Asian and African H. pylori and thus indicates two migrations of humans and H. pylori out of Africa (Moodley et al., 2012). The timing of the second migration is unclear in light of the recent sequencing of H. pylori from a 5,300-yearold European glacier ‘‘Iceman,’’ with a genome almost purely derived from a H. pylori of predicted Asian origin (Maixner et al., 2016), although it is impossible to make generalizations with a single historical specimen.

Cell Host & Microbe

Review Further phylogenetic studies suggest that H. pylori followed two prehistoric human migrations in the Pacific, one of which reached New Guinea and Australia and a second more recent wave that moved via Taiwan to Melanesia and the Polynesian Islands (Moodley et al., 2009). These H. pylori populations have remained isolated and show founder effects from repeated bottlenecks associated with island hopping. Another study suggested that the Amerind H. pylori sub-population present in the Americas came from an Asian source involving humans that crossed the Bering Strait and spread across the continent (Moodley et al., 2012). Modern migration is also evident in the distribution of H. pylori in the Americas, with African ancestry brought in by the slave trade displacing the Amerind H. pylori subpopulation. Interestingly, there is evidence that the longerterm co-existence of H. pylori and human sub-populations has driven co-evolution, leading to modulation of disease. For instance, cross-mixing (that is, Amerindian human hosts infected with African H. pylori) may be linked to an increased risk of gastric disease (Kodaman et al., 2014). The spread of M. tuberculosis has also been linked to ancient human migration patterns. Like H. pylori, M. tuberculosis likely originated as a human pathogen in Africa prior to the out-ofAfrica migration. Coalescent analyses place the emergence of M. tuberculosis at 70,000 years ago (Comas et al., 2013). Two lineages remained in Africa while another lineage followed the outof-Africa migration and then branched into five lineages that spread to the rest of the world and subsequently back into Africa (Hershberg et al., 2008). Again, the mtDNA and M. tuberculosis phylogenies align with each other and show deeply branched lineages corresponding to the African source population in each tree (macrohaplogroup L3 in the mtDNA data and African lineages 5 and 6 in the M. tuberculosis data) (Comas et al., 2013). Furthermore, the prevalence of mtDNA haplogroups is statistically linked to specific M. tuberculosis lineages in the same country. In contrast to H. pylori, which is transmitted vertically through prolonged contact in families, M. tuberculosis is transmitted horizontally among populations via aerosol routes. Thus, changes in human population structure and societal organization will potentially impact the ability of M. tuberculosis to transmit. The advent of farming in the Neolithic Demographic Transition led to settlements with increased population densities compared to hunter-gatherer communities. Indeed, Bayesian analysis shows an expansion of M. tuberculosis population size 10,000 years ago, accompanying the advent of farming (Comas et al., 2013), and the modern lineages that expanded during this time may be better adapted as ‘‘crowd diseases’’ (Comas et al., 2013). Ancient lineages had to survive in dispersed host communities, so longer latency periods and moderated virulence were likely advantageous to prevent killing or disadvantaging the entire host community—a likely dead end for the pathogen (Gagneux, 2012). Modern lineages can exhibit increased host virulence (Reiling et al., 2013), which can be tolerated in areas of higher host density that are not a limiting factor for pathogen persistence. Recombination Hotspots Define Regions under Diversifying Selection Recombination in bacteria proceeds via the introduction of genetic material through mechanisms such as transduction, conju-

gation, or transformation. Some bacterial species, such as H. pylori, recombine quite readily and frequently, whereas relatively low rates of recombination are reported for other species, such as M. tuberculosis. Recombination can lead to the simultaneous transfer of multiple SNPs en bloc from the donor to the recipient. The mixture of vertically acquired point mutations and horizontally acquired regions can confound phylogenetic reconstruction. Thus, homoplasies associated with recombination should be identified and taken into account when building trees and to correctly calculate the mutation rate in highly recombinogenic bacteria. Diversification in the face of selective pressures leads to a selective sweep in which the fittest isolates survive. Several recent studies have highlighted how the identification of recombination hotspots can be used to understand immune-evasion strategies and acquisition of an MDR phenotype. Streptococcus pneumoniae is a highly recombinogenic human respiratory pathogen. Sequencing a global collection of isolates from the PMEN1 MDR lineage identified recombination hotspots at genes encoding immunogenic surface proteins (pspA, pspC, psrP) and the capsule biosynthesis (cps) locus (Croucher et al., 2011). These loci appear to be under diversifying selection and likely facilitate evasion of the human antibodymediated immune response. Antigenic variation can also help a bacterial population adapt to selective pressures driven by vaccination. Capsular polysaccharide conjugate vaccines have been introduced for the pneumococci in several countries, but these only provide protection to a subset of the circulating serotypes that are targeted by the vaccine. Following vaccination, the targeted serotypes are reduced or even eliminated, but since the capsule biosynthesis locus is a recombination hotspot, an existing sub-population within the lineage with a different capsule type can expand in its place (Brueggemann et al., 2007; Croucher et al., 2013a; Roca et al., 2015). A study of S. pneumoniae in a Thai refugee camp documented how recombination can also be important following clinical antimicrobial interventions (Chewapreecha et al., 2014a). The seven major pneumococcal lineages circulating in the camp exhibited different rates of recombination, and yet they shared many of the same hotspots. In addition to two hotspots found in the PMEN1 lineage, 4 of the 6 hotspots of recombination in the Thai isolates were genes associated with antibiotic resistance. These genes encoded either penicillin-binding proteins (encoded by phb1a, pbp2b, and pbp2x) that confer b-lactam resistance or a dihydrofolate reductase (folA) that confers resistance to cotrimoxazole. Another case of clonal replacement has been described for Neisseria meningitidis in the so-called meningitis belt of Africa. Here, the likely selective pressure may be natural herd immunity. N. meningitidis causes periodic epidemics that coincide with the dry season in this region. A single serotype of N. meningitidis is frequently responsible for several seasonal epidemics in a row before being replaced by another dominant serotype. Although herd immunity was believed to be a driving force for clonal replacement, until recently it was unclear what the herd immunity was targeting in the pathogen. Phylogenetic analysis identified recombination hotspots in different N. meningitidis from this region that harbored genes encoding several noncapsular cellsurface components such as the pilin glycosylation locus (pgl), Cell Host & Microbe 19, May 11, 2016 605

Cell Host & Microbe

Review adhesions (mafA and mafB), and a type IV pilus-regulating (pilU) locus (Lamelas et al., 2014). In these two examples, genes encoding surface-related structures were found to be encoded on recombination hotspots. In fact, this is a common theme that has been established in a study that compared recombination in ten different bacterial species (Yahara et al., 2016). Hotspots were found in all ten bacterial species, although the number and the length of the hotspots varied from species to species. The genes that were found in hotspots belonged to categories ‘‘surface structures,’’ ‘‘membranes, lipoproteins, and porins,’’ and ‘‘cell envelope.’’ In contrast, housekeeping genes such as those encoding ribosomal proteins are generally ‘‘cold’’ and have low nucleotide diversity. The rule of thumb seems to be that genes encoding surface structures show diversifying selection. An interesting exception to the rule is found in M. tuberculosis. The nucleotide diversity was calculated for experimentally verified T cell epitopes and was found to be lower than housekeeping genes (Comas et al., 2010). This was repeated in a larger strain collection and with an expanded list of predicted epitopes (Coscolla et al., 2015). The hyperconservation of the majority of T cell epitopes may indicate that it is actually advantageous for M. tuberculosis to be recognized by the host immune system. Indeed, T cell immunity mediates latency and contributes to the formation of lung cavities that promote aerosol transmission. Thus, by looking for both the presence and absence of antigenic variation at a whole-genome level, we can learn about how both the evasion and triggering of host immunity can be beneficial to different bacteria. Genome-wide Association Studies Genome-wide association studies (GWAS) have been used in human genetics to identify causative SNPs for various traits, and similar methods are being developed for bacteria (Chen and Shapiro, 2015; Falush and Bowden, 2006; Read and Massey, 2014). GWAS can be used to detect changes in the accessory genome, SNPs, and small indels that are responsible for a change in phenotype. A key advantage of GWAS is that they are unbiased and can be used to study a wide range of phenotypic differences. A clear measurable phenotype, adequate sample size, and correction for population structure must be considered when designing a GWAS study. Bacterial GWAS have been used to identify factors important for host range (Sheppard et al., 2013), virulence (Laabei et al., 2014, 2015), and antibiotic resistance (Alam et al., 2014; Chewapreecha et al., 2014b; Farhat et al., 2013; Salipante et al., 2015; Earle et al., 2016). Study sample sizes ranged from as few as 29 genomes to over 3,000. Antibiotic resistance is amenable to GWAS because it is relatively straightforward to assign a phenotype based on accepted clinical break points. While the identification of recombination hotspots proved very informative in the study of pneumococci in the Thai refugee camp, a follow-up GWAS was required to pinpoint the causal variants (Chewapreecha et al., 2014b). Since recombination delivers several SNPs at once, it is difficult to distinguish the functionally important SNPs from ‘‘hitchhikers’’ unless you have a large diverse sample set to break up the linkage disequilibrium. One ‘‘holy grail’’ in infectious disease research is identifying the determinants of disease manifestation in humans. What accounts for differences in disease severity? Will infection result 606 Cell Host & Microbe 19, May 11, 2016

in carriage or acute disease? This has been a difficult question to address, given that bacterial genetics, host microbiota, and host genetics likely all influence infection outcome. Several GWAS have focused on human genetic markers affecting susceptibility to bacterial infection (Chapman and Hill, 2012; Ko and Urban, 2013). Future GWAS studies incorporating genomic sequence data from both the host and pathogen will likely give a more complete picture of the interplay of the various factors. Within-Host Evolution WGS has enabled studies on evolution in bacterial populations derived from the same host over the course of a single infection. Depending on the mutation rate and the length of infection, the number of SNPs within such populations may be comparatively small. Thus, many within-host studies have involved long-term infections, including M. tuberculosis or chronic lung infections in cystic fibrosis (CF) patients (Didelot et al., 2016). Some of the genetic evolution that has been documented has been related to the development of antimicrobial resistance and/or host adaptation. It has been possible to catalog the evolution of extensively drug-resistant (XDR) M. tuberculosis in just 3.5 years in a single patient (Eldholm et al., 2014). The authors were able to correlate the timeline of different clinical interventions with the acquisitions of known SNPs conferring resistance to the antibiotics being administered. In some cases multiple resistance-conferring SNPs are observed early after a new drug regiment is begun, and then one of the SNPs is fixed in the population, reflecting the rapid expansion and collapse of different clones evolving in parallel. A retrospective study of a Burkholderia dolosa epidemic in CF patients identified a number of genes with multiple mutations, which was statistically unexpected (Lieberman et al., 2011). The genes that were preferentially disrupted fell into similar categories in the different hosts, inferring convergent evolution to a host-adapted phenotype. These genes belonged to categories involved with host-pathogen interactions, such as membrane synthesis, secretion, and antibiotic resistance. A longitudinal study of P. aeruginosa from several different CF patients infected with strains from different lineages showed convergent evolution in a common group of 52 genes (Marvig et al., 2015). Again the genes were enhanced for extracellular virulence factors and antibiotic resistance genes as well as transcription factors. Interestingly, a set of transcription factors acquired mutations in a shared ordered succession, whereby the upstream regulators were mutated before downstream regulators in the same regulatory network. Another study on P. aeruginosa in a single patient over 32 years showed that the original infecting strain simultaneously evolved into several distinct sub-lineages that were spatially separated in different parts of the lung (Markussen et al., 2014). Another unusual case involved a chronic bacteremic infection of hypermutating S. Enteritidis in an immunocompromised patient with an IL-12 receptor deficiency (Klemm et al., 2016). S. Enteritidis is normally associated with self-limiting gastroenteritis that is resolved in a short time, but in this patient the infection went systemic and lasted 16 years. The S. Enteritidis had a hypermutating phenotype due to an impaired mismatch repair system, and WGS analysis showed that the founding S. Enteritidis developed different lineages harboring a large number of distinct pseudogene sets. The level and pattern of genome degradation mirrored

Cell Host & Microbe

Review aspects of the host adaptation signature of S. Typhi, including degradation of genes encoding adhesins, anaerobic metabolism pathways, and virulence factors.

Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc. Natl. Acad. Sci. USA 101, 13826–13831.

Future Directions Although the field of microbial phylogenomics is relatively young, it is clear that it has the potential to add a new dimension to support work on host-pathogen interactions. By examining the detailed genetic structure of pathogen populations, it is possible to identify regions of the genome that are under selection or are driving evolutionary change. Clearly antibiotic usage is driving rapid evolution, almost at an unprecedented level, and we will need to search beyond simple resistance determinants to understand why some lineages are successful. One of the challenges will be to link genotype to phenotype, and we may need to consider specific mutations in the context of the whole bacterial genotype, perhaps by combining GWAS with more gene-targeted analysis. Also, our ability to measure fitness and competitiveness in real settings outside of laboratory presents challenges. Nevertheless, as more data accumulate through studies in different settings driven by public health-related investigations, further opportunities for analysis will likely emerge.

Chapman, S.J., and Hill, A.V. (2012). Human genetic susceptibility to infectious disease. Nat. Rev. Genet. 13, 175–188.

ACKNOWLEDGMENTS We would like to thank Robert Kingsley and Ewan Harrison for critical reading of the manuscript and Vanessa Wong for contributing to the figures. The work was supported by The Wellcome Trust.

Chan, C.X., and Ragan, M.A. (2013). Next-generation phylogenomics. Biol. Direct 8, 3.

Chau, T.T., Campbell, J.I., Galindo, C.M., Van Minh Hoang, N., Diep, T.S., Nga, T.T., Van Vinh Chau, N., Tuan, P.Q., Page, A.L., Ochiai, R.L., et al. (2007). Antimicrobial drug resistance of Salmonella enterica serovar typhi in asia and molecular mechanism of reduced susceptibility to the fluoroquinolones. Antimicrob. Agents Chemother. 51, 4315–4323. Chen, P.E., and Shapiro, B.J. (2015). The advent of genome-wide association studies for bacteria. Curr. Opin. Microbiol. 25, 17–24. Chewapreecha, C., Harris, S.R., Croucher, N.J., Turner, C., Marttinen, P., Cheng, L., Pessia, A., Aanensen, D.M., Mather, A.E., Page, A.J., et al. (2014a). Dense genomic sampling identifies highways of pneumococcal recombination. Nat. Genet. 46, 305–309. Chewapreecha, C., Marttinen, P., Croucher, N.J., Salter, S.J., Harris, S.R., Mather, A.E., Hanage, W.P., Goldblatt, D., Nosten, F.H., Turner, C., et al. (2014b). Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes. PLoS Genet. 10, e1004547. Chun, J., Grim, C.J., Hasan, N.A., Lee, J.H., Choi, S.Y., Haley, B.J., Taviani, E., Jeon, Y.S., Kim, D.W., Lee, J.H., et al. (2009). Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. Proc. Natl. Acad. Sci. USA 106, 15442–15447. Coll, F., McNerney, R., Guerra-Assunc¸a˜o, J.A., Glynn, J.R., Perdiga˜o, J., Viveiros, M., Portugal, I., Pain, A., Martin, N., and Clark, T.G. (2014). A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat. Commun. 5, 4812. Comas, I., Chakravartti, J., Small, P.M., Galagan, J., Niemann, S., Kremer, K., Ernst, J.D., and Gagneux, S. (2010). Human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved. Nat. Genet. 42, 498–503.

REFERENCES Achtman, M. (2008). Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu. Rev. Microbiol. 62, 53–70. Achtman, M., Zurth, K., Morelli, G., Torrea, G., Guiyoule, A., and Carniel, E. (1999). Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc. Natl. Acad. Sci. USA 96, 14043–14048. Achtman, M., Wain, J., Weill, F.X., Nair, S., Zhou, Z., Sangal, V., Krauland, M.G., Hale, J.L., Harbottle, H., Uesbeck, A., et al.; S. Enterica MLST Study Group (2012). Multilocus sequence typing as a replacement for serotyping in Salmonella enterica. PLoS Pathog. 8, e1002776. Alam, M.T., Petit, R.A., 3rd, Crispell, E.K., Thornton, T.A., Conneely, K.N., Jiang, Y., Satola, S.W., and Read, T.D. (2014). Dissecting vancomycin-intermediate resistance in staphylococcus aureus using genome-wide association. Genome Biol. Evol. 6, 1174–1185. A˚rdal, C., Outterson, K., Hoffman, S.J., Ghafur, A., Sharland, M., Ranganathan, N., Smith, R., Zorzet, A., Cohn, J., Pittet, D., et al. (2016). International cooperation to improve access to and sustain effectiveness of antimicrobials. Lancet 387, 296–307.

Comas, I., Coscolla, M., Luo, T., Borrell, S., Holt, K.E., Kato-Maeda, M., Parkhill, J., Malla, B., Berg, S., Thwaites, G., et al. (2013). Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat. Genet. 45, 1176–1182. Coscolla, M., Copin, R., Sutherland, J., Gehre, F., de Jong, B., Owolabi, O., Mbayo, G., Giardina, F., Ernst, J.D., and Gagneux, S. (2015). M. tuberculosis T Cell Epitope Analysis Reveals Paucity of Antigenic Variation and Identifies Rare Variable TB Antigens. Cell Host Microbe 18, 538–548. Cover, T.L. (2016). Helicobacter pylori Diversity and Gastric Cancer Risk. MBio 7, 7. Croucher, N.J., and Didelot, X. (2015). The application of genomics to tracing bacterial pathogen transmission. Curr. Opin. Microbiol. 23, 62–67. Croucher, N.J., Harris, S.R., Fraser, C., Quail, M.A., Burton, J., van der Linden, M., McGee, L., von Gottberg, A., Song, J.H., Ko, K.S., et al. (2011). Rapid pneumococcal evolution in response to clinical interventions. Science 331, 430–434. Croucher, N.J., Finkelstein, J.A., Pelton, S.I., Mitchell, P.K., Lee, G.M., Parkhill, J., Bentley, S.D., Hanage, W.P., and Lipsitch, M. (2013a). Population genomics of post-vaccine changes in pneumococcal epidemiology. Nat. Genet. 45, 656–663.

Baker, S., Hardy, J., Sanderson, K.E., Quail, M., Goodhead, I., Kingsley, R.A., Parkhill, J., Stocker, B., and Dougan, G. (2007). A novel linear plasmid mediates flagellar variation in Salmonella Typhi. PLoS Pathog. 3, e59.

Croucher, N.J., Harris, S.R., Grad, Y.H., and Hanage, W.P. (2013b). Bacterial genomes in epidemiology–present and future. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20120202.

Brueggemann, A.B., Pai, R., Crook, D.W., and Beall, B. (2007). Vaccine escape recombinants emerge after pneumococcal vaccination in the United States. PLoS Pathog. 3, e168.

Croucher, N.J., Page, A.J., Connor, T.R., Delaney, A.J., Keane, J.A., Bentley, S.D., Parkhill, J., and Harris, S.R. (2015). Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 43, e15.

Casali, N., Nikolayevskyy, V., Balabanova, Y., Harris, S.R., Ignatyeva, O., Kontsevaya, I., Corander, J., Bryant, J., Parkhill, J., Nejentsev, S., et al. (2014). Evolution and transmission of drug-resistant tuberculosis in a Russian population. Nat. Genet. 46, 279–286.

Davies, B.W., Bogard, R.W., Young, T.S., and Mekalanos, J.J. (2012). Coordinated regulation of accessory genetic elements produces cyclic di-nucleotides for V. cholerae virulence. Cell 149, 358–370.

Chain, P.S., Carniel, E., Larimer, F.W., Lamerdin, J., Stoutland, P.O., Regala, W.M., Georgescu, A.M., Vergez, L.M., Land, M.L., Motin, V.L., et al. (2004).

Didelot, X., Achtman, M., Parkhill, J., Thomson, N.R., and Falush, D. (2007). A bimodal pattern of relatedness between the Salmonella Paratyphi A and

Cell Host & Microbe 19, May 11, 2016 607

Cell Host & Microbe

Review Typhi genomes: convergence or divergence by homologous recombination? Genome Res. 17, 61–68.

stantial differences in the emergence of drug-resistant tuberculosis. Nat. Genet. 45, 784–790.

Didelot, X., Eyre, D.W., Cule, M., Ip, C.L., Ansari, M.A., Griffiths, D., Vaughan, A., O’Connor, L., Golubchik, T., Batty, E.M., et al. (2012). Microevolutionary analysis of Clostridium difficile genomes to investigate transmission. Genome Biol. 13, R118.

Gagneux, S. (2012). Host-pathogen coevolution in human tuberculosis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 850–859.

Didelot, X., Pang, B., Zhou, Z., McCann, A., Ni, P., Li, D., Achtman, M., and Kan, B. (2015). The role of China in the global spread of the current cholera pandemic. PLoS Genet. 11, e1005072. Didelot, X., Walker, A.S., Peto, T.E., Crook, D.W., and Wilson, D.J. (2016). Within-host evolution of bacterial pathogens. Nat. Rev. Microbiol. 14, 150–162.

Grimont, P.A.D., and Weill, F.-X. (2007). Antigenic Formulae of the Salmonella Serovars. WHO Collaborating Centre for Reference and Research on Salmonella, 9th edition. http://www.scacm.org/free/Antigenic%20Formulae%20of %20the%20Salmonella%20Serovars%202007%209th%20edition.pdf. Haghjoo, E., and Gala´n, J.E. (2004). Salmonella typhi encodes a functional cytolethal distending toxin that is delivered into host cells by a bacterial-internalization pathway. Proc. Natl. Acad. Sci. USA 101, 4614–4619.

Dougan, G., and Baker, S. (2014). Salmonella enterica serovar Typhi and the pathogenesis of typhoid fever. Annu. Rev. Microbiol. 68, 317–336.

Hatta, M., Sultan, A.R., Pastoor, R., and Smits, H.L. (2011). New flagellin gene for Salmonella enterica serovar Typhi from the East Indonesian archipelago. Am. J. Trop. Med. Hyg. 84, 429–434.

Dutilh, B.E., Thompson, C.C., Vicente, A.C., Marin, M.A., Lee, C., Silva, G.G., Schmieder, R., Andrade, B.G., Chimetto, L., Cuevas, D., et al. (2014). Comparative genomics of 274 Vibrio cholerae genomes reveals mobile functions structuring three niche dimensions. BMC Genomics 15, 654.

He, M., Miyajima, F., Roberts, P., Ellison, L., Pickard, D.J., Martin, M.J., Connor, T.R., Harris, S.R., Fairley, D., Bamford, K.B., et al. (2013). Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat. Genet. 45, 109–113.

Dziejman, M., Balon, E., Boyd, D., Fraser, C.M., Heidelberg, J.F., and Mekalanos, J.J. (2002). Comparative genomic analysis of Vibrio cholerae: genes that correlate with cholera endemic and pandemic disease. Proc. Natl. Acad. Sci. USA 99, 1556–1561.

Hershberg, R., Lipatov, M., Small, P.M., Sheffer, H., Niemann, S., Homolka, S., Roach, J.C., Kremer, K., Petrov, D.A., Feldman, M.W., and Gagneux, S. (2008). High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol. 6, e311.

Earle, S.G., Wu, C.-H., Charlesworth, J., Stoesser, N., Gordon, N.C., Walker, T.M., Spencer, C.C.A., Iqbal, Z., Clifton, D.A., Hopkins, K.L., et al. (2016). Identifying lineage effects when controlling for population structure improves power in bacterial association studies. arXiv, arXiv:1510.06863, http://arxiv.org/ abs/1510.06863.

Holt, K.E., Parkhill, J., Mazzoni, C.J., Roumagnac, P., Weill, F.X., Goodhead, I., Rance, R., Baker, S., Maskell, D.J., Wain, J., et al. (2008). High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat. Genet. 40, 987–993.

Eldholm, V., Norheim, G., von der Lippe, B., Kinander, W., Dahle, U.R., Caugant, D.A., Mannsa˚ker, T., Mengshoel, A.T., Dyrhol-Riise, A.M., and Balloux, F. (2014). Evolution of extensively drug-resistant Mycobacterium tuberculosis from a susceptible ancestor in a single patient. Genome Biol. 15, 490. Eppinger, M., Pearson, T., Koenig, S.S., Pearson, O., Hicks, N., Agrawal, S., Sanjar, F., Galens, K., Daugherty, S., Crabtree, J., et al. (2014). Genomic epidemiology of the Haitian cholera outbreak: a single introduction followed by rapid, extensive, and continued spread characterized the onset of the epidemic. MBio 5, e01721. European Concerted Action on New Generation Genetic Markers and Techniques for the Epidemiology and Control of Tuberculosis (2006). Beijing/W genotype Mycobacterium tuberculosis and drug resistance. Emerg. Infect. Dis. 12, 736–743. Eyre, D.W., Cule, M.L., Wilson, D.J., Griffiths, D., Vaughan, A., O’Connor, L., Ip, C.L., Golubchik, T., Batty, E.M., Finney, J.M., et al. (2013). Diverse sources of C. difficile infection identified on whole-genome sequencing. N. Engl. J. Med. 369, 1195–1205. Falush, D., and Bowden, R. (2006). Genome-wide association mapping in bacteria? Trends Microbiol. 14, 353–355. Falush, D., Wirth, T., Linz, B., Pritchard, J.K., Stephens, M., Kidd, M., Blaser, M.J., Graham, D.Y., Vacher, S., Perez-Perez, G.I., et al. (2003). Traces of human migrations in Helicobacter pylori populations. Science 299, 1582–1585. Farhat, M.R., Shapiro, B.J., Kieser, K.J., Sultana, R., Jacobson, K.R., Victor, T.C., Warren, R.M., Streicher, E.M., Calver, A., Sloutsky, A., et al. (2013). Genomic analysis identifies targets of convergent positive selection in drugresistant Mycobacterium tuberculosis. Nat. Genet. 45, 1183–1189. Feasey, N.A., Dougan, G., Kingsley, R.A., Heyderman, R.S., and Gordon, M.A. (2012). Invasive non-typhoidal salmonella disease: an emerging and neglected tropical disease in Africa. Lancet 379, 2489–2499. Feng, Y., Chen, Z., and Liu, S.L. (2011). Gene decay in Shigella as an incipient stage of host-adaptation. PLoS ONE 6, e27754. Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., Merrick, J.M., et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512. Ford, C.B., Shah, R.R., Maeda, M.K., Gagneux, S., Murray, M.B., Cohen, T., Johnston, J.C., Gardy, J., Lipsitch, M., and Fortune, S.M. (2013). Mycobacterium tuberculosis mutation rate estimates from different lineages predict sub-

608 Cell Host & Microbe 19, May 11, 2016

Holt, K.E., Dutta, S., Manna, B., Bhattacharya, S.K., Bhaduri, B., Pickard, D.J., Ochiai, R.L., Ali, M., Clemens, J.D., and Dougan, G. (2012). High-resolution genotyping of the endemic Salmonella Typhi population during a Vi (typhoid) vaccination trial in Kolkata. PLoS Negl. Trop. Dis. 6, e1490. Jeong, H., Lee, D.H., Ryu, C.M., and Park, S.H. (2016). Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple NextGeneration Sequencing Platforms. J. Microbiol. Biotechnol. 26, 207–212. Jin, Q., Yuan, Z., Xu, J., Wang, Y., Shen, Y., Lu, W., Wang, J., Liu, H., Yang, J., Yang, F., et al. (2002). Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 30, 4432–4441. Kariuki, S., Revathi, G., Kariuki, N., Kiiru, J., Mwituria, J., Muyodi, J., Githinji, J.W., Kagendo, D., Munyalo, A., and Hart, C.A. (2006). Invasive multidrugresistant non-typhoidal Salmonella infections in Africa: zoonotic or anthroponotic transmission? J. Med. Microbiol. 55, 585–591. Kingsley, R.A., Msefula, C.L., Thomson, N.R., Kariuki, S., Holt, K.E., Gordon, M.A., Harris, D., Clarke, L., Whitehead, S., Sangal, V., et al. (2009). Epidemic multiple drug resistant Salmonella Typhimurium causing invasive disease in sub-Saharan Africa have a distinct genotype. Genome Res. 19, 2279–2287. Klemm, E.J., Gkrania-Klotsas, E., Hadfield, J., Forbester, J.L., Harris, S.R., Hale, C., Heath, J.N., Wileman, T., Clare, S., Kane, L., et al. (2016). Emergence of host-adapted Salmonella Enteritidis through rapid evolution in an immunocompromised host. Nature Microbiology 1, 15023. Ko, D.C., and Urban, T.J. (2013). Understanding human variation in infectious disease susceptibility through clinical and cellular GWAS. PLoS Pathog. 9, e1003424. Kodaman, N., Pazos, A., Schneider, B.G., Piazuelo, M.B., Mera, R., Sobota, R.S., Sicinschi, L.A., Shaffer, C.L., Romero-Gallo, J., de Sablet, T., et al. (2014). Human and Helicobacter pylori coevolution shapes the risk of gastric disease. Proc. Natl. Acad. Sci. USA 111, 1455–1460. Laabei, M., Recker, M., Rudkin, J.K., Aldeljawi, M., Gulay, Z., Sloan, T.J., Williams, P., Endres, J.L., Bayles, K.W., Fey, P.D., et al. (2014). Predicting the virulence of MRSA from its genome sequence. Genome Res. 24, 839–849. Laabei, M., Uhlemann, A.C., Lowy, F.D., Austin, E.D., Yokoyama, M., Ouadi, K., Feil, E., Thorpe, H.A., Williams, B., Perkins, M., et al. (2015). Evolutionary Trade-Offs Underlie the Multi-faceted Virulence of Staphylococcus aureus. PLoS Biol. 13, e1002229. Lamelas, A., Harris, S.R., Ro¨ltgen, K., Dangy, J.P., Hauser, J., Kingsley, R.A., Connor, T.R., Sie, A., Hodgson, A., Dougan, G., et al. (2014). Emergence of a

Cell Host & Microbe

Review new epidemic Neisseria meningitidis serogroup A Clone in the African meningitis belt: high-resolution picture of genomic changes that mediate immune evasion. MBio 5, e01974–e14. Lawley, T.D., Clare, S., Walker, A.W., Stares, M.D., Connor, T.R., Raisen, C., Goulding, D., Rad, R., Schreiber, F., Brandt, C., et al. (2012). Targeted restoration of the intestinal microbiota with a simple, defined bacteriotherapy resolves relapsing Clostridium difficile disease in mice. PLoS Pathog. 8, e1002995. Leekitcharoenphon, P., Kaas, R.S., Thomsen, M.C., Friis, C., Rasmussen, S., and Aarestrup, F.M. (2012). snpTree–a web-server to identify and construct SNP trees from whole genome sequence data. BMC Genomics 13 (Suppl 7 ), S6. Lieberman, T.D., Michel, J.B., Aingaran, M., Potter-Bynoe, G., Roux, D., Davis, M.R., Jr., Skurnik, D., Leiby, N., LiPuma, J.J., Goldberg, J.B., et al. (2011). Parallel bacterial evolution within multiple patients identifies candidate pathogenicity genes. Nat. Genet. 43, 1275–1280. Linz, B., Balloux, F., Moodley, Y., Manica, A., Liu, H., Roumagnac, P., Falush, D., Stamer, C., Prugnolle, F., van der Merwe, S.W., et al. (2007). An African origin for the intimate association between humans and Helicobacter pylori. Nature 445, 915–918. Maiden, M.C., Bygraves, J.A., Feil, E., Morelli, G., Russell, J.E., Urwin, R., Zhang, Q., Zhou, J., Zurth, K., Caugant, D.A., et al. (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95, 3140–3145. Maiden, M.C., Jansen van Rensburg, M.J., Bray, J.E., Earle, S.G., Ford, S.A., Jolley, K.A., and McCarthy, N.D. (2013). MLST revisited: the gene-by-gene approach to bacterial genomics. Nat. Rev. Microbiol. 11, 728–736. Maixner, F., Krause-Kyora, B., Turaev, D., Herbig, A., Hoopmann, M.R., Hallows, J.L., Kusebauch, U., Vigl, E.E., Malfertheiner, P., Megraud, F., et al. (2016). The 5300-year-old Helicobacter pylori genome of the Iceman. Science 351, 162–165. Marin, M.A., Fonseca, E.L., Andrade, B.N., Cabral, A.C., and Vicente, A.C. (2014). Worldwide occurrence of integrative conjugative element encoding multidrug resistance determinants in epidemic Vibrio cholerae O1. PLoS ONE 9, e108728. Markussen, T., Marvig, R.L., Go´mez-Lozano, M., Aanæs, K., Burleigh, A.E., Høiby, N., Johansen, H.K., Molin, S., and Jelsbak, L. (2014). Environmental heterogeneity drives within-host diversification and evolution of Pseudomonas aeruginosa. MBio 5, e01592–e14. Marttinen, P., Hanage, W.P., Croucher, N.J., Connor, T.R., Harris, S.R., Bentley, S.D., and Corander, J. (2012). Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res. 40, e6. Marvig, R.L., Sommer, L.M., Molin, S., and Johansen, H.K. (2015). Convergent evolution and adaptation of Pseudomonas aeruginosa within patients with cystic fibrosis. Nat. Genet. 47, 57–64. McClelland, M., Sanderson, K.E., Clifton, S.W., Latreille, P., Porwollik, S., Sabo, A., Meyer, R., Bieri, T., Ozersky, P., McLellan, M., et al. (2004). Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat. Genet. 36, 1268–1274. McNally, A., Thomson, N.R., Reuter, S., and Wren, B.W. (2016). ‘Add, stir and reduce’: Yersinia spp. as model bacteria for pathogen evolution. Nat. Rev. Microbiol. 14, 177–190. Medini, D., Donati, C., Tettelin, H., Masignani, V., and Rappuoli, R. (2005). The microbial pan-genome. Curr. Opin. Genet. Dev. 15, 589–594. Merker, M., Blin, C., Mona, S., Duforet-Frebourg, N., Lecher, S., Willery, E., Blum, M.G., Ru¨sch-Gerdes, S., Mokrousov, I., Aleksic, E., et al. (2015). Evolutionary history and global spread of the Mycobacterium tuberculosis Beijing lineage. Nat. Genet. 47, 242–249. Moodley, Y., Linz, B., Yamaoka, Y., Windsor, H.M., Breurec, S., Wu, J.Y., Maady, A., Bernho¨ft, S., Thiberge, J.M., Phuanukoonnon, S., et al. (2009). The peopling of the Pacific from a bacterial perspective. Science 323, 527–530. Moodley, Y., Linz, B., Bond, R.P., Nieuwoudt, M., Soodyall, H., Schlebusch, C.M., Bernho¨ft, S., Hale, J., Suerbaum, S., Mugisha, L., et al. (2012). Age of the association between Helicobacter pylori and man. PLoS Pathog. 8, e1002693.

Mutreja, A., Kim, D.W., Thomson, N.R., Connor, T.R., Lee, J.H., Kariuki, S., Croucher, N.J., Choi, S.Y., Harris, S.R., Lebens, M., et al. (2011). Evidence for several waves of global transmission in the seventh cholera pandemic. Nature 477, 462–465. Okoro, C.K., Kingsley, R.A., Connor, T.R., Harris, S.R., Parry, C.M., Al-Mashhadani, M.N., Kariuki, S., Msefula, C.L., Gordon, M.A., de Pinna, E., et al. (2012). Intracontinental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa. Nat. Genet. 44, 1215–1221. Orata, F.D., Keim, P.S., and Boucher, Y. (2014). The 2010 cholera outbreak in Haiti: how science solved a controversy. PLoS Pathog. 10, e1003967. Page, A.J., Cummins, C.A., Hunt, M., Wong, V.K., Reuter, S., Holden, M.T., Fookes, M., Falush, D., Keane, J.A., and Parkhill, J. (2015). Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691–3693. Parkhill, J., Dougan, G., James, K.D., Thomson, N.R., Pickard, D., Wain, J., Churcher, C., Mungall, K.L., Bentley, S.D., Holden, M.T.G., et al. (2001). Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413, 848–852. Parkhill, J., Sebaihia, M., Preston, A., Murphy, L.D., Thomson, N., Harris, D.E., Holden, M.T., Churcher, C.M., Bentley, S.D., Mungall, K.L., et al. (2003). Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat. Genet. 35, 32–40. Pearson, T., Okinaka, R.T., Foster, J.T., and Keim, P. (2009). Phylogenetic understanding of clonal populations in an era of whole genome sequencing. Infect. Genet. Evol. 9, 1010–1019. Pham Thanh, D., Karkey, A., Dongol, S., Ho Thi, N., Thompson, C.N., Rabaa, M.A., Arjyal, A., Holt, K.E., Wong, V., Tran Vu Thieu, N., et al. (2016). A novel ciprofloxacin-resistant subclade of H58 Salmonella Typhi is associated with fluoroquinolone treatment failure. eLife 5, e14003. Phan, M.D., Kidgell, C., Nair, S., Holt, K.E., Turner, A.K., Hinds, J., Butcher, P., Cooke, F.J., Thomson, N.R., Titball, R., et al. (2009). Variation in Salmonella enterica serovar typhi IncHI1 plasmids during the global spread of resistant typhoid fever. Antimicrob. Agents Chemother. 53, 716–727. Read, T.D., and Massey, R.C. (2014). Characterizing the genetic basis of bacterial phenotypes using genome-wide association studies: a new direction for bacteriology. Genome Med. 6, 109. Reiling, N., Homolka, S., Walter, K., Brandenburg, J., Niwinski, L., Ernst, M., Herzmann, C., Lange, C., Diel, R., Ehlers, S., and Niemann, S. (2013). Cladespecific virulence patterns of Mycobacterium tuberculosis complex strains in human primary macrophages and aerogenically infected mice. MBio 4, 4. Reuter, S., Connor, T.R., Barquist, L., Walker, D., Feltwell, T., Harris, S.R., Fookes, M., Hall, M.E., Petty, N.K., Fuchs, T.M., et al. (2014). Parallel independent evolution of pathogenicity within the genus Yersinia. Proc. Natl. Acad. Sci. USA 111, 6768–6773. Roca, A., Bojang, A., Bottomley, C., Gladstone, R.A., Adetifa, J.U., Egere, U., Burr, S., Antonio, M., Bentley, S., Kampmann, B., et al.; Pneumo13 Study Group (2015). Effect on nasopharyngeal pneumococcal carriage of replacing PCV7 with PCV13 in the Expanded Programme of Immunization in The Gambia. Vaccine 33, 7144–7151. Roumagnac, P., Weill, F.X., Dolecek, C., Baker, S., Brisse, S., Chinh, N.T., Le, T.A., Acosta, C.J., Farrar, J., Dougan, G., and Achtman, M. (2006). Evolutionary history of Salmonella typhi. Science 314, 1301–1304. Salipante, S.J., Roach, D.J., Kitzman, J.O., Snyder, M.W., Stackhouse, B., Butler-Wu, S.M., Lee, C., Cookson, B.T., and Shendure, J. (2015). Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains. Genome Res. 25, 119–128. Sheppard, S.K., Didelot, X., Meric, G., Torralbo, A., Jolley, K.A., Kelly, D.J., Bentley, S.D., Maiden, M.C., Parkhill, J., and Falush, D. (2013). Genomewide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter. Proc. Natl. Acad. Sci. USA 110, 11923–11927. Sims, G.E., Jun, S.R., Wu, G.A., and Kim, S.H. (2009). Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc. Natl. Acad. Sci. USA 106, 2677–2682. Singh, P., Benjak, A., Schuenemann, V.J., Herbig, A., Avanzi, C., Busso, P., Nieselt, K., Krause, J., Vera-Cabrera, L., and Cole, S.T. (2015). Insight into

Cell Host & Microbe 19, May 11, 2016 609

Cell Host & Microbe

Review the evolution and origin of leprosy bacilli from the genome sequence of Mycobacterium lepromatosis. Proc. Natl. Acad. Sci. USA 112, 4459–4464. Spagnoletti, M., Ceccarelli, D., Rieux, A., Fondi, M., Taviani, E., Fani, R., Colombo, M.M., Colwell, R.R., and Balloux, F. (2014). Acquisition and evolution of SXT-R391 integrative conjugative elements in the seventh-pandemic Vibrio cholerae lineage. MBio 5, 5. Stanton, T.B. (2013). A call for antibiotic alternatives research. Trends Microbiol. 21, 111–113. The, H.C., Thanh, D.P., Holt, K.E., Thomson, N.R., and Baker, S. (2016). The genomic signatures of Shigella evolution, adaptation and geographical spread. Nat. Rev. Microbiol. 14, 235–250. Valdez, Y., Ferreira, R.B., and Finlay, B.B. (2009). Molecular mechanisms of Salmonella virulence and host resistance. Curr. Top. Microbiol. Immunol. 337, 93–127. Vernikos, G., Medini, D., Riley, D.R., and Tettelin, H. (2015). Ten years of pangenome analyses. Curr. Opin. Microbiol. 23, 148–154. Wei, J., Goldberg, M.B., Burland, V., Venkatesan, M.M., Deng, W., Fournier, G., Mayhew, G.F., Plunkett, G., 3rd, Rose, D.J., Darling, A., et al. (2003). Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect. Immun. 71, 2775–2786.

610 Cell Host & Microbe 19, May 11, 2016

Winter, S.E., Thiennimitr, P., Winter, M.G., Butler, B.P., Huseby, D.L., Crawford, R.W., Russell, J.M., Bevins, C.L., Adams, L.G., Tsolis, R.M., et al. (2010). Gut inflammation provides a respiratory electron acceptor for Salmonella. Nature 467, 426–429. Wong, V.K., Baker, S., Pickard, D.J., Parkhill, J., Page, A.J., Feasey, N.A., Kingsley, R.A., Thomson, N.R., Keane, J.A., Weill, F.X., et al. (2015). Phylogeographical analysis of the dominant multidrug-resistant H58 clade of Salmonella Typhi identifies inter- and intracontinental transmission events. Nat. Genet. 47, 632–639. Yahara, K., Didelot, X., Jolley, K.A., Kobayashi, I., Maiden, M.C., Sheppard, S.K., and Falush, D. (2016). The Landscape of Realized Homologous Recombination in Pathogenic Bacteria. Mol. Biol. Evol. 33, 456–471. Yang, F., Yang, J., Zhang, X., Chen, L., Jiang, Y., Yan, Y., Tang, X., Wang, J., Xiong, Z., Dong, J., et al. (2005). Genome dynamics and diversity of Shigella species, the etiologic agents of bacillary dysentery. Nucleic Acids Res. 33, 6445–6458. Zhou, Z., McCann, A., Weill, F.X., Blin, C., Nair, S., Wain, J., Dougan, G., and Achtman, M. (2014). Transient Darwinian selection in Salmonella enterica serovar Paratyphi A during 450 years of global spread of enteric fever. Proc. Natl. Acad. Sci. USA 111, 12199–12204.