Supplementary Information for - PNAS

Supplementary Information for
Directed evolution of multiple genomic loci allows the prediction of antibiotic resistance

Ákos Nyerges1*, Bálint Csörgő1,6, Gábor Draskovits1, Bálint Kintses1, Petra Szili1, Györgyi Ferenc2, Tamás Révész1, Eszter Ari1,5, István Nagy3,4, Balázs Bálint3, Bálint Márk Vásárhelyi3, Péter Bihari3, Mónika Számel1, Dávid Balogh1, Henrietta Papp1, Dorottya Kalapis1, Balázs Papp1, Csaba Pál1*

1 Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Hungary;
2 Nucleic Acid Synthesis Laboratory, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Hungary;
3 SeqOmics Biotechnology Ltd., Mórahalom, Hungary;
4 Sequencing Platform, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Hungary;
5 Department of Genetics, Eötvös Loránd University, Budapest, Hungary;
6 Current affiliation: Department of Microbiology and Immunology, University of California, San Francisco, USA

*Correspondence should be addressed to Á.N. ([email protected]) or C.P. ([email protected]).

This PDF file includes:
Supplementary Materials and Methods
Figs. S1 to S6
Tables S1 to S6
References for SI reference citations

Other supplementary materials for this manuscript include the following:
Supplementary Datasets S1 to S7

www.pnas.org/cgi/doi/10.1073/pnas.1801646115

SI Materials and Methods

Strains and oligonucleotides
All strains and sequences of all synthetic DNA oligonucleotides (oligos) are listed in Supplementary Dataset S3. PCR primers were obtained from Integrated DNA Technologies with standard desalting. High-throughput (HT) sequencing primers were purified with high-performance liquid chromatography (HPLC).

DIvERGE oligonucleotides
DIvERGE oligos were designed to have minimized secondary structure (> -12 kcal/mol) and targeted the replicating lagging strand. For the targeted mutagenesis of the landing pad, two sets of DNA oligos (TETRM1, TETRM3) with equimolar spiking levels of 0, 0.1, 0.5, 2, 5, 10, and 20% were synthesized. For targeted mutagenesis of folA, DIvERGE oligos were designed to cover the entire locus in E. coli K-12 MG1655, E. coli CFT073, and Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 in which mutations have been previously observed to confer trimethoprim resistance (Supplementary Dataset S2). To mutagenize gyrA in E. coli K-12, DIvERGE oligos were designed to cover the entire protein-coding sequence and the corresponding upstream promoter sequence. In vivo ssDNA processing results in the removal of the 9 terminal mismatching nucleobases during the course of integration and subsequent DNA repair; therefore, an overlap of 9 nucleobases in both directions ensured that all targeted genomic positions had an equally high probability of mutagenesis.

Synthesis of the spiked DIvERGE and reference oligos was performed on an ABI3900 DNA synthesizer (Applied Biosystems), according to a modified phosphoramidite chemistry-based protocol. Besides custom synthesis, soft-randomized oligos are also obtainable from commercial vendors, e.g. Eurofins Genomics GmbH, at a modest price of 0.17 USD / nucleotide. Controlled pore glass (CPG) was used as a solid support and the following synthesis cycles were applied:
I.) Deprotection was achieved with 3% (w/v) trichloroacetic acid (TCA) in dichloromethane (DCM).
II.) The incoming phosphoramidite, dissolved at 0.055 M concentration in anhydrous acetonitrile and premixed with the other three amidites in the defined spiking ratio, was coupled by activation with 5-ethylthio-1H-tetrazole.
III.) Capping was done with 10% (v/v) acetic anhydride in anhydrous tetrahydrofuran (THF) and with an anhydrous THF solution containing 16% (v/v) N-methylimidazole and 10% (v/v) pyridine.
IV.) The oxidation step was accomplished with iodine (5 g per liter of a pyridine:water:THF = 0.5:2:97.5 mixture).
Cycles were repeated until the 90th position and DNA strands were cleaved from the solid support with concentrated ammonia. Crude oligos were purified by high-performance liquid chromatography (HPLC) on a Shimadzu ADVP-10 HPLC system. After concentration from HPLC fractions, the dimethoxytrityl (5´-DMTr) protecting group was removed using a PolyPak column (Glen Research) according to the manufacturer’s protocol. Following elution and subsequent lyophilization, purified oligos were resuspended in 1x TE buffer (pH 8.0) from Integrated DNA Technologies. Sequences of all DIvERGE oligos are listed in Supplementary Dataset S3.

DIvERGE cycling process
DIvERGE in E. coli K-12 MG1655, E. coli UPEC CFT073, Salmonella enterica LT2, and Citrobacter freundii ATCC 8090 was performed based on the previously described pORTMAGE3 (Addgene plasmid #72678) protocol(1) with the corresponding spiked oligos (see Supplementary Dataset S3). Briefly, cells were subjected to pORTMAGE cycles in 10 ml culture volumes of Lysogeny-Broth-Lennox (LBL) media (10 g of tryptone, 5 g of yeast extract, 5 g of sodium chloride per 1 L of water) with 50 µg/ml kanamycin. 40 µl of the electrocompetent cell suspension was mixed with the DIvERGE oligos at a 2.5 µM final concentration. Electroporation was done on a BTX (Harvard Apparatus) CM-630 Exponential Decay Wave Electroporation System in 1 mm gap electroporation cuvettes (1.8 kV, 200 Ω, 25 µF). Immediately after electroporation, cells were suspended in 5 ml TB media (24 g yeast extract, 12 g tryptone, 9.4 g K2HPO4, and 2 g KH2PO4 per 1 L of water) to allow the cells to recover. The cells could later be transferred to larger volumes for further growth and cycling.

Multiplex iterative DIvERGE cycles were carried out to mutagenize either folA or gyrA by equimolarly mixing each set of spiked oligos, covering the entire target region. 4 µl of these oligo mixtures were electroporated into competent cells, which were then suspended in 5 ml fresh TB media to allow for recovery at 30°C for 1 hour under continuous agitation. An extra 5 ml LBL media was then added along with kanamycin to maintain pORTMAGE3. At this point, cells were either subjected to additional pORTMAGE cycles by growing to mid-log phase and preparing electrocompetent cells again, or allowed to reach stationary phase and aliquoted into 1 ml portions to which 0.5 ml 50% glycerol was added. The aliquoted samples were then frozen and stored at -80°C for subsequent phenotypic or genotypic analysis.

A single round of multiplex DIvERGE cycle was carried out to simultaneously mutagenize gyrA, gyrB, parE and parC by equimolarly mixing 130 individual DIvERGE oligos. Separately, 4 µl of these oligo mixtures were electroporated into E. coli K-12 MG1655 cells in 10 parallel replicates and the resulting cell libraries were combined in 50 ml fresh TB media. Following recovery, cells were diluted by the addition of 50 ml LBL media and allowed to reach stationary phase at 30°C under continuous agitation. Aliquoted samples were then frozen and stored at -80°C. Library generation experiments were performed in triplicate.

To assess the single-step mutational landscape of folA, scanning DIvERGE was carried out by introducing each of the 8 folA-targeting oligos separately in triplicate. Following the addition of 1 ml TB recovery media, 300 µl of each of the individual populations were combined, separately for each replicate, and supplemented with additional TB to a final volume of 5 ml. These combined populations were then allowed to recover for 1 hour at 30°C, after which 5 ml LB media was added. The combined populations were then grown to stationary phase overnight at 30°C. Aliquots were then prepared and stored in the aforementioned manner. This approach permits the mutational scanning of the entire folA, with every nucleotide position mutant statistically present in the population after only 1 cycle.

To measure the integration dynamics of oligos spiked at various levels in different species, TETRM1 and TETRM3 oligos spiked at all positions at 0.1%, 0.5%, 1%, 2%, 5%, 10% and 20% levels were targeted to the landing pad sequence integrated in the genomes of E. coli K-12 MG1655, S. enterica LT2, and C. freundii ATCC 8090. A total of 5 consecutive DIvERGE cycles were carried out with all different spiked oligos. After the final cycles, populations were frozen and stored in the manner described above. From the aliquoted frozen populations recovered after each DIvERGE cycle, genomic DNA was extracted from ~2 × 10^9 cells using the GenElute™ Bacterial Genomic DNA kit (Sigma-Aldrich).

Determination of mutation frequencies
To determine mutation frequencies in E. coli K-12 MG1655 at the targeted folA and at the untargeted rpoB loci, the 5-cycle DIvERGE populations were assessed for resistance to trimethoprim and rifampicin, respectively. For this assay, a wild-type E. coli K-12 MG1655 starter culture was grown overnight at 30°C in LBL and diluted 1000-fold into 12 parallel samples in 1 ml LBL media. Cultures were grown overnight at 30°C. As a next step, 1 ml samples were harvested from the overnight-grown cultures and from a 5-cycle folA-mutagenized DIvERGE population and plated onto LBL agar plates containing 100 µg/ml rifampicin. Plates were grown at 30°C for 48 hours. Total cell numbers were determined by plating appropriate dilutions onto LBL plates and growing overnight at 30°C. Mutation frequencies were determined as the ratio of the average number of resistant cells to the average total cell number.

To assess folA mutation frequencies, the fraction of cells resistant to different concentrations of trimethoprim was determined for the 5-cycle DIvERGE populations and for the wild-type K-12 MG1655 populations. The frozen samples of each strain were thawed, harvested and subsequently washed 3 times in 1 ml minimal salts (MS) media. Resistant cell numbers were determined at 4, 11, 67 and 267-times the wild-type trimethoprim minimum inhibitory concentration (MIC) for all strains. Briefly, different dilutions of the washed cells were plated onto MS + casamino acid (without thiamine) agar plates containing the various trimethoprim concentrations and grown at 30°C for 48 hours. Population sizes were determined by plating appropriate dilutions onto LBL agar plates and growing overnight at 30°C. Mutation frequencies were determined as described above.
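The frequency calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names, the colony counts, the dilution factors and the plated volumes are all hypothetical, and the only element taken from the text is that the mean resistant titer is divided by the mean total titer.

```python
from statistics import mean

def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Colony-forming units per ml of the undiluted culture."""
    return colonies * dilution_factor / plated_volume_ml

def mutation_frequency(resistant_titers, total_titers):
    """Mean resistant titer divided by mean total titer."""
    return mean(resistant_titers) / mean(total_titers)

# Hypothetical counts: 1 ml plated undiluted on selective plates,
# 0.1 ml of a 10^6-fold dilution plated on non-selective plates.
resistant = [cfu_per_ml(c, 1, 1.0) for c in (12, 8, 10)]
total = [cfu_per_ml(c, 1e6, 0.1) for c in (150, 200, 250)]
freq = mutation_frequency(resistant, total)
print(f"mutation frequency: {freq:.1e}")
```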

Selection of DIvERGE libraries
For high-throughput (HT) sequencing of the folA libraries, the frozen populations that had undergone DIvERGE mutagenesis were thawed and washed 3 times in 1 ml minimal salts (MS) media. Appropriate dilutions of the 5-cycle DIvERGE population were plated onto MS + casamino acid (without thiamine) agar plates containing various trimethoprim concentrations (4, 11, 67 and 267-times the wild-type MIC). To assess the single-step mutational landscape of folA with Illumina HT sequencing, scanning DIvERGE libraries were selected in three replicates on agar plates containing 4-times the wild-type MIC concentration of trimethoprim. Highly trimethoprim-resistant variants were selected on agar plates containing 1000 µg/ml trimethoprim. Individual resistant clones were then isolated for further genotype and trimethoprim susceptibility analysis. All colonies, numbering from 800 to 1000 per sample for SMRT or 1000 to 3000 per sample for Illumina HT sequencing, were scraped off from plates in 5 ml MS media, from which 0.5 ml was used to extract genomic DNA (using the GenElute Bacterial Genomic DNA Kit, Sigma). The gyrA, gyrB, parE, parC libraries were selected on LBL agar plates containing 2-times the wild-type MIC concentration of ciprofloxacin, while gyrA libraries were plated in three replicates onto LBL agar plates containing 5-times the wild-type MIC concentration of ciprofloxacin. Following incubation for 72 h at 30°C, 1000 and 3000 resistant clones were isolated from the gyrA libraries and the gyrA, gyrB, parE, parC library, respectively. Genomic DNA was extracted according to the above-mentioned protocol.

Gepotidacin (GSK2140944) selection experiments were performed on Mueller Hinton II (MHBII) (Sigma-Aldrich) agar plates containing 12-times the wild-type MIC concentration of gepotidacin (CAS: 1075236-89-3; HY-16742 from MedChemExpress). Frequencies of resistant cells were assayed by plating approximately 10^10 cells onto 145 mm agar plates (Greiner CELLSTAR) containing 50 ml MHBII agar. Colony counts were determined after 72 h incubation at 30°C. Selection experiments were performed in triplicate. DIvERGE-generated, resistant variants were isolated from cell libraries on cation-adjusted Mueller Hinton II Agar (Sigma-Aldrich) plates containing 12-times the wild-type MIC concentration (140 ng/ml) of gepotidacin.

Sequencing of folA target regions
Capillary sequencing was performed from individual isolates on the folA locus. The locus was amplified by colony PCR with the corresponding primers (Supplementary Dataset S3) in DreamTaq PCR Master Mix (Thermo Fisher Scientific); 5 µl of this PCR product was purified using USB ExoSAP-IT (Affymetrix) according to the manufacturer’s protocol. Amplicons were subjected to sequencing with the corresponding forward primer.

Determination of allele composition at the target site was achieved by amplicon HT sequencing. To create Illumina HT sequencing libraries, genomic DNA was subjected to 19 cycles of PCR for targeted amplification according to the previously described Phusion High-Fidelity PCR protocol(1) with the following modifications: 200 ng of the isolated genomic DNA samples were used as templates for PCR reactions to amplify either the TET landing pad or the folA genomic region with the corresponding primer pairs without barcodes. PCR amplicons, covering the entire target site, were digested using NEBNext dsDNA Fragmentase (New England Biolabs) according to the manufacturer’s instructions (12-15 minutes of fragmentation; 190 bp average fragment length). Fragments were purified with AMPure XP and eluted in water; the subsequent end repair, dA tailing and ligation steps were performed unaltered. Library preparation and sequencing were done using a MiSeq Reagent Kit v2 for a 250 bp paired-end (PE) sequencing run on a MiSeq (Illumina).

Nucleotide composition analysis in Landing Pad libraries and DIvERGE oligo pools
Illumina MiSeq sequencing reads were analyzed with an in-house pipeline to increase accuracy and reduce sequencing noise in DIvERGE library analysis. To remove sequencing read-ends that have higher error probability, paired-end Illumina reads were first trimmed to 190 nucleotides and were subsequently imported to CLC Genomics Workbench Tool (CLC Bio, Version 9.0) for pre-processing. Quality trimming with an error threshold of 0.001 (Phred Q value of 30) was then applied to remove any erroneous sequence region, and overlapping read pairs were then merged. Two different alignment strategies were applied depending on the source of the sequencing library. When assessing TETRM Landing Pad regions after consecutive DIvERGE cycles, alignments were carried out using CLC Genomics Workbench. To compare the mutational spectrum and mutation frequency on DIvERGE oligos and at the corresponding genomic targets, the BWA-MEM algorithm(2) (version 0.7.5a-r405) was called with default parameters. To increase fidelity, supplementary alignments and soft trim regions were removed from the binary alignments with SAMtools(3) (version: 0.1.19-96b5f2294a) and NGSUtils(4) (version 0.5.7-e98ddfa), respectively. JVarkit(5) was then applied to select reads that exactly spanned the entire TETRM target region. Blastn (version 2.2.28+(6)) was finally used to report the number of alterations between mapped reads and their target regions.

To assess mutation frequencies and yield a nucleotide composition table for each reference position within the Landing Pad, binary alignments were summarized with Pysamstats (version 0.24.2, Alistair Miles). Mutation frequency at each nucleotide position was calculated as reads with substitutions/total reads covering the given base (%). Mutation bias indicators were calculated using the Mutanalyst online tool(7). Diversified target positions were defined as nucleotide positions within the Landing Pad where mutation frequency exceeded 6-fold the standard deviation (SD) of the background sequencing noise. Incorporation efficiencies were normalized based on the background sequencing noise. Sequencing noise was assessed at an untargeted region between the target sites of TETRM1 and TETRM3 within the Landing Pad. Oligo incorporation efficiencies were determined after one DIvERGE cycle in E. coli K-12 MG1655. Incorporation rate was measured as the fraction of reads covering the entire oligo target that differ by at least 1 mutation from the wild-type reference sequence.
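The per-position frequency and the 6 × SD criterion described above can be illustrated with a short Python sketch. It is not the in-house pipeline: the function names and the example frequencies are hypothetical, and the cutoff is read literally as 6 times the sample standard deviation of the background frequencies (whether the background mean was also added is not stated in the text).

```python
from statistics import stdev

def mutation_frequency_percent(substituted_reads, total_reads):
    """Reads with a substitution / total reads covering the base, in %."""
    return 100.0 * substituted_reads / total_reads

def diversified_positions(target_freqs, background_freqs, fold=6.0):
    """Indices of target positions whose frequency exceeds fold x SD
    of the background (untargeted-region) frequencies."""
    threshold = fold * stdev(background_freqs)
    return [i for i, f in enumerate(target_freqs) if f > threshold]

# Hypothetical per-position frequencies (%)
background = [0.02, 0.03, 0.01, 0.02, 0.02]
target = [0.03, 0.5, 0.04, 1.2]
hits = diversified_positions(target, background)
print(hits)
```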

Assessment of mutation profiles in folA libraries with Illumina sequencing
Mutational and nucleotide composition analysis for each folA library was performed with a custom Python script with built-in sequencing error reduction capabilities. Illumina paired-end reads were first trimmed to 190 nucleotides and BBduk (version 10 December 2015, Bushnell B) was called to carry out quality trimming with an error probability threshold of 0.001 (Phred Q > 30), and the resulting overlapping read pairs were then merged. As a next step, reads that contained any ambiguous nucleotide as well as reads shorter than 72 bases were removed. The BWA-MEM algorithm was then applied with default parameters to map reads to their corresponding genomic targets (on Escherichia coli K-12 MG1655 (NCBI Reference Sequence: NC_000913.3); Escherichia coli CFT073 (NCBI Reference Sequence: NC_004431.1) or Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 (NCBI Reference Sequence: NC_003197.1) genomes) and supplementary alignments and unmapped read segments (soft trims) were removed. Finally, Pysamstats (version 0.24.2, Alistair Miles) was used to measure mutation frequency and to generate a nucleotide composition table for each targeted reference position.

To assess translated, missense mutational profiles and determine mutational hot-spots in scanning DIvERGE libraries, only reads longer than the possible target site of a single DIvERGE oligo were included in the downstream analysis. These reads were then aligned to their genomic reference. The single-step mutational landscape of folA from scanning DIvERGE libraries was determined based on DNA reads which displayed exactly one nucleotide alteration when compared to their reference. These reads were translated to peptide sequences and peptides were then compared to their corresponding coding sequence to assess amino acid composition for each reference position. See “Assessment of the single-step adaptive mutational landscape of folA by DIvERGE and Illumina high-throughput sequencing” for the detailed description of the folA single-step adaptive mutational landscape analysis protocol.

Assessment of the single-step adaptive mutational landscape of folA by DIvERGE and Illumina high-throughput sequencing
Following sequencing, Illumina MiSeq paired-end reads were de-multiplexed, filtered, aligned to the reference, and analyzed for mutation compositions according to SI Materials and Methods for each of the scanning folA libraries of E. coli K-12 MG1655, E. coli CFT073 (UPEC), and Salmonella enterica LT2 selected under mild trimethoprim selection pressure (4-times the wild-type MIC). For each strain, 6 sequencing libraries were analyzed: 3 replicates in which only the regulatory region was targeted, and 3 replicates in which only the protein-coding region of folA was targeted with DIvERGE mutagenesis. Strict read-quality control ensured that only high-quality reads (error probability threshold of 0.001; Phred value = 30) were used in the downstream analysis. Mutation summary on the amino acid level for the protein-coding region and mutation summary on the DNA level for the regulatory region were carried out as described in SI Materials and Methods.

When assessing the single-step DNA level mutational landscape of folA, only reads that displayed exactly one DNA mismatch compared to the reference sequence were used to determine mismatch counts and mismatch abundance for each mapping position. These reads were mapped to the corresponding (E. coli K-12 MG1655 (NCBI Reference Sequence: NC_000913.3); Escherichia coli CFT073 (NCBI Reference Sequence: NC_004431.1) or Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 (NCBI Reference Sequence: NC_003197.1)) genomic sequence. When assessing the single-step amino acid level mutational landscape of the protein-coding region of folA, aligned DNA reads were translated into peptide sequences and were compared to the coding sequence of the target reference, and only reads that displayed exactly one amino acid mismatch were used to determine amino acid mismatch counts and mismatch abundance for each mapping position. Single-step mutational abundance data for each mapping position were then used to determine the average abundance of every single-step mutational event for each library (n = 3).

The threshold of detection was determined based on the background noise level for each biological sample. PCR and sequencing noise was quantified by assessing the Illumina MiSeq false base-call rate from non-mutagenized, wild-type folA control amplicons for each strain. The averaged background error rate at the DNA level after read quality filtering was 0.0003. Based on error probability, amino acid mutations above the threshold value of 0.002 were marked as detected mutational hot-spots and missense amino acid mutations above the threshold value of 0.005 were marked as adaptive, resistance-conferring single-step mutations. Mutations with an average DNA mutation abundance in the folA regulatory region above the threshold value of 0.002 were marked as detected mutational hot-spots and DNA mutations above the threshold value of 0.01 were marked as adaptive, resistance-conferring SNPs. See Supplementary Dataset S4 for mutational data.
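The amino acid level filtering and thresholding described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' custom script: reads are assumed to be full-length, gap-free, in-frame, upper-case ACGT strings covering the coding sequence; abundance is assumed to be the count divided by the total number of reads analysed; and all names and the toy sequences are hypothetical. Only the 0.002 and 0.005 thresholds come from the text.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate an in-frame DNA string; trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def single_aa_changes(reference_cds, reads):
    """Count (position, reference aa, observed aa) over reads that show
    exactly one amino acid mismatch against the reference."""
    ref_protein = translate(reference_cds)
    counts = Counter()
    for read in reads:
        if len(read) != len(reference_cds):
            continue  # assumption: only full-length, gap-free reads are scored
        diffs = [(i + 1, r, a)
                 for i, (r, a) in enumerate(zip(ref_protein, translate(read)))
                 if r != a]
        if len(diffs) == 1:
            counts[diffs[0]] += 1
    return counts

def classify_changes(counts, total_reads, hotspot=0.002, adaptive=0.005):
    """Label each change by its abundance (count / total reads, an assumption)."""
    labels = {}
    for change, n in counts.items():
        abundance = n / total_reads
        if abundance > adaptive:
            labels[change] = "adaptive"
        elif abundance > hotspot:
            labels[change] = "hot-spot"
    return labels

# Toy example: reference encodes M-I-S; ATC>TTC gives I2F, ATC>ATT is synonymous
reference = "ATGATCAGT"
reads = ["ATGTTCAGT"] * 3 + ["ATGATTAGT"] * 2 + ["ATGATCAGT"] * 495
counts = single_aa_changes(reference, reads)
labels = classify_changes(counts, len(reads))
print(labels)
```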

Assessment of mutation profiles with Single Molecule Real-Time sequencing
To assess the allelic composition of the naïve and antibiotic-selected folA libraries, gyrA libraries, and gyrA, gyrB, parE, parC libraries along their entire target, Pacific Biosciences RSII Single Molecule Real-Time (SMRT) circular-consensus amplicon sequencing was applied. 200 ng of the isolated genomic DNA served as the template for Phusion High-Fidelity PCR with the corresponding species- and sample-specific barcoded primer pairs (Supplementary Dataset S3) to prepare barcoded amplicon libraries. PCR reactions were performed in 50 µl volumes with the following settings: 98°C 3 min, 18 - 22 cycles of (98°C 20 sec; 63°C 30 sec; 72°C 90 sec), with a final extension of 5 min at 72°C. To avoid overamplification and amplicon-chimera formation, PCR reactions were stopped at mid-exponential phase (based on the semiquantitative measurement of the PCR product) and amplicons were purified using a Zymo DNA Clean and Concentrator Kit (Zymo Research). Barcoded amplicons were then mixed at an equimolar ratio, and sequencing libraries were prepared and sequenced on Pacific Biosciences RSII SMRTcells by the Norwegian Sequencing Centre (University of Oslo). Each amplicon was sequenced for more than 10 rounds to reach an average circular-consensus accuracy of >Q30.

Pacific Biosciences CCS reads were imported into CLC Genomics Workbench Tool (CLC Bio, Version 9.0). Reads with any ambiguous nucleotide as well as reads shorter than 80% of the target region were discarded. Each read was individually mapped against its target sequence (on Escherichia coli K-12 MG1655 (NCBI Reference Sequence: NC_000913.3); Escherichia coli CFT073 (NCBI Reference Sequence: NC_004431.1) or Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 (NCBI Reference Sequence: NC_003197.1) genomes) using CLC, considering only those alignments that displayed at least 90% sequence similarity over at least 80% of the length of the query read. Single nucleotide variants for each mapped read were then called together with any associated amino acid change within the protein-coding region of the reference. Variant calling was performed at a base call error probability threshold of 0.1. A custom R script was applied to summarize allele frequencies and nucleotide composition for each target reference position. In the case of folA, the averaged allele frequency from two individual measurements was plotted on the surface of E. coli FolA (Protein Data Bank ID: 1RH3) as complexed with methotrexate, a FolA inhibitor, and NADPH. DNA gyrase allele frequency was plotted on the crystal structure of the DNA gyrase complex from Mycobacterium tuberculosis (Protein Data Bank: 5BS8)(8).

High-throughput DIvERGE oligo sequencing
Characterization of the nucleobase composition of each TETRM DIvERGE oligo was achieved by Illumina high-throughput sequencing. Oligos were made double-stranded by annealing each to its reverse complement. 650 ng of this starting material was 5’-end labeled with T4 polynucleotide kinase. Subsequent cleanup was performed with AMPure XP magnetic beads (Beckman Coulter) and DNA was eluted in water. dA tailing was performed without end repair using a NEBNext DNA Library Prep kit. After cleanup with AMPure XP beads, 10 µl of dA-tailed DNA was used in the subsequent ligation reaction. To minimize adaptor dimer formation, the dA-tailed DNA to adaptor concentration ratio was set to 1:2. Library preparation and sequencing were then performed using the NEBNext DNA Library Prep Master Mix Set for Illumina (New England Biolabs) according to the manufacturer’s instructions. Sequencing-library generation results in the removal of mismatching nucleobases from the terminal 2 positions, because adaptor ligation is inefficient when mismatches destabilize the termini; therefore, these positions were excluded from further analyses. Sequencing was done using a MiSeq Reagent Kit v2 for a 250 bp PE sequencing run on a MiSeq (Illumina) and each position was covered with at least 10^5 reads. Nucleotide composition for each oligo was determined as described above.

Isolation of individual genotypes
Defined genotypes were either isolated directly from DIvERGE libraries or were reconstructed individually within the parental strains using pORTMAGE3 recombineering(1). To reconstruct genotypes, ssDNA oligonucleotides, carrying the mutation of interest, were designed to target the replicating lagging-strand of gyrA, parC, and folA. Mutants were generated by performing pORTMAGE genome editing according to a previously described pORTMAGE protocol(1). Clones were then isolated and the presence of the corresponding mutations was confirmed by colony-PCR and subsequent capillary-sequencing. Oligonucleotides for mutant reconstructions and allele confirmations are listed in Supplementary Dataset S3.

Fitness measurements
Growth rate measurements to assess the fitness effect of DIvERGE were performed by growing replicates of E. coli K-12 MG1655 wild-type (n = 6) and randomly chosen, DIvERGE-generated (n = 30) individual, trimethoprim-resistant isolates in LBL medium. Cultures of the studied mutants were incubated at 30 °C until early stationary phase, followed by the transfer of ~10^3 cells from each into 96-well shallow plates containing 100 µl LBL medium. Separately, the growth rate of the newly identified GyrA Gly288Asp mutant was compared to the clinically occurring mutants of GyrA, Ser83Leu and Ser83Leu + Asp87Asn. Growth rate measurements of E. coli K-12 MG1655 GyrA Gly288Asp, Ser83Leu, and Ser83Leu + Asp87Asn were performed in cation-adjusted Mueller Hinton II Broth (Sigma-Aldrich) at 37 °C for each strain (n = 24). Growth curves were recorded by measuring OD600 every 7 min for 24 h at 30°C using a Biotek Powerwave XS2 automated plate reader. The growth rate was calculated from the obtained growth curves following a previously reported procedure(9, 10). Two-tailed t-tests were conducted for each strain against the wild-type E. coli K-12 MG1655.

Antibiotic resistance measurements
Minimum inhibitory concentrations (MIC) of trimethoprim for E. coli K-12 MG1655, E. coli CFT073, and Salmonella enterica LT2 were quantified with E-test strips (bioMerieux) on MS + casamino acid (without thiamine) agar plates. E-test MIC determinations were carried out in accordance with the manufacturer’s instructions. Trimethoprim susceptibility was determined in MS media + casamino acid (without thiamine). Trimethoprim resistance, quantified as the 75% inhibitory concentration of trimethoprim (IC75), was calculated from the function of growth versus trimethoprim concentration. Specifically, the IC75 value was calculated as the trimethoprim concentration at which the area under the growth curve of the given cell population was equal to one-quarter of that of an uninhibited control. As a measure of the effect of each individual genotype, relative IC75 values for each of the corresponding mutants were determined and compared to the IC75 of the wild-type. Measurements were performed in triplicate.

Ciprofloxacin and gepotidacin MICs of the wild-type and the corresponding mutant strains were compared by a standard microdilution method in 96-well shallow plates(11, 12). MIC values were defined as the lowest concentration of the drug where no visible growth can be observed, i.e. the background-normalized optical density of the culture at 600 nm was below 0.05. Ciprofloxacin MICs were determined in LBL Broth and gepotidacin MICs were assayed in cation-adjusted Mueller Hinton II Broth (Sigma-Aldrich) according to the EUCAST guidelines(12).
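The IC75 definition given above (the concentration at which the area under the growth curve drops to one-quarter of the uninhibited control) can be sketched as below. The trapezoidal integration and the linear interpolation between neighbouring concentrations are assumptions made for illustration, since the text does not state how the dose-response function was fitted; the function names and numbers are hypothetical.

```python
def growth_auc(times, ods):
    """Trapezoidal area under an OD600 growth curve."""
    return sum((times[i + 1] - times[i]) * (ods[i] + ods[i + 1]) / 2
               for i in range(len(times) - 1))

def ic75(concentrations, aucs, control_auc):
    """Concentration at which the AUC falls to 25% of the control AUC,
    by linear interpolation between neighbouring concentrations
    (assumed scheme). Returns None if the target is never crossed."""
    target = 0.25 * control_auc
    points = list(zip(concentrations, aucs))
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 >= target >= a1 and a0 != a1:
            return c0 + (a0 - target) * (c1 - c0) / (a0 - a1)
    return None

# Hypothetical dose-response: AUC at increasing trimethoprim concentrations
concs = [0, 1, 2, 4]
aucs = [10.0, 8.0, 4.0, 1.0]
value = ic75(concs, aucs, control_auc=10.0)
print(value)
```

A relative IC75, as used in the text, would then be the mutant value divided by the wild-type value obtained the same way.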

Supplementary Figure S1
(A) Correlation between the frequency of mutations and the level of spiking over the course of chemical DNA synthesis. The level of mutagenesis within a synthetic 90-mer oligonucleotide (TETRM3) at each nucleotide position within a pool of sequences is strictly controllable by the level of spiking over the course of chemical DNA synthesis. Values represented on the graph were obtained by synthesizing TETRM3 with different spiking levels of nucleoside-phosphoramidite monomers: 0.1, 0.5, 1, 2 % spiking, and a non-spiked control, respectively. Values and standard-deviation values (SDs) were determined by Illumina HT oligonucleotide sequencing (see SI Materials and Methods).
(B) Analysis of the sequence divergence within the TETRM1 soft-randomized oligonucleotide pool and mutational complexity at different spiking ratios. The average number of mutations within a pool of chemically spiked 90-mer oligonucleotide (TETRM1) strands is precisely controllable by the level of spiking over the course of DNA synthesis. Control indicates results from an oligo pool which was synthesized without spiking. Values are based on Illumina HT sequencing of at least 50 000 individual oligonucleotide strands for each sample (see SI Materials and Methods).

Supplementary Table S1 The prediction of the distribution of the number of point mutations for each nucleotide spiking ratio within 90 nucleotide-long DNA oligos.

Values give the ratio of the given fraction within the soft-randomized oligonucleotide pool (%).

| Number of mismatches within a 90 nucleotide-long oligonucleotide, compared to the target DNA | Nucleotide spiking ratio = 0.5 % | Nucleotide spiking ratio = 2 % | Nucleotide spiking ratio = 5 % |
|---|---|---|---|
| 0 | 25.7 | 0.4 | 0 |
| 1 | 35.2 | 2.2 | 0 |
| 2 | 23.8 | 6.2 | 0 |
| 3 | 10.6 | 11.7 | 0 |
| 4 | 3.5 | 16.2 | 0.1 |
| 5 | 0.9 | 17.8 | 0.3 |
| 6 | 0.2 | 16.1 | 0.8 |
| 7 | 0 | 12.3 | 1.8 |
| 8 | 0 | 8.1 | 3.2 |
| 9 | 0 | 4.7 | 5.2 |
| 10 | 0 | 2.5 | 7.4 |
| 11 | 0 | 1.1 | 9.6 |
| 12 | 0 | 0.5 | 11.1 |
| 13 | 0 | 0.2 | 11.8 |
| 14 | 0 | 0.1 | 11.4 |
| 15 | 0 | 0 | 10.2 |
| 16 | 0 | 0 | 8.4 |
| 17 | 0 | 0 | 6.5 |
| 18 | 0 | 0 | 4.6 |
| 19 | 0 | 0 | 3.1 |
| 20 | 0 | 0 | 1.9 |
| 21 | 0 | 0 | 1.1 |
| 22 | 0 | 0 | 0.6 |
| 23 | 0 | 0 | 0.3 |
| 24 | 0 | 0 | 0.2 |
| 25 | 0 | 0 | 0.1 |
| 26 to 35 (each) | 0 | 0 | 0 |
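The values in Supplementary Table S1 are consistent with a simple binomial model. The following is a minimal sketch, under two assumptions that the table itself does not state: a spiking ratio s means that each of the three non-reference nucleotides is incorporated with probability s at every position (so the per-position mismatch probability is 3s), and positions are independent. The function name `mismatch_distribution` is hypothetical.

```python
from math import comb


def mismatch_distribution(length, spiking_ratio, n_alternatives=3):
    """Return the percentage of oligos carrying k = 0..length mismatches.

    Assumption: each of the `n_alternatives` non-reference bases is
    incorporated with probability `spiking_ratio` at every position,
    independently, so the mismatch count follows
    Binomial(length, n_alternatives * spiking_ratio).
    """
    p = n_alternatives * spiking_ratio
    return [
        100 * comb(length, k) * p**k * (1 - p) ** (length - k)
        for k in range(length + 1)
    ]


for ratio in (0.005, 0.02, 0.05):
    dist = mismatch_distribution(90, ratio)
    # Print the first few fractions (%) for comparison with the table.
    print(f"{100 * ratio:g}% spiking:", [round(x, 1) for x in dist[:6]])
```

Under these assumptions, the computed fractions can be compared row by row with the tabulated percentages for the 0.5, 2, and 5% spiking ratios.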

Supplementary Figure S2 The relationship between the spiking ratio of a 90-mer oligonucleotide, the number of mismatches, and the efficiency of ssDNA-mediated genome editing. Hard randomization (25% spiking) of a 90-mer oligonucleotide along its entire length during chemical synthesis generates a pool of sequences with limited homology to their target site, while soft randomization (0.5 and 2% spiking) mainly produces sequences that are highly homologous to their target. As the number of mismatching bases is logarithmically related to the allelic-replacement (AR) efficiency (13), an increasing spiking ratio in DNA synthesis rapidly abolishes the efficiency of ssDNA-mediated genome editing.

Supplementary Table S2 (A-C) Single-step adaptive mutational landscape of the folA protein-coding and regulatory region in E. coli MG1655 (A), E. coli CFT073 UPEC (B), and Salmonella enterica LT2 (C) under mild (3 mg/L) trimethoprim selection. Only amino-acid substitutions with frequencies above 0.5% are shown for the protein-coding region, and only SNPs with frequencies above 1% are shown for the regulatory region. Selection experiments were performed in triplicate. See Supplementary Dataset S4 and SI Materials and Methods for mutational data and for a detailed description of the assay. (D-E) Trimethoprim susceptibility of selected, resistance-conferring Escherichia coli K-12 MG1655 (D) and CFT073 UPEC (E) folA mutants. Star (*) denotes novel, resistance-conferring mutations. See Supplementary Figure S4 for antibiotic dose-response data.

A Escherichia coli K-12 MG1655

Regulatory region
Position: -58 -54 -32
Reference nucleotide: C G G
Detected mutation: T A A A

Protein-coding region
Position: 5 7 26 27 30 94 97 98 111 114 120 128 137 143 144 149 153
Reference amino acid: I A A D W I G R Y H E Y F A D H F
Detected missense mutation: F T T V E C G R L S P N Q K F S V N L S

B Escherichia coli CFT073 UPEC

Regulatory region
Position: -58 -54 -40 -32
Reference nucleotide: C G T G
Detected mutation: T A A A C

Protein-coding region
Position: 5 7 26 27 28 30 94 98 111 114 116 118 119 120 124 128 133 136 138 149 151 152 153 159
Reference amino acid: I A A D L W I R Y H D E V E H Y W V S H Y C F R
Detected missense mutation: F S T T V E R C G R L P N Y N V V L M D K Y F R E G L N R L S W S

C Salmonella enterica LT2

Regulatory region
Position: -61 -57 -35
Reference nucleotide: C G G
Detected mutation: A T A A

Protein-coding region
Position: 5 7 20 21 26 27 28 30 94 97 98 100 114 116 133 136 138 143 144 145 149 153
Reference amino acid: I A M P A D L W I G R Y H D W V S A D A H F
Detected missense mutation: F T I L T E R C G R L S P N Y E R E R V N E L S S

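D Escherichia coli K-12 MG1655

| Position | Reference nucleotide | Detected mutation | IC75 (Trimethoprim) µg/ml | IC75 (-fold change compared to wild-type) |
|---|---|---|---|---|
| -58 | C | T | 16.7 | 11.9 |
| -54 | G | A | 8.0 | 5.7 |
| -32 | G | C | 8.9 | 6.3 |
| -32 | G | A | 6.9 | 4.9 |

| Position | Reference amino acid | Detected missense mutation | IC75 (Trimethoprim) µg/ml | IC75 (-fold change compared to wild-type) |
|---|---|---|---|---|
| 5 | I | F* | 3.5 | 2.5 |
| 7 | A | T* | 4.2 | 3.0 |
| 20 | M | I* | 5.1 | 3.6 |
| 20 | M | V | 6 | 4.3 |
| 21 | P | T* | 5.1 | 3.6 |
| 21 | P | Q | 5.1 | 3.6 |
| 21 | P | L | 6.9 | 4.9 |
| 27 | D | E | 6.9 | 4.9 |
| 28 | L | R | 93.8 | 67.0 |
| 30 | W | C | 6.9 | 4.9 |
| 30 | W | G | 9.6 | 6.9 |
| 30 | W | R | 6.9 | 4.9 |
| 30 | W | S* | 6 | 4.3 |
| 94 | I | L | 9.6 | 6.9 |
| 97 | G | S* | 1.5 | 1.1 |
| 98 | R | P* | 16.8 | 12.0 |
| 153 | F | S | 11.4 | 8.1 |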

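E Escherichia coli CFT073 UPEC

| Position | Reference nucleotide | Detected mutation | IC75 (Trimethoprim) µg/ml | IC75 (-fold change compared to wild-type) |
|---|---|---|---|---|
| -58 | C | T | 7.4 | 15.2 |
| -54 | G | A | 3.4 | 7.6 |
| -32 | G | C | 2.4 | 4.9 |
| -32 | G | A | 4.2 | 8.7 |

| Position | Reference amino acid | Detected missense mutation | IC75 (Trimethoprim) µg/ml | IC75 (-fold change compared to wild-type) |
|---|---|---|---|---|
| 5 | I | F | 1.8 | 3.7 |
| 7 | A | S | 1.4 | 2.9 |
| 7 | A | T | 1.4 | 3.0 |
| 20 | M | I | 1.5 | 3.0 |
| 21 | P | L | 4.1 | 8.4 |
| 27 | D | E | 1.6 | 3.3 |
| 28 | L | R | 31.1 | 63.8 |
| 30 | W | R | 2.6 | 5.3 |
| 30 | W | S | 2.7 | 5.5 |
| 30 | W | G | 3.0 | 6.2 |
| 94 | I | L | 4.3 | 8.8 |
| 98 | R | P | 5.2 | 10.7 |
| 153 | F | S | 4.6 | 9.4 |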

Supplementary Table S3 Naïve library composition of the E. coli K-12 folA locus after five cycles of DIvERGE mutagenesis. Library composition showed successful mutagenesis at both the regulatory (A) and the protein-coding (B) region. The numbers of detected mutations within individual alleles (n = 5567) are based on Pacific Biosciences SMRT sequencing data (Supplementary Dataset S5). Sequencing reads were mapped to the E. coli K-12 MG1655 (NCBI Reference Sequence: NC_000913.3) genome, and base (A) and amino-acid (B) substitutions were determined and counted (as COUNTS) according to SI Materials and Methods.

A

| Position | Reference allele | Detected SNPs | COUNTS |
|---|---|---|---|
| -100 | C | | |
| -99 | A | | |
| -98 | G | | |
| -97 | C | | |
| -96 | A | | |
| -95 | G | | |
| -94 | A | | |
| -93 | A | | |
| -92 | T | | |
| -91 | A | | |
| -90 | T | | |
| -89 | A | | |
| -88 | A | | |
| -87 | A | | |
| -86 | A | | |
| -85 | T | T215A | 1 |
| -84 | T | | |
| -83 | T | T217C | 1 |
| -82 | T | | |
| -81 | C | | |
| -80 | C | | |
| -79 | T | | |
| -78 | C | | |
| -77 | A | A223G | 2 |
| -76 | A | A224T | 1 |
| -75 | C | C225T | 2 |
| -74 | A | | |
| -73 | T | T227G;T227C | 2;1 |
| -72 | C | C228T;C228A | 6;3 |
| -71 | A | A229G;A229T | 1;1 |
| -70 | T | T230G | 1 |
| -69 | C | C231A;C231G;C231T | 4;3;3 |
| -68 | C | C232G;C232T | 1;1 |
| -67 | T | T233A;T233C;T233G | 2;1;1 |
| -66 | C | C234T;C234G;C234A | 3;2;1 |
| -65 | G | G235A;G235C;G235T | 3;1;1 |
| -64 | C | C236T | 4 |
| -63 | A | A237C;A237T | 1;1 |
| -62 | C | C238T;C238A;C238G | 5;2;1 |
| -61 | C | C239A;C239G;C239T | 2;2;2 |
| -60 | A | A240T;A240C;A240G | 3;1;1 |
| -59 | G | | |
| -58 | T | T242G;T242C | 4;1 |
| -57 | C | C243A;C243G;C243T | 6;3;1 |
| -56 | G | G244T | 2 |
| -55 | A | A245C;A245T | 1;1 |
| -54 | C | C246A;C246T | 3;2 |
| -53 | G | G247C;G247T;G247A | 3;3;2 |
| -52 | A | A248T | 1 |
| -51 | C | C249A;C249G;C249T | 3;1;1 |
| -50 | G | G250A;G250T | 1;1 |
| -49 | G | G251A;G251C | 3;3 |
| -48 | T | T252A;T252C;T252G | 2;1;1 |
| -47 | T | T253G;T253A;T253C | 2;1;1 |
| -46 | T | T254A;T254C;T254G | 2;2;2 |
| -45 | A | A255C | 1 |
| -44 | C | C256G | 1 |
| -43 | G | G257A;G257T;G257C | 2;2;1 |
| -42 | C | C258A;C258T | 2;1 |
| -41 | T | T259C | 2 |
| -40 | T | T260G;T260A;T260C | 4;1;1 |
| -39 | T | T261A;T261C;T261G | 3;2;2 |
| -38 | A | A262G;A262T | 3;2 |
| -37 | C | C263A;C263T | 3;1 |
| -36 | G | G264A;G264T | 1;1 |
| -35 | T | | |
| -34 | A | | |
| -33 | T | T267C;T267A | 3;1 |
| -32 | A | A268G;A268T | 1;1 |
| -31 | G | G269A;G269T;G269C | 3;3;1 |
| -30 | T | | |
| -29 | G | G271A;G271C;G271T | 3;1;1 |
| -28 | G | G272C;G272T | 1;1 |
| -27 | C | C273T | 3 |
| -26 | G | G274C;G274A | 2;1 |
| -25 | A | A275C;A275T | 2;1 |
| -24 | C | | |
| -23 | A | A277C | 2 |
| -22 | A | A278G | 2 |
| -21 | T | T279A;T279C | 2;2 |
| -20 | T | T280C;T280G | 2;2 |
| -19 | T | T281C | 2 |
| -18 | T | T282A | 1 |
| -17 | T | T283G;T283A;T283C | 2;1;1 |
| -16 | T | T284A | 2 |
| -15 | T | T285G;T285A;T285C | 3;1;1 |
| -14 | T | T286A;T286C;T286G | 1;1;1 |
| -13 | A | A287C;A287T | 1;1 |
| -12 | T | T288A | 1 |
| -11 | C | C289A | 1 |
| -10 | G | G290A;G290C | 1;1 |
| -9 | G | G291A;G291C | 1;1 |
| -8 | G | G292C;G292A;G292T | 3;1;1 |
| -7 | A | A293C;A293T;A293G | 2;2;1 |
| -6 | A | A294C;A294G | 2;1 |
| -5 | A | A295C;A295T;A295G | 2;2;1 |
| -4 | T | T296A;T296G | 1;1 |
| -3 | C | | |
| -2 | T | T298A;T298C | 1;1 |
| -1 | C | C299A;C299T;C299G | 3;2;1 |
| 0 | A | | |

B

| Position | Reference Amino Acid | Detected missense mutation | COUNTS |
|---|---|---|---|
| 1 | M | | |
| 2 | I | Ile2Asn;Ile2Phe;Ile2Ser;Ile2Thr | 1;1;1;1 |
| 3 | S | Ser3Arg;Ser3Ile | 2;1 |
| 4 | L | | |
| 5 | I | Ile5Phe;Ile5Ser;Ile5Thr;Ile5Leu | 3;3;3;1 |
| 6 | A | Ala6Ser;Ala6Val | 2;1 |
| 7 | A | Ala7Thr | 1 |
| 8 | L | Leu8Phe;Leu8Ser;Leu8Val | 2;1;1 |
| 9 | A | Ala9Thr;Ala9Pro | 2;1 |
| 10 | V | Val10Leu;Val10Ile | 2;1 |
| 11 | D | Asp11Glu;Asp11Tyr;Asp11Ala | 5;2;1 |
| 12 | R | Arg12Pro | 1 |
| 13 | V | Val13Ala;Val13Asp;Val13Ile;Val13Leu | 1;1;1;1 |
| 14 | I | Ile14Ser;Ile14Val;Ile14Leu;Ile14Thr | 2;2;1;1 |
| 15 | G | Gly15Ala;Gly15Asp;Gly15Cys;Gly15Ser;Gly15Val | 3;1;1;1;1 |
| 16 | M | Met16Arg;Met16Ile | 3;1 |
| 17 | E | Glu17Asp;Glu17Ala;Glu17Gln | 2;1;1 |
| 18 | N | Asn18Ser;Asn18His;Asn18Thr | 2;1;1 |
| 19 | A | Ala19Ser;Ala19Gly;Ala19Val | 3;2;1 |
| 20 | M | Met20Val;Met20Ile;Met20Leu;Met20Thr | 3;2;2;1 |
| 21 | P | Pro21Leu;Pro21Ser;Pro21Thr | 3;1;1 |
| 22 | W | Trp22Cys;Trp22Leu;Trp22Gly | 3;3;1 |
| 23 | N | Asn23His;Asn23Lys | 3;1 |
| 24 | L | Leu24Arg;Leu24Gln;Leu24Val | 1;1;1 |
| 25 | P | Pro25His;Pro25Ala;Pro25Arg;Pro25Leu | 2;1;1;1 |
| 26 | A | Ala26Thr;Ala26Pro;Ala26Asp;Ala26Gly | 5;2;1;1 |
| 27 | D | | |
| 28 | L | Leu28Arg;Leu28Ile;Leu28Phe | 1;1;1 |
| 29 | A | Ala29Ser;Ala29Val | 2;2 |
| 30 | W | Trp30Cys;Trp30*;Trp30Leu | 5;3;1 |
| 31 | F | Phe31Leu;Phe31Cys;Phe31Ile | 5;4;1 |
| 32 | K | Lys32Asn;Lys32Glu | 2;1 |
| 33 | R | | |
| 34 | N | Asn34His;Asn34Asp | 2;1 |
| 35 | T | Thr35Ala | 2 |
| 36 | L | Leu36Phe;Leu36*;Leu36Ile | 3;2;2 |
| 37 | N | Asn37Asp;Asn37Lys | 1;1 |
| 38 | K | | |
| 39 | P | Pro39His | 1 |
| 40 | V | | |
| 41 | I | Ile41Leu;Ile41Val | 1;1 |
| 42 | M | Met42Ile;Met42Leu;Met42Lys;Met42Val | 2;1;1;1 |
| 43 | G | Gly43Asp | 1 |
| 44 | R | Arg44Leu | 1 |
| 45 | H | His45Gln;His45Leu | 2;1 |
| 46 | T | Thr46Ala;Thr46Ser | 1;1 |
| 47 | W | Trp47Cys;Trp47Arg;Trp47Ser | 5;2;1 |
| 48 | E | Glu48Asp;Glu48Lys;Glu48Ala;Glu48Gln;Glu48Val | 2;2;1;1;1 |
| 49 | S | Ser49Ala;Ser49Leu | 2;1 |
| 50 | I | Ile50Phe;Ile50Leu;Ile50Met;Ile50Val | 2;1;1;1 |
| 51 | G | Gly51Ser;Gly51Cys;Gly51Ala;Gly51Arg;Gly51Val | 3;2;1;1;1 |
| 52 | R | Arg52Pro;Arg52Cys;Arg52His | 2;1;1 |
| 53 | P | Pro53Gln;Pro53Ala;Pro53Arg;Pro53Ser;Pro53Thr | 2;1;1;1;1 |
| 54 | L | | |
| 55 | P | Pro55Ser;Pro55Thr;Pro55Ala;Pro55Arg | 2;2;1;1 |
| 56 | G | Gly56Ala;Gly56Glu;Gly56Val | 1;1;1 |
| 57 | R | Arg57His | 1 |
| 58 | K | Lys58Glu;Lys58Arg;Lys58Asn;Lys58Gln;Lys58Thr | 2;1;1;1;1 |
| 59 | N | Asn59Asp;Asn59Lys;Asn59Ser;Asn59Thr;Asn59Tyr | 1;1;1;1;1 |
| 60 | I | Ile60Met;Ile60Phe;Ile60Ser;Ile60Val | 1;1;1;1 |
| 61 | I | Ile61Met;Ile61Leu;Ile61Phe;Ile61Thr;Ile61Val | 2;1;1;1;1 |
| 62 | L | Leu62Arg;Leu62His;Leu62Phe;Leu62Val | 1;1;1;1 |
| 63 | S | Ser63Gly;Ser63Thr | 1;1 |
| 64 | S | Ser64Arg;Ser64Gly;Ser64Thr | 2;1;1 |
| 65 | Q | Gln65Arg;Gln65His | 1;1 |
| 66 | P | Pro66Ala;Pro66Leu;Pro66Ser;Pro66Thr | 1;1;1;1 |
| 67 | G | Gly67Val;Gly67Cys;Gly67Ser | 5;2;2 |
| 68 | T | Thr68Ala;Thr68Arg;Thr68Lys | 1;1;1 |
| 69 | D | Asp69Ala;Asp69Asn;Asp69Glu;Asp69Val | 2;2;1;1 |
| 70 | D | Asp70Ala;Asp70Gly;Asp70His | 1;1;1 |
| 71 | R | | |
| 72 | V | Val72Ile;Val72Ala;Val72Glu;Val72Gly;Val72Leu | 2;1;1;1;1 |
| 73 | T | Thr73Lys;Thr73Met;Thr73Arg;Thr73Pro | 3;3;1;1 |
| 74 | W | Trp74Arg;Trp74*;Trp74Cys;Trp74Leu | 2;1;1;1 |
| 75 | V | Val75Leu;Val75Glu;Val75Gly | 3;1;1 |
| 76 | K | Lys76Arg;Lys76Glu | 1;1 |
| 77 | S | Ser77Leu | 1 |
| 78 | V | Val78Glu;Val78Leu | 2;1 |
| 79 | D | Asp79Asn;Asp79Glu;Asp79His;Asp79Ala | 2;2;2;1 |
| 80 | E | Glu80Asp;Glu80Gly;Glu80Lys | 4;1;1 |
| 81 | A | Ala81Val;Ala81Thr;Ala81Pro | 4;3;2 |
| 82 | I | Ile82Phe;Ile82Asn;Ile82Ser | 2;1;1 |
| 83 | A | Ala83Pro;Ala83Ser;Ala83Val;Ala83Glu;Ala83Gly;Ala83Thr | 2;2;2;1;1;1 |
| 84 | A | Ala84Ser;Ala84Val | 2;1 |
| 85 | C | Cys85Ser;Cys85Phe;Cys85Trp;Cys85Tyr | 3;1;1;1 |
| 86 | G | Gly86Ala;Gly86Arg;Gly86Ser | 2;1;1 |
| 87 | D | Asp87Asn;Asp87Glu;Asp87Gly;Asp87Tyr | 2;1;1;1 |
| 88 | V | Val88Glu | 1 |
| 89 | P | Pro89Ala | 1 |
| 90 | E | Glu90Ala;Glu90Asp;Glu90Lys | 1;1;1 |
| 91 | I | Ile91Val;Ile91Asn | 2;1 |
| 92 | M | Met92Ile;Met92Leu;Met92Val | 3;2;1 |
| 93 | V | Val93Leu;Val93Ala;Val93Met | 5;1;1 |
| 94 | I | Ile94Asn;Ile94Thr;Ile94Leu;Ile94Ser | 2;2;1;1 |
| 95 | G | | |
| 96 | G | Gly96Ala;Gly96Arg;Gly96Asp | 1;1;1 |
| 97 | G | Gly97Val;Gly97Ala;Gly97Asp;Gly97Ser | 3;2;1;1 |
| 98 | R | Arg98Leu;Arg98Pro;Arg98Cys;Arg98His | 2;2;1;1 |
| 99 | V | Val99Ala;Val99Asp;Val99Ile | 2;1;1 |
| 100 | Y | Tyr100His | 1 |
| 101 | E | Glu101Val | 1 |
| 102 | Q | Gln102Glu;Gln102His | 1;1 |
| 103 | F | Phe103Val | 1 |
| 104 | L | Leu104Met;Leu104Trp;Leu104*;Leu104Phe;Leu104Val | 2;2;1;1;1 |
| 105 | P | Pro105Thr | 1 |
| 106 | K | Lys106Asn;Lys106Glu | 1;1 |
| 107 | A | Ala107Ser | 1 |
| 108 | Q | Gln108His;Gln108Arg;Gln108Lys | 3;2;2 |
| 109 | K | Lys109Asn;Lys109Thr | 1;1 |
| 110 | L | Leu110Gln;Leu110Pro;Leu110Val | 2;1;1 |
| 111 | Y | Tyr111Phe;Tyr111Ser | 1;1 |
| 112 | L | | |
| 113 | T | Thr113Lys | 1 |
| 114 | H | His114Asp;His114Gln;His114Pro | 2;1;1 |
| 115 | I | Ile115Met;Ile115Phe;Ile115Ser | 1;1;1 |
| 116 | D | Asp116Glu | 1 |
| 117 | A | Ala117Glu;Ala117Thr | 1;1 |
| 118 | E | Glu118Ala;Glu118Asp | 1;1 |
| 119 | V | Val119Leu;Val119Glu | 2;1 |
| 120 | E | Glu120Ala;Glu120Asp;Glu120Gln | 2;2;1 |
| 121 | G | Gly121Val | 1 |
| 122 | D | Asp122Ala;Asp122Glu;Asp122Val | 1;1;1 |
| 123 | T | Thr123Ser;Thr123Ile;Thr123Pro | 4;2;1 |
| 124 | H | His124Gln;His124Asn;His124Leu;His124Arg;His124Tyr | 3;2;2;1;1 |
| 125 | F | Phe125Ile;Phe125Leu | 2;2 |
| 126 | P | Pro126Thr;Pro126Ala;Pro126Gln;Pro126Leu;Pro126Ser | 2;1;1;1;1 |
| 127 | D | Asp127Glu;Asp127Asn | 3;2 |
| 128 | Y | Tyr128Asn;Tyr128Asp | 4;3 |
| 129 | E | Glu129Ala;Glu129Gln;Glu129Lys;Glu129Asp | 3;2;2;1 |
| 130 | P | Pro130Thr;Pro130Leu | 3;1 |
| 131 | D | Asp131Ala;Asp131Gly;Asp131Tyr;Asp131His;Asp131Val | 2;2;2;1;1 |
| 132 | D | Asp132Glu;Asp132Tyr;Asp132Asn | 2;2;1 |
| 133 | W | Trp133Arg;Trp133Cys;Trp133Gly | 3;2;2 |
| 134 | E | Glu134Asp;Glu134Ala;Glu134Gln;Glu134Lys;Glu134Val | 4;2;1;1;1 |
| 135 | S | Ser135Ala;Ser135Pro | 1;1 |
| 136 | V | Val136Leu;Val136Ala | 2;1 |
| 137 | F | Phe137Cys;Phe137Ile;Phe137Leu;Phe137Val | 4;3;2;1 |
| 138 | S | Ser138Asn | 1 |
| 139 | E | Glu139Asp | 1 |
| 140 | F | Phe140Leu;Phe140Cys;Phe140Val | 2;1;1 |
| 141 | H | His141Pro;His141Arg;His141Asn;His141Asp | 2;1;1;1 |
| 142 | D | Asp142Glu | 1 |
| 143 | A | Ala143Asp;Ala143Thr | 1;1 |
| 144 | D | Asp144Tyr;Asp144Asn;Asp144Glu;Asp144Ala;Asp144His | 4;2;2;1;1 |
| 145 | A | Ala145Thr;Ala145Glu;Ala145Gly | 2;1;1 |
| 146 | Q | Gln146Leu;Gln146Lys;Gln146Pro | 2;2;1 |
| 147 | N | Asn147Ser;Asn147Thr;Asn147Asp;Asn147Lys;Asn147Tyr | 2;2;1;1;1 |
| 148 | S | Ser148Pro | 3 |
| 149 | H | His149Gln;His149Tyr;His149Arg;His149Asn;His149Asp | 3;3;1;1;1 |
| 150 | S | Ser150Cys;Ser150Ile;Ser150Arg;Ser150Asn;Ser150Gly;Ser150Thr | 2;2;1;1;1;1 |
| 151 | Y | Tyr151Asn;Tyr151Asp;Tyr151Cys;Tyr151Ser | 1;1;1;1 |
| 152 | C | Cys152Phe;Cys152Ser;Cys152Arg;Cys152Gly | 3;3;1;1 |
| 153 | F | Phe153Leu;Phe153Ile;Phe153Cys;Phe153Ser;Phe153Tyr;Phe153Val | 6;4;2;1;1;1 |
| 154 | E | Glu154Ala;Glu154Asp;Glu154Lys | 1;1;1 |
| 155 | I | Ile155Leu;Ile155Met;Ile155Phe;Ile155Thr | 3;1;1;1 |
| 156 | L | Leu156Met;Leu156Gln;Leu156Arg;Leu156Val | 3;2;1;1 |
| 157 | E | Glu157Ala;Glu157Asp;Glu157Gly;Glu157Lys;Glu157Val | 1;1;1;1;1 |
| 158 | R | Arg158Leu;Arg158Pro;Arg158Trp | 1;1;1 |
| 159 | R | Arg159Gln | 3 |
| 160 | * | *160Leu;*160Tyr;*160Gln;*160Lys;*160Ser | 2;2;1;1;1 |
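In Supplementary Table S3B, each position pairs a semicolon-separated list of missense mutations with a semicolon-separated list of counts in the same order. The following is a minimal sketch of how such a pair could be turned into a lookup structure; the function name `parse_mutation_counts` and the regular expression are illustrative assumptions, not part of the original analysis pipeline.

```python
import re

# Three-letter amino-acid code or "*" (stop), then position, then the
# substituted residue, e.g. "Trp30Cys", "Trp30*", or "*160Leu".
MUTATION = re.compile(r"^([A-Z][a-z]{2}|\*)(\d+)([A-Z][a-z]{2}|\*)$")


def parse_mutation_counts(mutations, counts):
    """Map (position, reference, substitution) to its read count."""
    names = mutations.split(";")
    numbers = [int(c) for c in counts.split(";")]
    if len(names) != len(numbers):
        raise ValueError("mutation and count lists differ in length")
    parsed = {}
    for name, number in zip(names, numbers):
        match = MUTATION.match(name.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation label: {name!r}")
        ref, pos, alt = match.groups()
        parsed[(int(pos), ref, alt)] = number
    return parsed


row = parse_mutation_counts("Trp30Cys;Trp30*;Trp30Leu", "5;3;1")
print(row)
```

Checking that both lists have the same length guards against the kind of column spill that occurs when such tables are extracted from a PDF.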

Supplementary Figure S3 Library composition of an E. coli K-12 MG1655 folA allelic variant, differing by 3 SNPs (C-58A; W30C and C132G, a same-sense mutation, at positions -58, 90, and 132) compared to the wild-type sequence, after five cycles of DIvERGE mutagenesis using an oligo pool designed for the wild-type sequence. Allelic composition was determined using Illumina HT sequencing.

Supplementary Figure S4 Trimethoprim dose-response curves of the selected folA mutants (A) and the corresponding wild-type ancestral strains of E. coli K-12 MG1655, E. coli CFT073, and Salmonella enterica LT2 (B). Individual genotypes and strain identifiers (IDs) are indicated within the chart legends. The 75% inhibition of bacterial growth (IC75), representing 25% of the area under the growth curve measured in the absence of the drug according to SI Materials and Methods, is marked with an orange line. Growth measurements were performed in MS + casamino acid (without thiamine) medium in triplicate; growth indicates the averaged area under the growth curve for n = 3.
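The IC75 definition above (the drug concentration at which the area under the growth curve falls to 25% of the drug-free value) can be illustrated with a short sketch. It assumes that the IC75 is read off by linear interpolation between the two tested concentrations that bracket the threshold; the interpolation scheme, the function name `estimate_ic75`, and the example data are assumptions for illustration, and the actual procedure is the one described in SI Materials and Methods.

```python
def estimate_ic75(concentrations, growth_aucs, residual_fraction=0.25):
    """Estimate the drug concentration giving 75% growth inhibition.

    `concentrations` must be sorted in ascending order and start with
    the drug-free condition; `growth_aucs` holds the matching areas
    under the growth curve. Returns None if the threshold is never
    crossed within the tested range.
    """
    threshold = residual_fraction * growth_aucs[0]
    points = list(zip(concentrations, growth_aucs))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo > threshold >= a_hi:
            # Linear interpolation between the bracketing data points.
            return c_lo + (a_lo - threshold) * (c_hi - c_lo) / (a_lo - a_hi)
    return None


# Hypothetical dose-response data (arbitrary units), for illustration only.
doses = [0.0, 1.0, 2.0, 4.0]
aucs = [100.0, 80.0, 40.0, 10.0]
print(estimate_ic75(doses, aucs))
```

Dividing a mutant's IC75 by that of its wild-type ancestor then gives the fold change reported in the susceptibility tables.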

Supplementary Table S4 (A) Mutagenic TETRM1 DIvERGE oligonucleotides with 0.5 and 2% spiking levels show high incorporation rates after one DIvERGE cycle in E. coli K-12 MG1655, Salmonella enterica LT2, and Citrobacter freundii ATCC 8090. (B) Comparison of the mutation rate achieved by 5 cycles of DIvERGE with 2% spiked oligonucleotides at the targeted locus, over the basal background mutation rate in Escherichia coli K-12 MG1655, Salmonella enterica LT2 and Citrobacter freundii ATCC 8090.

A

| Host strain | Spiking level (%) | Incorporation efficiency (%) |
|---|---|---|
| E. coli K-12 MG1655 | 0.5 | 21.0 |
| E. coli K-12 MG1655 | 2 | 30.1 |
| S. enterica LT2 | 0.5 | 15.9 |
| S. enterica LT2 | 2 | 23.1 |
| C. freundii ATCC 8090 | 0.5 | 21.7 |
| C. freundii ATCC 8090 | 2 | 25.0 |

B

| Host strain | Estimated basal mutation rate | DIvERGE mutation rate (× 10^-2) | -fold increase by DIvERGE |
|---|---|---|---|
| E. coli K-12 MG1655 | 1.10 × 10^-8 | 1.225 | 1.114 × 10^6 |
| Salmonella enterica LT2 | 6.37 × 10^-8 | 0.922 | 1.448 × 10^5 |
| Citrobacter freundii ATCC 8090 | 1.13 × 10^-8 | 1.00 | 8.903 × 10^5 |
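The fold increase in panel B follows from dividing the DIvERGE mutation rate by the estimated basal mutation rate. The short sketch below reproduces that arithmetic from the tabulated values; the variable names are illustrative, and small deviations from the printed fold values are expected because the tabulated rates are rounded.

```python
# Values transcribed from Supplementary Table S4B:
# (estimated basal mutation rate, DIvERGE mutation rate).
rates = {
    "E. coli K-12 MG1655": (1.10e-8, 1.225e-2),
    "Salmonella enterica LT2": (6.37e-8, 0.922e-2),
    "Citrobacter freundii ATCC 8090": (1.13e-8, 1.00e-2),
}

# Fold increase = DIvERGE mutation rate / basal mutation rate.
fold_increase = {
    strain: diverge / basal for strain, (basal, diverge) in rates.items()
}

for strain, fold in fold_increase.items():
    print(f"{strain}: {fold:.3e}-fold")
```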

Supplementary Figure S5 The locus-specific mutation rate in Salmonella enterica LT2, Escherichia coli K-12 MG1655, and C. freundii ATCC 8090 correlates positively with the number of mutagenesis cycles, while the background mutation rate at untargeted nucleotide positions remains unchanged. Mutation rates are background-normalized and were measured for 5 consecutive mutagenesis cycles at the corresponding level of oligonucleotide spiking (TETRM1 and TETRM3, with 0.5 and 2% spiking) using Illumina HT sequencing. The background mutation rate indicates raw sequencing noise, determined at untargeted nucleotide positions (n = 90) for each sample. To obtain the background-normalized mutation rate for each mutagenesis cycle, the background mutation rate was subtracted from the locus-specific mutation rate (see SI Materials and Methods).
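The background normalization described in this caption can be sketched as follows. This is a minimal illustration that assumes both rates are computed as the mean per-position mismatch frequency (over targeted and untargeted positions, respectively); the averaging step, the function name `background_normalized_rate`, and the example numbers are assumptions rather than the authors' exact procedure.

```python
from statistics import mean


def background_normalized_rate(targeted_freqs, untargeted_freqs):
    """Subtract sequencing noise from the locus-specific mutation rate.

    `targeted_freqs` and `untargeted_freqs` are per-position mismatch
    frequencies at targeted and untargeted positions of one sample.
    Returns (background-normalized rate, background rate).
    """
    background = mean(untargeted_freqs)
    locus_rate = mean(targeted_freqs)
    return locus_rate - background, background


# Hypothetical per-position frequencies, for illustration only.
normalized, background = background_normalized_rate(
    [0.02, 0.04], [0.001, 0.003]
)
print(normalized, background)
```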

Supplementary Figure S6 (A) DIvERGE maps trimethoprim resistance mutations with high resolution at folA in both model organisms and clinically relevant pathogens. The analysis focused on the promoter (DNA sites below 0) and protein-coding regions (amino acid sites above 0) of folA in E. coli K-12 MG1655 (marked MG), E. coli CFT073 (marked UP), and S. enterica (marked SE). The figure shows mutational hot-spots (marked in red) based on SMRT sequencing of folA in all three strains under mild trimethoprim selection (4 times the wild-type MIC). (B) Trimethoprim resistance-conferring mutational hot-spots in E. coli K-12 MG1655 FolA (Protein Data Bank ID: 1RH3) as complexed with methotrexate (a FolA inhibitor, in green) and NADPH (in blue). Most trimethoprim resistance-conferring mutations are located at the active site cavity (in the vicinity of amino acid positions 28 and 98). The heat-map shows mutation frequency (%). (C) Sequence composition of 1000 trimethoprim-resistant variants, generated by DIvERGE mutagenesis in E. coli K-12 MG1655, E. coli CFT073, and S. enterica LT2, revealed a diverse set of variants in DIvERGE-generated cell libraries. Libraries were selected at 4 times (green), 67 times (blue), and 267 times (yellow) the wild-type trimethoprim MIC. Genotypic analysis was performed by amplicon SMRT sequencing, and CCS reads were mapped to the Escherichia coli K-12 MG1655 (NCBI Reference Sequence: NC_000913.3), Escherichia coli CFT073 (NCBI Reference Sequence: NC_004431.1), or Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 (NCBI Reference Sequence: NC_003197.1) genome. For the extended dataset, see Supplementary Dataset S6.

Supplementary Table S5 Trimethoprim susceptibility of individually selected E. coli CFT073 UPEC and Salmonella enterica LT2 folA mutants. For a detailed description of antibiotic susceptibility testing, see SI Materials and Methods.

Strain ID

E. coli CFT073 wild-type E. coli CFT073#1

FolA mutation

Trimethoprim IC75 (g/ml)

IC75 (-fold change compared to wild-type)

-

-

0.3

1

-

L28R, R98P L28R, A26T, H45R L28R, R98P, V119V, S150N, E157Q L28R, A26T L28R L28R L28R, R44R, H45Y

487.5

1625.0

235

783.3

352.5

1175.0

235 325 280

783.3 1083.3 933.3

415

1383.3

-

-

0.3

1

C-61T, G-29C C-61T, C-61T, T-77G C-61T, T-71A C-61T

R98P, F153S E17D, L28R L28R L28R A26D, L28R

610 1162 794 287.5 382.5

2033.3 3873.3 2650.0 958.3 1275.0

folA regulatory mutation

E. coli CFT073#2

A-8G

E. coli CFT073#3

-

E. coli CFT073#4 E. coli CFT073#5 E. coli CFT073#6

C-58T C-38T, G-32A

E. coli CFT073#7

C-58T

S. enterica LT2 wild-type S. enterica LT2#1 S. enterica LT2#2 S. enterica LT2#3 S. enterica LT2#4 S. enterica LT2#5

Supplementary Table S6 Time-frame and cell-generation calculation for a DIvERGE mutagenesis cycle. The native, basal mutation rate of wild-type E. coli K-12 MG1655 has been experimentally measured by whole-genome sequencing to be 2.2 × 10^-10 mutations per nucleotide per generation (14). The sequencing data for the landing pad assay-based DIvERGE mutagenesis were obtained from strains that had undergone five cycles. Using pORTMAGE (see Methods), we estimate that there were 10 cell generations per DIvERGE cycle (Supplementary Table S6). Based on the basal mutation rate of wild-type E. coli K-12 MG1655, 5 cycles of DIvERGE mutagenesis (50 generations) would yield 1.1 × 10^-8 mutations per nucleotide as the basal background mutation rate. For Salmonella enterica LT2 and Citrobacter freundii ATCC 8090, mutation rate measurements based on whole-genome sequencing are not available. Therefore, to estimate the background mutation rate in DIvERGE mutagenesis for these species, we calculated the basal mutation rate from the available mutation rate measurements obtained by rifampicin-resistance fluctuation analysis.

| Step | Time-frame | Number of cell generations |
|---|---|---|
| Growth to mid-log phase (to reach OD600 = 0.4-0.6) | 1.5 – 2.5 h (host strain growth rate dependent) | 8 |
| pORTMAGE induction at 42°C | 15 min | 0 |
| Cell wash, electrocompetent cell preparation and oligo delivery at 0°C | 30 min | 0 |
| Initial recovery | 1 h | 2 |

Total time-frame per DIvERGE cycle: 3.5 – 4.5 h

Number of cell generations per DIvERGE cycle: 10
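The background estimate for E. coli K-12 MG1655 combines the per-step generation counts above with the published per-generation mutation rate. The sketch below walks through that arithmetic; the variable names are illustrative, and the per-step labels are shortened from the table.

```python
# Cell generations per step of one DIvERGE cycle (from the table above).
generations_per_step = {
    "Growth to mid-log phase": 8,
    "pORTMAGE induction": 0,
    "Cell wash and oligo delivery": 0,
    "Initial recovery": 2,
}

generations_per_cycle = sum(generations_per_step.values())
cycles = 5
total_generations = generations_per_cycle * cycles

# Basal rate for wild-type E. coli K-12 MG1655,
# in mutations per nucleotide per generation (ref. 14).
basal_rate_per_generation = 2.2e-10

# Expected basal background after all cycles, per nucleotide.
basal_background = basal_rate_per_generation * total_generations

print(generations_per_cycle, total_generations, basal_background)
```

The resulting value corresponds to the estimated basal mutation rate listed for E. coli K-12 MG1655 in Supplementary Table S4B.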

Supplementary References

1. Nyerges Á, et al. (2016) A highly precise and portable genome engineering method allows comparison of mutational effects across bacterial species. Proc Natl Acad Sci:201520040.

2. Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv Prepr ArXiv13033997. Available at: http://arxiv.org/abs/1303.3997 [Accessed November 13, 2016].

3. Li H, et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079.

4. Breese MR, Liu Y (2013) NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets. Bioinformatics 29(4):494–496.

5. Lindenbaum P (2015) JVarkit: java-based utilities for Bioinformatics. doi:10.6084/m9.figshare.1425030.v1.

6. Camacho C, et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421.

7. Ferla MP (2016) Mutanalyst, an online tool for assessing the mutational spectrum of epPCR libraries with poor sampling. BMC Bioinformatics 17:152.

8. Blower TR, Williamson BH, Kerns RJ, Berger JM (2016) Crystal structure and stability of gyrase-fluoroquinolone cleaved complexes from Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 113(7):1706–1713.

9. Warringer J, Blomberg A (2003) Automated screening in environmental arrays allows analysis of quantitative phenotypic profiles in Saccharomyces cerevisiae. Yeast 20(1):53–67.

10. Karcagi I, et al. (2016) Indispensability of Horizontally Transferred Genes and Its Impact on Bacterial Genome Streamlining. Mol Biol Evol 33(5):1257–1269.

11. Lázár V, et al. (2014) Genome-wide analysis captures the determinants of the antibiotic cross-resistance interaction network. Nat Commun 5. doi:10.1038/ncomms5352.

12. ISO 20776-1:2006 - Clinical laboratory testing and in vitro diagnostic test systems -- Susceptibility testing of infectious agents and evaluation of performance of antimicrobial susceptibility test devices -- Part 1: Reference method for testing the in vitro activity of antimicrobial agents against rapidly growing aerobic bacteria involved in infectious diseases. ISO.

13. Wang HH, Church GM (2011) Multiplexed genome engineering and genotyping methods applications for synthetic biology and metabolic engineering. Methods Enzymol 498:409–426.

14. Lee H, Popodi E, Tang H, Foster PL (2012) Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc Natl Acad Sci 109(41):E2774–E2783.