BMC Genomics

4 downloads 0 Views 6MB Size Report
After filtering missing data, heterozygosity, and minor allele frequency, a. 36 ... of 11,811 SNPs and 275 soybean genotypes were obtained for association analyses. ...... The model chosen to conduct GWAS should be carefully decided based on ...... Li H: Aligning sequence reads, clone sequences and assembly contigs with.
BMC Genomics Genome-wide association mapping of resistance to a Brazilian isolate of Sclerotinia sclerotiorum in soybean genotypes mostly from Brazil --Manuscript Draft-Manuscript Number: Full Title:

Genome-wide association mapping of resistance to a Brazilian isolate of Sclerotinia sclerotiorum in soybean genotypes mostly from Brazil

Article Type:

Research article

Section/Category:

Plant genomics

Funding Information:

USDA CRIS (5012-21000-026-00D)

Abstract:

Sclerotinia Stem Rot (SSR), caused by the fungal pathogen Sclerotinia sclerotiorum, is ubiquitous in cooler climates where soybean crops are grown. Breeding for resistance to SSR remains challenging in crops like soybean, where no single gene provides strong resistance, but instead, multiple genes work together to provide partial resistance. In this study, genome-wide association study (GWAS) was performed to dissect the complex genetic architecture of soybean quantitative resistance to SSR and to provide effective molecular markers that could be used in breeding programs. A collection of 420 soybean genotypes were selected based on either reports of resistance, or from one of three different breeding programs in Brazil, two commercial, one public. Plant genotype sensitivity to SSR was evaluated by the cut stem inoculation method, and lesion lengths were measured at 4 days post inoculation. Genotyping-by-sequencing was conducted to genotype the 420 soybean lines. The TASSEL 5 GBSv2 pipeline was used to call SNPs under optimized parameters, and with the extra step of trimming adapter sequences. After filtering missing data, heterozygosity, and minor allele frequency, a total of 11,811 SNPs and 275 soybean genotypes were obtained for association analyses. Using a threshold of FDR-adjusted p-values < 0.1, the Compressed Mixed Linear Model (CMLM) with Genome Association and Prediction Integrated Tool (GAPIT), and the Fixed and Random Model Circulating Probability Unification (FarmCPU) methods, both approaches identified SNPs with significant association to disease response on chromosomes 1, 11, and 18. The CMLM also found significance on chromosome 19, whereas FarmCPU also identified significance on chromosomes 4, 9, and 16. These similar and yet different results show that the computational methods used can impact SNP associations in soybean, a plant with a high degree of linkage disequilibrium, and in SSR resistance, a trait that has a complex genetic basis. A total of 125 genes were located within linkage disequilibrium of the three loci shared between the two models. Their annotations and gene expressions in previous studies of soybean infected with S. sclerotiorum were examined to narrow down the candidates.

Corresponding Author:

Steven J. Clough, PhD USDA-ARS Midwest Area UNITED STATES

Dr Steven J. Clough

Corresponding Author Secondary Information: Corresponding Author's Institution:

USDA-ARS Midwest Area

Corresponding Author's Secondary Institution: First Author:

Wei Wei, MS

First Author Secondary Information: Order of Authors:

Wei Wei, MS Ana Carolina Oliveira Mesquita, MS Adriana de A. Figueiró, PhD

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation

Xing Wu, MS Shilpa Manjunatha Daniel P. Wickland Matthew E. Hudson, PhD Fernando C. Juliatti, PhD Steven J. Clough, PhD Order of Authors Secondary Information:

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation

Manuscript

Click here to download Manuscript GBS_Manuscript_May10_2017_Submitted.docx

Click here to view linked References 1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 1 Genome-wide association mapping of resistance to a Brazilian isolate of 5 6 7 2 Sclerotinia sclerotiorum in soybean genotypes mostly from Brazil 8 9 10 3 Wei Weia* 11 12 4 Ana Carolina Oliveira Mesquitab* 13 14 5 Adriana de A. Figueirób 15 16 17 6 Xing Wua 18 19 7 Shilpa Manjunathaa 20 21 22 8 Daniel P. Wicklanda 23 24 9 Matthew E. Hudsona 25 26 27 10 Fernando C. Juliattib 28 29 11 Steven J. Clougha,c 30 31 12 32 33 34 13 a University of Illinois, Department of Crop Sciences, Urbana, IL 61801, USA. 35 36 14 b Universidade Federal de Uberlândia, Umuarama campus, Uberlândia, MG, Brazil. 37 38 39 15 c United States Department of Agriculture—Agricultural Research Service, Urbana, IL 40 41 16 61801, USA 42 43 44 17 45 46 18 * These two authors contributed equally to this work 47 48 49 19 Corresponding author: Steven J. Clough 50 51 20 52 53 21 54 55 56 57 58 59 60 61 62 63 1 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 22 ABSTRACT (350 words max) 5 6 7 23 Sclerotinia Stem Rot (SSR), caused by the fungal pathogen Sclerotinia 8 9 24 sclerotiorum, is ubiquitous in cooler climates where soybean crops are grown. Breeding 10 11 12 25 for resistance to SSR remains challenging in crops like soybean, where no single gene 13 14 26 provides strong resistance, but instead, multiple genes work together to provide partial 15 16 27 resistance. In this study, genome-wide association study (GWAS) was performed to 17 18 19 28 dissect the complex genetic architecture of soybean quantitative resistance to SSR and to 20 21 29 provide effective molecular markers that could be used in breeding programs. A 22 23 24 30 collection of 420 soybean genotypes were selected based on either reports of resistance, 25 26 31 or from one of three different breeding programs in Brazil, two commercial, one public. 27 28 29 32 Plant genotype sensitivity to SSR was evaluated by the cut stem inoculation method, and 30 31 33 lesion lengths were measured at 4 days post inoculation. Genotyping-by-sequencing was 32 33 34 34 conducted to genotype the 420 soybean lines. The TASSEL 5 GBSv2 pipeline was used 35 36 35 to call SNPs under optimized parameters, and with the extra step of trimming adapter 37 38 36 sequences. After filtering missing data, heterozygosity, and minor allele frequency, a 39 40 41 37 total of 11,811 SNPs and 275 soybean genotypes were obtained for association analyses. 42 43 38 Using a threshold of FDR-adjusted p-values < 0.1, the Compressed Mixed Linear Model 44 45 46 39 (CMLM) with Genome Association and Prediction Integrated Tool (GAPIT), and the 47 48 40 Fixed and Random Model Circulating Probability Unification (FarmCPU) methods, both 49 50 51 41 approaches identified SNPs with significant association to disease response on 52 53 42 chromosomes 1, 11, and 18. The CMLM also found significance on chromosome 19, 54 55 43 whereas FarmCPU also identified significance on chromosomes 4, 9, and 16. These 56 57 58 44 similar and yet different results show that the computational methods used can impact 59 60 61 62 63 2 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 45 SNP associations in soybean, a plant with a high degree of linkage disequilibrium, and in 5 6 46 SSR resistance, a trait that has a complex genetic basis. A total of 125 genes were located 7 8 9 47 within linkage disequilibrium of the three loci shared between the two models. Their 10 11 48 annotations and gene expressions in previous studies of soybean infected with S. 12 13 14 49 sclerotiorum were examined to narrow down the candidates. 15 16 50 17 18 19 51 Keywords: GWAS, genotype-by-sequencing, SNP, marker, white mold, QTL, GAPIT, 20 21 52 CMLM, FarmCPU 22 23 24 53 25 26 27 54 BACKGROUND 28 29 55 Many advances are being made in soybean (Glycine max (L.) Merrill) breeding 30 31 32 56 that have been reflected by continual increased production worldwide[1]. For example, in 33 34 57 the US there has been a steady approximate 1.1% average increase in yield (kg/ha) per 35 36 37 58 year from 1924 to 2016 (http://www.nass.usda.gov/). However, although yields are 38 39 59 increasing, yields are still vulnerable to a variety of environmental restrictions, such as 40 41 42 60 the constraint caused by the constant attack by pests and pathogens. The pathogen 43 44 61 Sclerotinia sclerotiorum (Lib.) de Bary is one such yield-limiting pathogen, causing the 45 46 47 62 disease Sclerotinia Stem Rot (SSR), and causing considerable damage in regions that 48 49 63 have had several weeks of cool wet weather during the flowering period [2]. 50 51 64 Resistance to S. sclerotiorum is not complete in most dicotyledon crops such as 52 53 54 65 soybean. Instead, resistance in soybean to SSR is only partial, with multiple alleles 55 56 66 providing a small amount of enhanced defense. Due to this lack of adequate resistance in 57 58 59 67 commercial varieties, and the heavy influence of weather conditions, losses due to SSR 60 61 62 63 3 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 68 can be zero one year, but then account for nearly half of all disease-associated losses the 5 6 69 next, as seen in Iowa 2003 and 2004 [3]. Even though complete resistance to SSR is 7 8 9 70 currently not available for soybean, there are clear differences in the susceptibility of 10 11 71 tested genotypes [4-11]. And although not ideal, partial resistance to SSR does increase 12 13 14 72 the yield potential, as slowing of the disease progression can be successful enough to 15 16 73 allow plants to recover with minimal damage, especially if the weather changes to be 17 18 19 74 more favorable to the plant, and less favorable to the pathogen. 20 21 75 Molecular studies looking into genes that are differentially expressed during SSR 22 23 76 development in soybean identified thousands of genes responding within the first 24 24 25 26 77 hours [12, 13]. These results, in addition to those of physiological and biochemical 27 28 78 studies of S. sclerotiorum infection [14-16], show that S. sclerotiorum – plant molecular 29 30 31 79 interactions are very complex, involving numerous pathogen-released proteins and oxalic 32 33 80 acid for the plant to cope with [17]. The complexity of SSR partial resistance has led 34 35 36 81 many researchers to use QTL analyses to identify markers associated with SSR resistance. 37 38 82 Some early QTL mapping studies used biparental populations with limited genetic 39 40 83 variation, or with populations of limited size [9, 18-21]. But due to limits of mapping 41 42 43 84 studies involving only a few genotypes, and due to the subtle effect of these QTLs, it has 44 45 85 been a challenge to find markers consistent enough to be satisfactorily applied to marker46 47 48 86 assisted selection on a commercial scale. 49 50 87 In recent years the use of next generation sequencing technologies has led to a 51 52 53 88 drastic reduction in the cost of genotyping, facilitating the identification of vast numbers 54 55 89 of SNPs from large numbers of genotypes that can then be used for more accurate 56 57 90 association mapping [22, 23]. The technique of genotype-by-sequencing (GBS) is one 58 59 60 61 62 63 4 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 91 such approach to rapidly and cheaply produce a large number of single base 5 6 92 polymorphisms (SNPs) based on the comparison of restriction fragment sequences that 7 8 9 93 are held in common within the population [24, 25]. In this sense, studies of genomic 10 11 94 association, proposed by Meuwissen et al. [26], are analyzed based on the evaluation of a 12 13 14 95 large number of markers widely distributed throughout the genome. This type of 15 16 96 association mapping involves the search for genotype-phenotype correlations in unrelated 17 18 19 97 individuals and is often faster and more profitable than traditional biparental mapping 20 21 98 [27]. Genotypic and phenotypic data are collected from a population in which kinship is 22 23 99 not strongly controlled by the researcher, and correlations between genetic markers and 24 25 26 100 phenotypes are sought within this population. In this context, the Genome Wide 27 28 101 Association Studies (GWAS) may be a promising strategy for the identification of QTLs 29 30 31 102 for genes of interest. Several GWAS reports have been published recently on soybean 32 33 103 response to S. sclerotiorum, highlighting the common view that GWAS is a valuable 34 35 36 104 approach to identify key genes and regions providing some enhanced resistance to this 37 38 105 importance disease. Using GWAS to look at soybean response to SSR, Iquira et al. [28] 39 40 106 found SNPs of high significance on chromosomes 1, 3, 8, and 20 with the strongest being 41 42 43 107 on 1; Bastein et al. [29] found significance on chromosomes 1, 15, 19, and 20 with the 44 45 108 strongest being on 15. By measuring chemical changes to pigments that might be 46 47 48 109 associated with SSR resistance[30], Zhao et al. [31] found significance on chromosomes 49 50 110 6, 10, and 13 with the strongest being on 13. Of these two pathogen-based studies 51 52 53 111 conducted by the same lab [28, 29], only two of the six QTL were identified in both 54 55 112 studies, highlighting that conducting only one GWAS is not 100% effective in identifying 56 57 113 all the variable nature of the SSR disease interaction, and that there is a need for 58 59 60 61 62 63 5 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 114 additional analyses to identify more SNPs that are significantly associated with enhanced 5 6 115 SSR resistance. 7 8 9 116 In this manuscript we present the results of another GWAS of soybean response 10 11 117 to SSR, with a focus on genotypes from three different SSR resistance breeding programs 12 13 14 118 in Brazil, and with the use of a Brazilian isolate for inoculations. Previously published 15 16 119 SSR GWAS studies were conducted on genotypes largely used in breeding programs in 17 18 19 120 Canada, US, and China, and with pathogen isolates collected in the Northern hemisphere 20 21 121 [28, 29, 31]. Additionally, we compared significance using the GWAS analysis methods 22 23 122 CMLM [32] and FarmCPU [33], as these programs handle testing markers, population 24 25 26 123 structure, and kinship differently. The results provide soybean breeders with additional 27 28 124 SNPs that they can have high confidence are associated with enhanced resistance to SSR 29 30 31 125 and that could be used in marker-assisted selection. 32 33 126 34 35 36 127 37 38 128 39 40 41 129 METHODS 42 43 130 Plant material, inoculation, and phenotypic scoring 44 45 46 131 Soybean [Glycine max (L.) Merrill] plants were grown in 500 ml plastic cups 47 48 132 filled with Bioflora® (Bioflora LTDA-ME, Prata, MG, Brazil) organic plant growth 49 50 51 133 substrate based on pine, other natural fibers, minerals and nutrient enriched, in a 52 53 134 greenhouse at approximately 25-30°C under natural light cycle in Uberlandia Brazil from 54 55 56 135 December 2015 through February 2016 (approximately 12 hour days). At least nine seed 57 58 136 were planted for each genotype. Soybean genotypes originated from the breeding 59 60 61 62 63 6 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 137 programs of FT Sementes (Ponta Grossa, PR, Brazil), Tropical Melhuramente & 5 6 138 Genetica (TMG, Cambé, PR, Brazil), or the Federal Universidade de Uberlandia (MG, 7 8 9 139 Brazil), as well as various plant introductions (PIs) of the Embrapa Soja collection chosen 10 11 140 from the literature. Immature shoot tips were removed at the V2 stage, and within 2-4 12 13 14 141 days, the plants were taken to the lab and cut just below the second trifoliate node. The 15 16 142 freshly cut stems were inoculated with 2-3 day old S. sclerotinorum cultures of isolate 17 18 19 143 Jataí [11] grown on potato dextrose agar plates for cut stem inoculation, a method shown 20 21 144 to have low variability [34], with mycelial plugs picked up with a 200 μl pipette tip, 22 23 145 similar to the straw test [35]. Plants were placed in an 18-20°C, low light, growth 24 25 26 146 chamber immediately after inoculation. After 4 days incubation, plants were phenotyped 27 28 147 by measuring the length of necrotic lesion in centimeters. 29 30 31 148 32 33 149 DNA extraction, GBS library construction and sequencing 34 35 36 150 Total DNA was extracted using a CTAB based method as previously described 37 38 151 [36]. DNA integrity was verified on an agarose gel. Samples that were determined to be 39 40 41 152 intact and of high quality, were pipetted into 96 well plates. Aliquots were removed to 42 43 153 new plates to quantify with picogreen (Molecular Probes, Eugene Oregon, USA) on a 44 45 46 154 Synergy HT (BioTek, Winooski, Vermont, USA) microplate reader. Based on both DNA 47 48 155 amounts and plant samples, 352 samples were chosen, together with 32 controls (10 49 50 156 water blanks and 22 random repeats) for GBS library construction. GBS library 51 52 53 157 construction was based on the method described in Poland et al. (2012) [25]. In summary, 54 55 158 after adjusting DNA concentrations to be approximately 50 ng/μl, five microliters (250 56 57 58 159 ng) were pipetted into a new set of 96-well plates containing 2.5 μl 0.1 μM specific DNA 59 60 61 62 63 7 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 160 barcoded HindIII adaptors. Pretesting of restriction enzyme pairs PstI-HinP1I, PstI-MspI, 5 6 161 PstI-MseI, HindIII-HinP1I, HindIII-MspI, and HindIII-MseI indicated that the HindIII7 8 9 162 MseI pair gave the best digestion results of having an even smear, with a broad peak near 10 11 163 300 nt (Supplemental Figure S1). Samples were therefore restriction digested using 1U 12 13 14 164 HindIII and 1U MseI in a buffer mix at 37°C two hours, followed by 80°C for 20 minutes 15 16 165 (all enzymes and buffers used for the GBS library construction were purchased from New 17 18 19 166 England Biolabs, Ipswich, Massachusetts, USA). Following digestion, a common MseI 20 21 167 adaptor was added, in addition to 40U T4 Ligase and 1 mM ATP and 2x Cutsmart buffer 22 23 168 were added, and the samples incubated at 25°C for 2h, followed by a 65°C incubation for 24 25 26 169 20m. After ligation, 8 μl of each well from each row were collected in a strip of 8 PCR 27 28 170 tubes. Then 25 μl from each of the tubes of a strip were transferred to a 1.5 ml microfuge 29 30 31 171 tube, totaling 200 μl that originated from one 96-well plate. Then 50 μl of samples were 32 33 172 cleaned using Agencourt AMPure XP beads (Beckman Coulter Life Sciences, 34 35 36 173 Indianapolis Indiana, USA), dried, and suspended in 15μl resuspension buffer. PCR 37 38 174 enrichment was performed using master mix containing Illumina primers and NEB 39 40 175 Phusion Master Mix. Following PCR settings were used for enrichment reaction: 98°C 41 42 43 176 30s, 15 cycles (98°C 10s, 68°C 30s, 72°C 30s), 72°C 5m, 4°C forever. After PCR, 44 45 177 samples were purified using the Agencourt AMPure XP beads, and then pooled such that 46 47 48 178 all the wells of a single 96-plate were now represented in a single microcentrifuge tube, 49 50 179 and run on a Bioanalyzer 2100 (Agilent, Santa Clara, CA) using a DNA7500 chip to 51 52 53 180 verify correct size, general success of amplification, and estimation of DNA amounts. 54 55 181 Concentrations were adjusted to have approximately 10 nM DNA in 10 mM TRIS-HCl, 56 57 182 0.05% Tween-20 and run on a single lane of an Illumina HiSeq4000 using a HiSeq SBS 58 59 60 61 62 63 8 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 183 sequencing kit version 4 at the Roy J. Carver Biotechnology Center on the University of 5 6 184 Illinois (Urbana, IL) campus. The Fastq files sequence files were demultiplexed with 7 8 9 185 bcl2fastq v2.17.1.14 conversion software (Illumina). 10 11 186 12 13 14 187 Processing the data from the Illumina sequence readings and selection of SNPs 15 16 188 The TASSEL 5 GBS v2 SNP-calling pipeline [37], together with the soybean 17 18 19 189 reference genome assembly, Gmax_Wm82.v2 [38], were used for SNP identification. 20 21 190 Prior to running TASSEL, the adapter sequences were removed from reads using the 22 23 191 software Cutadapt [39]. To run the TASSEL pipeline, the default settings were used, 24 25 26 192 except for the following parameters: minimum base quality score (mnQS 20), minimum 27 28 193 mapping quality score (minMAPQ 20), and kmer length (kmerLength 80). The alignment 29 30 31 194 step was performed using BWA-MEM [40] with default settings. SNPs with more than 32 33 195 50% missing data were removed, as were genotypes with more than 75% missing SNPs, 34 35 36 196 prior to the imputation step, which was accomplished using Beagle 4.1 [41] with the 37 38 197 parameter window=1000, overlap=200, and ne=1000. After imputation, SNPs or 39 40 198 genotypes with higher than 10% heterozygosity were removed from the dataset. 41 42 43 199 44 45 200 Genome-Wide Association Analysis 46 47 48 201 For the association analysis of SNPs to phenotypes, the compressed mixed linear 49 50 202 model (CMLM) was selected and performed within the GAPIT package of R, which 51 52 53 203 assigned similar individuals into 259 groups to estimate the kinship matrix [30, 39]. The 54 55 204 reduced kinship matrix and three principal components (PCs) generated from principal 56 57 205 component analysis (PCA) (Supplemental Figure S2) were included in the model to 58 59 60 61 62 63 9 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 206 control population structure and individual relatedness. A batch effect was also included 5 6 207 in the model as a covariate to control for possible variation between different 7 8 9 208 phenotyping dates. The SNPs with a minor allele frequency (MAF) higher than 0.01 were 10 11 209 used to estimate the population structure and the kinship. Only SNPs with a MAF higher 12 13 14 210 than 0.1 were used for association tests. The cutoff of significant association was a False 15 16 211 Discovery Rate (FDR) adjusted p-value less than 0.1 using the Benjamini and Hochberg 17 18 19 212 procedure to control for multiple testing [40]. 20 21 213 Another computational method named FarmCPU [31] was also implemented to 22 23 214 do an association analysis, which separated the mixed linear model into a fixed effect 24 25 26 215 model and a random effect model to reduce false negatives that might result from 27 28 216 confounding population structure, kinship, and SNPs. The same batch effect and three 29 30 31 217 PCs generated from GAPIT were included as covariates. Likewise, only SNPs with a 32 33 218 minor allele frequency higher than 0.1 were used for association tests, and the cutoff for 34 35 36 219 significant association was a FDR-adjusted p-value less than 0.1. 37 38 220 In addition to the two main models described above, three other models were also 39 40 221 performed to test marker-trait associations. They were a naïve model without any control 41 42 43 222 for population structure, a general linear model (GLM) with three PCs, and a mixed 44 45 223 linear model (MLM) with three PCs and a kinship matrix without any compression. 46 47 48 224 49 50 225 Linkage disequilibrium (LD) analysis 51 52 53 226 A function built into the TASSEL 5.0 program was used to determine squared 54 55 227 allele-frequency correlations (r2) between pairs of SNPs with MAF greater than 0.1. The 56 57 228 local LD was viewed, and LD plots were built, using Haploview 4.2 [42]. The LD blocks 58 59 60 61 62 63 10 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 229 were defined with the Confidence Interval methods and the default parameters [43]. The 5 6 230 LD decay plot was built based on the r2 values and distances between each pair of SNPs. 7 8 9 231 To calculate the physical distance of LD decay (r2 < 0.2), a nonlinear model was used to 10 11 232 estimate the expected values of E(r2) [44, 45]. 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 11 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 233 RESULTS 5 6 7 234 Phenotyping 8 9 235 Plants that germinated and grew healthily were used for DNA extraction and 10 11 12 236 phenotyping. Most (108 of the 352 total utilized) genotypes were represented by nine 13 14 237 plants, 86 samples had eight plants, 52 had seven, 48 had six, 26 had five, 20 had four, 15 16 238 and only eight genotypes relied on just three plants. Following stem inoculation with S. 17 18 19 239 sclerotiorum, the size distribution of necrotic lesions after 4 days incubation ranged from 20 21 240 0.67 to 8.43 cm, and all genotypes developed a lesion, even the most resistant genotypes, 22 23 24 241 indicating that an infection had successfully occurred in all the plants assayed. Samples 25 26 242 were removed from analysis if: the standard deviation of their phenotypic values was 27 28 29 243 greater than 2.0 cm (19 genotypes), the range of their major and minor phenotypic values 30 31 244 was greater than 5.0 cm (five genotypes), viral infected or miss-labeled (four genotypes). 32 33 34 245 The remaining 324 genotypes had an approximate normal distribution of phenotypes 35 36 246 (Supplemental Figure S3) and an average lesion length of 4.65 cm (Supplemental Table 37 38 247 S1), and were used in for SNP analysis. The genotypes that were the most resistant, 39 40 41 248 including genotypes like EMGOPA 316 [11], PI194639 [8, 9], and NK S19-90 [46] 42 43 249 which have been reported to have partial resistance, are shown in Supplemental Table S2. 44 45 46 250 The most susceptible, including genotypes like BRSGO Ipameri, BRS RAISSA RES 1/3 47 48 251 NCS, and MSOY 7908 RR which have previously been shown to be very susceptible 49 50 51 252 [10], are listed in Supplemental Table S3. 52 53 253 54 55 254 Genotyping 56 57 58 59 60 61 62 63 12 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 255 The HindIII and MseI restriction enzyme double digestions produced a wide peak 5 6 256 of fragments in the range of 300-500 bp with little strong banding (Supplemental Figure 7 8 9 257 S1.F.), and therefore this enzyme pair was used for construction of the GBS libraries. The 10 11 258 HindIII adaptors were barcoded, allowing all the libraries to be pooled and sequenced on 12 13 14 259 a single lane of an Illumina flow cell, yielding 373 million high-quality, single-stranded 15 16 260 readings of 100 nucleotides in length. 17 18 19 261 With the TASSEL 5 GBS v2 SNP-calling pipeline, 68,242 SNPs were identified. 20 21 262 After filtering for less than 50% missing data 14,637 SNPs remained. After imputation 22 23 263 and removing high heterozygosity, 12,411 SNPs remained. Among them, 11,811 SNPs 24 25 26 264 met the MAF threshold at 0.01, and 6,478 SNPs at MAF >0.1. The 11,8111 SNPs spread 27 28 265 across all chromosomes (Figure 1), with chromosome 18 containing the most SNPs 29 30 31 266 (1,179) and chromosome 5 the fewest (276) SNPs. Filtering of genotypes that had 32 33 267 missing data at 75% or more of the SNP sites, or that showed more than 10% 34 35 36 268 heterozygous SNPs, reduced the samples to 275 genotypes for use in association of SNPs 37 38 269 to phenotypes. The phenotypes of these 275 genotypes were fairly normally distributed 39 40 270 (Figure 2) and the most resistant (Table 1) and most susceptible (Table 2) are listed. 41 42 43 271 44 45 272 Association analysis using SNP data from the TASSEL 5 GBS v2 SNP-calling 46 47 48 273 pipeline 49 50 274 A principal component (PC) analysis was performed with the 11,811 SNPs with a 51 52 53 275 MAF >0.01 and the 275 soybean genotypes to estimate the population structure. Based 54 55 276 on the resulting scree plot (Supplemental Figure S2), the first three PCs were included in 56 57 277 the association analysis. PC1 explained 10.3% of the genetic variation, PC2 explained 58 59 60 61 62 63 13 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 278 7.8%, and PC3 explained 5.5%. The PC analysis scatter plot (Figure 3) showed that the 5 6 279 first and second PCs were composed largely of three subpopulations of genotypes 7 8 9 280 originiaing from different sources. The subpopulation “LAGER-UFU’ was more 10 11 281 obviously separated while the subpopulations ‘FT-Sementes’ and ‘TMG’ were more 12 13 14 282 close to each other. Soybean genotypes in the ‘miscellaneous’ group were collected from 15 16 283 different sources, including many that were selected based on the literature. 17 18 19 284 Four different models were implemented and compared in the marker-trait 20 21 285 association tests. Q-Q plots were generated to visualize the effectiveness of controlling 22 23 286 population structure and familial relatedness (Figure 4). The expected distribution of p24 25 26 287 values is a uniform [0,1] distribution under the assumption that no associations exist 27 28 288 between SNP markers and the trait. The naïve model and the GLM model resulted in 29 30 31 289 38.7% and 23.1% of SNPs with p-values < 0.05, respectively and it can be seen from the 32 33 290 Q-Q plot that both models produced a large proportion of p-values that deviated from the 34 35 36 291 expected distribution, which indicated an excess of false positives. Compared to the naïve 37 38 292 model and the GLM, the CMLM and the FarmCPU resulted in 3.8% and 4.9% of SNPs 39 40 293 with p-values < 0.05, indicating the inflation of p-values was reduced and supported the 41 42 43 294 appropriateness of including population structure and kinship matrix in the model. In 44 45 295 addition, FarmCPU yielded most of the SNPs fitting the expected distribution of p-values 46 47 48 296 with a few SNPs exhibiting high significance (higher significance than CMLM). 49 50 297 Because the CMLM and FarmCPU both seemed appropriate for analyzing the 51 52 53 298 association between the soybean SNPs and SSR resistance, the association results from 54 55 299 CMLM (conducted in the R package GAPIT) and FarmCPU (conducted in the R package 56 57 300 FarmCPU) were compared. Setting the FDR-corrected p-value cutoff at 0.05, six SNPs 58 59 60 61 62 63 14 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 301 (on chromosomes 1, 4, 9, 11, 16, and 18) were detected as significant by FarmCPU, with 5 6 302 each significant SNP explaining approximately 2% to 3.2% of the total phenotypic 7 8 9 303 variation (Table 3). The CMLM analysis did not identify any SNPs as significantly 10 11 304 associated at the 0.05 FDR-corrected p-value cutoff. But loosening the strict threshold to 12 13 14 305 an FDR-corrected p-value < 0.1, a total of 19 SNPs (on chromosomes 1, 11, 18 and 19) 15 16 306 showed significant associations with CMLM, where each SNP explained 3.2% to 4.4% of 17 18 19 307 the total phenotypic variation (Table 3). No additional SNPs were significant with 20 21 308 FarmCPU at the 0.9 (Figure 8). 37 38 338 The soybean reference genome [47] was queried to identify the genes that 39 40 339 predicted to be within the LD blocks. The 1.7Mb LD block around the SNP peak on 41 42 43 340 chromosome 1 contained 36 genes, the 0.3 Mb LD block on chromosome 11 contained 44 45 341 35 genes, and the 1.1 Mb LD block on chromosome 18 contained 54 genes 46 47 48 342 (Supplementary Table S4). 49 50 343 The number of genes within the LD blocks on chromosomes 1, 11, and 18 totalled 51 52 53 344 125. The translations of the putative coding sequences of these genes were aligned by 54 55 345 blastx to the nr database of NCBI [48] to identify possible gene function (Supplementary 56 57 346 Table S4). The genes with the LD blocks were also checked for their expression pattern 58 59 60 61 62 63 16 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 347 in a microarray experiment involving soybean transcriptome response to S. sclerotiorum 5 6 348 infection [12]. Although most of the genes showed moderate of no expression change in 7 8 9 349 response to infection, 34 of the 125 genes were identified as being differentially 10 11 350 expressed (inoculated vs mock) during the first 36 hours post inoculation (hpi) (Figure 9). 12 13 14 351 There were sixteen genes had significant differential expression between infected 15 16 352 samples and mock treated samples in at least one time point, making them more 17 18 19 353 promising candidates for genes potentionally involved in defense efforts. Several of the 20 21 354 genes in the microarray study (Figure 9) showed a much stronger expression during S. 22 23 355 sclerotiorum infection than the others. The gene with the strongest differential expression 24 25 26 356 was Glyma.01G106000, that encodes a tau class glutathione S-transferase; it had the 27 28 357 strongest expression differences in both genotypes used in that sudy, at 12, 24, and 36 hpi, 29 30 31 358 and is about 189.0 kb away from the peak SNP on chromosome 1, S1_36045483. Three 32 33 359 genes on chromosome 11, Glyma.11G084000, Glyma.11G084200, and 34 35 36 360 Glyma.11G086600, that are in LD with the peak SNP S11_6493121, also showed fairly 37 38 361 strong induction of expression after infection compared to other genes. Other genes that 39 40 362 were part of the microarray study did not change any more than 2 fold in response to 41 42 43 363 inoculation (Figure 9). 44 45 364 46 47 48 365 DISCUSSIONS 49 50 51 366 The study was successful in identifying SNPs significantly associated with 52 53 367 soybean defense against SSR, a disease that is increasing in importance worldwide, and 54 55 368 whose resistance QTL are very challenging to map due to the high phenotypic variability 56 57 58 369 of the disease, and the minor contribution from each QTL. The use of GWAS with 59 60 61 62 63 17 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 370 thousands of SNP markers, improves the ability to identify QTL with higher statistical 5 6 371 significance, and several other research groups have already applied GWAS to identify 7 8 9 372 loci associated with SSR responses [28, 29, 31]. In this presented GWAS of soybean 10 11 373 resistance to SSR, six QTL were identified using the FarmCPU computation, and four 12 13 14 374 with the CMLM computation, with an overlap of three QTL identified by SNPs: 15 16 375 S1_36045483, S11_6493121, and S18_14327556. The study did not have good overlap 17 18 19 376 with the QTL identified in the other recent GWAS studies, except the possible overlap of 20 21 377 our QTL on chromosome 1 near 35.5 Mb, to the QTL near 27.7 Mb (position updated to 22 23 378 the same version of the soybean reference genome sequence) of Bastein et al. (2014). The 24 25 26 379 QTL on chromosome 19 was only identified as significant by CMLM, but it might be the 27 28 380 same identified by Vuong et al. (2008) where a QTL for SSR resistance in PI194639 was 29 30 31 381 bordered by Satt495 (Chr19:650674) and Satt388 (Chr19:4212645) [9, 49]. This lack of 32 33 382 overlapping QTL identification between different experiments, might reflect the subtle 34 35 36 383 effects of each QTL for SSR resistance or the high variability in phenotyping this disease. 37 38 384 The lack of QTL overlap could also be due to the use of different soybean genotypes, the 39 40 385 use of different pathogen isolates that were collected on different continents, or the use of 41 42 43 386 different inoculation methods or treatments. Among the other GWAS, two of the studies 44 45 387 inoculated flowers and measured disease progression down the stem seven days later [28, 46 47 48 388 29], and the other study treated soybeans with oxalic acid released by S. sclerotinia, at 40 49 50 389 mM, and measured pigment induction in response to this treatment 48 hours later [31]. In 51 52 53 390 the present study, young seedlings were inoculated on freshly cut stems. Therefore, it is 54 55 391 not too surprising that the QTL identified in each study could differ. 56 57 58 59 60 61 62 63 18 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 392 The model chosen to conduct GWAS should be carefully decided based on 5 6 393 species and traits being studied. For this study of soybean, it involved a plant that is self7 8 9 394 pollinated and with high LD across the genome, and resistance to SSR is underlied by 10 11 395 multiple genes with small effects. The analysis tested two different methods, CMLM and 12 13 14 396 FarmCPU, for marker-trait association. The Q-Q plot showed both CMLM and FarmCPU 15 16 397 controlled inflated p-values due to population structure better than the other more naïve 17 18 19 398 models. The FarmCPU method identified more significant loci and with higher levels of 20 21 399 significance for the most significant SNPs. 22 23 400 The mixed linear model has been widely used in GWAS studies on soybean and 24 25 26 401 in a variety of other crops as it was shown to largely reduce the spurious associations 27 28 402 resulting from both population structure and unequal individual relatedness [50-54]. 29 30 31 403 Based on the MLM, the CMLM implemented in the GAPIT package to deal with large 32 33 404 and computational challenging datasets by clustering individuals into groups in the 34 35 36 405 kinship matrix [32, 55]. Although popular for GWAS, in some cases the CMLM can be 37 38 406 insufficient due to the confounding between the population structure, kinship, and 39 40 407 different testing SNPs, which could potentially lead to compromise of true postives. It 41 42 43 408 was reported that the recently developed method FarmCPU mitigated this problem and 44 45 409 led to both increased statistical power and reduced false positives [33]. FarmCPU 46 47 48 410 implements a fixed effect model that contains the testing markers and multiple associated 49 50 411 markers as covariates, and a random model that contains the kinship matrix. These steps 51 52 53 412 are performed separately but optimize each other iteratively. So compared to CMLM in 54 55 413 which the kinship matrix remains constant for all the markers, FarmCPU adjusts its 56 57 414 kinship based on the testing markers and covariates in the fixed effect model. Moreover, 58 59 60 61 62 63 19 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 415 since the CMLM only tests one marker at a time, other associated loci nearby or 5 6 416 elsewhere in the genome will sometimes disrupt with the test and resulted in false 7 8 9 417 positives or false negatives, especially when the effects of the other loci are large [56]. 10 11 418 Therefore, FarmCPU puts selected associated markers as covariates and tests multiple 12 13 14 419 markers simultaneously, thus improving control of both false positives and false 15 16 420 negatives [33]. This could be illustrated by comparing the Manhattan plots (Figure 5) 17 18 19 421 where “spikes” consisting of multiple SNPs in LD are present with CMLM, but only 20 21 422 single SNPs, representing the most significant association, appear with FarmCPU. 22 23 423 However, FarmCPU has a weakness in that removing the significant SNPs in LD with the 24 25 26 424 peak SNPs, reduces information, and these SNPs in LD help to confirm the existence of 27 28 425 an truly associated locus. Considering upsides and downsides, it is beneficial to 29 30 31 426 implement different models to do association studies to increase confidence in the 32 33 427 overlapping loci, and to also increase the possibility of identifying unique novel loci. 34 35 36 428 Soybean has relatively high LD genome-wide, and LD increases extensively in 37 38 429 the pericentromere regions [28, 57, 58]. In our study, seven and ten significant SNPs on 39 40 430 chromosome 1 and 18 were found to be in megabase-level LD blocks in pericentromeric 41 42 43 431 regions, where genes are sparsely distributed according to the soybean reference genome. 44 45 432 The size of the LD blocks could also be affected by the density of the SNP markers. For 46 47 48 433 example, the closest marker next to the LD block from 35045463 bp to 36783951 bp on 49 50 434 chromosome 1 was 1 Mb away, at 34050470 bp, so there were no SNPs indicating where 51 52 53 435 the LD block terminated in this 1 Mb region. A higher density SNP map would help 54 55 436 refine the large LD, and perhaps narrow the region of interest. 56 57 58 59 60 61 62 63 20 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 437 Looking at the genes that are located in LD with the SNPs, and combining 5 6 438 soybean gene expression data after S. sclerotiorum infection, can increase the confidence 7 8 9 439 in identifying candidate SSR defense-associated genes. Two of the more promising genes 10 11 440 in LD with the QTL identified on chromosome 1 based on gene expression were 12 13 14 441 Glyma.01G104100 and Glyma.01G106000. Likewise, the most promising genes on 15 16 442 chromosome 11 were three that showed the strongest induction in response to S. 17 18 19 443 sclerotiorum infection: Glyma.11G084000, Glyma.11G084200, and Glyma.11G086600. 20 21 444 The genes in LD with the significant SNPs on chromosome 18, were weakly 22 23 445 differentially expressed, with the strongest ones, like Glyma.18G113400 and 24 25 26 446 Glyma.18G117400, showing about a 2-fold induction. 27 28 447 Of the genes that are in LD with the significant SNPs on chromosome 1, and 29 30 31 448 whose expression were induced by S. sclerotiorum infection, Glyma.01G106000 was one 32 33 449 of the most interesting as it was strongly induced over all time points and samples, and 34 35 36 450 also significantly induced after Pseudomonas syringae infection on soybean [62]. 37 38 451 Glyma.01G106000 encodes a tau class glutathione S- transferase (GST) protein, which 39 40 452 may be involved in the process of xenobiotic detoxification, reduction of organic 41 42 43 453 hydroperoxides, or oxidative protection [65, 66]—all common needs during plant44 45 454 pathogen interactions. Overexpression of a rice tau-class GST increased the tolerance to 46 47 48 455 salinity and oxidative stress in Arabidopsis thaliana and tobacco, which may be due to 49 50 456 the lower accumulation of reactive oxygen species [67, 68]. Another interesting defense51 52 53 457 related gene near the QTL on chromosome 1 is Glyma.01G104100 which encodes an 54 55 458 isochorismate synthase, the key enzyme in the synthesis of salicylic acid. This gene was 56 57 459 not strongly induced by infection, and was actually reducing over time (Figure 9). 58 59 60 61 62 63 21 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 460 However, salicylic acid is a very important regulatory signal in plant microbe interactions 5 6 461 [59, 60], especially for the host-related cell-death pathways [61, 62]. Inducing cell death 7 8 9 462 is believed to be a beneficial strategy that necrotrophic pathogens utilize, and S. 10 11 463 sclerotiorum has been shown to manipulate host cell death to its advantage [14]. That 12 13 14 464 salicylic acid is known to induce cell death, and that S. sclerotiorum benefits from 15 16 465 enhanced cell death, expression of this isochorismate synthase might enhance 17 18 19 466 susceptibility. Interestingly. the allele present in both of the genotypes of the microarray 20 21 467 study is predicted to have a negative effect on defense according to the GWAS results. 22 23 468 Looking at the genes within LD of the significant SNPs on chromosome 11, most 24 25 26 469 of the genes seem to be related to primary metabolism. Although not directly associated 27 28 470 with defense, primary metabolism may also affect the outcome of plant pathogen 29 30 31 471 interactions, so the involvement of one of these genes on defense cannot be dismissed. 32 33 472 Gene Glyma.11G084200 was strongly induced and putatively encodes a GRIP-like 34 35 36 473 protein. GRIP proteins was shown to target the golgi [63], but their potential function in 37 38 474 plant-pathogen interactions is unknown. 39 40 475 41 42 43 476 The genes within LD of the significant SNPs on chromosome 18 are more 44 45 477 weakly differentially expressed than some of those genes discussed on chromosome 1 46 47 48 478 and 11. Two genes are involved in cell wall modification, Glyma.18G116400 (a probable 49 50 479 polygalacturonase) and Glyma.18G117100 (a cellulose synthase). Although their 51 52 53 480 expression is not significantly affected by S. sclerotiorum infection, plant cell wall 54 55 481 degrading enzymes play an important role in SSR disease [17]. Another gene of interest 56 57 482 within this region on chromosome 18 is Glyma.18G113400, which encodes a putative 58 59 60 61 62 63 22 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 483 Myb transcription factor, increased in expression after S. sclerotiorum infection along the 5 6 484 time course, from 12 to 36 hours (Figure 10). It is shown that Myb transcription factors 7 8 9 485 can be involved in various biological processes in plants, including response to biotic 10 11 486 stresses [64, 65]. 12 13 14 487 15 16 488 CONCLUSIONS 17 18 19 489 Using different genetic materials and inoculation methods than the other recent 20 21 490 SSR GWAS reports, we succeeded to identify new QTL associated with soybean 22 23 491 resistance to SSR disease caused by S. sclerotiorum. The study identified four or six SSR 24 25 26 492 resistance QTL, depending on GWAS computational analysis performed. Three of the 27 28 493 QTL (on chromosomes 1, 11, and 18) were detected by both methods, giving more 29 30 31 494 confidence that they are signicantly associated with soybean defense to SSR disease. 32 33 495 Looking at the genes within the LD blocks of the significant QTL allowed for prediction 34 35 36 496 of candidate defense-associated genes responsible for the enhanced resistance. One of the 37 38 497 most interesting genes in these LD blocks was Glyma.01G104100, which encodes an 39 40 498 isochorismate synthase, and could play a role in regulating host cell death pathways. 41 42 43 499 Further experimentation will be needed to verify the gene functions in soybean- S. 44 45 500 sclerotiorum interaction. 46 47 48 501 49 50 502 51 52 53 503 LIST OF ABBREVIATIONS 54 55 504 GWAS, GBS, QTL, FarmCPU, GAPIT, LD 56 57 58 59 505 DECLARATIONS 60 61 62 63 23 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 506 Ethics approval and consent to participate: Not applicable 5 6 507 Consent for publication: Not applicable 7 8 9 508 Availability of data and material: All data generated or analysed during this study are 10 509 included in this published article [and its supplementary information files]. 11 12 510 Competing interests: "The authors declare that they have no competing interests" 13 14 15 511 Funding: The following sources helped to finance this research: US Department of 16 512 Agriculture (USDA) National Sclerotinia Initiative, and USDA CRIS to SJC, as well as 17 513 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho 18 19 514 Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação de 20 515 Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) to FCJ. 21 22 516 Authors’ contributions: 23 24 25 517 Wei Wei—Major work in conducting GWAS analyses and figure generation and writing 26 27 518 of the manuscript; assisted with GBS library construction. 28 29 30 519 Ana Carolina Oliveira Mesquita – Major effort in writing of the first draft, inoculations 31 32 520 and disease assessments, DNA extractions / gels, organization of samples. 33 34 521 Adriana Figueiró -- Major player in the organization of the inoculations and disease 35 36 37 522 assessment, supervised and assisted with DNA extractions. 38 39 523 Xing Wu – Major role in GBS and GWAS analyses, assisted in drafting the manuscript. 40 41 42 524 Shilpa Manjunatha – Major role in construction of the GBS libraries. 43 44 525 Daniel P. Wickland – Optimized TASSEL pipeline for soybean SNP calling. 45 46 47 526 Matthew E. Hudson – Adviser of D.P.W., assisted with analysis of the data or 48 49 527 organization of the manuscript. 50 51 52 528 Fernando C. Juliatti – Adviser of A.C.O.M. and A.F. Supervised plant inoculation and 53 54 529 disease analysis (conducted within his lab and campus facilities), assisted with the 55 56 530 writing of the manuscript. 57 58 59 60 61 62 63 24 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 531 Steven J. Clough -- Adviser of W.W. and X.U. Supervised the overall project, major role 5 6 532 in manuscript preparation and writing. 7 8 9 10 533 11 12 534 ACKNOWLEDGEMENTS 13 14 15 535 Authors would like to thank Larissa Miguel for assisting us with some of the plant 16 17 536 care, inoculations, and DNA extractions; and to thank University of Illinos professors 18 19 20 537 Patrick Brown and Alexander Lipka, and former Ph.D. student Marcio Pais de Arruda, 21 22 538 for valuable discussions related to GBS and GWAS analyses. Figure 2B kindly provided 23 24 25 539 by Sufei Wang, PhD candidate, University of Illinois. We would like to thank TMG 26 27 540 (Tropical Melhoramento & Genética, Cambé, PR, Brazil) and FT-Sementes (Ponta 28 29 30 541 Grossa, PR, Brazil) for providing soybean seeds from their breeding programs, as well as 31 32 542 Marcelo Oliveira, curator of the Embrapa Soja (Londrina, PR, Brazil) soybean 33 34 543 germplasm collection, for providing the additional genotypes that were requested. 35 36 37 544 38 39 545 DISCLAIMER 40 41 42 546 Mention of trade names or commercial products in this publication is solely for 43 44 547 the purpose of providing specific information and does not imply recommendation or 45 46 47 548 endorsement by the U.S. Department of Agriculture, the University of Illinois, nor the 48 49 549 Universidade Federal de Uberlândia. 50 51 550 52 53 54 551 55 56 57 58 59 60 61 62 63 25 64 65

1 Submitted to BMC Genomics, May 10, 2017 SSR resistance mapping by GWAS 2 3 4 552 TABLES 5 6 553 7 8 Table 1. Most resistant genotypes (6.25 cm,