Reverse transcriptase-related enzymes are associated with horizontal chromosome transfer in an asexual pathogen Xiaoqiu Huang, Anindya Das, Binod B Sahu, Subodh K Srivastava, Leonor F Leandro, Kerry O'Donnell, Madan K Bhattacharyya
Supernumerary chromosomes have been shown to transfer horizontally from one isolate to another. However, the mechanism by which horizontal chromosome transfer (HCT) occurs
PrePrints
is unknown. In this study, we compared the genomes of 11 isolates comprising six Fusarium species that cause soybean sudden death syndrome (SDS) or bean root rot (BRR), and detected numerous instances of HCT in supernumerary chromosomes. We also identified a statistically significant number (21 standard deviations above the mean) of single nucleotide polymorphisms (SNPs) in the supernumerary chromosomes between isolates of the asexual pathogen F. virguliforme. Supernumerary chromosomes carried reverse transcriptase-related genes (RVT); the presence of long RVT open reading frames (ORFs) in the supernumerary chromosome was correlated with the presence of two or more chromosome copies with a significant number of SNPs between them. Our results suggest that supernumerary chromosomes transfer horizontally via an RNA intermediate. Understanding the mechanism by which HCT occurs will have a profound impact on understanding evolution and applying biotechnology as well as accepting HCT as a natural source of genetic variation.
PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
1
Reverse transcriptase-related enzymes are
2
associated with horizontal chromosome transfer in an asexual pathogen
PrePrints
3
Xiaoqiu Huang1,2,3 , Anindya Das1 , Binod B. Sahu4 , Subodh K. Srivastava5 ,
4
Leonor F. Leandro6 , Kerry O’Donnell7 , Madan K. Bhattacharyya4
5
6
7
8
9
10
11
12
1
Department of Computer Science and 2 Plant Sciences Institute, Iowa State University,
Ames, Iowa 50011, USA 4
Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA
5
Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville, Arkansas
72701, USA 6
Department of Plant Pathology, Iowa State University, Ames, Iowa 50011, USA
7
National Center for Agricultural Utilization Research, US Department of Agriculture,
Agricultural Research Service, Peoria, Illinois 61604, USA
13
14
15
Corresponding Author: 3
Xiaoqiu Huang
16
Department of Computer Science, Iowa State University, Ames, Iowa 50011, USA
17
E-mail:
[email protected], Phone: (515) 294-2432, Fax: (515) 294-0258
18
Key Words: Asexual reproduction, Horizontal chromosome transfer, Reverse transcriptase-
19
related enzymes, Duplication-induced mutation
1 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
PrePrints
20
Abstract
21
Supernumerary chromosomes have been shown to transfer horizontally from
22
one isolate to another. However, the mechanism by which horizontal chromosome
23
transfer (HCT) occurs is unknown. In this study, we compared the genomes of 11
24
isolates comprising six Fusarium species that cause soybean sudden death syndrome
25
(SDS) or bean root rot (BRR), and detected numerous instances of HCT in super-
26
numerary chromosomes. We also identified a statistically significant number (21
27
standard deviations above the mean) of single nucleotide polymorphisms (SNPs) in
28
the supernumerary chromosomes between isolates of the asexual pathogen F. vir-
29
guliforme. Supernumerary chromosomes carried reverse transcriptase-related genes
30
(RVT); the presence of long RVT open reading frames (ORFs) in the supernu-
31
merary chromosome was correlated with the presence of two or more chromosome
32
copies with a significant number of SNPs between them. Our results suggest that
33
supernumerary chromosomes transfer horizontally via an RNA intermediate. Un-
34
derstanding the mechanism by which HCT occurs will have a profound impact on
35
understanding evolution and applying biotechnology as well as accepting HCT as
36
a natural source of genetic variation.
2 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
PrePrints
37
Introduction
38
Conventional wisdom holds that asexual organisms lack a mechanism for generating ge-
39
netic variation. However, evidence of sexual reproduction has not been detected in many
40
important plant pathogens (Masel et al., 1996; Michielse & Rep, 2009). Asexual fungal
41
pathogens are known to have variable electrophoretic karyotypes (Kistler & Miao, 1992).
42
It was shown through pulsed-field gel eletrophoresis that extra nonessential chromosomes
43
(called supernumerary chromosomes) are present only in some individuals of a species
44
(Masel, Irwin & Manners, 1993). It was demonstrated under laboratory conditions that a
45
2-Mb supernumerary chromosome was transferred between two vegetatively incompatible
46
isolates of an asexual fungus (He et al., 1998). A whole-genome comparative study also
47
suggests that supernumerary chromosomes were horizontally acquired (Ma et al., 2010).
48
In addition, the genomes of some asexual fungal pathogens contain lineage-specific (LS)
49
regions that are highly variable among isolates (Klosterman et al., 2011). An analysis of
50
several nonhomologous recombination forms and polymorphic sequence types of each form
51
in LS regions from different isolates suggests that LS sequences were horizontally acquired
52
(Huang, 2014). These results indicate that horizontal transfer generates genetic variation
53
in asexual fungal pathogens, yet the mechanism and extent of transfer are unknown. It is
54
also unclear whether transfer creates not only presence/absence polymorphisms but also
55
single nucleotide polymorphisms (SNPs).
56
Fusarium virguliforme is an economically important fungal pathogen that causes sud-
57
den death syndrome (SDS) in soybean in North and South America (Aoki et al., 2003).
58
Although different F. virguliforme isolates show variation in aggressiveness on soybean
59
plants, studies with various molecular markers detected an extremely low level of genetic
60
variation within F. virguliforme isolates from North and South America (O’Donnell et
61
al., 2010; Mbofung et al., 2012). Moreover, mating experiments with 17 US isolates of
62
F. virguliforme indicated that they all belonged to a single mating type (Covert et al.,
63
2007). A genome assembly of a F. virguliforme isolate was produced recently (Srivastava
64
et al., 2014), and the mating type locus in F. virguliforme and its six close relatives were
65
characterized. A PCR assay based on both mating type sequences revealed that all 129
3 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
PrePrints
66
isolates of F. virguliforme in North and South America possessed the MAT1-1 mating type
67
(Hughes et al., 2014). These data suggest that the reproduction mode of F. virguliforme
68
on soybean is asexual. It is unknown how genetic variation in this asexual pathogen is
69
generated for the disease to have reached all major soybean-growing areas in the USA
70
since its first detection in the early 1970s.
71
F. virguliforme is related to the sexual fungal pathogen Nectria haematococca MPVI,
72
which is known to contain supernumerary chromosomes (Miao, Covert & VanEtten, 1991).
73
These supernumerary chromosomes contain genes involved in resistance to plant antimi-
74
crobial compounds and in host-specific pathogenicity (Covert, 1998). Sequences of the
75
N. haematococca MPVI supernumerary chromosomes (Coleman et al., 2009) can be used
76
to determine if their homologs are present in F. virguliforme.
77
F. virguliforme is closely related to five morphologically distinct Fusarium species that
78
cause SDS or bean root rot (BRR): F. azukicola, F. brasiliense, F. cuneirostrum, F.
79
phaseoli and F. tucumaniae (Aoki et al., 2012). F. tucumaniae is the only known sexually
80
reproducing fungus among these species (Covert et al., 2007). In this study, we selected
81
ten isolates of these closely related species — three (F. virguliforme), three (F. tucuma-
82
niae), and one (each of the other four species) — for next-generation genome sequencing
83
and analysis in comparison with the F. virguliforme genome assembly (Srivastava et al.,
84
2014) as a reference (Table 1). Note that including the reference isolate leads to a total
85
of eleven isolates, four of which are F. virguliforme isolates.
86
We compared the genomes of 11 isolates comprising six Fusarium species that cause
87
SDS or BRR, and detected numerous instances of HCT (horizontal chromosome transfer)
88
in supernumerary chromosomes. The genome of the asexual pathogen F. virguliforme
89
was composed of a large core genome and a small supernumerary portion; there was little
90
variation in the core between isolates, but there are a statistically significant number (21
91
standard deviations above the mean) of SNPs in the supernumerary chromosomes between
92
isolates. Supernumerary chromosomes carry reverse transcriptase-related genes (RVT);
93
they were highly variable in length, and the presence of long RVT open reading frames
94
(ORFs) in the supernumerary chromosome was correlated with the presence of two or more
95
chromosome copies with a significant number of SNPs between them. Our results suggest 4
PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
96
that supernumerary chromosomes transfer horizontally via an RNA intermediate; their high
97
SNP and length variation rates were attributed to the high error rates of the RVT enzymes.
98
We report an extensive body of evidence to suggest that the F. virguliforme genome
99
evolved by two mechanisms: duplication-induced mutation for the core and replication via
PrePrints
100
an RNA intermediate for the supernumerary.
101
Materials & Methods
102
Sequence data
103
We selected 10 isolates of six Fusarium species and produced Illumina paired-end reads of
104
102 bp for each of them. We previously produced a genome assembly (NCBI BioProject
105
Accession: PRJNA63281) of isolate F. virguliforme Mont-1 (Srivastava et al., 2014),
106
which was used as a reference genome assembly in this study. The origin, year of collection,
107
and name abbreviation of each of these 11 isolates are in Table 1.
108
Read mapping and SNP detection
109
A SNP between the reference isolate and another isolate (query) has two or more alleles
110
called REF and ALT. The REF allele refers to the allele in the reference and ALT alleles
111
refer to alternate non-reference alleles. A SNP is of type 2 if both the REF allele and the
112
ALT allele are present in the query isolate, and of type 1 if only the ALT allele is present
113
in the query isolate.
114
The sets of Illumina paired-end reads for each query isolate were mapped onto the
115
reference genome assembly with Bowtie2 (Langmead & Salzberg, 2012). The output from
116
Bowtie2 in SAM format was redirected to Samtools (Li et al., 2009) with the view com-
117
mand to produce output in BAM format, which was sorted with the sort command. The
118
sorted output in BAM format was piled up on the reference with the mpileup command.
119
For command options and parameter values, see Huang (2014). The sorted BAM output
120
files for all the isolates along with the reference genome assembly were uploaded into Inte-
5 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
121
grative Genomics Viewer (Robinson et al., 2011) for viewing SNPs and presence/absence
122
polymorphisms in each isolate.
PrePrints
123
Assembly of short reads
124
An assembly of paired-end reads for each isolate was performed with an Illumina version
125
of PCAP (Huang et al., 2003) with the following data and options: a pair of mate files
126
in fastq format; a minimum insert length of 100 bp and a maximum insert length of 700
127
bp; an average insert length of 400 bp with a standard deviation of 100. The minimum
128
length of overlaps with no base mismatch match was set to 84 bp, and that of overlaps
129
with up to three base mismatches was set to 90 bp. No overlap with more than three
130
base mismatches was accepted. Each data set was of size up to 49 Gb, and each assembly
131
could be produced in a day on a processor with 100 Gb of main memory. One feature of
132
the program is that it is conservative in joining reads into contigs by avoiding reads in the
133
overlap between two potential contigs that can not be merged into one.
134
Assembly mapping
135
Each assembly of Illumina reads was mapped to the reference genome assembly by using
136
BWA-MEM (Li, 2013) with the default options. The output from BWA-MEM in SAM
137
format was redirected to Samtools (Li et al., 2009) with the view command and -bS
138
options to produce output in BAM format, which was sorted with the sort command. An
139
output file of SNPs and indels in VCF format was produced in the same way as in the
140
read mapping. The assembly mapping was useful in finding long indels between contigs
141
in the reference assembly and query assembly, respectively. The coordinates of an indel
142
between two contigs were found by computing an alignment of the contigs with GAP3
143
(Huang & Chao, 2003).
144
Gene identification and functional annotation
145
Ab initio gene identification in a Fusarium genomic sequence was performed using Augus-
146
tus (Stanke & Waack, 2003) with training data from F. graminearum. A non-redundant 6
PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
147
protein sequence database at National Center for Biotechnology Information was searched
148
using Blastx (Gish & States, 1993) with a genomic coding region as a query to find a set
149
of protein database sequences that are most similar to the coding region. The gene struc-
150
ture from Augustus was refined by using AAT (Huang et al., 1997) on the set of protein
151
database sequences. Functional annotation of genes was performed using the HMMER
152
web server (Finn, Clements & Eddy, 2011).
PrePrints
153
Phylogenetic analysis
154
A maximum-likelihood tree of the 11 SDS/BRR Fusarium isolates was inferred from
155
genome-wide SNP data. The data were produced by mapping reads from each of 10
156
of the 11 isolates onto a genome assembly of the reference Fv Mont-1. A covered SNP
157
position is a position of the reference that was sufficiently covered by reads from each
158
isolate and had an alternative allele (a SNP) in the read coverage of this position from
159
one of the 10 isolates. A total of 297,076 covered SNP positions were aligned in the
160
11 isolates. The multiple sequence alignment was analyzed to infer the tree with 200
161
bootstrap samples.
162
Results
163
Rapid evolution in a small portion of the F. virguliforme genome
164
We mapped short reads from each of the ten isolates onto a 50.5-Mb genome assembly
165
of isolate Fv Mont-1 (Srivastava et al., 2014) as a reference. The length of the reference
166
covered by reads from the isolate and the distribution of SNPs between the reference
167
and the isolate are given in Table 2. Table 2 reveals significant variation in evolutionary
168
rate among the isolates. First, the four F. virguliforme isolates possessed a low genome-
169
wide SNP rate of less than 1 in 10,000 bp, which is consistent with an asexual mode of
170
reproduction. Isolate Fv 34551 collected in South America in 2002 was closer to Fv Mont-
171
1 collected in the USA in 1991 than the other two F. virguliforme isolates collected in
172
the USA. Second, the genome-wide SNP rate of about 1 in 200 bp between the reference 7
PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015
PrePrints
173
and each non-F. virguliforme isolate was at least 58 times higher than that between
174
the reference and each F. virguliforme isolate, indicating a significantly higher level of
175
polymorphism and suggesting a much longer divergence time between F. virguliforme and
176
the other species. Third, the genome-wide SNP rate of 1 in 200 bp between the reference
177
and each non-F. virguliforme isolate was not high enough to explain why at least 10 Mb
178
of the reference genome was covered by reads from every F. virguliforme isolate, but not
179
by reads from any non-F. virguliforme isolate.
180
To shed light on the last observation, we selected all of the contigs that were at least
181
1 kb in the reference assembly and calculated the total number of contig bases covered by
182
reads from Fc 31157 as well as that not covered by reads from this isolate. The size of the
183
covered portion was 39.5 Mb; that of the uncovered portion was 10.9 Mb. The uncovered
184
portion was A+T rich (68%), whereas the covered portion was A+T poor (45%). The
185
duplicated content of the uncovered portion was 70%, with 48% made up of sequences
186
with copy numbers above 20. In sharp contrast, the duplicated content of the covered
187
portion was 3.8%, with 0.56% made up of 20-plus-copy sequences. These results indicate
188
that since the divergence between F. virguliforme and F. cuneirostrum, the uncovered
189
portion of the F. virguliforme genome evolved much faster in association with duplication
190
and C-to-T/G-to-A mutation. For example, a maximum likelihood tree of 13 duplicated
191
sequences 3,772 bp in length from the genome assembly of Fv Mont-1 showed that the
192
more recently duplicated sequences have a higher A+T content (Fig. 1).
193
Although the genome-wide SNP rate between the reference F. virguliforme isolate and
194
each of the other three F. virguliforme isolates was at most 0.00007, we found high levels
195
of variation among the four F. virguliforme isolates in a small portion (