Reverse transcriptase-related enzymes are ... - Semantic Scholar

2 downloads 0 Views 377KB Size Report
Aug 25, 2015 - isolates of the asexual pathogen F. virguliforme. .... (Masel, Irwin & Manners, 1993). ..... Five sequence types in the Fv Clinton-1B genome. 292.
Reverse transcriptase-related enzymes are associated with horizontal chromosome transfer in an asexual pathogen Xiaoqiu Huang, Anindya Das, Binod B Sahu, Subodh K Srivastava, Leonor F Leandro, Kerry O'Donnell, Madan K Bhattacharyya

Supernumerary chromosomes have been shown to transfer horizontally from one isolate to another. However, the mechanism by which horizontal chromosome transfer (HCT) occurs

PrePrints

is unknown. In this study, we compared the genomes of 11 isolates comprising six Fusarium species that cause soybean sudden death syndrome (SDS) or bean root rot (BRR), and detected numerous instances of HCT in supernumerary chromosomes. We also identified a statistically significant number (21 standard deviations above the mean) of single nucleotide polymorphisms (SNPs) in the supernumerary chromosomes between isolates of the asexual pathogen F. virguliforme. Supernumerary chromosomes carried reverse transcriptase-related genes (RVT); the presence of long RVT open reading frames (ORFs) in the supernumerary chromosome was correlated with the presence of two or more chromosome copies with a significant number of SNPs between them. Our results suggest that supernumerary chromosomes transfer horizontally via an RNA intermediate. Understanding the mechanism by which HCT occurs will have a profound impact on understanding evolution and applying biotechnology as well as accepting HCT as a natural source of genetic variation.

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

1

Reverse transcriptase-related enzymes are

2

associated with horizontal chromosome transfer in an asexual pathogen

PrePrints

3

Xiaoqiu Huang1,2,3 , Anindya Das1 , Binod B. Sahu4 , Subodh K. Srivastava5 ,

4

Leonor F. Leandro6 , Kerry O’Donnell7 , Madan K. Bhattacharyya4

5

6

7

8

9

10

11

12

1

Department of Computer Science and 2 Plant Sciences Institute, Iowa State University,

Ames, Iowa 50011, USA 4

Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA

5

Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville, Arkansas

72701, USA 6

Department of Plant Pathology, Iowa State University, Ames, Iowa 50011, USA

7

National Center for Agricultural Utilization Research, US Department of Agriculture,

Agricultural Research Service, Peoria, Illinois 61604, USA

13

14

15

Corresponding Author: 3

Xiaoqiu Huang

16

Department of Computer Science, Iowa State University, Ames, Iowa 50011, USA

17

E-mail: [email protected], Phone: (515) 294-2432, Fax: (515) 294-0258

18

Key Words: Asexual reproduction, Horizontal chromosome transfer, Reverse transcriptase-

19

related enzymes, Duplication-induced mutation

1 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

PrePrints

20

Abstract

21

Supernumerary chromosomes have been shown to transfer horizontally from

22

one isolate to another. However, the mechanism by which horizontal chromosome

23

transfer (HCT) occurs is unknown. In this study, we compared the genomes of 11

24

isolates comprising six Fusarium species that cause soybean sudden death syndrome

25

(SDS) or bean root rot (BRR), and detected numerous instances of HCT in super-

26

numerary chromosomes. We also identified a statistically significant number (21

27

standard deviations above the mean) of single nucleotide polymorphisms (SNPs) in

28

the supernumerary chromosomes between isolates of the asexual pathogen F. vir-

29

guliforme. Supernumerary chromosomes carried reverse transcriptase-related genes

30

(RVT); the presence of long RVT open reading frames (ORFs) in the supernu-

31

merary chromosome was correlated with the presence of two or more chromosome

32

copies with a significant number of SNPs between them. Our results suggest that

33

supernumerary chromosomes transfer horizontally via an RNA intermediate. Un-

34

derstanding the mechanism by which HCT occurs will have a profound impact on

35

understanding evolution and applying biotechnology as well as accepting HCT as

36

a natural source of genetic variation.

2 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

PrePrints

37

Introduction

38

Conventional wisdom holds that asexual organisms lack a mechanism for generating ge-

39

netic variation. However, evidence of sexual reproduction has not been detected in many

40

important plant pathogens (Masel et al., 1996; Michielse & Rep, 2009). Asexual fungal

41

pathogens are known to have variable electrophoretic karyotypes (Kistler & Miao, 1992).

42

It was shown through pulsed-field gel eletrophoresis that extra nonessential chromosomes

43

(called supernumerary chromosomes) are present only in some individuals of a species

44

(Masel, Irwin & Manners, 1993). It was demonstrated under laboratory conditions that a

45

2-Mb supernumerary chromosome was transferred between two vegetatively incompatible

46

isolates of an asexual fungus (He et al., 1998). A whole-genome comparative study also

47

suggests that supernumerary chromosomes were horizontally acquired (Ma et al., 2010).

48

In addition, the genomes of some asexual fungal pathogens contain lineage-specific (LS)

49

regions that are highly variable among isolates (Klosterman et al., 2011). An analysis of

50

several nonhomologous recombination forms and polymorphic sequence types of each form

51

in LS regions from different isolates suggests that LS sequences were horizontally acquired

52

(Huang, 2014). These results indicate that horizontal transfer generates genetic variation

53

in asexual fungal pathogens, yet the mechanism and extent of transfer are unknown. It is

54

also unclear whether transfer creates not only presence/absence polymorphisms but also

55

single nucleotide polymorphisms (SNPs).

56

Fusarium virguliforme is an economically important fungal pathogen that causes sud-

57

den death syndrome (SDS) in soybean in North and South America (Aoki et al., 2003).

58

Although different F. virguliforme isolates show variation in aggressiveness on soybean

59

plants, studies with various molecular markers detected an extremely low level of genetic

60

variation within F. virguliforme isolates from North and South America (O’Donnell et

61

al., 2010; Mbofung et al., 2012). Moreover, mating experiments with 17 US isolates of

62

F. virguliforme indicated that they all belonged to a single mating type (Covert et al.,

63

2007). A genome assembly of a F. virguliforme isolate was produced recently (Srivastava

64

et al., 2014), and the mating type locus in F. virguliforme and its six close relatives were

65

characterized. A PCR assay based on both mating type sequences revealed that all 129

3 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

PrePrints

66

isolates of F. virguliforme in North and South America possessed the MAT1-1 mating type

67

(Hughes et al., 2014). These data suggest that the reproduction mode of F. virguliforme

68

on soybean is asexual. It is unknown how genetic variation in this asexual pathogen is

69

generated for the disease to have reached all major soybean-growing areas in the USA

70

since its first detection in the early 1970s.

71

F. virguliforme is related to the sexual fungal pathogen Nectria haematococca MPVI,

72

which is known to contain supernumerary chromosomes (Miao, Covert & VanEtten, 1991).

73

These supernumerary chromosomes contain genes involved in resistance to plant antimi-

74

crobial compounds and in host-specific pathogenicity (Covert, 1998). Sequences of the

75

N. haematococca MPVI supernumerary chromosomes (Coleman et al., 2009) can be used

76

to determine if their homologs are present in F. virguliforme.

77

F. virguliforme is closely related to five morphologically distinct Fusarium species that

78

cause SDS or bean root rot (BRR): F. azukicola, F. brasiliense, F. cuneirostrum, F.

79

phaseoli and F. tucumaniae (Aoki et al., 2012). F. tucumaniae is the only known sexually

80

reproducing fungus among these species (Covert et al., 2007). In this study, we selected

81

ten isolates of these closely related species — three (F. virguliforme), three (F. tucuma-

82

niae), and one (each of the other four species) — for next-generation genome sequencing

83

and analysis in comparison with the F. virguliforme genome assembly (Srivastava et al.,

84

2014) as a reference (Table 1). Note that including the reference isolate leads to a total

85

of eleven isolates, four of which are F. virguliforme isolates.

86

We compared the genomes of 11 isolates comprising six Fusarium species that cause

87

SDS or BRR, and detected numerous instances of HCT (horizontal chromosome transfer)

88

in supernumerary chromosomes. The genome of the asexual pathogen F. virguliforme

89

was composed of a large core genome and a small supernumerary portion; there was little

90

variation in the core between isolates, but there are a statistically significant number (21

91

standard deviations above the mean) of SNPs in the supernumerary chromosomes between

92

isolates. Supernumerary chromosomes carry reverse transcriptase-related genes (RVT);

93

they were highly variable in length, and the presence of long RVT open reading frames

94

(ORFs) in the supernumerary chromosome was correlated with the presence of two or more

95

chromosome copies with a significant number of SNPs between them. Our results suggest 4

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

96

that supernumerary chromosomes transfer horizontally via an RNA intermediate; their high

97

SNP and length variation rates were attributed to the high error rates of the RVT enzymes.

98

We report an extensive body of evidence to suggest that the F. virguliforme genome

99

evolved by two mechanisms: duplication-induced mutation for the core and replication via

PrePrints

100

an RNA intermediate for the supernumerary.

101

Materials & Methods

102

Sequence data

103

We selected 10 isolates of six Fusarium species and produced Illumina paired-end reads of

104

102 bp for each of them. We previously produced a genome assembly (NCBI BioProject

105

Accession: PRJNA63281) of isolate F. virguliforme Mont-1 (Srivastava et al., 2014),

106

which was used as a reference genome assembly in this study. The origin, year of collection,

107

and name abbreviation of each of these 11 isolates are in Table 1.

108

Read mapping and SNP detection

109

A SNP between the reference isolate and another isolate (query) has two or more alleles

110

called REF and ALT. The REF allele refers to the allele in the reference and ALT alleles

111

refer to alternate non-reference alleles. A SNP is of type 2 if both the REF allele and the

112

ALT allele are present in the query isolate, and of type 1 if only the ALT allele is present

113

in the query isolate.

114

The sets of Illumina paired-end reads for each query isolate were mapped onto the

115

reference genome assembly with Bowtie2 (Langmead & Salzberg, 2012). The output from

116

Bowtie2 in SAM format was redirected to Samtools (Li et al., 2009) with the view com-

117

mand to produce output in BAM format, which was sorted with the sort command. The

118

sorted output in BAM format was piled up on the reference with the mpileup command.

119

For command options and parameter values, see Huang (2014). The sorted BAM output

120

files for all the isolates along with the reference genome assembly were uploaded into Inte-

5 PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

121

grative Genomics Viewer (Robinson et al., 2011) for viewing SNPs and presence/absence

122

polymorphisms in each isolate.

PrePrints

123

Assembly of short reads

124

An assembly of paired-end reads for each isolate was performed with an Illumina version

125

of PCAP (Huang et al., 2003) with the following data and options: a pair of mate files

126

in fastq format; a minimum insert length of 100 bp and a maximum insert length of 700

127

bp; an average insert length of 400 bp with a standard deviation of 100. The minimum

128

length of overlaps with no base mismatch match was set to 84 bp, and that of overlaps

129

with up to three base mismatches was set to 90 bp. No overlap with more than three

130

base mismatches was accepted. Each data set was of size up to 49 Gb, and each assembly

131

could be produced in a day on a processor with 100 Gb of main memory. One feature of

132

the program is that it is conservative in joining reads into contigs by avoiding reads in the

133

overlap between two potential contigs that can not be merged into one.

134

Assembly mapping

135

Each assembly of Illumina reads was mapped to the reference genome assembly by using

136

BWA-MEM (Li, 2013) with the default options. The output from BWA-MEM in SAM

137

format was redirected to Samtools (Li et al., 2009) with the view command and -bS

138

options to produce output in BAM format, which was sorted with the sort command. An

139

output file of SNPs and indels in VCF format was produced in the same way as in the

140

read mapping. The assembly mapping was useful in finding long indels between contigs

141

in the reference assembly and query assembly, respectively. The coordinates of an indel

142

between two contigs were found by computing an alignment of the contigs with GAP3

143

(Huang & Chao, 2003).

144

Gene identification and functional annotation

145

Ab initio gene identification in a Fusarium genomic sequence was performed using Augus-

146

tus (Stanke & Waack, 2003) with training data from F. graminearum. A non-redundant 6

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

147

protein sequence database at National Center for Biotechnology Information was searched

148

using Blastx (Gish & States, 1993) with a genomic coding region as a query to find a set

149

of protein database sequences that are most similar to the coding region. The gene struc-

150

ture from Augustus was refined by using AAT (Huang et al., 1997) on the set of protein

151

database sequences. Functional annotation of genes was performed using the HMMER

152

web server (Finn, Clements & Eddy, 2011).

PrePrints

153

Phylogenetic analysis

154

A maximum-likelihood tree of the 11 SDS/BRR Fusarium isolates was inferred from

155

genome-wide SNP data. The data were produced by mapping reads from each of 10

156

of the 11 isolates onto a genome assembly of the reference Fv Mont-1. A covered SNP

157

position is a position of the reference that was sufficiently covered by reads from each

158

isolate and had an alternative allele (a SNP) in the read coverage of this position from

159

one of the 10 isolates. A total of 297,076 covered SNP positions were aligned in the

160

11 isolates. The multiple sequence alignment was analyzed to infer the tree with 200

161

bootstrap samples.

162

Results

163

Rapid evolution in a small portion of the F. virguliforme genome

164

We mapped short reads from each of the ten isolates onto a 50.5-Mb genome assembly

165

of isolate Fv Mont-1 (Srivastava et al., 2014) as a reference. The length of the reference

166

covered by reads from the isolate and the distribution of SNPs between the reference

167

and the isolate are given in Table 2. Table 2 reveals significant variation in evolutionary

168

rate among the isolates. First, the four F. virguliforme isolates possessed a low genome-

169

wide SNP rate of less than 1 in 10,000 bp, which is consistent with an asexual mode of

170

reproduction. Isolate Fv 34551 collected in South America in 2002 was closer to Fv Mont-

171

1 collected in the USA in 1991 than the other two F. virguliforme isolates collected in

172

the USA. Second, the genome-wide SNP rate of about 1 in 200 bp between the reference 7

PeerJ PrePrints | https://dx.doi.org/10.7287/peerj.preprints.1324v1 | CC-BY 4.0 Open Access | rec: 25 Aug 2015, publ: 25 Aug 2015

PrePrints

173

and each non-F. virguliforme isolate was at least 58 times higher than that between

174

the reference and each F. virguliforme isolate, indicating a significantly higher level of

175

polymorphism and suggesting a much longer divergence time between F. virguliforme and

176

the other species. Third, the genome-wide SNP rate of 1 in 200 bp between the reference

177

and each non-F. virguliforme isolate was not high enough to explain why at least 10 Mb

178

of the reference genome was covered by reads from every F. virguliforme isolate, but not

179

by reads from any non-F. virguliforme isolate.

180

To shed light on the last observation, we selected all of the contigs that were at least

181

1 kb in the reference assembly and calculated the total number of contig bases covered by

182

reads from Fc 31157 as well as that not covered by reads from this isolate. The size of the

183

covered portion was 39.5 Mb; that of the uncovered portion was 10.9 Mb. The uncovered

184

portion was A+T rich (68%), whereas the covered portion was A+T poor (45%). The

185

duplicated content of the uncovered portion was 70%, with 48% made up of sequences

186

with copy numbers above 20. In sharp contrast, the duplicated content of the covered

187

portion was 3.8%, with 0.56% made up of 20-plus-copy sequences. These results indicate

188

that since the divergence between F. virguliforme and F. cuneirostrum, the uncovered

189

portion of the F. virguliforme genome evolved much faster in association with duplication

190

and C-to-T/G-to-A mutation. For example, a maximum likelihood tree of 13 duplicated

191

sequences 3,772 bp in length from the genome assembly of Fv Mont-1 showed that the

192

more recently duplicated sequences have a higher A+T content (Fig. 1).

193

Although the genome-wide SNP rate between the reference F. virguliforme isolate and

194

each of the other three F. virguliforme isolates was at most 0.00007, we found high levels

195

of variation among the four F. virguliforme isolates in a small portion (