
Arch Virol (2012) 157:271–284 DOI 10.1007/s00705-011-1166-x

ORIGINAL ARTICLE

Multiple polyadenylated RNA viruses detected in pooled cultivated and wild plant samples

Stephen J. Wylie • Hao Luo • Hua Li • Michael G. K. Jones

Received: 16 August 2011 / Accepted: 28 October 2011 / Published online: 11 November 2011
© Springer-Verlag 2011

Abstract RNA extracted from 120 leaf specimens from 17 plant species was pooled, and polyadenylated RNA species were sequenced together without barcoding in one lane using massively parallel sequencing technology. After analysis, complete or partial genome sequences representing 20 virus isolates of 16 polyadenylated RNA species were identified. In three cases, 2-3 distinct isolates of a virus species co-infected the same plant. Twelve of the viruses identified were described previously and belonged to the genera Potyvirus, Nepovirus, Allexivirus, and Carlavirus. Four were unknown and are proposed as members of the genera Potyvirus, Sadwavirus, and Trichovirus. Virus sequences were subsequently matched to original host plants using RT-PCR assays.

Electronic supplementary material The online version of this article (doi:10.1007/s00705-011-1166-x) contains supplementary material, which is available to authorized users.

S. J. Wylie (✉) · H. Luo · H. Li · M. G. K. Jones
Plant Virology Group, Western Australian State Agricultural Biotechnology Centre, School of Biological Sciences and Biotechnology, Murdoch University, Perth, WA 6150, Australia
e-mail: [email protected]

Introduction

One of the most fundamental and conceptually straightforward steps in studying any biological system involves describing the types and abundance of organisms present [17]. Viruses and virus-like elements are abundant in plants, with reports of ~60% of plants in a richly diverse Costa Rican rainforest infected [40, 49]. This figure alone suggests that the ~1,000 plant viruses currently recognized globally by the International Committee on the Taxonomy of Viruses are a gross under-representation of the true diversity.

A major limitation to discovery of new viruses is that the most commonly used assays, those based on antiserum affinity (e.g. ELISA), primer-directed DNA amplification (PCR), and DNA hybridization (e.g. microarray), are biased towards detection of known viruses and their close relatives. Identifying undescribed or unexpected viruses can be notoriously difficult. For example, in the UK, an average of only one new virus per year has been identified in the past three decades [9]. Unbiased (generic) virus assays include double-stranded RNA (dsRNA) extraction and analysis [7] to identify RNA viruses, rolling-circle amplification [22] to identify circular DNA viruses, enrichment of viral nucleic acids by purification of virus-like particles [33], and SDS-PAGE coupled with mass spectrometry to identify viral proteins expressed in plants [8, 29].

A remarkable recent advance in plant virus discovery has been the utilization of massively parallel pyrosequencing (next-generation sequencing, ‘deep’ sequencing), which is capable of yielding megabases to gigabases of sequence information, coupled with bioinformatics [e.g. 3, 13, 15, 40, 50–54]. This approach has been used to identify viruses indirectly from homologous short interfering RNAs produced by plants in response to virus infection [e.g. 25, 50, 54], and directly from viral genomes present in grapevines [4, 13], Liatris spicata and tomato [3], and wild plants [40, 51, 52].

We describe here the use of a massively parallel sequencing approach whereby polyadenylated plant RNA from multiple plants was pooled and sequenced together before the output was analysed for the presence of viral genomes. This research represents part of a project to describe the ecological roles viruses play in the indigenous flora of the south-west Australian floristic region.


Materials and methods

Plant material

Leaf material (approximately 2 g/plant) was collected from 117 specimens of wild plants representing 14 indigenous species: Amphipogon turbinatus R. Br. (1 specimen); Anigozanthos humilis Lindl., Cats Paw (13); Austrostipa elegantissima Labill., Feather Spear-Grass (1); A. flavescens Labill., Coast Spear-Grass (2); A. compressa R. Br. (3); Caesia micrantha Lindl., Pale Grass Lily (2); Caladenia flava R. Br., Cowslip Orchid (17); Chamaescilla corymbosa (R. Br.) Benth., Blue Squill (13); Dichopogon capillipes (Endl.) Brittan, Chocolate Lily (2); Microlaena stipoides (Labill.) R. Br., Weeping Grass (1); Scaevola calliptera Benth., Royal Robe (23); Thelymitra crinita Lindl., Blue Lady Orchid (18); and Trichocline spathulata (DC) J.H. Willis, Native Gerbera (21). Specimens were collected from parks and roadside verges within the greater Perth City region, south-western Australia. As positive controls, one specimen from each of three cultivated plants (Allium sativum L., Garlic; Lilium longiflorum Thunb., Easter Lily, November Lily; Iris xiphium Desf., Spanish Iris) was included. Each control plant showed chlorotic streaking and mottling on young leaves, symptoms commonly associated with virus infections. A leaf sample of approximately 1 g from each plant was lyophilised and stored individually in an airtight tube at -20°C.

RNA extraction, cDNA synthesis, sequencing

Fresh leaf samples were combined into 12 groups of 10 leaves per group, representing single or multiple species. Leaves were held together in a bunch and sliced with a scalpel blade so that 100 mg of tissue, representing approximately equal amounts of tissue from each leaf in the group, was removed. RNA was extracted using a Plant RNeasy (QIAGEN) column, quantified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA), and fragmented by nebulization. RNA (2 µg) from each group was pooled into a single tube.

cDNA was synthesised from total RNA using oligo-d(T) primers, followed by adaptor ligation. Library construction, amplification, and sequencing of single ends (78-nucleotide [nt] reads) using Illumina GAIIx technology were done by Macrogen Inc (Seoul).

Contig assembly and sequence analysis

De novo assembly of contigs was done using the short read assembler Velvet 0.7.31 [56] and the Assembly tool within Geneious Pro v5.4.3 [16]. Parameters for the assembly of contigs were as follows: minimum overlap of reads, 50% of read length (38 nt); maximum gaps per read, 10%; minimum overlap identity, 90%. Contigs were sorted according to length, and those less than 1 kb in length were removed. Batches of contigs were subjected to BlastN analysis [5] against the GenBank nucleotide database. Contigs with sequence identity to known viruses and those without identity to known sequences of any origin were pooled and subjected to a further round of de novo assembly in Geneious Pro. Consensus sequences generated from the second round of de novo assembly were subjected to BlastN and BlastX analyses against GenBank nucleotide and protein databases. Putative virus-derived consensus sequences were used as scaffolds (reference sequences) against which to reassemble the entire dataset of reads. Putative virus sequences that were identified were edited manually to remove gaps and aberrant reads, and the number of reads corresponding to them was recorded.

Open reading frames (ORFs) and identities of deduced proteins, mature peptides, and domains encoded by them were predicted within Geneious Pro, the NCBI Conserved Domain Database (CDD), InterProScan (http://www.ebi.ac.uk/Tools/pfa/iprscan), and by identity after alignment with characterised virus sequences in Geneious Pro. Phylogenies were determined using MEGA5 [45] after global alignment of sequences in Geneious Pro with a cost matrix of 65%, a gap open penalty of 12 and a gap extension penalty of 3. Alignments of related virus isolates were trimmed to the length of the shortest sequence and then scanned for evidence of recombination events using the DualBrothers Recombination Detection tool within Geneious Pro.

Pairs of oligonucleotide primers for amplification of the sequence of each distinct virus isolate found were designed using the Geneious Pro Primers tool. Primers were 20-22 nt in length, had a Tm of 50-60°C, and were designed to generate PCR products of 300-500 nt. Initially, groups were screened by RT-PCR using each primer pair; then, when the virus was detected, individual plants within the group were screened. Oligo-d(T) primers were used to prime cDNA synthesis with Superscript III reverse transcriptase (Invitrogen). PCR was carried out using GoTaq® Hotstart master mix (Promega). Cycling conditions were 95°C for an initial denaturation of 5 min, followed by 95°C, 10 s; 50-60°C, 30 s; 69°C, 30 s for 35 cycles.

Results

Sequence analysis

A dataset of 35.29 million reads, each 78 nt in length, was obtained after sequencing. De novo assembly of the whole dataset within Velvet resulted in 3,712 contigs over 1 kb in length, and within Geneious Pro, 4,122 contigs over 1 kb in length. The five longest contigs generated were 9,377; 8,779; 5,466; 4,679 and 4,463 nt. After BlastN analysis, 119 contigs that had sequence identity to known viruses or did not match any known sequences were identified and subjected to de novo assembly with Geneious Pro, which resulted in 37 contigs. The five largest of these were 9,377, 8,779, 8,611, 8,601 and 8,441 nt in length. After BlastN and BlastX analysis, 17 contigs were discarded because they were of non-viral origin. The 20 remaining putative viral contigs were each used as a reference against which the entire dataset of 35.29 million reads was reassembled. In some cases, the length of the reference sequence could be increased when the assembled reads extended beyond the 5′ or 3′ terminus. In these cases, the entire dataset was again assembled to the extended reference sequence, and this process was repeated until the sequence could be extended no further. Putative virus sequences were then edited manually as necessary, and any aberrant reads were removed from the assembly. After editing, the sequence identity of assembled reads to reference sequences was 96.1-99.8%.

Of the 35.29 million reads obtained, 191,642 reads (0.54%) were identified as being of viral origin. Sequences representing members of 16 virus species and consisting of 20 genetically distinct isolates were identified (Table 1). The majority of non-viral reads were of plant origin (35.08 million), and 17 contigs above 1 kb in length and consisting of 14,972 reads were of unknown origin. All contigs of unknown origin were subjected to BlastN and BlastX (translated in six reading frames) analysis in batches, and none showed nt and/or aa identity to known viruses. Putative virus sequences ranged from 2.1 kb to 9.4 kb and represented 20% to 100% of complete viral genome sequences (Table 1). Greater than 50% of the genome sequence was determined for 17 virus isolates.
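The read accounting in this paragraph can be re-derived with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, the exact total of 35,289,493 reads is taken from the footnote to Table 1, and treating every read that is neither viral nor in a contig of unknown origin as plant-derived is an assumption made here for the check.

```python
# Read counts quoted in the text and in the footnote to Table 1.
TOTAL_READS = 35_289_493    # all 78-nt reads obtained
VIRAL_READS = 191_642       # reads assigned to viral sequences
UNKNOWN_READS = 14_972      # reads in the 17 contigs of unknown origin


def percent_of_total(reads, total=TOTAL_READS):
    """Return a read count as a percentage of the whole dataset."""
    return 100.0 * reads / total


viral_pct = percent_of_total(VIRAL_READS)

# Assumption for this sketch: whatever is neither viral nor in an
# unknown-origin contig is counted as plant-derived.
remaining_reads = TOTAL_READS - VIRAL_READS - UNKNOWN_READS

print(f"viral reads: {viral_pct:.2f}% of total")
print(f"remaining reads: {remaining_reads / 1e6:.2f} million")
```

Under these assumptions the two figures agree with the 0.54% viral fraction and the 35.08 million plant-derived reads reported above.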
Mean sequence coverage was 21- to 393-fold (Table 1). Orientation of the genome was determined by alignment of the sequences against template sequences of known orientation. In all cases, sequence coverage was greatest at the 3′ end of the sequence and least at the 5′ end. Viral sequences were aligned and scanned for recombination events, but none were identified. Primers were designed to distinguish each virus isolate found (Online Resource 1), and these were used in RT-PCR assays to identify the original host plant(s). First, the RNA pools derived from each group were screened by RT-PCR, and individual lyophilised leaves from each group were then screened, where appropriate. PCR products were sequenced using Sanger sequencing to confirm they were of viral origin (data not shown). Sequences of viral origin were derived from nine individual plants of six species (Table 1).
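The text does not say how mean coverage was calculated. A minimal sketch, assuming the usual definition (number of reads multiplied by read length, divided by the length of the assembled sequence), reproduces several of the values in Table 1. The function name is hypothetical, and the pairing of read counts with sequence lengths follows Table 1.

```python
READ_LENGTH_NT = 78  # single-end read length used in this study


def mean_coverage(n_reads, sequence_length_nt, read_length_nt=READ_LENGTH_NT):
    """Mean fold-coverage, assuming reads * read length / sequence length."""
    return n_reads * read_length_nt / sequence_length_nt


# (reads assembled, sequence length in nt) as listed in Table 1
examples = {
    "GCLV WA-1": (43_519, 8_638),
    "SLV WA-1": (21_239, 8_371),
    "CNSV Lily1 RNA1": (1_671, 6_259),
}

for name, (reads, length) in examples.items():
    print(f"{name}: {mean_coverage(reads, length):.0f}X")
```

With this definition the three examples round to 393X, 198X and 21X, matching the tabulated values, which suggests (but does not prove) that the authors used a comparable calculation.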


Viruses from garlic

The single garlic plant tested was co-infected with members of eight virus species: Garlic common latent virus (GCLV), Shallot latent virus (SLV), Leek yellow stripe virus (LYSV), Onion yellow dwarf virus (OYDV), Garlic virus A (GarVA), Garlic virus B (GarVB), Garlic virus C (GarVC), and Garlic virus D (GarVD). Of the 86,885 sequence reads that mapped to viruses infecting the garlic plant, 74.4% mapped to the carlaviruses GCLV (50%) and SLV (24.4%), whose complete or near-complete genome sequences were obtained. Isolates of the potyviruses LYSV and OYDV accounted for 15.1% of the reads obtained. Five distinct potyvirus isolates co-infected the plant; there were two isolates of LYSV, which together accounted for 9.3% of the reads, and three isolates of OYDV, which accounted for 5.8% of the reads. Approximately equal numbers of reads corresponded to each of the two LYSV isolates, which shared 83% nt identity with one another. In contrast, isolate OYDV-Bate6 was represented by over four times more reads than either of the other two OYDV isolates, and more than three times more of its genome sequence was determined (Table 1). The CP genes of the three isolates of OYDV shared 82-84% sequence identity with one another. The remainder of the reads (10.3%) mapped to four species of allexivirus, GarVA, GarVB, GarVC, and GarVD, each of which was represented by one isolate (Table 1).

Viruses from Spanish iris

The single specimen of Spanish iris tested was co-infected with members of two potyvirus species, Iris mild mosaic virus (IMMV) and Ornithogalum mosaic virus (OrMV). Of the 47,681 reads that mapped to viruses in this plant, 85.2% were from two isolates of IMMV, and the remainder (14.8%) from one isolate of OrMV. The two IMMV isolates shared 83% nt identity across their partial genomes (Table 1). IMMV isolate WA-1 was represented by approximately three times more reads than isolate Bate2, and approximately one-third more of its genome sequence was determined (Table 1).

Viruses from Easter lily

The Easter lily plant tested was co-infected with an isolate of the potyvirus species Lily mottle virus (LMoV), an isolate of a potentially novel potyvirus provisionally named lily virus A (LVA) (see below), and an isolate of a nepovirus species, Cycas necrotic stunt virus (CNSV). Of the 40,820 reads that mapped to viruses infecting the lily plant, 87.0% mapped to the LMoV isolate, 5.1% to the LVA isolate, and 7.9% to the CNSV isolate. The bipartite genome of CNSV was represented by an approximately equal number of reads representing RNA1 and RNA2 (Table 1).

Table 1 Virus sequences identified and their GenBank accession codes, origins, genome information, and closest sequence matches

| Virus name | Isolate name | No. nucleotides (estimated % of genomeᵇ) | No. of 78-nt reads assembled in contigᶜ (as % of total reads), mean genome coverage | Classification (family, genus) | Closest known relative, GenBank accession code (% nt identity to new isolate/% coverage), country of collection | Host species (no. plants in which virus was detected) | Collection site in Perth | GenBank accession code |
|---|---|---|---|---|---|---|---|---|
| Lily virus Aᵃ (LVA) | Bate1 | 4,866 (50%) | 2,067 (0.005%), 33X | Potyviridae, Potyvirusᵈ | LMoV, AB570195 (70/100), Japan | Lilium longiflorum (1) | Bateman | JN127335 |
| Blue squill virus Aᵃ (BSVA) | KP1 | 9,433 (96%) | 6,090 (0.017%), 50X | Potyviridae, Potyvirusᵈ | HarMVᵉ, HQ161080 (73/92), Australia | Chamaescilla corymbosa (3) | Kings Park | JN052072 |
| Scaevola virus Aᵃ (ScVA) | WA-3 | 4,435 (60%) | 1,291 (0.003%), 22X | Betaflexiviridae, Trichovirusᵈ | APCLSVᵉ, AY713380 (66/36), Italy | Scaevola calliptera (1) | Wanneroo | JN127346 |
| Chocolate lily virus Aᵃ (CLVA) | KP2 | RNA1: 6,134 (>90%); RNA2: 4,733 (>90%) | RNA1: 2,884 (0.008%), 37X; RNA2: 6,695 (0.018%), 110X | Secoviridae, unassignedᵈ | no significant nt identity | Dichopogon capillipes (2) | Kings Park | JN052073, JN052074 |
| Cycas necrotic stunt virus (CNSV) | Lily1 | RNA1: 6,259 (83%); RNA2: 4,206 (90%) | RNA1: 1,671 (0.004%), 21X; RNA2: 1,547 (0.004%), 29X | Secoviridae, Nepovirus | CNSV, RNA1 AB073147; RNA2 AB073148 (89/100), Japan | L. longiflorum (1) | Bateman | JN127336, JN127337 |
| Garlic common latent virus (GCLV) | WA-1 | 8,638 (95%) | 43,519 (0.123%), 393X | Betaflexiviridae, Carlavirus | GCLV, AB004805 (97/16), Germany | Allium sativum (1) | Bateman | JF320810 |
| Garlic virus A (GarVA) | Bate1 | 4,363 (50%) | 2,199 (0.006%), 39X | Alphaflexiviridae, Allexivirus | GarVA, AB010300 (83/100), Japan | A. sativum (1) | Bateman | JN019812 |
| Garlic virus B (GarVB) | Bate1 | 4,923 (56%) | 2,314 (0.006%), 37X | Alphaflexiviridae, Allexivirus | GarVB, AB010301 (88/100), Japan | A. sativum (1) | Bateman | JN019813 |
| Garlic virus C (GarVC) | Bate1 | 7,708 (92%) | 3,736 (0.010%), 38X | Alphaflexiviridae, Allexivirus | GarVC, AB010302 (75/93), Japan | A. sativum (1) | Bateman | JN019814 |
| Garlic virus D (GarVD) | Bate1 | 2,880 (35%) | 704 (0.002%), 24X | Alphaflexiviridae, Allexivirus | GarVD, AB010303 (87/100), Japan | A. sativum (1) | Bateman | JN019815 |
| Iris mild mosaic virus (IMMV) | WA-1 | 8,599 (86%) | 30,431 (0.086%), 276X | Potyviridae, Potyvirus | IMMV, DQ436918 (98/20), New Zealand | Iris xiphium (1) | Bateman | JF320812 |
| Iris mild mosaic virus (IMMV) | Bate2 | 6,251 (62%) | 10,198 (0.028%), 127X | Potyviridae, Potyvirus | IMMV, DQ436919 (98/20), New Zealand | I. xiphium (1) | Bateman | JN127338 |
| Leek yellow stripe virus (LYSV) | Bate3 | 5,563 (55%) | 4,088 (0.011%), 57X | Potyviridae, Potyvirus | LYSV, AJ307057 (88/100), China | A. sativum (1) | Bateman | JN127339 |
| Leek yellow stripe virus (LYSV) | Bate4 | 5,264 (52%) | 4,028 (0.011%), 60X | Potyviridae, Potyvirus | LYSV, AJ307057 (81/100), China | A. sativum (1) | Bateman | JN127340 |
| Lily mottle virus (LMoV) | Bate5 | 8,465 (88%) | 35,535 (0.100%), 327X | Potyviridae, Potyvirus | LMoV, AJ564636 (100/100), China | L. longiflorum (1) | Bateman | JN127341 |
| Onion yellow dwarf virus (OYDV) | Bate6 | 7,207 (68%) | 3,544 (0.010%), 38X | Potyviridae, Potyvirus | OYDV, AJ510223 (86/100), China | A. sativum (1) | Bateman | JN127342 |
| Onion yellow dwarf virus (OYDV) | Bate7 | 2,113 (20%) | 791 (0.002%), 29X | Potyviridae, Potyvirus | OYDV, GQ475389 (89/100), Italy | A. sativum (1) | Bateman | JN127343 |
| Onion yellow dwarf virus (OYDV) | Bate8 | 2,333 (22%) | 723 (0.002%), 24X | Potyviridae, Potyvirus | OYDV, AB219834 (77/100), Japan | A. sativum (1) | Bateman | JN127344 |
| Ornithogalum mosaic virus (OrMV) | Bate9 | 8,779 (87%) | 7,052 (0.019%), 63X | Potyviridae, Potyvirus | OrMV, D00615 (82/41), South Africa | I. xiphium (1) | Bateman | JN127345 |
| Shallot latent virus (SLV) | WA-1 | 8,371 (100%) | 21,239 (0.060%), 198X | Betaflexiviridae, Carlavirus | SLV, AJ292226 (85/100), China | A. sativum (1) | Bateman | JF320811 |

ᵃ Proposed name of new virus identified
ᵇ Estimated from genome length (nt) of closest known relative
ᶜ Of a total of 35,289,493 reads
ᵈ Proposed classification of new virus identified
ᵉ APCLSV, apricot pseudo-chlorotic leaf spot virus; HarMV, hardenbergia mosaic virus

Fig. 1 a Phylogenetic tree of the deduced amino acid sequence (1,550 residues) of the partial genome of lily virus A (LVA) (JN127335) (boxed), consisting of the cytoplasmic inclusion (partial), six-kilodalton 2, genome-linked protein (VPg), nuclear inclusion A protease, nuclear inclusion B replicase, and coat protein, with homologous regions of isolates of LMoV (lily mottle virus, AB570195 (ML61), NC_005288 (Sb), AM048875 (SMi), HM222521 (DL)); LYSV (leek yellow stripe virus, NC_004011); BYMV (bean yellow mosaic virus, NC_003492); PVY (potato virus Y, NC_001616); PaVY (panax virus Y, NC_014252); TuMV (turnip mosaic virus, NC_002509); TEV (tobacco etch virus, NC_001555); BCMV (bean common mosaic virus, NC_003397); PWV (passion fruit woodiness virus, NC_014790); HarMV (hardenbergia mosaic virus, NC_015394, HQ161080). Bootstrap values > 60% are shown. Evolutionary distances were computed using the maximum composite likelihood method. Trees were drawn to scale, with evolutionary distance used to infer the branch length in nucleotide substitutions per site. b Proposed genome organisation of lily virus A, showing gene positions. Numbers indicate nucleotide positions at the 3′ end of each gene. The CI gene is partial at the 5′ end. CI, cytoplasmic inclusion protein; 6K2, 6-kilodalton protein 2; NIa-VPg, nuclear inclusion A-genome-linked protein; NIa-Pro, nuclear inclusion A-protease; NIb, nuclear inclusion B; CP, coat protein; UTR, untranslated region. Not drawn to scale
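The read shares quoted for the garlic plant can be re-derived from the per-isolate read counts in Table 1. The following sketch is a consistency check rather than part of the authors' analysis; the grouping of isolates and the dictionary name are choices made here for illustration.

```python
# Reads per virus in the garlic plant, summed over isolates (Table 1).
garlic_reads = {
    "GCLV": 43_519,
    "SLV": 21_239,
    "LYSV": 4_088 + 4_028,                     # isolates Bate3 and Bate4
    "OYDV": 3_544 + 791 + 723,                 # isolates Bate6, Bate7, Bate8
    "allexiviruses": 2_199 + 2_314 + 3_736 + 704,  # GarVA, GarVB, GarVC, GarVD
}

total_garlic_reads = sum(garlic_reads.values())

# Share of each virus among all virus-derived reads from the garlic plant.
shares = {
    name: round(100 * count / total_garlic_reads, 1)
    for name, count in garlic_reads.items()
}

print(total_garlic_reads)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The counts sum to the 86,885 reads reported for the garlic plant, and the shares come out at roughly 50%, 24.4%, 9.3%, 5.8% and 10.3%, in line with the percentages given in the text.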

New virus in lily

The sequence of 4,866 nt shared greatest nt identity (70-71%) with LMoV isolate Bate5, with which it co-infected the lily plant tested, and with isolates identified from Lilium species in Japan (e.g. AB570195) and China (NC_005288, AM048875, HM222521), and less than 60% identity with sequences of other known potyviruses (Fig. 1a). The deduced aa sequence of 1,551 aa shared 76-77% identity with the same LMoV isolates. Based on the known genome size of LMoV isolates (9,644-9,648 nt), it is estimated that the sequence obtained represents approximately half of the virus genome. The genome organisation appears to be typical of that of other potyviruses. The partial genome obtained encodes a polyprotein that consists of the C-terminal part of the cytoplasmic inclusion (CI) protein, a 6-kilodalton protein 2 (6K2), a genome-linked protein (NIa-VPg), a nuclear inclusion A protease (NIa-Pro), a nuclear inclusion B replicase (NIb), and a coat protein (CP), followed by a 3′ untranslated region (3′ UTR) (Fig. 1b). Protease cleavage recognition sites typical of potyviruses are present at the junctions of predicted protein products (Online Resource 2). Evidence that this virus is distinct from LMoV is that the CP gene shares 69-70% nt identity with LMoV isolates, which is below the species demarcation point of 76-77% recommended by Adams et al. [2] for potyviruses. The CP of LVA has 267 aa residues, while those of LMoV isolates have 275-276 aa residues. This virus is clearly closely aligned to members of the species Lily mottle virus, but based on the molecular evidence currently available, and until further molecular and biological data on this virus can be gathered, we propose that this sequence be considered to be from a member of a distinct species within the genus Potyvirus. The proposed virus species was provisionally named Lily virus A (LVA), isolate Bate1, and its partial sequence has been assigned GenBank accession JN127335.

Sequences with features resembling those of three plant viruses were found in 1-3 specimens each of the Australian indigenous plant species D. capillipes (Chocolate Lily), C. corymbosa (Blue Squill), and S. calliptera (Royal Robe) (Table 1). Based on identity to known viral genome sequences, it was determined that complete or nearly complete genome sequences were obtained for two novel viruses from chocolate lily and blue squill, and approximately half of the genome was obtained of a virus from royal robe. The deduced genome organisation, phylogeny, and proposed taxonomic grouping of each novel virus are presented below.

Virus from chocolate lily

Two sequences with low identity to known viruses were identified from two asymptomatic specimens of the wild monocotyledonous plant D. capillipes (Chocolate Lily) (Table 1). A single open reading frame (ORF) was predicted within each of the two sequences after analysis with BlastX, the conserved domain database (CDD) at NCBI, and InterProScan. Deduced protein products of both ORFs shared identity with proteins of some plant viruses.
The predicted protein encoded by the largest ORF shared highest identity (32%) with replicase proteins of two unassigned members of the family Secoviridae, strawberry mottle virus (SMoV) and black raspberry necrosis virus (BRNV), 25% with satsuma dwarf virus (SDV) (genus Sadwavirus), and lesser identities with members of other genera within the family (Fig. 2a). The protein encoded by the smaller ORF shared 22-24% identity with movement proteins (MP) and CPs of SMoV, BRNV, and SDV. Sequence alignment with these viruses showed that the two sequences obtained represented the complete or near-complete RNAs 1 and 2 of the genome of one virus. The new virus was provisionally named chocolate lily virus A (CLVA) isolate KP2. It was assigned GenBank accession numbers JN052073 (RNA1) and JN052074 (RNA2).

The CLVA genome consists of two single-stranded, positive-sense RNAs, each of which is translated as one polyprotein. Almost three times more reads representing RNA2 were detected than for RNA1 (Table 1). The sequence obtained of RNA1 was 6,134 nt in length with a 5′ UTR of 200 nt and a 3′ UTR of 282 nt. The sequence obtained of RNA2 was 4,733 nt long with a 5′ UTR of 252 nt and a 3′ UTR of 335 nt. The 5′ UTRs of RNA1 and RNA2 shared a 38-nt region of sequence identity, and similarly, the 3′ UTRs shared an 86-nt region of identity.

The single ORF of RNA1 begins at one of two potential in-frame initiation codons, AUG at either nt 193–195 or nt 202–204, and terminates at UGA (nt 5,851–5,853). The sequence context of the first potential initiation codon was UCCAAUGUCC, and that of the second was CUUAAUGGAA. Neither context is ideal for translation initiation in plants, but the second is more favorable, with a G at position +4 [30], so this was considered to be the most likely start of translation. From this codon, the ORF encodes a polyprotein of molecular mass 211.8 kDa (1,883 aa), which contains domains homologous to a putative protease cofactor (Pro-C), helicase, viral genome-linked protein (VPg), protease (Pro), and RNA-dependent RNA polymerase (RdRp) (Online Resource 3), all of which were identified using CDD and InterProScan and through homology with other viruses (Fig. 2b). The putative protease cleavage site between each mature peptide in the polyprotein encoded by RNA1 was the dipeptide Q/G (Online Resource 3). A conserved amino acid (aa) motif (Fx₂₇Wx₁₁Lx₂₁LxE) found in the Pro-C domain of several members of the (now redundant) family Comoviridae [27, 39] was present as Fx₂₉Wx₁₁Lx₂₁LxH at residues 133–199. Conserved helicase motifs A (GKS) and B (DE) were present at aa residues 483–485 and 529–530, respectively. The VPg motif E/Dx₁₋₃Yx₃Nx₄₋₅R described for members of the family Comoviridae [32] was not found. The cysteine protease triad H (aa 1,013), D (aa 1,132) and C (aa 1,145) was present within the conserved motif HxₙE/DxₙCGxₙGxₙHxₙG [19, 41].
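The coordinates reported for RNA1 can be cross-checked with simple arithmetic. The sketch below uses two hypothetical helper functions and assumes 1-based, inclusive nucleotide and residue coordinates, with the stop codon counted in the ORF span but not translated. It is offered as a consistency check on the reported numbers, not as the authors' method.

```python
import re


def orf_aa_length(start_nt, stop_end_nt):
    """Amino acids encoded by an ORF running from the first nt of the start
    codon to the last nt of the stop codon (1-based, inclusive)."""
    span = stop_end_nt - start_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3 - 1  # the stop codon is not translated


def motif_length(motif):
    """Length in residues of a motif written with spaces between elements,
    e.g. 'F x29 W x11 L x21 L x H' (xN means N arbitrary residues)."""
    total = 0
    for token in motif.split():
        match = re.fullmatch(r"x(\d*)", token)
        if match:
            total += int(match.group(1) or 1)  # bare 'x' is one residue
        else:
            total += len(token)  # literal residues such as 'F' or 'GDD'
    return total


# Two candidate start codons for the RNA1 ORF, both ending at UGA (nt 5,853).
aa_from_first_aug = orf_aa_length(193, 5_853)
aa_from_second_aug = orf_aa_length(202, 5_853)

# Pro-C motif variant reported at residues 133-199.
pro_c_length = motif_length("F x29 W x11 L x21 L x H")
pro_c_span = 199 - 133 + 1

print(aa_from_first_aug, aa_from_second_aug, pro_c_length, pro_c_span)
```

Under these conventions only the second AUG (nt 202–204) gives the reported polyprotein length of 1,883 aa, and the Pro-C motif variant is 67 residues long, which equals the span of residues 133–199.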
The conserved RdRp core motif (S/TGx3 Tx3 NS/Tx22 GDD) [31] was present as SGx3 Tx3 NSx37 GDD at aa residues 1,533–1,583. RNA2 encoded a single ORF beginning at AUG (nt 253–255) and terminating at UAG (nt 4,397–4,398), generating a polyprotein of 1,381 aa with a molecular mass of 153.3 kDa. It encodes a putative movement protein (MP) and coat protein (CP). The context of the first initiation codon (nt 103–105), CACAAUGTCC, was in a favourable but not excellent context, with an A at position -3 and a C at position ?6 [30]. The putative MP had the conserved motif (LxxPxL) described for members of the genus Nepovirus [35], family Secoviridae, at aa residues 89-94 (LFLPFL). The putative cleavage site between the MP and the CP is a S/G dipeptide (Online Resource 3). It is unclear whether the CP is processed as a single domain or two, as in some secoviruses. The N-terminal region of the CP shared 19-20% identity with the large CP subunit (CPL) of SMoV and BRNV. Alignment with them suggests that the

123

278

S. J. Wylie et al.

a

Fabavirus

Comovirus

Sadwavirus unassigned

b

Fig. 2 a Phylogenetic tree showing the deduced amino acid sequence (1,883 residues) of the partial replicase gene (RNA1) of chocolate lily virus A (CLVA) (boxed) (JN052073) showing its position with members of the family Secoviridae. Genus names are listed on the right. MMMV (Mikania micrantha mosaic virus, NC_011190); BBWV-1 (broadbean wilt virus 1, AY781171); BBWV-2 (broad bean wilt virus 2, AF225953); RCMV (red clover mottle virus, NC_003741); RaMV (radish mosaic virus, NC_010709); TuRSV (turnip ringspot virus, NC_013218); SDV (satsuma dwarf virus, NC_003785); SMoV (strawberry mottle virus, NC_003445); BRNV (black raspberry necrosis virus, NC_008182). Bootstrap values [ 55%

are shown. Evolutionary distances were computed using the maximum composite likelihood method. Trees are drawn to scale, with evolutionary distance used to infer the branch length in nucleotide substitutions per site. b Proposed genome organisation of chocolate lily virus A, RNA1 (above) and RNA 2 (below) showing domains (dotted lines) and mature peptides (solid lines). Pro-C, protease cofactor; Hel, helicase; Pro, protease; RdRp, RNA-dependent RNA polymerase; MP, movement protein; CP, coat protein; UTR, untranslated region. Numbers indicate nucleotide position at the 30 end of each domain or gene. Not drawn to scale

extent of the CPL in CLVA is nt 1,294-3,975, encoding a peptide of 97.2 kDa (894 aa). The small CP subunits (CPS) of SMoV and BRNV did not share identity with the C-terminal region of the CP of CLVA. If CLVA CPS exists, it possibly extends over nt 3,976-4,398 to encode a peptide of 16.8 kDa (141 aa). The predicted dipeptide cleavage site between CPL and the putative CPS is S/V (Online Resource 3). CLVA is clearly aligned to members of the family Secoviridae [42] in that it has a bipartite genome, the ORFs encode polyproteins, and it shares the Hel-Pro-RdRp organisation on RNA1 and the MP-CP organisation of RNA2. Less clear is whether the CP is cleaved into two subunits as it is in BRNV and SMoV. More specifically, the

CLVA genome organisation and sequence most closely match those of BRNV and SMoV, unassigned members of the family, and also SDV, a member of the genus Sadwavirus [28]. Criteria for distinguishing sadwaviruses from other secoviruses are transmission by aphids or longidorid nematodes, icosahedral, non-enveloped virions about 30 nm in diameter, two distinct CP subunits, a bipartite genome, and a replication gene block typical of picornalike viruses [28]. Information gained here for CLVA can confirm only the last two of these criteria, and the CP is possibly processed into two subunits. Because its CP sequence shares only 19-20% identity with members of other sadwavirus species, it is therefore well below the demarcation line of 80% distinguishing sadwavirus species

123


279

a

RSPaV

89

Foveavirus

ApLV

100

CVNV 100

LNRV

99

Carlavirus

CNRMV

100

AOPRV

100

unassigned

BanMMV CVA

Capillovirus unassigned

HarVA

75

DMV

Citrivirus FLV-1

97

GBINV ScVA

99

Trichovirus CMLV

60

ACLSV

100 94

APCLSV

0.1

b

Fig. 3 a Phylogenetic tree showing the placement of scaevola virus A (ScVA) (boxed) (JN127346) within the family Betaflexiviridae, genus Trichovirus. The deduced amino acid sequence (916 residues) of the replicase gene of ScVA was compared with the corresponding region of some other members of the family. RSPaV (rupestris stem pittingassociated virus, NC_001948); APLV (apricot latent virus, NC_014821); CVNV (coleus vein necrosis virus, NC_009764); LNRV (ligustrum necrotic ringspot virus, NC_010305); CNRMV (cherry necrotic rusty mottle virus, NC_002468); AOPRV (African oil palm ringspot virus, NC_012519); BanMMV (banana mild mosaic virus, NC_002729); CVA (cherry virus A, NC_003689); HarVA (hardenbergia virus A, HQ241409); DMV (dweet mottle virus, FJ009367) (syn citrus leaf blotch virus); FLV-1 (fig latent virus 1, FN377573); GBINV (grapevine berry inner necrosis virus, NC_015220);

CMLV (cherry mottle leaf virus, NC_002500); ACLSV (apple chlorotic leaf spot virus, NC_001409); APCLSV (apricot pseudochlorotic leaf spot virus, NC_006946). Bootstrap values [ 60% are shown. Evolutionary distances were computed using the maximum composite likelihood method. Trees are drawn to scale, with evolutionary distance used to infer the branch length in nucleotide substitutions per site. b Proposed genome organisation of scaevola virus A showing positions of mature peptides (solid lines) and domains (dotted lines) in ORF1 (replicase partial at 50 end with domains Hel, helicase and RdRp, RNA-dependent RNA polymerase), ORF2 (MP, movement protein), and ORF3 (CP, coat protein), and 30 UTR, untranslated region. Numbers indicate the nucleotide position at the 30 end of each domain or gene. Not drawn to scale

[28]. Like both BRNV and SMoV, CLVA has regions of identity in the 3′ UTRs of its RNA1 and RNA2. Based on the evidence presented, we propose that chocolate lily virus A be considered as an unassigned member of the family Secoviridae.

Virus from royal robe

A sequence of 4,435 nt was isolated from one asymptomatic wild dicotyledonous plant of Scaevola calliptera (Royal Robe). When it was analysed using BlastN and BlastP, the sequence most closely matched the replicase, MP and CP gene and protein sequences of members of the genus Trichovirus, family Betaflexiviridae. Specifically, the replicase protein sequence shared 42-50% identity with isolates of the species Cherry mottle leaf virus (CMLV), Apple chlorotic leaf spot virus (ACLSV), Apricot pseudochlorotic leaf spot virus (APCLSV), Grapevine berry inner necrosis virus (GBINV), and Fig latent virus 1 (FLV-1), all tentative or definitive species of the genus Trichovirus, family Betaflexiviridae [10] (Fig. 3a). The sequence was therefore recognised as that of a virus and provisionally


named Scaevola virus A (ScVA), isolate WA-3. It was assigned GenBank accession number JN127346. The genome of ScVA was incomplete at the 5′ end. Three ORFs were recognized. Analysis with CDD and InterProScan predicted that ORF1 encoded a replicase with two active domains: a helicase domain in the N-terminal portion and an RdRp domain at the C-terminal end (Online Resource 4). The conserved helicase motifs GKS, DE, and N were present at residues 92–94, 153–154, and 243, respectively. The RdRp domain occurred at residues 494–859. The conserved core RdRp motif S/TGx3Tx3NS/Tx22GDD [31] was present as TGx3Tx3NTx22GDD at residues 724–759. The replicase ORF was terminated at UGA. ORF2 overlapped ORF1 at its 5′ end in the +1 frame. It encoded a putative p30-like MP of 36.7 kDa. The initiation codon was in a favourable context (GAAGAUGGCG), and the termination codon was UGA. Identity was closest to the 50-kDa MPs of APCLSV, ACLSV, peach mosaic virus (PcMV) (all with 53% aa identity), and cherry mottle leaf virus (CMLV) (46%). The 30K MP superfamily conserved motif LxD [20] was present as LSD (aa 148–150). ORF3 encoded the putative CP. The initiation codon of ORF3 (-1 frame) was in a favourable context (GAAAAUGGCA), and the ORF was terminated at UAG. It overlapped the putative MP gene (Fig. 3b). The CP gene shared highest aa identity with ACLSV (50%), APCLSV (47%), PcMV and CMLV (41%). ScVA clearly meets the species demarcation criteria [1]: its CP and replication protein genes share less than 80% aa (72% nt) identity with those of other established betaflexiviruses. Based on the evidence presented, we propose that scaevola virus A be accepted as a member of the family Betaflexiviridae, genus Trichovirus.

Virus from blue squill

A sequence of 9,433 nt was isolated from a wild plant of the monocotyledonous C. corymbosa (Blue Squill) showing faint chlorotic mottling on the leaves.
BlastN analysis of the full sequence showed that it shared closest identity with complete genome sequences of isolates of potyviruses identified only from Australia, Hardenbergia mosaic virus (HarMV) (72% nt, 77% aa) and passion fruit woodiness virus (PWV) (69% nt, 69% aa), and lesser identity (65-68% nt, 67-68% aa) with complete genomes of viruses within the bean common mosaic virus (BCMV) subgroup of potyviruses from elsewhere (Fig. 4a) [18, 51, 53]. When the CP gene of the new virus was aligned with other potyviruses, it most closely matched those of HarMV and PWV [47, 51], with which it shared 75% nt identity, marginally below the species demarcation point of 76-77% proposed for potyviruses [2]. Based on alignment with the


complete genome sequences of HarMV and PWV, it is estimated that approximately 96% of the genome sequence of the new virus was obtained (Table 1). Based on the degree of sequence divergence from HarMV and PWV, it is proposed that the sequence represents a new member of the genus Potyvirus, family Potyviridae. It was provisionally named blue squill virus A (BSVA) isolate KP1. It was assigned GenBank accession number JN052072. The 5′ UTR and the translation initiation codon for ORF1 were not obtained for BSVA. The termination codon was UAG (nt 9,191–9,193). Putative protease cleavage sites within the polyprotein were identified by comparison with those of other potyviruses. Ten predicted mature peptide products were of the type and in the order typical of members of the genus Potyvirus [38, 43], followed by a 3′ UTR of 240 nt (Fig. 4b). ORF2 (nt 2,226–2,929), encoding the putative pretty interesting potyvirus ORF (PIPO) [12], was embedded in the P3 protein in the -1 frame (Online Resource 5). Expected potyvirus motifs of FRNK in the HC-Pro, GDD in the NIb, and DAG in the CP were conserved. The BSVA primers (Online Resource 1) used to link BSVA with its host plant were subsequently used to screen RNA pooled from 10 specimens of C. corymbosa collected from another site of remnant vegetation on a roadside at Wanneroo, 34 km north of the site where isolate BSVA-KP1 was collected. A PCR product of the expected size was detected by RT-PCR assay in one of the plants tested, and Sanger sequencing showed that it had high sequence identity to BSVA-KP1. Primers flanking the HC-Pro to CI and NIb to 3′ UTR regions were designed from the genome sequence of BSVA-KP1 and used to amplify and sequence the two regions (GenBank accession number JN416599). When compared, the HC-Pro-CI region of isolate Wanneroo shared 86% nt identity (93% aa identity) with isolate KP1, and the CP sequences shared 92% nt (96% aa) identity, confirming it to be an isolate of BSVA.
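The identity comparisons used above to separate a new species (BSVA versus HarMV and PWV) from a new isolate (Wanneroo versus KP1) can be illustrated with a minimal Python sketch. It assumes that identity is counted over aligned columns in which neither sequence has a gap; alignment programs differ in how they treat gaps, so this is not necessarily the calculation used in the study. The function names and the toy sequences are hypothetical; only the 75%, 92% and 76-77% figures come from the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are ignored (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_species(identity_pct: float, threshold_pct: float) -> bool:
    """True when identity reaches the species demarcation threshold."""
    return identity_pct >= threshold_pct


# Lower bound of the 76-77% CP nt demarcation range cited in the text [2].
POTYVIRUS_CP_NT_THRESHOLD = 76.0

# Toy aligned fragments, invented for illustration (not BSVA data).
toy_identity = percent_identity("AUGGCAGAUGAC-GAA", "AUGGCUGAUGACAGAG")

# CP nt identities reported in the text.
bsva_vs_harmv = same_species(75.0, POTYVIRUS_CP_NT_THRESHOLD)    # new species
wanneroo_vs_kp1 = same_species(92.0, POTYVIRUS_CP_NT_THRESHOLD)  # same species
print(round(toy_identity, 1), bsva_vs_harmv, wanneroo_vs_kp1)
```

Under these assumptions, the 75% CP identity falls just short of the threshold, whereas the 92% identity between the Wanneroo and KP1 isolates clears it comfortably.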

Discussion

Sequences representing 16 polyadenylated RNA virus species were identified from the output of a single massively parallel sequencing reaction of pooled RNA samples derived from 120 individual plants of 17 species, both wild and domesticated. Subsequent matching of viral sequences to plants showed that only nine plants of six species were hosts, and 13 viruses infected only three specimens of three cultivated species. Of the viral sequences identified from domesticated plants, one represented a potentially novel species and the other 12 were described previously. The virus-derived sequences from wild plants represented three


[Fig. 4 image: (a) neighbor-joining tree with clades labelled ‘BCMV subgroup’ and ‘Australian’ potyviruses, PVA as outgroup; scale bar 0.1. (b) genome organisation diagram of BSVA.]

Fig. 4 a Neighbor-joining tree of deduced amino acid sequence of the nearly complete genome of blue squill virus A (BSVA) (boxed) (JN052072) compared with complete genomes of isolates of other potyviruses of the bean common mosaic virus (BCMV) subgroup of potyviruses. Potyviruses identified only from Australia are indicated. SMV (soybean mosaic virus, NC_002634); WMV (watermelon mosaic virus, NC_010736); WVMV (wisteria vein mottling virus, NC_007216); EAPV (East-Asian passiflora virus, NC_007728); FVY (fritillary virus Y, NC_010954); BCMNV (bean common mosaic necrosis virus, NC_004047); BCMV (NC_003397); PWV (passion fruit woodiness virus, NC_014790), and HarMV (hardenbergia mosaic virus, NC_015394). An isolate of PVA (potato virus A, NC_004039) is provided as an outgroup. Bootstrap values > 60% are shown. Evolutionary distances were computed using the maximum composite likelihood method. Trees are drawn to scale, with evolutionary distance used to infer the branch length in nucleotide substitutions per site. b Proposed genome organisation of blue squill virus A showing deduced gene positions. Numbers indicate nucleotide positions at the 3′ end of each gene except PIPO, where both 5′ and 3′ positions are shown. P1 gene is partial at the 5′ end. P1, protein 1; HC-Pro, helper component-protease; P3, protein 3; 6K1, 6-kilodalton protein 1; CI, cytoplasmic inclusion protein; 6K2, 6-kilodalton protein 2; NIa-VPg, nuclear inclusion A-genome-linked protein; NIa-Pro, nuclear inclusion A-protease; NIb, nuclear inclusion B; CP, coat protein; 3′ UTR, 3′ untranslated region. Not drawn to scale

viruses, and these are presented as novel members of existing virus families.

This is the first record of CNSV in Australia and the first record of it infecting a species of Lilium. Previously CNSV was recorded from Sago Cycad (Cycas revoluta Thunb.) [26] and Gladiolus sp. in Japan [23], and from Sago Cycad and Chinese Peony (Paeonia lactiflora Pall.) in New Zealand [36]. Of possible significance is that L. longiflorum is a plant indigenous to the Ryukyu Archipelago, Japan [24]. The sequence of LMoV isolate Bate5 that co-infected the Easter lily plant was identical to that of an isolate from China that was identified in a batch of lily bulbs imported there from the Netherlands [57]. LMoV is described from Australia on Lilium species [6], although no sequences are available from isolates there. The same Easter lily plant was also co-infected with an isolate of a potyvirus with a sequence similar to that of LMoV but sufficiently diverse to be proposed as a member of a distinct species. It is uncertain whether this virus, provisionally named lily virus A (LVA), originated in the Australian flora and subsequently infected the imported lily or originated elsewhere. Some evidence that it evolved outside Australia is provided by the analysis of its sequence, which shares high identity with isolates of LMoV, a virus widely distributed in Europe and Asia [57]. In addition, potyviruses identified only in Australia typically belong to the BCMV potyvirus subgroup [18], whereas LVA (and LMoV) shares greater identity with members of the Potato virus Y subgroup.

The other novel potyvirus identified was blue squill virus A (BSVA). This virus occurred in an indigenous wild plant, and its sequence closely aligned with other viruses within the BCMV subgroup, especially those identified only from Australia [18, 47]. Based on this evidence, it is concluded that BSVA is likely to have evolved within the Australian flora.

Isolates of the potyviruses IMMV and OrMV were identified co-infecting a plant of Spanish iris. Both species have previously been identified in Australia infecting iris species [34, 47].

A garlic plant purchased from a garden store in 2010 was co-infected with a complex of isolates representing


eight virus species: two carlaviruses, four allexiviruses, and two potyviruses. Although isolates of SLV, OYDV, and LYSV were previously sequenced in Australia from a garlic bulb imported from China (GenBank accession numbers HQ258896, HQ258894, and HQ258895, respectively), and GCLV infection of garlic is reported from there [44], this is the first report of allexiviruses from Australia. Various combinations of these and other viruses within these genera occur frequently as a complex in garlic and other alliums, where they may cause serious losses in crop yield and deterioration of quality [11, 14, 46]. Much of the garlic consumed in Australia is imported from Asia and South America as viable bulbs that can potentially be propagated, thereby establishing and spreading the viruses that infect them. This trade, therefore, poses a potential threat to production of garlic and other alliums in Australia and to natural ecosystems [48].

Chocolate lily virus A is proposed as an unassigned member of the family Secoviridae. It is not the first tentative sadwavirus identified from the Australian continent; lucerne Australian symptomless virus (LASV) was identified only from Australia on the introduced plant Medicago sativa [37, 42]. Unfortunately, no sequence information is available for LASV.

Scaevola virus A is proposed as a new member of the genus Trichovirus. Scaevola is a genus of more than 130 tropical and temperate species, with the center of diversity being Australia and Polynesia. Species of Scaevola are widely cultivated as ornamental plants around the world. To our knowledge, ScVA is the first virus described from a Scaevola species. ScVA is the second member of the family Betaflexiviridae identified from the indigenous flora of south-west Australia and proposed to have evolved there, the other being the putative capillovirus Hardenbergia virus A [52].
In our previous work, where single plants were analysed for polyadenylated viral genome sequences using a similar massively parallel sequencing approach, the viral component of total sequence reads was 7.38% (PWV) in a plant of Passiflora caerulea [53] and 10.99% (HarMV, HarVA) in a plant of Hardenbergia comptoniana [51, 52]. In the multiplex virus assay described here, only 0.5% of total sequence reads from polyadenylated RNA species corresponded to viral genome sequences, probably because less than 10% of the plants tested harboured viruses. This ‘dilution’ of viral sequences by the transcripts of 111 uninfected plants may be the reason why complete or nearly complete genome sequences were obtained for only eight isolates. Equally, it may reflect bias in reverse transcription and amplification efficiencies, the sequencing approach used, or indeed expression of the viral nucleic acids. These factors and the depth of sequence data obtained will determine the upper limit to the number of


plants that can be pooled before viral molecules become too dilute to detect with high confidence. Where obtaining complete virus genome sequences is important, analysis of fewer plants or sequencing to greater depth would be desirable. Although it is meaningless in this case to compare data on expression of viral genomes from different host plants, comparisons within multiply-infected individual plants are valid. The number of reads corresponding to a sequence is somewhat proportional to the expression level of the RNA template at the time the RNA was extracted [51]. For example, for the two secoviruses detected, approximately equal proportions of RNA1 and RNA2 were present from CNSV, but surprisingly, for CLVA, almost three times more RNA2 than RNA1 was present. In both species, the CP is translated from RNA2, and the number of reads corresponding to RNA2 may reflect levels of transcription of CP subunits when the plant was harvested. Protein expression assays are needed to confirm this. The two LYSV isolates detected in garlic were expressed in almost equal amounts, whereas one of the three OYDV isolates infecting the same plant was clearly dominant, apparently having over four times higher expression. For the two isolates of IMMV detected in iris, one isolate was expressed at an approximately 30% higher level than the other. This is a similar scenario to that observed in a H. comptoniana plant co-infected with two isolates of HarMV, where one isolate was expressed 23% more highly than the other [51]. Clearly, the phenomenon of ‘cross protection’ where one strain effectively slows or prohibits replication of another [55] does not apply in all cases of mixed infections with strains of the same virus. In the cases found where members of multiple virus species co-infect a host, the potential exists for interspecific or intraspecific recombination to occur, but this was not detected here. 
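The dilution argument above can be checked with a rough back-of-envelope calculation, sketched here in Python. The sketch assumes that every pooled plant contributes an equal amount of polyadenylated RNA and that each infected plant carries a viral read fraction similar to the single-plant values reported earlier (7.38% and 10.99%); neither assumption was tested in the study. The function names and the minimum detectable fraction of 0.05% are hypothetical and chosen only for illustration.

```python
import math


def expected_pooled_viral_fraction(single_plant_fraction: float,
                                   infected_plants: int,
                                   pooled_plants: int) -> float:
    """Expected viral read fraction in a pool, assuming equal RNA input per plant."""
    if pooled_plants <= 0 or not 0 <= infected_plants <= pooled_plants:
        raise ValueError("need 0 <= infected_plants <= pooled_plants and pooled_plants > 0")
    return single_plant_fraction * infected_plants / pooled_plants


def max_pool_size(single_plant_fraction: float, min_detectable_fraction: float) -> int:
    """Largest pool in which one infected plant still reaches the chosen minimum fraction."""
    if min_detectable_fraction <= 0:
        raise ValueError("min_detectable_fraction must be positive")
    return math.floor(single_plant_fraction / min_detectable_fraction)


# Nine infected plants among 120 pooled, using the two single-plant fractions
# reported in earlier work (7.38% and 10.99%).
low = expected_pooled_viral_fraction(0.0738, 9, 120)
high = expected_pooled_viral_fraction(0.1099, 9, 120)
print(f"expected pooled viral fraction: {low:.2%} to {high:.2%}")

# Hypothetical detection floor of 0.05% of reads for a single infected plant.
print(max_pool_size(0.0738, 0.0005))
```

Under these assumptions the expected pooled fraction is roughly 0.55% to 0.82%, the same order of magnitude as the 0.5% observed, which is consistent with dilution by uninfected plants accounting for much of the difference.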
There are a number of reports describing the use of massively parallel sequencing to detect viruses in plants (and other organisms) [e.g. 3, 4, 13, 15, 21, 40, 51–54]. What is notable about the method used here compared to those described previously is that individual samples were not barcoded (genetically tagged) prior to pooling and sequencing [40], samples were not enriched in any way for viral sequences [3, 4, 13, 40], and small RNA species were not purified and sequenced [15, 21, 54]. An advantage of the current method is that the per-sample cost of sequencing is reduced. When each sample is barcoded prior to pooling and sequencing, as in the method developed by Roossinck et al. [40], the per-sample cost is considerably increased. The clear benefit of barcoding is that viral sequences discovered are readily matched to host plants. In the current method, primers are designed against each virus discovered, and host plants are subsequently screened by RT-PCR, first in groups and then individually, for presence of the virus. Although this procedure is laborious, the per-sample cost is reduced, and the primers developed may be used in subsequent studies, for example, to study distribution of the virus within a population. This was demonstrated for BSVA, where a new isolate of BSVA was identified using primers designed from the genome sequence of BSVA isolate KP1.

A shortfall of the current method is that it was not designed to detect non-polyadenylated RNA viruses, viroids, or DNA viruses. This limitation may be overcome for non-polyadenylated RNA viruses by utilising random primers to synthesise cDNA after removal of ribosomal RNA and/or by enriching for viral sequences by one of a number of methods, such as by purifying double-stranded RNA through a CF11-cellulose-based approach [40], by removal of non-viral sequences through subtractive hybridisation of RNA from healthy and infected host plants [3], or by partial purification of viral particles from samples [33]. A further possible shortfall of the current method was its use of a single proprietary RNA extraction procedure that may not efficiently extract RNA from all plant species tested.

Acknowledgments This study was funded by an Australian Research Council Linkage Grant (LP110200180) and the Murdoch University Institutes of Sustainable Ecosystems, and Crop and Plant Research. Thanks to Professor Kingsley Dixon and Mr. Steve Easton, Botanic Gardens and Parks Authority, for authorizing collections at Kings Park and for assistance in collecting and identifying the plants used from there.

References

1. Adams MJ, Antoniw JF, Bar-Joseph M, Brunt AA, Candresse T, Foster GD, Martelli GP, Milne RG, Fauquet CM (2004) The new plant virus family Flexiviridae and assessment of molecular criteria for species demarcation. Arch Virol 149:1045–1060
2. Adams MJ, Antoniw JF, Fauquet CM (2005) Molecular criteria for genus and species discrimination within the family Potyviridae. Arch Virol 150:459–479
3. Adams IP, Glover RH, Monger WA, Mumford R, Jackeviciene E, Navalinskiene M, Samuitiene M, Boonham N (2009) Next-generation sequencing and metagenomic analysis: a universal diagnostic tool in plant virology. Mol Plant Pathol 10:537–545
4. Al Rwahnih M, Daubert S, Golino D, Rowhani A (2009) Deep sequencing analysis of RNAs from a grapevine showing Syrah decline symptoms reveals a multiple virus infection that includes a novel virus. Virology 387:395–401
5. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
6. Asjes CJ (1998) Data Sheet on Lily Mottle Potyvirus for CABI Crop Protection Compendium. CABI Information. Wallingford, Oxon
7. Balijja A, Kvarnheden A, Turchetti T (2008) A non-phenol–chloroform extraction of double-stranded RNA from plant and fungal tissues. J Virol Methods 152:32–37
8. Blouin AG, Greenwood DR, Chavan RR, Pearson MN, Clover GRG, MacDiarmid RM, Cohen D (2010) A generic method to identify plant viruses by high-resolution tandem mass spectrometry of their coat proteins. J Virol Methods 163:49–56
9. Boonham N, Adams I, Glover R, Monger W, Hodges T, Ashton P (2010) High throughput sequencing – next wave diagnostics. Phytopathology 100(suppl. 1):S154
10. Carstens EB (2010) Ratification vote on taxonomic proposals to the International Committee on Taxonomy of Viruses (2009). Arch Virol 155:133–146
11. Chen J, Zheng HY, Antoniw JF, Adams MJ, Chen JP, Lin L (2004) Detection and classification of allexiviruses from garlic in China. Arch Virol 149:435–445
12. Chung BYW, Miller WA, Atkins JF, Firth AE (2008) An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci USA 105:5897–5902
13. Coetzee B, Freeborough M-J, Maree HJ, Celton J-M, Rees DJG, Burger JT (2010) Deep sequencing analysis of viruses infecting grapevines: virome of a vineyard. Virology 400:157–163
14. Conci VC, Canavelli AE, Balzarini MG (2010) The distribution of garlic viruses in leaves and bulbs during the first year of infection. J Phytopath 158:186–193
15. Donaire L, Wang Y, Gonzalez-Ibeas D, Mayer KF, Aranda MA, Llave C (2009) Deep-sequencing of plant viral small RNAs reveals effective and widespread targeting of viral genomes. Virology 392:203–214
16. Drummond AJ, Ashton B, Buxton S, Cheung M, Cooper A, Duran C, Field M, Heled J, Kearse M, Markowitz S, Moir R, Stones-Havas S, Sturrock S, Thierer T, Wilson A (2011) Geneious v5.4. http://www.geneious.com. Accessed 15 June 2011
17. Eisen JA (2007) Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes. PLoS Biol 5:e82
18. Gibbs AJ, Mackenzie AM, Wei KJ, Gibbs MJ (2008) The potyviruses of Australia. Arch Virol 153:1411–1420
19. Gorbalenya AE, Donchenko AP, Blinov VM, Koonin EV (1989) Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. FEBS Lett 243:103–114
20. Gorbalenya AE, Koonin EV, Wolf YI (1990) A new superfamily of putative NTP-binding domains encoded by genomes of small DNA and RNA viruses. FEBS Lett 262:145–148
21. Hagen C, Frizzi A, Kao J, Jia L, Huang M, Zhang Y, Huang S (2011) Using small RNA sequences to diagnose, sequence, and investigate the infectivity characteristics of vegetable-infecting viruses. Arch Virol 156:1209–1216
22. Haible D, Kober S, Jeske H (2006) Rolling circle amplification revolutionizes diagnosis and genomics of geminiviruses. J Virol Methods 135:9–16
23. Hanada K, Fukumoto F, Kusunoki M, Kameya-Iwaki M, Tanaka Y, Iwanami T (2006) Cycas necrotic stunt virus isolated from gladiolus plants in Japan. J Gen Plant Pathol 72:383–386
24. Hiramatsu M, Ii K, Okubo H, Huang K-L, Huang C-W (2001) Biogeography and origin of Lilium longiflorum and L. formosanum (Liliaceae) endemic to the Ryukyu Archipelago and Taiwan as determined by allozyme diversity. Am J Bot 88:1230–1239
25. Kreuze JF, Perez A, Untiveros M, Quispe D, Fuentes S, Barker I, Simon R (2009) Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses. Virology 388:1–7
26. Kusunoki M, Hanada K, Iwaki M, Chang MU, Doi Y, Yora K (1986) Cycas necrotic stunt virus, a new member of nepoviruses found in Cycas revoluta-Host range, purification, serology and some other properties. Ann Phytopathol Soc Japan 52:302–311
27. Le Gall O, Iwanami T, Jones AT, Lehto K, Sanfacon H, Wellink J, Wetzel T, Yoshikawa N (2005) Comoviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds) Virus taxonomy, eighth report of the international committee on the taxonomy of viruses. Elsevier Academic Press, London, pp 807–818
28. Le Gall O, Sanfaçon H, Ikegami M, Iwanami T, Jones T, Karasev A, Lehto K, Wellink J, Wetzel T, Yoshikawa N (2007) Cheravirus and Sadwavirus: two unassigned genera of plant positive-sense single-stranded RNA viruses formerly considered atypical members of the genus Nepovirus (family Comoviridae). Arch Virol 152:1767–1774
29. Luo H, Wylie SJ, Jones MGK (2010) Identification of plant viruses using one-dimensional gel electrophoresis and peptide mass fingerprints. J Virol Meth 165:297–301
30. Lutcke HA, Chow KC, Mickel FS, Moss KA, Kern HF, Scheele GA (1987) Selection of AUG initiation codons differs in plants and animals. EMBO J 6:43–48
31. Martelli GP, Adams MJ, Kreuze JF, Dolja VV (2007) Family Flexiviridae: a case study in virion and genome plasticity. Ann Rev Phytopathol 45:73–100
32. Mayo MA, Fritsch C (1994) A possible consensus sequence for VPg of viruses in the family Comoviridae. FEBS Lett 354:129–130
33. Melcher U, Muthukumar V, Wiley GB, Min BE, Palmer MW, Verchot-Lubicz J, Ali A, Nelson RS, Roe BA, Thapa V, Pierce ML (2008) Evidence for novel viruses by analysis of nucleic acids in virus-like particle fractions from Ambrosia psilostachya. J Virol Meth 152:49–55
34. Morschel JR (1966) Recorded plant diseases in and outside Australia: part 4 – forest trees and ornamental plants. Commonwealth Department of Health Division of Plant Quarantine, Canberra
35. Mushegian AR (1994) The putative movement domain encoded by nepovirus RNA-2 is conserved in all sequenced nepoviruses. Arch Virol 135:437–441
36. Ochoa-Corona FM, Elliot DR, Tang Z, Lebas BSM, Alexander BJR (2003) Detection of Cycas necrotic stunt virus (CNSV) in post-entry quarantine stocks of ornamentals in New Zealand. Phytopathology 93:S67
37. Remah A, Jones AT, Mitchell MJ (1986) Purification and properties of lucerne Australian symptomless virus, a new virus infecting lucerne in Australia. Ann Appl Biol 109:307–315
38. Riechmann JL, Lain S, Garcia JA (1992) Highlights and prospects of potyvirus molecular biology. J Gen Virol 73:1–16
39. Ritzenthaler C, Viry M, Pinck M, Margis R, Fuchs M, Pinck L (1991) Complete nucleotide sequence and genetic organization of grapevine fanleaf nepovirus RNA1. J Gen Virol 72:2357–2365
40. Roossinck MJ, Saha P, Wiley GB, Quan J, White JD, Lai H, Chavarría F, Shen G, Roe BA (2010) Ecogenomics: using massively parallel pyrosequencing to understand virus ecology. Mol Ecol 19(Suppl. 1):81–88
41. Rott ME, Gilchrist A, Lee L, Rochon DM (1995) Nucleotide sequence of tomato ringspot virus RNA1. J Gen Virol 76:465–471
42. Sanfaçon H, Wellink J, Le Gall O, Karasev A, van der Vlugt R, Wetzel T (2009) Secoviridae: a proposed family of plant viruses within the order Picornavirales that combines the families Sequiviridae and Comoviridae, the unassigned genera Cheravirus. Arch Virol 154:899–907
43. Shukla DD, Ward CW, Brunt AA (1994) The Potyviridae. CAB International, Wallingford
44. Sward RJ (1990) Lettuce necrotic yellows rhabdovirus and other viruses infecting garlic. Australas Plant Pathol 19:46–51
45. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. doi:10.1093/molbev/msr121
46. Tsuneyoshi T, Matsumi T, Natsuaka KT, Sumi S (1998) Nucleotide sequence analysis of virus isolates indicates the presence of three potyvirus species in Allium plants. Arch Virol 143:97–113
47. Webster C, Jones RAC, Coutts BA, Jones MGK, Wylie SJ (2007) Virus impact at the interface of an ancient ecosystem and a recent agroecosystem: studies on three legume-infecting potyviruses in the southwest Australian floristic region. Plant Pathol 56:729–742
48. Westphal MI, Browne M, MacKinnon K, Noble I (2008) The link between international trade and the global distribution of invasive alien species. Biol Invasions 10:391–398
49. Wren JD, Roossinck MJ, Nelson RS, Scheets K, Palmer MW, Melcher U (2006) Plant virus biodiversity and ecology. PLoS Biol 4:e80
50. Wu Q, Luo Y, Lu R, Lau N, Lai EC, Li WX, Ding SW (2010) Virus discovery by deep sequencing and assembly of virus-derived small silencing RNAs. Proc Natl Acad Sci USA 107:1606–1611
51. Wylie SJ, Jones MGK (2011) Characterisation and quantitation of mutant and wild-type genomes of Hardenbergia mosaic virus isolates co-infecting a wild plant of Hardenbergia comptoniana. Arch Virol 156:1287–1290
52. Wylie SJ, Jones MGK (2011) Hardenbergia virus A, a novel member of the Betaflexiviridae from a wild legume in South-west Australia. Arch Virol 156:1245–1250
53. Wylie SJ, Jones MGK (2011) The complete genome sequence of Passionfruit woodiness virus determined using deep sequencing, and its relationship to other potyviruses. Arch Virol 156:479–482
54. Yan F, Zhang HM, Adams MJ, Yang J, Peng JJ, Antoniw JF, Zhou YJ, Chen JP (2010) Characterization of siRNAs derived from rice stripe virus in infected rice plants by deep sequencing. Arch Virol 155:935–940
55. Yeh SD, Gonsalves D, Wang HL, Namba R (1988) Control of papaya ringspot virus by cross protection. Plant Dis 72:375–380
56. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829
57. Zheng HY, Chen J, Zhao MF, Lin L, Chen JP, Antoniw JF, Adams MJ (2003) Occurrence and sequences of Lily mottle virus and Lily symptomless virus in plants grown from imported bulbs in Zhejiang province, China. Arch Virol 148:2419–2428