A local assembly algorithm for description of microRNAs in non-genome organisms

Bastian Fromm1, T. D. Otto2, M. M. Worren3, C. Hahn1, P. D. Harris1, L. Bachmann1

1 Evolutionary Parasitology Group, Natural History Museum, University of Oslo, Norway
2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
3 Bioinformatics core facility, Department of Informatics, University of Oslo, Norway

MiRCandRef microRNA

motivation

I. Overcome the necessity of an existing high-quality genomic reference for miRNA studies
II. Develop a tool that allows the utilization of available sequence data without bias
III. Make miRNA studies cheaper, more accurate and more environmentally friendly, without the need for cluster computing

background

microRNAs
• single-stranded non-coding RNA molecules, 18-24 nucleotides long
• involved in a broad variety of biological processes
• represent the most recently discovered gene regulators (1993)
• regulate gene expression through mRNA destabilization (“silencing”) or translation inhibition
• derived from different genome-encoded hairpin precursors
• highly conserved; during evolution continuously added to and rarely lost from metazoan genomes
• represent candidate phylogenetic markers and have already been used in several phylogenetic debates in which data from morphology and standard molecular markers were controversial
• the rapidly increasing number of published miRNAs will further increase their eligibility for phylogenetic studies

Genomic reads

1 lane: 46,919,223 pairs, 76 bp paired-end (167 bp insert size)

FASTX-Toolkit
• clipping adapters from reads
• trimming nucleotides with quality score less than 33, starting at the end of reads

Remove
• adapter-only reads
• reads shorter than 22 nt after clipping
• reads containing only adaptor sequences
• reads containing N

keep good pairs

39,571,231 genomic read pairs

BLAST

iterator.pl

smallRNA reads

18,439,129 raw reads, 55 bp, single read

FASTX-Toolkit
• clipping adapters from reads

Remove
• reads shorter than 10 nt after clipping
• reads containing only adaptor sequences
• reads containing N
• reads without clipping (= other RNAs)
• reads with quality score less than 33
• reads shorter than 18 nt

14,852,745 microRNA candidate reads

Collapsing identical reads

2,700,720 unique microRNA reads
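The small-RNA filtering and collapsing steps listed above can be illustrated with a short Python sketch. It is not the FASTX-Toolkit implementation: the function name `filter_and_collapse` and the `min_len` parameter are hypothetical, adapter clipping is assumed to have been done already, and only the length and N filters are shown.

```python
from collections import Counter


def filter_and_collapse(reads, min_len=18):
    """Drop reads that contain N or are shorter than min_len, then collapse duplicates.

    `reads` is an iterable of adapter-clipped sequences (an assumption of this sketch).
    Returns a list of (sequence, count) pairs, most abundant first.
    """
    upper = (r.upper() for r in reads)
    kept = (r for r in upper if len(r) >= min_len and "N" not in r)
    counts = Counter(kept)
    # Sort by descending count, then by sequence, so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",    # identical read, collapsed into the first
    "TGAGGTAGTANGTTGTATAGTT",    # contains N, removed
    "ACGTACGT",                  # shorter than 18 nt, removed
    "TCACAACCTCCTAGAAAGAGTAGA",
]
collapsed = filter_and_collapse(reads)
for seq, n in collapsed:
    print(seq, n)
```

Collapsing keeps one copy of every distinct sequence together with its read count, which is why 14,852,745 candidate reads reduce to far fewer unique sequences.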

BLAST all genomic reads to index - blastall -p blastn -a 8 -m 8 -b 300

BLAST output file: multi-column file listing all genomic reads and their mapping to a miRNA; contains only identities, no sequences

FASTA File with ALL confirmed short contigs

MirCandRef



Index of all genomic reads - “formatdb -p F -i”

iterator.pl takes the short contigs and puts them back into the pipeline to extend them

Gyrodactylus (Platyhelminthes; Monogenea)
• 500 μm in size on average
• extremely diverse, distributed worldwide
• ~400 described species of small ectoparasites
• may cause serious disease in teleost fish
• special reproductive mode: (hyper-)viviparity and progenesis
• poorly understood phylogeny

for EACH miRNA candidate

get_miRNA_reads_and_mates.pl
• read the BLAST output file
• get the identities of the genomic reads that match
• keep only those that match the full length of the miRNA with 100% identity (perfect matches)
• find the genomic read hits and their mates in the read file
• create a FASTA file with the hits and their mates

Genomic_Reads_that_hit_and_their_mates.fa
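The selection of perfect matches and their mates can be sketched in Python as below. This is an illustration, not the code of get_miRNA_reads_and_mates.pl. It assumes the standard 12-column tabular BLAST output (`-m 8`) with the miRNA candidate as query and the genomic read as subject, and it assumes that mates are named with `/1` and `/2` suffixes. The function names and the `mirna_lengths` dictionary are hypothetical.

```python
def perfect_hit_reads(blast_lines, mirna_lengths):
    """Return {mirna_id: set(read_ids)} for full-length hits with 100% identity.

    Columns of tabular BLAST output (-m 8): query, subject, % identity,
    alignment length, mismatches, gap openings, q.start, q.end,
    s.start, s.end, e-value, bit score.
    """
    hits = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, aln_len = float(fields[2]), int(fields[3])
        mismatches, gaps = int(fields[4]), int(fields[5])
        full_length = aln_len == mirna_lengths[query]
        if identity == 100.0 and mismatches == 0 and gaps == 0 and full_length:
            hits.setdefault(query, set()).add(subject)
    return hits


def with_mates(read_ids):
    """Add the mate of every read, assuming '/1' and '/2' name suffixes."""
    selected = set()
    for read_id in read_ids:
        base = read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id
        selected.update({base + "/1", base + "/2"})
    return selected


blast_lines = [
    "mir_a\tread7/1\t100.00\t22\t0\t0\t1\t22\t30\t51\t1e-05\t44.1",
    "mir_a\tread9/2\t95.45\t22\t1\t0\t1\t22\t5\t26\t0.002\t36.2",   # mismatch
    "mir_a\tread3/2\t100.00\t18\t0\t0\t1\t18\t1\t18\t0.01\t36.2",   # partial hit
]
hits = perfect_hit_reads(blast_lines, {"mir_a": 22})
selected = sorted(with_mates(hits["mir_a"]))
print(selected)
```

Including the mates matters because a mate that does not itself contain the miRNA still carries flanking sequence that the local assembly can use.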

Gyrodactylus salaris
• reported in Norway since the 1970s (Salmo salar)
• confirmed reports from eastern, northern and southern Europe
• causes severe damage to Atlantic salmon stocks
• many rivers and fish farms infected
• losses of ca. 30 million €/year
• highly disputed aluminium and rotenone treatment in Norway with very limited long-term effects

run_velvet.pl

VELVETH
• run Velveth with optional KMER sizes

VELVETG
• run Velvetg with optimal values for expected coverage (default = “auto”)
• run Velvetg with optimal values for coverage cutoffs from stats.txt

PERL
• find and keep contigs that contain the miRNA candidate

confirmed contigs

PERL
• test if the miRNA hits the contig ±40 nt from the edges, so a hairpin test is possible
• keep the longest contig
• report settings
• send short ones to iterator.pl

META contig(s): confirmed long contig
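The contig test described above can be sketched in Python as follows. This is an illustration, not the code of run_velvet.pl. It reads “±40 nt from the edges” as “at least 40 nt of flanking sequence on both sides of the miRNA”, searches both strands, and treats contigs that contain the miRNA but fail the flank test as the short ones to be iterated. All three points are assumptions, and the function names are hypothetical.

```python
FLANK = 40  # nt of flanking sequence required on each side of the miRNA

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mirna_far_from_edges(contig, mirna, flank=FLANK):
    """True if the miRNA (either strand) lies >= flank nt from both contig ends."""
    for query in (mirna, revcomp(mirna)):
        start = contig.find(query)
        while start != -1:
            end = start + len(query)
            if start >= flank and len(contig) - end >= flank:
                return True
            start = contig.find(query, start + 1)
    return False


def split_contigs(contigs, mirna, flank=FLANK):
    """Return (longest confirmed contig or None, contigs to send back for extension)."""
    confirmed = [c for c in contigs if mirna_far_from_edges(c, mirna, flank)]
    short = [
        c for c in contigs
        if (mirna in c or revcomp(mirna) in c) and c not in confirmed
    ]
    longest = max(confirmed, key=len) if confirmed else None
    return longest, short


mirna = "TGAGGTAGTAGGTTGTATAGTT"
long_contig = "A" * 45 + mirna + "C" * 50
short_contig = "A" * 10 + mirna + "C" * 60   # miRNA too close to the left edge
unrelated = "G" * 120                        # does not contain the miRNA
longest, short = split_contigs([long_contig, short_contig, unrelated], mirna)
print(len(longest), len(short))
```

The flank requirement ensures that enough sequence surrounds the miRNA for the precursor hairpin to be folded and tested later.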

FASTA file with ALL confirmed long contigs

compare_contigs.pl
• cluster contigs (CD-Hit-EST)
• check if contigs contain the miRNA they have been built from

FASTA file with ALL confirmed long, clustered contigs

MirDEEP2

Platyhelminths: genome projects and miRNA numbers

[Figure: phylogeny of Platyhelminthes. “Turbellaria”: Macrostomida (e.g. Macrostomum lignano), Polycladida, Seriata (e.g. Schmidtea mediterranea). Neodermata: “Monogenea” (Gyrodactylus salaris), Trematoda (e.g. Schistosoma mansoni), Cestoda (e.g. Echinococcus granulosus). Values shown in the figure: ?, ???, 148, 55, 23.]



Several studies have shown the importance of miRNAs as phylogenetic markers. Furthermore, Platyhelminthes, and especially the parasitic groups within the flatworms (see Neodermata), experienced a dramatic loss of miRNAs, one of very few known examples. Within Neodermata, genomes and miRNA complements are available for all major groups except the monogeneans. Therefore, the economically important and biologically outstanding parasite Gyrodactylus salaris was chosen for our approach with NGS methods.

bowtie-build contigs.fa contigs
mapper.pl miRNA_Reads.fa -c -j -l 18 -m -p contigs -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
miRDeep2.pl reads_collapsed.fa contigs.fa reads_collapsed_vs_genome.arf mature_ref_this_species_clean.fa mature_ref_other_species_clean.fa precursors_ref_this_species_clean.fa > report.log

Results and Discussion

miRNA complement of Gyrodactylus

Clade | miRNA families | expected | Gyrodactylus | Schmidtea | Schistosoma | Echinococcus
Eumetazoa | (10, 51, 99, 100, 125/lin4, 993) | 1 | 1 | 1 | 1 | 1
Nephrozoa | (let7, 84, 98, 966, 2354), 7, (8, 141, 200, 236, 429), (22, 745, 980), (29, 83, 285, 746), 33, 71, (96, 182, 183, 228, 263), (125, lin4), 133, (137, 234), 153, 184, (50, 190), 193, 210, (216, 283, 304, 747), 242, 278, 281, 315, 365, 375, 2001 | 24 | 14 | 15 | 6 | 8
Protostomia | (Bantam, 80, 81, 82), (2, 13), 12, 36, (67, 307), (76, 981), 87, 277, (61, 279, 996), 317, 750, (958, 1175), 1993 | 13 | 8 | 12 | 6 | 3
Triploblastica | (1, 206), (31, 72), (34, 449), (4, 9 = 79, 244), (25, 92, 235, 310, 311, 313, 363), 124, 219, (252a, 252b) | 8 | 4 | 6 | 4 | 3

• MiRCandRef finds a surprisingly high number of miRNA families in Gyrodactylus salaris compared with the other Platyhelminthes. This suggests that Monogenea are basal and, more importantly, that the major evolutionary events in the miRNA complement took place not at the base of Neodermata but at the base of Trematoda+Cestoda.
• In total, around 80 miRNAs have been found and are currently subject to confirmation experiments in the wet lab.

1. Bakke T, Cable J, Harris P: The biology of gyrodactylid monogeneans: the "Russian-doll killers". Adv. Parasitol. 2007, 64:161-376.
2. Ambros V: The functions of animal microRNAs. Nature 2004, 431(7006):350-355.
3. Erwin DH et al.: The Cambrian Conundrum: Early Divergence and Later Ecological Success in the Early History of Animals. Science 2011, 334:1091. doi:10.1126/science.1206375
4. Fromm B, Harris PD, Bachmann L: MicroRNA preparations from individual monogenean Gyrodactylus salaris - a comparison of six commercially available totalRNA extraction kits. BMC Research Notes 2011, 4:217. doi:10.1186/1756-0500-4-217
5. Friedländer MR, Mackowiak SD, Li N, Chen W, Rajewsky N: miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012, 40(1):37-52. doi:10.1093/nar/gkr688