A local assembly algorithm for description of microRNAs in non-genome organisms
Bastian Fromm1, T. D. Otto2, M. M. Worren3, C. Hahn1, P. D. Harris1, L. Bachmann1
1 Evolutionary Parasitology Group, Natural History Museum, University of Oslo, Norway
2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
3 Bioinformatics Core Facility, Department of Informatics, University of Oslo, Norway
MiRCandRef microRNA
Motivation
I. Overcome the necessity of an existing high-quality
smallRNA reads: 18,439,129 raw reads (55 bp, single read)
Genomic reads: 1 lane, 46,919,223 pairs (76 bp paired-end, 167 bp insert size)

smallRNA reads, FASTX-Toolkit:
• clipping adapters from reads
Remove:
• reads shorter than 10 nt after clipping
• reads containing only adaptor sequences
• reads containing N
• reads without clipping (= other RNAs)
• reads with quality score less than 33
• reads shorter than 18 nt
14,852,745 microRNA candidate reads
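The removal criteria for clipped smallRNA reads can be sketched in a few lines of Python. This is only an illustration and not the FASTX-Toolkit itself: the function name `keep_small_rna_read`, its parameters, and the reading of "quality score less than 33" as "any base below the threshold" are assumptions made for the example.

```python
def keep_small_rna_read(seq, quals, was_clipped, min_len=18, min_qual=33):
    """Return True if an adapter-clipped read passes the listed removal criteria.

    seq         -- read sequence after adapter clipping
    quals       -- per-base quality scores (same length as seq)
    was_clipped -- True if an adapter was found and removed
    All names and the per-base quality rule are illustrative assumptions.
    """
    if not was_clipped:
        # Reads without adapter are treated as other RNAs and removed.
        return False
    if len(seq) < min_len:
        # Covers adapter-only reads (empty after clipping) and short reads.
        return False
    if "N" in seq.upper():
        return False
    if any(q < min_qual for q in quals):
        return False
    return True
```

A read then enters the candidate set only if `keep_small_rna_read(...)` returns `True`; the thresholds are exposed as parameters so that other cutoffs can be tried.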
Genomic reads
FASTX-Toolkit Remove
genomic reference for miRNA studies.
II. Develop a tool that allows the use of available sequence data without bias.
III. Make miRNA studies cheaper, more accurate, independent of cluster computing, and more environmentally friendly.
smallRNA reads
• Collapsing identical reads
background
2,700,720 unique microRNA reads
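Collapsing identical reads amounts to counting how often each distinct sequence occurs. A minimal Python sketch follows; the function name `collapse_reads` is hypothetical and the output is a plain list rather than the file format of any particular tool.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical read sequences.

    Returns a list of (sequence, count) tuples, most abundant first.
    Sequences with equal counts keep their order of first appearance.
    """
    return Counter(reads).most_common()
```

For example, `collapse_reads(["AAA", "CCC", "AAA"])` yields one entry per unique sequence together with its read count.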
Genomic read cleaning:
• clipping adapters from reads
• trimming nucleotides with quality score less than 33 from reads, starting at the end of reads
Remove:
• adapter-only reads
• reads shorter than 22 nt after clipping
• reads containing only adaptor sequences
• reads containing N
Keep good pairs: 39,571,231 genomic read pairs
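The trimming and pair-filtering logic for the genomic reads might look roughly like the following Python sketch. It is not the FASTX-Toolkit; the function names `trim_3prime` and `keep_good_pairs`, the in-memory pair representation, and the default thresholds are assumptions chosen to mirror the steps listed above.

```python
def trim_3prime(seq, quals, min_qual=33):
    """Trim bases from the read end while their quality is below min_qual."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


def keep_good_pairs(pairs, min_len=22, min_qual=33):
    """Keep only pairs in which BOTH mates survive trimming and filtering.

    pairs -- iterable of ((seq1, quals1), (seq2, quals2)); illustrative layout.
    Returns a list of (trimmed_seq1, trimmed_seq2).
    """
    kept = []
    for (seq1, quals1), (seq2, quals2) in pairs:
        t1, _ = trim_3prime(seq1, quals1, min_qual)
        t2, _ = trim_3prime(seq2, quals2, min_qual)
        # A pair is dropped as soon as one mate is too short or contains N.
        if all(len(t) >= min_len and "N" not in t.upper() for t in (t1, t2)):
            kept.append((t1, t2))
    return kept
```

Filtering at the level of pairs rather than single reads keeps mate information intact, which the later step of retrieving hits together with their mates relies on.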
BLAST
iterator.pl
microRNAs
• single-stranded non-coding RNA molecules, 18-24 nucleotides long
• involved in a broad variety of biological processes
• represent the most recently discovered gene regulators (1993)
• regulate gene expression through mRNA destabilization (“silencing”) or translation inhibition
• derived from different genome-encoded hairpin precursors
• highly conserved; during evolution continuously added to and rarely lost from metazoan genomes
• represent candidate phylogenetic markers and have already been used in several phylogenetic debates with controversial data from morphology and standard molecular markers
• the rapidly increasing number of published miRNAs will further increase their eligibility for phylogenetic studies
BLAST all genomic reads to index - blastall -p blastn -a 8 -m 8 -b 300
BLAST output file: multi-column file listing all genomic reads and their mapping to a miRNA; contains only read identities, no sequences
FASTA File with ALL confirmed short contigs
MiRCandRef
Index of all genomic reads - “formatdb -p F -i”
takes the short contigs and feeds them back into the pipeline to extend them
Gyrodactylus (Platyhelminthes; Monogenea)
• 500 μm in size on average
• extremely diverse, distributed worldwide
• ~400 described small ectoparasitic species
• may cause serious diseases in teleost fish
• special reproductive mode: (hyper-)viviparity and progenesis
• poorly understood phylogeny
For EACH miRNA candidate: get_miRNA_reads_and_mates.pl
• read BLAST output file
• get the genomic read identities that match
• filter for those that match the full length of the miRNA with 100% identity (perfect matches)
• find genomic read hits and their mates in the read file
• create FASTA file with hits and their mates
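The selection of perfect, full-length hits and their mates can be illustrated in Python. This sketch is not get_miRNA_reads_and_mates.pl. It assumes the standard 12-column tabular output produced by `-m 8`, that miRNA candidates are the BLAST queries and genomic reads the subjects, and that mates share a read name ending in `/1` and `/2`; the function names `perfect_hits` and `with_mates` are hypothetical.

```python
def perfect_hits(blast_lines, mirna_lengths):
    """Collect genomic reads that match a miRNA over its full length at 100%.

    blast_lines   -- lines of tabular BLAST output (12 tab-separated columns)
    mirna_lengths -- dict: miRNA id -> sequence length
    Returns dict: miRNA id -> set of matching read ids.
    """
    hits = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        aln_len, mismatches, gaps = int(fields[3]), int(fields[4]), int(fields[5])
        full_length = aln_len == mirna_lengths[query]
        if identity == 100.0 and mismatches == 0 and gaps == 0 and full_length:
            hits.setdefault(query, set()).add(subject)
    return hits


def with_mates(read_ids):
    """Add the mate of every read id (assumed '/1' and '/2' suffix convention)."""
    ids = set()
    for read_id in read_ids:
        base = read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id
        ids.update({base + "/1", base + "/2"})
    return ids
```

The resulting id set per miRNA candidate is what would then be looked up in the read file to write the FASTA file of hits and mates.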
Genomic_Reads_that_hit_and_their_mates.fa
run_velvet.pl (Perl)
• find and keep contigs that contain the miRNA candidate
Gyrodactylus salaris
VELVETH: run velveth with optional k-mer sizes
• reported in Norway since the 1970s (Salmo salar)
• confirmed reports from eastern, northern and southern Europe
• causes severe damage to Atlantic salmon stocks
• many rivers and fish farms infected
• losses of ca. 30 million €/year
• highly disputed aluminum and rotenone treatment in Norway with very limited long-term effects
confirmed contigs
VELVETG: run velvetg with optimal
• values for expected coverage (default = “auto”)
• values for coverage cutoffs from stats.txt
Perl
• test if the miRNA hits the contig ±40 nt from the edges, so that a hairpin test is possible
• keep the longest contig
• report settings
• send short ones to Iterate.pl
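The check that a contig contains the miRNA far enough from its edges can be sketched in Python. The sketch reads "±40 nt from the edges" as "at least 40 nt of flanking sequence on both sides of the miRNA", which is an interpretation; the function names and the exact-match search on both strands are likewise assumptions, not the poster's Perl code.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mirna_position(contig, mirna):
    """Return the 0-based start of the miRNA in the contig, or -1 if absent.

    Both strands are searched with exact matching; U is read as T.
    """
    contig = contig.upper()
    mirna = mirna.upper().replace("U", "T")
    pos = contig.find(mirna)
    if pos == -1:
        pos = contig.find(revcomp(mirna))
    return pos


def hairpin_test_possible(contig, mirna, flank=40):
    """True if the miRNA lies at least `flank` nt away from both contig edges."""
    pos = mirna_position(contig, mirna)
    if pos == -1:
        return False
    right_flank = len(contig) - (pos + len(mirna))
    return pos >= flank and right_flank >= flank
```

Contigs failing this test would be the short ones that are sent back for another round of extension.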
META Contig(s) confirmed long contig
Platyhelminths Genome projects and miRNA Numbers “Turbellaria”
Macrostomida
?
FASTA File with ALL confirmed long contigs
e.g. Macrostomum lignano
compare_contigs.pl
Polycladida Seriata
???
148
• cluster contigs (CD-HIT-EST)
• check if contigs contain the miRNA they have been built from
eg Schmidtea mediterranea
Neodermata
FASTA file with ALL confirmed long, clustered contigs
“Monogenea” Trematoda Cestoda
Gyrodactylus salaris 55 23
miRDeep2
eg Schistosoma mansoni
eg Echinococcus granulosus
Several studies have shown the importance of miRNAs as phylogenetic markers. Furthermore, Platyhelminthes, and especially the parasitic groups within the flatworms (see Neodermata), experienced a dramatic loss of miRNAs, one of very few such examples known. Within Neodermata, genomes and miRNA complements are available for the major groups except for the monogeneans. Therefore the economically important and biologically outstanding parasite Gyrodactylus salaris was chosen for our approach with NGS methods.
Results and Discussion

miRNA complement of Gyrodactylus

miRNA families: expected / Gyrodactylus / Schmidtea / Schistosoma / Echinococcus

Eumetazoa: (10, 51, 99, 100, 125/lin4, 993)
1 / 1 / 1 / 1 / 1

Nephrozoa: (let7, 84, 98, 966, 2354), 7, (8, 141, 200, 236, 429), (22, 745, 980), (29, 83, 285, 746), 33, 71, (96, 182, 183, 228, 263), (125, lin4), 133, (137, 234), 153, 184, (50, 190), 193, 210, (216, 283, 304, 747), 242, 278, 281, 315, 365, 375, 2001
24 / 14 / 15 / 6 / 8

Protostomia: (Bantam, 80, 81, 82), (2, 13), 12, 36, (67, 307), (76, 981), 87, 277, (61, 279, 996), 317, 750, (958, 1175), 1993
13 / 8 / 12 / 6 / 3

Triploblastica: (1, 206), (31, 72), (34, 449), (4, 9 = 79, 244), (25, 92, 235, 310, 311, 313, 363), 124, 219, (252a, 252b)
8 / 4 / 6 / 4 / 3

bowtie-build contigs.fa contigs
mapper.pl miRNA_Reads.fa -c -j -l 18 -m -p contigs -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
miRDeep2.pl reads_collapsed.fa contigs.fa reads_collapsed_vs_genome.arf mature_ref_this_species_clean.fa mature_ref_other_species_clean.fa precursors_ref_this_species_clean.fa > report.log

• MiRCandRef finds a surprisingly high number of miRNA families in Gyrodactylus salaris in comparison to the other Platyhelminthes. This suggests that Monogenea are basal and, more importantly, that major evolutionary events in the miRNA complement occurred not at the base of Neodermata but at the base of Trematoda + Cestoda.
• In total, around 80 miRNAs have been found and are currently subject to confirmation experiments in the wet lab.

1. Bakke T, Cable J, Harris P: The biology of gyrodactylid monogeneans: the "Russian-doll killers". Adv. Parasitol. 2007, 64:161-376.
2. Ambros V: The functions of animal microRNAs. Nature 2004, 431(7006):350-355.
3. Erwin DH et al.: The Cambrian Conundrum: Early Divergence and Later Ecological Success in the Early History of Animals. Science 2011, 334:1091. doi:10.1126/science.1206375
4. Fromm B, Harris PD, Bachmann L: MicroRNA preparations from individual monogenean Gyrodactylus salaris - a comparison of six commercially available totalRNA extraction kits. BMC Research Notes 2011, 4:217. doi:10.1186/1756-0500-4-217
5. Friedländer MR, Mackowiak SD, Li N, Chen W, Rajewsky N: miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012, 40(1):37-52. doi:10.1093/nar/gkr688