SocBiN/BIT13 Book of Abstracts
26-29 June 2013, Torun, Poland

Program Committee:
Prof. Janusz Bujnicki (International Institute of Molecular and Cell Biology in Warsaw and Adam Mickiewicz University in Poznan, Poland)
Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)
Prof. Arne Elofsson (Stockholm University, Sweden)

Local Organizing Committee:
Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)
Dr. Witold Rudnicki (Warsaw University, Poland)
Dr. Łukasz Pepłowski (Nicolaus Copernicus University, Torun, Poland)
Karolina Mikulska, MSc (Nicolaus Copernicus University, Torun, Poland)
Anna Gogolińska, MSc (Nicolaus Copernicus University, Torun, Poland)
Rafał Jakubowski, MSc Eng (Nicolaus Copernicus University, Torun, Poland)
Marcin Dąbrowski, MSc Eng (Nicolaus Copernicus University, Torun, Poland)


Table of contents

Keynote lecture: Søren Brunak
Session I: Genomics/metagenomics
Lecture 1: Karoline Faust
Lecture 2: Jarone Pinhassi
Lecture 3: Dimiter Dimitrov
Lecture 4: Duncan Odom
Lecture 5: Marcin Wąsowski
Lecture 6: Michał Kierzynka
Session II: RNA bioinformatics
Lecture 1: Jan Gorodkin
Lecture 2: David H. Mathews
Lecture 3: Radhakrishnan Sabarinathan
Lecture 4: Ivo Hofacker
Lecture 5: Grzegorz Chojnowski
Lecture 6: Česlovas Venclovas
Session III: Protein evolution and structure
Lecture 1: Nick Grishin
Lecture 2: Rob Russell
Lecture 3: Arne Elofsson
Lecture 4: Birte Höcker
Lecture 5: Jens Meiler
Lecture 6: Guido Capitani
Session IV: Protein disorder
Lecture 1: Philip M. Kim
Lecture 2: Marija Buljan
Lecture 3: Łukasz Kozłowski
Lecture 4: Oxana V. Galzitskaya
Lecture 5: Malgorzata Kotulska
Lecture 6: Vikram Alva Kullanja
Session V: Biomacromolecular simulations
Lecture 1: Valerie Daggett
Lecture 2: Johan Åqvist
Lecture 3: Andrzej Koliński
Lecture 4: Bert de Groot
Lecture 5: Sławomir Filipek
Lecture 6: Jarek Meller
Session VI: Systems biology
Lecture 1: Dennis Bray
Lecture 2: Edward M. Marcotte
Lecture 3: Stanislaw Dunin-Horkawicz
Lecture 4: Lars J. Jensen
Lecture 5: Teresa Przytycka
Lecture 6: Ewa Szczurek
Closing lecture: Piotr Zielenkiewicz
Posters
Poster 1: Reidar Andresson
Poster 2: Piotr Bentkowski
Poster 3: Paweł Błażej
Poster 4: Michał J. Boniecki
Poster 5: Marcin Borowski
Poster 6: Maciej Bratek
Poster 7: Marcin Dąbrowski
Poster 8: Michal Dabrowski
Poster 9: Mateusz Dobrychłop
Poster 10: Finn Drabløs
Poster 11: Małgorzata Dudkiewicz
Poster 12: Przemysław Gagat
Poster 13: Wiktoria Giedroyć-Piasecka
Poster 14: Tomasz Głowacki
Poster 15: Anna Gogolinska
Poster 16: Małgorzata Grabińska
Poster 17: Aleksandra Gruca
Poster 18: Md. Anayet Hasan
Poster 19: Marta Iwanaszko
Poster 20: Rafał Jakubowski
Poster 21: Witold Januszewski
Poster 22: Barbara Kalinowska
Poster 23: Joanna M. Kasprzak
Poster 24: Paweł Kędzierski
Poster 25: Boguslaw Kluge
Poster 26: Mateusz Korycinski
Poster 27: Joanna Kowalska
Poster 28: Adam Kozak
Poster 29: Karina Kubiak-Ossowska
Poster 30: Tadeusz Kulinski
Poster 31: Mateusz Kurciński
Poster 32: Kamil Kwarciak
Poster 33: Dorota Latek
Poster 34: Filip Leonarski
Poster 35: Grzegorz Łach
Poster 36: Michał Łaźniewski
Poster 37: Magdalena Machnicka
Poster 38: Paweł Mackiewicz
Poster 39: Dorota Mackiewicz
Poster 40: Marcin Magnus
Poster 41: Jan Majta
Poster 42: Damian Marchewka
Poster 43: Grzegorz Markowski
Poster 44: Karolina Mikulska-Rumińska
Poster 45: Maciej Miłostan
Poster 46: Przemysław Miszta
Poster 47: Krzysztof Murzyn
Poster 48: Linh Nguyen Tuyet
Poster 49: Anna Olchowik
Poster 50: Kliment Olechnovič
Poster 51: Linus J Östberg
Poster 52: Marcin Pacholczyk
Poster 53: Wiesław Paja
Poster 54: Chandra Pareek
Poster 55: Lukasz Peplowski
Poster 56: Paweł Piątkowski
Poster 57: Dariusz Plewczyński
Poster 58: Leszek Pryszcz
Poster 59: Tomasz Puton
Poster 60: Tomasz Ratajczak
Poster 61: Witold Rudnicki
Poster 62: Joanna Sarzyńska
Poster 63: Alex Shchukin
Poster 64: Beata Sokołowska
Poster 65: Michał Staniszewski
Poster 66: Kamil Steczkiewicz
Poster 67: Robert Szczelina
Poster 68: Krzysztof Szczepaniak
Poster 69: Małgorzata Szeląg
Poster 70: Jacek Śmietański
Poster 71: Corinna Theis
Poster 72: Fabio Trovato
Poster 73: Radosław Urbaniak
Poster 74: Jacek Wabik
Poster 75: Tomasz Waleń
Poster 76: Tomasz Woźniak
Poster 77: Shilpa Yadahalli
Poster 78: Rafał Zaborowski

Keynote lecture: Søren Brunak


Systems chemical analysis of adverse drug reactions extracted from electronic patient records
Søren Brunak
Technical University of Denmark & University of Copenhagen

Worldwide, the healthcare sector is confronted with the problem of underreported adverse drug reactions (ADRs) resulting from treatment. At the same time, it is a fundamental issue to resolve whether specific ADRs stem from variation in the individual genome of a patient, from drug/environment cocktail effects, or both. Towards these goals we have developed a text mining pipeline for temporal analysis of electronic patient records, identifying ADRs directly from the free-text narratives that describe patient disease trajectories over time. Electronic patient records remain a rather unexplored but potentially rich data source for discovering correlations between diseases, drugs and genetic information. Linking these data is a huge undertaking that will soon represent a major challenge, given that it has now become feasible to sequence the DNA of entire populations at low cost. Combining molecular-level data with clinical information and data on the chemical environment may add complementary types of knowledge that can reveal disease mechanisms and drug reaction patterns in novel ways.

We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort-dependent manner. By extracting phenotype information and information on adverse drug reactions from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the WHO International Classification of Disease ontology and is therefore in principle language independent. We subsequently make systems-level analyses of hospital-wide drug administration. We characterize the similarity of ADR profiles of approved drugs using drug-ADR networks and report on the relationship between the chemical similarity of drugs and their ADRs.

[1] Roque FS, Jensen PB, Schmock H, Dalgaard M, Andreatta M, Hansen T, Søeby K, Bredkjær S, Juul A, Werge T, Jensen LJ, Brunak S. Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Comput Biol. 2011 Aug;7(8):e1002141.
[2] Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics, 13, 395-405, 2012.
[3] Eriksson R, Jensen PB, Frankild S, Jensen LJ, Brunak S. Dictionary construction and identification of possible adverse drug events in Danish clinical narrative text. J Am Med Inform Assoc. 2013 May 23.
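
The idea of comparing ADR profiles in a drug-ADR network can be illustrated with a small sketch. The abstract does not state which similarity measure was used, so the Jaccard index below is an assumption, and the drug names, ADR terms and the 0.3 threshold are invented purely for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared ADRs divided by all ADRs seen for either drug."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy, invented drug -> ADR sets (not real pharmacovigilance data).
adr_profiles = {
    "drug_A": {"nausea", "headache", "rash"},
    "drug_B": {"nausea", "headache", "dizziness"},
    "drug_C": {"insomnia"},
}


def adr_similarity_network(profiles, threshold=0.3):
    """Return edges between drugs whose ADR profiles overlap strongly.

    The threshold is an arbitrary illustrative cutoff.
    """
    edges = {}
    for d1, d2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[d1], profiles[d2])
        if score >= threshold:
            edges[(d1, d2)] = score
    return edges


network = adr_similarity_network(adr_profiles)
```

In a real analysis, such edge weights could then be compared against a chemical similarity score for the same drug pairs; that comparison is not attempted here.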


Session I: Genomics/metagenomics
Chair: Bengt Persson (Uppsala University & Karolinska Institutet, Sweden)


Lecture 1: Karoline Faust
Detection of microbial relationships from metagenomics data using network inference
Karoline Faust
*presenting author, e-mail: [email protected]

Microorganisms form complex communities featuring relationships such as symbiosis, parasitism and competition. Such ecological interactions as well as niche preferences and random factors shape the abundances of microorganisms. The advent of metagenomics nowadays allows measuring the abundance of hundreds of microorganisms across many samples. This situation has parallels to microarray analysis, where the expression of thousands of genes is measured across different conditions or time points. In genomics, a number of techniques have been developed to infer gene regulatory networks. Likewise, the massive metagenomics data available today enable the inference of microbial association networks. I will present a microbial network inference pipeline and its application to various metagenomics data sets sampled in environments ranging from the human body to the ocean.
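
As a rough illustration of what inferring a microbial association network from abundance data can look like, the sketch below thresholds pairwise Spearman correlations across samples. This is a generic stand-in, not the pipeline presented in the talk, which the abstract does not describe. The taxon names, abundance values and the 0.8 cutoff are invented.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented abundances of four taxa across five samples.
abundance = {
    "taxon_A": [10, 20, 30, 40, 50],
    "taxon_B": [12, 18, 33, 41, 60],
    "taxon_C": [50, 40, 30, 20, 10],
    "taxon_D": [5, 9, 4, 8, 6],
}


def association_network(table, threshold=0.8):
    """Keep taxon pairs whose |Spearman rho| reaches the (arbitrary) cutoff."""
    edges = []
    for a, b in combinations(sorted(table), 2):
        rho = spearman(table[a], table[b])
        if abs(rho) >= threshold:
            edges.append((a, b, "positive" if rho > 0 else "negative"))
    return edges


edges = association_network(abundance)
```

Plain correlation thresholding ignores issues such as the compositional nature of abundance data and multiple testing, so it should be read only as a starting point.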


Lecture 2: Jarone Pinhassi
Uncovering novel ecosystem functions of marine bacteria through genomics approaches
Jarone Pinhassi
Linnaeus University, Kalmar, Sweden
*presenting author, e-mail: [email protected]

Dissolved organic carbon (DOC) derived from phytoplankton photosynthesis represents the main biologically available organic carbon pool in the ocean. Current estimates indicate that marine microscopic algae (phytoplankton) carry out half of the Earth’s photosynthesis, and around 50% of the organic carbon produced is channelled through bacterioplankton. Through CO2-uptake during phytoplankton photosynthesis and CO2 release primarily through bacterial respiration, these microbes collectively have a crucial impact on the CO2-flux balance between the atmosphere and the ocean. Bacteria play a key role in the turnover of DOC since they are the predominant organisms that readily assimilate or transform this source of reduced carbon in the sea. However, only very recently has it become technically possible to uncover the molecular mechanisms by which bacteria regulate the biogeochemical cycling of carbon and nutrients. The application of metagenomics approaches to seawater samples during the last decade has produced important new knowledge on the spatial and temporal variability in distributions of previously recognized as well as novel metabolisms; the latter including for example the widespread distribution of a potential for alternative harvesting of energy from sunlight through the membrane-located light driven proton pump protein proteorhodopsin. Ecophysiological response experiments with sequenced marine bacterial isolates guided by genomics-informed hypotheses have provided direct evidence for the role of proteorhodopsins in providing energy to marine microorganisms. Notably these analyses show that light-harvesting can improve different aspects of bacterial fitness, including both increased growth and prolonged survival during starvation, depending on the genomic architecture of the organisms. 
Currently, transcriptomics and proteomics (meta-)approaches applied to seawater samples are increasingly used to identify and constrain the activities of different microbes in their natural environment. To be successful, these attempts rely heavily on the development of novel approaches to integrate biochemical, genetic, genomic and ecological information from organisms ranging from single microbial species to mixed natural assemblages. Thus, increased interdisciplinary interactions have great potential to provide the tools to uncover the microbial molecular mechanisms that to a dominant degree regulate biogeochemical cycles and global climate.


Lecture 3: Dimiter Dimitrov
Systems Biology and Systems Medicine of the Oral Microbiome
Dimiter V. Dimitrov 1* and Julia Hoeng 2
1 Diavita Ltd, Varna, Bulgaria
2 Philip Morris International R&D, Switzerland
*presenting author; e-mail: [email protected]

There is wide agreement on the etiological role of oral bacteria in human disease. A huge range of chemical and biological interactions exists among bacteria and between bacteria and the host. Systems biology, and in particular functional metagenomics, as well as salivary proteomics and metabolomics, are excellent tools for probing this capacity, both to achieve a complete understanding of oral microbial ecology and to discover new activities useful for biotechnological, industrial and medical applications. Periodontal medicine defines a rapidly emerging branch of periodontology, focusing on the wealth of new data and establishing a strong relationship between periodontal health and disease. There is increasing evidence that individuals with periodontal disease may be at increased risk for adverse medical outcomes such as respiratory disease, cardiovascular disease, diabetes, obesity, rheumatoid arthritis, inflammatory bowel disease, and Alzheimer's disease. The role of confounding and effect modification in the association between the oral microbiome and systemic diseases is discussed.


Lecture 4: Duncan Odom
Evolution of Transcription Factor Binding in Closely Related Mammals
Duncan T. Odom
University of Cambridge, Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, United Kingdom

Changes in gene regulation ultimately lead to changes in organismal phenotypes, yet the mechanisms underlying this process remain poorly explored in mammals. To address this, we used six closely related rodent species to compare the genome-wide binding of three tissue-specific transcription factors (TFs) that control liver gene expression. Our data revealed thousands of regions bound by TFs with highly constrained intensity, despite an overall fast regulatory turnover. While individual mutations in bound sequence motifs can influence TF binding, most interspecies binding changes occur in the absence of nearby sequence variations. TF clusters are more robust to sequence variation, and this evolutionary stability is partly dependent on the context of nearby bound factors. Combinatorial binding also disappears in concert, which is reflected in genetic stability: co-bound transcription factors were sensitive to genetic knock-out of partner TFs. Coordinated, quantitative changes in transcription factor binding appear to be a unifying mechanism of eukaryotic transcriptional evolution.


Lecture 5: Marcin Wąsowski
De novo gene structure prediction using machine learning approach
Marcin Wąsowski *a, Rafał Pokrzywa a, Michał Wojciech Szcześniak b, Izabela Makałowska b
a) Silesian University of Technology, Institute of Informatics
b) Adam Mickiewicz University, Institute of Molecular Biology and Biotechnology, Laboratory of Bioinformatics
*presenting author, e-mail: [email protected]

Accurate splice site prediction is a critical component of any computational approach to gene prediction in higher organisms, because knowing a gene's exon-intron structure is key to understanding its function and evolution. Unfortunately, modern RNA-Seq based methods for gene prediction perform poorly at gene structure prediction, although they show satisfactory sensitivity and specificity in predicting new genes in the genome [1][2]. Our new approach to splice site prediction uses machine learning to identify splice sites on a genome scale. Our classifier will be built on, among others, splice-site associated features that have not been considered before in this task. For instance, we are going to apply features generated from 7- and 8-mers that are overrepresented in intronic sequences and may therefore be essential in the process of splicing [3]. We are also going to use features associated with the secondary structure of splice site sequences, such as Folding Optimal Energy (FOE) and Max Helix (MH) [1][3]. Additionally, a number of machine learning methods are going to be tested with different settings in order to select the optimal classifier. We are going to generate separate classifiers for several model plant and animal species, including Homo sapiens and Arabidopsis thaliana. We will use the classifiers, and then a tool built on them, in large-scale experiments focused on miRNA gene structure prediction, but the tool will also be suitable for splice site prediction in any gene. Finally, we are going to apply our classifiers to enhance the performance of current RNA-Seq based methods of gene prediction.

Acknowledgements: This work was supported by the European Union from the European Social Fund (grant agreement number: UDA-POKL.04.01.01-00-106/09)

Keywords: microRNA, splice site, classifier, secondary structure.

[1] Donald J. Patterson, Ken Yasuhara, Walter L. Ruzzo, “PRE-mRNA Secondary Structure Prediction Aids Splice Site Prediction” Pacific Symposium on Biocomputing, 2002, 7:223-234 [2] M. Pertea, X. Lin, and S.L. Salzberg. “GeneSplicer: New Computational Method for Splice Site Prediction” Nucleic Acids Research, 2001, 29(5):1185–1190 [3] Andigoni Malousi, Ioanna Chouvarda, Vassilis Koutkias, Sofia Kouidou, Nicos Maglaveras “SpliceIT: A hybrid method for splice signal identification based on probabilistic and biological inference”, Journal of Biomedical Informatics, 2010, 43: 208–217
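The k-mer features mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the pseudocount and the log-odds scoring are assumptions chosen for illustration. It ranks k-mers by how much more frequent they are in a foreground set (for example intronic sequences) than in a background set, and then turns a sequence window into a count vector over the selected k-mers.

```python
from collections import Counter
from math import log2


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds_enrichment(foreground, background, k, pseudocount=1.0):
    """Log2 ratio of k-mer frequencies in foreground versus background.

    The pseudocount (an illustrative choice) keeps k-mers that are
    absent from one set from producing infinite scores.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        kmer: log2(((fg[kmer] + pseudocount) / fg_total)
                   / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in kmers
    }


def kmer_features(window, selected_kmers, k):
    """Feature vector: occurrences of each selected k-mer in one window."""
    counts = kmer_counts([window], k)
    return [counts[kmer] for kmer in selected_kmers]
```

In practice one would call `log_odds_enrichment` with k = 7 or 8 on real intronic and background sequences, keep the top-scoring k-mers, and feed `kmer_features` vectors to whichever classifier is being tested.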


Lecture 6: Michał Kierzynka Sequence alignment on GPUs for the DNA de-novo assembly problem Michał Kierzynka *ab, Wojciech Frohmberg a, Jacek Błażewicz ab, Paweł Wojciechowski a a Institute of Computing Science, Poznan University of Technology, Poznan, Poland b Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland *presenting author, e-mail: [email protected]

Sequencing has recently become a primary method used by life scientists to investigate biologically relevant problems. As modern sequencers can only read very short fragments of DNA strands, an algorithm is needed to assemble them into the desired final sequence. When the original sequence is not known beforehand, this process is called DNA de-novo assembly. Several methods address this problem. However, one of them, based on the overlap-layout-consensus approach, has recently been almost entirely displaced despite its accuracy, because of its running time, especially given the constantly increasing number of sequences. In response, we propose a new scheme that is based on this classical approach but is able to handle the large data sets produced by new sequencing machines. First, the very large number of reads inherent to Next Generation Sequencing means that a highly efficient read alignment algorithm is required to obtain results in a reasonable time. For this reason, building on our prior experience with sequence alignment [1-3], we started with an ultra-efficient GPU-based implementation of the Needleman-Wunsch algorithm optimized for nucleotide sequences. We use this application to select similar sequences from the whole data set with the help of several heuristics, including alignment-free sequence comparison and information about paired-end reads. These help us to tackle the big data challenge. The selected similar sequences are then used to construct a graph model for the DNA assembly problem [4]. The software was tested on a variety of real data coming mostly from Illumina/Solexa sequencers and proved to handle them particularly well. Therefore, we believe that it is a valuable tool for many life scientists.

[1] J. Blazewicz, W. Frohmberg, M. Kierzynka, E. Pesch, P. Wojciechowski, "Protein alignment algorithms with an efficient backtracking routine on multiple GPUs", BMC Bioinformatics, 2011, 12:181 [2] J. Blazewicz, W. Frohmberg, M. Kierzynka, P. Wojciechowski, "G-MSA – GPU-based, fast and accurate algorithm for multiple sequence alignment", Journal of Parallel and Distributed Computing, 2012, Vol. 73(1), p.32-41 [3] W. Frohmberg, M. Kierzynka, J. Blazewicz, P. Wojciechowski, "G-PAS 2.0 – an improved version of protein alignment tool with an efficient backtracking routine on multiple GPUs", Bull. Pol. Ac.: Tech., 2012, 60(3), p.491-494 [4] J. Blazewicz, W. Frohmberg, P. Gawron, M. Kasprzak, M. Kierzynka, A. Swiercz, P. Wojciechowski, "DNA Sequence Assembly Involving an Acyclic Graph Model", Foundations of Computing and Decision Sciences, 2013, Vol. 38, p.25-34
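For readers unfamiliar with the Needleman-Wunsch algorithm named in this abstract, the following is a minimal CPU reference sketch in plain Python. It is not the authors' GPU implementation; the scoring values (match, mismatch, linear gap penalty) are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # (mis)match
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Backtrack from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The quadratic table fill is the part that maps naturally onto GPUs, since cells on the same anti-diagonal are independent of each other and many read pairs can be aligned in parallel.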


Session II: RNA bioinformatics chair: Jan Gorodkin (University of Copenhagen, Denmark)


Lecture 1: Jan Gorodkin Searching for profiles in RNAseq data Jan Gorodkin University of Copenhagen, Denmark

RNAseq experiments often leave traces of characteristic profiles in which reads mapped to a reference genome form well-defined shapes by mapping to near-identical positions, possibly with multiple instances grouped locally. Some of these profiles represent "projections" of (portions of) genes processed in a characteristic way, with microRNA being probably the most prominent example. To analyze such profiles, we have developed an RNAseq profile alignment tool, DeepBlockAlign (dba), which aligns two profiles regardless of the primary sequence. In a further microRNA-specific application we compiled ~2,500 microRNA profiles from miRBase and compared them to a cleaned set of ~4,800 RNAseq profiles from the ENCODE consortium. Of these, 1,361 are annotated as non-coding RNAs (ncRNAs) and 285 of these are annotated as microRNAs. Searching for an optimal profile alignment score cut-off to separate the microRNA profiles from the other profiles, we obtained a Matthews correlation coefficient of 0.8 in classification. With this cut-off, we reveal more than 500 novel microRNA candidates in the human genome using the ENCODE data. Interestingly, a substantial number of these are located in poorly conserved regions. In addition, we show that the profile search can also be used to distinguish between kingdom-specific microRNA profiles, consistent with the ability to distinguish between different profiles in general. The compiled database of microRNA profiles, miRRPdb, is available for profile search upon upload of a bed file at http://rth.dk/resources/mirdba.
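The cut-off selection described in this abstract relies on the Matthews correlation coefficient (MCC). The sketch below shows the standard MCC formula and a simple scan over candidate cut-offs; the function names and the exhaustive scan are illustrative assumptions, not the procedure used with DeepBlockAlign.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from the four cells of a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when a marginal total is zero.
    return numerator / denominator if denominator else 0.0


def best_cutoff(scores, labels):
    """Try every observed score as a cut-off; return (cutoff, mcc) with the highest MCC.

    A profile is called positive when its score is >= cutoff;
    labels are True for known positives (for example annotated microRNAs).
    """
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and not y)
        mcc = matthews_corrcoef(tp, tn, fp, fn)
        if mcc > best[1]:
            best = (cutoff, mcc)
    return best
```

MCC is a reasonable choice here because the classes are unbalanced (285 annotated microRNAs among ~4,800 profiles), a setting in which plain accuracy is misleading.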


Lecture 2: David H. Mathews Predicting RNA Secondary Structure Using Probabilistic Methods David H. Mathews* University of Rochester, Rochester, New York, USA *presenting author, e-mail: [email protected]

RNA is increasingly understood to play many important cellular roles. This talk focuses on computational methods for modeling and understanding RNA structure. At the secondary structure resolution, i.e. the resolution of base pairing, a free energy nearest neighbor model can be used to quantify the stability of a structure at folding equilibrium. Using this model and a dynamic programming algorithm, partition functions can be calculated, which can be used to predict base pairing probabilities.

This talk will focus on the use of base pairing probabilities to improve secondary structure prediction. Base pairing probabilities suggest predicted pairs that are more likely to be accurate. Structures can be assembled from highly probable pairs, and these structures can include pseudoknots, which are notoriously difficult to predict. Finally, for a set of homologous sequences, base pairing probabilities and alignment probabilities can be used to model a conserved structure.
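The link between partition functions and base pairing probabilities can be sketched with a deliberately simplified model. The code below is an assumption-laden toy: instead of the nearest neighbor free energy model discussed in the talk, every allowed base pair contributes the same Boltzmann weight (`pair_weight`), and hairpins must enclose at least `min_loop` unpaired bases. The probability of a pair (i, j) is the weight of all structures containing that pair divided by the total partition function.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, positions, pair_weight, min_loop):
    """Sum of Boltzmann weights over all non-crossing structures that
    use only the given (ascending) sequence positions."""
    pos = tuple(positions)

    @lru_cache(maxsize=None)
    def z(p, q):
        if p >= q:                      # zero or one position: empty structure only
            return 1.0
        total = z(p + 1, q)             # case 1: position p is unpaired
        for k in range(p + 1, q + 1):   # case 2: position p pairs with position k
            i, j = pos[p], pos[k]
            if j - i > min_loop and (seq[i], seq[j]) in PAIRS:
                total += pair_weight * z(p + 1, k - 1) * z(k + 1, q)
        return total

    return z(0, len(pos) - 1)


def pair_probabilities(seq, pair_weight=2.0, min_loop=3):
    """Probability of each allowed base pair (i, j), 0-based, i < j."""
    n = len(seq)
    z_total = partition_function(seq, range(n), pair_weight, min_loop)
    probs = {}
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Structures containing (i, j) factor into an inside part
            # (between i and j) and an outside part (everything else).
            inside = partition_function(seq, range(i + 1, j), pair_weight, min_loop)
            outside = partition_function(
                seq, list(range(i)) + list(range(j + 1, n)), pair_weight, min_loop)
            probs[(i, j)] = pair_weight * inside * outside / z_total
    return probs
```

This brute-force inside/outside split recomputes a partition function for every pair and is only practical for short sequences; production tools use a single inside-outside dynamic programming pass with a full energy model.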


Lecture 3: Radhakrishnan Sabarinathan Efficient detection of SNP effects on local RNA secondary structure using RNAsnp Radhakrishnan Sabarinathan a,b,*, Hakim Tafer c, Stefan E. Seemann a,b, Ivo L. Hofacker a,d, Peter F. Stadler a,c,d, Jan Gorodkin a,b

a Center for non-coding RNA in Technology and Health, b IKVH, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark, c Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany, d Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria. *presenting author, e-mail: [email protected]

Figure: Dot plot showing the base pair probabilities of a disrupted local RNA secondary structure region predicted by RNAsnp [1], where the upper triangle represents the probabilities for the wild-type (green) and the lower triangle represents the mutant (red). The wild-type sequence forms a stem-loop structure, which is, however, disrupted by the nucleotide change from C to A at position 7 (indicated by a circle) in the mutant sequence.

Genomic variation has long been known to cause a variety of diseases; probably the best-known examples are non-synonymous SNPs in protein coding sequences. However, genome-wide association studies (GWAS) often identify disease-associated variants in non-coding regions of the genome [2]. It is estimated that up to 95% of the human genome is transcribed, suggesting that a majority of variants are transferred to RNA. Although some of the transcripts could be considered noise, there are still many non-coding RNAs (ncRNAs) (e.g., tRNAs, rRNAs, and other ncRNAs) that are known to have diverse functions (including housekeeping and regulatory). In the past, however, far more studies have examined the effect of SNPs in coding regions than in ncRNAs. The function of many ncRNAs and cis-regulatory elements of mRNAs largely depends on their structure, which is in turn determined by their sequence. SNPs and other mutations may disrupt those structures and their associated functions. Thus, it is of relevance to predict the effect of SNPs on ncRNA structure. Existing prediction methods have addressed this problem, but they are limited in detecting local structural changes and in genome-wide applications. The method we present here, RNAsnp, overcomes these limitations and can be used to analyze the enormous variation data generated by the HapMap or the 1000 Genomes Project. The performance of RNAsnp was evaluated using a data set of SNPs with experimentally verified structural effects. Furthermore, on a data set comprising 501 SNPs associated with human inherited diseases, we predict 54 to have a significant local structural effect in the UTR regions of mRNAs. The RNAsnp program is freely available at http://rth.dk/resources/rnasnp.

[1] R. Sabarinathan, H. Tafer, S.E. Seemann, I.L. Hofacker, P.F. Stadler, J. Gorodkin, Hum Mutat., 2013 (In press). [2] G. Coetzee, L. Jia, B. Frenkel, B. Henderson, A. Tanay, C. Haiman, M. Freedman, Cell Cycle, 2010, 9, 256–259.


Lecture 4: Ivo Hofacker Beyond RNA secondary structure: G-quadruplexes and noncanonical pairs Ivo L. Hofacker* University of Vienna, Vienna, Austria *presenting author, e-mail: Ivo Hofacker [email protected]

Traditional RNA secondary structure prediction allows six base pairing combinations, all of which are assumed to form Watson-Crick type pairs, and only considers non-crossing interactions. Apart from efforts to include certain types of pseudo-knots, this paradigm has remained unchanged since the first dynamic programming algorithms for secondary structure prediction some 30 years ago. Tertiary structures from X-ray crystallography, however, reveal that “loops” in secondary structures are not unstructured, but are filled with networks of non-canonical base pairs. G-quadruplexes are another example of structural elements not covered by standard folding algorithms. My talk will present two extensions to classical RNA secondary structure prediction that address the two cases mentioned above. We show that non-canonical base pairs, and even base triplets, can be accommodated in the dynamic programming scheme without sacrificing efficiency. The difficulty, in this case, lies in the parametrization of loop energies for a vastly increased set of different loop types. Furthermore, we present a variant of the folding algorithm that includes a search for G-quadruplexes. It has been implemented in the RNAfold program from the ViennaRNA package as well as in the local folding variants RNALfold and RNAplfold. Genome-wide scans reveal that most potential G-quadruplexes are unstable. Stable quadruplexes are, however, overrepresented in the 5' UTRs of protein coding genes.
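As a rough illustration of what a "potential G-quadruplex" looks like at the sequence level, the sketch below scans for four runs of guanines separated by short linkers. This is only a motif scan with assumed thresholds (run length of at least 2, linkers of 1 to 7 nucleotides); it is not the energy-based search implemented in the ViennaRNA programs, which evaluates quadruplexes as part of the folding recursion.

```python
import re


def g4_pattern(min_run=2, min_linker=1, max_linker=7):
    """Regular expression for four G-runs joined by three linkers."""
    run = "G{%d,}" % min_run
    linker = "[ACGU]{%d,%d}" % (min_linker, max_linker)
    return re.compile(linker.join([run] * 4))


def find_g4_candidates(seq, **thresholds):
    """Return (start, end, motif) for non-overlapping candidate motifs.

    Coordinates are 0-based and end-exclusive; DNA input is converted
    to the RNA alphabet first.
    """
    seq = seq.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group())
            for m in g4_pattern(**thresholds).finditer(seq)]
```

A scan of this kind over-predicts by design: it cannot tell stable quadruplexes from unstable ones, which is exactly the distinction the energy-based approach in the talk is meant to provide.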


Lecture 5: Grzegorz Chojnowski RNA Bricks - a database of RNA interactions Grzegorz Chojnowski 1*, Tomasz Waleń 1,2, Janusz M. Bujnicki 1,3

1 International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland 2 Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland 3 Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland *presenting author, e-mail: [email protected]

Non-coding RNAs (ncRNAs) have been found to be involved in many cellular processes, from the regulation of gene transcription to the catalysis of chemical reactions. Many ncRNAs, including cis-regulatory elements, are modular biomolecules, composed largely of recurrent structural modules glued together via mutual interactions into a compact, functional 3D structure. The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) provides information about recurrent RNA modules and their interactions, both with themselves and with proteins. In contrast to other similar tools (RNA Frabase [1], Rloom [2]), here RNA modules are presented in their natural environment, that is, in the neighborhood of other molecules in a solution or a crystal. Two other features make RNA Bricks unique. The first is a robust algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms in the query and in a representative set of RNA modules, without using secondary or primary structure information. This enables the study of local structural similarities among distant RNA homologs and evolutionarily unrelated analogs. The second is the availability of three structure-quality scores with single nucleotide resolution. These scores facilitate the identification of regions with poor backbone geometry, severe steric clashes, and low real-space correlation coefficients for experimental diffraction data (if available). The local quality assessment enables the selection of good modules from any RNA structure, not only from those believed to be most reliable (e.g. high resolution crystallographic models). This is particularly important given the small number of unique RNA structures available in the PDB. The RNA Bricks database with its advanced web interface is a useful tool for studying the architecture of even the largest RNA molecules and for searching for local similarities among distant homologs.

Figure: The result of RNA Bricks query with a small RNA fragment (blue). Figure depicts a module found in the database (red) and a related protein contact (green).

[1] Popenda, M., Szachniuk, M., Blazewicz, M., Wasik, S., Burke, E.K., Blazewicz, J. and Adamiak, R.W. (2010). BMC bioinformatics, 11, 231. [2] Schudoma, C., May, P., Nikiforova, V. and Walther, D. (2010). Nucleic acids research, 38, 970–80.


Lecture 6: Česlovas Venclovas The use of interatomic contact areas for the assessment of RNA 3D structural models Kliment Olechnovič*, Česlovas Venclovas Institute of Biotechnology, Vilnius University *presenting author, e-mail: [email protected]

The growing interest in RNA has also led to increased activity in the development of computational methods for the prediction of RNA three-dimensional (3D) structure. A critically important component in the development and benchmarking of such methods is the ability to objectively evaluate RNA models against the experimentally determined reference structure. In addition to being objective, the evaluation is expected to provide guidance to method developers. So far, the accuracy of RNA structural models has been evaluated using structure superposition-based methods. Root-mean-square deviation (RMSD) is perhaps the most common, but other methods borrowed from the protein field (GDT-TS and TM-score) have also been used. The drawback of superposition-based methods is that local errors have an unpredictable impact on the total score. Recently, a new method specifically designed for RNA evaluation has been proposed [1]. It includes both global (Deformation Index, DI) and local (Deformation Profile, DP) scores. However, both scores are derived from RMSD and share its drawbacks. We propose a new, superposition-free, interatomic contact area-based method for the evaluation of RNA structural models. This method is an adaptation to RNA of CAD-score (Contact Area Difference score), our recently developed score for proteins [2]. The main idea of the method is to use contact areas between nucleotides or subsets of nucleotide atoms (bases, main chain) to quantify both the local and the global accuracy of RNA models. The more closely these contacts are reproduced in the model, the more accurate the model is considered to be. Contacts and contact areas are derived from the Voronoi diagram of spheres that correspond to heavy atoms with van der Waals radii. The Voronoi diagram of spheres is constructed by an algorithm that is especially suited for processing macromolecular structures. The dominant contacts in RNA are those between nucleobases. Therefore, we further distinguish stacking and non-stacking contacts between nucleobases in order to provide more detailed information about the type of errors in the model. Interestingly, the contact area approach is able to closely reproduce base-stacking and base-pairing interactions using a fairly simple definition. This is in stark contrast to previous approaches involving a considerable number of empirical parameters and thresholds. We tested the new contact area-based evaluation method on a large number of RNA models and found that it is able both to effectively point out local errors in a model and to rank models by their overall quality. We believe that the new method should be useful not only for the developers of RNA structure prediction methods but also for the RNA community at large.

[1] Parisien M, Cruz JA, Westhof E, Major F, RNA, 2009, 15, 1875-1885. [2] Olechnovič K, Kulberkytė E, Venclovas Č, Proteins, 2013, 81, 149-162.
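The idea of scoring a model by how well it reproduces reference contact areas can be sketched as follows. The abstract does not give the formula, so the bounded-difference form used here (each difference capped at the reference area, normalized by the total reference area) is stated as an assumption about how a contact area difference score may be defined. Computing the contact areas themselves from a Voronoi diagram of atomic spheres is outside the scope of the sketch; the inputs are assumed to be precomputed dictionaries.

```python
def cad_score(reference_areas, model_areas):
    """Global contact-area-difference style score in [0, 1].

    Both arguments map a residue pair (i, j) to a contact area.
    Only contacts present in the reference are scored, and each
    difference is capped at the reference area, so 1.0 means the
    reference contacts are reproduced exactly.
    """
    total_area = sum(reference_areas.values())
    if total_area == 0:
        raise ValueError("reference structure has no contacts")
    total_diff = 0.0
    for pair, ref_area in reference_areas.items():
        diff = abs(ref_area - model_areas.get(pair, 0.0))
        total_diff += min(diff, ref_area)
    return 1.0 - total_diff / total_area


def local_cad_scores(reference_areas, model_areas):
    """Per-residue scores: the same ratio restricted to each residue's contacts."""
    sums = {}
    for (i, j), ref_area in reference_areas.items():
        diff = min(abs(ref_area - model_areas.get((i, j), 0.0)), ref_area)
        for res in (i, j):
            d, t = sums.get(res, (0.0, 0.0))
            sums[res] = (d + diff, t + ref_area)
    return {res: 1.0 - d / t for res, (d, t) in sums.items() if t > 0}
```

Because no superposition is involved, a local error only affects the contacts it actually changes, which is the property the abstract contrasts with RMSD-derived scores.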


Session III: Protein evolution and structure chair: Andrei Lupas (Max Planck Institute for Developmental Biology in Tübingen, Germany)


Lecture 1: Nick Grishin


Lecture 2: Rob Russell


Lecture 3: Arne Elofsson PconsC: Combination of direct information methods and alignments improves contact predictions Marcin Skwark, Abbi Abdel-Rehim and Arne Elofsson* *presenting author, e-mail: [email protected]

Recently, several new contact prediction methods have been published. They (i) use large sets of multiply aligned sequences and (ii) assume that correlations between columns in these alignments can be the result of indirect interactions. These methods are clearly superior to earlier methods when it comes to predicting contacts in proteins. Here, we demonstrate that combining predictions from two prediction methods, PSICOV [1] and plmDCA [2], and two alignment methods, HHblits [3] and jackhmmer [4], at four different e-value cutoffs provides a relative improvement of 20% in comparison to the best single method, exceeding 70% correct predictions for one contact prediction per residue (see Figure).

[1] Jones et al. (2012) Bioinformatics, 28(2), 184-190. [2] Ekeberg et al. (2013) Phys Rev E Stat Nonlin Soft Matter Phys, 87(1-1) [3] Remmert et al.(2012) Nat Methods, 9(2), 173-175. [4] Johnson et al.(2010) BMC Bioinformatics, 11, 431
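The evaluation measure quoted in this abstract, the fraction of correct predictions when one contact is predicted per residue, can be sketched as below. The data structures and the minimum sequence separation filter are assumptions for illustration; the abstract does not state which separation threshold or contact definition was used.

```python
def top_contact_precision(scores, true_contacts, seq_len,
                          per_residue=1.0, min_separation=5):
    """Fraction of the highest-scoring predicted contacts that are correct.

    scores:        dict mapping a residue pair (i, j) to a predicted score
    true_contacts: set of pairs (i, j) with i < j observed in the structure
    per_residue:   number of predictions kept per residue (1.0 keeps L pairs)
    """
    candidates = [(score, pair) for pair, score in scores.items()
                  if abs(pair[0] - pair[1]) >= min_separation]
    candidates.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, int(per_residue * seq_len))
    top = [pair for _, pair in candidates[:n_top]]
    if not top:
        return 0.0
    hits = sum(1 for i, j in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)
```

With this measure, a combined predictor can be compared directly against each single method on the same proteins by swapping the `scores` dictionary.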


Lecture 4: Birte Höcker Coupling evolutionary concepts with protein design Birte Höcker Max Planck Institute for Developmental Biology e-mail: [email protected]

Similarities in sequence and structure suggest that modern proteins evolved from simpler and less specialized subunits. Domains as well as subdomains were recruited and adapted. Similarly, we have applied the concept of combinatorial assembly and fragment recruitment to construct new well-folded proteins using stable fragments from two different ancient topologies populated by thousands of proteins in all kingdoms of life, the TIM-barrel and the flavodoxin-like fold [1-3]. This modular assembly approach can be generalized by applying it to other folds; additionally, it enables new combinations of functional properties encoded in fold fragments [4]. It further demonstrates how new proteins can quickly develop and be competitive in today’s protein world. Our combinatorial experiments suggest that homologous proteins can evolve to adopt different folds. To test this assumption, we compared TIM-barrel and flavodoxin-like fold sequences to find the most closely related homologous groups between these two topologies. We discovered a sequence-intermediate family of proteins and determined the first representative crystal structure of one of its members [5]. The structural data support the relationship and enable the development of a model explaining the evolutionary link. Overall, such a systematic approach for the detection of intermediate steps between structural forms will enable us to cover missing areas of the protein structure universe. Moreover, we contribute to the emergent vision of the protein world in which homology can be recognized across different folds and beyond classification schemes.

[1] T. Bharat, S. Eisenbeis, K. Zeth, B. Höcker, PNAS, 2008, 105, 9942-7. [2] S. Eisenbeis, W. Proffitt, M. Coles, V. Truffault, S. Shanmugaratnam, J. Meiler, B. Höcker, JACS, 2012, 134, 4019-22. [3] S. Shanmugaratnam, S. Eisenbeis, B. Höcker, PEDS, 2012, 25, 699-703. [4] J.A. Farias-Rico, B. Höcker, Methods Enzymol, 2013, 523, 389-405. [5] J.A. Farias-Rico, B. Höcker, unpublished


Lecture 5: Jens Meiler Symmetry and Fragment Recombination in Evolution and Rational Design of Proteins Jens Meiler* Department of Chemistry, Vanderbilt University, Nashville, TN, United States *[email protected]

It has been demonstrated previously that symmetric, homodimeric proteins are energetically favored, which explains their abundance in nature. It has been proposed that such symmetric homodimers underwent gene duplication and fusion to evolve into protein topologies that have a symmetric arrangement of secondary structure elements, the “symmetric superfolds”. Here, the ROSETTA protein design software was used to computationally engineer a perfectly symmetric variant of imidazole glycerol phosphate synthase and its corresponding symmetric homodimer. The new protein, termed FLR, adopts the symmetric (βα)8 TIM-barrel superfold [1]. The protein is soluble and monomeric and exhibits two-fold symmetry not only in the arrangement of secondary structure elements but also in sequence and at atomic detail, as verified by crystallography (Figure). When cut in half, FLR dimerizes readily to form the symmetric homodimer. It is further hypothesized that protein domains evolved from smaller intrinsically stable subunits via combinatorial assembly. Illegitimate recombination of fragments that encode protein subunits could have quickly led to diversification of protein folds and their functionality. Guided by computational design, we optimized the interface between fragments of the (βα)8-barrel and the flavodoxin-like fold with five targeted mutations, yielding a stable, monomeric protein whose predicted structure was verified experimentally [2]. We further tested binding of a phosphorylated compound and detected that some affinity was already present due to an intact phosphate-binding site provided by one fragment. The affinity could be improved quickly to the level of natural proteins by introducing two additional mutations. These studies demonstrate progress in our understanding of the underlying principles of protein stability and present an attractive strategy for the in silico construction of larger functional protein domains from smaller pieces.

[1] C. Fortenberry, E. A. Bowman, W. Proffitt, B. Dorr, S. Combs, J. Harp, L. Mizoue and J. Meiler, J Am Chem Soc, 2011, 133 (45), 18026-9. [2] S. Eisenbeis, W. Proffitt, M. Coles, V. Truffault, S. Shanmugaratnam, J. Meiler and B. Höcker, J Am Chem Soc, 2012, 134 (9), 4019-22.


Lecture 6: Guido Capitani Is it biologically relevant? An evolution- and geometry-based protein interface classification approach J.M. Duarte 1, A. Srebniak 2, M.A. Schärer 1,2, G. Capitani 1* 1 Paul Scherrer Institut, Villigen, Switzerland 2 ETHZ, Zurich, Switzerland *presenting author, e-mail: [email protected]

In the last decade macromolecular crystallography has experienced technical advances that greatly pushed back the frontier of feasibility in structure determination. An intrinsic limitation, however, remains: crystallography does not provide information about which lattice contacts represent biologically relevant interfaces and which ones are simply crystal contacts. Correctly assigning the two types of interfaces in a crystal lattice, and thus being able to derive the correct biological assembly, can be a difficult task. This is normally addressed experimentally by mutagenesis and biophysical characterization. Detecting the footprint of evolution in biologically relevant interfaces is the most straightforward way of computationally solving the above problem [1,2]. However, many issues are still open and the de facto standard method in the field does not make use of evolutionary information from protein sequences [3].

To this end, we have developed a general protein interface classification method (EPPIC, Evolutionary Protein-Protein Interface Classifier [4]), which relies on the concept of interface core residues, defined as in our previous work [5]. EPPIC uses a simple geometric measure, the number of core residues, and two evolutionary indicators based on the sequence entropy of homologous sequences. One of the indicators detects differential selection pressure between interface core and rim, while the other compares the interface core with the rest of the surface, minimizing bias issues with a Z-score-like approach. The method has been implemented as a standalone tool and as a user-friendly web server (www.eppic-web.org). EPPIC can also be used to analyze multichain atomic models obtained with techniques other than crystallography, for instance NMR structures. Being able to detect biologically relevant interfaces, the method is very suitable for hybrid and divide-and-conquer approaches in structural biology, where a key issue is the correct assembly of the various components. A paradigmatic example in this respect is the nuclear pore complex [6], where the structures of many components and subcomplexes are being determined by crystallography and need to be correctly assembled into low-resolution tomograms (see figure). The method, its applications and ongoing developments will be presented.

Figure: Nuclear pore complex tomogram courtesy of Dr. M. Beck, EMBL

[1] W.S. Valdar, J.M. Thornton, J Mol Biol, 2001, 313, 399-416. [2] A.H. Elcock, J.A. McCammon, Proc Natl Acad Sci USA, 2001, 98, 2990-4. [3] E. Krissinel, K. Henrick, J Mol Biol, 2007, 372, 774–797. [4] J.M. Duarte, A. Srebniak, M.A. Schärer, G. Capitani, BMC Bioinformatics, 2012, 13, 334. [5] M.A. Schärer, M.G. Grütter, G. Capitani, Proteins, 2010, 78, 2707–2713. [6] A. Hoelz, E.W. Debler, G. Blobel, Annu Rev Biochem, 2011, 80, 613–43.
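The core versus rim comparison mentioned in this abstract can be illustrated with column-wise Shannon entropy over an alignment of homologs. This is a sketch under assumptions: the function names, the plain 20-letter entropy and the simple ratio are illustrative and are not taken from EPPIC, whose exact indicators and thresholds are not given in the abstract.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * log2(k / n) for k in Counter(residues).values())


def core_rim_entropy_ratio(alignment, core, rim):
    """Mean entropy of core columns divided by mean entropy of rim columns.

    alignment: list of equal-length aligned sequences
    core, rim: lists of 0-based column indices of interface residues
    Returns None when the rim columns are fully conserved (ratio undefined).
    """
    entropies = [column_entropy(col) for col in zip(*alignment)]
    core_mean = sum(entropies[i] for i in core) / len(core)
    rim_mean = sum(entropies[i] for i in rim) / len(rim)
    if rim_mean == 0:
        return None
    return core_mean / rim_mean
```

A ratio well below 1 would indicate that core residues are more conserved than rim residues, the pattern expected for an interface under selection pressure; where to place a decision threshold is left open here.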

Session IV: Protein disorder chair: Arne Elofsson (Stockholm University, Sweden)


Lecture 1: Philip M. Kim Disorder, Motifs and Alternative Splicing Philip M. Kim* The Donnelly Centre, University of Toronto, Toronto, ON, Canada *presenting author, e-mail: [email protected]

I will discuss our recent results in the areas of protein disorder and alternative splicing. In particular, I will discuss how protein disorder performs two different functions in alternatively spliced exons; one to provide flexibility to facilitate folding of both isoforms and the other to rewire protein interactions through the use of linear motifs. We use the first property to implement an accurate predictor of which alternative isoforms are stably expressed and the second to start predicting and mapping changes in protein interactions mediated by alternative splicing.


Lecture 2: Marija Buljan Tissue-specific splicing rewires protein interaction networks Marija Buljan (1, 2), Guilhem Chalancon (1), Sebastian Eustermann (1), Gunter P. Wagner (3), Monika Fuxreiter (1, 4), Alex Bateman (2) and M. Madan Babu (1) 1 MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK 2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK 3 Yale Systems Biology Institute, and Department of Ecology and Evolutionary Biology, Yale University, West Haven, CT 06516, USA 4 Department of Biochemistry and Molecular Biology, Medical and Health Science Center, University of Debrecen, Nagyerdei krt 98, POB 6, H-4032 Debrecen, Hungary Marija Buljan, e-mail: [email protected]

Alternative inclusion of exons increases the functional diversity of proteins. Among alternatively spliced exons, tissue-specific exons play a critical role in maintaining tissue identity. This raises the question of how tissue-specific protein-coding exons influence protein function. In this talk I will compare the structural, functional, interaction, and evolutionary properties of constitutive, tissue-specific, and other alternative exons in human. We find that tissue-specific protein segments often contain disordered regions, are enriched in posttranslational modification sites, and frequently embed conserved binding motifs. Furthermore, genes containing tissue-specific exons tend to occupy central positions in interaction networks, display distinct interaction partners in the respective tissues, and are enriched in signaling, development, and disease genes. Based on these findings, we propose that tissue-specific inclusion of disordered segments that contain binding motifs rewires interaction networks and signaling pathways [1]. In this way, tissue-specific splicing may contribute to the functional versatility of proteins and increase the diversity of interaction networks across tissues [2].

Figure: Tissue-specific splicing rewires interaction networks (panels: Tissue A, Tissue B).

[1] M. Buljan et al. Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks, Mol Cell, 2012, 46(6), 871-83 [2] M. Buljan et al. Alternative splicing of intrinsically disordered regions and rewiring of protein interactions, Curr Opin Struct Biol, 2013, 23(3), 443-50


Lecture 3: Łukasz Kozłowski Predictions of domains and intrinsically unstructured regions in human pre-mRNA 3'-end processing complex Łukasz P. Kozłowski a*, Janusz M. Bujnicki a,b

a Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, ul. Trojdena 4, 02-109 Warsaw, Poland, b Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, ul. Umultowska 89, 61-614 Poznan, Poland *presenting author, e-mail: [email protected]

Pre-mRNA 3'-end formation is an essential step in gene expression regulation. Messenger RNAs are endonucleolytically cleaved and most of them acquire a poly(A) tail whose length influences the mRNA fate (including stability, translocation to the cytoplasm and translation) [1]. The complex responsible for this process is built from ~30 proteins forming the core of the complex and over 50 auxiliary proteins [2]. Here, we present a detailed analysis of intrinsically unstructured regions (IURs) and domains of the core proteins. IURs were predicted by GeneSilico MetaDisorder [3]. This part of the analysis revealed that up to 51% of the residues of the analyzed proteins can be classified as IURs. Next, we characterized the domains of the proteins forming the complex. In the first stage, we mapped all domains from the PFAM database. Not surprisingly, the most abundant domains are RRMs (RNA recognition motifs), WD40 and different kinds of zinc finger domains (the last two types of domains are responsible for forming protein-protein and protein-RNA interactions). The results showed that over 43% of residues were assigned to PFAM domains. Next, for the unassigned regions, we used the DomainSVM program, which predicts domain boundaries using amino acid composition, entropy, hydrophobicity, solvent accessibility, protein disorder, and the tendency to form secondary structure [4]. In the final stage of the analysis, homology models for all proteins were built. They were based on templates found by GeneSilico Metaserver [5]. The modeling procedure was carried out using the MODELLER program [6]. This allowed us to construct potential models of unknown domains not available so far. The results of the analysis can be used as a good starting point for the design of experiments for the most interesting targets and for testing the validity of proposed models of action of the pre-mRNA 3'-end processing complex.

[1] J.E. Darnell, RNA, 2013, [Epub ahead of print] PMID: 23440351. [2] Y. Shi, D. Giammartino, et al., Molecular Cell, 2009, 33, 365-376. [3] L.P. Kozlowski, J.M. Bujnicki, BMC Bioinformatics, 2012, 13, 111. [4] L.P. Kozlowski, J.M. Bujnicki, [manuscript in preparation] [5] M.A. Kurowski, J.M. Bujnicki, Nucleic Acid Res., 2003, 31, 3305-7. [6] N. Eswar, D. Eramian, B. Webb, M.Y. Shen, A. Sali, Methods Mol Biol., 2008, 426, 145-59.


Session IV: Protein disorder

Lecture 4: Oxana V. Galzitskaya
To be folded or to be unfolded?
Oxana V. Galzitskaya*, Michail Yu. Lobanov
Institute of Protein Research of the Russian Academy of Sciences, 4 Institutskaya str., Pushchino, Moscow Region, 142290, Russia
*presenting author, e-mail: [email protected]

A new method (IsUnstruct), based on the Ising model, has been developed for predicting disordered residues from the protein sequence alone [1, 2]. According to this model, each residue can be in one of two states: ordered or disordered. The general idea is new and has a distinct advantage over various machine learning methods: the model used to describe protein disorder can have a direct physical interpretation. For this method we used potentials derived from the clustered Protein Data Bank, in which each cluster groups chains of different identity (http://bioinfo.protres.ru/st_pdb_2012/). For the first time, our method also includes the library of disordered patterns (171) constructed from the recently clustered PDB (version of 9 April 2012) [3, 4]. The IsUnstruct method has been compared with other available methods and found to perform well. Using a novel algorithm that was not trained on proteins with large disordered regions, the method correctly finds 72% of disordered residues as well as 83% of ordered residues in the Disprot database, version 5.9. The server is available at http://bioinfo.protres.ru/IsUnstruct.

Figure. Prediction of intrinsically disordered residues in Human P53 (Disprot ID is DP00086) by using three programs: IsUnstruct, PONDR-FIT, and ESpritz. Blue rectangles correspond to the crystallized domains. The dashed lines at 0.5 of the Y-axis are threshold lines for residues to be disordered.

[1] M.Yu. Lobanov, O.V. Galzitskaya, Phys. Biol., 2011, 8, 035004. [2] M.Yu. Lobanov, I.V. Sokolovskiy, O.V. Galzitskaya, J. Biomol. Struct. Dyn., 2012, doi: 10.1080/07391102.2012.718529. [3] M.Yu. Lobanov, O.V. Galzitskaya, Mol. BioSyst., 2012, 8, 327-337. [4] M.Yu. Lobanov, O.V. Galzitskaya, PLOS ONE, 2011, 6(11), e27142.
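
The two-state picture used in this abstract can be illustrated with a minimal sketch of a one-dimensional Ising chain. This is not the IsUnstruct implementation: the per-residue energies (`fields`), the boundary penalty (`coupling`) and the toy values are hypothetical, whereas the actual method uses potentials derived from the clustered PDB. The sketch computes, for each residue, the marginal probability of the disordered state with a forward-backward pass along the chain.

```python
import math


def disorder_probabilities(fields, coupling, kT=1.0):
    """Marginal probability that each residue is disordered.

    Hypothetical energy model (an assumption, not the IsUnstruct potentials):
      - a residue in the ordered state (0) contributes energy 0,
      - a residue in the disordered state (1) contributes fields[i],
      - every order/disorder boundary between neighbours costs `coupling`.
    """
    n = len(fields)
    if n == 0:
        return []

    def site(i, s):
        # Boltzmann weight of residue i being in state s.
        return math.exp(-fields[i] * s / kT)

    def bond(s, t):
        # Boltzmann weight of neighbouring states s and t.
        return 1.0 if s == t else math.exp(-coupling / kT)

    def normalized(pair):
        total = pair[0] + pair[1]
        return [pair[0] / total, pair[1] / total]

    # Forward pass: weight of residues 0..i with residue i in state t.
    fwd = [normalized([site(0, 0), site(0, 1)])]
    for i in range(1, n):
        prev = fwd[-1]
        fwd.append(normalized([
            site(i, t) * sum(prev[s] * bond(s, t) for s in (0, 1))
            for t in (0, 1)
        ]))

    # Backward pass: weight of residues i+1..n-1 given state s at residue i.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        nxt = bwd[i + 1]
        bwd[i] = normalized([
            sum(bond(s, t) * site(i + 1, t) * nxt[t] for t in (0, 1))
            for s in (0, 1)
        ])

    probs = []
    for i in range(n):
        ordered = fwd[i][0] * bwd[i][0]
        disordered = fwd[i][1] * bwd[i][1]
        probs.append(disordered / (ordered + disordered))
    return probs


# Toy chain: the first three residues favour disorder, the last three order.
toy_fields = [-2.0, -2.0, -2.0, 2.0, 2.0, 2.0]
toy_probs = disorder_probabilities(toy_fields, coupling=1.0)
predicted_disordered = [p > 0.5 for p in toy_probs]
```

Thresholding the returned probabilities at 0.5 mirrors the threshold line described in the figure caption.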


Lecture 5: Malgorzata Kotulska
How proteins forming amyloids can be recognized computationally
Malgorzata Kotulska*
Institute of Biomedical Engineering and Instrumentation, Wroclaw Institute of Technology, 50-370 Wroclaw, Poland
*presenting author, e-mail: [email protected]

Proteins capable of forming fibrils are known as amyloids. The pattern of their intramolecular contact sites changes in such a manner that they form characteristic zipper aggregates, which deprive the protein of its physiological and functional structure. The number of amyloidogenic diseases, which are due to misfolding of a protein into an amyloid fibril, is constantly increasing and includes Alzheimer disease (amyloid-β, tau), Parkinson disease (α-synuclein), type 2 diabetes (amylin), Creutzfeldt-Jakob disease (prion protein), Huntington disease (huntingtin), amyotrophic lateral sclerosis (SOD1), etc. They affect a growing number of people, especially in well-developed countries. Recognizing the factors responsible for protein misfolding can contribute to a better understanding of its mechanisms and to potential drug design. Recent studies indicate that short segments of amino acids, called hot spots, can underlie the amyloidogenic properties of a protein. Those fragments are harmless only when they are buried inside a protein. The fragments responsible for the amyloidogenicity of the whole protein are believed to be 4-10 residues long, and it is often assumed that 6-residue fragments with amyloidogenic properties are sufficient “hot spots”. Amyloidogenic fragments can be recognized computationally. In this talk we discuss whether classical machine learning methods can be applied to the recognition of such hexapeptides and how much we need to know about the training data. We also show an original machine learning method for the classification of biological sequences (e.g. sequences of amino acids), based on discovering a segment with a discriminative pattern of correlations between sequence elements. The pattern is based on the location of correlated pairs of elements in the window. The algorithm first recognizes the most relevant training segment in each positive training instance. Classification is based on maximal differences between correlation matrices of the relevant segments in positive training sequences and negative training segments. The efficiency of the method is tested on amyloid proteins.
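
As a small illustration of the window-based view taken in this abstract, the following sketch enumerates all hexapeptide windows of a sequence and counts which residue pairs occur at which pair of window positions. It is not the classification method presented in the talk: the function names, the example sequence and the use of plain pair counts in place of correlation matrices are assumptions made for illustration only.

```python
from collections import Counter


def hexapeptides(sequence, width=6):
    """Return all overlapping windows of `width` residues."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def positional_pair_counts(segments):
    """Count (pos_i, residue_i, pos_j, residue_j) with i < j over all segments."""
    counts = Counter()
    for seg in segments:
        for i in range(len(seg)):
            for j in range(i + 1, len(seg)):
                counts[(i, seg[i], j, seg[j])] += 1
    return counts


example = "STVIIEKLVF"  # hypothetical sequence, not taken from any dataset
windows = hexapeptides(example)
pair_counts = positional_pair_counts(windows)
```

Comparing such tables between positive and negative training segments is one simple way to look for position-specific pair patterns; a real classifier would need normalization and a proper decision rule.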


Lecture 6: Vikram Alva Kullanja
On the origin of proteins from peptides
Vikram Alva1*, Johannes Söding2, and Andrei N. Lupas1
1 Max Planck Institute for Developmental Biology, Tübingen, Germany
2 Gene Center Munich and Department of Biochemistry, Ludwig-Maximilians-Universität München, Munich, Germany
*e-mail: [email protected]

Though seemingly boundless, the diversity of proteins in nature is in fact rather narrowly circumscribed. Many proteins share recognizable similarity in sequence and structure, since they arose by amplification, recombination, and divergence from a basic complement of domains, many of which can be traced back to the time of the ‘Last Universal Common Ancestor’. In fact, sequence comparisons show that the millions of proteins we observe today are built from only about 10,000 domain types. While it is generally agreed upon that modern proteins arose from a limited set of domains, the origin of this set itself is poorly understood. Even the simplest domains are too complex to have arisen de novo. How did they then evolve? We are pursuing the hypothesis that they arose by fusion and accretion from an ancestral set of peptides active as co-factors of RNA-dependent replication and catalysis in the RNA world [1]. We reasoned that if this hypothesis is true, systematic studies should allow a description of this peptide set in the same way in which ancient vocabularies have been reconstructed from the comparative study of modern languages. To this end, we compared domains of known structure using a sequence- and structure-based approach and identified 50 fragments that occur in domains of different folds, yet show significant similarities in sequence and structure. A third of these fragments had been noticed individually before, confirming the validity of our approach. Based (1) on their widespread occurrence, (2) on their involvement in the most ancient functions, e.g. nucleic acid-binding and metal-binding, (3) on their occurrence in the most ancient folds, e.g. the P-loop NTPases and ribosomal proteins, and (4) on their enrichment in nucleic acid-binding folds, we propose that these fragments may represent the observable remnants of the RNA-peptide world from which the first folded domains arose.
Intriguingly, we did not find a single case of a domain containing two different fragments from our set. However, we do find many examples of domains that are built from the same fragment either by accretion or by amplification. Our results therefore indicate that repetition and accretion, and not assembly from non-identical fragments, were the main factors in the emergence of domains.

[1] A.N. Lupas, C.P. Ponting, R.B. Russell, J. Struct. Biol., 2001, 134(2-3), 191-203.


Session V: Biomacromolecular simulations
chair: Wiesław Nowak (Nicolaus Copernicus University in Toruń, Poland)


Lecture 1: Valerie Daggett


Lecture 2: Johan Åqvist
Optimization of thermodynamic activation parameters in enzyme catalysis
Johan Åqvist
Uppsala University, Sweden
[email protected]

A major issue for organisms living at extreme temperatures is to preserve both stability and activity of their enzymes. Cold-adapted enzymes generally have a reduced thermal stability, to counteract freezing, and show a lower enthalpy and a more negative entropy of activation compared to mesophilic and thermophilic homologs. Such a balance of thermodynamic activation parameters can make the reaction rate decrease more linearly, rather than exponentially, as the temperature is lowered, but the structural basis for rate optimization towards low working temperatures remains unclear. We will report results from extensive computer simulations of differently adapted citrate synthases and trypsins exploring the structure−function relationships behind catalytic rate adaptation to different temperatures.
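
The effect of trading activation enthalpy against activation entropy can be sketched with the Eyring equation of transition-state theory, k = (k_B T / h) exp(ΔS‡/R) exp(-ΔH‡/(RT)). The two parameter sets below are hypothetical values chosen only to contrast a low-enthalpy, negative-entropy profile with a high-enthalpy one; they are not results from the simulations reported in the talk, and a transmission coefficient of 1 is assumed.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_rate(delta_h, delta_s, temperature):
    """Rate constant in 1/s; delta_h in J/mol, delta_s in J/(mol*K), T in K."""
    return (K_B * temperature / H) * math.exp(
        delta_s / R - delta_h / (R * temperature)
    )


# Hypothetical activation parameters (illustrative assumptions only).
cold_adapted = {"delta_h": 40e3, "delta_s": -80.0}
mesophilic = {"delta_h": 70e3, "delta_s": 20.0}


def rate_ratio(params, t_low=278.15, t_high=310.15):
    """How many times faster the reaction is at t_high than at t_low."""
    return (eyring_rate(temperature=t_high, **params)
            / eyring_rate(temperature=t_low, **params))


cold_ratio = rate_ratio(cold_adapted)
meso_ratio = rate_ratio(mesophilic)
```

With these assumed values the low-enthalpy parameter set loses less of its rate on cooling from 37 °C to 5 °C than the high-enthalpy set, because the temperature dependence of the rate is governed by the enthalpy term.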


Lecture 3: Andrzej Koliński
Combining atomic-level Molecular Dynamics with coarse-grained Monte-Carlo dynamics
Andrzej Koliński
Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw
*presenting author, e-mail: [email protected]

It is widely recognized that atomistic Molecular Dynamics (MD), a classical simulation method, captures the essential physics of protein dynamics. That idea is supported by a theoretical study showing that various MD force-fields provide a consensus picture of protein fluctuations in aqueous solution [1]. However, atomistic MD cannot be applied to most biologically relevant processes due to its limitation to relatively short time scales. Much longer time scales can be accessed by properly designed coarse-grained models. We demonstrate that the aforementioned consensus view of protein dynamics from short (nanosecond) time scale MD simulations is fairly consistent with the dynamics of a coarse-grained protein model, the CABS model [2]. The CABS model employs stochastic dynamics (a Monte Carlo method) and a knowledge-based force-field, which is not biased toward the native structure of a simulated protein. Since CABS-based dynamics allows for the simulation of entire folding (or multiple folding events) in a single run, integration of the CABS approach with all-atom MD promises a convenient (and computationally feasible) means for the long-time multiscale molecular modeling of protein systems with atomistic resolution [3]. Multiscale protein dynamics and structure prediction modeling tools are freely available from the LTB homepage: http://biocomp.chem.uw.edu.pl

Figure 1: Mobility profile (left side) of the 1I6F protein from a short-time CABS trajectory. Comparison of mobility curves from coarse-grained CABS (in red) and from all-atom MD with various force fields.

[1] M. Rueda, C. Ferrer-Costa, T. Meyer, A. Perez, J. Camps, A. Hospital, J. L. Gelpi, M. Orozco, Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 796-801. [2] M. Jamroz, M. Orozco, A. Kolinski, S. Kmiecik, J. Chem. Theory Comput. 2013, 9, 119-125. [3] S. Kmiecik, D. Gront, M. Kouza, A. Kolinski, J. Phys. Chem. B. 2012, 116, 7026-7032.


Lecture 4: Bert de Groot
Molecular dynamics of channel inhibition, permeation and gating
Bert de Groot
Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
[email protected]

Can we design specific membrane channel inhibitors? What is the antimicrobial mechanism of the human antibiotic dermcidin? What are the driving forces that determine lipid ordering around membrane proteins? What are the molecular determinants of channel permeation and gating? These are some of the questions that are addressed at the atomic level by molecular dynamics simulations.

[1] Sören J. Wacker, Camilo Aponte-Santamaria, Per Kjellbom, Soren Nielsen, Bert L. de Groot, Michael Rützler. The identification of novel, high affinity AQP9 inhibitors in an intracellular binding site. Molecular Membrane Biology 30:246-260 (2013). [2] Ulrich Zachariae, Robert Schneider, Rodolfo Briones, Zrinka Gattin, Jean-Philippe Demers, Karin Giller, Elke Maier, Markus Zweckstetter, Christian Griesinger, Stefan Becker, Roland Benz, Bert L. de Groot, and Adam Lange. Beta-barrel mobility underlies closure of the voltage-dependent anion channel. Structure 20:1540-1549 (2012). [3] Camilo Aponte-Santamaria, Rodolfo Briones, Andreas D. Schenk, Thomas Walz, Bert L. de Groot. Molecular driving forces defining lipid positions around aquaporin-0. Proc. Nat. Acad. Sci. 109: 44319-44325 (2012). [4] Carsten Kutzner, Helmut Grubmüller, Bert L. de Groot, Ulrich Zachariae. Computational Electrophysiology: The Molecular Dynamics of Ion Channel Permeation and Selectivity in Atomistic Detail. Biophys. J. 101: 809-817 (2011).


Lecture 5: Sławomir Filipek
G-Protein-Coupled Receptors Molecular Switches and Activation Mechanisms
Sławomir Filipek
Biomodeling group, University of Warsaw, Faculty of Chemistry, Warsaw, Poland
e-mail: [email protected]

G protein coupled receptors (GPCRs) constitute the largest human protein family. They are membrane proteins that, upon activation by extracellular agonists (or light in case of rhodopsin), pass the signal to the cell interior and for this reason they are pharmacologically very important. Experimental as well as theoretical studies confirmed that the process of receptor activation consists of actions of so-called molecular switches buried in the receptor structure (Fig. 1).

Fig. 1. Examples of molecular switches in GPCRs [1].

The Biomodeling group investigates the activation mechanisms of GPCRs as well as signal transfer to the G protein and inactivation by arrestin. Apart from molecular switches, water also plays a crucial role in the activation mechanism of these membrane proteins, as was recently modeled using molecular dynamics simulations of the N-formyl peptide receptor FPR1 [2]. The recent crystal structures of GPCRs with agonists, antagonists and inverse agonists made it possible to elucidate the mechanism of receptor activation and of signal transfer from the ligand binding site to the intracellular side for some GPCRs from the best characterized family A (rhodopsin-like). However, the recently discovered signaling via arrestin, the possibility of allosteric modulation, and the formation of functional receptor dimers add further levels of complexity to the functioning of GPCRs. In 2012, the Nobel Prize in Chemistry was awarded to Kobilka and Lefkowitz for their sophisticated biochemical and crystallographic studies that were "crucial for understanding how G-protein–coupled receptors function". An overview of their work, as well as a discussion of the structures and mechanisms of signal transfer via GPCRs, can be found in a recent review [3].

[1] B. Trzaskowski, D. Latek, S. Yuan, U. Ghoshdastider, A. Debinski, S. Filipek, Curr. Med. Chem. 2012, 19, 1090-1109. [2] S. Yuan, U. Ghoshdastider, B. Trzaskowski, D. Latek, A. Debinski, W. Pulawski, R. Wu, V. Gerke, S. Filipek, PLOS ONE 2012, 7, e47114. [3] D. Latek, A. Modzelewska, B. Trzaskowski, K. Palczewski, S. Filipek, Acta Biochim. Pol. 2012, 59, 515-529.


Lecture 6: Jarek Meller
Ultrafast clustering of protein structures for macromolecular simulations and model quality assessment
Jarek Meller
University of Cincinnati, OH, USA & Nicolaus Copernicus University, Toruń, Poland
[email protected]

Molecular simulations and de novo folding methods can generate very large numbers of protein conformations, e.g. representing conformational ensembles in solution, folding trajectories or collections of putative models for folded states. Clustering of such generated models or intermediate states can subsequently be used to identify the best models or relevant folding intermediates. As the number of structures to be analyzed grows with advances in hardware and software, the computational cost of the clustering step becomes an important issue to consider. We present a novel ultrafast clustering approach that avoids the difficulties of comparing and aligning 3D structures by using 1D projections into discrete secondary structure, torsional angle, solvent accessibility and related states. With appropriate preprocessing, using such profiles allows one to perform an implicit comparison of all pairs of structures (and to identify similar substructures for them, in the spirit of 3D-Jury) in linear (O(N_struct)) time. This led to a very efficient consensus-based model quality assessment (MQA) method dubbed 1D-Jury. Another important advantage of this approach is that computationally expensive heuristics for the identification of similar substructures, such as MaxSub, can be avoided. As a result, a general, accurate and ultrafast clustering approach can be achieved, as demonstrated by comparison with currently available efficient clustering methods.
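
The counting trick that makes an implicit all-pairs comparison linear in the number of structures can be sketched as follows. This is an illustration in the spirit of the approach, not the 1D-Jury implementation: the profiles are assumed to be equal-length strings of discrete states (here hypothetical secondary-structure strings), and the score is plain per-position identity averaged over all other models.

```python
from collections import Counter


def consensus_scores(profiles):
    """Average agreement of each model with all others, in O(N * L) time."""
    n = len(profiles)
    if n < 2:
        return [0.0] * n
    length = len(profiles[0])
    if length == 0 or any(len(p) != length for p in profiles):
        raise ValueError("profiles must be non-empty and of equal length")
    # One pass over all models: how often each state occurs at each position.
    column_counts = [Counter(p[k] for p in profiles) for k in range(length)]
    scores = []
    for p in profiles:
        # Number of other models sharing this model's state, summed over positions.
        matches = sum(column_counts[k][p[k]] - 1 for k in range(length))
        scores.append(matches / ((n - 1) * length))
    return scores


def pairwise_scores(profiles):
    """Brute-force O(N^2 * L) reference used only to check the fast version."""
    n, length = len(profiles), len(profiles[0])
    out = []
    for i, p in enumerate(profiles):
        matches = sum(
            sum(a == b for a, b in zip(p, q))
            for j, q in enumerate(profiles) if j != i
        )
        out.append(matches / ((n - 1) * length))
    return out


# Hypothetical secondary-structure profiles of four models of one protein.
models = ["HHHHCCEEEE", "HHHHCCEEEC", "HHHCCCEEEE", "EEEECCHHHH"]
scores = consensus_scores(models)
best_model = models[scores.index(max(scores))]
```

Because the per-position state counts are built once and then looked up, each model is scored against all others without enumerating pairs; the brute-force function gives the same numbers at quadratic cost.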


Session VI: Systems biology
chair: Janusz M. Bujnicki (International Institute of Molecular and Cell Biology in Warsaw & Adam Mickiewicz University in Poznań, Poland)


Lecture 1: Dennis Bray
Protein computers versus microchips
Dennis Bray
University of Cambridge & Oxford Centre for Integrative Systems Biology, UK

Cells are built up of molecular circuits that perform logical operations, analogous in many ways to electronic devices but with unique properties. Proteins and other molecules act like miniature transistors to guide the biochemical processes of a cell; linked into huge networks they form the basis of all of the distinctive properties of living systems. However, as illustrated by our work on bacterial chemotaxis, the simple form of behaviour in which bacteria smell and swim towards distant sources of food, living circuitry differs in fundamental respects from silicon devices. It has unique features, such as a highly malleable internal architecture and the existence of a multitude of molecular states, that we cannot yet reproduce. The chemical states of a cell encode its past experiences and allow it to anticipate future conditions. Predictions underlie the goal-oriented movements of all animals, even single cells, and form the basis of brain function in humans.


Lecture 2: Edward M. Marcotte
A Mass-Spectrometry-Based Map of Universally-Shared Animal Protein Complexes
Blake Borgeson1*, Cuihong Wan2*, Ophelia Papoulas1, Daniel R. Boutz1, Andrew Emili2, Edward M. Marcotte1
1 Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, USA 78712
2 Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada

An important aspect of a protein’s function is its assembly with other proteins into higher molecular weight complexes that are typically the active species in the cell. Knowledge of protein complexes often reveals proteins’ functions, especially when some members of the complex are better characterized than others. This problem is critically important for the core set of proteins shared by every animal cell, which form the basic machinery common to every human cell and the cells of many major model organisms; more than 1/3 of these key proteins are still largely uncharacterized. More generally, maps of protein complexes provide the mechanistic foundations for understanding diverse human traits and diseases. One approach that is proving remarkably powerful is based on systematically mapping protein complexes by native biochemical fractionation and high-throughput mass spectrometry. In this paradigm, protein complexes are computationally inferred from the separation behaviour of proteins across many independent biochemical fractionations. We have launched a major effort to apply this strategy to define the set of major, stable protein complexes shared across animal cells. In initial studies, we separated cultured human cell extracts into >2,000 biochemical fractions, subsequently analyzed by tandem mass spectrometry (nearly 9,000 hours of instrument time), thereby enriching and systematically identifying 622 putative native soluble protein complexes and documenting ~14,000 protein interactions. We have now extended these studies to samples from 7 animal lineages, analyzing >70 biochemical fractionations comprising >7,000 distinct biochemical fractions, in all capturing the biochemical separation behaviour of ~12,000 animal proteins. I’ll describe our progress analyzing these data and our efforts to define the core set of stable protein complexes conserved across metazoa, as well as to measure the evolutionary conservation, divergence, and rewiring of protein complexes across full animal proteomes.
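
The idea of inferring complexes from separation behaviour can be sketched by correlating elution profiles: proteins that belong to the same stable complex should peak in the same fractions. The abstract does not state which similarity measure or inference procedure was used, so the Pearson correlation, the threshold and the toy profiles below are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-elution signal
    return cov / math.sqrt(vx * vy)


def co_eluting_pairs(profiles, threshold=0.9):
    """Return (protein_a, protein_b, r) for pairs with correlation >= threshold."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical spectral counts of three proteins across eight fractions.
elution = {
    "protA": [0, 2, 10, 25, 9, 1, 0, 0],
    "protB": [0, 1, 12, 22, 8, 2, 0, 0],
    "protC": [5, 0, 0, 1, 0, 3, 20, 11],
}
candidate_pairs = co_eluting_pairs(elution)
```

In practice, evidence from many independent fractionations would be combined before calling a complex; a single correlated pair of profiles is only a candidate interaction.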


Lecture 3: Stanislaw Dunin-Horkawicz
Bioinformatics analysis of cis-regulatory RNA motifs
Dorota Matelska (1), Elzbieta Purta (1), Sylwia Panek (1), Michal J. Boniecki (1), Janusz M. Bujnicki (1,2), and Stanislaw Dunin-Horkawicz* (1)
(1) Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, 02-109, Poland
(2) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Poznań, 61-614, Poland
*presenting author, e-mail: [email protected]

Cis-regulatory structural RNA motifs play an important role in the regulation of gene expression in prokaryotes. A variety of them have been identified and characterized, but many still await identification. This especially concerns RNA motifs whose taxonomic distribution is limited or scattered, and those which are present in the context of evolutionarily unrelated genes. To address this problem, we implemented an automated bioinformatics pipeline for the detection and classification of RNA motifs in non-coding regions of transcriptional units (operons). This pipeline facilitates the process of collecting genes associated with a given metabolic or functional pathway and allows associated RNA motifs to be identified. The procedure involves prediction of operons, detection and sequence-based clustering of non-coding regions (e.g. UTRs and intergenic regions), recognition of potential structural RNA motifs, and annotation of these motifs. Using the presented method we analyzed various pathways such as amino acid metabolism, nucleic acid metabolism, aminoacyl-tRNA biosynthesis, pyruvate metabolism, chemotaxis and ribosome assembly, to name just a few. Here, we present an analysis of RNA regulatory motifs associated with ribosomal proteins. Their genes are typically grouped within highly conserved operons. In many cases, one or more of the encoded proteins bind not only to a specific site in the ribosomal RNA, but also to a motif localized within a non-coding region of their own mRNA, and thereby regulate expression of the operon. In the course of our work, we comprehensively studied evolutionary relationships between such regulatory systems and characterized the RNA motifs involved. One of the motifs, present within the 5ʹ UTRs of operons encoding ribosomal proteins S6 and S18, was analyzed experimentally, and we demonstrated that it is specifically recognized by the S6:S18 protein complex. The motif is predicted to adopt a hairpin-like structure characterized by a bulge harboring a conserved “CCG” sequence. A similar structure containing a “CCG” trinucleotide forms the S6:S18 complex binding site in 16S ribosomal RNA. We have constructed a 3D structural model of the S6:S18 complex with the motif, which suggested that the “CCG” trinucleotide in a specific structural context may constitute a recognition determinant. This prediction was supported by site-directed mutagenesis of both RNA and protein components. Analysis of the S6:S18 complex-binding motif provided a molecular basis for understanding the role of molecular mimicry in protein-RNA recognition and prompted us to start further analysis of minimal recognition determinants in other ribosomal protein-recognition RNA motifs.


Lecture 4: Lars J. Jensen
Network biology - Large-scale data integration and text mining
Lars J. Jensen
Novo Nordisk Foundation Center for Protein Research in Copenhagen & Intomics, Denmark

Methodological advances have in recent years given us unprecedented information on the molecular details of living cells. However, it remains a largely unsolved challenge to link molecular-level data to the phenotypic consequences at the cellular level such as diseases. One reason for this is that biology is facing the limitations of reductionism: most phenotypes cannot be attributed to a single gene – they can only be understood at the systems level. Networks have proven to be a very useful abstraction for bridging single-gene and systems-level analysis. In my presentation I will describe the STRING database (http://string-db.org), which scores and integrates evidence from a diverse range of curated databases, raw data repositories, text-mining methods, and computational prediction methods to provide the most comprehensive protein association network possible. I will also introduce a suite of three new web-based resources that use similar techniques to associate the proteins in STRING with cellular compartments (http://compartments.jensenlab.org), tissues (http://tissues.jensenlab.org), and diseases (http://diseases.jensenlab.org). These resources aim to enable systems biology studies of diseases, taking into account both interactions and spatial localization of the proteins.
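
One simple way to integrate confidence scores from several evidence channels is a noisy-OR combination, in which an association is missed only if every channel misses it. The sketch below shows that rule on hypothetical scores. The abstract does not describe the scoring scheme actually used, so the function, the channel names and the assumption that channels are independent are illustrative only.

```python
def combine_scores(channel_scores):
    """Noisy-OR combination of per-channel confidence scores in [0, 1]."""
    remaining = 1.0
    for name, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must lie in [0, 1]")
        # Probability that this channel and all previous ones miss the link.
        remaining *= 1.0 - score
    return 1.0 - remaining


# Hypothetical evidence for one protein pair (channel names are assumptions).
evidence = {"curated_database": 0.6, "text_mining": 0.5, "prediction": 0.2}
combined = combine_scores(evidence)
```

The combined score is never lower than the strongest single channel, which matches the intuition that independent weak lines of evidence should reinforce each other.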


Lecture 5: Teresa Przytycka
Towards uncovering phenotype/genotype relationships and systems level modeling of tumor heterogeneity
Teresa M. Przytycka
National Center for Biotechnology Information, NIH

Uncovering and interpreting genotype/phenotype relationships are among the most challenging open questions in disease studies. In complex diseases, such as cancer, uncovering this relationship is complicated by the heterogeneous nature of these diseases. We have recently developed two complementary approaches to address this challenge. The first, called module cover [1], captures differences between patients by identifying differences in dys-regulated modules. The second approach is based on probabilistic mixture modeling [2]. Based on phenotypic similarity between the patients and a spectrum of possible disease causes or explanations, such as mutations, copy number variation, microRNA levels, etc., the method identifies disease subtypes together with their causes and models the disease of each patient as a mixture of the subtypes. Applying these approaches to cancer data provided a number of new insights. Taken together, these approaches help to fill a significant gap between the general current understanding of cancer and existing approaches to modeling cancer diversity.

[1] Kim YA, Salari R, Wuchty S. and Przytycka TM. Module cover - a new approach to genotype-phenotype studies. Pac Symp Biocomputing (PSB) 2013:135-46. [2] Cho DY, Przytycka TM. Dissecting Cancer Heterogeneity with probabilistic genotype-phenotype model. RECOMB 2013, accepted.


Lecture 6: Ewa Szczurek
Racing driver gene teams in cancer
Ewa Szczurek
Department of Biosystems Science and Engineering, ETH Zurich and SIB Swiss Institute of Bioinformatics
e-mail: [email protected]

Systems biology of cancer is dominated by the search for genes that drive malignant progression (driver genes), and by efforts to reconcile those drivers into functional pathways (teams). It has been observed that functional relations between teams of driver genes are reflected in their mutational patterns (Vandin et al., Cancer Research 2012). Here, we propose a significance-based approach to the identification and evaluation ("racing") of driver gene teams from mutation data measured in large cohorts of tumors. For a given set of genes that is a candidate for a functionally related cancer driver team, we draw two factor graphs. One encodes the probability of the observed mutations under the model that the genes are truly related and show a specific pattern, which might be obscured by errors. The other evaluates the probability of the same data under the mutational independence model, which assumes that these genes are unrelated. Finally, we apply a likelihood ratio test for ranking and assessment of the candidate gene sets.


Closing lecture: Piotr Zielenkiewicz


Quantitative, time-resolved model of translation
Piotr Zielenkiewicz1
1 Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw

We have developed a comprehensive and quantitative model of translation, characterizing protein synthesis separately for individual genes. Each gene is assigned a set of translational parameters, namely: absolute number of transcripts, ribosome density, codon mean translation time, transcript total translation time, total time required for translation initiation and elongation, probability of translation initiation, mRNA mean lifetime and absolute number of proteins produced by gene transcripts. Most parameters are calculated on the basis of only one experimental dataset from genome-wide ribosome profiling. The model is also implemented as a web service showing the translational activity of genes in three organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The simulation of ribosome movement is interactive and allows the coding sequence to be modified on the fly. The software allows users to upload any coding sequence and simulate its translation in one of the three organisms, which may prove useful for heterologous expression. The software may be accessed at http://nexus.ibb.waw.pl/Transimulation and its source code is freely available at the same address.
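
How per-codon translation times add up to a transcript-level time can be sketched as follows. This is a deliberately simplified illustration, not the published model: the codon times, the initiation time, the default time for unlisted codons and the treatment of the stop codon as an ordinary codon are all hypothetical assumptions.

```python
def split_codons(cds):
    """Split a coding sequence into consecutive triplets."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def elongation_time(cds, codon_times, default_time):
    """Sum of mean translation times over all codons of the sequence."""
    return sum(codon_times.get(codon, default_time) for codon in split_codons(cds))


def total_translation_time(cds, codon_times, initiation_time, default_time=0.1):
    """Initiation time plus elongation time, in the units of the inputs."""
    return initiation_time + elongation_time(cds, codon_times, default_time)


# Hypothetical mean codon translation times in seconds (illustrative only).
toy_codon_times = {"ATG": 0.10, "GCT": 0.05, "CGA": 0.30, "TAA": 0.20}
toy_cds = "ATGGCTCGAGCTTAA"
toy_total = total_translation_time(toy_cds, toy_codon_times, initiation_time=2.0)
```

Replacing a slow codon by a faster synonymous one in `toy_cds` and recomputing the total mimics, in a very reduced form, the kind of on-the-fly sequence editing described above.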


Posters


Poster 1: Reidar Andreson
Tools for Efficient Primer and Oligo Design
Triinu Kõressaar1,2, Lauris Kaplinski1,2, Tarmo Puurand1,2, Tõnu Möls1, Reidar Andreson*1,2, Maido Remm1,2, ELIXIR project3
1 Department of Bioinformatics, University of Tartu, Riia 23C, 51010 Tartu, Estonia
2 Estonian Biocentre, Riia 23C, 51010 Tartu, Estonia
3 http://www.elixir-europe.org/
*presenting author, e-mail: [email protected]

Polymerase chain reaction (PCR) is a molecular biology technique widely used across research fields, from the analysis of genes and other DNA regions to studies of genetic variants and high-throughput sequencing. Various methodologies also use hybridization oligonucleotides, for example to detect species in samples or to diagnose diseases. The success of such experiments therefore depends strongly on an efficient design process. We present a collection of web-based tools for PCR primer and oligo design. These are divided into three packages: PCR primer design, GenomeMasker and hybridization probe design. The primer design package contains two main tools: mPrimer3 and MultiPLX. mPrimer3 (http://bioinfo.ut.ee/mprimer3/) is an enhanced version of the widely used primer design program Primer3. The tool includes several improvements, such as updated thermodynamic models with up-to-date formulas for calculating melting temperature and salt correction. Additional enhancements include the calculation of the effects of divalent cations and the ability to avoid masked template sequences during primer design. MultiPLX (http://bioinfo.ut.ee/multiplx/) analyzes PCR primer compatibility and automatically finds an optimal multiplexing (grouping) solution. The program uses nearest-neighbor DNA binding thermodynamics to estimate possible unwanted pairings between PCR samples. These data are used to distribute primers into groups that satisfy a user-defined set of constraints. The GenomeMasker package tool SNPmasker (http://bioinfo.ut.ee/snpmasker/) allows rapid masking of repeats and single nucleotide polymorphisms (SNPs) in DNA templates. Masked sequences can be used efficiently within PCR or oligo hybridization methodologies to avoid non-unique regions in genomes ranging from small to large. The second application, GenomeTester (http://bioinfo.ut.ee/genometester/), tests whether PCR primers have an excessive number of binding sites on a DNA template sequence (e.g. the whole genome of a given species), how many PCR products would be amplified from it, and where they are located. Having too many PCR primer binding sites will typically result in failed PCR. Additionally, amplifying more than one product is undesirable because alternative PCR products could cause false positive signals in genotyping methodologies. Testing primers in silico with GenomeTester before ordering them is fast and helps to avoid reaction failures in later stages. SLICSel (http://bioinfo.ut.ee/slicsel/) creates specific oligonucleotide hybridization probes for microbial detection and identification. Their design is also based on nearest-neighbor thermodynamic modeling. SLICSel can be used on its own in any area where DNA or RNA oligonucleotide probe design is necessary, including microbial diagnostics and environmental monitoring, bio-threat detection, industrial process monitoring, and clinical microbiology. All these tools are freely accessible without any registration. Binary executables of the programs are available for download at http://bioinfo.ut.ee/ (Tools and Services -> Downloads).
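The grouping of primers into compatible multiplex sets can be pictured with a small sketch. It is not the MultiPLX algorithm: it uses a greedy first-fit strategy, and the clash test `three_prime_clash` is a crude, hypothetical stand-in for nearest-neighbor thermodynamics. The `max_group_size` constraint and all primer sequences are likewise assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def three_prime_clash(p, q, n=4):
    """Hypothetical clash test (assumption, not a thermodynamic model):
    two primers clash if the last n bases of one can pair perfectly
    somewhere along the other."""
    return reverse_complement(p[-n:]) in q or reverse_complement(q[-n:]) in p


def group_primers(primers, clash=three_prime_clash, max_group_size=3):
    """Greedy first-fit grouping: put each primer into the first group
    that has room and contains no clashing primer."""
    groups = []
    for primer in primers:
        for group in groups:
            if len(group) < max_group_size and not any(
                clash(primer, other) for other in group
            ):
                group.append(primer)
                break
        else:
            # No existing group can take this primer, so open a new one.
            groups.append([primer])
    return groups


# Made-up primer sequences, for illustration only.
example_groups = group_primers(["ACGTACGTAAAA", "GGCCTTTTGGCC", "CCCCCCCCCCCC"])
```

A greedy pass like this gives a valid grouping but not necessarily the smallest number of groups; an optimizing tool would search further.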


Posters

Poster 2: Piotr Bentkowski Modeling evolution of genome size in prokaryotes in response to changes in their abiotic environment Piotr Bentkowski1,2*, Hywel T. P. Williams3, Thomas Mock1, Timothy M. Lenton3 1 School of Environmental Sciences, University of East Anglia, Norwich, UK 2 Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland 3 College of Life and Environmental Sciences, University of Exeter, Exeter, UK *presenting author, e-mail: [email protected]

The size of the genomes of known free-living prokaryotes varies from ~1.3 Mbp to ~13 Mbp [1], with on average 85% of base pairs coding for proteins [2]. A number of possible explanations have been suggested in the literature [3][4]. Here we propose that the temporal variability of the environmental conditions can alter the genome size of a free-living prokaryotic population. In a stable environment the competition for the resource becomes the main force of selection and smaller (thus cheaper) genomes are favored. In more variable conditions larger genomes with more genes will be preferred, as they have a wider range of responses to a less predictable environment, despite being more costly. An agent-based model (ABM) of genome evolution in a free-living prokaryotic population has been proposed. Using a classic Hutchinson niche space model, a gene was defined as a Gaussian function over a corresponding niche dimension. The cell can have more than one gene along a given dimension, and the envelope of all the corresponding responses is considered a full description of a cell’s phenotype over that dimension (see Fig.). Gene deletion, gene duplication, and modifying mutations are permitted during reproduction, so the number of genes and their phenotypic effect (height and position of the Gaussian envelope) are free to evolve. The surface under the curve is fixed to prevent ’supergenes’ from occurring. Change of the environmental conditions is simulated as a bounded random walk with a varying step length (a parameter representing the variability of the environment). Using this approach the model is able to reproduce the phenomenon of genome size reduction in more stable environments (analogous to e.g. oligotrophic gyre regions of the ocean) and genome complexification in variable environments. Horizontal gene transfer (HGT) was also introduced, but was found to act in a similar manner to gene duplication and showed no important contribution to the speed of evolution or the adaptive potential of the population. [1] E. Koonin, Y. Wolf, Nucleic Acids Research, 2008, 36, 6688-6719. [2] M. Lynch, Annual Review of Microbiology, 2006, 60, 327-349. [3] J. A. G. Ranea, A. Grant, J. M. Thornton, C. A. Orengo, Trends in Genetics, 2005, 21, 21–25. [4] D. O. Hessen, P. D. Jeyasingh, M. Neiman, L. J. Weider, Trends in Ecology & Evolution, 2010, 25, 75–80.
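The two building blocks of the model described above, a fixed-area Gaussian gene and a bounded random walk for the environment, can be sketched as follows. This is an illustration only, not the authors' implementation: the unit area, the uniform step distribution, the clamping at the bounds, and the [0, 1] niche range are all assumptions.

```python
import math
import random


def gene_response(x, position, width, area=1.0):
    """Gaussian response of one gene at niche coordinate x.
    The area under the curve is fixed, so a narrower gene is taller."""
    height = area / (width * math.sqrt(2.0 * math.pi))
    return height * math.exp(-((x - position) ** 2) / (2.0 * width ** 2))


def phenotype(x, genes):
    """Envelope of all gene responses along one niche dimension.
    genes is a list of (position, width) pairs; an empty genome gives 0."""
    return max((gene_response(x, pos, w) for pos, w in genes), default=0.0)


def bounded_random_walk(start, step, n_steps, low=0.0, high=1.0, rng=None):
    """Environmental condition over time. step is the maximum step length
    (the variability parameter); values are clamped to [low, high]."""
    rng = rng or random.Random(0)
    x = start
    path = [x]
    for _ in range(n_steps):
        x = min(high, max(low, x + rng.uniform(-step, step)))
        path.append(x)
    return path


# A cell with two genes on one dimension, scored along a variable environment.
genes = [(0.3, 0.05), (0.6, 0.2)]
environment = bounded_random_walk(start=0.5, step=0.1, n_steps=100)
fitness_trace = [phenotype(x, genes) for x in environment]
```

Because the area is fixed, a gene cannot be both tall and wide, which is the trade-off that keeps a single 'supergene' from covering the whole niche axis.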


Poster 3: Paweł Błażej Computer simulation of prokaryotic genomes evolution under two mutational pressures Paweł Błażej, Paweł Mackiewicz, Małgorzata Grabińska, Stanisław Cebrat Department of Genomics, Faculty of Biotechnology, University of Wrocław *[email protected]

We created a simulation model of prokaryotic genome evolution to analyze the influence of two mutational pressures, associated with differently replicated DNA strands, on the evolution of protein coding sequences. The simulations were run on a population of individuals (genomes) consisting of protein coding genes extracted from the genome of the bacterium Borrelia burgdorferi. The process of mutation accumulation was modeled as a realization of a continuous-time, time-reversible Markov process with a given nucleotide substitution rate matrix and stationary distribution of nucleotides, both computed from the real data. We considered two separate rate matrices for the two differently replicated DNA strands (called leading and lagging). During simulations, individuals were subjected both to the direct mutational pressure, i.e. the one characteristic of the DNA strand on which they were located in the real genome, and to the reverse pressure, typical of the reverse strand. The simulated genomes could be eliminated for two reasons: (i) the appearance of a stop translation codon in at least one of their protein coding sequences and (ii) the loss of coding signal in their genes, which was assessed by an algorithm for recognition of protein coding sequences. The main idea of this algorithm is the assumption that any protein coding sequence is a realization of three independent homogeneous Markov chains that describe transitions between nucleotides separately for each of the three codon positions in a given DNA sequence. We observed that the selection against stop codons was more deleterious than the selection against the loss of coding signal in genes. In addition, the reverse pressure destroyed the coding signal more weakly than the direct pressure. This agrees with the very frequent changes of gene location between differently replicated DNA strands observed in real bacterial genomes.
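A minimal sketch of one mutation step followed by the stop-codon elimination test is given below. It simplifies the setup described above in two labeled ways: it uses a discrete-time approximation instead of a continuous-time Markov process, and the substitution probabilities are hypothetical placeholders, not values estimated from Borrelia burgdorferi. The coding-signal test is not reproduced.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_premature_stop(cds):
    """True if an in-frame stop codon occurs before the final codon.
    Assumes len(cds) is a multiple of 3 and reading starts at position 0."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    return any(codon in STOP_CODONS for codon in codons)


def mutate(cds, sub_prob, rng):
    """One discrete mutation step. sub_prob[a][b] is the per-step
    probability that nucleotide a is replaced by b (off-diagonal only);
    with the remaining probability the site is left unchanged."""
    out = []
    for base in cds:
        r = rng.random()
        cumulative = 0.0
        new_base = base
        for target, p in sub_prob[base].items():
            cumulative += p
            if r < cumulative:
                new_base = target
                break
        out.append(new_base)
    return "".join(out)


# Hypothetical per-step substitution probabilities for one strand.
LEADING = {
    "A": {"C": 0.001, "G": 0.003, "T": 0.001},
    "C": {"A": 0.001, "G": 0.001, "T": 0.004},
    "G": {"A": 0.004, "C": 0.001, "T": 0.002},
    "T": {"A": 0.001, "C": 0.003, "G": 0.001},
}

rng = random.Random(42)
gene = "ATGGCTAAAGGTCTGTAA"
mutated = mutate(gene, LEADING, rng)
eliminated = has_premature_stop(mutated)
```

In this picture, a "reverse" pressure would use the matrix of the other strand for the same gene; only the matrix passed to `mutate` changes.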


Poster 4: Michał J. Boniecki SimRNA: a program for RNA folding simulations Michał J. Boniecki1*, Grzegorz Łach1, Konrad Tomala1, Tomasz Sołtysiński1, Paweł Łukasz1, Kristian Rother 1, 2, Janusz M. Bujnicki1,2 1 International Institute of Molecular and Cell Biology, ul. Trojdena 4, 02-109 Warsaw, Poland 2 Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznań, Poland *presenting author, e-mail: [email protected]

Structure and behavior of large biologically relevant molecules can be approximated by using computational models and methods. We have developed a method for RNA structure simulations named SimRNA. The method is based on a simplified representation of the nucleotide chain, a statistically derived model of interactions, and Monte Carlo methods for sampling of conformational space. The backbone of the RNA chain is represented by P and C4’ atoms, whereas nucleotide bases are represented by three atoms: N1-C2-C4 for pyrimidines and N9-C2-C6 for purines. As a nucleotide base is rigid, the selection of these three atoms allows the positions of all other atoms of the base to be tracked. Moreover, the three atoms define a local system of coordinates, which is an anchor for positioning the three-dimensional grids responsible for calculating base-base interactions. Although a base is represented by only three atoms, the other atoms can be taken into account implicitly in terms of excluded volume (the space occupied by atoms). This allows a quite accurate reconstruction of the excluded volume of the entire base. All base-base interactions were modeled using three-dimensional grids built on the local systems of coordinates. A 3D grid is a three-dimensional map of the potential for contact with the other base. When a contact occurs, one base is the reference base (three atoms), whereas the other (the contacting base) is represented as a single point of interaction. This point corresponds to the geometrical center of the base. Calculating the total energy of contacting bases requires a two-step procedure in which first one and then the other base becomes the reference. Because the definitions of the three-dimensional grids do not refer to the backbone atoms (they are based only on the atoms of the base), the base-base interactions do not depend on the backbone conformation, which allows proper recapitulation of regions with an irregular backbone trace.
All terms of the energy function were derived, as a statistical potential, from a manually curated database of RNA crystal structures. Sampling of the conformational space was accomplished with an asymmetric Metropolis algorithm coupled with a dedicated set of moves. The algorithm was embedded in either a simulated annealing or a replica exchange Monte Carlo method. Recent tests demonstrated that SimRNA is able to predict basic topologies of RNA molecules with sizes up to about 50 nucleotides based on their sequences only, and of larger molecules if supplied with appropriate distance restraints. The user can specify various types of restraints, including restraints on secondary structure, distances and positions. SimRNA can be applied to systems composed of several RNA chains. SimRNA is also able to fold/refine structures with decreased regularity of the backbone trace (RNA pseudo-knots, coaxial stacking, bulges). As SimRNA is constructed as a folding method based on physical foundations, it also allows for examining folding pathways, obtaining an approximate view of the energy landscape, and investigating thermodynamic properties of RNA systems. Acknowledgements: The work was supported by the Polish Ministry of Science (HISZPANIA/152/2006 grant to Janusz M. Bujnicki and PBZ/MNiSW/07/2006 grant to Michał Boniecki) and by the EU (6FP grant “EURASNET” LSHG-CT-2005-518238) and DFG (SPP 1258).
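The sampling scheme named above can be illustrated with a generic sketch of the standard (symmetric) Metropolis criterion inside a simulated annealing loop. This is not SimRNA code: the asymmetric variant and the dedicated move set are not reproduced, the geometric cooling schedule is an assumption, temperature is expressed in energy units (k_B = 1), and the one-dimensional toy energy stands in for the statistical potential.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis rule: always accept downhill moves, accept
    uphill moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def simulated_annealing(energy, propose, state, t_start, t_end, n_steps, rng=None):
    """Minimize energy(state) while cooling from t_start to t_end.
    propose(state, rng) must return a new candidate state."""
    rng = rng or random.Random(0)
    e = energy(state)
    best, best_e = state, e
    for i in range(n_steps):
        # Geometric cooling schedule (an assumption of this sketch).
        t = t_start * (t_end / t_start) ** (i / max(1, n_steps - 1))
        candidate = propose(state, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - e, t, rng):
            state, e = candidate, candidate_e
            if e < best_e:
                best, best_e = state, e
    return best, best_e


# Toy usage: a single coordinate with a quadratic energy.
best_x, best_energy = simulated_annealing(
    energy=lambda x: x * x,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    state=5.0,
    t_start=10.0,
    t_end=0.01,
    n_steps=2000,
)
```

In a replica exchange setting the same acceptance rule would run in parallel at several fixed temperatures, with periodic swap attempts between neighboring replicas.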


Poster 5: Marcin Borowski De novo peptide sequencing using "meta-spectra" Marcin Borowski e-mail: [email protected]

In recent years we have witnessed a massive flow of new biological data. Large-scale sequencing projects throughout the world turn out new sequences and create new challenges for investigators. These ongoing sequencing efforts have already uncovered the sequences of over 100,000 proteins. De novo methods are essential for identifying proteins when the genomes are not known, but they are also extremely useful when the genomes are known, since they are not affected by errors in a search database. Another advantage of de novo methods is that the partial sequence can be used to search for post-translational modifications or to identify mutations with homology-based software. Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged prefix and suffix subsequences, and then measures the mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from given tandem mass spectral data of k ions. For simplicity, the spectral data are transformed into a special graph G = (V, E). In some cases post-translational modifications or other exceptions may lead to difficulties and errors in the de novo identification process. To improve the quality of identification and reduce the number of possible errors, a modification of the approach and algorithm based on dynamic programming [1] has been proposed. The new approach is based on constructing "meta-spectra" from a few repetitions of the biochemical experiment. The next step is to transform such spectral data into a directed graph, where |V| = 2k + 2. The solution can be found in O(|V|+|E|) time and O(|V|) space using dynamic programming. Our approach has been tested on a few peptides of known sequence. [1] T. Chen, M. Y. Kao, M. Tepel, J. Rush, G. M. Church. A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. Journal of Computational Biology, 2001, 8:325–337.
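The graph view of the problem can be illustrated with a deliberately simplified sketch. It is not the algorithm of [1] and does not build the 2k + 2 node graph: it assumes every peak is already known to be a prefix mass (no prefix/suffix ambiguity), uses a four-residue alphabet, and builds edges by a quadratic scan. Nodes are masses, an edge joins two masses that differ by one residue mass, and dynamic programming over the mass-sorted nodes finds a path from 0 to the total mass.

```python
# Monoisotopic residue masses in Da; restricted to four residues for brevity.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841}


def sequence_from_prefix_masses(masses, total_mass, tol=0.01):
    """Return a residue string whose prefix masses form a path from 0 to
    total_mass through the observed masses, or None if no path exists.
    Peaks that fit no path (noise) are skipped."""
    nodes = sorted({0.0, total_mass, *(m for m in masses if 0.0 < m < total_mass)})
    n = len(nodes)
    # best[j] = (edges on the best path 0 -> j, predecessor index, residue)
    best = [None] * n
    best[0] = (0, None, None)
    for j in range(1, n):
        for i in range(j):
            if best[i] is None:
                continue
            diff = nodes[j] - nodes[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    candidate = (best[i][0] + 1, i, residue)
                    if best[j] is None or candidate[0] > best[j][0]:
                        best[j] = candidate
    if best[-1] is None:
        return None
    # Trace the path back from the total mass to 0.
    residues = []
    j = n - 1
    while best[j][1] is not None:
        residues.append(best[j][2])
        j = best[j][1]
    return "".join(reversed(residues))


# Prefix masses of the peptide GASV plus one noise peak at 150.0.
peaks = [57.02146, 128.05857, 150.0, 215.0906]
decoded = sequence_from_prefix_masses(peaks, total_mass=314.15901)
```

A "meta-spectrum" in the sense of the abstract would enter here only through the list of peaks, for instance by keeping masses that recur across repeated experiments; how the author combines repetitions is not stated, so that step is left out.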


Poster 6: Maciej Bratek Comparison of parameters of model linear alkanes in different all-atom forcefields Maciej Bratek*, Krzysztof Murzyn Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics, and Biotechnology, Jagiellonian University, Kraków, Poland *presenting author, e-mail: [email protected]

The OPLS-AA force field [1] has been successfully used in Molecular Dynamics (MD) simulations of proteins, carbohydrates and nucleic acids. While its parameterization of decane and shorter alkanes provides satisfactory reproduction of experimental results, it fails in simulations involving longer alkanes. There have been several attempts to correct this flaw, including altering the point charges of H and C atoms in hydrocarbon chains so that they match the calculated dipole moment of C-H bonds in hydrocarbons [2,3], and reparameterization of selected torsion potentials. In this study we have determined various bulk liquid phase properties of pentadecane and compared them with relevant experimental results. The calculated properties included enthalpy of vaporization, density, and self-diffusion coefficient. To describe the conformational flexibility of pentadecane and heptane, we determined trans/gauche populations in a hydrocarbon chain. The heights of energy barriers in conformational transitions, together with the energy differences between low-energy conformers, were determined from free energy profiles of rotations around selected C-C bonds obtained in the gas phase using Umbrella Sampling MD simulations. The MD simulations were performed with Gromacs v4.5 [4] using four different sets of forcefield parameters: original OPLS-AA [1], OPLS-AA with altered point charges [3], OPLS-AA with a new set of torsion potentials (Murzyn, unpublished results) and the parameters of Jämbeck and Lyubartsev [5] (SLipids), originally developed for a series of phospholipids. The SLipids forcefield parameter set is heavily based on the CHARMM C27 parametrization of saturated and unsaturated hydrocarbons, with several adjustments that render it compatible with the AMBER forcefield. This work is supported by the Polish National Science Center under grant no. 2011/01/B/NZ1/00081.
[1] Jorgensen WL, Maxwell DS, Tirado-Rives J, Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids, J. Am. Chem. Soc. 1996, 118 (45): 11225–11236 [2] Vereshchagin A.N., Vul'fson S.G, Inductive interaction of weak dipoles in saturated hydrocarbons and their derivatives, Russian Chemical Bulletin, 1966, 16(6):1186-1189 [3] Pasenkiewicz-Gierula M., Baczynski K., Murzyn K., Markiewicz M. Orientation of lutein in a lipid bilayer – revisited, Acta Biochim. Polon., 2012, 59(1):115-118 [4] Hess, B., Kutzner, C., van der Spoel, D., Lindahl, E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation, J. Chem. Theory Comp., 2008, 4(3):435–447 [5] Jambeck J.P.M., Lyubartsev A.P., Derivation and Systematic Validation of a Refined All-Atom Force Field for Phosphatidylcholine Lipids, J. Phys. Chem. B, 2012, 116, 3164–3179
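The trans/gauche populations mentioned in the abstract can be computed from a series of C-C-C-C dihedral angles with a few lines of code. The sketch below is an illustration rather than the authors' analysis: it assumes the IUPAC convention (trans at 180°), 120° boundaries between states, and angles already extracted from the trajectory by some other tool.

```python
def classify_dihedral(angle_deg):
    """Assign a dihedral angle (degrees) to trans, gauche+ or gauche-.
    Convention assumed here: trans is centered at 180 degrees and the
    state boundaries lie at +/-120 degrees."""
    wrapped = (angle_deg + 180.0) % 360.0 - 180.0  # map into [-180, 180)
    if abs(wrapped) > 120.0:
        return "trans"
    return "gauche+" if wrapped >= 0.0 else "gauche-"


def conformer_populations(angles_deg):
    """Fraction of samples in each conformational state."""
    counts = {"trans": 0, "gauche+": 0, "gauche-": 0}
    for angle in angles_deg:
        counts[classify_dihedral(angle)] += 1
    total = len(angles_deg)
    if total == 0:
        return {state: 0.0 for state in counts}
    return {state: count / total for state, count in counts.items()}


# Four made-up dihedral samples, for illustration only.
populations = conformer_populations([180.0, -175.0, 65.0, -60.0])
```

Comparing such populations between parameter sets is one simple way to see whether a change in torsion potentials or point charges shifts the chain toward more extended or more kinked conformations.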


Poster 7: Marcin Dąbrowski Docking of Lqhα-toxin to homology model of domain IV of Nav sodium channel and MD study of its dynamics M. Dąbrowski1,2*, W. Nowak1, M. Stankiewicz2 1Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland 2Faculty of Biology and Environment Protection, Nicolaus Copernicus University, Lwowska 1, 87-100 Toruń, Poland *presenting author, e-mail: [email protected]

Voltage-dependent sodium channels (Nav) are important membrane proteins that can change the conductivity of a cell membrane. These channels are critical for the proper functioning of the nervous system in both invertebrates and vertebrates. Toxins may change the structure and functions of one or more groups of channels in a neuronal membrane. We can measure these changes using electrophysiological methods such as Double Oil Gap or Patch Clamp. Interactions between toxins and binding sites in channels may also be studied by in silico calculations. In the poster, molecular dynamics (MD) studies of a fragment of the Nav channel with a docked toxin will be presented. 6 ns simulations of this system were performed with (and without) an external electric field added. In electrophysiological experiments we can see that if we increase the voltage across the cell membrane, the toxin dissociates faster from the channel. In the simulations there is a correlation between the number of hydrogen bonds and the movement of segment S4 in domain IV (the voltage sensor) of the Nav channel after switching on the electric field. All studies were performed for a homology model of domain IV of the cockroach Periplaneta americana Nav channel and a crystal model of the Lqhα insect toxin. MD simulations were performed using the CHARMM force field and the NAMD code. Unique atomic-scale details of the toxin interaction with the channel will be revealed.


Poster 8: Michal Dabrowski Comparison of Jaspar, Transfac and Genomatix motif libraries on ChIP-seq data for 44 transcription factors Michal Dabrowski1*, Norbert Dojer2, Izabella Krystkowiak1, Bartek Wilczynski2, Bozena Kaminska1 1 Nencki Institute of Experimental Biology, Warsaw, Poland 2 Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland *presenting author, e-mail: [email protected]

New experimental techniques for measuring transcription factor (TF) occupancy in DNase I hypersensitive regions have renewed interest in genome-wide identification of known transcription factor binding sites (TFBS). For vertebrates there are three major collections of known TFBS motifs: one of them, Jaspar, is in the public domain, and the other two, Transfac and Genomatix, are commercial. Given the substantial cost of the commercial licenses, we were interested in whether the three libraries differ in performance. Using ChIP-seq data for 44 transcription factors (TFs) as the positive sets, and third exons identified genome-wide as the negative set, we compared the performance of the three libraries in terms of specificity, sensitivity, and coverage. The specificity and sensitivity were compared in two ways: (1) each commercial library was used with its default scanner (Transfac with Match, Genomatix with MatInspector) and the thresholds controlling the false positive rate (FPR) as provided by the supplier, while Jaspar was used with two open-source motif scanning programs, Bio.Motif and matrix-scan, using a uniform FPR threshold; (2) all three libraries were used with the same two scanners, i.e. Bio.Motif and matrix-scan. Bio.Motif was used with the GC-content as the background model, and matrix-scan was used with a 1st order Markov chain background model. The coverage, i.e. the number of represented TFs, was highest for Genomatix (37), followed by Transfac (33) and Jaspar (21). When used with the same scanner (matrix-scan or Bio.Motif), the average specificity and sensitivity obtained with the motifs from all three libraries were practically identical. The two modern scanners using background models outperformed the two commercial scanners, giving higher average sensitivity for the same average specificity.
The use of Genomatix matrix families (families of similar motifs predicted to bind the same set of TFs) boosted the average sensitivity, at the cost of a drop in the average specificity. Using the Bio.Motif scanner, we analyzed the full ROC curves for all the motifs from the three libraries. We are currently investigating the utility of different parametrizations of the ROC curves for automatically setting the thresholds in a way that maximizes the balanced accuracy (the average of specificity and sensitivity). Our results demonstrate that the value of the threshold depends on the information content of the motif.
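Choosing a score threshold that maximizes balanced accuracy can be sketched in a few lines. The example is illustrative only: it scans every observed score as a candidate threshold instead of parametrizing the ROC curve, and the motif scores below are made-up numbers, not results from the study.

```python
def balanced_accuracy(pos_scores, neg_scores, threshold):
    """Average of sensitivity and specificity when a sequence is called
    a hit if its motif score is >= threshold."""
    sensitivity = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    specificity = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return 0.5 * (sensitivity + specificity)


def best_threshold(pos_scores, neg_scores):
    """Scan all observed scores and return the threshold with the
    highest balanced accuracy (the lowest one in case of ties)."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    return max(
        candidates,
        key=lambda t: balanced_accuracy(pos_scores, neg_scores, t),
    )


# Hypothetical best motif scores per sequence (positive and negative sets).
positives = [5.0, 6.0, 7.0, 3.0]
negatives = [1.0, 2.0, 4.0, 3.5]
threshold = best_threshold(positives, negatives)
accuracy = balanced_accuracy(positives, negatives, threshold)
```

Each candidate threshold corresponds to one point on the ROC curve, so this scan is the brute-force counterpart of fitting a parametrized curve and optimizing along it.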


Poster 9: Mateusz Dobrychłop Pyry3D UCSF Chimera extension and its application in prediction of macromolecular complexes’ structure Mateusz Dobrychłop1,2*, Janusz M. Bujnicki1,2, Joanna M. Kasprzak1 1 Laboratory of Structural Bioinformatics, Institute of Molecular Biology and Biotechnology, Collegium Biologicum, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland. 2 Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, ul. Ks. Trojdena 4, 02-109 Warsaw, Poland. *presenting author, e-mail: [email protected]

PyRy3D is a software tool that creates ranked 3D models of macromolecular complexes based on experimental restraints and the shape of the whole complex. The program performs Monte Carlo simulations in order to find the best arrangement of the components inside a density map. To let users compose their own arrangements of components inside the defined complex shape in the easiest and most intuitive way, we have created a tool that links PyRy3D with UCSF Chimera [1], a popular program for interactive visualization and analysis of molecular structures. The PyRy3D Chimera extension is a plugin that provides a user-friendly graphical interface, letting the user generate a set of PyRy3D input files interactively, or calculate a score for a set of different component arrangements, based on default or user-defined parameters, directly from the extension’s interface. The poster shows the basic features of the PyRy3D extension and an example of their use in predicting the three-dimensional structure of the human DNA polymerase gamma holoenzyme. [1] Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. 2004 Oct;25(13):1605-12.


Poster 10: Finn Drabløs Transcription profiling during the cell cycle shows that a subset of Polycomb-targeted genes is upregulated during DNA replication Javier Peña-Diaz1, Siv A. Hegre1,2, Endre Anderssen1, Per A. Aas1, Robin Mjelle1, Gregor D. Gilfillan3, Robert Lyle3, Finn Drabløs1,*, Hans E. Krokan1, Pål Sætrom1,4 1) Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7491, Trondheim, Norway. 2) St. Olavs Hospital, NO-7006 Trondheim, Norway. 3) Department of Medical Genetics and Norwegian Sequencing Centre, Oslo University Hospital, NO-0407 Oslo, Norway. 4) Department of Computer and Information Science, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway *presenting author, e-mail: [email protected]

Genome-wide gene expression analyses of the human somatic cell cycle have indicated that the set of cycling genes differs between primary and cancer cells. By identifying genes that have cell cycle dependent expression in HaCaT human keratinocytes and comparing these with previously identified cell cycle genes, we have identified three distinct groups of cell cycle genes [1]. First, housekeeping genes enriched for known cell cycle functions; second, cell type-specific genes enriched for HaCaT-specific functions; and third, Polycomb-regulated genes. These Polycomb-regulated genes are specifically upregulated during DNA replication, and consistent with being epigenetically silenced in other cell cycle phases, these genes have lower expression than other cell cycle genes. We also find similar patterns in foreskin fibroblasts, indicating that replication-dependent expression of Polycomb-silenced genes is a prevalent but unrecognized regulatory mechanism. [1] J. Peña-Diaz, S.A. Hegre, E. Anderssen, P.A. Aas, R. Mjelle, G.D. Gilfillan, R. Lyle, F. Drabløs, H.E. Krokan, P. Sætrom, Nucleic Acids Res, 2013, 41(5), 2846-2856.


Poster 11: Małgorzata Dudkiewicz Identification of novel HExxH metalloprotease domains in the proteome grey zones using bioinformatics methods. Krystyna Bińko1, Małgorzata Dudkiewicz1*, Krzysztof Pawłowski1 Department of Experimental Design and Bioinformatics, Warsaw University of Life Sciences SGGW, 159 Nowoursynowska Str. 02-776 Warsaw *presenting author, e-mail: [email protected]

Zincins, the zinc-dependent metalloproteases possessing the characteristic His-Glu-x-x-His (HExxH) active site motif, are a broad group of proteins involved in many metabolic and regulatory functions and present in the proteomes of most organisms. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. Domain families in which the majority of members possess a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases [1]. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database); the rest are found in “anonymous” regions, which still await annotation. These “out-of-domain” occurrences were carefully investigated here, using combined bioinformatics methods. Our aim was to identify the most probable candidates for novel metalloprotease domains, based on the presence of the HExxH motif and several features identified through a pipeline of sensitive bioinformatics algorithms created for detecting distant homologues. Nearly 95 000 sequences with non-domain occurrences of the HExxH motif were filtered for redundancy at 80% sequence identity, had their lengths optimized for better alignments, and were clustered at a 40% sequence identity threshold. Among the 24000+ clusters, most were small (between 1 and 10 sequences), while a few were large (between 10 and 50 sequences). Approximately one fourth of the clusters exhibited significant or “borderline significant” sequence similarity to known zincins, and had the HExxH motif conserved among most close homologues. Thus many potential novel metalloprotease families or extensions of existing metalloprotease families were identified. Examples of such families are presented. [1] A. Lenart, M. Dudkiewicz, M. Grynberg, K. Pawłowski. CLCAs - A Family of Metalloproteases of Intriguing Phylogenetic Distribution and with Cases of Substituted Catalytic Sites. PLoS One, 2013; 8 e62272.
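The first step of such a survey, locating HExxH occurrences and keeping those outside annotated domains, can be sketched with a regular expression. This is an assumption-laden illustration, not the authors' pipeline: the sequence and the domain coordinates are invented, and domain boundaries are taken as 1-based inclusive (start, end) pairs such as a Pfam annotation might provide.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence):
    """Return (1-based start position, matched pentapeptide) for every
    HExxH occurrence in a protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(sequence.upper())]


def outside_domains(hits, domains):
    """Keep hits whose five residues are not fully inside any annotated
    domain. domains is a list of 1-based inclusive (start, end) pairs."""
    return [
        (pos, motif)
        for pos, motif in hits
        if not any(start <= pos and pos + 4 <= end for start, end in domains)
    ]


# Invented sequence and domain annotation, for illustration only.
protein = "MKHEAAHGGHELLHEQQH"
hits = find_hexxh(protein)
orphans = outside_domains(hits, domains=[(1, 8)])
```

The orphan hits would then be the input for redundancy filtering, clustering and remote homology searches, which require dedicated tools and are not sketched here.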


Poster 12: Przemysław Gagat Model for protein import into the photosynthetic organelles of Paulinella chromatophora inferred from bioinformatics analyses Przemysław Gagat1*, Paweł Mackiewicz1, Andrzej Bodył2 1 Department of Genomics, Faculty of Biotechnology, 2 Laboratory of Evolutionary Protistology, Division of Invertebrate Biology, Evolution and Conservation, Faculty of Biological Sciences, University of Wrocław, ul. Przybyszewskiego 63/77, 51-148 Wrocław, Poland *presenting author, e-mail: [email protected]

Paulinella chromatophora is a testate filose amoeba belonging to the supergroup Rhizaria. It harbors two photosynthetically active endosymbionts of cyanobacterial origin (chromatophores), acquired ~60 million years ago independently of classic primary plastids, i.e. the organelles of glaucophytes, red algae and green plants. These endosymbionts have lost many essential genes and transferred at least 32 genes to the host nuclear genome by endosymbiotic gene transfer (EGT). This indicates that, similarly to classical primary plastids, Paulinella chromatophores have evolved a transport system to import their EGT-derived proteins. To check whether this system is similar to the Toc-Tic-based import machinery of primary plastids, we performed a sensitive search, using the FASTA algorithm, for homologs of Toc and Tic genes in two sequenced Paulinella chromatophore genomes and, for comparison, in 33 sequenced genomes of free-living cyanobacteria. The homology was verified using protein domain and motif searches. We found that the Paulinella chromatophore genomes encode homologs of Toc12, Toc64, Tic21 and Tic32 but have lost those of Toc75, Tic20, Tic55 and Tic62. Because the missing Toc genes, especially Toc75, have not been detected in preliminary analyses of the Paulinella nuclear genome, import pathways alternative to the Toc-Tic-based route had to be considered. Therefore, we used 25 bioinformatics tools to search for potential targeting signals in 10 EGT-derived proteins involved in photosynthesis. Our studies demonstrate that five of them carry signal peptides, implying that they are targeted via the host endomembrane system. The remaining five could use alternative targeting information. The predicted low molecular weight and nearly neutral charge are characteristic of these ten EGT-derived proteins and can be interpreted as adaptations to their passage through the peptidoglycan wall still present in the intermembrane space of the chromatophore envelope.
In order to complete our model for protein import into the chromatophores of Paulinella, we have also looked for potential chaperones and components of the molecular motor that could assist imported proteins on their way to the chromatophore matrix.


Poster 13: Wiktoria Giedroyć-Piasecka Prediction of binding affinity of menin-MLL small molecule inhibitors Wiktoria Giedroyć-Piasecka*, Jolanta Grembecka5, Tomasz Cierpicki5, Jonathan Pollock5, Edyta Dyguda-Kazimierowicz, W. Andrzej Sokalski Institute of Physical and Theoretical Chemistry, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland 5 Department of Pathology, University of Ann Arbor, MI, USA *presenting author, e-mail: [email protected]

The interaction between menin and mixed lineage leukemia (MLL) fusion proteins (FPs) is a common leukemogenic factor. [1] Binding of MLL-FP to menin, a nuclear tumor suppressor protein, leads to a block in hematopoietic differentiation that results in aggressive leukemias. [2] Patients suffering from menin-MLL based acute leukemias have very poor prognoses. As currently available treatments are considered ineffective, there is a need for the development of new therapies. [3] It has been proven [4] that the specific menin-MLL interaction could serve as an effective therapeutic target for inhibition by small molecule ligands. The proposed novel inhibitors of the menin-MLL interaction are aromatic compounds that occupy the MLL-FP binding site, preventing the attachment of the peptide. [5] To improve the inhibitor development process, we have proposed an in silico model of inhibitor activity. Ab initio calculations of the intermolecular interaction between the menin active site and a set of chosen inhibitors originating from experimental structures allowed us to correlate the inhibitor binding energy with the experimental values of binding affinity. In addition, application of the hybrid variation-perturbation scheme of energy decomposition [6] enabled the investigation of the physical basis of the interaction between the inhibitors and the amino acid residues included in the menin active site model. The intermolecular interaction was assessed on several consecutive levels of theory of increasing accuracy, providing insights into the contribution of each amino acid residue to binding specificity. As a result, our theoretical model of inhibitor activity can serve as a tool for predicting the binding affinity of newly proposed inhibitor structures. If applied during the inhibitor design phase, the model removes the otherwise inevitable need to synthesize every novel ligand structure, facilitating drug development for menin-MLL mediated leukemias.

This work was supported by Leukemia and Lymphoma Society TRP Grant 6070-09 [1] A. T. Thiel et al., Bioessays, 2012, 34, 771-780. [2] J. Grembecka et al., JBC, 2010, 285, 40690-40698. [3] A. Shi et al., Blood, 2012, 120, 4461-4469. [4] M. J. Murai et al., JBC, 2011, 286, 31742-31748. [5] J. Grembecka et al., N Chem Bio, 2012, 8, 277-284. [6] W. A. Sokalski et al., Chem Phys Lett, 1988, 153, 153-159.

Posters

Poster 14: Tomasz Głowacki On peptide assembling problems Tomasz Głowacki1,*, Adam Kozak1, Piotr Formanowicz1,2 1 Institute of Computing Science, Poznań University of Technology, Poznań, Poland 2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznań, Poland *Tomasz Głowacki, e-mail: [email protected]

Peptides are chemical compounds formed by linking amino acids of 20 types. Long peptides are called proteins and consist of at least about 100 amino acids [1]. Existing methods for peptide sequencing can determine only short fragments of up to 50 amino acids, so an assembling method is needed to bring these pieces together. This work presents selected combinatorial variants of the peptide assembling problem. The computational complexity of the problems is also described. Potential errors and additional information about amino acid distribution are considered. For the assembling problem without errors, a graph model is described [2]. For the strongly NP-hard variant of the problem, three metaheuristic algorithms were developed and tested on real sequences from GenBank [3][4]. The obtained results are given.

[1] L. Stryer, Biochemistry, 4th edition, New York: W.H. Freeman and Company, 1995. [2] J. Blazewicz, M. Borowski, P. Formanowicz, T. Glowacki, On graph theoretical models for peptide sequence assembly, Foundations of Computing and Decision Sciences, 2005, 30: 183-191. [3] J. Błażewicz, M. Borowski, P. Formanowicz, T. Głowacki, Genetic and tabu search algorithms for peptide assembly problems, RAIRO - Operations Research, 2010, 44: 153-166. [4] T. Głowacki, A. Kozak, P. Formanowicz, Asemblacja długich łańcuchów peptydowych przy wykorzystaniu metaheurystyki GRASP, Zeszyty Naukowe Politechniki Śląskiej, 2008, 150: 203-209.
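To make the error-free assembling problem concrete, the sketch below merges short fragments by their longest suffix-prefix overlaps. It is a simple greedy heuristic chosen for illustration only; it is not the graph model of [2] nor one of the metaheuristics of [3][4], and the function names are hypothetical. Fragments fully contained in other fragments are not treated specially.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the list of sequences left when no overlapping pair remains.
    """
    frags = list(dict.fromkeys(fragments))  # drop duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags
```

Greedy merging can miss the shortest common superstring, which is one reason why exact graph models and metaheuristics are of interest for the harder variants.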

Poster 15: Anna Gogolinska New ideas on using Petri nets in molecular dynamics simulations Anna Gogolinska1,2*, Wieslaw Nowak2 1. Faculty of Mathematics and Computer Science, Nicolaus Copernicus University 2. Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University *presenting author, e-mail: [email protected]

Petri nets (PNs) are a mathematical modeling language. A PN has the form of a bipartite graph with two kinds of nodes: places and transitions. Transitions can represent actions or events, while places are suitable for describing objects or elements and can also contain tokens. Dynamics in a PN arises when transitions transfer tokens from their input places to their output places. The PN idea is very simple, and thanks to that it is very flexible and universal. Many extensions of the basic PN type exist, for example Petri nets with time, Petri nets with priorities and stochastic Petri nets. PNs are used in many disciplines and are very popular in systems biology [1]. Molecular dynamics (MD) simulations (see for example [2] and references therein) are computational methods for studying large chemical systems (such as proteins, DNA, etc.). A known structure is usually the starting point for the simulation of its time evolution. New structures are obtained by solving Newton's equations of motion. The output of an MD simulation is a trajectory file with the coordinates of the atoms in each time frame of the simulation, and this file is usually very large. The analysis of MD outputs is a time-consuming and difficult task. New methods for efficient extraction of information are highly desirable, since the number of MD trajectories increases tremendously every year. Our work focuses on using Petri nets to represent approximate MD trajectories. This approach may create a new way of analyzing MD outputs and may make it easy to extract interesting data contained in the trajectories. Three general ideas connecting the PN formalism and the MD “philosophy” will be presented. In the first method, PN places represent points in the Cartesian R^3 space and transitions represent the movement of the atoms calculated in the simulation. The second method is similar, but here one place describes the positions of all amino acids of the protein: one conformation is equivalent to a point in the R^3N conformational space. A transition represents a change of conformation. The last idea focuses on contacts between the atoms. Our ideas have been implemented. A single Petri net can be generated from a set of trajectories calculated for one protein; examples of such nets will be presented. Another feature of this approach is the possibility of simulating the PN and quickly generating new files with protein structure models based on the PN. These will also be presented.

[1] I. Koch, et al., Modeling in Systems Biology: The Petri Net Approach, London: Springer, 2011. [2] W. Nowak, Applications of computational methods to simulations of proteins dynamics. In “Handbook of Computational Chemistry”, Springer, 2012, pp.129-1149.
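The token-firing rule described above can be sketched in a few lines. This is a minimal, generic place/transition net written for illustration, not the authors' implementation; the class name and the place and transition names used in the usage note are hypothetical.

```python
class PetriNet:
    """Minimal place/transition net with weighted arcs."""

    def __init__(self, marking, transitions):
        # marking: place name -> number of tokens
        # transitions: name -> (input places, output places), each a dict
        #              mapping place name -> arc weight
        self.marking = dict(marking)
        self.transitions = dict(transitions)

    def enabled(self, name):
        """A transition is enabled if every input place holds enough tokens."""
        inputs, _outputs = self.transitions[name]
        return all(self.marking.get(place, 0) >= n for place, n in inputs.items())

    def fire(self, name):
        """Move tokens from the input places to the output places."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, n in inputs.items():
            self.marking[place] -= n
        for place, n in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + n
        return dict(self.marking)
```

In the spirit of the second method, a place such as `conf_A` could stand for one conformation and a transition `A_to_B` for a conformational change: `PetriNet({"conf_A": 1, "conf_B": 0}, {"A_to_B": ({"conf_A": 1}, {"conf_B": 1})})` followed by `fire("A_to_B")` moves the single token from `conf_A` to `conf_B`.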

Poster 16: Małgorzata Grabińska Comparison of bacterial genome evolution under real and artificial mutational pressures Małgorzata Grabińska*, Paweł Błażej, Paweł Mackiewicz, Stanisław Cebrat Faculty of Biotechnology, University of Wrocław *presenting author, e-mail: [email protected]

Mutational pressure is one of the main forces shaping DNA composition and plays a crucial role in genome evolution. Together with recombination, it increases the genetic variation of organisms that is necessary for their adaptation to changing environments. On the other hand, most mutations are deleterious and generate energetic costs of repair. Therefore, mutations occurring in biological DNA sequences are not completely random but are the result of coevolution of mutational pressure with selection constraints around the genetic code [1] and can be optimized to some extent during evolution [2]. Here, we used a computer simulation model of bacterial genome evolution developed by Błażej and coworkers [3] to test optimal mutational matrices that were found using an Evolutionary Strategies approach. The mutational pressures were realizations of an appropriate continuous-time, time-reversible Markov process of nucleotide substitutions and were optimized according to two criteria: the minimum and the maximum number of non-synonymous substitutions, i.e. missense mutations, which change one coded amino acid to another. We also used a modified version of GeneMark [4], an algorithm for finding protein coding sequences, to model selection for protein coding sequences. During the simulations we recorded many parameters, such as the strength of the coding signal, the number of genes that lost their coding signal, the number of genes interrupted by the appearance of a stop codon, and the number of eliminated individuals (genomes) in the evolving population. All results were compared with simulations carried out under an empirical mutational pressure found in a bacterial genome [5]. Interestingly, the values of the parameters recorded in the simulation with the real matrix were located between the values from the simulations using the two extreme optimized substitution matrices.

[1] P. Mackiewicz, P. Biecek, D. Mackiewicz, J. Kiraga, K. Bączkowski, M. Sobczyński, S. Cebrat, Optimisation of asymmetric mutational pressure and selection pressure around the universal genetic code, Lecture Notes in Computer Science, 2008, 5103, 100-109. [2] P. Sniegowski, P. Gerrish, T. Johnson, A. Shaver, The evolution of mutation rates: separating causes from consequences, Bioessays, 2000, 22, 1057-1066. [3] P. Błażej, P. Mackiewicz, S. Cebrat, Simulation of bacterial genome evolution under replicational mutational pressures, Proceedings of the BIOSTEC 2012, Bioinformatics 2012, International Conference on Bioinformatics Models, Methods and Algorithms, Vilamoura, Algarve, Portugal, 1-4 February, 2012, 51-57. [4] M. Borodovsky, Y. A. Sprizhitskii, E. I. Golovanov, A. A. Aleksandrov, Statistical patterns in primary structures of the functional regions of the genome in Escherichia coli, Molecular Biology, 1986, 20, 826–840, 1144–1150. [5] M. Kowalczuk, P. Mackiewicz, D. Mackiewicz, A. Nowicka, M. Dudkiewicz, M. R. Dudek, S. Cebrat, High correlation between the turnover of nucleotides under mutational pressure and the DNA composition, BMC Evolutionary Biology, 2001, 1, 13.
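A continuous-time, time-reversible nucleotide substitution process can be written as a rate matrix built from stationary frequencies and symmetric exchangeabilities. The sketch below shows that standard construction and nothing more; the parameter values used in any call are up to the reader, and the optimized and empirical matrices of the poster are not reproduced. The function name is hypothetical.

```python
NUCLEOTIDES = "ACGT"


def reversible_rate_matrix(pi, exchange):
    """Build a time-reversible substitution rate matrix Q.

    pi:       nucleotide -> stationary frequency (should sum to 1)
    exchange: frozenset({a, b}) -> symmetric exchangeability s_ab > 0
    Off-diagonal rates are q_ab = s_ab * pi_b, so that detailed balance
    pi_a * q_ab == pi_b * q_ba holds; each diagonal entry makes its row
    sum to zero.
    """
    q = {a: {b: 0.0 for b in NUCLEOTIDES} for a in NUCLEOTIDES}
    for a in NUCLEOTIDES:
        for b in NUCLEOTIDES:
            if a != b:
                q[a][b] = exchange[frozenset((a, b))] * pi[b]
        q[a][a] = -sum(q[a][b] for b in NUCLEOTIDES if b != a)
    return q
```

An optimization such as the one described above would then search over the entries of `pi` and `exchange` subject to the chosen criterion, for example the expected number of non-synonymous substitutions.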

Poster 17: Aleksandra Gruca Estimation of rule evaluation measure designed for functional genes description, based on individual expert preferences Aleksandra Gruca1*, Marek Sikora1,2 1)Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland 2)Institute of Innovative Technologies EMAG, Leopolda 16, 40-188, Katowice, Poland *Aleksandra Gruca, email: [email protected]

High-throughput methods of expression data analysis provide an abundance of data that must be interpreted and analyzed by data mining methods and machine learning algorithms. Many of these methods can reveal the existence of significant gene signatures discriminating between the control and the analyzed class; however, before further analysis, the most informative genes sharing common biological functions need to be selected. In this work we present a new extension of the RuleGO rule generation method [1]. The method was designed to discover logical rules that include combinations of GO terms [2] in their premises in order to provide a functional description of the analyzed gene signatures. As the number of obtained rules is typically huge, a filtration algorithm is required to reduce the number of rules and to select only the most interesting ones. The rule interestingness measures currently used within the RuleGO method do not always allow rules to be selected according to the user’s subjective preferences. Here, we propose an application of the UTA method [3] to estimate a multicriteria rule interestingness measure reflecting the subjective rule evaluation defined by an expert. Each rule is characterized by a vector of values reflecting its quality according to different (objective and subjective) interestingness measures [4] (so-called partial interestingness measures). From the designated set of rules, a set of representative rules is selected and presented to an expert. The expert orders the rules according to his or her preferences. Using the information about this order and the values of the partial interestingness measures that characterize the rules, the additive multicriteria interestingness measure is estimated. The estimation is carried out in such a way that the rule ranking obtained with this function is consistent with the ranking given by the expert. The presented approach is applied to three gene expression datasets. Using the RuleGO method we generated rules for the gene groups from experimental datasets and presented them to the user (domain expert). Based on the expert rule evaluation, the multicriteria interestingness measure reflecting subjective user preferences was estimated and applied to the whole set of rules. The obtained rule orders were compared with the rule orders generated by the standard RuleGO rule evaluation method. The proposed method yielded a rule ranking that is better correlated with the user ranking than the ranking obtained in the standard way; therefore, the new ranking is more consistent with the user preferences. The obtained ranking is further used in the filtration step, controlling the selection of the most interesting rules from the whole set of generated, statistically significant rules.

[1] Gruca A., Sikora M., Polanski A., Nucleic Acids Res., 2011, Vol. 39 (suppl 2), W293-W301. [2] Ashburner, M., Ball, C.A., Blake, J.A. et al., Nat Genet, 2000, Vol. 25, 1, 25-29. [3] Siskos, Y., Grigoroudis, E., and Matsatsinis, N.F., Springer-Verlag, 2005, 297-344. [4] Geng, L. and Hamilton, H.J., ACM Computing Surveys, 2006, 38, 3, 9.
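The idea of ranking rules with an additive measure and checking the result against an expert ordering can be sketched as follows. This is a deliberately simplified stand-in: it uses a weighted sum with fixed weights, whereas UTA estimates marginal utility functions from the expert ordering, a step that is not shown. The rank agreement is measured with Kendall's tau, which is an assumption, since the abstract does not name the correlation used. All function names are hypothetical.

```python
from itertools import combinations


def additive_score(partial_measures, weights):
    """Weighted sum of the partial interestingness measures of one rule."""
    return sum(w * m for w, m in zip(weights, partial_measures))


def rank_rules(rules, weights):
    """Order rule names from most to least interesting.

    rules: rule name -> tuple of partial interestingness measures
    """
    return sorted(rules, key=lambda name: additive_score(rules[name], weights),
                  reverse=True)


def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two orderings of the same items."""
    position_b = {name: i for i, name in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):  # x precedes y in order_a
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(order_a) * (len(order_a) - 1) / 2
    return (concordant - discordant) / n_pairs
```

With such helpers, candidate weightings could be compared by how closely `rank_rules` reproduces the expert's order of the representative rules.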

Poster 18: Md. Anayet Hasan An In Silico approach to design of potential siRNA molecules for ICP22 (US1) gene silencing of different strains of Human Herpes Simplex 1 Md. Anayet Hasan1*, Suza Mohammad Nur1, Mohammad Al Amin1, Rashel Alam1 and Adnan Mannan1, 2 1. Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chittagong-4331, Bangladesh 2. School of pharmacy, Faculty of Health Sciences, Curtin University, Perth WA 6845, Australia. Md. Anayet Hasan- Email: [email protected]; Phone No: +8801717344389; *Presenting author Keywords: HSV-1, Antiviral, ICP22 (US1) gene, RNAi, siRNA, Thermodynamics

Herpes simplex virus 1 (HSV-1) causes the infection known as cold sore, fever blister or night fever, which is marked by painful, watery blisters in the skin or mucous membranes (such as the mouth or lips) or on the genitals. The disease is contagious, particularly during an outbreak, and is incurable with present technology. Genetic studies of HSV-1 have shown that ICP22 (US1) is an immediate early gene that is responsible for replication of the genome and also plays a vital role in viral infection. Therefore, ICP22 (US1) may be used as a suitable target for disease diagnosis. Viral activity can be controlled through RNA interference (RNAi) technology, a significant method for post-transcriptional gene silencing in a sequence-specific manner. However, because of the genetic variability among viral isolates, it is a great challenge to design siRNA molecules that can silence the respective target gene in different isolates concurrently. In the current study, two effective siRNA molecules for silencing ICP22 (US1) in seven different strains of HSV-1 were rationally designed and validated using computational methods, which may lead to knockdown of viral activity. Thus, this approach may provide insight for the chemical synthesis of antiviral RNA molecules for the treatment of HSV-1 at the genomic level.

Poster 19: Marta Iwanaszko NF-κB and IRF: cross-regulation between two major pathways Marta Iwanaszko1*, Marek Kimmel1,2 1 Department of Automatic Control, Silesian University of Technology, Gliwice, Poland 2 Department of Statistics, Rice University, Houston, TX, USA *presenting author, e-mail: [email protected]

Identification of pathogen-associated molecular patterns, such as double-stranded RNA (dsRNA) and lipopolysaccharide (LPS), by host pattern recognition receptors (PRRs) is a critical step in the innate immune response (IIR). Stimulation of Toll-like receptors (TLRs) by an infecting pathogen induces activation of signal transduction cascades, which leads to translocation of nuclear factor-κB (NF-κB) to the nucleus [1] and to activation of interferon regulatory factors 3 and 7 (IRF3/7), which cooperate to induce transcription of various cytokines, such as alpha/beta interferon (IFN-α/β), to dispose of infectious pathogens [2,3]. We analyze the cross-talk between two major signaling pathways in the IIR, namely the NF-κB and IRF pathways. There is not enough data on how the activation of these two major signaling arms of the IIR is controlled or how they interact with each other. Recent experimental work by Brasier’s group and others has shown that the adapter molecules regulating the IRF3 signaling pathway are connected with those of NF-κB at multiple steps [4,5], but it is still not known how the NF-κB and IRF pathways interact with each other. An attempt to explain this interaction using mathematical modeling and other in silico methods is presented in Bertolusso et al. [6]. Using computational methods and a phylogenetic approach, we analyzed the promoters of genes coding for transcription factors that interact in the IRF and NF-κB pathways. In the first step of the analysis we searched for transcription factor binding sites (TFBSs) across a given promoter region, and in the second step we analyzed whether these TFBSs were conserved among species in conserved domains. A similar method of TFBS analysis was used in Iwanaszko et al. [7]. Promoters of downstream genes in the analyzed pathways, mainly coding for transcription factors, contain one or a few binding sites for IRF transcription factors and usually more binding sites for the NF-κB family. Another finding is that the regulation of IRF family genes may be more sensitive to the direct activity of the NF-κB family than to the IRF family itself, and some other unknown transcription agent could be involved. This may be supported by research on NF-κB-deficient cells, which has shown that the initial kinetics of the type I interferon (IFN) response depends on concurrent NF-κB activation [6]. Absence of NF-κB causes blunted IFNβ expression, which results in reduced propagation of anti-viral signals in the mucosal surface. NF-κB also controls expression of the downstream IFN auto-amplification loop through the STAT1, IRF-1, -5, and -7 transcription factors. Research indicates that the NF-κB and IRF3 signaling arms are highly interconnected and that these interconnections influence the kinetics of the IIR. Knowledge about this crosstalk may be crucial for determining the outcome of viral infection.

[1] Hoffmann A et al. The Ikappa B-NF-kappaB Signaling Module: Temporal Control and Selective Gene Activation. Science, (2002) 298:1241-1245. [2] Akira S et al. Cell, (2006) 124:783-801. [3] Brasier AR et al. American Soc. for Microbiol., (2008) 119-135. [4] Liu P et al. PLoS ONE, (2009) 14:e8079. [5] Zhao T et al. Nat Immunol, (2007) 8(6), 592-600. [6] Bertolusso R et al. In press (2013). [7] Iwanaszko M et al. BMC Genomics, (2012) 13:182.
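The first step of the analysis, scanning a promoter for candidate binding sites, can be illustrated with a consensus-based search. The sketch assumes the commonly quoted κB consensus GGGRNWYYCC (R = A/G, N = any base, W = A/T, Y = C/T) and scans the forward strand only; the abstract does not state which motif models or tools the authors used, so this is an illustration rather than their procedure. The function name is hypothetical.

```python
import re

# Commonly quoted kappa-B consensus GGGRNWYYCC written as a regular expression.
# The lookahead lets overlapping candidate sites be reported as well.
KAPPA_B_CONSENSUS = re.compile(r"(?=(GGG[AG][ACGT][AT][CT][CT]CC))")


def find_kappa_b_sites(promoter):
    """Return (0-based start, matched sequence) for each forward-strand hit."""
    sequence = promoter.upper()
    return [(m.start(), m.group(1)) for m in KAPPA_B_CONSENSUS.finditer(sequence)]
```

The second step described above, checking conservation across species, would require aligned orthologous promoters and is not covered by this sketch.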

Poster 20: Rafał Jakubowski Molecular interactions at the origin of amyloidosis: transthyretin stability revealed by MD simulations Rafał Jakubowski*, Łukasz Pepłowski, Wiesław Nowak Theoretical Molecular Biophysics Group, Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, ul. Grudziądzka 5, 87-100 Toruń, Poland *presenting author, e-mail: [email protected]

Amyloidosis is a wide family of diseases caused by the formation of amyloid fibrils, which may lead to various health issues. One of them is senile systemic amyloidosis (SSA), which affects about 25% of the human population over 80 years of age. Some amyloidosis-related disorders, such as familial amyloid polyneuropathy, may be lethal. The mentioned disorders are strongly connected with transthyretin (TTR), a protein occurring in plasma and cerebrospinal fluid [1]. This tetrameric protein is responsible for thyroxine (T4) and retinol transport. TTR stability is crucial to avoid TTR-based amyloidosis (widely known as a-TTR). For the a-TTR cascade to start, the tetramer must first dissociate into two dimers, then each dimer has to split into monomers, and finally the TTR monomer has to misfold. Only misfolded TTR monomers can create dangerous fibrils. This process may be accelerated by, for example, point mutations [2]. In this work we present results of molecular dynamics simulations of wild-type and V30M mutant human TTR performed on a hundred-nanosecond timescale, based on the PDB file 1ICT [3]. We try to determine the impact of single amino acid mutations on the stability of the TTR tetramer and to extract the most important interactions between the TTR monomers that contribute to the stability of the whole complex. This work was supported by NCN grant no. N202 262083 (WN) and Institute of Physics, NCU grant no. 1394-A (RJ).

[1] G.A. Hagen, W.J. Elliott, J Clin Endocrinol Metab, 1973, 37, 415-22. [2] C.E. Bulawa, et al., Proc Natl Acad Sci USA, 2012, 109(24), 9629-34. [3] A. Wojtczak, et al., Acta Crystallogr D Biol Crystallogr, 2001, 57, 957-67.

Poster 21: Witold Januszewski MetaMisTher: a machine learning meta-predictor for the influence of missense mutations on thermodynamical protein stability Witold Januszewski1,*, Łukasz Kozłowski1, Marcin Magnus1 , Tymon Rubel2 and Janusz M. Bujnicki1,3 1 International Institute of Molecular and Cell Biology in Warsaw, ks. Trojdena 4, 02-109 Warsaw, Poland 2 Institute of Radioelectronics, Nuclear and Medical Electronics Division, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland 3 Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland *presenting author, e-mail: [email protected]

Missense mutations, also known as nsSNPs (nonsynonymous Single Nucleotide Polymorphisms), are point mutations in DNA coding regions that lead to a single amino acid substitution in the primary protein structure. The change in the Gibbs free energy of folding between the wild-type and the mutant protein (∆∆G) is the indicator of thermodynamic protein stability. Empirical constraints for ∆∆G were suggested by Khan: ∆∆G within the range [-0.5, 0.5] kcal/mol signifies a neutral mutation, a value above it a destabilizing one and a value below it a stabilizing one [1]. Although a particular missense mutation cannot be linked to a disease deterministically, several studies [2] show that pathogenicity predictions for missense mutations are also possible. Here, we propose a meta-predictor (MetaMisTher), which predicts ∆∆G by wrapping 11 local physical potential, knowledge-based potential and machine learning methods into a common independent server framework. To increase the precision of ∆∆G scoring, MetaMisTher uses information from PDB files and predicts solvent accessibility and secondary structure. The gathered predictions are processed with a consensus SVM algorithm to obtain a single score and are evaluated with ROC curves and the MCC (Matthews correlation coefficient). We believe MetaMisTher can provide geneticists and epigeneticists with fast, intuitive and accurate decisions and support the work of drug designers as well as other researchers.

[1] Khan S., “Mutational effects on protein structures: Knowledge gained from databases, predictions and protein models”, University of Tampere, PhD thesis. [2] Olatubosun A, Väliaho J, Härkönen J, Thusberg J, Vihinen M., “PON-P: integrated predictor for pathogenicity of missense variants”, Hum. Mutat. 2012 Aug;33(8):1166-74. doi: 10.1002/humu.22102
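The ∆∆G thresholds and the MCC mentioned above translate directly into code. The sketch below assumes the sign convention stated in the abstract (values above +0.5 kcal/mol are destabilizing) and uses the standard binary-classification formula for the Matthews correlation coefficient; it is not part of MetaMisTher, and the function names are hypothetical.

```python
import math


def classify_ddg(ddg_kcal_mol, threshold=0.5):
    """Label a mutation by its predicted ddG, following the ranges in [1]."""
    if ddg_kcal_mol > threshold:
        return "destabilizing"
    if ddg_kcal_mol < -threshold:
        return "stabilizing"
    return "neutral"


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

For a three-class problem (stabilizing, neutral, destabilizing) the binary MCC would have to be applied per class or replaced by a multiclass generalization; the abstract does not specify which variant was used.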

Poster 22: Barbara Kalinowska In Silico Model of the Early-Stage Intermediate in Protein Folding – Analysis of Structural Predictability Kalinowska Barbara1,2*, Alejster Paweł1,2, Sałapa Kinga1, Baster Zbigniew3, Roterman Irena1 1 Department of Bioinformatics and Telemedicine, Jagiellonian University – Medical College, Lazarza 16, 31-530 Krakow, Poland 2 Faculty of Physics, Astronomy and Applied Computer Science – Jagiellonian University, Reymonta 4, 30-059 Krakow, Poland 3 Faculty of Physics, University of Science and Technology (AGH), Al. Mickiewicza 30, 30-059 Krakow, Poland *presenting author, e-mail: [email protected]

The presented in silico model of an early-stage (ES) intermediate in protein folding is based on a limited conformational subspace within the Ramachandran plot. The subspace arises from a backbone conformation analysis combined with information theory [1]. The method for determining the ES structure of a protein distinguishes seven structural motifs, which reflect local probability maxima within the conformational subspace [2]. It was found that three of them correspond to well-defined secondary structures and the others to various types of random coils [3]. Since the motifs have been assigned literal structural codes, it is convenient to study them by means of methods dedicated to the analysis of letter sequences. In order to estimate the accuracy of the model, the investigation was performed for a set of randomly selected amino acid sequences with known native structures. The set of proposed ES structures was compared with the corresponding structures obtained from the native structures in a “step-back” procedure. The subsequent analysis of the ES prediction accuracy shows that around 46% of the amino acid residues' conformations can be predicted correctly. The presented approach sheds light on the reasons behind incorrect predictions, since it reveals that their occurrence corresponds with the involvement of a given amino acid in ligand or protein binding [4].

[1] I. Roterman, J Theor Biol, 1995, 177, 283-288. [2] W. Jurkowski, M. Brylinski, L. Konieczny, Z. Wiśniowski, I. Roterman, Proteins, 2004, 55(1), 115-27. [3] M. Bryliński, L. Konieczny, P. Czerwonko, W. Jurkowski, I. Roterman, J Biomed Biotechnol, 2005, 2, 65-79. [4] B. Kalinowska, P. Alejster, K. Sałapa, Z. Baster, I. Roterman, J Mol Mod, 2013, in press.

Poster 23: Joanna M. Kasprzak Molecular modeling of the Cascade complex responsible for RNA–guided silencing of exogenous elements in Prokaryota Joanna M. Kasprzak 1*, Mateusz Korycinski 1,3, Janusz M. Bujnicki 2,1 1 Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland 2 International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland 3 Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany *presenting author, e-mail: [email protected]

Bacteria and Archaea protect themselves against plasmids and viruses by incorporating foreign DNA into a library of clustered regularly interspaced short palindromic repeats (CRISPRs). A crRNA molecule, obtained from this library, is acquired by the Cascade complex (CRISPR-associated complex for antiviral defense), which allows sequence-specific silencing of target DNA. The Cascade complex is composed of eleven proteins (CasE1C6B2D1A1) and one crRNA molecule. Here we present a structural analysis of the Cascade complex. We built homology models for all components of the complex. Then we gathered information about macromolecular interactions between the subunits, as well as data about disordered regions. To visualize the structure of the whole complex, we fitted all the models into the electron density map using PyRy3D, a software tool developed in our group. The procedure represents components as experimental structures (e.g. X-ray or NMR models), structural models (e.g. homology models) or flexible shapes and applies a Monte Carlo approach to find solutions that fulfil experimental restraints. All generated models of the Cascade complex have been clustered and scored, and the best-ranked complexes are shown. The obtained results provide new information about macromolecular interactions within the complex during its silencing activity.

[1] B. Wiedenheft, S.H. Sternberg, J.A. Doudna, Nature. 2012 Feb 15; 482(7385):331-8. [2] B. Wiedenheft, G.C. Lander, K. Zhou, M.M. Jore, S.J.J. Brouns, J. van der Oost, J.A. Doudna, E. Nogales, Nature. 2011 Sep 21; 477(7365):486-9. [3] D.G. Sashital, B. Wiedenheft, J.A. Doudna, Mol Cell. 2012 Apr 17.

Poster 24: Paweł Kędzierski Differential Transition State Stabilization for efficient evaluation of engineered enzymes Paweł Kędzierski* Institute of Physical and Theoretical Chemistry, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland *presenting author, e-mail: [email protected]

Theoretical engineering of enzymes with desired substrate specificity presents computational complexity comparable to the protein folding problem, yet requires significant accuracy in the evaluation of catalytic properties. The Differential Transition State Stabilization (DTSS) method can utilize a quantum model of the reaction pathway to evaluate the catalytic properties of an enzyme with a well-defined hierarchy of approximations [1]. Proposed here is an approach that combines the DTSS method with force-field-based generation and evaluation of mutated enzyme variants.

[1] W. A. Sokalski, J.Mol.Cat., 1985, 30(3), 395-410.

Poster 25: Boguslaw Kluge RNA modification detection with mass spectrometry Boguslaw Kluge1,*, Krzysztof Skowronek1, Elzbieta Purta1, Janusz M. Bujnicki1,2 1 Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Ks. Trojdena 4, 02-109 Warsaw 2 Laboratory of Bioinformatics, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan *presenting author, e-mail: [email protected]

RNA post-transcriptional modifications expand the vocabulary of nucleosides and influence various aspects of the nucleotide chain, including structure, thermodynamics and biochemical interactions [2,3]. Comparing modification patterns in different RNA molecules (e.g. in different physiological states) can shed light on the mechanisms and functions of RNA modifications. Mass spectrometry (MS) can be used to search for modified sites in RNA sequences; however, computational tools need to be developed to perform such analyses. There is a modest number of programs designed for RNA MS data processing [4], although none of them can easily handle this specific task. RNA MS analysis problems include peak overlap due to the small cytosine/uracil mass difference, specific MS/MS fragmentation patterns [1] and database sequence ambiguity (because of the small alphabet of RNA). We describe a procedure for assigning in silico MS/MS-fragmented RNA sequences to observed MS/MS ions and a statistical test for correlated binary variables suitable for choosing between competing assignments. By providing various hypothetical modified versions of an RNA sequence as input, modified sites can be discovered. The implementation of the procedure uses the OpenMS mass spectrometry data processing tools and the Python and R programming languages. We present preliminary results on LC-MS/MS datasets consisting of modified and non-modified tRNA Phe samples cleaved with RNase A or RNase T1 and analyzed using an ESI mass spectrometer. This research is supported by the Polish National Science Centre grant 2012/05/D/ST6/03282.

[1] T.E. Andersen, F. Kirpekar, K.F. Haselmann. RNA Fragmentation in MALDI Mass Spectrometry Studied by H/D Exchange: Mechanisms of General Applicability to Nucleic Acids. J Am Soc Mass Spectrom, 2006, 17(10):1353-1368. [2] M. Helm. Post-transcriptional nucleotide modification and alternative folding of RNA. Nucleic Acids Res, 2006, 34(2):721-733. [3] M.A. Machnicka, K. Milanowska, O. Osman Oglu, E. Purta, M. Kurkowska, A. Olchowik, W. Januszewski, S. Kalinowski, S. Dunin-Horkawicz, K.M. Rother, M. Helm, J.M. Bujnicki, H. Grosjean, MODOMICS: a database of RNA modification pathways: 2012 update. Nucleic Acids Res, 2013, 41(D1):D262-D267. [4] H. Nakayama, N. Takahashi, T. Isobe. Informatics for mass spectrometry-based RNA analysis. Mass Spectrom Rev, 2011, 30:1000-1012.
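The enzymatic cleavage step that precedes the MS analysis can be simulated in silico. The sketch below relies on the well-known specificities of the two enzymes (RNase T1 cleaves after G, RNase A after the pyrimidines C and U) and assumes complete digestion of an unmodified sequence; it does not use OpenMS and is not the authors' assignment procedure. The function name is hypothetical.

```python
# Nucleotides after which each enzyme cleaves (on the 3' side).
CLEAVAGE_SITES = {
    "T1": "G",   # RNase T1: after guanosine
    "A": "CU",   # RNase A: after pyrimidines
}


def digest(rna, enzyme):
    """Return the fragments of a complete digestion of `rna` by `enzyme`."""
    sites = CLEAVAGE_SITES[enzyme]
    fragments, current = [], ""
    for nucleotide in rna.upper():
        current += nucleotide
        if nucleotide in sites:
            fragments.append(current)
            current = ""
    if current:  # 3'-terminal fragment that does not end at a cleavage site
        fragments.append(current)
    return fragments
```

Running both digestions on the same sequence shows the ambiguity problem mentioned above: short fragments such as a lone `G` or `C` occur many times in a typical RNA and cannot be placed uniquely from their mass alone.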

Poster 26: Mateusz Korycinski Stac – new domain in prokaryotic transmembrane signalling Korycinski M.*, Albrecht R., Hartmann M., Coles M., Martin J., Dunin-Horkawicz S., Lupas A. N. Department of Protein Evolution, Max Planck Institute for Developmental Biology, Spemannstr. 35, 72076 Tübingen, Germany *presenting author, e-mail: [email protected]

Two-component transmembrane receptors form the main sensory mechanism in prokaryotes. These receptors share a common dimeric architecture, consisting of an N-terminal extracellular sensor module, the transmembrane helices and various other linking segments, and an intracellular enzymatic effector module [1, 2]. As an exception, we have identified one archaeal family, exemplified by Af1503 from Archaeoglobus fulgidus, that is C-terminally truncated and lacks a recognizable effector module. Instead, these receptors have a HAMP domain, a ubiquitous linking domain, as their entire cytoplasmic part [3, 4]. Here we examine the gene environment of such receptors, finding the vast majority to be associated with transmembrane transport proteins. We also identify a closely associated family of proteins of unknown structure and function, which we characterize using Af1502 as a model. Structurally, Af1502 forms a four-helix bundle consisting of two antiparallel helical hairpins. In sequence, we find homologues occurring as domains within extant receptors. In these cases the domain always appears downstream of an SLC-like transporter domain, while its C-terminus is coupled to various arrangements of typical signal transduction domains. We propose that the domain forms part of a new signal transduction system regulating transport processes, and name the domain STAC – SLC Two-component signaling Associated Compound.

[1] M. Stock, V. L. Robinson, P. N. Goudreau, Annu Rev Biochem, 2000, 69, 183–215. [2] R. Gao, A. M. Stock, Annu Rev Microbiol, 2009, 63, 133–154. [3] M. Hulko, F. Berndt, M. Gruber, J. U. Linder, V. Truffault, A. Schultz, J. Martin, J. E. Schultz, A. N. Lupas, M. Coles, Cell, 2006, 126, 929–940. [4] S. Dunin-Horkawicz, A. N. Lupas, J Mol Biol, 2010, 397, 1156–1174.


Poster 27: Joanna Kowalska The RNA prediction story Joanna Kowalska, Marta Szachniuk Institute of Computing Science, Poznan University of Technology, Poland Joanna Kowalska, e-mail: [email protected]

Computer-aided prediction of RNA tertiary structures has been a great challenge for structural biologists and computer scientists for decades. Today we can say that the first barriers to accurate, real-time predictions have been overcome. Two general approaches have evolved that are followed by the leading actors in this story: template-based and template-free modeling. Template-based prediction assumes that sequence similarity entails the resemblance of three-dimensional structures, whereas the template-free approach concentrates on the laws of physics that govern the folding process. Several methods following these approaches have been published for RNA so far, e.g. RNAComposer [1], MC-Fold/MC-Sym [2], ModeRNA [3], FARNA [4], and more have been announced. In the presented work, we focus on the general idea of RNA 3D structure prediction and discuss the basic principles that govern in silico prediction. The main objective of this work is to present the state of the art in the field of RNA prediction in an informal and easy-to-understand manner. This work is intended for a diverse audience: researchers from non-biological sciences, teachers, teenagers who have completed a basic course in structural biology, etc. We are aware of how difficult it is to introduce scientific problems and their solutions to non-experts, but we believe that a skillful presentation of the subject can be inspiring for both the presenter and the audience. Thus, we have taken up the challenge of preparing a poster that visualizes the basic principles of RNA structure prediction. In the future we plan to use it during science festivals and the European Researchers' Night. We hope it will help us increase young people's interest in bioinformatics and structural biology.

[1] M. Popenda, M. Szachniuk, M. Antczak, K.J. Purzycka, P. Lukasiak, N. Bartol, J. Blazewicz, R.W. Adamiak, Nucleic Acids Research, 2012, 40, e112. [2] M. Parisien, F. Major, Nature, 2008, 452, 51-55. [3] M. Rother, K. Rother, T. Puton, J.M. Bujnicki, Nucleic Acids Research, 2011, 39, 4007-4022. [4] R. Das, D. Baker, PNAS, 2007, 104, 14664-14669.


Poster 28: Adam Kozak A Petri net based model and analysis of hepcidin-hemojuvelin regulation axis Adam Kozak1*, Dorota Formanowicz2, Tomasz Głowacki1, Marcin Radom1, Piotr Formanowicz1,3

1 Institute of Computing Science, Poznań University of Technology, Poznań, Poland

2 Department of Clinical Biochemistry and Laboratory Medicine, Poznań University of Medical Sciences, Poznań, Poland

3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznań, Poland

*presenting author, e-mail: [email protected]

Hepcidin is a hormone produced in the liver and is the master regulator of human iron homeostasis [1]. Hepcidin lowers the iron level by binding to ferroportin, which transports iron from the intestine to plasma. Expression of hepcidin is correlated with inflammatory processes, hypoxia, anemia and changes in iron level. Hepcidin-based iron regulation also depends on other substances that provide feedback regulation of hepcidin expression [2, 3]. In this work a Petri net model of iron homeostasis regulation based on the hepcidin-hemojuvelin axis is presented and analyzed. Petri net based models are intuitive for humans and relatively easy to analyze using mathematical methods. A structural analysis of the Petri net model is presented; it includes analysis of t-invariants, MCT-sets and t-clusters [4, 5]. Biological conclusions of the analysis are also presented.

[1] T. Ganz. Hepcidin and iron regulation, 10 years later. Blood, 2011, 117, 4425–4433. [2] A.J. Ramsay, J.D. Hooper, A.R. Folgueras, G. Velasco, and C. Lopez-Otin. Matriptase-2 (TMPRSS6): a proteolytic regulator of iron homeostasis. Haematol., 2009, 94, 840–849. [3] A-S. Zhang and C.A. Enns. Molecular mechanisms of normal iron homeostasis. Hematology, 2009, 1, 207–214. [4] A. Sackmann, D. Formanowicz, P. Formanowicz, I. Koch, and J. Błażewicz. An analysis of Petri net based model of the human body iron homeostasis process. Computational Biology and Chemistry, 2007, 31, 1–10. [5] A. Sackmann, M. Heiner, and I. Koch. Application of Petri net based analysis techniques to signal transduction pathway. BMC Bioinformatics, 2006, 7, 482.
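The t-invariant analysis named in this abstract can be illustrated on a toy net. The sketch below is not the hepcidin-hemojuvelin model: the two-place, three-transition net, its incidence matrix and all function names are hypothetical and serve only to show the definition. A t-invariant is a non-zero vector x of non-negative integers with C·x = 0, where C is the place × transition incidence matrix, so firing every transition j exactly x[j] times restores the initial marking. The brute-force search is an assumption made for brevity and is practical only for very small nets.

```python
from functools import reduce
from itertools import product
from math import gcd


def is_t_invariant(incidence, x):
    """Return True if x is non-zero and C.x = 0 (every place's token count is restored)."""
    return any(x) and all(
        sum(c * xj for c, xj in zip(row, x)) == 0 for row in incidence
    )


def minimal_t_invariants(incidence, max_count=3):
    """Enumerate firing-count vectors with entries 0..max_count and keep the minimal ones.

    A vector is kept when its entries have no common divisor above 1 and no other
    invariant found has a strictly smaller support (set of transitions that fire).
    """
    n_transitions = len(incidence[0])
    candidates = [
        x
        for x in product(range(max_count + 1), repeat=n_transitions)
        if is_t_invariant(incidence, x)
    ]
    supports = [frozenset(j for j, v in enumerate(x) if v) for x in candidates]
    minimal = []
    for x, support in zip(candidates, supports):
        if reduce(gcd, x) != 1:
            continue  # a multiple of a smaller invariant
        if any(other < support for other in supports):
            continue  # another invariant uses a strict subset of these transitions
        minimal.append(x)
    return minimal


# Hypothetical toy net: rows are places A and B, columns are transitions t0, t1, t2.
# t0 and t2 both move a token from A to B; t1 moves it back from B to A.
TOY_INCIDENCE = [
    [-1, +1, -1],  # place A
    [+1, -1, +1],  # place B
]

if __name__ == "__main__":
    print(minimal_t_invariants(TOY_INCIDENCE))
```

In the toy net each minimal t-invariant corresponds to one closed cycle (t0 followed by t1, or t2 followed by t1). In a biological model such cycles are usually read as elementary subprocesses that leave the state of the system unchanged.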


Poster 29: Karina Kubiak-Ossowska Karina Kubiak-Ossowska, Paul A. Mulheran ARCHIE-WeSt, Department of Physics, University of Strathclyde [email protected]

The diffusion pathways of lysozyme adsorbed to a model charged ionic surface are studied using fully atomistic steered molecular dynamics simulation. The simulations start from existing protein adsorption trajectories, where it has been found that one particular residue, Arg128 at the N,C-terminal face, plays a crucial role in anchoring the lysozyme to the surface (Langmuir 2010, 26, 15954-15965). We first investigate the desorption pathway for the protein by pulling the Arg128 side-chain away from the surface in the normal direction, and its subsequent readsorption, before studying diffusion pathways by pulling the Arg128 side-chain parallel to the surface. We find that the orientation of this side-chain plays a decisive role in the diffusion process. Initially it is oriented normal to the surface, aligning in the electrostatic field of the surface during the adsorption process, but after readsorption it lies parallel to the surface, being unable to return to its original orientation due to geometric constraints arising from structured water layers at the surface. Diffusion from this alternative adsorption state has a lower energy barrier of ~0.9 eV, associated with breaking hydrogen bonds along the pathway, in reasonable agreement with the barrier inferred from previous experimental observation of lysozyme surface clustering. These results show the importance of studying protein diffusion alongside adsorption to gain full insight into the formation of protein clusters and films, essential steps in the future development of functionalised surfaces.
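For readers more used to molar or thermal units, the ~0.9 eV barrier can be converted with a few lines of arithmetic. This is only a unit-conversion sketch: the function name is hypothetical, and the temperature of 300 K is an assumption, since the abstract does not state the simulation temperature.

```python
# Standard conversion factors (rounded).
EV_TO_KJ_PER_MOL = 96.485          # 1 eV per particle expressed in kJ/mol
KJ_PER_KCAL = 4.184                # thermochemical calorie
BOLTZMANN_EV_PER_K = 8.617333e-5   # Boltzmann constant in eV/K


def barrier_in_other_units(barrier_ev, temperature_k=300.0):
    """Express an energy barrier given in eV in kJ/mol, kcal/mol and multiples of kT.

    temperature_k is an assumed value; it only affects the kT figure.
    """
    kj_per_mol = barrier_ev * EV_TO_KJ_PER_MOL
    return {
        "kJ/mol": kj_per_mol,
        "kcal/mol": kj_per_mol / KJ_PER_KCAL,
        "kT": barrier_ev / (BOLTZMANN_EV_PER_K * temperature_k),
    }


if __name__ == "__main__":
    for unit, value in barrier_in_other_units(0.9).items():
        print(f"{value:.2f} {unit}")
```

Under the 300 K assumption the barrier is roughly 35 kT, which illustrates why steered (biased) simulations are used to cross it rather than waiting for spontaneous events.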


Poster 30: Tadeusz Kulinski Interaction of selected antibiotics and their copper(II) complexes with the antigenomic delta virus ribozyme Katarzyna Kulinska, Bartlomiej Gramowski, Jan Wrzesinski, Jerzy Ciesiolka, Tadeusz Kulinski* Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland *presenting author, e-mail: [email protected]

The hepatitis delta virus ribozyme (HDVr) is a small catalytic RNA motif essential for viral replication during the viral life cycle. HDV-like ribozymes have been found to be widely distributed in human genes, where they probably play a variety of important biological roles [1]. The interaction of three antibiotics (neomycin B, amikacin, actinomycin D) and their Cu2+ complexes with trans-acting HDVr has been studied by probing the RNA structure with three different digestion methods (Pb2+ cleavage and S1 and T1 digestion) and SHAPE analysis [2]. To rationalize the experimental results and understand the mechanism of inhibition by these antibiotics, molecular modeling, docking and molecular dynamics (MD) simulations were used. The studies revealed different binding sites for neomycin B, amikacin and actinomycin D inside the ribozyme structure. Neomycin B, an aminoglycoside antibiotic that strongly inhibited the catalytic properties of HDVr, was found to bind to the pocket formed by the P1 stem, the P1.1 pseudoknot and the J4/2 junction. Amikacin bound less effectively to the ribozyme catalytic core, resulting in weak inhibition. Complexes of these aminoglycosides with Cu2+ ions bound to the same ribozyme regions.

Acknowledgements: Financial support from the Polish Ministry of Science and Higher Education (Projects No. N N519 4050 37) is gratefully acknowledged. Calculations were performed at the Poznan Supercomputing and Networking Center and with PL-Grid Infrastructure.

[1] C.H.T. Webb, N.J. Riccitelli, D.J. Ruminski, A. Luptak, Science, 2009, 326, 953. [2] J. Wrzesinski, L. Błaszczyk, M. Wrońska, A. Kasprowicz, K. Stokowa-Sołtys, J. Nagaj, M. Szafraniec, T. Kulinski, M. Jezowska-Bojczuk, J. Ciesiołka, FEBS Journal, 2013.


Poster 31: Mateusz Kurciński Ab initio simulation of coupled folding and binding of intrinsically disordered protein Mateusz Kurciński1*, Sebastian Kmiecik1, Andrzej Koliński1

1 Laboratory of Theory of Biopolymers, Chemistry Department, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland

*presenting author, e-mail: [email protected]

The molecular complex of the KIX domain of mouse CBP (CREB Binding Protein) with pKID (phosphorylated Kinase Inducible Domain) of rat CREB (Cyclic AMP Response Element Binding Protein) [1] is a model system for studying the mechanisms by which intrinsically unfolded proteins perform their functions. These mechanisms are still poorly understood, despite previous experimental [2] and theoretical [3] efforts. Using CABS [4], a high-resolution coarse-grained model, we have performed ab initio simulations of coupled folding and binding of the KIX/pKID complex using no knowledge about the binding site or the conformation of bound pKID. Simulations started from extended pKID structures placed in random locations with respect to the native KIX domain. During the simulations the pKID backbone was allowed to be fully flexible, while the conformational mobility of KIX was limited to near-native fluctuations. Several of the simulations that ended in a near-native arrangement of the pKID/KIX complex were further analyzed to investigate the mechanism of folding and binding. The simulation data provide unique insight into the studied mechanism since, in contrast to other theoretical studies, no experimental data about pKID binding or structure were used, and sampling of the pKID conformational space during the search for the binding site was extremely efficient.

[1] Radhakrishnan, I., et al., Cell, 1997, 91(6): p. 741-52. [2] Sugase, K., H.J. Dyson, and P.E. Wright, Nature, 2007, 447(7147): p. 1021-5. [3] Chen, H.F., PLoS One, 2009, 4(8): p. e6516. [4] Kolinski, A., Acta Biochim Pol, 2004, 51(2): p. 349-71.


Poster 32: Kamil Kwarciak Tabu search algorithm for isothermic DNA sequencing by hybridization with partial multiplicity information available Kamil Kwarciak1,*, Piotr Formanowicz1,2

1 Institute of Computing Science, Poznan University of Technology

2 Institute of Bioorganic Chemistry, Polish Academy of Sciences

*presenting author, e-mail: [email protected]

DNA sequencing is one of the most important problems of molecular and computational biology. It focuses on determining the sequence of nucleotides that a given DNA chain consists of. DNA sequencing by hybridization (SBH) is one of many methods able to provide this information. It consists of two stages. The first stage is a biochemical experiment whose output is a spectrum, i.e. a set of l-long subsequences (called l-mers) of a given DNA sequence. The second stage is computational: the spectrum is used to reconstruct the target sequence. The biochemical experiment in the classical SBH approach identifies subsequences of equal length using a DNA chip [1], which contains a full library of l-long oligonucleotides. Such a chip is put into a solution containing many copies of single-stranded DNA, and some parts of the DNA sequences hybridize to complementary oligonucleotides on the chip. The quality of the experimental results depends on how stable these duplexes are. However, particular oligonucleotides from the equal-length library form stable duplexes at different temperatures, and it is hard to set common experimental conditions. To remove this obstacle, an alternative library composition has been proposed, in which oligonucleotides of different lengths but with the same melting temperature are used [2]. A method using such an oligonucleotide library is called isothermic sequencing by hybridization. Classical SBH uses binary information about DNA sequence composition: a given l-mer either is or is not a part of the sequence. However, the development of DNA chip technology makes it possible to take into account some information about repetitions in the analyzed sequence. Currently, it is not possible to obtain exact data of this type, but even partial multiplicity information significantly improves the quality of reconstructed sequences when standard oligonucleotide libraries are used [3,4].

An approach combining the two above modifications of classical sequencing by hybridization is considered here, together with two simple but realistic models of multiplicity information. According to the first model, one is able to determine whether a given l-mer appears in the target sequence once or more than once. The second model assumes that it is possible to determine whether a given l-mer occurs in the analyzed sequence once, twice or at least three times. A tabu search algorithm has been implemented to verify these models when experimental results have been obtained using isothermic libraries. It solves the problem with any kind of hybridization errors. The results of a computational experiment confirm that information about repetitions improves the reconstruction process also when isothermic libraries are used. They also show that the more precise model of multiplicity information increases the quality of the obtained results.

[1] A. C. Pease, D. Solas, E. J. Sullivan, M. T. Cronin, C. P. Holmes, S. P. Fodor, P Natl Acad Sci USA, 1994, 91, 5022–5026. [2] J. Błażewicz, P. Formanowicz, M. Kasprzak, W.T. Markiewicz, Discrete Appl Math, 2004, 145, 40-51. [3] K. Kwarciak, P. Formanowicz, Bull Pol Ac: Tech, 2011, 51, 111-115. [4] K. Kwarciak, P. Formanowicz, under review.
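The notions of a spectrum and of partial multiplicity information used in this abstract can be sketched in a few lines. The example assumes an ideal, error-free experiment with a classical equal-length library, so it covers neither isothermic libraries nor the tabu search reconstruction; the function names and the `cap` parameter are hypothetical.

```python
from collections import Counter


def lmer_counts(sequence, l):
    """Count every l-long subsequence (l-mer) of the sequence."""
    return Counter(sequence[i:i + l] for i in range(len(sequence) - l + 1))


def spectrum_with_multiplicity(sequence, l, cap):
    """Ideal spectrum in which each l-mer count is known only up to `cap`.

    cap=1: classical binary spectrum (an l-mer is present or absent)
    cap=2: first model (once vs. more than once)
    cap=3: second model (once, twice, or at least three times)
    """
    return {
        lmer: min(count, cap)
        for lmer, count in lmer_counts(sequence, l).items()
    }


if __name__ == "__main__":
    target = "ACACACGT"
    for cap in (1, 2, 3):
        print(cap, spectrum_with_multiplicity(target, 2, cap))
```

For the toy target above, the 2-mer AC occurs three times and CA twice. The binary spectrum hides both repetitions, the first model reports both only as "more than once", and the second model distinguishes them, which is the extra information the reconstruction algorithm can exploit.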


Poster 33: Dorota Latek Multiple templates in structural characterization of the G protein-coupled receptor fold Dorota Latek1,2*, Slawomir Filipek2

1 International Institute of Molecular and Cell Biology, 4 Ks. Trojdena Street, 02-109 Warsaw, Poland

2 University of Warsaw, Faculty of Chemistry, Pasteura 1, 02-093 Warsaw, Poland

*presenting author, e-mail: [email protected]

Recent progress in the X-ray crystallography of membrane proteins has resulted in many new template structures for G protein-coupled receptors (GPCRs). GPCRs, which participate in cellular signal transduction, are of high importance as they are the target of nearly one third of currently available drugs. Our recently developed service GPCRM [1] (http://gpcrm.biomodellab.eu) is a versatile tool for homology modeling of this membrane protein family. One of the functionalities of our method is the use of averaged multiple templates with sequence-dependent contributions. As we showed in [2] and in the last two rounds of the GPCR Dock competition [3] (http://gpcr.scripps.edu), this novel approach to structure modeling of GPCRs is especially useful when the sequence similarity between target and template is exceptionally low.

[1] D. Latek, P. Pasznik, T. Carlomagno, S. Filipek, PLOS ONE, 2013, 8(2):e56742, doi:10.1371/journal.pone.0056742. [2] S. Yuan, U. Ghoshdastider, B. Trzaskowski, D. Latek, A. Debinski, W. Pulawski, R. Wu, V. Gerke, S. Filipek, PLOS ONE, 2012, 7(11):e47114, doi:10.1371/journal.pone.0047114. [3] I. Kufareva, M. Rueda, V. Katritch, R.C. Stevens, R. Abagyan; GPCR Dock 2010 participants, Structure, 2011, 19(8), 1108-1126.


Poster 34: Filip Leonarski RNA one-bead coarse-grained force fields for folding and equilibrium dynamics simulations F. Leonarski1,2*, F. Trovato3,4, V. Tozzini3,4,5, A. Leś2, J. Trylska1 1. Centre of New Technologies, University of Warsaw, Poland 2. Faculty of Chemistry, University of Warsaw, Poland 3. NEST, Istituto Nanoscienze, CNR, Pisa, Italy 4. Scuola Normale Superiore, Pisa, Italy 5. Center of Nanotechnology and Innovation, IIT, Pisa, Italy *presenting author, e-mail: [email protected]

In recent years interest in RNA biology has grown rapidly as new classes and functions of RNA have been discovered. Unfortunately, the progress of structural studies of RNA has not been as rapid as in the case of proteins, since X-ray crystallography of RNA is more troublesome. Therefore there is a great need for computational models that would fill this gap. One such method used to help elucidate RNA structure and internal flexibility is molecular dynamics. However, the standard full-atomistic potential energy functions for RNA (such as Amber or Charmm) have not been optimized for the millisecond-timescale simulations required for RNA folding. Also, millisecond simulations with an all-atom description of RNA are not computationally feasible in reasonable time. Another approach is to model RNA in a simplified coarse-grained (CG) representation, where atoms are grouped into fewer interaction centers [1]. The behavior of these interaction centers (or pseudo-atoms) is simulated instead of real atoms, reducing the computational cost. However, such simplified models sacrifice either universality or quality. Methodological development of such models, using optimization algorithms instead of manual trial-and-error parameterization, might help not only in deriving better models, but also in understanding their properties and limitations. Here we show two one-bead-per-nucleotide CG models: one suitable for equilibrium dynamics of RNA helices and the other for RNA structure prediction. These models were optimized using an in-house protocol based on an evolutionary algorithm [2]. We discuss correlations between the potential energy terms, the differences between the structure prediction and equilibrium dynamics models, and the results of their application to RNA-related problems [3].

Figure 1. RNA hairpin (PDB:2KHY [4]) – the full-atomistic structure is presented in the background, and the CG representation in the foreground is a prediction of the 3D structure made with our CG model.

[1] J. Trylska, J. Phys.: Condens. Matter, 2010, 22, 453101 [2] F. Leonarski, F. Trovato, V. Tozzini, J. Trylska, Lect. Notes Comp. Sci., 2011, 6623, 147-152 [3] F. Leonarski, F. Trovato, V. Tozzini, A. Leś, J. Trylska, in preparation [4] J. Wang, T. Henkin, E. Nikonowicz, Nucleic Acids Res., 2010, 38, 3388-3398


Poster 35: Grzegorz Łach Artificial design of RNA sequences. Grzegorz Łach1*, Janusz M. Bujnicki1,2

1 International Institute of Molecular and Cell Biology, ul. Ks. Trojdena 4, 02-109 Warsaw, Poland

2 Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland

*presenting author, e-mail: [email protected]

Molecules of RNA are known to perform a variety of roles in living cells: they transfer genetic information, serve as templates for protein synthesis, and act as regulators and catalysts of biochemical reactions. One way to understand the relationship between RNA sequence and its structure and function is to design RNA sequences artificially and test them (both in silico and experimentally). The key computational problem is how to design an RNA sequence, or a set of sequences, that preferentially folds into the desired structure [1]. A variety of software has been developed to automate the design process, but all of these tools suffer from serious deficiencies [2]. Most of the existing software also loses to human designers on the most difficult cases. We present a new method of RNA design based on the complete partition function and a novel algorithm for the selection of promising mutation sites. Finally, we present a benchmark based on design cases from the EteRNA internet experiment, in which challenging RNA design problems are both created and solved by software and by a large number of human players.

[1] Dirks R.M., Lin M., Winfree E., Pierce N.A.: Paradigms for computational nucleic acid design. NAR 32:1392-1403 (2004). [2] Shukla G.C. et al.: A Boost for the Emerging Field of RNA Nanotechnology. Report on the First International Conference on RNA Nanotechnology, ACS Nano 5:3405–3418 (2011).


Poster 36: Michał Łaźniewski Evaluation of commonly used docking programs on PDBbind database Michał Łaźniewski1,2*, Dariusz Plewczyński1,2 and Krzysztof Ginalski1

1 Laboratory of Bioinformatics and Systems Biology, CeNT, University of Warsaw, Żwirki i Wigury 93, 02-089 Warsaw, Poland

2 Department of Physical Chemistry, Faculty of Pharmacy, Medical University of Warsaw, Banacha 1 Street, 02-097 Warsaw, Poland

*presenting author, e-mail: [email protected]

Docking is one of the most commonly used techniques in rational drug design. It is employed both for identifying the correct pose of a ligand in the binding site of a protein and for estimating the strength of the protein-ligand interaction. The purpose of our work was to evaluate seven popular docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS and AutoDock) on an extensive dataset composed of 1300 protein-ligand complexes from the PDBbind 2007 database [1], for which experimentally measured binding affinity values were also available. We independently compared the ability of the programs to perform proper posing (according to the RMSD of predicted conformations versus the corresponding native one) and scoring (by calculating the correlation between docking score and ligand binding strength) for a wide range of protein families and inhibitor classes. To our knowledge it is the first large-scale docking evaluation that covers both aspects of docking programs, that is, predicting the ligand conformation and calculating the strength of its binding. Our results clearly show that the ligand binding conformation can be identified in most cases using the existing software, yet we still lack a universal scoring function for all types of molecules and protein families [2].

[1] R. Wang et al., J Med Chem, 2004, 47: 2977-2980. [2] D. Plewczynski et al., J Comput Chem, 2011, 32: 742-755.
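The two evaluation criteria named in this abstract, posing and scoring, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: atoms are assumed to be listed in the same order in both poses, no symmetry correction is applied, and the Pearson coefficient is assumed as the correlation measure because the abstract does not specify which one was used. Function names are hypothetical.

```python
from math import sqrt


def pose_rmsd(predicted, native):
    """RMSD between two ligand poses given as equally ordered lists of (x, y, z) tuples.

    No superposition is performed: both poses are taken in the receptor frame.
    """
    if len(predicted) != len(native):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_p, atom_q in zip(predicted, native)
        for p, q in zip(atom_p, atom_q)
    )
    return sqrt(squared / len(native))


def pearson_r(xs, ys):
    """Pearson correlation between docking scores and measured binding affinities."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up coordinates: the predicted pose is shifted by 1 unit along z.
    native = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(pose_rmsd(predicted, native))
    # Made-up scores and affinities, used only to exercise the function.
    print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Keeping the two measures separate mirrors the point of the abstract: a program can place the ligand close to the native pose (low RMSD) while its score still correlates poorly with the measured binding strength.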


Poster 37: Magdalena Machnicka Bioinformatic analysis of GmrSD, a Type IV Modification-Dependent Restriction System Magdalena A. Machnicka1*, Katarzyna H. Kamińska1 and Janusz M. Bujnicki1,2

1 Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, 02-109 Warsaw, Poland

2 Laboratory of Bioinformatics, Institute of Biotechnology and Molecular Biology, Faculty of Biology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland

*presenting author, e-mail: [email protected]

Restriction-modification (RM) systems defend cells against invading foreign DNA through their ability to distinguish the host's DNA from foreign DNA based on the pattern of DNA modification. They have been subdivided into four main types (I-IV). Types I-III attack DNA that lacks the host-like modification patterns, while Type IV targets DNA carrying specific modifications present only on the invading DNA. GmrSD is a Type IV restriction system first discovered in Escherichia coli CT596, which targets and digests glucosylated (glc)-hydroxymethylcytosine (HMC) DNA [1]. Nuclease activity depends on the presence of both the GmrS and GmrD proteins. In another E. coli strain (UTI89), GmrSD was found to occur as a fusion protein [3]. GmrS was initially predicted to act as an endonuclease and NTPase, while GmrD was proposed to bind the DNA [2]. We present the results of an extensive bioinformatic analysis of sequence-function-structure relationships in the GmrSD protein family. Protein fold-recognition analyses revealed that GmrD contains the HNH endonuclease domain, while GmrS contains potential DNA-binding and NTPase domains. We have also found that the fusion version of the GmrSD system is more common than the heterodimer. We have analyzed the genomic context of GmrSD system proteins and the evolutionary relationships between them. Our results provide a stepping stone towards understanding the GmrSD mechanism of action.

[1] CL Bair, LW Black, J.Mol.Biol., 2007, 366(3), 768-778. [2] CL Bair, D Rifat, LW Black, J.Mol.Biol., 2007, 366(3), 779-789. [3] D Rifat, NT Wright, KM Varney, DJ Weber, LW Black, J.Mol.Biol., 2008, 375(3), 720-734.


Poster 38: Paweł Mackiewicz Structural and comparative analysis of the peculiar plastid genome from peridinin dinoflagellate algae Paweł Mackiewicz1*, Krzysztof Moszczyński2, Andrzej Bodył3

1 Faculty of Biotechnology, University of Wrocław, Wrocław, Poland

2 Institute of Journalism and Communication, Faculty of Humanities and Social Sciences of Warsaw School of Social Sciences and Humanities (SWPS), Campus in Wrocław

3 Department of Biodiversity and Evolutionary Taxonomy, Zoological Institute, University of Wrocław, Wrocław, Poland

*presenting author, e-mail: [email protected]

Dinoflagellates are unicellular eukaryotic algae of ecological and evolutionary importance, most of which harbor the so-called peridinin plastid. The most characteristic feature of this plastid is a peculiar genome organized in numerous plasmid-like chromosomes called minicircles [1, 2]. They are only 0.4-10 kb in size, much smaller than the 100-150 kb of a typical plastid genome. This unusual plastid genome organization raises questions about the role of minicircle non-coding regions, the mechanisms of their replication, their potential coding capacity, the phylogenetic relationships among them, and their evolution. To address some of these questions, we performed a wide-scale comparative analysis of 103 minicircle sequences using the BLAST, CLANS, and SplitsTree bioinformatics tools. The observed extensive distribution of quite short homologous regions within and between minicircles indicates numerous intra- and interchromosomal recombination events. This view is supported by the CLANS [3] clustering and by detailed phylogenetic analyses of defined groups of minicircles. Some of them did not show similarity to any other minicircle, whereas detailed phylogenetic analyses showed that they were acquired via horizontal gene transfer from bacteria [4]. The phylogenetic signal detected by phylogenetic networks is incongruent with a tree-based evolutionary model. Using the DNA walks method, which visualizes the asymmetry between DNA strands [5], we identified potential origins of replication in 56 minicircles, suggesting that they replicate via bidirectional forks according to the theta model. The remaining 47 minicircles probably replicate by the rolling circle model [6]. The proposed replication origins coincide well with the occurrence of inverted repeats, tandem repeats, and palindromes, which may act as structural elements in the initiation of replication.

[1] Z. Zhang, B.R. Green, T. Cavalier-Smith, Single gene circles in dinoflagellate chloroplast genomes, Nature, 1999, 400, 155-159. [2] C.J. Howe, R. E. Nisbet, A.C. Barbrook, The remarkable chloroplast genome of dinoflagellates, J. Exp. Bot., 2008, 59, 1035-1045. [3] T. Frickey, A.N. Lupas, CLANS: a Java application for visualizing protein families based on pairwise similarity, Bioinformatics, 2004, 20, 3702-3704. [4] K. Moszczyński, P. Mackiewicz, A. Bodył, Evidence for horizontal gene transfer from bacteroidetes bacteria to dinoflagellate minicircles, Mol. Biol. Evol., 2012, 29, 887-892. [5] P. Mackiewicz, A. Gierlik, M. Kowalczuk, M.R. Dudek, S. Cebrat, How does replication-associated mutational pressure influence amino acid composition of proteins? Genome Res., 1999, 9, 409-416. [6] S.K. Leung, J.T.Y. Wong, The replication of plastid minicircles involves rolling circle intermediates, Nucleic Acids Res., 2009, 37, 1991-2002.
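The idea behind a DNA walk can be shown with a short sketch. The abstract does not give the exact walk definition used by the authors, so the version below is an assumption: a one-dimensional cumulative walk that steps up for G and down for C (a cumulative GC-skew walk). The function names and the toy sequence are hypothetical.

```python
def dna_walk(sequence, up="G", down="C"):
    """Cumulative walk along one strand: +1 for each `up` base, -1 for each `down` base."""
    walk = []
    height = 0
    for base in sequence.upper():
        if base == up:
            height += 1
        elif base == down:
            height -= 1
        walk.append(height)
    return walk


def walk_extrema(walk):
    """Return the 0-based positions of the first minimum and the first maximum of the walk."""
    lowest = min(range(len(walk)), key=walk.__getitem__)
    highest = max(range(len(walk)), key=walk.__getitem__)
    return lowest, highest


if __name__ == "__main__":
    toy = "CCCAGGGGGACC"  # made-up sequence with a clear switch in G/C bias
    walk = dna_walk(toy)
    print(walk)
    print(walk_extrema(walk))
```

A change in the direction of such a walk marks a position where the compositional bias between the strands switches. In bidirectionally replicating molecules such turning points are commonly interpreted as candidate origins or termini of replication, whereas a walk without a clear turning point gives no such signal.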


Poster 39: Dorota Mackiewicz The role of purifying selection in hotspot distributions along human chromosomes. Dorota Mackiewicz1*, Paulo Murilo Castro de Oliveira2, Suzana Moss de Oliveira2, Stanisław Cebrat1

1 Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Przybyszewskiego 63/77, 51-148 Wrocław, Poland

2 Instituto de Física, Universidade Federal Fluminense; Av. Litorânea s/n, Boa Viagem, Niterói 24210-340, RJ, Brazil

*presenting author, e-mail: [email protected]

Homologous recombination is a crucial process in generating new combinations of alleles, which greatly increases the potential for adaptive diversity of genomes and promotes efficient selection against deleterious mutations. Defects in the frequency or positioning of recombination can lead to genome instability or other forms of genomic disorders. In many diploid organisms, including humans, the average meiotic recombination rate is higher in the subtelomeric regions than in the middle parts of chromosomes [1,2], whereas the vast majority of recombination events are confined to narrow DNA regions called hotspots [3], in which characteristic DNA motifs are found [4]. Our computer simulations based on a Monte Carlo model of eukaryotic chromosome evolution predicted that purifying selection, which eliminates defective alleles resulting from mutations, is strong enough to distribute recombination events unevenly along virtual chromosomes, as is observed in real ones. Moreover, we observed very frequent relocation of recombination positions during the simulations, which agrees with the non-conserved hotspot locations in the human and chimpanzee genomes. Detailed analyses of the distribution of DNA motifs presumably related to homologous recombination in the human genome showed that clusters of DNA motifs, rather than single motifs, are involved in the positioning of recombination hotspots.

[1] M.I. Jensen-Seaman, T.S. Furey, B.A. Payseur, Y. Lu, et al., Genome Res., 2004, 14, 528-538. [2] D. Serre, R. Nadon, T.J. Hudson, Genome Res., 2005, 15, 1547-52. [3] A.J. Jeffreys, L. Kauppi, R. Neumann, Nat. Genet., 2001, 29, 217-222. [4] S. Myers, L. Bottolo, C. Freeman, G. McVean, P. Donnelly, Science, 2005, 310, 321-324.


Poster 40: Marcin Magnus Prediction of accuracy of RNA 3D models Marcin Magnus1,*, Albert Bogdanowicz1 and Janusz M. Bujnicki1,2

1 Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, ul. Ks. Trojdena 4, 02-109 Warsaw, Poland

2 Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland

*presenting author, e-mail: [email protected]

The understanding of the importance of RNA molecules has changed dramatically over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its three-dimensional structure, which in turn is determined by the molecule's sequence. Therefore, there is a need to develop computational methods that are able to provide reliable models of RNA molecules based only on nucleic acid sequences. A standard structure prediction workflow looks as follows: based on a nucleic acid sequence, a set of RNA 3D models is produced, the accuracy of the models is assessed, and the final prediction is made based on the scores. The scope of this project is to develop a new scoring method that can be incorporated into a more accurate RNA 3D modeling workflow. First, we prepared a dataset of known RNA 3D structures. Second, the structures were carefully annotated: breaks in chains, missing atoms and modified residues were identified, and nucleic acid sequences were generated from PDB files. Next, to remove RNA sequences that shared high sequence identity from our dataset, we clustered them with CLANS. Based on the dataset, using methods for RNA 3D modeling, we generated a set of 3D decoys. Then, the performance of existing methods for model quality assessment, such as RASP, RNAkb, FARNA, NAST, SimRNA and QRNA (AMBER force field), was measured on the 3D decoy dataset. Based on a statistical analysis of the results, a meta-approach was proposed to improve the prediction of the quality of RNA 3D models. A web server was developed to allow users to submit their own RNA models. Additionally, we created a plugin that retrieves quality scores from our method and colors a model according to the predicted quality directly in PyMOL.

Figure 1. Exemplary RNA model colored according to predicted quality (low quality parts are thick and in red, high quality are thin and in blue).
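
The abstract does not say how the meta-approach combines the individual scoring methods. As a hedged illustration only, one common way to merge scores that live on different scales is to standardize each method's scores across the decoys (z-scores) and average them. The sketch below assumes that lower scores mean better models; the method names, decoy names and numbers are made up for the example and are not results from this work.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one method's scores across all decoys."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A method that scores every decoy identically carries no ranking signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def consensus_ranking(scores_by_method):
    """Rank decoys by mean z-score; lower is treated as better (assumption)."""
    methods = list(scores_by_method)
    decoys = list(scores_by_method[methods[0]])
    standardized = {
        m: dict(zip(decoys, zscores([scores_by_method[m][d] for d in decoys])))
        for m in methods
    }
    consensus = {d: mean(standardized[m][d] for m in methods) for d in decoys}
    return sorted(decoys, key=lambda d: consensus[d]), consensus


# Made-up scores for three hypothetical decoys; not data from the abstract.
toy_scores = {
    "method_A": {"decoy_1": -120.0, "decoy_2": -95.0, "decoy_3": -60.0},
    "method_B": {"decoy_1": -8.1, "decoy_2": -8.4, "decoy_3": -5.0},
}
ranking, consensus = consensus_ranking(toy_scores)
print(ranking)
```

Averaging z-scores weights every method equally; a meta-predictor trained on decoys with known accuracy could instead learn the weights.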


Posters

Poster 41: Jan Majta
Effect of a polar environment on the conformational flexibility of methyl phosphoethanolamine
Jan Majta*, Krzysztof Murzyn
Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics, and Biotechnology, Jagiellonian University, Kraków, Poland
*presenting author, e-mail: [email protected]

Molecular dynamics (MD) simulation is a powerful method for analyzing the dynamics and structure of molecules in the gas and condensed phases. It is often employed to complement and explain the results of experimental studies. MD simulations provide insight into the atomic-scale interactions of a model system, which are often inaccessible even to very sophisticated experiments. On the other hand, the MD method has problems of its own that need to be tackled carefully. One of them is the inability of classical MD simulation to properly sample the highly unstable, high-energy states that determine conformational transitions and other dynamical properties of the molecules under study. The Umbrella Sampling method [1] is an excellent solution to this problem. Its basic principle is to force the system to stay in a high-energy state and then to obtain a reasonable estimate of the unbiased probability distribution of the molecule's conformers. This distribution is used to calculate the free energy change as a function of the reaction coordinate, which in an analysis of conformational flexibility is simply the monitored dihedral angle. In this work we focus on the conformational flexibility of the methyl phosphoethanolamine (MPEA) molecule, which can be conveniently described by the dynamics of four main dihedral angles. We analyze the influence of water on the location and height of the energy barriers between low-energy conformers of the molecule. The second purpose of this work is a comparative analysis of two force fields: OPLS/AA [2] and SLipids [3]. The computational procedure involves carrying out a series of MD simulations with dihedral restraints that hold the chosen dihedral angle in a given conformation. Once such Umbrella Sampling (US) MD simulations have been completed for all chosen conformers, the WHAM algorithm [4] is employed to combine the information from the individual simulations and to build a profile of the free energy versus the dihedral angle. To make this procedure semi-automatic and user-friendly, we have developed a Python script that prepares all necessary input files, runs the simulations and finally performs the WHAM analysis. Our results show that water has a marked influence on MPEA dynamics, since it can dramatically change the preferred conformation of the molecule compared with the gas phase. Moreover, different force fields lead to significantly different results, which emphasizes the need for careful validation of the force field parameters employed. Finally, it is worth noting that US MD simulation can be easily and successfully applied to conformational flexibility analyses and to other related studies.

This work is supported by the Polish National Science Center under grant no. 2011/01/B/NZ1/00081.

[1] G.M. Torrie, J.P. Valleau, Journal of Computational Physics, 1977, 23(2):187–199
[2] W.L. Jorgensen, J. Tirado-Rives, J. Am. Chem. Soc., 1988, 110(6):1657–1666
[3] J.P.M. Jämbeck, A.P. Lyubartsev, J. Phys. Chem. B, 2012, 116(10):3164–3179
[4] A. Grossfield, http://membrane.urmc.rochester.edu/content/wham/, version 2.0.6
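
The WHAM step described in this abstract can be illustrated with a minimal sketch. It is neither the authors' script nor the WHAM program cited as [4]; it is a plain-Python version of the standard WHAM self-consistent equations for a single periodic dihedral with harmonic restraints. To keep it self-contained it runs on noise-free synthetic histograms rather than simulation output. The three-fold test profile, the 15-degree window spacing, the force constant, the temperature and all function names are assumptions chosen for illustration.

```python
import math

KT = 2.494  # kJ/mol, roughly 300 K (assumed temperature)


def periodic_delta(angle, center):
    """Signed angular difference in degrees, wrapped into [-180, 180)."""
    return (angle - center + 180.0) % 360.0 - 180.0


def harmonic_bias(angle, center, k):
    """Harmonic restraint energy in kJ/mol, with k in kJ/mol/deg^2."""
    d = periodic_delta(angle, center)
    return 0.5 * k * d * d


def wham(histograms, centers, bin_centers, k, kt=KT, tol=1e-10, max_iter=10000):
    """Combine biased histograms into a free energy profile (minimum set to 0)."""
    n_win, n_bins = len(histograms), len(bin_centers)
    totals = [sum(h) for h in histograms]
    # Boltzmann factor of each window's bias in each bin: exp(-U_i(x_b) / kT)
    c = [[math.exp(-harmonic_bias(x, centers[i], k) / kt) for x in bin_centers]
         for i in range(n_win)]
    numer = [sum(histograms[i][b] for i in range(n_win)) for b in range(n_bins)]
    g = [1.0] * n_win  # g_i = exp(f_i / kT), f_i is the window free energy
    for _ in range(max_iter):
        denom = [sum(totals[i] * g[i] * c[i][b] for i in range(n_win))
                 for b in range(n_bins)]
        p = [numer[b] / denom[b] for b in range(n_bins)]  # unbiased estimate
        new_g = [1.0 / sum(p[b] * c[i][b] for b in range(n_bins))
                 for i in range(n_win)]
        change = max(abs(math.log(a / b)) for a, b in zip(new_g, g))
        g = new_g
        if change < tol:
            break
    free = [-kt * math.log(v) if v > 0 else math.inf for v in p]
    low = min(free)
    return [f - low for f in free]


def true_free_energy(angle):
    """Hypothetical three-fold torsional profile (kJ/mol); test input only."""
    return 6.0 * (1.0 + math.cos(math.radians(3.0 * angle)))


def expected_histograms(centers, bin_centers, k, samples=10000, kt=KT):
    """Noise-free histogram each restrained window would give on average."""
    out = []
    for center in centers:
        w = [math.exp(-(true_free_energy(x) + harmonic_bias(x, center, k)) / kt)
             for x in bin_centers]
        z = sum(w)
        out.append([samples * v / z for v in w])
    return out


bin_centers = [-177.5 + 5.0 * b for b in range(72)]  # 5-degree bins
centers = [-180.0 + 15.0 * i for i in range(24)]     # one window every 15 degrees
K_BIAS = 0.02                                        # kJ/mol/deg^2 (assumed)
histograms = expected_histograms(centers, bin_centers, K_BIAS)
profile = wham(histograms, centers, bin_centers, K_BIAS)
```

With real data, each histogram would be built from the dihedral angles sampled in one restrained simulation, and neighbouring windows must overlap for the iteration to connect them.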


Poster 42: Damian Marchewka
Structural analysis of the lactoferrin
Marchewka D.1,2,*, Roterman I.1
1 Department of Bioinformatics and Telemedicine, Jagiellonian University – Medical College, Lazarza 16, 31-530 Krakow, PL
2 Faculty of Physics, Astronomy and Applied Computer Science – Jagiellonian University, Reymonta 4, 30-059 Krakow, PL
*presenting author, e-mail: [email protected]

Lactoferrin (Lf) is a member of the transferrin family of proteins and has many different functions in mammals. It is present in various secretory fluids such as milk, tears and blood. One of its most important features is that it regulates the level of free iron in mammalian body fluids, which makes the protein bacteriostatic. The crystallographic structure of lactoferrin (PDB ID: 1BLF) was selected for study. Three molecular dynamics (MD) simulations were performed using the GROMACS program. In the first simulation we used the 1BLF structure with carbohydrates but without the Fe³⁺ and CO₃²⁻ ligands. In the second, we used the same structure, again with carbohydrates and without the Fe³⁺ and CO₃²⁻ ligands, but with distance restraints between residues in the N-lobe iron-binding pocket. In the third, we used the same structure with distance restraints between residues in both the N-lobe and C-lobe iron-binding pockets. Each simulation was 150 ns long. Changes in the tertiary structure of the iron-binding pockets during the simulations were analyzed using GROMACS. The three simulations showed different tertiary structure changes.

[1] H. Baker, E. Baker, Lactoferrin and iron: structural and dynamic aspects of binding and release, BioMetals, 2004, 17, 209-216.
[2] H. Baker, E. Baker, Molecular structure, binding properties and dynamics of lactoferrin, CMLS, 2005, 62, 2531-2539.
[3] D. Marchewka, I. Roterman, M. Strus, K. Śpiewak, G. Majka, Structural analysis of the lactoferrin iron binding pockets, BAMS, 2012, 8(4), 351-359.
[4] N. Zhou, D. Tieleman, H. Vogel, Molecular dynamics simulations of bovine lactoferricin: turning a helix into a sheet, BioMetals, 2004, 17, 217-223.


Poster 43: Grzegorz Markowski
Construction of the training set to predict contact maps based on unbalanced training data
Grzegorz Markowski, Rafał Adamczak
Faculty of Physics, Astronomy and Informatics, N. Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland
{grzegorz,raad}@fizyka.umk.pl

Prediction of protein contact maps has in recent years become an important step in the prediction of complete 3D protein structures, so new computational methods for this problem are needed. All existing methods that use machine learning algorithms for contact map prediction balance the training set. Since there are many more non-contact than contact cases, the training set is prepared so that it contains a balanced number of vectors from each class, usually by randomly sampling the non-contact instances. Unfortunately, a balanced training set has a drawback: a predictor with high accuracy on the training set may still give poor results on test sets, which in reality are highly unbalanced. That is why we decided to check the influence of an unbalanced training set on contact map prediction. We created a new method called ProMap (PLCT in CASP), whose feature space is built from typical features used by most existing methods for contact map prediction. The only differences are the unbalanced training set and a different coding of secondary structure states. Following the definition of contact ranges used in CASP, we divided contacts into 3 categories: 1. SR - short range contacts 6≤|i-j|