IEEE TRANSACTIONS ON NANOBIOSCIENCE, VOL. 6, NO. 2, JUNE 2007

Guest Editorial: Special Section on Grid, Web Services, Software Agents, and Ontology Applications for Life Sciences

Recent progress in the application of bioinformatics, computational biology, and systems biology to biomedicine has made it essential to devise technological platforms able to ensure appropriate support to research activities performed in the area of life sciences. In fact, the increasing amount, complexity, and heterogeneity of biological data, together with the increasing production of the corresponding scientific literature, has raised novel challenges. It is clear that no generic description can be given which is able to encompass all requirements imposed by different disciplines; nevertheless, a new generation of systems, tools, and applications is being developed, aimed at ensuring the feasibility of significant advances in many relevant subfields of life sciences.

This special section is the result of a remarkable effort made by the authors, who described their work and results with the goal of producing high-quality papers. Most of the papers focus on Grid and distributed computation, which are of paramount importance for tackling the problems raised by the inherent computational complexity of many tasks in life science research. Others are concerned with advanced techniques and methodologies, like Web services and agent-based systems. The need for organizing the domain knowledge in the form of taxonomies or ontologies is also a primary concern for some papers, and an underlying issue for others. To help readers find the papers they may be interested in, we briefly sketch each one under the topic that appears preeminent in it.

Grid is an emerging computing model that distributes processing across a parallel infrastructure. For a computing problem to benefit from a grid, it must require either large amounts of computation time or large amounts of data, and it must be reducible to parallel processes that do not require intensive communication among the nodes involved in the computation.
Any computer in the world can host a task of a computation issued by a remote user, provided that it is able to offer the required service. In this special section, Arbona et al. provide an overview of the @neurIST Grid middleware and outline an infrastructure devised to support modeling and simulation tasks and to access heterogeneous distributed data sources through semantic integration. Cannataro et al. provide a complete grid environment for proteomics data analysis. They integrate, in a single environment for proteomics spectra management, existing platforms (MS-Analyzer and BioDCV), emerging proteomics standards (the mzData data structures proposed by the HUPO-PSI initiative), and the EGEE Biomed VO grid infrastructure. Mirto et al. describe IRIS, a system for the prediction of protein secondary structures. The prediction task is carried out by a multilayer perceptron, which, as usual, takes into account the evolutionary multiple alignment information. The novelty of the proposal lies in its use of a Grid-enabled environment, including an optimized parallel version of the PSI-BLAST tool based on the MPI Master–Worker paradigm. Pierson et al. focus their attention on the Grid for Geno Medicine, a project started in 2004 and aimed at providing a comprehensive grid software infrastructure that allows biologists to mine and analyze relationships between medical, genetic, and genomic data stored in distributed data warehouses. Salzeman et al. propose a service to automatically update the molecular biology databases from a single changing reference using Web services. They report the components, the architecture, and the deployment of the update service on the grid infrastructure. Van der Wath et al. focus on gene regulation mechanisms, the study of which is an important and challenging task of bioinformatics. They show that gene expression information can be used to annotate transcription binding sites upstream of co-regulated genes. They also relate gene expression levels to the matching scores of nucleotide patterns, so that DNA-binding sites can be identified from a collection of noncoding DNA sequences from co-regulated genes. The authors also discuss extending the approach to multiple species by exploiting the gLite GRID framework.

Within a distributed computation framework, Boccia et al. describe a scheduling architecture able to distribute jobs over a large number of hierarchically organized nodes. The corresponding system takes advantage of tools, such as relational database servers, to achieve low latency and performance that compares well with, and often surpasses, that of more traditional, dedicated schedulers. Batteries of computational nodes, such as those found in parallel clusters, provide a platform of choice for this application, especially when a relatively large number of concurrent requests is expected.

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. Web services are frequently application program interfaces that can be accessed over the Internet and executed on a remote system hosting the requested services. In this special section, Carrabino et al. propose mepsMAP, a new bioinformatics Web server aimed at identifying the recognition sites between antibodies and their cognate antigens. A facility on the server allows the user to search for putative conformational epitopes on protein surfaces, querying the system for proteins with a given annotation.

From a software engineering perspective, software agents are a powerful computational paradigm for the analysis, design, and implementation of complex software systems. In their paper, Bartocci et al. present an agent-based, multilayer architecture for bioinformatics Grids intended to support both the execution of complex in silico experiments and the simulation of biological systems. In the architecture, a pivotal role is assigned to a semantic index of resources, which is also expected to facilitate users’ awareness of the bioinformatics domain. In this way, modeling and simulation of biological systems and processes, as well as automated bioinformatics analysis of high-throughput data, are made easier. Armano et al. describe an agent-based system able to retrieve scientific publications from the web through a text categorization process. To this end, a generic multiagent architecture has been customized according to the requirements imposed by the specific task.

Finally, we would like to thank all authors for their active commitment and their willingness to make this special section successful. It is worth pointing out that the papers included hereinafter have also been selected for their ability to give the interested reader an insight into the possibilities of novel computational paradigms, techniques, and methodologies that hopefully will allow researchers to make significant steps towards the comprehension of the secrets of life.

LUCIANO MILANESI, Guest Editor
Italian National Research Council, Institute of Biomedical Technologies (CNR-ITB)
Milan, Italy

GIULIANO ARMANO, Guest Editor
University of Cagliari
Cagliari, Italy

VINCENT BRETON, Guest Editor
National Center for Scientific Research (CNRS)
Paris, France

PAOLO ROMANO, Guest Editor
National Cancer Research Institute
Genoa, Italy

Digital Object Identifier 10.1109/TNB.2007.897434
1536-1241/$25.00 © 2007 IEEE

Luciano Milanesi received the Ph.D. degree in health physics from the University of Milan, Italy, in 1986. He is currently a Researcher with the Italian National Research Council, Institute of Biomedical Technologies (CNR-ITB), Milan. He is leader of Bioinformatics for the University Center for Excellence, Center for Bio-molecular Interdisciplinary Studies and Industrial Applications. He is the coordinator of the European BIOINFOGRID Bioinformatics Grid Applications for Life Science and the LITBIO Laboratory of Bioinformatics Technologies. He is an editorial board member of the IEEE TRANSACTIONS ON NANOBIOSCIENCE and Briefings in Bioinformatics. He has published several refereed publications in journals, books, and conference proceedings relating to the areas of bioinformatics, systems biology, and medical informatics. He has coedited four books in the area of bioinformatics.

Giuliano Armano received the Ph.D. degree in electronics and computer engineering from the University of Genoa, Italy, in 1989. He is currently Associate Professor of Computer Engineering at the University of Cagliari, Italy, where he also leads the Intelligent Agents and Soft Computing (IASC) group. His background ranges over machine learning and software agents, research topics that he applies mainly in the fields of bioinformatics and information retrieval.

Vincent Breton received the Engineer degree from Ecole Centrale de Paris, France, in 1985 and the Ph.D. degree in nuclear physics from the University of Paris XI, Orsay, France, in 1990. Since 1990, he has been a Research Associate at the French National Centre for Scientific Research (CNRS), Paris. In 2001, he founded a research group (http://clrpcsv.in2p3.fr) on the application to biomedical sciences of the information technologies and tools used in high energy physics. He is a cofounder of the GATE collaboration (http://opengate.in2p3.fr), which gathers more than 20 research laboratories around the world, and of the Healthgrid and WISDOM initiatives, and he chaired the first European conferences on grids for health in January 2003 and January 2004. He is involved in several FP6 European projects dealing with grids for life sciences and healthcare (Embrace, EGEE-II, BioinfoGRID, Share).

Paolo Romano received the degree in electronic engineering from the University of Genoa, Italy, in 1982 and the Ph.D. degree in bioengineering from the Polytechnic of Milan, Italy, in 1987. He was a Researcher in the Faculty of Medicine of the University of Genoa from 1990 to 1993. Since 1993, he has been a Researcher at the National Cancer Research Institute of Genoa. His research interests include data modeling, ontologies, Web services, and data integration. He is involved in the development of related tools and of workflows for biological data analysis processes.