A Generalized Framework for Pathosystems

A Generalized Framework for Pathosystems Informatics & Bioinformatics Web Services

Tian Xue, Boyu Yang, Rebecca Will, Bruce Sharp, Ronald Kenyon, Oswald Crasta, Bruno W. Sobral
Cyberinfrastructure Group, Virginia Bioinformatics Institute (0477), Washington Street, Blacksburg, VA 24061

Abstract – This paper describes a generalized web-based framework for VBI pathosystems informatics & bioinformatics web services. The framework provides a universal mechanism for accessing web services through web browsers, and a web service can be configured into the framework easily. The framework is designed using object-oriented methodology and implemented using Java 2 Enterprise Edition, relational databases, XML technology and various other open source technologies. It conforms to an n-tiered architecture that includes several layers: a web-based graphical user interface built using the Apache Struts web framework running on Apache Jakarta Tomcat; a business object layer with server and web service components that process the queries and manipulate the result sets; and a robust data access layer. The system provides an extensible, flexible, and user-friendly framework for the cyberinfrastructure group’s bioinformatics projects, including genome curation, microarray gene expression profiles, pathogen information curation and pathway analysis.

Key words: Struts Framework, Web Services, Pathosystems Informatics

1 Introduction

Bioinformatics tools and applications play an increasingly important role in biological research as the breadth and depth of research expand and data accumulate. However, bioinformatics tools are diverse: they are implemented in different languages, run on different platforms, and return data in incompatible formats. The Cyberinfrastructure Group (CIG) strives to solve these problems and provide a unified and universally accessible tool framework for bioinformatics, using pathosystems biology (infectious diseases) as a scientific focal point. The CIG has designed and implemented a web service framework that integrates different bioinformatics tools and wraps them as web services. Not only can different tools be accessed by the same client through the same mechanism, but they can also be coordinated into a single workflow to provide more power whenever needed. We have also provided a rich Java client, ToolBus [1], for accessing those web services. Now, as a second step, we have designed and implemented a web-based application for accessing pathogen portal web services (PathPortWeb, http://pathport.bioinformatics.vt.edu:6565/pathportWeb/jsp/home.jsp). Since web applications are universally accessible and do not require installation of client tools, they will make CIG’s pathogen portal (PathPort) web services and their underlying bioinformatics tools more widely and readily available. This paper focuses on the design and implementation of the PathPortWeb web application.

Web applications built for bioinformatics typically have different requirements from other web applications. They carry a high computational load per access, and each request may take a long time before the result can be presented to the user. Also, the application core is frequently more complicated: it can be implemented in different languages such as Perl, C/C++, etc., and run on different platforms. The PathPortWeb design clearly separates the presentation tier from the computational tier to meet these challenges. The computational part of the application is built from web services, which also serve the dedicated, non-web-based client, ToolBus. The presentation tier is developed using the Struts [2] framework. This separation has several advantages. First, it makes reuse of the user interface and the computational code easier, and modifications to one part (the user interface) do not affect the other part (the application core). Second, the web interface and the application core can run in separate environments, each suited to its task. Third, resources can be scaled up, and load balancing in cluster environments becomes possible.

2 Architectural Design

As mentioned above, web applications for bioinformatics face a different set of requirements than typical web applications: user-perceived responsiveness; long, resource-intensive computations; and multiple tiers of communication, including various external tools. To meet these requirements, we designed the system by clearly decoupling the different layers, choosing proven technologies at each layer, and using appropriate communication protocols and message formats. The following sections first introduce the Model-View-Controller (MVC) pattern in general and the Struts framework in particular, then present the PathPort web services framework that serves as the data source for the Struts model. Finally, we put everything together in a single architecture, from user interface to web services, and briefly discuss some remaining challenges.

2.1 The MVC pattern

MVC [3] is a proven and widely adopted design pattern, and a number of mature frameworks implement it, most notably Struts. There are several variations of the MVC pattern, but they are all based on the same underlying structure: an application's data models (the Model), presentation code (the View), and program control logic (the Controller). The Model component represents and processes application data. The View is the user interface; it reflects the Model data and presents it to the user. The Controller maps actions performed on the View (for example, pressing a Submit button) to operations on the Model (for example, retrieving user details). After the Model is updated, the View is refreshed and the user can perform more actions. The MVC pattern clarifies code and facilitates code reuse; in addition, in many projects the View is updated frequently, and the MVC pattern insulates the Model and Controller from those changes. Figure 1 outlines the MVC pattern.

Figure 1. MVC design pattern
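
The division of responsibilities described above can be illustrated with a minimal, framework-free Java sketch. All class and method names here (SequenceModel, SequenceView, SequenceController, submit) are hypothetical and chosen only for illustration; they are not taken from PathPortWeb or Struts.

```java
import java.util.ArrayList;
import java.util.List;

// Model: holds and processes application data (hypothetical example data).
class SequenceModel {
    private final List<String> sequences = new ArrayList<>();

    void add(String sequence) {
        sequences.add(sequence);
    }

    List<String> all() {
        return new ArrayList<>(sequences); // defensive copy for the view
    }
}

// View: turns model data into something presentable; knows nothing about input handling.
class SequenceView {
    String render(List<String> sequences) {
        StringBuilder sb = new StringBuilder("Sequences:");
        for (String s : sequences) {
            sb.append(' ').append(s);
        }
        return sb.toString();
    }
}

// Controller: maps a user action to a model operation, then refreshes the view.
class SequenceController {
    private final SequenceModel model;
    private final SequenceView view;

    SequenceController(SequenceModel model, SequenceView view) {
        this.model = model;
        this.view = view;
    }

    String submit(String userInput) {
        String cleaned = userInput.trim().toUpperCase();
        if (!cleaned.isEmpty()) {
            model.add(cleaned); // empty submissions leave the model unchanged
        }
        return view.render(model.all());
    }
}

class MvcSketch {
    public static void main(String[] args) {
        SequenceController controller =
                new SequenceController(new SequenceModel(), new SequenceView());
        System.out.println(controller.submit(" acgt "));
        System.out.println(controller.submit("ttga"));
    }
}
```

Because the view only receives a list of strings, it could be replaced (for example by an HTML renderer) without touching the model or the controller, which is the insulation property noted above.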

2.2 Struts: A solid MVC-based framework

Struts [2] is an open-source framework for building Web applications based on the MVC pattern. It provides services common to most Web applications. In a Struts application, the Controller component (the ActionServlet and the Action classes) glues together the View and the Model.

The Model is responsible for implementing business logic and for retrieving and storing data for the view layer. The View is responsible for displaying the data to users. The Struts Model (Model classes) can work with any standard data access technology, including EJB (Enterprise JavaBeans) and O/R (object-relational) mapping between Java objects and relational databases. For the View, Struts works well with JavaServer Pages (JSP) and many other presentation systems. Figure 2 illustrates the logical flow of a Struts-based application.

Figure 2. Logical flow of a Struts application
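
The logical flow of such an application (a front controller routes a request path to an action, the action fills the model and returns a logical forward name, and the forward name is resolved to a page) can be sketched in plain Java. This is an illustration of the flow only, not the Struts API: SketchAction, FrontControllerSketch, the paths and the JSP names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an action: processes parameters, fills the model,
// and returns a logical forward name such as "success".
interface SketchAction {
    String execute(Map<String, String> params, Map<String, Object> model);
}

// Hypothetical front controller: one entry point that routes by request path.
class FrontControllerSketch {
    private static final String ERROR_PAGE = "/jsp/error.jsp"; // assumed page name

    private final Map<String, SketchAction> actions = new HashMap<>();
    private final Map<String, String> forwards = new HashMap<>();

    void mapAction(String path, SketchAction action) {
        actions.put(path, action);
    }

    void mapForward(String name, String page) {
        forwards.put(name, page);
    }

    // Returns the page that should render the model for this request.
    String handle(String path, Map<String, String> params, Map<String, Object> model) {
        SketchAction action = actions.get(path);
        if (action == null) {
            return ERROR_PAGE; // no action registered for this path
        }
        String forward = action.execute(params, model);
        return forwards.getOrDefault(forward, ERROR_PAGE);
    }

    public static void main(String[] args) {
        FrontControllerSketch controller = new FrontControllerSketch();
        controller.mapForward("success", "/jsp/result.jsp");
        controller.mapAction("/blast", (params, model) -> {
            model.put("program", params.get("program"));
            return "success";
        });

        Map<String, String> params = new HashMap<>();
        params.put("program", "blastp");
        Map<String, Object> model = new HashMap<>();
        System.out.println(controller.handle("/blast", params, model));
        System.out.println(model);
    }
}
```

In Struts itself the path-to-action and forward-to-page mappings are declared in configuration rather than registered in code; the sketch keeps them in maps only to stay self-contained.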

2.3 PathPort web services framework

PathPort web services [4] [5] aim to build a flexible bioinformatics infrastructure that integrates any number of heterogeneous tools into one framework. There are many advantages to this approach. First, since all tools are exposed as web services, they are location- and implementation-transparent. Clients only need to know the advertised web services and need not concern themselves with where they are installed or how they are implemented. Second, execution of those web services is coordinated under one application framework. Third, the use of compatible XML documents to communicate among different web services facilitates automated orchestration of these services. Last, but not least, the same web services framework can serve different client types. We have already developed ToolBus for in-depth analysis and viewing, and we have implemented web access as described in this paper.

Apache Axis [6] is used as the web-service application framework and is essentially the web services engine. The business logic layer is where data are retrieved and processed, whether from backend databases, from the results of native programs, or from publicly available services. The data layer is where data are stored or processed. Figure 3 illustrates the web-service architecture design.

Figure 3. Web service architecture design

2.4 From Struts to web services

The Struts framework clearly demarcates a View, a Controller, and a Model. The Model contains all the business logic necessary to retrieve data from any data source. In PathPortWeb, the Model retrieves data from web services. Figure 4 illustrates the logical flow of the PathPortWeb architecture.

Figure 4. Logical flow of PathPortWeb architecture

The core of PathPortWeb lies in the design and implementation of the web services proxy manager. The web services proxy manager is responsible for creating and initializing the proper web services proxies. Each web services proxy corresponds to one web service, and it is responsible for formulating and formatting the request parameters, handling all the low-level SOAP messages, and calling the proper web service. The following is a typical sequence for a user request:

• The user makes a request.
• The Struts ActionServlet routes the request to a particular Action class, which processes the request, validates the parameters, etc.
• The Action calls the web services proxy manager to locate the proper proxy.
• The proxy processes and formats the parameters and calls the web service.
• The web service processes the request, possibly calling one or several tools or data sources, and returns the result.
• The result is processed (delegated to helper classes), and the model is generated and passed to the view (JSPs).
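
The role of the proxy manager in this sequence can be sketched as a registry that maps a service name to a proxy object. The sketch below is an assumption-laden illustration, not the PathPortWeb implementation: ServiceProxy, BlastProxySketch and ProxyManagerSketch are hypothetical names, the XML layout is invented, and the proxy only formats a request document instead of sending a SOAP message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical proxy contract: one proxy per web service.
interface ServiceProxy {
    String invoke(Map<String, String> params);
}

// Hypothetical proxy that only formats the request as XML. A real proxy would
// send this payload over SOAP and return the service's response instead.
class BlastProxySketch implements ServiceProxy {
    @Override
    public String invoke(Map<String, String> params) {
        StringBuilder xml = new StringBuilder("<request service=\"blast\">");
        // TreeMap gives a stable, alphabetical parameter order.
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            xml.append("<param name=\"").append(escape(e.getKey())).append("\">")
               .append(escape(e.getValue())).append("</param>");
        }
        return xml.append("</request>").toString();
    }

    // Minimal XML escaping for the characters that would break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

// Hypothetical proxy manager: creates, holds and locates proxies by service name.
class ProxyManagerSketch {
    private final Map<String, ServiceProxy> proxies = new HashMap<>();

    void register(String serviceName, ServiceProxy proxy) {
        proxies.put(serviceName, proxy);
    }

    ServiceProxy locate(String serviceName) {
        ServiceProxy proxy = proxies.get(serviceName);
        if (proxy == null) {
            throw new IllegalArgumentException("No proxy for service: " + serviceName);
        }
        return proxy;
    }

    public static void main(String[] args) {
        ProxyManagerSketch manager = new ProxyManagerSketch();
        manager.register("blast", new BlastProxySketch());

        Map<String, String> params = new HashMap<>();
        params.put("program", "blastp");
        params.put("sequence", "MKV");
        System.out.println(manager.locate("blast").invoke(params));
    }
}
```

Keeping the lookup behind one manager means an action class only needs a service name; adding a new web service then amounts to registering one more proxy.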

3 Use Case Example

BLAST [7], one of the bioinformatics tools in the PathPortWeb application, offers the user an interface for sequence similarity searching (see Figure 5). When a web application backend takes a long time to complete, the application cannot wait until it has finished before sending back the result, since both web browsers and web proxies will time out and interrupt the computation. In addition, the user will often think that the web server or the network is down. A common technique, used mostly in older bioinformatics services, is to send the user an email with the result. A different approach, which PathPortWeb supports, is to use the refresh mechanism of the HTML meta tag. The web application redirects to a status page that shows the progress and refreshes frequently until the job completes; when the result is ready, the result page is displayed instead of the status page (Fig. 6).
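
A minimal sketch of the refresh approach, assuming the long-running job is represented by a java.util.concurrent.Future: while the job is running, the server returns a status page whose meta refresh tag makes the browser re-request the page; once the job is done, the same request yields the result page. The class name StatusPageSketch, the page text and the five-second interval are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class StatusPageSketch {

    // Builds the HTML for one poll of the job: a self-refreshing status page
    // while the job runs, or the result page once it has finished.
    static String render(Future<String> job, int refreshSeconds) {
        if (!job.isDone()) {
            // The browser reloads this page after refreshSeconds seconds.
            return "<html><head><meta http-equiv=\"refresh\" content=\"" + refreshSeconds + "\">"
                    + "<title>Job running</title></head>"
                    + "<body>Your job is still running.</body></html>";
        }
        try {
            // No refresh tag here, so polling stops once the result is shown.
            // A real page would HTML-escape the result before embedding it.
            return "<html><head><title>Result</title></head>"
                    + "<body><pre>" + job.get() + "</pre></body></html>";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return errorPage();
        } catch (ExecutionException e) {
            return errorPage();
        }
    }

    private static String errorPage() {
        return "<html><head><title>Error</title></head><body>The job failed.</body></html>";
    }

    public static void main(String[] args) {
        // CompletableFuture stands in for a job running on a worker thread.
        CompletableFuture<String> job = new CompletableFuture<>();
        System.out.println(render(job, 5));
        job.complete("alignment output");
        System.out.println(render(job, 5));
    }
}
```

Each poll returns immediately, so neither the browser nor an intermediate proxy holds a connection open long enough to time out.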

Figure 5. BLAST user interface. The user has selected blastp and submitted a protein sequence.

Figure 6. After the query is submitted, the BLAST web application redirects to a page that shows progress and refreshes frequently; when the result is ready, it is displayed instead of the progress page.

When the user selects the Plain Text view, the BLAST tool returns a URL link that opens a new browser window presenting the full plain-text result (Figs. 7 and 8).

Figure 7. The BLAST tool returns a URL link; when the user clicks on the link, a new browser window opens to present detailed alignment information.

Figure 8. Text view of BLAST output.

When the user selects the Graphic view, the BLAST tool returns its output graphically (Fig. 9). This view is not yet fully implemented, as the graphics design is still in progress.

Figure 9. Graphics view of BLAST alignment output.

4 Conclusion, Discussions and Future Work

This paper presents the overall design of CIG’s PathPortWeb system, how it builds on the Struts framework and a web services architecture, and how XML messaging is used to communicate among different bioinformatics tools. In a broader view, the system offers a generic MVC architecture that allows most bioinformatics applications to be presented to clients as easy-to-use services. Furthermore, independently developed but related applications can be linked into a workflow under this framework and offer functionality that none of them provides individually. Since web services can be distributed and discovered, they provide high availability to the biological research community. Support for client-side graphics is in progress.

5 Acknowledgments

This work was supported by US Department of Defense project W911SR-04-0045 to BW Sobral. The development of CIG’s PathPortWeb has benefited from the following open-source projects: Struts, Tomcat and Axis from Apache, WSDL4J [4] and UDDI4J [8] from IBM, and Castor from Exolab.

6 References

[1] A Life Scientist's Gateway to Distributed Data Management and Computing: The PathPort/ToolBus Framework, J Dana Eckart and Bruno Sobral, OMICS: A Journal of Integrative Biology, Volume 7, Number 1, pp 79-88, Spring 2003.
[2] Struts in Action: Building web applications with the leading Java framework, Ted N. Husted, Cedric Dumoulin, George Franciscus, David Winterfeldt, 672 pp, 2002, ISBN: 1930110502.
[3] Model-View-Controller: http://java.sun.com/blueprints/guidelines/designing_enterprise_applications_2e/web-tier/web-tier5.html
[4] Web Service: http://java.sun.com/webservices/
[5] Genome Annotation and Comparison System, Jing Zhao, Tian Xue, Boyu Yang, Kelly Williams, Alice R. Wattam, Rebecca Will, Bruce Sharp, Ron Kenyon, Oswald Crasta and Bruno W. S. Sobral, The 2006 International Conference on Bioinformatics & Computational Biology (BIOCOMP), June 2006, Las Vegas, NV, USA.
[6] Apache Axis: http://xml.apache.org/axis/
[7] Basic local alignment search tool, Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J., J. Mol. Biol. 215:403-410, 1990.
[8] Universal Description, Discovery and Integration (UDDI): http://www.uddi.org