JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE J. Softw. Maint. Evol.: Res. Pract. 2004; 16:51–70 (DOI: 10.1002/smr.285)

Research

Migrating to Web services: a performance engineering approach

Marin Litoiu∗,†
Centre for Advanced Studies, IBM Toronto Laboratory, 8200 Warden Avenue, Markham, Ontario, Canada

SUMMARY
In this paper we look at several performance pitfalls that Web services are facing today and at the performance penalties that have to be paid when exposing a legacy application as a Web service. We investigate two performance metrics of Web services, latency and scalability, and compare them with those of legacy middleware. The goal of the paper is to show how the performance penalties can be mitigated by following the principles and methods of performance engineering. Performance models can help with migration decisions, especially when new architectures or new deployment topologies are sought. The paper shows the mechanisms of building and solving a performance model involving Web services. An example is presented throughout the paper. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS:

performance modeling; performance evaluation; Web services; distributed application

1. INTRODUCTION
Software qualities that are often taken into account when reengineering a distributed application include modifiability, availability and performance. This paper is concerned with preserving and improving performance when migrating a legacy application to a new technology, Web services. Web services consist of a set of protocols, standards and languages that aim at achieving modularity, simplicity, loose coupling and interoperability in distributed software. In Web services, remote interaction is done through a language-independent protocol, the Simple Object Access Protocol (SOAP) [1], and any remote object is described in the Web Service Description Language (WSDL) [1]. A typical life cycle of a Web service starts with a Service Provider that implements a service and then publishes its description with a well-known entity, a Service Broker.

∗ Correspondence to: Marin Litoiu, Centre for Advanced Studies, IBM Toronto Laboratory, 8200 Warden Avenue, Markham, Ontario, Canada.
† E-mail: [email protected]


Received 1 December 2002; Revised 11 April 2003; Accepted 30 June 2003


A Service Requester, that is, a user or a program that wants to programmatically use a published Web service, will find the service at the Service Broker and check whether it satisfies the needed requirements by inspecting its WSDL. Finally, the Service Requester will bind its application logic to the Web service that resides on the Service Provider host.

By adding a language-independent layer to the communication stack, enhancing interoperability through a new description language, easing the discovery of services and components, and bringing the promise of easier late binding, Web services attain the characteristics of an attractive middleware technology. As a result, many companies are planning to make the transition to this new middleware. This paper looks at the performance implications of exposing existing applications as Web services, and especially at the implications of using the SOAP protocol. Several metrics are of interest to users and service providers, namely latency, scalability and throughput. Latency is the time elapsed at the client side from when a remote Web service method is invoked until a response is returned. Scalability is the capacity of the application to accommodate an increasing number of users without a degradation of the performance metrics. Throughput is the number of requests served per unit of time.

Results about the latency of several SOAP implementations have been reported in the context of grid computing [2,3]. However, our focus is on a different computing model, namely client/server applications as supported by Web sites through J2EE architectures [4]. Those architectures require high scalability and high throughput. Client/server applications share common resources at the server side and, as the number of clients (users) increases, the shared resources quickly become a bottleneck and scalability is affected. For this computing model, the server side of the Web service should be responsive and scalable. When scalability is considerably affected by the introduction of Web services, we must find ways of alleviating the problem. A performance study is best addressed when performance models are used. Performance models such as the layered queuing model [5] and optimization techniques [6,7] can help with design and capacity planning exercises such that the user response time is kept low under different workloads.

The remainder of this paper is organized as follows. Section 2 discusses the Web services infrastructure and the main software components in a reengineering scenario supported by WebSphere Studio [8]. Section 3 presents an example of reengineering an application, describes the performance penalties, and explains the sources of the delay. Section 4 shows how to build a performance model for Web services. Section 5 describes alternative configurations that can be studied with the help of performance models. Section 6 presents a summary and conclusions.

2. WEB SERVICES IN A REENGINEERING SCENARIO
In this section we show the main software components involved in a typical usage scenario, namely a client/server invocation, and the possible sources of delay introduced by the Web services infrastructure. The software components involved are shown in Figure 1 in a UML [9] class diagram. The classes stereotyped with runtime are part of the SOAP protocol. The SoapClient and RPCRouter classes are the XML serializers and deserializers at the client and server side, respectively. To serialize (or deserialize) the XML messages, the two classes use the XML Parser [10]. The class Provider performs the invocation of the actual service. To select the appropriate Provider, the RPCRouter (which is a Java™ servlet) uses the information created at deployment time and stored in the Deployment Descriptor file.


Figure 1. Generated classes and their relationship with the SOAP runtime and legacy service.


Figure 2. An example of WSDL and SOAP request message. (a) WSDL for method find of class EJBItemSession; (b) SOAP request message for method find(100.0).



Figure 3. An end-to-end computing scenario with Web services.

There are different Providers for different service implementations; for example, there are Providers for Java classes and for Enterprise Java Beans (EJBs). The Web service implementation has the stereotype legacy. This stereotype suggests that the service has been part of a previously deployed application, monolithic or distributed. Examples of legacy services include Java classes, EJBs or even transactions. There are already commercial tools that inspect the legacy service and generate the needed wrappers. For example, WebSphere Studio inspects a legacy service and generates the classes with the stereotype generated. The Unitest and Client Application classes are created for the convenience of the user; the rest of the generated classes facilitate Web service discovery and invocation.

WSDL is an XML representation of the legacy service interface and is generated by inspecting it, extracting the interface and data types of the method signatures, and then mapping them to WSDL-specific constructs. An example of a WSDL construct for a method 'String find(long)' of an EJB class named EJBItemSession is shown in Figure 2(a). The EJB class becomes a portType XML element, and the method find maps to the element operation. The return String value and the long argument of method find() become output and input messages in the WSDL operation. WSDL is then used for Service Proxy generation and for creating the SOAP messages. The Service Proxy is a Java class that provides the client with a Java interface that resembles the interface of the legacy service, and maps Java method invocations to SOAP messages, and vice versa. An example of SOAP message content is shown in Figure 2(b). The message is a request that goes from the client to the server in the form of an XML document. A message contains an Envelope, which contains a Body element. The Body is the container for the invoked method. The identity of the remote Web service is denoted by the attribute xmlns and the method's argument is contained in the XML element id. A SOAP response message is similar to that in Figure 2(b) and carries back the return value of the method find. A sketch of what such a proxy-style invocation might look like is given below.
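As a minimal, hand-written illustration (not the WebSphere Studio generated code), a proxy for find() builds a SOAP request envelope along the lines of Figure 2(b) and POSTs it to the RPCRouter servlet over HTTP; the endpoint URL, namespace and class name below are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical, hand-written stand-in for the generated Service Proxy of
// EJBItemSession.find(long). Endpoint URL and namespace are placeholders.
public class ItemSessionProxy {
    private static final String ENDPOINT = "http://apphost/soap/servlet/rpcrouter";

    public String find(long id) throws Exception {
        // Build a SOAP request envelope comparable to Figure 2(b):
        // Envelope -> Body -> find element, with the argument in element 'id'.
        String envelope =
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<SOAP-ENV:Body>"
          + "<ns1:find xmlns:ns1=\"urn:EJBItemSession\"><id>" + id + "</id></ns1:find>"
          + "</SOAP-ENV:Body></SOAP-ENV:Envelope>";

        HttpURLConnection conn =
            (HttpURLConnection) new URL(ENDPOINT).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");

        OutputStream out = conn.getOutputStream();
        out.write(envelope.getBytes("UTF-8"));
        out.close();

        // The SOAP response envelope carries back the String return value;
        // a real proxy would deserialize it with the XML parser.
        InputStream in = conn.getInputStream();
        StringBuffer response = new StringBuffer();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
            response.append(new String(buf, 0, n, "UTF-8"));
        }
        in.close();
        return response.toString();
    }
}
```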

Figure 3 shows a typical runtime architecture for a client/server computing model that uses Web services, and the path of a Web service invocation. As shown in the figure, the Web services infrastructure consists of several layers added on top of the existing J2EE infrastructure (the bottom two layers are the basic J2EE and Web layers). Given this layering, there are at least three sources of delay that affect Web services.


• Transport protocol. The use of HTTP or SMTP as the main transport protocol by SOAP implementations has many advantages, but it can also introduce unacceptable delays. The role of the transport protocol is to deliver the SOAP messages across the Web.
• XML parser. The data exchanged by SOAP are in XML format. The parsing of an XML tree to get the envelope, the body and the header of the SOAP message is done in several passes and is likely to be time-consuming. Total parsing versus partial parsing, and validating versus non-validating, are decisions that profoundly affect latency and scalability.
• SOAP runtime. This includes the creation/extraction of the SOAP envelope by the SOAP Client or RPCRouter, the conversion of the data types from language-specific formats to XML and vice versa, and the location and invocation of the service by the Provider class.

Although extra delays and a degradation of performance are expected given the extra overhead, the question is whether they are prohibitive for some of the computing models and, if they are unacceptable, whether there are ways of improving the performance. In the next section we look at a concrete example, an EJB application exposed as a Web service.

3. A REENGINEERING EXAMPLE
Figure 4(a) shows the UML class diagram of an EJB application that implements an Internet auction and supports three scenarios: createBid() allows the user to publicize an item for sale and a starting price; find() queries for specific items; and makeBid() submits bids for an item. For the sake of simplicity, we choose to omit the presentation objects (in J2EE-compliant architectures, the presentation is done through Java Server Pages). As shown in Figure 4(a), ClientEJB interacts with the data object through the session (EJBItemSession) and entity (EJBItem) beans.

We reengineered the EJB-based application by creating a Web service from the EJBItemSession bean using the code generators of WebSphere Studio. The reengineered application (Figure 4(b)) replaces the stub and skeleton objects with a proxy object generated according to Figure 1 and an RPCRouter object, which interact with each other through the SOAP protocol. (For simplicity, we do not show the rest of the generated classes.) RPCRouter directs SOAP requests to the corresponding service, in this case to the EJBItemSession, through a Java invocation. RPCRouter comes with the standard Apache SOAP runtime (see also Figure 1).

The reengineered application is deployed on the same hardware that hosted the initial Auction application (Figure 5). The client and proxy reside on the Client Host, and the rest of the objects reside on the Application Host. The Client Host has an Intel 450 MHz processor, 600 MB of memory, and runs Windows NT and JDK 1.3. The Application Host has an Intel 1.2 GHz processor, 1 GB of memory, runs Windows 2000, and IBM WebSphere 4.1 with Apache SOAP 2 and Xerces 1.3.

Sections 3.1–3.3 look at two performance metrics, latency and scalability, of both the original and the reengineered applications, and explain the reasons for the degradation in performance when adding the Web service infrastructure.

3.1. Scalability and latency
To measure the latency and the scalability, we instrumented the find() method of the EJB and Web service clients to measure the elapsed time between the start and the end of the method.
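As a sketch of that instrumentation (the paper does not show the measurement harness itself), a single invocation can be timed as follows; the hypothetical ItemSessionProxy type from the earlier sketch stands in for whichever client-side object, EJB stub or Web service proxy, is being measured.

```java
// Minimal timing sketch for one find() invocation; 'proxy' stands for either
// the original EJB stub or the generated Web service proxy (here the
// hypothetical ItemSessionProxy from the earlier sketch).
public class LatencyProbe {
    static long timeFind(ItemSessionProxy proxy) throws Exception {
        long start = System.currentTimeMillis();
        proxy.find(100);                            // the instrumented call
        return System.currentTimeMillis() - start;  // elapsed time in ms
    }
}
```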


Figure 4. Auction class diagrams. (a) Initial application; (b) reengineered application: SOAP+EJB.

Figure 5. Auction application deployed.

Figure 6. Measured scalability (response time [ms] versus number of clients for the ApacheSoapClient, OptimizedClient and EJB configurations).


Table I. Latency.

System              Latency (ms)
EJB                        8
ApacheSoapClient         250
OptimizedClient           20

To measure the latency, we ran just one client over a period of 10 min and then took the minimum response time across all those runs. The length of the message sent over the wire was 48 bytes. To measure the scalability in terms of average response time, the client applications repeatedly invoked the server, with a think time between invocations of 3 s. The number of client objects was varied from 1 to 300 by spawning a corresponding number of threads on the Client Host machine. As in the case of latency, the length of the message sent over the wire was 48 bytes.

The results of our experiments are presented in Table I and Figure 6. For the Web service implementation we used two different clients, ApacheSoapClient and OptimizedClient. ApacheSoapClient uses the Service Proxy and SOAP Client classes, just as in Figure 1. OptimizedClient, however, uses Java's HttpURLConnection class, with the 'keep-alive' connection parameter set to true. Therefore, OptimizedClient shortcuts the Service Proxy and SOAP Client classes. The EJB column in Table I and the EJB curve in Figure 6 denote the performance (latency and scalability, respectively) of the original application for the same method, find(). We note the following results.

• As expected, the reengineered application is slower than the original one. Both the latency and the average response times are higher. What is unexpected is the rate at which it underperforms. The server saturates with fewer than 100 clients, while the original application still offers response times of less than 0.5 s for the same number of clients.
• ApacheSoapClient has higher latency and worse scalability than OptimizedClient. Section 3.2 explains the sources of the performance degradation.

3.2. Sources of delay
The difference in scalability and latency between the original EJB application and the reengineered one comes from several sources.

• The high latency (measured with one client) for the ApacheSoapClient case comes from inappropriate use of the underlying HTTP and TCP/IP protocols (Figure 7) and is explained by the occurrence of the following events.
— Due to the Nagle algorithm [11], TCP waits for the sender process, which is the Web service client, to fill up an IP packet before sending the data. The motivation for introducing this delay is to improve the efficiency of the network by avoiding sending short messages. Note that this latency is not inherent to Web services, but rather to TCP/IP.



Figure 7. Nagle and TCP delayed ACK.

However, it is triggered by the way the SOAP Client runtime uses the protocol. Figure 7 shows how the HTTP POST request and the HTTP body are sent after a delay.
— TCP also implements delayed packet acknowledgement. To give the sender time to collect more data, as prescribed by the Nagle algorithm, the receiver does not send an acknowledgement right away. An illustration of this is shown in Figure 7, where the HTTP POST response is sent back after a long delay.
— The Apache SOAP client runtime writes twice to an HTTP connection, first the header and then the payload, triggering the Nagle and TCP delayed-ACK algorithms for any length of the payload.
• The latency of the SOAP protocol can be decreased substantially (at least for small payloads) by sending both the header and the payload in the same packet. Table I shows that the latency of the OptimizedClient, which bypasses the Nagle algorithm, is close to that of the EJB.
• With SOAP, subsequent requests from the same client will open new HTTP connections, introducing unnecessary overhead. There is no provision to 'keep alive' a connection. With EJB, the TCP connection is kept alive for subsequent calls. OptimizedClient has better scalability because it keeps the HTTP connections alive. A sketch of such an optimized client is given below.
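The following is a minimal sketch of what such an optimized client might look like; it is not the paper's OptimizedClient code, and the class name and endpoint are hypothetical. The whole envelope is handed to the connection in one write, so the HTTP header and the small SOAP payload are not split into separate writes, and the connection is asked to stay alive so subsequent requests reuse the socket instead of opening a new one.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical optimized SOAP client: a single write per request and
// persistent ('keep-alive') HTTP connections, bypassing the Apache SOAP
// Service Proxy and SOAP Client classes.
public class OptimizedClient {
    private static final String ENDPOINT = "http://apphost/soap/servlet/rpcrouter";

    public void invoke(String soapEnvelope) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(ENDPOINT).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        // Ask for the TCP connection to stay open for the next request.
        conn.setRequestProperty("Connection", "Keep-Alive");

        // Hand the complete payload to the connection in one write, so the
        // header/payload split that triggers the Nagle and delayed-ACK
        // interaction is avoided.
        OutputStream out = conn.getOutputStream();
        out.write(soapEnvelope.getBytes("UTF-8"));
        out.close();

        // Reading (and closing) the response lets the JVM's HTTP connection
        // cache reuse the socket for the following invocation.
        conn.getInputStream().close();
    }
}
```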

3.3. XML parser latency and memory usage
An important source of performance degradation added on top of the existing infrastructure is the XML parser. To illustrate this, we conducted an experiment on an Intel 1.2 GHz, Windows 2000 machine with 1 GB of memory. We used JDK 1.3 and Apache's Xerces 1.3. The file to parse contains 'person' XML elements, in an XML tree with a depth of three. The total size of a 'person' element is 400 bytes.


Figure 8. XML parser scalability.

Figure 9. XML parser memory usage.

We measured the performance of two different parsers, the DOM and SAX parsers [10], with and without validation. The main difference between the two parsers is as follows: DOM creates and keeps in memory a tree (named a DOM tree), while the SAX parser just fires events to the application; SAX does not build any tree. When using validation, the Document Type Definition (DTD), the grammar of the document, is embedded in the XML file and the validation flag is turned on. When measuring the latency with no validation, the XML files have no DTD. Files with 1, 10, 100, 1000 and 10 000 'person' element records were used. The results of the experiment are presented in Figures 8 and 9 and the conclusions include the following.

• DOM parsing takes longer and consumes much more memory than SAX parsing.
• SAX memory consumption is practically constant with the size of the XML document.


• DOM latency and memory consumption increase with the size of the document.
• Validation doubles the latency of parsing.

The conclusion that we draw from the above experiments is that XML parsers can potentially create big performance problems (a sketch contrasting the two parsing styles is given below). The performance penalty paid by the reengineered application may be prohibitive in some cases. The next section looks at how to address this problem by using performance models.
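As an illustration of the difference between the two parsing styles (a sketch using the standard JAXP API, not the exact benchmark code; the file name is hypothetical): the DOM path materializes the whole tree in memory, while the SAX path only reacts to callbacks and keeps nothing.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Sketch comparing DOM and SAX parsing of the same document with JAXP.
public class ParserComparison {

    // DOM: builds and returns the full in-memory tree.
    static Document parseWithDom(File xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(false);               // validation roughly doubles latency
        return f.newDocumentBuilder().parse(xml);
    }

    // SAX: no tree is built; the handler just counts 'person' elements.
    static int countPersonsWithSax(File xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setValidating(false);
        f.newSAXParser().parse(xml, new DefaultHandler() {
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if ("person".equals(qName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        File xml = new File("persons.xml");  // hypothetical test file
        long t0 = System.currentTimeMillis();
        parseWithDom(xml);
        long t1 = System.currentTimeMillis();
        int persons = countPersonsWithSax(xml);
        long t2 = System.currentTimeMillis();
        System.out.println("DOM: " + (t1 - t0) + " ms, SAX: " + (t2 - t1)
                + " ms for " + persons + " person elements");
    }
}
```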

4. PERFORMANCE MODELS FOR WEB SERVICES
Comprehensive performance studies, 'what if' exercises and capacity planning, all in the presence of multiple users, are better performed when an analytic performance model is used. A performance model is built by taking into account the software component interactions when there is only one user in the system. Once built, the performance model allows estimating the performance metrics of the software system in the presence of an arbitrary number of users. The impact of structural changes in the software, alternative deployment scenarios, and runtime configurations can be analyzed quickly and at a low cost by using the performance model instead of the real system.

An appropriate performance model for Web services is the layered queuing model (LQM). LQMs are hierarchical queuing network models (QNMs) [12] that reflect interactions between client and server processes. The processes expose the interfaces of the contained objects (here, objects are object-oriented programming language artifacts and, for simplicity of presentation, we consider Web services as being objects). The processes may share devices such as the CPU, disk and network, and server processes may also interact with other server processes. This layering gives the LQM its name and makes the model appropriate for describing distributed application systems built using technologies such as Web services. In these applications, a process can suffer queuing delays both at its devices and at its software servers. If these software delays are ignored, response time estimates will be incorrect.

4.1. Model parameters and solvers
At the core of the LQM there are several equations whose goal is mainly to estimate the waiting times (or queuing delays) in the system due to the contention of multiple user requests. We introduce these equations in the remainder of this section. Consider a set of user scenarios that are expected to affect performance the most. Each request of a scenario makes use of a collection of objects with methods that interact with one another. The response time of an object in a scenario request includes its direct demands and queuing delays at its node's devices, and its nested demands and queuing delays for access to the methods of the other objects that it invokes synchronously.

Going back to our example (see Figures 4 and 5), consider the EJBItem object in the find() scenario in the presence of multiple users in the system. Its response time is the sum of the following factors: (a) queuing delays at the CPU and the disk of the Application Host (a request has to wait for other requests to end in order to be processed); (b) the CPU and the disk demand at the same host (the time required to process the request); (c) the time it waits to get a connection at the Data Server; and (d) the response time of the Data object.


We can generalize the above description for every object and scenario. We will assume that there are no queuing delays at software entities (these queuing delays are caused by a limited number of software resources such as threads, network or database connections; we claim that we can size the application such that there are enough of those resources). Thus, the average response time $R_o^c$ of an object $o$ for a request of scenario $c$ can be expressed as:

$$R_o^c = \sum_{i=1}^{K_o} (W_{io}^c + D_{io}^c) + \sum_{p \in \mathbf{O}_o} V_{o,p}^c R_p^c, \qquad c = 1, \ldots, C \qquad (1)$$

where $C$ is the number of scenarios; $K_o$ is the number of devices at object $o$'s node (devices include the CPU, disk and network); $W_{io}^c$ is the mean queuing delay of object $o$ at device $i$ for a scenario $c$ request; $D_{io}^c$ is the mean demand of object $o$ at device $i$ for a scenario $c$ request (these quantities are measured for every scenario, with one user accessing the system); $\mathbf{O}_o$ is the set of objects visited synchronously by object $o$ (the bold face notation denotes a set); $V_{o,p}^c$ is the number of visits from object $o$ to object $p$ by a scenario $c$ request ('visit' is a general term for method invocation, procedure call and other synchronous object interaction); and $R_p^c$ is the mean response time of object $p$ when servicing a scenario $c$ request.

Equation (1) contains the external description of the model and some parameters have to be provided by the performance engineer or software designer. More precisely:

• $K_o$, $D_{io}^c$, $\mathbf{O}_o$, and $V_{o,p}^c$ are input parameters of the system's performance model. $K_o$, $\mathbf{O}_o$, and $V_{o,p}^c$ can be determined from deployment, class, and collaboration diagrams, respectively; $D_{io}^c$ can be measured by instrumenting the code and by using performance profilers or monitors;
• $W_{io}^c$, $R_p^c$, and $R_o^c$ do not have to be provided by the performance engineer or software designer; they are outputs of a performance model solver, as explained in the remainder of this section.
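As a small worked instance of Equation (1), using the find() demands and visits reported later in Tables II and III and assuming a single user, so that all queuing delays $W_{io}^c$ are zero:

$$R_{\text{Data}}^{\text{find}} = D_{\text{CPU}} + D_{\text{Disk}} = 1 + 0.5 = 1.5~\text{ms}$$
$$R_{\text{EJBItem}}^{\text{find}} = (3 + 0.5) + 1 \cdot R_{\text{Data}}^{\text{find}} = 3.5 + 1.5 = 5~\text{ms}$$

Under load, the $W_{io}^c$ terms grow and the same recursion yields the contended response times.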

A second set of equations describes the throughput, $X_c$, of each scenario $c$. The throughput is defined as the number of requests served per time unit and is the ratio between the number of users $N_c$ in scenario $c$ and the end-to-end response time. The latter is the sum of the think time $Z_c$ (between successive requests, users think) and the response time of the system as perceived by the user of scenario $c$. The response time perceived by the user is the response time at the client side of the application, that is, the response time of the client object, $R_{\text{client}}^c$:

$$X_c = \frac{N_c}{R_{\text{client}}^c + Z_c} \qquad (2)$$

Equation (2) helps to find the number of user requests queued at any device $i$ in scenario $c$ with $\mathbf{N}$ users in the system, denoted as $n_i^c(\mathbf{N})$, and then the total number of users at device $i$:

$$n_i^c(\mathbf{N}) = X_c (D_{io}^c + W_{io}^c) \qquad \text{and} \qquad n_i(\mathbf{N}) = \sum_{c=1}^{C} n_i^c(\mathbf{N}) \qquad (3)$$

Equations (1)–(3) can be solved given the model inputs mentioned above ($K_o$, $D_{io}^c$, $\mathbf{O}_o$, and $V_{o,p}^c$), the per-scenario think times $Z_c$, and the population vector $\mathbf{N} = (N_1, N_2, \ldots, N_C)$, where $N_c$ is the number of users in scenario $c$. Solvers for the above equations start with one user in the system (for example, a vector $\mathbf{N}_0 = (1, 0, \ldots, 0)$), compute (1), (2) and (3), increment the number of users, and repeat the computations until the desired number of users is reached. Solvers for (1)–(3) can be found in many books; a recent one is [13]. While finding an exact solution for Equations (1)–(3) has a combinatorial complexity, there are approximation techniques that find the output parameters mentioned above with an acceptable complexity. Such a method is mean value analysis (MVA) [14].

Our assumption in writing and solving (1)–(3) was that there were no queuing delays at software entities. To make good on that assumption, we have to make sure we have enough software resources (threads, network and database connections). This capacity planning exercise is done by using the utilization law [12]. This law, applied to objects, captures the fraction of time the object is busy. For example, the utilization of object $o$, $U_o$, is defined as

$$U_o = \sum_{c=1}^{C} U_o^c = \sum_{c=1}^{C} X_c R_o^c \qquad (4)$$

where $U_o^c$ is the utilization of object $o$ in scenario $c$ and is the product of the throughput of scenario $c$ and the response time of object $o$ in the same scenario. Looking at expression (4), we notice that utilization has the unit of a number of users, which allows us to interpret the utilization as the average number of users served simultaneously by an object. An object can serve multiple users at the same time by having multiple replicas or by allowing multithreading.

To summarize this section:

• Web services can be described as layered queuing models (LQMs);
• LQMs are solved by solvers such as MVA under the assumption that there are no queuing delays at software entities;
• after the LQM is solved, software entities are sized using Equation (4). This sizing makes sure there are enough threads, database or network connections in the system and that a request does not experience any queuing delays at software entities.

The approach we take to model Web services is different from that used in modeling general distributed applications. In contrast to our approach, the Method of Layers [5] is more complex: it assumes that the LQM models all the software entities as queuing centers and requires the multiplicity levels (such as threading or the number of replicas) to be part of the LQM. In terms of solver structure, the Method of Layers is also more complex. It decomposes an LQM into a series of QNMs and exploits the Linearizer approximation technique [15] for MVA of the QNMs. Performance estimates for each of the QNMs are found and used to adjust the input parameters of the other QNMs. To deal with software systems, Linearizer has been adapted to support residence time expressions appropriate for interactions in distributed application systems. A simplified sketch of an MVA-style solver is given below.

The real challenge with performance models is the experimental collection of the input data for Equations (1)–(3). This is the subject of the next section.
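To make the solver iteration concrete, here is a minimal sketch of exact single-class MVA for one scenario over a small set of queuing devices with a think time. It is a deliberate simplification (one scenario, a flat queuing network, no software contention), not the full multi-scenario LQM solver described above; the aggregate demands in main() are the find() demands of Table II summed over the objects at each device.

```java
// Minimal exact MVA for a single scenario (closed model): queuing devices
// with service demands D[i], a think time Z, and up to maxUsers users.
// A simplification of Equations (1)-(3): one scenario, no software contention.
public class SimpleMva {

    // Returns the mean response time (ms) for populations 1..maxUsers.
    static double[] solve(double[] demandsMs, double thinkTimeMs, int maxUsers) {
        int k = demandsMs.length;
        double[] queueLen = new double[k];      // Q_i(n-1), all zero for n-1 = 0
        double[] responseByN = new double[maxUsers];

        for (int n = 1; n <= maxUsers; n++) {
            double[] residence = new double[k];
            double totalResidence = 0.0;
            for (int i = 0; i < k; i++) {
                // Residence time R_i(n) = D_i * (1 + Q_i(n-1)): demand plus
                // the queuing delay behind customers already at the device.
                residence[i] = demandsMs[i] * (1.0 + queueLen[i]);
                totalResidence += residence[i];
            }
            double throughput = n / (thinkTimeMs + totalResidence); // cf. Eq. (2)
            for (int i = 0; i < k; i++) {
                queueLen[i] = throughput * residence[i];             // cf. Eq. (3)
            }
            responseByN[n - 1] = totalResidence;
        }
        return responseByN;
    }

    public static void main(String[] args) {
        // Aggregate find() demands from Table II, summed over objects:
        // client CPU 10 ms, server CPU 54 ms, server disk 4 ms.
        double[] demandsMs = {10, 54, 4};
        double[] r = solve(demandsMs, 3000, 100);
        System.out.println("Estimated response time with 100 users: " + r[99] + " ms");
    }
}
```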

4.2. Building the performance model
Performance models can be built using structural information from design documents and quantitative data from the performance monitors that come with the application servers. For example, WebSphere exposes a Performance Monitoring Interface (PMI) for programmatically querying the elapsed time for different objects and methods on the server. It also has a performance monitor, the Resource Analyzer (RA) [16].

Table II. Demands (ms), $D_{io}^c$, $c = 1, 2, 3$; $i = 1, \ldots, 4$; $o = 1, \ldots, 5$.

Scenario 1, createBid()
Object              Cpu cl(1)  CPU(2)  Disk(3)  Net(4)
Client(1)              10         0       0       0
RPCRouter(2)            0         8       2      10
EJBItemSession(3)       0        10       2       0
EJBItem(4)              0         3      10       0
Data(5)                 0         1       3       0

Scenario 2, find()
Object              Cpu cl(1)  CPU(2)  Disk(3)  Net(4)
Client(1)              10         0       0       0
RPCRouter(2)            0        15       1       0
EJBItemSession(3)       0        35       2       0
EJBItem(4)              0         3       0.5     0
Data(5)                 0         1       0.5     0

Scenario 3, makeBid()
Object              Cpu cl(1)  CPU(2)  Disk(3)  Net(4)
Client(1)              10         0       0       0
RPCRouter(2)            0         9       1       0
EJBItemSession(3)       0        15       3       0
EJBItem(4)              0         2       2       0
Data(5)                 0         2       2       0



Figure 10. Measuring the elapsed time for the scenario find().

Application-level monitors such as RA only measure the elapsed time, as illustrated in Figure 10 in a UML sequence chart. To be accurate, a performance model has to account for the CPU, disk, and network demands of each object in each scenario, as illustrated in Equations (1)–(3) above. To split the elapsed time into resource demands, the performance engineer has to complement RA with operating system monitors and an in-depth knowledge of the system. In our case, we used RA in conjunction with the Windows performance monitor and instrumented some of the code.

The collected data for our sample application are presented in Table II. The table contains all the quantities needed in Equation (1) of Section 4.1. The first column contains the objects of the set $\mathbf{O}$, indexed from 1 to 5. The table is grouped by the three user scenarios, indexed from 1 to 3, and, within each scenario, the columns show the devices in the system, indexed from 1 to 4, as identified in the deployment diagram in Figure 5. Cpu cl identifies the Client Host's CPU, Net denotes the network between the Client Host and the Application Host, while the CPU and Disk belong to the Application Server. The cells of the table contain the quantities $D_{io}^c$, that is, the demand of object $o$ at device $i$ in scenario $c$.

There are two more inputs in Equation (1) that need to be identified: the objects visited synchronously by each object $o$, that is $\mathbf{O}_o$, and the number of visits, $V_{op}^c$. This information is taken from the class diagram (Figure 4), the code, and sequence diagrams like that in Figure 10. Data collected from those sources are summarized in Table III, where each cell contains the number of visits from the object in the first column to the object in the first row. For example, we can see that the object EJBItemSession visits (invokes) the EJBItem once for each scenario request and does not invoke any other object. The number of visits is the same for all three scenarios.

One more input is needed, the think time for every scenario. This is the average time between two successive scenario requests from the same user. We consider the think times to be 3000 ms for all scenarios.


Table III. Visits, $V_{op}^c$, for one scenario request.

                    Client  RPCRouter  EJBItemSession  EJBItem  Data
Client                0        1             0            0       0
RPCRouter             0        0             1            0       0
EJBItemSession        0        0             0            1       0
EJBItem               0        0             0            0       1
Data                  0        0             0            0       0
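For instance, the find() column group of Table II and the visit matrix of Table III could be captured as solver inputs along the following lines (a sketch; the array layout and names are choices made here, not part of the paper).

```java
// Model inputs for the find() scenario, taken from Tables II and III.
// Objects: 0=Client, 1=RPCRouter, 2=EJBItemSession, 3=EJBItem, 4=Data.
// Devices: 0=Client CPU, 1=Server CPU, 2=Server disk, 3=Network.
public class FindScenarioInputs {
    // demandsMs[object][device], in milliseconds (Table II, scenario 2).
    static final double[][] demandsMs = {
        {10, 0,  0,   0},   // Client
        { 0, 15, 1,   0},   // RPCRouter
        { 0, 35, 2,   0},   // EJBItemSession
        { 0, 3,  0.5, 0},   // EJBItem
        { 0, 1,  0.5, 0},   // Data
    };
    // visits[from][to]: synchronous calls per request (Table III).
    static final int[][] visits = {
        {0, 1, 0, 0, 0},    // Client -> RPCRouter
        {0, 0, 1, 0, 0},    // RPCRouter -> EJBItemSession
        {0, 0, 0, 1, 0},    // EJBItemSession -> EJBItem
        {0, 0, 0, 0, 1},    // EJBItem -> Data
        {0, 0, 0, 0, 0},    // Data
    };
    static final double thinkTimeMs = 3000;   // per-user think time
}
```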


Figure 11. Predicted versus measured scalability for scenario find().

4.3. Validating the performance model
A performance model has to be validated against real data, and this section shows several steps in that direction. We validate the model by comparing the scalability and the threading level predicted by the model with those monitored on the real application, for the deployment architecture at hand. For simplicity of presentation, we focus only on the find() scenario.

Figures 11 and 12 show the metrics estimated with the performance model versus the measured ones. In Figure 11, the estimated average response time, for three population vectors, closely approximates the monitored average response time. The estimated and the measured response times were determined by considering users repeatedly invoking the find() scenario with the same think time, 3 s. The number of users in scenario find() was varied from 1 to 100.

An extra step in the validation stage is to look at how well the performance model estimates soft metrics such as the threading level of the Web server. Figure 12 shows the monitored number of threads and the estimated average number of threads when there are 100 users that use the system repeatedly with a think time of 3 s between two consecutive requests. There are two kinds of clients, as described in Section 3.1: one that keeps the HTTP connection alive and one that does not. When the connection is kept alive, the number of Web server threads is exactly the number of clients, 100 (a thread for each client). When the connection is closed after each request, the number of threads varies in time (since some clients are in the think-time phase) but averages close to 40.



Figure 12. Predicted versus monitored average number of threads for RPCRouter for 100 clients in scenario 2.


Figure 13. Three-host deployment architecture.

To estimate the average number of threads for the Web server, we compute the utilization of the RPCRouter, since this is the servlet that consumes the Web server threads. The utilization of this object is estimated at 42, close enough to the monitored number of threads when the connection is closed after each request.

In general, a performance model cannot be built correctly in one step. There are always many iterations between the validation step and the model-building step illustrated in the previous section. With the model validated, we can use it for redesign purposes. This is the subject of the next section.

5. ALTERNATIVE CONFIGURATIONS
There are performance engineering principles that address the improvement of performance by simply tuning the code, too many to enumerate [17].

5. ALTERNATIVE CONFIGURATIONS There are performance engineering principles that address the improvement of performance by simply tuning the code, too many to enumerate [17]. In this section we assume that all software principles have c 2004 John Wiley & Sons, Ltd. Copyright 

J. Softw. Maint. Evol.: Res. Pract. 2004; 16:51–70

Response Time [ms]

MIGRATING TO WEB SERVICES

1400 1200 1000 800 600 400 200 0

67

WS-2 Hosts EJB-2 Hosts WS-3 Hosts WS-Faster Host

1

25

50

75

Figure 14. Scalability of four architectures for scenario 2, find() (response time [ms] versus number of clients).

In this section we assume that all software tuning principles have been exhausted and an upgrade in hardware is required. We look at ways of upgrading the hardware and estimate the performance of such an upgrade by using the performance models built in the previous sections. The ways to improve the performance by upgrading the hardware are as follows: (a) distribute the workload across more computers; or (b) upgrade to more powerful hardware. Figure 13 shows a possible use of the first option: the Web container is hosted by the Web Host, and the Application Server and Data Server are deployed on a different host with the same hardware characteristics as the Web Host. The second option is to upgrade the Web host from Figure 5 to a 2 GHz Pentium™ machine.

To analyze the performance of these new deployment topologies, we have to readjust the performance model developed in the previous section. For option (a), the demands are those shown in Table II, but there are two more devices (a CPU and a disk) according to Figure 13. Option (b) uses the original performance model, but the demands become lower by a factor of 1.2/2, that is, the ratio of the original and new CPU frequencies (for example, the 15 ms CPU demand of RPCRouter in find() becomes 15 × 0.6 = 9 ms). By solving the models, under the condition that only users in scenario find() are present, we obtained the results in Figure 14. 'WS-2 Hosts' and 'EJB-2 Hosts' are the reengineered application and the original EJB application deployed on the two-host architecture shown in Figure 5. 'WS-3 Hosts' denotes option (a) deployed as in Figure 13, and 'WS-Faster Host' denotes option (b) defined above. Option (a) increases the latency and improves the scalability by a factor of two; still, the response time is higher than that of the original EJB application. Option (b), with a faster processor, improves both the latency and the scalability; both are at the level of the original EJB application.

More comprehensive performance studies can be carried out in order to help the decision maker choose the appropriate deployment topology. These studies include the response times for all scenarios under different population vectors $\mathbf{N}$ and the total load on the server hardware. For example, Figure 15 shows the scalability of all three scenarios when the total number of users varies from 3 to 150 and users in scenarios 1, 2 and 3 represent 20%, 60% and 20% of the total. The results are obtained on the WS-Faster Host deployment topology. On the same topology, we can predict the total utilizations of the hardware devices, as shown in Figure 16.



Figure 15. Scalability of all scenarios for different population vectors on WS-Faster Host topology.


Figure 16. CPU and disk utilization on the Application Host of WS-Faster Host topology.


In a usual decision-making process, the performance results have to be matched against performance requirements and considered together with other software or system quality attributes [18].

6. SUMMARY AND CONCLUSIONS
We presented a reengineering scenario that migrated a legacy application to a new middleware technology and detailed its performance implications. As a case study, we considered an Auction application implemented with EJBs and migrated to Web services. There are several steps that can be followed when reengineering an application along the lines presented in this paper.

The first step consists of a thorough investigation of the performance penalties introduced by a new and untested technology. In the realm of Web applications, latency and scalability are the major metrics that affect the user-perceived response time. Latency is the time elapsed at the client side from when a remote service method is invoked until a response is returned. Scalability is the capacity of the application to accommodate an increasing number of users without a degradation of the performance metrics. In the case of Web services, whose main goals are to improve interoperability and programmability over the Web, we showed that, due to layering and the HTTP protocol, latency is in the order of hundreds of milliseconds. Still, that is not an inhibitor for client/server applications in which users alternate requests with think times. However, by migrating to Web services, the scalability of an EJB-based application is significantly decreased. The reason is again the HTTP protocol, but also some unoptimized code, inherent in the first releases of a technology.

A second step of the migration consists of building a performance model for the new distributed application that includes the new technology. We showed the main steps in building a layered queuing model for applications that involve Web services: identify the structure of the model, that is, the main usage scenarios, the objects and their interactions for each scenario; collect the quantitative data by instrumenting the code or monitoring the application; and validate the model by solving it and comparing the estimated results with those measured on the real application.

A third step in the migration process uses the performance model to investigate new architectural configurations or deployment topologies. We showed how the model can be used to evaluate two possible new deployment topologies, how to gain insight into the performance of a new application, and how to obtain further performance metrics that can help the decision maker make an informed decision about the migration plan.

TRADEMARKS IBM, and WebSphere are registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc., in the United States, other countries, or both. Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation in the United States, other countries, or both. Intel and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. c 2004 John Wiley & Sons, Ltd. Copyright 


REFERENCES
1. Brunner R, Weber J. Java Web Services. Prentice-Hall: Englewood Cliffs NJ, 2002.
2. Chiu K, Govindaraju M, Bramley R. SOAP for high performance computing. Cluster Computing and the Grid, 2002.
3. Davis D, Parashar M. Latency performance of SOAP implementations. Cluster Computing and the Grid, 2002.
4. Walsh A. J2EE Essentials. Wiley: New York, 2003.
5. Rolia J, Sevcik K. The method of layers. IEEE Transactions on Software Engineering 1995; 21(8):689–700.
6. Litoiu M, Rolia J, Serazzi G. Designing process replication and activation: a quantitative approach. IEEE Transactions on Software Engineering 2000; 26(12):1168–1178.
7. Litoiu M, Rolia J. Object allocation for distributed applications with complex workloads. Computer Performance Evaluation: Modelling Techniques and Tools, Haverkort BR, Bohnenkamp HC, Smith CU (eds). (Lecture Notes in Computer Science, vol. 1786). Springer: Berlin, 2000; 25–39.
8. Francis T, Herness E, Knutson J, Rochat K, Vignola C. Professional IBM WebSphere 5.0 Application Server. Wrox, 2003.
9. Booch G, Rumbaugh J, Jacobson I. The Unified Modeling Language User Guide. Addison-Wesley: Reading MA, 1999.
10. Harold E, Means S. XML in a Nutshell. O'Reilly: Cambridge, 2001.
11. IETF RFC 896. http://www.ietf.org/rfc/rfc0896.txt?number=896.
12. Lazowska ED, Zahorjan J, Graham J, Sevcik K. Quantitative System Performance: Computer Systems Analysis Using Queuing Network Models. Prentice-Hall: Englewood Cliffs NJ, 1984.
13. Menasce D, Almeida V. Capacity Planning for Web Performance. Prentice-Hall: Englewood Cliffs NJ, 1998.
14. Reiser M, Lavenberg SS. Mean value analysis of closed multichain queuing networks. Journal of the ACM 1980; 27(2):313–322.
15. De Souza e Silva E, Muntz RR. A note on the computational cost of the linearizer algorithm for queuing networks. IEEE Transactions on Computers 1990; 39(6):840–842.
16. Joines S, Willenborg R, Hygh K. Performance Analysis for Java Web Sites. Addison-Wesley: Reading MA, 2003.
17. Smith C, Williams L. Performance Solutions. Addison-Wesley: Reading MA, 2001.
18. Kazman R, Asundi J, Klein M. Quantifying the cost and benefits of architectural decisions. Proceedings of ICSE 2001, 2001; 297–306.

AUTHOR’S BIOGRAPHY

Marin Litoiu is a member of the Centre for Advanced Studies at the IBM Toronto Laboratory, where he initiates and manages joint research projects between IBM and universities across the globe in the area of Application Development Tools. Prior to joining IBM (1997), Marin was a faculty member with the Department of Computers and Control Systems at the University Politechnica of Bucharest and held visiting research positions with the Polytechnic of Turin, Italy, and the Polytechnic University of Catalunia, Barcelona. His other research interests include distributed objects; high-performance software design; and performance modeling, performance evaluation and capacity planning for distributed and real-time systems. He holds a PhD from Carleton University, Canada, and a doctorate degree in Control Systems from Politechnica University of Bucharest.
