A New Synthetic Web Server Trace Generation Methodology

Steven Weber
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78712
512.232.7923
[email protected]

Rema Hariharan
Sun Microsystems, Inc.
5300 B, Riata Park, Room 1618
Austin, TX 78731
512.401-1064
[email protected]

Abstract We propose a new synthetic trace generation methodology for web server performance benchmarking. We discuss the two primary existing approaches to synthetic trace generation in terms of queueing models. We propose a new methodology as the natural combination of these two approaches. This hybrid approach permits natural modeling of client content and server session caching as well as a natural correspondence between the workload model parameters and the statistics of the resulting synthetic trace.

1. Introduction

The importance and difficulty of accurate web server benchmarking is perhaps best illustrated by the myriad of workload generators, benchmarks, and proposals that have been developed in the recent past [1, 2, 3, 4, 5, 6, 7, 8]. Different benchmarks tend to emphasize different aspects of server performance and are tailored to emulating different environments, such as ISP hosting, e-commerce and database applications, etc. The most popular benchmark, SPECweb99 [1], is based on an ISP web-hosting application and includes requests for static and dynamically generated files. TPC-W [2] is a benchmark loosely based on an e-commerce web application; it evaluates the performance of application databases more than the web server itself by emulating typical e-commerce transactions. WebSTONE [3] essentially provides a framework for workload generation instead of a fixed workload. SURGE [4] provides a methodology for trace generation in which the load on the server is generated as a superposition of requests from multiple clients, with clients alternating between periods of activity and quiescence. WAGON [6] is quite similar to SURGE, but models user clicks for pages rather than file requests. The recent proposal GEIST [5] simulates the aggregate traffic load seen by the server directly, as opposed to emulating client activity and superimposing individual client loads to form the aggregate load. The discussion in [5] offers an insightful classification of the various benchmarking and load generation efforts currently in use as either “user emulation” or “aggregate traffic generation.”


The former approach attempts to define client behavior, generate client requests, and then superimpose the requests from a large number of users to simulate the aggregate load. The latter approach dispenses with user emulation and instead attempts to simulate the aggregate server workload directly. We use this dichotomy in this paper to explore the fundamental advantages and disadvantages of each approach.

Our work in this area led us to the insight that these two approaches could be combined into what we call the hybrid approach to synthetic trace generation. We realized that each of these three approaches could be modeled using a very simple queueing model [9]. The user emulation approach can be thought of as a closed system of two queues feeding into one another. These two queues represent client states of activity and quiescence. The clients circulating through this queueing system are independent of one another and their superposition yields the aggregate load. The aggregate traffic generation approach, embodied in GEIST, is easily represented as a single queue into which clients arrive, offer requests to the server, and then depart. Indeed, GEIST is described in queueing terminology, and it was this discussion which first gave us the inspiration to look at the user emulation approach from a queueing perspective. Finally, the hybrid approach has a corresponding queueing model which is the natural combination of the two other queueing systems.

We decided to investigate these three approaches to workload generation from the perspective of synthetic trace generation. That is, we obtained some actual web server logs and set the goal of accurately synthesizing those log files using the three queueing models mentioned above. We evaluate these queueing models primarily from the standpoint of how well they fare in achieving a satisfactory synthesis.

We analyzed the web server logs from a banking institution over the course of 16 days in late August, 2001. The website logged 13.5 million requests over this period. The website offers customers standard on-line banking options such as checking a balance, paying a bill, transferring funds, etc. The format of the log files is shown in Figure 1. As is usual for such logs, if the client sending requests to the web server is behind a proxy then it is the proxy IP address that is recorded in the log. These proxy IP's are very prevalent in our logs, which we verified by performing a DNS lookup on some of the more active addresses.

Figure 1: Format of log files from the banking website: IP Address | Date & Time | Request Type | URL | HTTP Version | Response Code | Content Length.

Proxy IP's obscure our ability to make statistical characterizations of client browsing because the requests from multiple users behind a common proxy are listed in the logs under the proxy IP address instead of the client IP address. The date and time are recorded at a resolution of one second. Also of importance is the fact that no file size information is recorded for requests corresponding to dynamic content. Requests for dynamic content comprised roughly 30% of all the requests in the logs. These dynamic scripts correspond to the different options available to the client on the website. The remaining content was almost exclusively small image files used in the page layout.1

The rest of this paper is organized as follows. Section 2 introduces On/Off client models, whereby client request activity is modeled as alternating periods of activity and quiescence. Section 3 discusses current approaches to synthetic trace generation, namely the two traditional queueing models mentioned above. Section 4 introduces our proposed hybrid approach and its corresponding queueing model. Section 5 investigates all three models from the perspective of simulating different aspects of the banking server log files. Finally, Section 6 offers a summary and discussion of future work.

2. On/Off Client Models

Client web browsing may be thought of as a sequence of “clicks”, where each click generates a number of requests to a web server, i.e., embedded requests. The embedded requests are for the files/objects/scripts which constitute the elements of the requested page. Each of the embedded requests is logged individually in the server log file. The times at which these embedded requests are seen in the web server log file are strongly correlated because the requests are generated almost simultaneously. We may therefore try to infer statistics associated with client browsing activity from the web server log files by grouping together requests for a given IP address with request times close to one another.

This inference is obfuscated by both network and proxy effects. Network delay and, more importantly, network jitter have the effect that embedded requests may arrive at the server at different times, and therefore the log times associated with those requests will be spread out. Also, requests for a given IP address in the log files with closely correlated request times may actually correspond to the browsing activity of multiple clients simultaneously generating requests from behind a common proxy.

1 Roughly 60% of these image files were very small, less than 100 bytes. Note that the smallest workload class in SPECweb99 is the 100 bytes to 1KB class.

2.1. Determining On and Off Times

We are interested in obtaining a statistical characterization of client web browsing through analysis of the server logs. To this end we group all requests according to the IP address for the request and associate each IP address with a “virtual client”, even though we recognize there is likely a many to one mapping between actual clients and virtual clients due to the proxy issues. We define virtual client activity to be an alternating sequence of “On” and “Off” states, where the time spent in the On state corresponds to times at which the server is handling embedded requests and the time spent in the Off state corresponds to times at which the virtual client is quiescent. This is very similar to the On/Off client model found in [4].

Because the web server log files do not record any information regarding how much time the server spent actually handling a given request, we define On times using an “activity time” parameter T. The activity time parameter T is defined as the maximum time between requests for a given virtual client such that the requests are considered to correspond to the same group of embedded requests. That is, two requests for a given IP address that are adjacent in time are considered to belong to the same On time if the time between them is less than T, and are considered to correspond to different On times if the time between them exceeds T. The formation of On and Off times for a given virtual client from the sequence of request times in the web server log files is illustrated in Figure 2. Larger values of T will result in fewer On times of longer duration, and smaller values of T will result in a larger number of On times of shorter duration. Individual requests isolated in time from all other requests by more than T seconds result in On times of 1 second, since request times are recorded in the web server log file at a granularity of 1 second.

The pertinent issue therefore is to identify the activity parameter T which does the best job of grouping the requests into On and Off times such that the groupings bear the closest resemblance to the groupings of embedded requests produced by the clients clicking on various web pages on the server. Choosing T too small will result in embedded requests received by the server at different times due to network jitter being separated into multiple On times, while choosing T too large will result in embedded requests from multiple client web page requests being grouped into a single On time. Figure 6 plots the distribution of the time between requests. Note that there is no clear transition point demarcating inter-request times as corresponding to the same embedded request vs. different embedded requests, so there is no clear optimal value for T. We have found that T = 5 seconds obtains a reasonable compromise between these two extremes for our particular web server logs. This is intuitively a reasonable number since it is unlikely that network jitter will cause embedded requests to arrive at the server separated by more than 5 seconds, and it is unlikely that individual clients are clicking on web pages at a rate faster than once every five seconds. The distribution of inter-request times within an On time for T = 5 is shown in Figure 7. The fact that the probability of an inter-request time within an On time exactly equaling 5 seconds is small (around 6%) somewhat justifies, we feel, our choice of T = 5.

Figure 2: Constructing the sequence of On and Off times for a given virtual client using the sequence of requests found in the web server logs for a given IP address. The top figure shows the request times in the logs, and the bottom two figures show the constructed On and Off times for two different activity parameters (T = 3 and T = 2).
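To make the grouping procedure concrete, a minimal sketch of it is given below (this is our illustration, not the tooling used for the paper; the function name and toy request times are hypothetical). It groups the logged request times of one virtual client into On periods using the activity parameter T.

```python
# Sketch: group one virtual client's request times (integer seconds, as
# logged) into On periods using the activity parameter T. Illustrative only.

def group_on_times(request_times, T=5):
    """Return a list of (start, end) On periods for one IP address.

    Two adjacent requests belong to the same On period if they are at most
    T seconds apart; an isolated request yields a one-second On time.
    """
    times = sorted(request_times)
    if not times:
        return []
    periods = []
    start = end = times[0]
    for t in times[1:]:
        if t - end <= T:              # same group of embedded requests
            end = t
        else:                         # gap exceeds T: close the On period
            periods.append((start, end))
            start = end = t
    periods.append((start, end))
    return periods

# With T = 5 the first three requests below form one On period.
print(group_on_times([100, 102, 105, 300, 301], T=5))
# -> [(100, 105), (300, 301)]
```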

2.2. Distributions of On and Off Times

We used the activity parameter T = 5 seconds and parsed the web server log files to obtain the distributions of the durations of the On and Off times. These distributions are shown in Figures 8 and 9. The figures show the complementary cumulative distribution function of the duration of the On and Off times respectively plotted on a log-log scale. The On time distribution has a mean of 2.58 seconds and a variance of 7.91, while the Off time distribution has a mean of 191 seconds and a variance of 1,087,659. The large variance for the distribution of Off time durations comes from the fact that there is enormous variability in the time between web page requests generated by clients. In the context of an online banking website such as ours, clients may generate several web page requests during a single browsing session, then not come back to the web site for days, weeks, or even months.
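For reference, the summary statistics and empirical CCDFs reported above can be computed with a short routine like the following (a sketch assuming the On or Off durations have already been extracted as a list of seconds; the toy data are not from the banking logs).

```python
# Sketch: mean, variance, and empirical CCDF of a list of durations
# (seconds). Illustrative only; the toy data below are not the banking logs.
from collections import Counter

def summarize(durations):
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    counts = Counter(durations)
    ccdf, remaining = {}, n
    for d in sorted(counts):          # P(D > d) for each observed value d
        remaining -= counts[d]
        ccdf[d] = remaining / n
    return mean, var, ccdf

on_durations = [1, 1, 2, 3, 5, 8]
mean, var, _ = summarize(on_durations)
print(mean, var)
```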

2.3. Simulating Request Generation within an On Time

Our purpose in doing this analysis is to develop a characterization of client request generation which may be used to construct synthetic traces statistically equivalent to a given web server log file. In this section we outline our approach

to simulating a group of embedded requests using the abstraction of On and Off times. The basic idea is to generate an On time duration for a given client drawn from the distribution of On times found in the log file, and then, for each second comprising that On time, to generate some number of individual requests. To this end we investigated the distribution of the number of requests made during a single time slot (i.e., a second) conditioned on that time slot being part of an On time. The probability mass function for the number of requests per second is shown in Figure 10. The figure illustrates that 51% of the time there is no request generated by the active client, and 41% of the time there is only one request generated. Thus only 8% of the time will the logs reflect that more than one request was received from a given client during a given second.

The above approach assumes that the requests made in different time slots comprising the On time are independent of one another. This assumption is clearly invalid, but our concern in this paper is not with the actual files being requested but with simulating the times and clients associated with the requests. For this purpose the independence assumption is most likely valid, since it only amounts to assuming that the numbers of requests made by a client across active time slots are independent of one another.
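A minimal sketch of this per-slot request generation is given below (our illustration; the On-duration sample and the per-second probabilities are placeholders standing in for the empirical distributions of Figures 8 and 10, with the 0.51/0.41/0.08 split rounded from the text).

```python
# Sketch: generate the request times emitted by one client during a single
# On time. The empirical inputs are placeholders for the measured
# distributions (Figures 8 and 10).
import random

on_durations = [1, 2, 3, 5, 10]            # placeholder empirical On durations
requests_per_slot = [0, 1, 2]              # possible request counts per second
slot_probabilities = [0.51, 0.41, 0.08]    # rounded from Figure 10

def simulate_on_time(start_second):
    """Return request timestamps for one On time starting at start_second."""
    duration = random.choice(on_durations)             # draw an On duration
    requests = []
    for offset in range(duration):
        # Per-second request count, drawn independently across slots.
        k = random.choices(requests_per_slot, weights=slot_probabilities)[0]
        requests.extend([start_second + offset] * k)
    return requests

print(simulate_on_time(0))
```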

2.4. Transience and Recurrence

For a consumer online banking website it is fair to assume that most of the clients will access the site periodically, although that period may have an enormous variance across different clients. For example, some clients may log in to check their account balance daily while others may only do so monthly or even less often. The implication is that when looking at the web server log files over the course of a number of days we would expect to see some clients generating requests throughout that duration while other clients will generate requests at an isolated point in time and then not reappear in the logs. This point is brought home in Figure 11, which plots the CCDF of the number of On times per IP address over the course of a given 24 hour period. The plot shows that 74% of the clients had more than one On time, while 90% of the clients had fewer than 10 On times. Note that, by definition, the corresponding plot for the distribution of the number of Off times per IP address is almost identical, since the number of Off times is one less than the number of On times. The fair inference, therefore, is that constructing a synthetic trace for any significant period of time must incorporate the fact that typical client behavior involves each client making some finite number of web page requests and then exiting the system. That is, it is invalid to model the active client population as a purely recurrent set of users, and it is equally invalid to model it as a purely transient set of users.


Table 1: Response codes and their relative frequency (%) in the banking logs

  Response Code        Frequency (%)
  200 OK                   68.10
  206 Partial Content       0.12
  302 Found                 4.50
  304 Not Modified         27.05
  403 Forbidden             0.03
  404 Not Found             0.19

2.5. Importance of Transience and Recurrence

The importance of accurately modeling the number of On times generated by a given client lies in the fact that clients generating multiple web page requests within a given time interval may be using a client-side cache. That is, client requests may take the form of cache consistency checks generated by the client's browser. If the cached object is indeed consistent then the web server need only respond with a “not modified” response code, i.e., an RC 304, and need not resend the file to the client. The prevalence and importance of RC 304's is discussed in [10]. Our own analysis of our web server log files shows that around 27% of the requests received by the web server resulted in an RC 304. The distribution of HTTP response codes returned by the server for our log files is shown in Table 1. Note that the SPECweb99 benchmarking specification generates a synthetic workload completely devoid of cache consistency checks; in fact SPECweb99 assumes every request results in the file being returned and the server generating a response code of “OK”, i.e., RC 200. The importance of including a representative fraction of requests resulting in RC 304 is that benchmarking web server performance without 304's will underestimate the actual request load which the server will be able to handle in the real world. Thus benchmarking without 304's is overly conservative.

A second reason that accurate modeling of client recurrence and transience is important is that recurrent clients will reuse TCP and SSL session ID's. That is, recurrent clients will not require the server to invoke the TCP or SSL handshaking protocol. This obviously has a major impact on web server performance.

3. Current Approaches to Synthetic Trace Generation

Discussions of and approaches to synthetic web server trace generation are found in [4, 5] and elsewhere. The discussion in [5] makes the useful dichotomy between “user emulation” and “aggregate traffic generation.” The authors define user emulation as defining typical web request behavior for a client, creating request traces for each client, and then superimposing the individual client request traces to form the aggregate synthetic trace which is fed to the server. The aggregate traffic generation approach, on the other hand, simply seeks to produce an aggregate trace directly.

Figure 3: Queueing model representations of current trace generation methodologies. All queues are M/GI/∞. The closed queue on the left of the figure represents the “user emulation” approach, while the open queue on the right of the figure represents “aggregate traffic generation”.

3.1. Closed and Open Queueing Models

We will compare and contrast these two approaches via queueing models. We point out that the implementations [4, 5] have features demanding a richer model than the simple queueing models we propose here. Nevertheless, we feel the essence of these two approaches may be contrasted via these models, shown in Figure 3. We restrict our attention to M/GI/∞ queues, although arguably both [4, 5] support a more general class of arrival processes and service disciplines. The labels µ^−1 denote that the service times are random variables S with distribution F_S such that E[S] = µ^−1. When a client is in an On queue the client is actively generating requests, and when the client is in an Off queue the client is quiescent.

The top figure shows two M/GI/∞ queues feeding into one another, with the top queue having a service time distribution F_S,on and the bottom queue having a service time distribution F_S,off. The individual clients cycling through the queues are independent of one another because of the M/GI/∞ assumption. The label N indicates that the number of clients in the system is fixed at N by virtue of the fact that the queue is closed. Thus complete specification of this queueing model entails specifying the service distributions F_S,on and F_S,off, along with the number of clients N.

The bottom figure shows a single M/GI/∞ queue fed by a Poisson process with rate parameter λ. Complete specification of this queueing model entails specifying the service distribution F_S,on and the arrival rate parameter λ.

3.2. Configuring Queueing Model Parameters

Our purpose in introducing these two queueing models is to evaluate their effectiveness in producing synthetic traces that are statistically equivalent to a given web server log file. To that end we will configure the parameters of the queueing models so that the average load generated by the queues equals that found in the log file. In particular, we will configure the queues so that the average number of clients in the On queue in each model matches the average number of clients in the On state found by parsing the web server log files using the activity parameter T = 5. To obtain the greatest possible fidelity we will actually draw the random On and Off queue service durations from the distribution of On and Off time durations found by analyzing the web server log file.

For the closed queueing model, then, this leaves only the number of clients N as a free parameter. If we define λ as the rate at which clients enter the On queue, then applying Little's Law gives

N = λ(µ_on^−1 + µ_off^−1)    (1)

as the number of clients in the closed queueing model. If we obtain λ as the rate at which new On times are started in the web server log file then we will have the expected number of clients in the On queue matching the average number of clients in the On state in the web server log file. Note that N will not equal the number of distinct IP addresses found in the web server log file. The closed queueing model assumes all clients alternate between states of activity and inactivity for as long as the system is run, i.e., it assumes all clients are recurrent. The actual log files, on the other hand, contain both transient and recurrent clients, as discussed above.

For the open queueing model we need only specify the new client arrival rate λ. We obtain λ as above, i.e., the rate at which new On times are started in the log files. Note that λ is not the number of unique IP addresses found in the log file over some time interval divided by the duration of the interval. The open queueing model assumes all clients are transient, i.e., that they generate requests for one On time and then depart forever. The actual log files, as discussed earlier, demonstrate that most clients will be active for more than one On time.
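As a concrete illustration of this calibration (a sketch with placeholder numbers loosely echoing Section 2.2, not the calibration code used for the paper):

```python
# Sketch: calibrating the closed and open queue parameters from log
# statistics via Little's Law. The numbers are illustrative placeholders.
mean_on = 2.58     # mean On duration in seconds (Section 2.2)
mean_off = 191.0   # mean Off duration in seconds (Section 2.2)
lam = 1.7          # rate of new On times per second, measured from the log

# Closed queue: choose N so that the expected number of clients in the On
# queue equals lam * mean_on.
N = lam * (mean_on + mean_off)

# Open queue: only the arrival rate lam is needed; each arriving client
# spends a single On time in the system and then departs.
print(f"closed queue clients N = {N:.0f}; open queue arrival rate = {lam}/s")
```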

3.3. Drawbacks to the Existing Queueing Models

As mentioned above, the closed queueing model assumes all clients are recurrent and the open queueing model assumes all clients are transient. We earlier presented an argument for the importance of transience and recurrence in accurately modeling client content caching as well as caching of TCP and SSL state. Under the closed queueing model we have the unfortunate effect that eventually all clients will have cached all of the objects they are requesting, and will never need to establish a new TCP or SSL session. Under the open queueing model we have the similarly unfortunate effect that no client will ever be able to make use of a content cache, nor will any client ever be able to make use of a pre-existing TCP or SSL session.2

Of course, it is possible to simulate the effect of client content caching by artificially setting some fraction of the client requests to be cache consistency checks. Likewise, one could simulate the effect of TCP/SSL session reuse by artificially setting some fraction of the requests to require a new TCP or SSL handshake with the server. The point is that both of these approaches are artificial and require separate investigation to identify what those appropriate fractions would be. A superior solution would be one which obtains these effects directly.

A second, albeit less important, drawback is that accurately setting the queueing model parameters to achieve an average load equaling that for a given server log file requires twisting the model parameters from their natural interpretation. In particular, at first blush one might assume a natural choice for N in the closed queue is the number of distinct IP addresses found in the log file, and one might assume a natural choice for λ is the average rate at which new clients appear in the log files. A superior model would be one where each of the queueing model parameters bears a natural correspondence to the corresponding quantities found by analyzing the server log file.

Both models have respective advantages and disadvantages. A common task in synthetic trace generation is to modulate the load, e.g., ramp it up or down, in order to evaluate server performance as a function of load. This is easily accomplished via the open queue by simply modulating the mean arrival rate λ. The closed queue, however, is not as easy to modulate because it requires adding and deleting clients from the system; there are practical problems such as which clients to delete or, if adding new clients, which queue to start them in. This problem is not encountered with the open queue model since it provides a natural means by which clients may be added or deleted from the system. A disadvantage of the open queue, however, is that it fails to incorporate client Off times. Frequently the On time distribution will have a finite tail while the Off time distribution will have a heavy tail. A heavy tailed Off time has the effect that the resulting superimposed load is self-similar and long range dependent. The closed queue incorporates the Off time information while the open queue does not. Note that for our banking data the open queue is still acceptably “bursty” due to the On time distribution having a somewhat heavy tail (Figure 8). The heavier than expected tail in the On times is caused by proxy IP's generating sustained bursts of activity which add weight to the tail of the On time distribution.

2 This is a rather gross simplification. The actual session use/reuse protocols depend on both browser and server connection management policies.

Figure 4: The proposed hybrid queueing model is a natural combination of the open and closed queues. External arrivals are a Poisson process with rate parameter λ. The parameter p represents the recurrence probability, i.e., the probability that the client will go through at least one more On time and Off time.

4. A New Hybrid Approach to Synthetic Trace Generation

The preceding discussion of the queueing models corresponding to the common existing synthetic trace generation approaches demonstrates that neither approach accurately models the important aspects of transience and recurrence. We propose a natural combination of these two approaches which we term the hybrid approach. The corresponding queueing model is illustrated in Figure 4. The queueing system consists of two queues feeding into one another, but with external arrivals entering the system as a Poisson process with rate parameter λ, and a departure process determined by the recurrence probability parameter p. Thus clients entering the system alternate between On and Off times for a while and then eventually exit the system completely.

4.1. Configuring Model Parameters for the Hybrid Queue

As before, we wish to configure the parameters of the system so that the average number of clients in the On queue matches the average number of clients in the On state found in the log file to be synthesized. We choose the On and Off service time distributions to be exactly what is found in the corresponding log file, which leaves λ and p as free parameters. We obtain λ as the average rate at which new clients appear in the log file, i.e., the total number of unique IP addresses found in the log over some specified interval of time divided by the duration of the interval. To calculate the recurrence probability p we use the equation

p = (M_on − M_ip) / M_on,    (2)

where M_on is the total number of On times found in the log file over all clients, and M_ip is the total number of unique IP addresses found in the log file. The reason for this formula is illustrated in Figure 5, which shows a representation of three clients with varying numbers of On times. We label all On times but the last for a given client as recurrent On times, and each client's last On time is labeled a transient On time. Thus the recurrence probability p is the fraction of all the On times in the log file that are recurrent.

Figure 5: Illustration of the counting argument used to obtain a reasonable estimate for the recurrence probability parameter p. All On times but the last for a given client are labeled as recurrent and each client's last On time is labeled as transient. The recurrence probability is configured as the fraction of recurrent On times found in the log file.
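An event-level sketch of the hybrid generator is given below (our illustration; the exponential On/Off draws are placeholders for the empirical distributions, and the rate and recurrence values are arbitrary). Setting p = 1 with a fixed client population recovers the closed model, and p = 0 recovers the open model.

```python
# Sketch: hybrid queue trace generation. Clients arrive as a Poisson process,
# alternate On/Off periods, and after each On time depart with probability
# 1 - p. Distributions and rates are placeholders for the empirical values.
import random

def hybrid_trace(horizon, lam, p, draw_on, draw_off):
    """Return a sorted list of (time, client_id) On-time start events."""
    events, client_id, t_arrival = [], 0, 0.0
    while True:
        t_arrival += random.expovariate(lam)      # next external arrival
        if t_arrival > horizon:
            break
        client_id += 1
        t = t_arrival
        while t < horizon:
            events.append((t, client_id))         # client begins an On time
            t += draw_on()                        # active period
            if random.random() > p:               # transient: client departs
                break
            t += draw_off()                       # quiescent period, then recur
    return sorted(events)

trace = hybrid_trace(horizon=3600, lam=0.7, p=0.6,
                     draw_on=lambda: random.expovariate(1 / 2.58),
                     draw_off=lambda: random.expovariate(1 / 191.0))
print(len(trace), "On times generated")
```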

4.2. Advantages of the Hybrid Queue Model

The primary advantage of the hybrid queueing model over the closed and open queueing models is the improved fidelity to the recurrence and transience of clients found in the log file. Using this model for synthetic trace generation we may associate a client cache with each client. Clients cache the requests they make using a caching policy similar to what is used by client web browsers. Client requests are generated for each second of the On time, and if the object is cached but identified as possibly stale, then the client generates a cache consistency check for the object. This approach offers an easy way to obtain an accurate representation of “cache not modified” response codes. We may also naturally model TCP and/or SSL handshaking and session reuse by having each client handshake with the server upon first entering the system. We may also define a TCP and/or SSL session timeout parameter and force the client to renegotiate with the server if the session has expired. A secondary advantage to the hybrid queue is that the parameters λ and p are configured using the natural corresponding quantities found in the log file to be synthesized. The arrival rate is simply the rate at which new IP addresses are found in the log file over the specified interval, and the recurrence probability is simply the fraction of On times that are recurrent.


5. Simulation and Evaluation of the Three Queueing Models

We have written software to simulate and evaluate the effectiveness of these three queueing models.

5.1. Verifying the Parameter Choices of the Three Queueing Models

Recall that we have chosen to configure the parameters for each of the three queueing models so as to obtain an equivalent average load, represented by the average number of clients in the On state. To verify these parameters have been chosen correctly we analyzed plots of the number of clients in the On state for the banking data and for each of the three queueing models. All plots are visually similar, suggesting that all three queueing models are successful in accurately representing the mean number of clients in the On state.

To verify the similarity more rigorously we performed some time series analysis on the number of clients in the On state over the course of an hour. Figure 12 plots the autocorrelation function ρ(k) versus the time lag k. Note the x-axis is on a logarithmic scale. None of the three models provides a perfect match to the correlation structure found in the server log files, but all three are reasonably close.

We also constructed a variance time plot to investigate the self-similar nature of the load found in the log file and the load generated by the three queueing models. Figure 13 plots the sample variance corresponding to the process obtained by aggregating the number of clients in the On state into blocks of size m. That is, if the number of clients in the On state is denoted {x_j}_{j=1}^n then we define

x_j^(m) = (1/m) Σ_{i=m(j−1)+1}^{mj} x_i,   j = 1, ..., n/m.    (3)

The aggregated process {x_j^(m)}_{j=1}^{n/m} is obtained by averaging the elements of {x_j}_{j=1}^n over blocks of size m. The process {x} is said to demonstrate long range dependence and self-similarity if the sample variance of the aggregated process Var({x^(m)}) falls off in m more slowly than 1/m. Intuitively, averaging {x} over increasingly large m will average out the fluctuations over increasingly long time scales. The critical decay rate 1/m is also plotted. Thus the banking data and all three queueing models demonstrate long range dependence.
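The aggregated-variance computation behind Figure 13 can be sketched as follows (our code, with a toy series standing in for the per-second counts of On clients):

```python
# Sketch: variance-time computation for a series x of per-second counts of
# clients in the On state. For a long range dependent series the variance of
# the block-averaged process decays more slowly than 1/m.

def aggregated_variance(x, m):
    """Variance of the series averaged over non-overlapping blocks of size m."""
    n_blocks = len(x) // m
    means = [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    grand = sum(means) / n_blocks
    return sum((v - grand) ** 2 for v in means) / n_blocks

x = [5, 7, 6, 9, 4, 8, 5, 7, 6, 6, 8, 5]   # toy series, not the banking data
for m in (1, 2, 4):
    print(m, aggregated_variance(x, m))
```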

5.2. Analysis of Requests per Client over Time

The previous section emphasized the point that all three queueing models do a satisfactory job of simulating the instantaneous number of requests seen by the server. The difference between the three models is demonstrated more clearly when we examine the number of requests made by the individual clients comprising the simulation. Figures 14 through 17 plot the times of each request made by each client, where the clients are numbered in the order in which their first request is made. The simulation duration is one hour. The On and Off time durations are again drawn from the distributions plotted in Figures 8 and 9, and the number of requests made during each time slot comprising an On time is taken from Figure 10.

Figure 14 shows the client request times for an hour from our log files and Figure 15 shows the client request times simulated by the hybrid queue model. We can see that the hybrid queue accurately models the rate at which new clients enter the system and also accurately models the request pattern.

Figure 16 shows the results from the closed queue. Note there are only 248 clients in this simulation, as opposed to nearly 2500 clients in the log file. This is because all clients are recurrent in the closed queue model and so fewer clients are needed to achieve the same overall load. Note also that each client's request pattern is necessarily more dense in time than in the log file plot, even though the simulation is using the same On and Off time distributions.

Figure 17 shows the results from a simulation of the open queue. Note there are nearly 9000 clients in this model, the larger number being required since all clients are transient. Because of this transience, each client is active for only one On time and then departs forever, completely failing to capture the request pattern from the log file. These figures underscore the point that the hybrid queueing model provides a natural means of capturing the client request pattern from a log file while the closed and open queueing models utterly fail to do so.

5.3. Simulation of Content Caching

We showed in Table 1 that 27% of the requests seen in our log files received a response code of 304, “Not Modified”. The 304 response code is generated when a client's cache makes a conditional GET request for a file to check if the client's local copy of the file is still valid. These conditional GET requests are significant for web server performance benchmarking because the server doesn't need to resend the file to the client if the client's cached copy is still valid; it simply returns a 304 response telling the client to use its local copy. Because of this importance it is desirable that a synthetic trace generation approach be able to produce a trace with a 304 component statistically similar to what is found in the original file.

We believe there are several approaches to this reproduction. One option is to simply randomly tag certain client requests as conditional GETs. The obvious way to do this for our data is to flip a biased coin with a heads probability equal to the 27% of 304's we observe in our log file. Here we are making the implicit assumption that all conditional GETs result in a cache not modified response; we make this assumption because one cannot disambiguate a conditional GET from a normal GET in our log files. Another approach is to actually simulate client content caching by keeping track of which files a client has requested and then sending a conditional GET whenever the client requests a file it has requested in the past. A third option, suggested to us by one of our reviewers, would be to simulate page expiration times, i.e., clients would check the page expiration and re-request the page if it has expired. Note that although the first approach may be used by all three queueing models, the second approach is only appropriate for the hybrid queue.

Effective use of the third approach would require an artificial acceleration of page expiration times in order to capture the dynamics within the couple of hours usually allotted for benchmark tests. Moreover, the third approach can be seen as an extension of the second, i.e., a client sends a conditional GET if it has requested the file in the past, and the server resends the page only if the page has expired since that client's last request time. The ability of the hybrid queue model to keep track of client state is the crucial feature here. In contrast, because clients are only present for one On time in the open queue they have relatively little chance of caching anything. Conversely, because clients are “permanent” in the closed queue they will eventually cache everything. The hybrid queue, however, provides a natural setting for simulating a client cache because each client is in the system for an intermediate number of On times, during which it will achieve an intermediate cache hit ratio for the requests it generates.

We implemented a very simplistic client cache consisting of a list of the files that client had requested in the past. We are not modeling the fact that client caches are of finite size, that cached content is assumed valid (i.e., not requiring a consistency check) for some finite time, nor the important fact that dynamic content cannot be cached. Figure 18 shows the number of requests receiving a response code 304 each minute over an eight hour portion of our log files. The corresponding overall request load has a similar shape, so the relative fraction of requests receiving RC 304's is roughly constant around 27%. Thus the diurnal variation seen in the RC 304 load is also present in the overall load. Figure 19 shows the file popularity empirical probability mass function from which we had our clients draw their file requests. There were 191 unique files requested once we stripped off the URL encoding present in some of the requests. This is a rather small number of unique files, but perhaps not so strange considering the site is an online banking site with a relatively small number of distinct services offered. The files were assigned a rank based on the overall number of requests for that file.

We then tried to simulate the RC 304 request pattern with the hybrid queue model. We used the On and Off time distributions from Figures 8 and 9, and we averaged the new client arrival rate over minute intervals and used those averages to modulate the Poisson process arrival rate. This allows us to achieve the load modulation apparent in Figure 18. We used the file request distribution in Figure 19 and the counting argument described in Figure 5 to set the recurrence probability p. The results are shown in Figure 20. For purposes of comparison we implemented the first approach described above, i.e., randomly choosing requests to be labeled as conditional GETs, using the open and closed queueing models. Both tests yielded similar results to those obtained using the closed queue. What is perhaps most apparent about these figures is that they perform very similarly to one another, but that neither reproduces the statistics of the RC 304 load from the log file closely enough to be statistically indistinguishable.

The log file's RC 304 load shows larger variation than that achieved by either the hybrid queue's cache simulation scheme or the open and closed queues' random selection scheme. We freely admit both approaches are rather simplistic, and we conjecture that the underlying dynamics governing the generation of conditional GETs are more sophisticated than our models are able to account for. One advantage of the cache simulation scheme, however, is that it allows one to naturally experiment with the impact of different client caching policies on the RC 304 load seen by a server. Another benefit is that it seems more natural and intuitive to model client content caching through the use of an actual client content cache than through setting an artificial parameter as required by the random selection approach.
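The cache-simulation scheme can be sketched as below (a deliberately simplistic, unbounded per-client cache matching the simplifications admitted above; the Zipf-like popularity weights are a placeholder for the empirical distribution of Figure 19).

```python
# Sketch: simplistic per-client content cache used to decide which generated
# requests become conditional GETs (candidate RC 304 responses). The cache is
# unbounded and ignores expiration and uncacheable dynamic content.
import random

class ClientCache:
    def __init__(self):
        self.seen = set()

    def request(self, file_id):
        """Return 'conditional_get' if the file was requested before."""
        if file_id in self.seen:
            return "conditional_get"          # server may answer with RC 304
        self.seen.add(file_id)
        return "get"                          # full file transfer

files = list(range(191))                      # 191 unique files, by rank
weights = [1 / (rank + 1) for rank in files]  # placeholder Zipf-like popularity

cache = ClientCache()
for _ in range(10):
    f = random.choices(files, weights=weights)[0]
    print(f, cache.request(f))
```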

5.4. Simulation of TCP and SSL Session Caching

Web servers cache TCP and SSL connection information to minimize the overhead of the TCP and SSL handshaking protocols. The dynamics of this caching process are somewhat complex but essentially involve reusing the current session for a new client request provided the time since the last request from that client is less than some threshold parameter.

SPECweb99 SSL addresses the issue of session ID reuse in a rather convoluted fashion. Each new TCP connection has an 18% chance of being designated to carry a Keep-Alive header. If the connection does carry a Keep-Alive header then the number of HTTP requests sent using that session is randomly selected according to a uniform distribution between 5 and 15. A new SSL session is negotiated once this number of HTTP requests has been sent. It turns out that each session ID is used by approximately 14.3 requests. While this method may be used to get the effect of session ID reuse on the server, it has little to recommend it in terms of being a natural model.

We used the following model to investigate session reuse. We first analyzed the log file and kept track of the number of times each client made requests after having not made any requests for 10 or more minutes. We label these requests as requiring a renegotiation. For the hybrid queue, we produced a synthetic trace and made the same observations. For the open and closed queues, however, we used a simple scheme of randomly marking about 8% of the generated requests as requiring renegotiation. All three simulations seem to capture the renegotiation load adequately. We emphasize, however, that the hybrid queue model offers the ability to directly investigate the impact of session caching timeout parameters, while a similar investigation using the other queueing models would be quite difficult. Also, we feel a strong advantage of the hybrid queue is that it models the renegotiation load by identifying expired sessions directly rather than simulating them through random selection as required by the other two models.
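The renegotiation rule applied above can be sketched as follows (a hypothetical helper; the 10-minute threshold comes from the text, and the trace format is the (time, client) pairs produced by the generators sketched earlier).

```python
# Sketch: flag requests that follow an idle gap of 10 or more minutes for a
# given client as requiring a new TCP/SSL negotiation. Illustrative only.

def mark_renegotiations(trace, threshold=600):
    """trace: iterable of (time_seconds, client_id); returns flagged events."""
    last_seen, flagged = {}, []
    for t, client in sorted(trace):
        if client not in last_seen or t - last_seen[client] >= threshold:
            flagged.append((t, client))       # new session or expired session
        last_seen[client] = t
    return flagged

toy_trace = [(0, "a"), (30, "a"), (700, "a"), (10, "b"), (900, "b")]
print(mark_renegotiations(toy_trace))
```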

6. Conclusion and Future Work

We have contrasted the hybrid model with both the closed and open queueing models. Our analysis has failed to show a marked superiority of the hybrid model over the closed and open queueing models with respect to client content caching and TCP/SSL session reuse. We also wish to emphasize that our open and closed queueing models are simplifications of the available benchmarking products; as such we have necessarily glossed over many of the wide range of features these products offer. Nevertheless, we feel our model is important for two reasons. First, it combines the benefits of the two models, such as ease of modulation and incorporation of Off times, while reducing their drawbacks, for example by offering natural means of calculating the model parameters. Second, we feel the hybrid model is more extensible than the other two models. For example, our current work focuses on constructing complex state transition models for browsing sessions. That is, we generalize the hybrid queue model to a network, with each queue representing a different page of the server and transitions between queues representing hyperlinks connecting the pages. This page based model promises a more accurate representation of browsing sessions, page reuse, and session file requests. We have recently proposed this approach to a benchmarking organization.


Figure 6: CCDF for the time between requests.

Figure 7: PMF for the time between requests conditioned on those requests being separated by less than T = 5 seconds.

References

[1] The Standard Performance Evaluation Corporation, “SPECweb99.” http://www.specbench.org/osg/web99/.

[2] The Transaction Processing Performance Council, “TPC Benchmark W.” http://www.tpc.org/tpcw.

[3] Mindcraft Benchmarks, “The WebStone Benchmark.” http://www.mindcraft.com/webstone.

[4] P. Barford and M. Crovella, “Generating representative Web workloads for network and server performance evaluation,” in Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, 1998.

[5] K. Kant, V. Tewari, and R. Iyer, “Geist: A generator for E-commerce & Internet server traffic,” in Proceedings of the International Symposium on Performance Analysis of Systems and Software, 2001.

[6] Z. Liu, N. Niclausse, and C. Jalpa-Villanueva, “Traffic model and performance evaluation of Web servers,” Performance Evaluation, vol. 46, no. 2-3, pp. 77–100, 2001.

[7] D. Mosberger and T. Jin, “httperf – a tool for measuring Web server performance,” in Proceedings of the 1998 Workshop on Internet Server Performance (WISP), Madison, WI, June 1998.

[8] G. Banga and P. Druschel, “Measuring the capacity of a Web server,” in Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), Monterey, CA, Dec 1997.

[9] L. Kleinrock, Queueing Systems, Volume II: Computer Applications. John Wiley & Sons, 1976.

[10] E. Nahum, “Deconstructing SPECweb99,” in Proceedings of the 7th International Workshop on Web Content Caching and Distribution, 2002.


Figure 8: CCDF of the On time duration.


Figure 9: CCDF of the Off time duration.


Figure 12: Autocorrelation structure for the number of On clients found in the original log file, the hybrid queue, the open queue, and the closed queue.


Figure 10: PMF for the number of requests per second per client.

Figure 13: Variance time plot for the number of On clients found in the original log file, the hybrid queue, the open queue, and the closed queue. The decay rate 1/m is also plotted.


Figure 11: CCDF for the number of On times per IP.


Figure 14: Request times by client: data taken from one hour of the log file.


Figure 15: Request times by client: simulation of the hybrid queue.


Figure 18: Number of requests per minute receiving response code 304 from an 8 hour period of our log files.


Figure 16: Request times by client: simulation of the closed queue.


Figure 19: File request popularity probability mass function taken from the same 8 hour period.


Figure 17: Request times by client: simulation of the open queue.


Figure 20: Simulation of the RC304 load using the hybrid queue model and direct simulation of client content caching.