(IEEE NCA: Int’l Symposium on Network Computing and Applications, pp. 205-212, Cambridge, MA, April 2003)

Performance Study of Reliable Server Pooling ∗†

M. Ümit Uyar¹, Jianliang Zheng¹, Mariusz A. Fecko², Sunil Samtani²

¹ Electrical Engineering Department, The City College of the CUNY, New York, NY
² Applied Research Area, Telcordia Technologies, Inc., Morristown, NJ

Abstract
Reliable server pooling allows redundant information sources to be viewed as a single transport endpoint, and is therefore able to provide persistent connections and balanced traffic. The IETF RSerPool Working Group has proposed an architecture to implement reliable server pooling. We conducted a number of simulation experiments with the current definitions of the RSerPool protocols to quantify their performance in both wired and wireless environments. The simulation results show that the RSerPool works well in fixed and relatively reliable environments, but its performance worsens rapidly as the networks become more unreliable or mobile. The issues we identified in wireless mobile ad-hoc networks include network partitioning, high signaling overhead, and excessive aggressiveness in handling failures. These problems are partly due to the heavy reliance of the RSerPool architecture on the reliability of the underlying network, which is unlikely to be guaranteed in a wireless mobile ad-hoc environment.

Keywords: reliable server pools; RSerPool; ASAP; ENRP; server selection; ad-hoc networks; battlefield networks

1. Introduction
For applications that require persistent connections to servers, such as military communications, real-time transactions, and videoconferencing, a traditional abort-and-restart approach is not sufficient. Reliable server pooling is a comprehensive framework to handle session failures and system performance degradations. It allows a pool of redundant information sources to be viewed as a single transport endpoint, which implies that, in case of session failures, some applications can be transparently switched to another server without restart. Reliable server pooling features a name-based addressing model that isolates a logical communication endpoint (identified by a pool handle) from its IP address(es): connections between end-systems are viewed as communications between a client and a pool, which is mapped to the underlying transport and IP paths.

An architecture for reliable server pooling is currently being defined by the RSerPool Working Group (WG) of the IETF. The RSerPool's distributed structure provides high reliability and efficiency for server pooling operations [18, 19]. To help evaluate the design of reliable server pooling in different environments, this paper investigates whether the RSerPool framework is capable of providing sufficient performance in wired and wireless networks. In the realm of wireless networking, our interests are focused on mobile ad-hoc networks that can support mission-critical applications (such as disaster recovery and battlefield communications) in an environment with no fixed infrastructure.

As part of this research, an NS-2 simulation testbed for the RSerPool [15, 18, 20] has been implemented. A series of simulation experiments were run to characterize different aspects of the framework. Our simulation results show that the implemented version of the RSerPool performs well in fixed and relatively reliable environments, but its performance worsens rapidly as the networks become more unreliable or mobile. We identify problems in wireless mobile ad-hoc networks such as network partitioning, high overhead, and excessive aggressiveness in handling failures.

This paper is organized as follows. Section 2 states the motivation and scope of our work. Section 3 gives an overview of the IETF RSerPool WG architecture. Section 4 describes the NS-2 simulation testbed and its capabilities. Section 5 presents the simulation setup and metrics, as well as the experimental results and their interpretation.

∗ Prepared through collaborative participation in the Communications and Networks Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.
† © 2003 Telcordia Technologies, Inc.

2. Motivation and Scope
It has been a basic requirement for most communication networks to provide reliable connections, prevent service interruptions and downtime, and enable rapid recovery or failover when failures occur or network performance degrades. Different approaches have been reported in the literature to improve server reliability and enhance system availability [1, 9, 16]. For example, server replication is a basic paradigm with the promise to increase network reliability and efficiency. In most server replication approaches, sessions originally served by the failed server are lost, although new sessions can be established and sustained by the new server. Simple server replication by itself thus cannot guarantee highly available services [16].

The aim of the RSerPool is to provide an open standard, together with a set of signaling protocols that do not entail layering violations, i.e., they preserve the end-to-end model of IP. This is one of the main features that distinguish the RSerPool from proprietary systems such as web-server farms [10]. In such systems, a front-end device provides access to a set of back-end servers through a single virtual IP address. Typically, all communication between clients and the server farm must pass through this device, which inspects, and in some cases modifies, headers at OSI layers 2, 3, 4 and 7. These solutions are usually targeted specifically at Web applications and associated media streaming protocols, and require deployment of vendor-specific hardware. For example, to utilize Cisco™ Distributed Director [5], a vendor router has to be deployed in the topological proximity of each replicated server.

The RSerPool, with its loosely-coupled architecture, does not set complete fault-tolerant computing as its goal. Rather, it maps a single communication-destination name to multiple routable and reachable transport endpoints to increase the availability of distributed software-server entities registered under that name. The difficulty of providing fault-tolerant computing with the RSerPool is the necessary penalty for the openness of the RSerPool protocols as well as the high survivability of its architecture.
Since the server pools may span several network domains, the services that a particular pool provides are likely to survive disasters limited to some geographical locations; a similar statement may not be true in the case of server farms or clusters that share a local network. In addition, the equivalence of a specific functionality of two or more servers is decided outside the RSerPool framework by the servers choosing to register to the pool that supports a given service.

The above advantages and limitations determine the current scope of the RSerPool. It is essentially a communications-oriented overlay network providing an upper-layer protocol or an application with a range of reliability services. These services, ranging from simple server selection to a fully automatic session-failover capability, can be designed and added using the RSerPool as a distributed, open platform. For example, the RSerPool allows plugging in various server selection policies, which may be a simple least-recently-used algorithm (used in our study) as well as more sophisticated ones based on load balancing.

While the general techniques for transport-level session migration are available [1, 16], this issue (and the related problem of keeping data integrity) is currently beyond the core scope of the RSerPool. The work on transparent failover specific to the RSerPool platform is ongoing both within [6] and outside [7] the IETF. For simple data access transparency among pool elements, one can consider data replication techniques for partitionable ad-hoc networks [4, 9], or a file synchronization protocol (e.g., rsync [17]) that can keep the inconsistencies between servers transient. Instead, we assume a failover capability for simple applications (e.g., file transfer), and focus on the core RSerPool architecture and signaling protocols.

Figure 1. Reliable server pooling stack and operations. [Figure not reproduced. Recoverable labels: a PU, an NS, and pool elements PE(i), PE(j), PE(k), PE(l), with operations numbered (1) to (4); stack labels: Applications (RSP-blind, RSP-partially-aware, RSP-fully-aware), RSP API, RSP Mapping (shim), SCTP/TCP/UDP.]
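The pluggable server selection policy mentioned in Section 2 can be illustrated with a minimal sketch of a least-recently-used selector. This is not the testbed code: the class name `LruPoolSelector`, its methods, and the choice to try a newly added PE first are assumptions made only for illustration.

```python
from collections import OrderedDict


class LruPoolSelector:
    """Hypothetical LRU policy: pick the pool element (PE) selected least recently."""

    def __init__(self, pe_ids):
        # Key order encodes recency: the first key is the least recently used PE.
        self._order = OrderedDict((pe, None) for pe in pe_ids)

    def add(self, pe_id):
        # Assumption: a newly registered PE has never been used, so it is tried first.
        self._order[pe_id] = None
        self._order.move_to_end(pe_id, last=False)

    def remove(self, pe_id):
        # Called when a PE de-registers or is found unreachable.
        self._order.pop(pe_id, None)

    def select(self):
        if not self._order:
            raise LookupError("no pool element available")
        pe_id = next(iter(self._order))
        # Mark the chosen PE as most recently used.
        self._order.move_to_end(pe_id)
        return pe_id


selector = LruPoolSelector(["pe1", "pe2", "pe3"])
picks = [selector.select() for _ in range(4)]
```

A load-based policy could keep the same three methods and change only how `select` orders the candidates, which is the sense in which the policy is pluggable.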

3. RSerPool framework
As shown in Fig. 1, there are three classes of entities in the architecture defined by the IETF RSerPool WG: pool elements, name servers, and pool users. A set of servers with the same application functionality are grouped into server pools. A client can access a server pool by consulting a Name Server (NS). A server in a pool is called a Pool Element (PE), and a client being served by a PE of a server pool is called a Pool User (PU).

NSs play a critical role in this architecture because they are in charge of the management and maintenance of the entire server-pool namespace. Each NS communicates with its peers to share the same view of the namespace. All NSs have the same functionality and can back up each other, although they are distributed at different sites in the network, each taking care of a different group of PEs and PUs. An NS is said to be the Home NS of those PEs and PUs that are under its supervision. Any entity, except the NS itself, must find a Home NS to which all its requests for accessing the namespace must be directed.

The RSerPool architecture consists of two complementary protocols, namely, Endpoint Name Resolution Protocol (ENRP) [20] and Aggregate Server Access Protocol (ASAP) [15]. The former is used among NSs for sharing information about the server-pool namespace, and the latter primarily for communications between PEs/PUs and NSs. The two protocols are asymmetric in that ASAP messages can be exchanged between ASAP and ENRP agents.
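To make the roles above concrete, the following sketch models the namespace that a Home NS maintains as a mapping from pool handles to registered PEs. It is an illustration under stated assumptions, not the ENRP or ASAP message format: the class `NameServer`, its method names, and the string identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Hypothetical in-memory view of the server-pool namespace held by one NS."""

    name: str
    # pool handle -> list of PE identifiers registered under that handle
    namespace: dict = field(default_factory=dict)

    def register(self, pool_handle, pe_id):
        # A PE joins a pool; the pool is created on first registration.
        members = self.namespace.setdefault(pool_handle, [])
        if pe_id not in members:
            members.append(pe_id)

    def deregister(self, pool_handle, pe_id):
        # A PE leaves a pool; an empty pool disappears from the namespace.
        members = self.namespace.get(pool_handle, [])
        if pe_id in members:
            members.remove(pe_id)
        if not members:
            self.namespace.pop(pool_handle, None)

    def resolve(self, pool_handle):
        # Name resolution for a PU: return a copy of the current PE list.
        return list(self.namespace.get(pool_handle, []))


home_ns = NameServer("ns-1")
home_ns.register("file-service", "pe-a")
home_ns.register("file-service", "pe-b")
candidates = home_ns.resolve("file-service")
```

In the architecture described above, each NS would additionally propagate such changes to its peers so that all NSs share the same view; that synchronization step is omitted here.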


3.1. Endpoint Name Resolution Protocol (ENRP)
ENRP is the protocol that defines the procedures and message formats of a distributed, fault-tolerant registry service for storing, bookkeeping, retrieving, and distributing pool operation and membership information. There are two types of communications used among ENRP servers: point-to-point messages and announcements from one server to all its peers. An ENRP server uses the PEER PRESENCE message to locate other ENRP servers during initialization and to bootstrap itself into the server-pool namespace. The PEER PRESENCE message is also used as a heartbeat: after initialization, it is periodically sent by an ENRP server to inform its peers of its active status. The PEER LIST REQUEST and PEER LIST RESPONSE messages are used by a new ENRP server to acquire a copy of the list of ENRP servers from a peer server already in the operation scope. Similarly, the PEER NAME TABLE REQUEST and PEER NAME TABLE RESPONSE messages are used by a new ENRP server to acquire a copy of the current server-pool namespace (i.e., the server pools) from a peer server already in the operation scope. Whenever an ENRP server changes the server-pool namespace, it sends out a PEER NAME UPDATE message to inform its peers to update their local copies.
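The heartbeat role of PEER PRESENCE can be sketched as a table of last-seen timestamps with a timeout. The sketch below is an assumption-laden illustration: the class `PeerTable`, the timeout value, and the rule "a peer is considered active if heard from within the timeout" are hypothetical and are not taken from the ENRP draft.

```python
class PeerTable:
    """Hypothetical liveness table an ENRP server could keep for its peers."""

    def __init__(self, timeout_s):
        # Assumed rule: a peer silent for longer than timeout_s is treated as inactive.
        self.timeout_s = timeout_s
        self._last_seen = {}

    def on_peer_presence(self, peer_id, now_s):
        # Record the arrival time of a PEER PRESENCE heartbeat.
        self._last_seen[peer_id] = now_s

    def alive_peers(self, now_s):
        # Peers whose most recent heartbeat is still within the timeout window.
        return sorted(
            peer for peer, seen in self._last_seen.items()
            if now_s - seen <= self.timeout_s
        )


table = PeerTable(timeout_s=30.0)
table.on_peer_presence("ns1", now_s=0.0)
table.on_peer_presence("ns2", now_s=20.0)
active_at_25 = table.alive_peers(now_s=25.0)
active_at_45 = table.alive_peers(now_s=45.0)
```

With a 30-second timeout, both peers are active at 25 seconds, while only "ns2" remains active at 45 seconds. A timeout-based check of this kind cannot distinguish a failed peer from one that is temporarily unreachable, which matters for the wireless results in Section 5.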

3.2. Aggregate Server Access Protocol (ASAP)
ASAP, in conjunction with ENRP, provides a high-availability data-transfer mechanism over IP networks. ASAP depends on ENRP to provide a high-availability namespace. It is responsible for the abstraction of the underlying transport technologies, load-distribution management, fault management, as well as the presentation to the upper layer (i.e., the ASAP user) of a unified primitive interface. ASAP uses a name-based addressing model and defines each logical communication destination as a server pool, providing fully transparent support for server pooling and load sharing. It monitors the reachability of the PEs in a server pool and has the ability to automatically switch an association from one PE to another in case of PE failures or performance degradations. It also allows dynamic system scalability, i.e., members of a server pool can be added or removed at any time without interrupting the service.

A server can join or leave a server pool by sending a Registration or Deregistration request to its Home NS. A PU or PE can query about a server pool by sending a NAME RESOLUTION message to its Home NS. END POINT KEEP ALIVE is a heartbeat message used to determine a PE's health status in a more timely fashion when the transport-heartbeat mechanism is insufficient. END POINT UNREACHABLE is a message sent by a PU to its Home NS to indicate that it has trouble reaching a certain PE. A PE or PU has to find a Home NS by sending a SERVER HUNT message to the ENRP client channel before it can access any services provided by a server pool. Cookie messages are used to facilitate the sharing of session status.

4. NS-2 Simulation Testbed
The NS-2 simulation testbed developed at the City College of New York comprises seven functional modules, as shown in Fig. 2. The Wired/Wireless Scenarios Definition module is a Tcl script interface; the other six modules are written in C++. Each module is briefly described below.

Figure 2. NS-2 simulation testbed. [Figure not reproduced. Blocks shown: Wired/Wireless Scenarios Definition, ENRP Agent, ASAP Agent, MAODV Routing, Simplified SCTP, RSP-aware CBR APP, Deterministic/Statistical Error Model, NS-2.]

• Wired/Wireless Scenarios Definition: (1) For wired networks, it defines the network topology, link bandwidth and propagation delay, link queue, and traffic pattern; binds ENRP and ASAP protocols to nodes; selects the unicast/multicast routing protocols; configures the debug/trace options and error models; and schedules various events such as initializations of NSs and PEs/PUs, re-/de-registration of PEs, and starting/stopping applications. (2) For wireless mobile ad-hoc networks, it defines the initial network topology, mobility pattern, wireless channel, radio-propagation model, antenna model, interface queue, and traffic pattern; binds ENRP and ASAP protocols to mobile nodes; selects the radio transmission range and the wireless unicast/multicast routing protocols; configures the multi-layer trace options and error models; and schedules various events as in wired networks.
• ENRP Agent: This module handles both ENRP and ASAP messages, since an ENRP agent needs to communicate with both peer ENRP agents and ASAP agents. An ENRP agent can unicast or multicast messages to its peers, but only unicast messages to ASAP agents. The module enables an ENRP agent to: initialize itself and begin to serve as a Name Server in the server-pool namespace, acquire the peer server list from a peer, download server-pool namespace data from a peer, handle PE (re-/de-)registration, provide name resolution for a pool handle, update the server-pool namespace, detect and remove unreachable PEs from pool(s), and help PEs/PUs to discover Home NSs.
• ASAP Agent: An ASAP agent gets name-resolution service from ENRP agents and provides an interface to upper-layer applications. The module supports the following functions: server hunt for PEs/PUs, PE (de-/re-)registration, pool handle resolution, endpoint keep-alive, reporting unreachable endpoints, PE selection, and switchover.
• MAODV Routing: The Multicast Ad-hoc On-Demand Distance Vector protocol [13] is integrated into the testbed to support multicast and unicast in ad-hoc networks.
• Simplified SCTP: This module currently contains a subset of SCTP [14] closely related to reliable server pooling. It supports the basic functions of a connection-oriented session such as acknowledgment, retransmission, failure detection and notification, as well as a configurable receiving window for congestion control. No SCTP-specific reliability features are implemented; therefore, they do not affect the reliability results presented here.
• RSP-aware CBR APP: To facilitate the simulation of reliable server pooling, we define a reliable-server-pooling-aware (RSP-aware) constant bit rate (CBR) application interface. The packet size and packet rate can be defined at the beginning of simulations.
• Deterministic/Statistical Error Model: A statistical error model is used to simulate link-level noise-featured errors or losses. However, rare but severe failures such as cuts of links and breakdowns of servers are simulated deterministically to ensure the comparability among different scenarios.
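The two-part error model in the last bullet can be sketched as follows. This is a simplified illustration and not the NS-2 module itself: the class `LinkErrorModel`, the per-packet Bernoulli loss, and the representation of scheduled failures as time intervals are assumptions.

```python
import random


class LinkErrorModel:
    """Hypothetical combination of scheduled outages and random per-packet loss."""

    def __init__(self, loss_rate, outages=(), seed=None):
        self.loss_rate = loss_rate      # statistical part, e.g. 0.002 for 0.2%
        self.outages = list(outages)    # deterministic part: (start_s, end_s) intervals
        self._rng = random.Random(seed)

    def delivers(self, now_s):
        # Deterministic failures (link cut, server breakdown) drop everything.
        if any(start <= now_s < end for start, end in self.outages):
            return False
        # Otherwise each packet is lost independently with probability loss_rate.
        return self._rng.random() >= self.loss_rate


model = LinkErrorModel(loss_rate=0.0, outages=[(100.0, 200.0)], seed=1)
before_cut = model.delivers(50.0)
during_cut = model.delivers(150.0)
after_cut = model.delivers(200.0)
```

Keeping the outage schedule fixed across runs while only the random losses vary is what makes scenarios comparable, as the bullet above notes.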

5. Simulation Experiments
One set of simulation experiments is devoted to wired networks, which are the main focus of the IETF RSerPool WG. Reliable server pooling plays an even more important role in wireless mobile ad-hoc networks, where the connections are more unreliable; yet many important applications with high-reliability requirements (e.g., disaster recovery, law enforcement, and digital battlefield communications) must operate in this environment. Another set of simulation experiments is thus designed to evaluate the performance of the RSerPool in wireless mobile ad-hoc networks.

5.1. Experimental Setup
For wired networks, a mesh topology is used in our simulation experiments, since other popular topologies (e.g., star, bus, and ring) are likely to include one or more single points of failure. All nodes are evenly distributed within a two-dimensional square plane for simplicity. Duplex links with a bandwidth of 2 Mb/s and a link delay of 10 ms are used for all the links. A random link error model is adopted for the simulations. An RSP-aware CBR model is currently used for all end-to-end application sessions. The performance of the RSerPool in wired networks is evaluated with respect to the following parameters: (1) statistical link error rate (0.01%); and (2) application traffic density (10 pkts/sec, 20 pkts/sec, 40 pkts/sec). To address the scalability problem in wired networks, four sets of scenarios are defined, depending on the number of nodes:
• 16 nodes (1 NS + 2 PEs + 13 PUs)
• 36 nodes (2 NSs + 5 PEs + 29 PUs)
• 49 nodes (3 NSs + 7 PEs + 39 PUs)
• 100 nodes (6 NSs + 14 PEs + 80 PUs)

For wireless mobile networks, all nodes are randomly distributed within a two-dimensional square plane at the beginning of a simulation run; these nodes then move based on a random waypoint mobility pattern [3]. A random link error model and an RSP-aware CBR model are used. A free-space-propagation channel is assumed and MAODV is used as the routing protocol. The performance of the RSerPool in wireless mobile networks is evaluated with respect to the following parameters:
• Transmission range [m] (100, 300, 500)
• Node speed range [m/sec] (0-10, 0-20, 0-30)
• Pause between node movements [sec] (2)
• Link error rate [%] (0.2, 0.4, 0.8)
• Application traffic density [pkts/sec] (10, 20, 40)

To address the scalability problem, four scenarios for each set of parameters (e.g., 300 m, 0-10 m/sec, 2 sec, 0.2%, 20 pkts/sec), based on different numbers of nodes with their roaming planes, are defined as follows:
• 16 nodes (2 NSs + 2 PEs + 12 PUs in 800x800 m²)
• 36 nodes (5 NSs + 5 PEs + 26 PUs in 1200x1200 m²)
• 49 nodes (7 NSs + 7 PEs + 35 PUs in 1400x1400 m²)
• 100 nodes (14 NSs + 14 PEs + 72 PUs in 2000x2000 m²)

The number of nodes and the corresponding area have been chosen in such a way that the ratio between them remains the same for all four sets of scenarios. For all simulations, two server pools are constructed, where each PE joins both pools, and half of the PUs are served by each pool.

For wired networks, the initialization of the namespace, including initializations of ENRP servers, home hunts for PEs/PUs and construction of server pools, begins within the first 50 seconds of each simulation. PUs access pools during the period from 100 to 700 seconds. For wireless networks, the first 600 seconds are skipped: the initialization of the namespace begins at 600 seconds, and PUs access pools during the period from 1,000 to 1,600 seconds. It has been reported [3] that, for some cases, the first 15 minutes of NS-2 experiments may be unstable for the waypoint mobility model. Therefore, in our simulation experiments the results are recorded after 15 minutes.
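The statement that the ratio between the number of nodes and the roaming area is the same in all four wireless scenarios can be checked with a few lines of arithmetic. The node counts and plane sizes below are those listed in Section 5.1; the function name and the choice of nodes per square kilometer as the unit are illustrative.

```python
# Number of nodes -> side length of the square roaming plane in meters (Section 5.1).
WIRELESS_SCENARIOS = {16: 800, 36: 1200, 49: 1400, 100: 2000}


def density_per_km2(nodes, side_m):
    # 1 km^2 = 1,000,000 m^2; the area of the square plane is side_m squared.
    return nodes * 1_000_000 / (side_m * side_m)


densities = {n: density_per_km2(n, side) for n, side in WIRELESS_SCENARIOS.items()}
```

Each scenario works out to 25 nodes per square kilometer, so differences between the four scenarios reflect the number of nodes rather than how densely they are placed.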

5.2. Performance Metrics
We extract and analyze several performance metrics to measure the performance of basic RSerPool operations, applications and overall system efficiency as described below.
• Number of home hunt attempts per PE/PU per unit time: It reflects the home-hunt efficiency, which can significantly affect the overall performance of the RSerPool. It can be used to measure the effect of different network parameters, especially those for wireless mobile networks such as movement speed and transmission ranges.
• Percentage of successful transmissions among NSs: RSerPool is a distributed framework and depends on the synchronization ability of NSs to work properly. This metric measures the synchronization ability among NSs.
• Number of switchovers per PU per unit time during an application session: It shows how the RSerPool increases the reliability of an application session.
• Ratio of RSerPool messages to data messages: It is defined as the ratio of the total RSerPool messages (ENRP and ASAP messages) sent to application data messages successfully received. It can be used to measure the overall throughput and control overhead.
• Latency of home hunt: It is defined as the period from the time a PE/PU begins to hunt for a home to the time it receives the first response from an NS in the namespace.
• Ratio of unnecessary PE de-registrations to total PEs in the RSerPool namespace: A PE needs to be de-registered by its home NS from a pool (or pools) if it is no longer operational due to some QoS/failure condition. We define unnecessary PE de-registration as the de-registration of an operational PE from a pool (or pools), which could happen when a node temporarily moves out of transmission range.

Table 1. Scalability experiment results for wired networks.
Parameters: Link Bandwidth (BW)=2Mb/s, Link Delay (D)=10ms, Link Error Rate (ErrR)=0.01%, Application Traffic Density (AppD)=20pkts/sec

| Number of Nodes (NumN) | 16 nodes | 36 nodes | 49 nodes | 100 nodes |
|---|---|---|---|---|
| Number of home hunt attempts per PE/PU per second | 0.001 | 0.001 | 0.001 | 0.001 |
| Percentage of successful transmissions among NSs | NA | 92.86% | 88.78% | 87.78% |
| Number of switchovers per PU per second during an application session | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Latency of home hunt (in seconds) | 0.0427 | 0.0482 | 0.0523 | 0.0457 |
| Ratio of unnecessary PE de-registrations to total PEs | 0:2 | 0:4 | 0:6 | 0:12 |

Table 2. Application traffic density experiment results for wired networks.
Parameters: NumN=49, BW=2Mb/s, D=10ms, ErrR=0.01%

| AppD | 10pkts/sec | 20pkts/sec | 40pkts/sec |
|---|---|---|---|
| Number of home hunt attempts per PE/PU per second | 0.001 | 0.001 | 0.001 |
| Percentage of successful transmissions among NSs | 88.78% | 88.78% | 89.00% |
| Number of switchovers per PU per second during an application session | 0.0000 | 0.0000 | 0.0001 |
| Ratio of RSerPool messages to data messages | 1:1337 | 1:1340 | 1:1575 |
| Latency of home hunt (in seconds) | 0.0523 | 0.0523 | 0.0523 |
| Ratio of unnecessary PE de-registrations to total PEs | 0:6 | 0:6 | 1:6 |
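The ratio of RSerPool messages to data messages defined in Section 5.2 can be computed from two counters, as in the sketch below. The function name, its parameters, and the sample counter values are hypothetical; only the definition (control messages sent versus data messages successfully received) and the "1:x" presentation come from the paper.

```python
def rserpool_to_data_ratio(control_sent, data_received, digits=0):
    """Format the overhead metric as '1:x', i.e. x data messages per control message."""
    if control_sent <= 0:
        raise ValueError("at least one RSerPool (ENRP/ASAP) message is required")
    data_per_control = data_received / control_sent
    return f"1:{data_per_control:.{digits}f}"


# Hypothetical counters chosen to reproduce the format used in the tables.
wired_example = rserpool_to_data_ratio(control_sent=1000, data_received=1337000)
wireless_example = rserpool_to_data_ratio(control_sent=100, data_received=492, digits=2)
```

A larger right-hand number means lower signaling overhead, which is why the wired values in Table 2 (above 1,300) indicate a far cheaper control plane than single-digit values would.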

5.3. Experiment results for wired networks
A set of experiments for wired networks is run to evaluate the effects of the number of nodes (Table 1) and traffic density (Table 2). The overall performance of the RSerPool for wired networks is satisfactory. It can be seen from Tables 1 and 2 that the wired networks have relatively high reliability. For example, the percentage of successful transmissions among NSs is near 90% and more than 1,300 data messages are delivered for each RSerPool message sent. In general, the performance of the RSerPool is quite reliable and stable when we vary the number of nodes. The same is true for application traffic density, which affects the network performance mainly through the use of buffers. Thanks to the static routing information, a packet can be sent out quickly in wired networks, and the effects of traffic densities in our simulations are rather negligible, as one can see from Table 2. When the application traffic density is increased to 40 packets per second, though, one PE is unnecessarily de-registered by its home NS.

5.4. Experiment results for wireless networks
Our simulation experiments indicate that the overall performance of the RSerPool in wireless mobile networks is not satisfactory. The performance of wireless mobile networks is more sensitive to various network parameters such as number of nodes, transmission range, movement speed, link error rate, and application traffic density. Table 3 shows that the percentage of successful transmissions among NSs is no higher than about 18%, which implies that the network partition problem is very serious and NSs can hardly synchronize with each other in wireless mobile networks. This is a serious shortcoming since the reliability and efficiency of the RSerPool depend on the synchronization ability of NSs.

Table 4 shows that the transmission range plays an important role in wireless mobile networks. In our simulations, the average distance between two nodes is 200 meters. (In terms of hops, transmission ranges of 100m, 300m, and 500m correspond to 0.5 hop, 1.5 hops, and 2.5 hops, respectively.) The simulation results indicate that the network performance will suffer if the transmission range is too small, since not enough nodes will be covered. The opposite is true for too large transmission ranges, which cause traffic congestion, more interference, and higher energy consumption in the network. These observations are confirmed by an in-depth study [2] of transmission-range effects on MAODV, where too large a range is reported to limit the effective bandwidth of neighboring users.

As shown in Table 5, the percentage of successful transmissions among NSs increases slightly when the movement speed range is increased from 0-10 m/sec to 0-20 m/sec. However, it drops by more than 50% when the movement speed range is increased from 0-20 m/sec to 0-30 m/sec. This result shows that a slightly higher movement speed may help improve the network performance, but too high a speed degrades the network performance sharply due to quick expiration of the routing information.

Table 6 shows that the percentage of successful transmissions among NSs drops when the link error rate increases from 0.2% to 0.8%. However, the other metrics do not change much as the link error rate increases, because constant mobility is the main contributor to failures in wireless mobile networks. The effect of application traffic density is not very significant (Table 7). Nevertheless, this effect needs to be further examined, since the application traffic densities used in our simulations are relatively moderate.

Figure 3. Number of Home Hunt attempts per PE/PU per second (data from Tables 1 and 3). [Plot not reproduced: attempts per second versus number of nodes, with wired and wireless curves.]

Figure 4. Percentage of successful transmissions among NSs (data from Tables 1 and 3). [Plot not reproduced: percentage versus number of nodes, with wired and wireless curves.]
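Two figures quoted in Section 5.4 follow from simple arithmetic, sketched below: the transmission ranges expressed in hops (range divided by the 200-meter average node distance), and the relative drop in successful NS transmissions between the 0-20 m/sec and 0-30 m/sec speed ranges in Table 5. The function names are illustrative.

```python
AVG_NODE_DISTANCE_M = 200.0  # average distance between two nodes (Section 5.4)


def range_in_hops(transmission_range_m):
    # Express a radio range as a multiple of the average inter-node distance.
    return transmission_range_m / AVG_NODE_DISTANCE_M


def relative_drop(before_pct, after_pct):
    # Fraction by which a percentage metric falls between two settings.
    return (before_pct - after_pct) / before_pct


hops = [range_in_hops(r) for r in (100, 300, 500)]
# Table 5: 24.79% at 0-20 m/sec versus 11.13% at 0-30 m/sec.
speed_drop = relative_drop(24.79, 11.13)
```

The hop values come out as 0.5, 1.5, and 2.5, and the drop is roughly 55%, consistent with the statement that the metric falls by more than 50%.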

Figure 5. Data packets delivered per RSerPool packet sent (data from Tables 2 and 7). [Plot not reproduced: number of data packets versus traffic density (pkts/sec), with wired and wireless curves.]

5.5. Comparison of wired and wireless networks
Our simulation results show that the performance of the RSerPool is quite different for wired and wireless mobile networks. As illustrated in Fig. 3, the number of home hunt attempts per PE/PU per second is much higher in wireless mobile networks than in wired networks. Fig. 4 shows that the percentage of successful transmissions among NSs is about 90% for wired networks, but between 3% and 18% for wireless mobile networks. As shown by Fig. 5, the overhead of the RSerPool in wireless mobile networks is much higher than that in wired networks, with fewer than 6 data packets successfully delivered for each RSerPool packet sent. Among the RSerPool packets, about 50% are SERVER HUNT packets and about 25% are END POINT KEEP ALIVE and END POINT UNREACHABLE packets.

Another problem in wireless mobile networks, as shown in Tables 3 through 7, is that a high percentage of PEs are unnecessarily de-registered by their home NSs. A PE will be de-registered by its home NS if it is found unreachable by its home NS, or if the number of times that it is reported unreachable by PUs exceeds a given threshold. In wireless mobile networks, a PE or its home NS can move out of the transmission range of other nodes for a period that is long enough to trigger the unnecessary de-registration. In wired networks, however, the communication failures mainly come from the link error, which is generally very small and rarely results in unnecessary PE de-registrations.

In wired networks, an NS can efficiently communicate with its peers and maintain an accurate view of the server-pool namespace. Most PEs and PUs successfully find a Home NS during the first home-hunt attempt, and hold on to that server for a long period. Failures of NSs and PEs can be reliably detected, and failovers can then be successfully performed. However, the heartbeat mechanism among NSs is rather inefficient and expensive in a highly reliable network. The timeout-based failure-detection mechanism has limitations, especially for applications such as military communications and real-time transactions.
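The de-registration rule described in Section 5.5 (a PE is removed if its home NS finds it unreachable, or if PUs report it unreachable more often than a threshold allows) can be sketched as below. The class `PeHealth`, its method names, and the threshold value are hypothetical; the paper does not state the threshold used in the testbed.

```python
class PeHealth:
    """Hypothetical per-PE bookkeeping kept by a home NS."""

    def __init__(self, report_threshold):
        self.report_threshold = report_threshold  # assumed configurable value
        self.unreachable_reports = 0
        self.reachable_from_home_ns = True

    def on_keep_alive_result(self, answered):
        # Outcome of the home NS's own END POINT KEEP ALIVE probe.
        self.reachable_from_home_ns = answered

    def on_endpoint_unreachable(self):
        # A PU sent END POINT UNREACHABLE for this PE.
        self.unreachable_reports += 1

    def should_deregister(self):
        # Either condition alone is enough to remove the PE from its pool(s).
        return (not self.reachable_from_home_ns
                or self.unreachable_reports > self.report_threshold)


pe = PeHealth(report_threshold=2)
for _ in range(3):
    pe.on_endpoint_unreachable()
removed_by_reports = pe.should_deregister()
```

Neither condition can tell a failed PE from an operational one that has briefly moved out of range, which is how unnecessary de-registrations arise in the mobile scenarios.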

Table 3. Scalability experiment results for wireless mobile networks. Parameters: Transmission Range (TranR)=300m, Movement Speed Range (SpdR)=0-10m/sec, Pause between Movements (PauM)=2sec, Link Error Rate (ErrR)=0.2%, Application Traffic Density (AppD)=20pkts/sec Number of Nodes (NumN) 16 nodes 36 nodes 49 nodes 100 nodes Number of home-hunt attempts per PE/PU per second 0.103 0.288 0.222 0.194 Percentage of successful transmissions among NSs 2.82% 9.52% 18.06% 6.57% Number of switchovers per PU per second during an application session 0.0158 0.0262 0.0391 0.0516 Latency of home hunt (in seconds) 261.54 133.38 133.10 143.83 Ratio of unnecessary PE deregistrations to total PEs 0:4 2:10 3:14 10:28

Table 4. Transmission range experiment results for wireless networks. Parameters: NumN=49, SpdR=0-10m/sec, PauM=2sec, ErrR=0.2%, AppD=20pkts/sec TranR 100m 300m Number of home-hunt attempts per PE/PU per second 0.168 0.222 Percentage of successful transmissions among NSs 0.81% 18.06% Number of switchovers per PU per second during an application session 0.0620 0.0391 Ratio of RSerPool messages to data messages 1:1.47 1:4.92 Latency of home hunt (in seconds) 281.88 133.10 Ratio of unnecessary PE deregistrations to total PEs 0:14 3:14

500m 0.269 25.00% 0.0193 1:6.49 176.24 0:14

Table 5. Movement speed range experiment results for wireless networks. Parameters: NumN=49, TranR=300m, PauM=2sec, ErrR=0.2%, AppD=20pkts/sec SpdR 0-10m/sec Number of home-hunt attempts per PE/PU per second 0.222 Percentage of successful transmissions among NSs 18.06% Number of switchovers per PU per second during an application session 0.0391 Ratio of RSerPool messages to data messages 1:4.92 Latency of home-hunt (in seconds) 133.10 Ratio of unnecessary PE deregistrations to total PEs 3:14

0-20m/sec 0.247 24.79% 0.0396 1:4.97 143.56 7:14

0-30m/sec 0.303 11.13% 0.0376 1:4.34 144.22 3:14

Table 6. Link error rate experiment results for wireless networks.
Parameters: NumN = 49, TranR = 300 m, SpdR = 0-10 m/sec, PauM = 2 sec, AppD = 20 pkts/sec

| ErrR | 0.2% | 0.4% | 0.8% |
|---|---|---|---|
| Number of home-hunt attempts per PE/PU per second | 0.222 | 0.243 | 0.281 |
| Percentage of successful transmissions among NSs | 18.06% | 10.97% | 10.13% |
| Number of switchovers per PU per second during an application session | 0.0391 | 0.0364 | 0.0377 |
| Ratio of RSerPool messages to data messages | 1:4.92 | 1:5.06 | 1:4.81 |
| Latency of home hunt (in seconds) | 133.10 | 132.42 | 132.49 |
| Ratio of unnecessary PE deregistrations to total PEs | 3:14 | 4:14 | 4:14 |

Table 7. Application traffic density experiment results for wireless networks.
Parameters: NumN = 49, TranR = 300 m, SpdR = 0-10 m/sec, PauM = 2 sec, ErrR = 0.2%

| AppD | 10 pkts/sec | 20 pkts/sec | 40 pkts/sec |
|---|---|---|---|
| Number of home-hunt attempts per PE/PU per second | 0.361 | 0.222 | 0.265 |
| Percentage of successful transmissions among NSs | 15.72% | 18.06% | 16.32% |
| Number of switchovers per PU per second during an application session | 0.0423 | 0.0391 | 0.0422 |
| Ratio of RSerPool messages to data messages | 1:2.64 | 1:4.92 | 1:5.91 |
| Latency of home hunt (in seconds) | 127.54 | 133.10 | 112.63 |
| Ratio of unnecessary PE deregistrations to total PEs | 2:14 | 3:14 | 3:14 |
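The "Ratio of RSerPool messages to data messages" rows can be easier to compare when converted into the share of all messages that is RSerPool signaling. The sketch below does this conversion for the Table 4 ratios. The helper name `signaling_share` is hypothetical, and the resulting percentages are derived here by simple arithmetic; they are not figures reported in the paper.

```python
def signaling_share(ratio: str) -> float:
    """Convert an 'r:d' ratio of RSerPool to data messages into r / (r + d)."""
    rserpool, data = (float(part) for part in ratio.split(":"))
    return rserpool / (rserpool + data)


# Ratios of RSerPool messages to data messages, copied from Table 4 (keyed by TranR).
table4_ratios = {"100m": "1:1.47", "300m": "1:4.92", "500m": "1:6.49"}

shares = {tranr: signaling_share(ratio) for tranr, ratio in table4_ratios.items()}
for tranr, share in shares.items():
    print(f"TranR={tranr}: {share:.1%} of all messages are RSerPool signaling")
```

Under this conversion, RSerPool signaling accounts for roughly 40% of all messages at a 100 m transmission range and roughly 13% at 500 m.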

6. Conclusion

Reliable server pooling is a framework that allows a pool of redundant information sources to be viewed as a single transport endpoint; it is therefore an open platform for providing persistent connections and balanced traffic to different applications. The IETF RSerPool WG has proposed an architecture to implement reliable server pooling. In this paper, the performance of the evolving RSerPool framework is evaluated for wired networks and for wireless mobile ad-hoc networks. The simulation results show that the performance of the RSerPool for wired networks is satisfactory. In wireless mobile ad-hoc networks, however, the performance of the simulated versions of the RSerPool protocols is not satisfactory. The most serious problems encountered are network partitioning (which prevents NSs from communicating with each other efficiently), high signaling overhead, and excessive aggressiveness in handling PE failures.

Further research is needed to address the problems identified in the IETF RSerPool WG architecture. The simple server-selection schemes defined in the architecture are not sufficient for wireless mobile ad-hoc networks because they do not take high mobility and limited server resources into account [8]. The choice of mobility model [3] is also likely to affect the performance metrics, since the mobility patterns in disaster-recovery and battlefield applications follow a group mobility model rather than the random waypoint model.

Acknowledgments

We would like to thank Dr. Thomas Kunz of Carleton Univ. for providing parts of the MAODV C++ source code, and Dr. Phill Conrad of Temple Univ. for useful comments.

Disclaimers

• Our simulations were performed for the architecture being developed within the IETF RSerPool Working Group as described by the early 2001-2002 Internet drafts [15, 18, 20] (works in progress and subject to change). All references in this paper to ASAP, ENRP, and any part of the RSerPool architecture apply to the versions described in those drafts, which represent a particular stage in their development and may differ from the current IETF version by the time the final version of this paper appears.
• The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government.

References

[1] L. Alvisi, T.C. Bressoud, A. El-Khashab, K. Marzullo, and D. Zagorodnov. Wrapping server-side TCP to mask connection failures. In INFOCOM'01 [12].
[2] E.M. Belding-Royer and C.E. Perkins. Transmission range effects on AODV multicast communication. Kluwer J. Mob. Networks Appl., 7(6):455–470, 2002.
[3] T. Camp, J. Boleng, and V. Davies. A survey of mobility models for ad-hoc network research. In Basagni and Lee, eds, Mobile Ad-Hoc Networking: Research, Trends and Applications, vol. 2(5) of Wiley J. Wirel. Commun. Mob. Comput., pp. 483–502. 2002.
[4] K. Chen, S.H. Shah, and K. Nahrstedt. Cross-layer design for data accessibility in mobile ad-hoc networks. Kluwer J. Wirel. Personal Commun., 21(1):49–76, 2002.
[5] Cisco Systems, San Jose, CA. Cisco™ Distributed Director. (http://www.cisco.com).
[6] P.T. Conrad and P. Lei. Services provided by reliable server pooling. Internet draft, IETF, 2002. [draft-conrad-rserpool-service, work in progress].
[7] T. Dreibholz. An efficient approach for state sharing in server pools. In Proc. IEEE LCN: Conf. Local Comput. Networks, Tampa, FL, 2002.
[8] M.A. Fecko, S. Samtani, M.U. Uyar, and P.T. Conrad. Designing reliable server pools for battlefield ad-hoc networks. In Proc. IIIS SCI: World Multi-Conf. System. Cybern. Inf., vol. X, pp. 357–362, Orlando, FL, 2002.
[9] T. Hara. Effective replica allocation in ad-hoc networks for improving data accessibility. In INFOCOM'01 [12], pp. 1568–1576.
[10] S. Hull. Content Delivery Networks: Web Switching for Security, Availability and Speed. McGraw-Hill/Osborne, Berkeley, CA, 2002.
[11] Proc. IEEE ICDCS: Int'l Conf. Distrib. Comput. Syst., Vienna, Austria, 2002.
[12] Proc. IEEE INFOCOM, Anchorage, Alaska, 2001.
[13] T. Kunz and E. Cheng. On-demand multicasting in ad-hoc networks: Comparing AODV and ODMRP. In ICDCS'02 [11].
[14] R. Stewart and C. Metz. SCTP: New transport protocol for TCP/IP. IEEE Internet Comput. M., 5(6):64–69, 2001.
[15] R. Stewart and Q. Xie. Aggregate server access protocol (ASAP). Internet draft, IETF, 2002. [draft-ietf-rserpool-asap, work in progress].
[16] F. Sultan, K. Srinivasan, D. Iyer, and L. Iftode. Migratory TCP: Highly available Internet services using connection migration. In ICDCS'02 [11].
[17] A. Tridgell. Efficient Algorithms for Sorting and Synchronization. PhD thesis, Australian National Univ., Canberra, Australia, 2000. (http://samba.org/rsync).
[18] M. Tuexen, Q. Xie, R. Stewart, M. Shore, L. Ong, J. Loughney, and M. Stillman. Architecture for reliable server pooling. Internet draft, IETF, 2001. [draft-ietf-rserpool-arch, work in progress].
[19] M. Tuexen, Q. Xie, R. Stewart, M. Shore, L. Ong, J. Loughney, and M. Stillman. Requirements for reliable server pooling. RFC 3237, IETF, 2002.
[20] Q. Xie and R. Stewart. Endpoint name resolution protocol (ENRP). Internet draft, IETF, 2002. [draft-ietf-rserpool-enrp, work in progress].
