Design of a Generic Platform for Efficient and Scalable Cluster Computing based on Middleware Technology

Stefaan Vanhastel, Filip De Turck*, Piet Demeester
Department of Information Technology, Ghent University - IMEC
Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium
Tel.: +32 9 267 35 87, Fax: +32 9 267 35 99
E-mail: [email protected]
*Research Assistant of the Fund of Scientific Research - Flanders (F.W.O.-V.), Belgium

0-7695-1010-8/01 $10.00 © 2001 IEEE

Abstract

In this paper, we address the design of a generic and scalable platform for cluster computing. The architecture of the platform is based on middleware technology in order to ensure easy distribution of the software components among the participating workstations and to exploit advanced communication techniques, such as event notification and object registration. The computational tasks are referred to as Intelligent Agents, which are software agents that are capable of executing particular algorithms on input data. The developed platform offers advanced features such as transparent load balancing, task scheduling, run time compilation of agent code and migration of tasks. The architecture of the platform will be outlined from a computational point of view and each component will be described in detail. Furthermore, the engineering aspects of the platform will be covered. In addition, a sample scenario for computational tasks in the area of telecommunication network design and management will be described.

1. Introduction

Middleware technology is increasingly becoming the technology of choice for designing and implementing large software architectures. This is mainly due to the numerous advantages middleware offers to the distributed software developer, namely (i) the developed applications are independent of the operating system, programming language and actual location of the distributed objects, (ii) the applications can make use of advanced features such as an Event Service and a Transaction Service and (iii) there is a strict enforcement of the OO and client/server paradigms.

One of the main characteristics of middleware-based distributed applications is that they consist of numerous components communicating with each other. These components often require large computational power and are therefore run on a pool of workstations. Distribution of these components over the available workstations is often done by human intervention in a more or less arbitrary way. This leads to inefficient use of the pool resources and encumbers the dynamic creation and removal of middleware components. Therefore, there is an urgent need for software tools and techniques for efficient and automated management and clustering of pool resources.

In this paper we present a cluster management architecture which supports middleware applications as well as traditional software. The architecture is based on middleware since this (i) facilitates communication between the platform components and the middleware clients that invoke the execution of software components on the platform and (ii) ensures easy distribution of the components. Unlike traditional clusters (such as Beowulf clusters: [6]), the platform achieves load balancing by distributing middleware components between workstations, not by parallelizing monolithic processes. The advantage is that existing distributed applications can be run on the platform without any modifications, and that this approach allows for easy development of new applications since only standard middleware programming techniques are involved. Moreover, the platform also achieves more efficient balancing of communication load (both in terms of network traffic and of server invocation requests) by exploiting the inherent location transparency offered by the middleware technology (requests to a specific server are automatically directed to the relevant workstation, instead of to a single point of contact). The platform has been successfully applied in the area of telecommunication network design and management, as reported in [4].

The architecture is generic, in that it is independent of the computational tasks to be executed on the cluster platform. In this paper, the computational tasks will be referred to as Intelligent Agents, defined in the broadest sense as software components that are capable of executing particular algorithms on input data. Therefore, the implemented platform is referred to as an Intelligent Agent Platform (IAP). The platform offers advanced features such as the use of XML (eXtensible Markup Language: [1]) to specify input and output data formats and the run time compilation feature of Java to allow flexible updating of the agents. This ensures that the agents can be implemented independently of their specific input/output parameters and that multiple input formats can be supported, which can be added or updated at run time.

For the actual implementation, CORBA (Common Object Request Broker Architecture: [2]) was chosen as middleware technology together with the C++ and Java programming languages. For an excellent reference on CORBA programming with C++, the reader is referred to [3]. CORBA is the de facto middleware standard, defining interfaces and services to build distributed applications. It was specified by the OMG (Object Management Group) with the purpose of offering a high level interface to the distributed software developer. Alternatives include Sun’s EJB (Enterprise Java Beans: [10]), which requires all applications to be written in Java, and Microsoft’s DCOM (Distributed Component Object Model: [11]), which limits the operating system to Microsoft’s products.

The main difference between the proposed platform and traditional cluster architectures (e.g. Beowulf, Mosix [5]) is that it only provides Cluster Management Software. There are no provisions for providing a Single System Image (SSI), since the nature of the typical application (i.e. middleware-based software) does not require this. The proposed platform offers more or less the same functionality as Grid Computing platforms such as Condor [9], Globus [8] and Legion [7]. However, the main property that distinguishes this platform from others is the CORBA-based design approach. While this does limit the applicability of the platform in terms of available workstations (a participating workstation needs to have CORBA installed) and applications (although vanilla applications can be run on the platform, only middleware-based applications can fully exploit all the advantages offered by the platform), CORBA’s location transparency, standardized API and ease of integration with existing software components more than compensate for this.

The remainder of this paper is structured as follows. Section 2 will detail the general concept of the implemented platform, whereas section 3 will focus on the computational decomposition of the architecture. Section 4 will focus on the engineering aspects, such as the security issues, task scheduling, task migration and the implementation of the Agent and Code Bases. A sample scenario will be described in section 5. In section 6, we summarize our work, sum up the conclusions that could be drawn from this study and point out some issues under current research.

2. General Concept

The cluster management platform (from now on referred to as the Intelligent Agent Platform, as explained in the introduction) is organized in a number of pools. Each pool is managed by a dedicated Intelligent Agent Coordinator (IAC) and Load Balancing Server (LBS), responsible for dispatching agent creation requests and for load balancing between workstations, respectively. The IACs and LBSs are managed by an IACM (Intelligent Agent Coordinator Manager) and an LBSC (Load Balancing Server Coordinator), respectively. A pool is characterized by a unique poolID and a set of capabilities, listing the types of problems this particular pool can handle (i.e. the types of software agents the workstations of this pool can run). Each pool contains an arbitrary number of workstations, all capable of executing the same subset of agents. Subsets are not mutually exclusive, i.e. different pools can run the same agents. Each workstation is identified by a unique hostID. Workstations need not be connected to the same logical network; instead they can belong to different IP subnets and can be widely distributed geographically.

Figure 1 illustrates the general concept of the Intelligent Agent Platform. It shows a number of pools, each with their dedicated IAC and LBS. Some of the pools are for internal use only (e.g. a “Routing” pool that calculates optimal routes through telecommunication networks and a “Monitoring” pool that processes network performance measurements), other pools (“Public1” and “Public2”) are composed of workstations spread over the Internet, typically set up for tasks that require huge computational effort, such as the Seti@home project [14] of the University of California, Berkeley. Individuals or organizations wishing to contribute can simply download the necessary software and register their computer(s). The following paragraphs detail a typical scenario for job request, workstation selection, agent creation, agent execution and result notification.

Pool creation & workstation addition

When the need for a new pool arises, a dedicated IAC/LBS pair is created by the IAP administrator (i.e. human intervention is usually required in order to prevent abuse, the only exception being the automatic creation of additional IAC/LBS pairs to relieve an overloaded IAC or LBS). A number of properties are associated with each pool, such as the type of agents this pool is intended to execute and the IDs of users who have permission to execute agents in this pool. Workstations wishing to join a particular pool simply assume its poolID (the administrator of the workstation actually enters the ID "Seti@home" in a configuration file or GUI), and register their hostID with the relevant IAC and LBS. From then on, the IAC knows that these particular workstations are available to execute agents, and in turn the workstations will submit their load information to the LBS at regular intervals.

Agent Execution Request: Pool selection

When a client wishes to execute an agent, it contacts the IACM and requests the execution of an agent of a particular type. The IACM will then check if there is a pool capable of executing the requested type of agent, and if so, will return the reference to the responsible IAC. The IACM will also verify that this client has the necessary permissions to execute agents on that pool. In case multiple pools are capable of running a particular type of agent, the IACM will perform load balancing between the pools with the help of the Load Balancing Server Coordinator (LBSC).

Workstation selection

Once a pool is selected, the client directly contacts the responsible IAC with a request to execute an agent. This IAC will transparently distribute agents between all available workstations. The actual load balancing is performed by the dedicated Load Balancing Server, based on information about the workstation load, estimated agent requirements in terms of resources (based on execution history), and a load balancing algorithm which can be tuned at runtime. This is achieved by run time compilation of (parts of) the load balancing algorithm Java code. Currently, a straightforward Lowest Load First algorithm is used to allocate non-reserved resources (reserved resources are discussed in section 4.2).

Agent Execution

Once a workstation is selected, the IAC will forward the agent creation request to that workstation. To increase the flexibility of the platform, agents are stored in an Agent Base and downloaded upon request. Agents can be customized further by downloading code from the Code Base and compiling the necessary invocation code at runtime. Depending on the nature of the input data (static or continuously updated), it is passed either as a parameter at the time of the request or through the use of events throughout the lifetime of the agent. Results are returned to the client at the end of the algorithm or continuously throughout the execution of the agent in the case of repeated calculations. In both cases, event-based communication is used.

3. Computational view of the Platform Architecture

3.1. General Architecture

The general architecture of the Intelligent Agent Platform (IAP) is depicted in Figure 2. It consists of two types of components: (i) components for the administration and management of the platform and (ii) workstation-side components that execute the agents. The former comprise the Intelligent Agent Coordinator Manager (IACM), the Intelligent Agent Coordinators (IACs), the Load Balancing Server Coordinator (LBSC), the Load Balancing Servers (LBSs) and the Agent and Code Bases. The platform components at the participating workstations include the Intelligent Agent Execution Servers (IAESs) and the Load Monitors (LMs). Each of these components will be detailed in section 3.2.

Figure 2. Computational Decomposition of the Intelligent Agent Platform.

Figure 1. Intelligent Agent Platform: General concept.


3.2. Component Description

Load Balancing Server (LBS)

Each LBS has authority over a single workstation pool, which contains workstations capable of and willing to run specific types of agents. An LBS runs a registration service which keeps track of the registered LMs on each workstation. Upon request from the IAC, the LBS selects the most appropriate workstation from its pool and returns its hostID to the IAC. Selection is based on a load balancing algorithm which can be selected or changed at runtime (by making use of the run time compilation feature of the Java programming language). In addition to keeping track of the load on individual workstations, the LBS also calculates the average load on the pool and reports it to the LBSC.
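The Lowest Load First policy mentioned in section 2 can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Java used by the platform, and the function names `select_workstation` and `pool_average`, as well as the representation of load as a single number per hostID, are assumptions made for illustration; the paper does not list the LBS implementation.

```python
from typing import Dict, Optional


def select_workstation(loads: Dict[str, float]) -> Optional[str]:
    """Return the hostID with the lowest reported load (None for an empty pool).

    `loads` maps hostID -> latest load value reported by that host's Load Monitor.
    Ties are broken by hostID so that the choice is deterministic (an assumption).
    """
    if not loads:
        return None
    return min(loads, key=lambda host: (loads[host], host))


def pool_average(loads: Dict[str, float]) -> float:
    """Average load over the pool, i.e. the figure an LBS would report to the LBSC."""
    return sum(loads.values()) / len(loads) if loads else 0.0


# Hypothetical pool of three workstations.
loads = {"ws-a": 0.75, "ws-b": 0.25, "ws-c": 0.5}
chosen = select_workstation(loads)
```

Because the policy is a single function over the current load table, it is the kind of component that could plausibly be swapped at run time, as the text describes.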

Intelligent Agent Coordinator Manager (IACM)

The Intelligent Agent Coordinator Manager (IACM) manages the different pools in the Intelligent Agent Platform, each represented by an IAC. This component is responsible for creating IACs on available servers. Available servers are machines running an Intelligent Agent Coordinator Factory (IACF) which has registered with the IACM. The factories are responsible for the actual creation of the IACs on their machine. This allows load balancing of IACs between available servers, with the actual load balancing done by an LBS. The IACM also directs new workstations to the IAC responsible for the pool they wish to join. Furthermore, the IACM keeps track of the capabilities of each pool and will provide clients with a reference to an IAC capable of executing the required agent. In case multiple IACs match the request, the IACM will base its selection on load balancing algorithms executed by the LBSC. From then on, clients will directly contact the IAC instead of the IACM. To allow further load balancing of requests from a particular client between pools, IAC references are timestamped and expire after a time-out. A single IACM per IAP is created by the IAP administrator, and its reference is added to the CORBA Naming Service so that the IAESs can obtain it.
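The pool selection performed by the IACM (capability check, permission check, choice of the least loaded pool, and a reference that expires after a time-out) can be sketched as follows. This is an illustrative Python model, not the CORBA implementation: the class names `Pool` and `IACReference`, the function `select_pool`, and the 60-second default time-out are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Pool:
    pool_id: str
    capabilities: Set[str]   # agent types this pool can run
    allowed_users: Set[str]  # users permitted to execute agents in this pool
    average_load: float      # pool average, as reported by its LBS to the LBSC


@dataclass
class IACReference:
    pool_id: str
    issued_at: float  # timestamp in seconds
    ttl: float        # time-out in seconds (value is an assumption)

    def expired(self, now: float) -> bool:
        return now - self.issued_at >= self.ttl


def select_pool(pools: Iterable[Pool], agent_type: str, user: str,
                now: float, ttl: float = 60.0) -> Optional[IACReference]:
    """Return a timestamped reference to the least loaded eligible pool, or None."""
    candidates = [p for p in pools
                  if agent_type in p.capabilities and user in p.allowed_users]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: (p.average_load, p.pool_id))
    return IACReference(best.pool_id, issued_at=now, ttl=ttl)
```

Once a reference has expired, the client would have to ask the IACM again, which gives the IACM the opportunity to direct it to a different pool.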

Intelligent Agent Execution Server (IAES)

The IAES is responsible for the downloading (from the Agent Base), creation and execution of agents on a workstation. When the IAES is started, the LM will register with an LBS (reference obtained from the LBSC) and the IAES will register with an IAC (reference obtained from the IACM). Both components are identified by a poolID and a hostID. The IAES will also subscribe agents to the appropriate event channels to ensure communication with the client.

Intelligent Agents (IA), Agent Base and Code Base

The Agent Base contains the different agents either in binary or in source code form. Agents are downloaded and executed (after compilation if applicable) by the IAES when needed. To offer more flexibility, agents can be customized by downloading a particular algorithm (or fragments of algorithms) from a Code Base. Code fragments from the Code Base are downloaded by the agent, which then creates the necessary invocation code to be compiled at runtime on the workstation, and subsequently executed.
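Section 4.5 notes that a request to the Code Base also returns the other files or libraries needed to compile a fragment. One way to turn such dependency information into a download order is a depth-first traversal, sketched here in Python. The function name `download_order`, the dictionary layout and the fragment names in the usage example are hypothetical; the paper does not specify how dependencies are resolved.

```python
from typing import Dict, List


def download_order(fragment: str, requires: Dict[str, List[str]]) -> List[str]:
    """Return `fragment` and everything it needs, dependencies first.

    `requires` maps a fragment name to the names it directly depends on.
    Raises ValueError if the dependency information contains a cycle.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # name -> "visiting" | "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"cyclic dependency involving {name}")
        state[name] = "visiting"
        for dep in requires.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)

    visit(fragment)
    return order
```

With hypothetical dependency data such as `{"RouteList": ["Network"], "Network": ["XmlUtil"]}`, a request for `RouteList` yields the utility first, then `Network`, then `RouteList` itself.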

Load Balancing Server Coordinator (LBSC)

The Load Balancing Server Coordinator (LBSC) manages the Load Balancing Servers. This component is also responsible for creating LBSs on available servers. Available servers are machines running a Load Balancing Server Factory (LBSF) which has registered with the LBSC. The factories are responsible for the actual creation of the LBSs on their machine. This allows load balancing of LBSs among available servers. This implies that at least two LBSs are created by the administrator on startup, one to load balance the IAC servers and one to load balance the LBS servers. In order to make pool load balancing decisions for the IACM, the LBSC keeps track of the average load of each pool, as reported by the LBSs. A single LBSC per IAP is created by the IAP administrator, and its reference is added to the CORBA Naming Service so that LMs can obtain it.

Event Channels

Push event channels are used for the communication between the client and the agent. The IAES will subscribe agents to the appropriate event channels. By using an event-driven mechanism instead of a polling-based approach, the load on the platform is greatly reduced.

Load Monitor (LM)

Each IAES runs a Load Monitor which keeps track of workstation resources (CPU load, memory, disk I/O, network I/O, etc.). Currently, the gtop library is used to obtain process statistics. A factor (determined by running a number of benchmarks on typical hardware platforms) is applied to obtain a hardware-independent load metric. Load statistics are made available to the LBS. The LM registers with the LBS responsible for that pool (reference obtained from the LBSC based on poolID). Both IAES and LM are identified by a poolID and a hostID.
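The paper does not state how the benchmark factor is applied to the raw measurements. A minimal Python sketch of one plausible reading is given below, under the assumption that the factor expresses the speed of a workstation relative to a reference machine and that the raw CPU load is divided by it. The names `LoadSample`, `normalized_load` and `speed_factor` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    cpu_load: float        # raw CPU load as measured on the workstation
    free_memory_mb: float  # carried along, unused in this simple metric


def normalized_load(sample: LoadSample, speed_factor: float) -> float:
    """Scale raw CPU load by a benchmark-derived speed factor.

    Assumption: speed_factor is throughput relative to a reference machine
    (2.0 = twice as fast), so the same raw load weighs less on a faster host.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return sample.cpu_load / speed_factor
```

Under this assumption, a workstation that is twice as fast as the reference and reports a raw load of 1.5 compares as less loaded than a reference-speed workstation reporting 1.0.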

Intelligent Agent Coordinator (IAC)

The IAC delegates incoming requests from clients to one of the IAESs which have registered (all IAESs registered with an IAC share a common poolID). Selection of the IAES is performed by the LBS responsible for that pool of workstations. The LBS returns the hostID of the selected workstation, which is mapped to an IAES reference by the IAC. IACs and their LBSs are created by the IAP administrator. IACs obtain a reference to their LBS from the LBSC.
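The delegation step (the LBS returns a hostID, the IAC maps it to the registered IAES and forwards the request) can be modelled in a few lines of Python. In this sketch an IAES is reduced to a callable and the remote LBS call to a function argument; the class and method names are assumptions and do not reproduce the IDL interface of Figure 3.

```python
from typing import Callable, Dict, List

# An IAES is modelled as a callable taking (agent_type, xml_input) and returning a result.
IAES = Callable[[str, str], str]


class IntelligentAgentCoordinator:
    def __init__(self, pool_id: str, select_host: Callable[[List[str]], str]) -> None:
        self.pool_id = pool_id
        self._select_host = select_host  # stands in for the remote call to the LBS
        self._iaes_by_host: Dict[str, IAES] = {}

    def register(self, pool_id: str, host_id: str, iaes: IAES) -> None:
        """Accept a registration only from an IAES that carries this pool's poolID."""
        if pool_id != self.pool_id:
            raise ValueError(f"IAES of pool {pool_id!r} cannot join pool {self.pool_id!r}")
        self._iaes_by_host[host_id] = iaes

    def execute(self, agent_type: str, xml_input: str) -> str:
        """Ask the LBS stand-in for a hostID and forward the request to that IAES."""
        if not self._iaes_by_host:
            raise RuntimeError("no workstation registered in this pool")
        host_id = self._select_host(sorted(self._iaes_by_host))
        return self._iaes_by_host[host_id](agent_type, xml_input)
```

Keeping the hostID-to-reference table inside the IAC means the LBS only ever deals with hostIDs and load figures, which matches the division of work described above.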


3.3. Interface Definitions

All the described software components interact with each other via well defined IDL interfaces. Each component offers an interface, which defines the operations that this component offers to the other components of the platform. These interfaces have been defined in a generic way, which avoids redefinition of the interfaces (and consequent recompilation of the ORB stubs) every time a new request parameter is supported or a novel load monitoring technique is added to the system. This is mainly achieved by the use of key-value pairs in the definition of the interfaces and by the use of XML strings to describe the request input parameters. As an example, the definition of the IDL interface of the Intelligent Agent Coordinator (IAC) component is listed in Figure 3. The IAC interface provides operations to execute an agent request or to stop a running agent. The LBS interface offers operations to select a workstation based on a prediction of the load, to register a Load Monitor or to update the load information. The other components have similar interfaces.

Figure 3. Interface definition fragment.

4. Engineering view

This section highlights some of the engineering aspects such as the security aspects, task scheduling, task migration and the implementation of agent and code bases.

4.1. Security Aspects

Security is an important issue for the platform, especially when using external resources. Two aspects are important: the platform has to provide (i) a secure communication environment and (ii) a secure execution environment. The former can easily be achieved by using the Secure Socket Layer (SSL) both for CORBA calls (specified in the CORBA-SSL standard: [2, 15]) and for the transport of code fragments and agents (by tunneling FTP over SSL or using secure HTTP (HTTPS)). A secure execution environment is more difficult to implement. Making the execution environment secure for the users of the workstations can be achieved by certifying agents and code fragments with a PGP (Pretty Good Privacy: [16, 15]) signature, so that the authenticity and integrity of the code fragment can be checked. This assures that users only execute code from a trusted source. However, it is next to impossible to make the execution environment safe from the clients' point of view. Preventing users from tampering with the code and/or binaries, or from developing Trojan horses which act as real agents but return bogus results, is very difficult. The platform as described in this paper relies on the goodwill of the users and employs a user registration mechanism as a dissuasion technique.

4.2. Task Scheduling

The platform, as detailed in the previous section, has been extended to allow clients to specify time information for their requests. For instance, a client can request the execution of an agent every day between 9 am and 5 pm or every Friday night (e.g. for large calculations over the weekend). In this way, clients can make reservations for the execution of their jobs. Incoming requests for immediate execution can use the remaining (not reserved) resources. Two generic middleware components, a Scheduled Request Coordinator (SRC) and a Scheduled Request Executor (SRE), have been designed. The SRC collects the incoming requests and performs admission control on this set of requests. Furthermore, the SRC invokes algorithms for the optimal assignment of the requests to the appropriate workstations, taking into account the workstation load distribution and the duration of the scheduled tasks. The actual request assignment to the IAES at the requested start time is performed by the Scheduled Request Executor (SRE). At the time of writing, these generic components are being implemented and various algorithms for the optimal assignment of a scheduled task to a particular workstation are being developed and thoroughly evaluated. This will be reported upon in extensive detail in a future paper.
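The reservation mechanism of section 4.2 can be illustrated with a deliberately simple admission-control sketch in Python. The paper states that the assignment algorithms are still under development, so the first-fit rule used here (one reservation per workstation at a time, first free workstation wins) is an assumption chosen for clarity, as are the class and method names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # [start, end) in minutes since an arbitrary epoch


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one minute."""
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class ScheduledRequestCoordinator:
    # hostID -> intervals already reserved on that workstation
    reservations: Dict[str, List[Interval]] = field(default_factory=dict)

    def add_workstation(self, host_id: str) -> None:
        self.reservations.setdefault(host_id, [])

    def admit(self, interval: Interval) -> Optional[str]:
        """Reserve the first workstation that is free for the whole interval.

        Returns the chosen hostID, or None if the request must be rejected.
        """
        start, end = interval
        if start >= end:
            raise ValueError("interval must have positive length")
        for host_id in sorted(self.reservations):
            if not any(overlaps(interval, r) for r in self.reservations[host_id]):
                self.reservations[host_id].append(interval)
                return host_id
        return None
```

Requests for immediate execution would then be restricted to workstations without a reservation covering the current time, which is how the non-reserved resources mentioned in section 2 could be identified.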

4.3. Task Migration

Based on the load information of the workstations in a particular pool, the Load Balancing Server (LBS) can detect that some workstations are overloaded with jobs, whereas other workstations in the pool are lightly loaded. The LBS can then trigger the Intelligent Agent Coordinator (IAC) to redistribute the agents over all available workstations in order to achieve a better distribution of the agents. Subsequently, the IAC will contact the appropriate IAESs with the request to stop the execution of an agent and save its current state. This task will then be migrated to the IAES on a less loaded workstation. The latter IAES will consequently invoke the continuation of the agent. Currently, every agent implements a task migration interface offering suspend/resume methods. Task migration through checkpointing is being implemented.

4.4. Agent Base Implementation

The agent base contains agents in binary format. When a workstation receives a request to create and execute an agent, the workstation can download the necessary binaries (either for a new installation or to upgrade an existing one) from the Agent Base. Since workstations come in a variety of Operating Systems, the Agent Base needs to store binaries for all supported OSs. Furthermore, since some agents might depend on specific libraries, the Agent Base has to keep track of these dependencies and notify the execution environment on the workstations (the IAES) that additional libraries are required. It is the responsibility of the IAES to fetch and install the required libraries. Since installing new libraries is an OS-specific task, it is desirable to delegate it to the OS as much as possible, and, whenever possible, to use dedicated OS tools to perform this task. For example, for workstations running Linux (e.g. the Debian distribution), the agent base is based on the standard install mechanisms such as the packaging system of the Debian distribution. Agents are stored as standard Debian packages (containing the agent as well as version and dependency information) on a standard FTP or HTTP server.

4.5. Code Base Implementation

The Code Base is implemented as a CORBA component with two types of operations: (i) get operations, which allow the IAESs or the agents to download code to be compiled at runtime and (ii) operations to allow the IAP administrator to add or remove code. When requesting code from the code base, the client also receives a list of other files or libraries that are necessary to compile the requested code fragment. These files or libraries can be OS specific. The IAES will subsequently download the necessary files or libraries and also trigger their compilation and/or installation.

5. Sample Application

A typical application environment for the described Intelligent Agent Platform is the area of telecommunication network management. Consider the example of a service provider offering the Virtual Private Network (VPN) service to its customers. The calculation of the optimal route for the VPNs often involves large calculations, and routing algorithms are very often subject to change. These are exactly the two requirements (large calculations and flexible algorithm updates) for which the Intelligent Agent Platform is best suited. Section 5.1 will describe an example agent for the calculation of VPN routes and section 5.2 details the component interactions for the example by means of a sequence chart.

5.1. VPN Routing Agent

The input to the VPN routing agent will consist of (i) the provider’s network topology and network state, such as the available bandwidth and reserved connections as a function of time, (ii) information about the VPNs to be set up, which involves end point information as well as the technical details and Quality of Service requirements for the VPNs, and (iii) the desired output format to describe the calculated routes. This will be provided as a DTD (Document Type Declaration: [1]) string. Furthermore, a reference to the DTDs which describe the network representation and VPN information will also be provided. This enables the agent to invoke the appropriate parsing functions on the input XML strings. Figure 4 lists a DTD to describe the network topology and state information. The agent is a Java application that performs the following steps:

1. It downloads the appropriate Network and VPNList classes from the code base, based on the DTD information of the input XML strings.
2. A new Java class is created and the code for the parsing of the network and VPN request info is inserted.
3. The code for the appropriate routing algorithm is downloaded from the code base and the code to execute the algorithm is appended to the new Java class.
4. The appropriate RouteList class is downloaded from the code base and the code to convert the algorithm output to the desired output format is appended to the new Java class.
5. The new Java class is compiled and executed.
6. The results are returned.
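The six steps above can be mimicked in a few lines. The sketch below is in Python, so the built-in `compile` and `exec` stand in for the Java run time compilation used by the platform, and the code base is a hypothetical in-memory dictionary rather than a CORBA component. The fragment keys, the edge-list input format and the fewest-hops algorithm are all assumptions made for illustration; the real agent works on XML input described by DTDs.

```python
from typing import Dict

# Hypothetical code base: fragment name -> source text.
CODE_BASE: Dict[str, str] = {
    # steps 1-2: parsing code for the input format (one "A B" link per line)
    "parser/edge-list": (
        "def parse(text):\n"
        "    return [tuple(line.split()) for line in text.splitlines() if line.strip()]\n"
    ),
    # step 3: routing algorithm (breadth-first search, fewest hops)
    "algorithm/fewest-hops": (
        "from collections import deque\n"
        "def route(links, src, dst):\n"
        "    graph = {}\n"
        "    for a, b in links:\n"
        "        graph.setdefault(a, []).append(b)\n"
        "        graph.setdefault(b, []).append(a)\n"
        "    queue, seen = deque([[src]]), {src}\n"
        "    while queue:\n"
        "        path = queue.popleft()\n"
        "        if path[-1] == dst:\n"
        "            return path\n"
        "        for nxt in graph.get(path[-1], []):\n"
        "            if nxt not in seen:\n"
        "                seen.add(nxt)\n"
        "                queue.append(path + [nxt])\n"
        "    return None\n"
    ),
    # step 4: conversion of the algorithm output to the requested format
    "output/arrow": (
        "def render(path):\n"
        "    return '->'.join(path) if path else ''\n"
    ),
}


def build_and_run(parser_key: str, algorithm_key: str, output_key: str,
                  text: str, src: str, dst: str) -> str:
    """Assemble the fragments into one module, compile it and run it (steps 2-6)."""
    source = "\n".join(CODE_BASE[k] for k in (parser_key, algorithm_key, output_key))
    source += "\nresult = render(route(parse(text), src, dst))\n"
    namespace = {"text": text, "src": src, "dst": dst}
    exec(compile(source, "<generated-agent>", "exec"), namespace)
    return namespace["result"]


route_text = build_and_run("parser/edge-list", "algorithm/fewest-hops", "output/arrow",
                           "A B\nB C\nA D\nD E\nE C\n", "A", "C")
```

Swapping the routing algorithm then amounts to storing another fragment under a new key and passing that key to `build_and_run`; the assembling code itself does not change, which is the point of the generic agent design.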


[DTD listing of Figure 4 not recoverable; the surviving fragments declare the elements INTERFACESLOT, INTERFACECARD, PORT and LINK, with optional (#IMPLIED) CDATA attributes.]

Figure 4. Document Type Declaration to represent the network topology and state information.

Note that the agent code is written in a generic way so that it can be applied to all kinds of routing algorithms.

5.2. Sequence Chart

Figure 5 shows a message sequence chart for the VPN routing agent example. For clarity, only interactions between components running on the workstation where the agent is executed are shown. The agent execution request from the client is propagated through the platform until finally the appropriate IAC dispatches it to the IAES of one of the workstations in the pool. The IAES will download the Agent class file from the agent base, and create and start the agent. The agent in turn will download the necessary support classes (e.g. Network representation and VPN specification classes) from the code base, and create a temp class as described in the previous section. After adding an algorithm and route conversion code, the temp class is executed and returns its results to the agent. The agent in turn will push the results on the event channel on which the client is listening.

Figure 5. Message sequence chart for the VPN Routing Agent example.

6. Conclusion

An architecture for a cluster computing platform based on middleware has been presented, which is both generic and scalable. The advantages of the use of middleware technologies have been outlined. The different components of the architecture have been described in detail, together with the design and implementation issues of each of these components. Moreover, advanced engineering techniques, such as security aspects, migration of tasks, task scheduling and run time compilation of agents stored in code bases, are detailed. An application of the platform in the area of telecommunication network design and management has been described. Further work includes the implementation and performance evaluation of efficient algorithms for task scheduling and task migration.

7. Acknowledgement

Part of this work has been supported by the Flemish Government through the IWT project ITA-GBO. Furthermore, the authors thank Frederik Debacker and Pieter Thysebaert for the stimulating discussions.

References

[1] Bob DuCharme, “XML, The Annotated Specification”, Prentice Hall, 1999.
[2] OMG, “The Complete formal/98-12-09: CORBA Specification”, http://www.omg.org, December 1998.
[3] Michi Henning, Steve Vinoski, “Advanced CORBA Programming with C++”, Addison-Wesley, 1999.
[4] Filip De Turck, Stefaan Vanhastel, Filip Vandermeulen, “Design and implementation of a Generic Connection Management and Service Level Agreement Monitoring Platform Supporting the Virtual Private Network Service”, Accepted for IFIP/IEEE IM 2001, Seattle, May 2001.

[5] Mosix website, “Mosix: Scalable Cluster Computing for Linux”, http://www.mosix.cs.huji.ac.il/.
[6] Beowulf website, “The Beowulf Project”, http://www.beowulf.org/.
[7] Legion website, “Legion Worldwide Virtual Computer”, http://legion.virginia.edu/.
[8] Globus website, “The Globus Project”, http://www.globus.org/.
[9] Litzkow, M., Livny, M., and Mutka, M.W., “Condor - A Hunter of Idle Workstations”, Proceedings of the 8th International Conference of Distributed Computing Systems, pp. 104-111, June 1988.
[10] Tom Valesky, “Enterprise JavaBeans, Developing Component Based Distributed Applications”, Addison-Wesley, 1999.
[11] Thuan Thai, “Learning DCOM, Distributed Components on Windows”, O’Reilly, April 1999.
[12] Rajkumar Buyya, “High Performance Cluster Computing: Architectures and Systems, Vol. 1”, Prentice Hall, 1999.
[13] Rajkumar Buyya, “High Performance Cluster Computing: Programming and Applications, Vol. 2”, Prentice Hall, 1999.
[14] Seti@home homepage, “Seti, The Search for Extraterrestrial Intelligence”, http://setiathome.ssl.berkeley.edu/.
[15] William Stallings, “Cryptography and Network Security, Principles and Practice”, Prentice Hall, 1998.
[16] Simson Garfinkel, “PGP, Pretty Good Privacy”, O’Reilly, 1995.