Scalable Workflow System Model Based on Mobile Agents

Jeong-Joon Yoo1, Doheon Lee2, Young-Ho Suh3 and Dong-Ik Lee1

1 Department of Info. and Comm., Kwang-Ju Institute of Science and Technology, 1 Oryong-Dong, Buk-Gu, Kwangju, Korea (Republic of)
2 Department of Computer Science, Chonnam National University, 300 Yongbong-Dong, Buk-Gu, Kwangju, Korea (Republic of)
3 Internet Service Department, Electronics and Telecommunications Research Institute, 161 Kajong-Dong, Yusong-Gu, Korea (Republic of)
{jjyoo,dilee}@kjist.ac.kr, [email protected] [email protected]

Abstract. A workflow system defines, creates and manages the execution of business workflows with workflow engines, which interpret workflow definitions and interact with task performers. As most non-trivial organizations have a massive number of workflows to process simultaneously, there is an ever-increasing demand for better performance and scalability of workflow systems. This paper proposes a workflow system model based on mobile agents, the so-called Maximal Sequence model, as an alternative to conventional RPC-based and previous mobile agent-based (DartFlow) models. The proposed model segments a workflow definition into blocks and assigns each of them to a mobile agent. We also construct three stochastic Petri net models of conventional RPC-based, DartFlow, and Maximal Sequence model-based workflow systems to compare their performance and scalability. The stochastic Petri net simulation results show that the proposed model outperforms the previous ones and scales better when the numbers of workflow tasks and concurrent workflows are relatively large.

1 Introduction

A workflow is defined as the computerized facilitation or automation of a business process [1]. It is composed of tasks, each of which corresponds to a unit business task or another business (sub-)process. A workflow system defines, creates and manages the execution of workflow processes with one or more workflow engines, which interpret workflow definitions and interact with task performers [1]. It is common for an actual workflow system to maintain and control hundreds of workflow definitions and tens of thousands of workflow processes simultaneously.

We can regard the execution of a workflow process as the repetition of three basic steps, (i) scheduling tasks, (ii) assigning tasks, and (iii) obtaining results, until the entire workflow process completes. Workflow engines decide which tasks to perform in the scheduling step. They communicate with task performers to assign tasks in the assignment step. After completing the assigned tasks, the task performers communicate with the workflow engines to report the results.

Most existing workflow systems, such as FlowMark [2], Action WorkFlow [3], FloWare [4], and Exotica/FMQM [5], adopt RPC model-based communications [6] between workflow engines and task performers. Due to the inherent characteristics of the RPC model, workflow engines have to schedule the next tasks for all ongoing workflow processes, and two-step communications for assigning tasks and obtaining results are essential. As the number of workflow processes increases, the scheduling overhead placed on workflow engines degrades system performance significantly. In addition, the two-step communications place a significant load on network bandwidth.

To address this limited scalability, mobile agents have recently been considered as an alternative to the RPC model-based architecture [7]. A mobile agent is a software program that can migrate over a network under its own control and acts on behalf of a user or another entity [8]. Since mobile agents carry the workflow definitions themselves, they can decide the next tasks to perform without the help of workflow engines. Furthermore, workflow engines do not have to communicate with task performers to assign tasks, as mobile agents migrate to the proper task performers autonomously. This implies that the mobile agents residing in task performers can take the load of scheduling and assigning tasks off the workflow engines. However, since a mobile agent contains the entire workflow definition, its physical size is apt to be much larger than a simple RPC message. Consequently, the migration of mobile agents over the network causes another communication overhead between workflow engines and task performers as well as between task performers.

In this paper, we propose to segment a workflow definition into blocks and to assign each of them to a mobile agent. The segmentation is called the Maximal Sequence model since it groups tasks that can be executed sequentially to the maximal extent. We also build stochastic Petri net models for three architectural alternatives, namely the RPC model, the previous mobile agent model, and the proposed Maximal Sequence model, to compare their performance and scalability.

The rest of this paper is organized as follows. Section 2 briefly explains the RPC model and the previous mobile agent model. Section 3 proposes a new approach called the Maximal Sequence model along with an illustrative comparison with the previous models. Section 4 presents simulation results showing that the proposed model achieves better performance and scalability in massive workflow environments.

2 Previous Workflow Systems

In this section, we briefly describe the RPC model and the previous mobile agent model from the perspective of the three basic steps of a workflow execution (scheduling, assigning tasks, and obtaining results).

2.1 RPC model-based Workflow Systems

Fig. 1 depicts workflow execution in RPC model-based workflow systems. A workflow engine decides which tasks to perform in the scheduling step. Assuming that the workflow definition indicates that the corresponding task performer is A, the workflow engine communicates with A to assign a task in the assignment step. After completing the assigned task, task performer A communicates with the workflow engine to report the result. To decide the next task, the workflow engine again performs the scheduling step. Assuming that the workflow definition indicates that the corresponding task performer is B, the workflow engine communicates with B to assign a task. The entire workflow is executed in this way until the final task is completed.

Fig. 1. RPC model-based workflow execution (figure not reproduced; it shows the Workflow Engine, Task Performer A, and Task Performer B)

It is worth noting that the workflow engine alone takes charge of scheduling and assigning tasks. As the number of workflow processes increases, the computational overhead imposed on the workflow engine becomes excessive. Furthermore, the communication overhead is also concentrated on the workflow engine. Although distributed versions of RPC model-based architectures have been introduced to break up this centralized overhead, the inherent characteristics of the RPC model limit the scalability of workflow systems. To overcome this limitation, mobile agent-based workflow systems have been considered.

2.2 Mobile Agent-based Workflow Systems

Casting mobile agents into workflow systems has several benefits:
- there is no need to consult the central workflow engine at every step, and hence the workload imposed on engines can be reduced,
- intelligent routing can be implemented efficiently,
- thin clients are supported, and
- heterogeneous environments are naturally supported.

A workflow system may include thin clients such as laptops and PDAs, which are connected through unreliable networks. A mobile agent works well in this environment, since a network connection is not necessary during the computation. A workflow system environment is fundamentally heterogeneous, often from both hardware and software perspectives. Because mobile agents are generally platform independent, they naturally support a heterogeneous workflow system environment.

Although mobile agents offer many advantages in workflow systems, for each individual advantage an equivalent solution can be found that does not require mobile agents. Whereas each individual advantage can be addressed in some other manner, including multi-agent approaches [9][10], a mobile agent framework addresses all of them at once.

Fig. 2. Mobile agents-based workflow execution (figure not reproduced; it shows the Workflow Engine, Task Performer A, and Task Performer B)

DartFlow, as an example, uses mobile agents to build highly scalable workflow systems. Since the mobile agent carries the workflow definition itself, it can decide the next tasks to perform without the help of workflow engines, as shown in Fig. 2. In this figure, a workflow engine creates a mobile agent, which contains the entire workflow definition. The workflow engine sends the mobile agent to the task performer (A in Fig. 2) that is in charge of the first task of the workflow. After the mobile agent completes the first task, it reports the result to the workflow engine, decides which task performer is in charge of the next task, and migrates to that task performer (B in Fig. 2). This migration continues until the entire workflow completes. Where the workflow definition calls for parallel tasks, a mobile agent duplicates itself and the copies migrate to multiple task performers.

The primary difference between this model and the RPC model is that workflow engines are no longer in charge of scheduling and assigning tasks. This implies that the computational overhead is distributed among workflow engines and task performers. In addition, the communication overhead for assigning tasks is also distributed.

Although this mobile agent-based model may seem to resolve the scalability limitation of the RPC model, it has a hidden cost. Since a mobile agent contains the entire workflow definition, its physical size is apt to be much larger than a simple RPC message. Consequently, the migration of mobile agents over the network introduces another communication overhead between workflow engines and task performers as well as between task performers.
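The message patterns described in Sections 2.1 and 2.2 can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: it is not code from DartFlow or from any RPC-based product, it covers purely sequential workflows (no parallel tasks, so no agent duplication), and the names `Message`, `rpc_trace`, `agent_trace`, `engine_load`, and `bytes_moved`, as well as any message sizes passed in, are hypothetical.

```python
from dataclasses import dataclass

ENGINE = "engine"  # hypothetical identifier for the workflow engine


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    kind: str  # "assign", "result", or "migrate"


def rpc_trace(performers):
    """Messages for a sequential workflow under the RPC model (Section 2.1)."""
    trace = []
    for p in performers:
        # The engine schedules the task, assigns it, then receives the result.
        trace.append(Message(ENGINE, p, "assign"))
        trace.append(Message(p, ENGINE, "result"))
    return trace


def agent_trace(performers):
    """Messages under the mobile agent model of Section 2.2."""
    trace = []
    location = ENGINE  # the engine creates the agent and sends it out
    for p in performers:
        # The agent carries the whole definition and moves itself.
        trace.append(Message(location, p, "migrate"))
        location = p
        # After the task completes, the agent reports to the engine.
        trace.append(Message(p, ENGINE, "result"))
    return trace


def engine_load(trace):
    """Number of messages the engine sends or receives."""
    return sum(1 for m in trace if ENGINE in (m.sender, m.receiver))


def bytes_moved(trace, sizes):
    """Total traffic, given an assumed size for each message kind."""
    return sum(sizes[m.kind] for m in trace)
```

For three task performers and assumed sizes of 1 unit for an RPC message and 10 units for a migrating agent, the engine takes part in 6 messages under the RPC trace and 4 under the agent trace, while total traffic rises from 6 to 33 units. The ratio depends entirely on the assumed sizes; the sketch only shows where the load moves, not how large the effect is in a real system.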

3 Maximal Sequence Model

To reduce the communication overhead of moving the entire workflow definition, we propose to segment a workflow definition into blocks and to assign each of them to a mobile agent. The segmentation is called the Maximal Sequence model since it groups tasks that can be executed sequentially to the maximal extent.

Definition 1. (Maximal Sequence Path) A collection of tasks that can be executed sequentially in a workflow is called a sequence path (no AND/OR-Split or AND/OR-Join conditions occur within a sequence path). If a sequence path cannot include another task to be executed sequentially, the sequence path is called a maximal sequence path.

Fig. 3 shows examples of maximal sequence paths. The workflow consists of four maximal sequence paths, T1, T2, T3, and T4. Now, we can define the Maximal Sequence model of a workflow execution as segmenting a workflow definition into maximal sequence paths and assigning each of them to a mobile agent. The following pseudo code describes the proposed Maximal Sequence model.

Algorithm Maximal Sequence Model:
ELEMENT Maximal_Sequence_Model(workflow) {
    TASK_QUEUE q;
    ELEMENT currentElement;
    set currentElement as the first element of workflow;
    do {
        switch (currentElement.type) {
        case TASK:
            add currentElement into q;
            currentElement = currentElement.next; // next element
            break;
        case SPLIT:
            define a set of tasks in q as a MSP and delete all tasks from q;
            // MSP : Maximal Sequence Path as in Definition 1
            for (int i=0;i