A Scheduling Strategy for a Real-Time Dependable Organic Middleware

Uwe Brinkschulte, Alexander von Renteln, Mathias Pacher
Institute for Process Control and Robotics, University of Karlsruhe (TH)
{brinks, renteln, pacher}@ira.uka.de

Abstract. This paper presents the architecture and conception of a dependable organic middleware based on the existing, non-organic middleware OSA+. We show a scheduling strategy which assigns missions in real-time to a distributed set of platforms in the scope of a factory automation scenario. The missions are distributed to different robots by the organic middleware, whose scheduling includes organic aspects like self-organization, self-optimization and self-healing.

Keywords: Self-organization, organic real-time scheduling, OSA+ middleware, self-healing

1 Introduction

Nowadays, it is a challenge to manage complex missions in factory automation with respect to performance, robustness and flexibility, because only a limited number of robots is available and they have to be used in an optimal way. Another aspect is that robots might have to take over missions from defective robots. The SIMON project at the University of Karlsruhe addresses these challenges [9] in the scope of a factory automation scenario by adding organic features to a middleware. This means that the middleware is intended to have life-like properties such as self-configuration, self-healing and self-optimization, e.g. to distribute the missions to robots or to autonomously re-schedule a mission in case of a robot failure. These self-x features considerably increase the dependability of the system. In this paper, we present the architecture of the organic middleware and its scheduling strategy, which provides the mentioned organic features. The paper is structured as follows: In section 2, we present related work and approaches similar to ours. In section 3, we give a short introduction to the OSA+ middleware. Section 4 presents the concept of execution paths, which is important for the scheduling strategy. In sections 5, 6 and 7, we explain the pre-conditions and assumptions made for the scheduling of the organic middleware, and in section 8 the scheduling is explained in detail. Section 9 concludes the paper.

2 Related Work

A lot of different middleware systems have been developed to provide a homogeneous view over a heterogeneous network, e.g. CORBA [10], Java RMI [11], DCOM [12] and Microsoft's .NET framework [13]. In contrast to our middleware, these systems do not cover organic aspects. Self-organization has been a research focus for several years. Publications like [15] deal with basic principles of self-organizing systems, e.g. emergent behavior, reproduction, etc. Regarding self-organization in computer science, several projects and initiatives can be listed. IBM's Autonomic Computing project [4, 5] deals with self-organization of IT servers in networks. The German Organic Computing Initiative was founded in 2003. Its basic intention is to improve the controllability of complex embedded systems by using principles found in organic entities [14]. Regarding self-organization in middleware, current middleware approaches provide features for load-balancing, but middleware architectures fulfilling organic computing principles are rare. In [3], the use of middleware for self-healing is investigated, but none of the presented approaches deals with factory automation. Another approach towards an organic middleware is AMUN, developed at the University of Augsburg [8]. It consists of four main parts: the Transport Interface, which decouples the communication from the transport platform; the Event Dispatcher, which is responsible for the delivery of incoming and outgoing messages; the Service Interface and Service Proxy, which connect the AMUN middleware to the services that build an application; and the Autonomic Manager, which configures the services on a platform. Our approach is more fine-grained than the AMUN approach because our Autonomic Manager even configures the jobs to be executed by services. In this way, our organic middleware is able to react to changes in its environment faster and in a more fine-grained way. Another difference is that our algorithms support real-time properties, which is not addressed by AMUN.

3 The OSA+ Middleware

OSA+ is a service oriented middleware for distributed real-time systems [7]. It provides a uniform view over a heterogeneous network, protocols and OS features and simplifies distributed application development. In OSA+, the active communication parts are services. A service realizes certain functionalities which are made public to the execution environment through an interface. This interface can be accessed in a platform and language independent manner. In our case, the service interface is accessed through jobs. A job consists of an order and a result. The order is sent from one service to another to state what functionality the service should perform, for which data, and when the action should be performed. After the service has executed the order, a result is sent back. The communication of jobs is accomplished by a platform. The platform facilitates the plugging of services, which can then communicate with each other. An important aspect of the OSA+ middleware is the flexible way in which it adapts to different environments. In this respect, OSA+ uses the micro-kernel concept. The core of the middleware, the micro-kernel, has a minimal foundation of functionalities and is independent of the execution environment (no hardware or operating system dependent parts).
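To make the job concept more concrete, the following Java sketch models a job as the combination of an order and a result; all class and method names are illustrative assumptions of this sketch and do not reflect the actual OSA+ API.

```java
// Hypothetical sketch of the job concept described above.
// The names used here are illustrative and are not the actual OSA+ interfaces.

/** An order states what a service should do, for which data, and by when. */
record Order(String functionality, Object data, long deadlineMillis) {}

/** The result returned by the service after executing the order. */
record Result(boolean success, Object payload) {}

/** A job couples an order with the result returned for it. */
class Job {
    private final Order order;
    private Result result;

    Job(Order order) { this.order = order; }

    Order getOrder() { return order; }
    void setResult(Result result) { this.result = result; }
    Result getResult() { return result; }
}

/** A service makes its functionality public through an interface accessed via jobs. */
interface Service {
    Result execute(Order order);
}
```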

The adaptation of the middleware is done by a set of special services provided by the developer, which extend the functionality of the core. These services are plugged into the platform according to the user's needs and realize tasks like the MemoryService, which allocates memory for jobs and services at run-time by accessing the memory management, and the ProcessService, which introduces the multi-tasking and multi-threading facilities of the hardware and operating system to the middleware. Besides, the communication services make use of the available communication systems, e.g. TCP/IP, serial, etc. These services provide transparent communication between different hardware platforms. The EventService handles timer events or events caused by other hardware components. It is used as a monitoring tool and signals if jobs cannot meet their time constraints. Another aspect of the OSA+ middleware is that it is designed to offer real-time support and introduces only a small overhead at run-time, which is acceptable for most applications [7].

4 Execution Paths

In [6], we described the idea of execution paths in detail. The idea is that missions are split into sequences of atomic jobs which can directly be executed by services of the middleware.¹ Splitting the mission introduces some dependencies to be considered: if we look at the mission "Bring a spare part from place A to place B", the resulting sequence of atomic jobs is as follows:

"Drive(A)"
"PickUp(grab, sparepart)"
"Drive(B)"
"PickUp(unload, sparepart)"

Considering this set of atomic jobs, we notice that the jobs have to be executed sequentially, because it is useless to drive to B and unload the spare part before having picked it up at position A. We also notice that the jobs have to be executed by the same robot, since it is also useless if, e.g., one robot drives to A and picks up the spare part while another robot drives to B and tries to unload the spare part now located on the first robot. Generalizing this example, we identify three kinds of dependencies between atomic jobs. Let A and B be atomic jobs and let X and Z be a service and a resource, respectively, which are able to execute A and B. There is a

1. temporal dependency between A and B if and only if B has to be executed after A has finished.
2. service dependency between A and B if and only if A and B have to be executed by the same service. This means if A is executed by X, then B has to be executed by X, too.
3. resource dependency between A and B if and only if A and B have to be executed on the same resource. This means if A is executed by any service running on resource Z, then B has to be executed by any service running on Z, too.

The temporal and the resource dependencies are motivated and explained in the example. The service dependency is used, e.g., for database accesses: if a job stores a value in the database of a certain service, the value can only be read by another job by accessing the same database service. Knowing the dependencies, we define an execution path as follows: an execution path is a finite set of atomic jobs which have to be executed in a certain order by one resource. The execution of an execution path is not interruptible, and after finishing the execution path, the resource has to be in an initial state. This definition is very intuitive as it follows the idea of forming missions from atomic jobs. It includes the temporal dependencies given by the "certain order", and it also includes the resource dependencies since the atomic jobs of an execution path are required to be executed by one resource.

¹ In fact, the concept is more general. It can be used to split real-world tasks/missions into sequences of jobs to be executed by humans or computers and not only by a middleware, see [6].

Fig. 1. A mission is partitioned into execution paths

By the initial state (in the definition), we mean a pre-defined state of the resource. This requirement was introduced in order to ease the scheduling of the Autonomic Manager. If the robot has, e.g., a mechanical arm, the initial state of the robot is that the arm is in a zero-position. The requirement eases the scheduling of the execution paths because the Autonomic Manager does not have to include this information in its scheduling decision. The additional specification that the execution of an execution path is not interruptible means that once an execution path has started, it cannot be interrupted by another execution path (it forms a logical unit like a transaction). The reason is that a resource or a service might be in a state different from the initial state while executing the execution path. Therefore, it is neither guaranteed that the other execution path can be started nor that the first execution path can be resumed after the interruption. A mission given by the user consists of one or several execution paths, see fig. 1. We assume as a precondition that missions are organized in such a way that there are no dependencies between different missions.

However, it is possible that there are dependencies between different execution paths of a mission, see also fig. 1. In [6], we described the problems arising from dependencies between different execution paths and presented several ways to handle them. In the following sections, we describe a scheme for scheduling the execution paths to resources and services, respectively. The scheduling is able to meet real-time constraints and covers the above-mentioned organic properties.
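To make the concepts of this section more concrete, the following Java sketch models atomic jobs, dependencies, execution paths and missions as simple data structures; all names, and the use of plain strings as identifiers, are illustrative assumptions and not part of the middleware.

```java
import java.util.List;

// Minimal model of the execution-path concept (illustrative names, not middleware API).

/** The three kinds of dependencies between atomic jobs identified above. */
enum DependencyKind { TEMPORAL, SERVICE, RESOURCE }

/** A dependency of one atomic job on another. */
record Dependency(String fromJobId, String toJobId, DependencyKind kind) {}

/** An atomic job that can directly be executed by a middleware service. */
record AtomicJob(String id, String instruction) {}

/**
 * An execution path: a finite sequence of atomic jobs executed in order by one
 * resource, non-interruptible, leaving the resource in its initial state afterwards.
 */
record ExecutionPath(String id, List<AtomicJob> jobs) {}

/** A mission consists of one or several execution paths and their dependencies. */
record Mission(String id, List<ExecutionPath> executionPaths, List<Dependency> dependencies) {}

class ExecutionPathExample {
    public static void main(String[] args) {
        // The spare-part transport mission from the example above as one execution path.
        ExecutionPath transport = new ExecutionPath("ep1", List.of(
                new AtomicJob("j1", "Drive(A)"),
                new AtomicJob("j2", "PickUp(grab, sparepart)"),
                new AtomicJob("j3", "Drive(B)"),
                new AtomicJob("j4", "PickUp(unload, sparepart)")));
        Mission mission = new Mission("m1", List.of(transport), List.of());
        System.out.println(mission);
    }
}
```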

5 Assumptions for the Organic Middleware

In our scenario, we assume that the user decomposes the missions into execution paths. The basic idea of the organic middleware is that the assignment of execution paths to resources and services is done by the middleware autonomously. This introduces self-configuration, self-optimization due to the reassignment of execution paths under changing conditions, and self-healing due to the reassignment of execution paths in case of a resource failure. This is done by the Autonomic Manager (AM), see fig. 2. As there is an AM running on each resource, one master is elected by a decentralized master election [1].

Fig. 2. Mission scheduling by the Autonomic Manager

Additionally, several parameters are assigned to the resources, which describe their important properties. Since the parameters are application dependent, the user has to define them. Nevertheless, we can identify two categories of parameters:

– static parameters and
– dynamic parameters

Static parameters are all parameters which do not vary over time. These parameters describe features of a resource, e.g. whether a resource is able to move or to grab an object. In contrast to static parameters, dynamic parameters can vary. Parameters like the amount of power left in the battery of a robot or the number of items in the cargo area are examples of dynamic parameters. During the execution of jobs/execution paths on the resources, these dynamic parameters may vary, e.g. due to the power consumption or the charging of the resources.

6 A Scheme for the Parameter Prediction

To be able to choose between all possible assignments of the execution paths to the resources, it is necessary to know the development of all the dynamic parameters. Therefore, we need to predict the variation of the dynamic parameters if a resource executes a job or an execution path, respectively. If we know the parameter prediction for the different assignments of execution paths, we can choose the best one. Since the middleware does not know the semantics of a job, some information has to be given by the application. For parameter prediction, three types can be distinguished:

Relative value given by the application. For some jobs, the variation of a parameter is directly given by the job. We explain this by an example: Let us consider the dynamic parameter "number of spare parts on the robot". If there is a job "Load(sparepart, 3)", which means that three spare parts have to be loaded onto the robot, the predicted parameter value is increased by 3. In this case, the application has to give the information about the modification of the parameter.

Absolute value given by the application. For some other jobs, the absolute value of a parameter after executing the job is given by the application. Let us consider the dynamic parameter "distance to target" and the job "Drive(target)". If a robot executes this job, the distance to the target is predicted to be 0 afterwards. Here, the application has to give the information about the new value of the parameter.

Value calculated by the middleware. For most jobs, the AM can predict the parameter variation by the following equation:

$$ r_{new} = r_{old} - (M \cdot c_{job} + d_{job}) $$

Here, r_new and r_old are n-dimensional vectors containing the values of the n dynamic parameters to be considered in this scenario. r_old are the parameter values before and r_new are the parameter values after executing the job. The modification of the parameter values is predicted by the term M · c_job + d_job, where M is an n × n matrix containing the connections between the parameters. This matrix is universal for the whole scenario but can be modified by the user for each job if necessary (see next section). The vector c_job depends on the job and specifies the parameters necessary to predict the dynamic parameter variation. d_job is a constant vector also defined for this job. Notice that we assume the parameter variation to be calculated linearly.
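As an illustration of the linear prediction, the following Java sketch evaluates r_new = r_old - (M · c_job + d_job) for a small example with two assumed dynamic parameters (battery power and free cargo slots); the matrix and vector values are invented for illustration only.

```java
// Sketch of the linear parameter prediction r_new = r_old - (M * c_job + d_job).
// Parameter names, matrix and vector values are illustrative assumptions.
class ParameterPrediction {

    /** Predicts the dynamic parameter vector after executing a job. */
    static double[] predict(double[] rOld, double[][] m, double[] cJob, double[] dJob) {
        int n = rOld.length;
        double[] rNew = new double[n];
        for (int i = 0; i < n; i++) {
            double modification = dJob[i];
            for (int j = 0; j < n; j++) {
                modification += m[i][j] * cJob[j];   // (M * c_job)_i
            }
            rNew[i] = rOld[i] - modification;
        }
        return rNew;
    }

    public static void main(String[] args) {
        // Two dynamic parameters: battery power and free cargo slots (illustrative).
        double[] rOld = {100.0, 5.0};
        double[][] m  = {{0.2, 0.0},    // power consumed per distance unit driven
                         {0.0, 0.0}};
        double[] cJob = {40.0, 0.0};    // the job drives 40 distance units
        double[] dJob = {1.0, 1.0};     // constant cost: 1 power unit, 1 cargo slot
        double[] rNew = predict(rOld, m, cJob, dJob);
        System.out.printf("power: %.1f, free cargo slots: %.1f%n", rNew[0], rNew[1]);
    }
}
```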

7 The Calling Scheme

The mission, and thus its set of execution paths, is sent to the AM as an XML file which is structured as shown in fig. 3. Its structure is largely self-explanatory, therefore we only describe some of the elements in detail.

The root element is the mission element, which contains at least one execution path. Each execution path contains at least one job, which has four sub-elements: restrictions, weightings, dependencies and instruction. In the restrictions, the user is able to specify minimum or maximum values of parameters, e.g. the minimum power level needed to execute a job. The weightings contain job or execution path specific modifications of the matrix M. The dependencies mentioned in section 4 are included in the dependencies element. Finally, the instruction element contains an instruction along with a timeout and an optional quality of service value. Figure 3 illustrates the format of the XML file.
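As a concrete illustration of this structure, the following fragment sketches what a mission file might look like; the exact element and attribute spellings as well as all values are assumptions made for this example and do not constitute a normative schema.

```xml
<!-- Illustrative sketch of a mission file; element/attribute names are assumed. -->
<mission id="m1">
  <executionpath id="ep1">
    <job id="j1">
      <restrictions>
        <restriction name="power" min="20"/>
      </restrictions>
      <weightings>
        <weighting name="power" value="0.5"/>
      </weightings>
      <dependencies>
        <dependence executionpathid="ep2" jobid="j3"/>
      </dependencies>
      <instruction id="Drive(A)" timeout="5000" qos="high"/>
    </job>
  </executionpath>
</mission>
```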

Fig. 3. Tree illustrating the XML format

8 The Scheduling of the Autonomic Manager

In this section, we explain the scheduling algorithm of the AM. The AM first performs a parameter prediction as described in section 6 for all possible assignments of execution paths to resources and services. Afterwards, it compares the results by means of a relation and chooses the best assignment according to this relation.

The first step of scheduling - the parameter prediction. In the first step, the AM builds up a prediction tree for a mission. The root of this prediction tree is the current status of each resource; the current status of a resource consists of the current values of all of its dynamic parameters. These values are included in a resource vector. From this starting point, the AM begins to predictively schedule the first execution path - the one with the highest priority - to the different robots. For this purpose, the AM checks whether the jobs of the execution path can be executed by the services of these resources (this is done by checking the static parameters). If this is possible, the AM predicts the parameter modifications according to section 6 and creates a new leaf in the prediction tree for each possible assignment of the execution path. Each leaf contains the modified values of the dynamic parameters according to the predicted assignment. These modified values are also represented by a modified resource vector. In this step, the AM also checks whether the restrictions (mentioned in section 7) can be met. If they can be met, the branch is continued; otherwise, the branch is not explored for the next execution paths. After finishing the predictive assignment of the first execution path, the AM starts to assign the second execution path in the same way as the first one. The main difference is that this prediction now starts from the leaves of the first prediction, see fig. 4.
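A simplified Java sketch of this first step is given below: starting from the current resource vectors, each execution path is tentatively assigned to every resource that can execute it and whose restrictions are met, and one child node is created per feasible assignment. The Predictor interface stands in for the static-parameter check, the restriction check and the parameter prediction of section 6; all names are assumptions of this sketch.

```java
import java.util.List;
import java.util.Map;

// Simplified sketch of the prediction-tree construction (first scheduling step).
class PredictionTree {

    record Node(Map<String, double[]> resourceVectors,   // dynamic parameters per resource
                Map<String, String> assignment,          // execution path -> resource
                List<Node> children) {}

    static void expand(Node node, List<String> remainingPaths,
                       List<String> resources, Predictor p) {
        if (remainingPaths.isEmpty()) return;             // leaf: all paths assigned
        String path = remainingPaths.get(0);              // highest-priority path first
        List<String> rest = remainingPaths.subList(1, remainingPaths.size());
        for (String res : resources) {
            if (!p.canExecute(path, res)) continue;       // static parameter check
            double[] predicted = p.predict(path, node.resourceVectors().get(res), res);
            if (!p.meetsRestrictions(path, predicted)) continue;  // prune infeasible branch
            Node child = p.childNode(node, path, res, predicted);
            node.children().add(child);
            expand(child, rest, resources, p);            // assign the next execution path
        }
    }

    /** Placeholder for the application/middleware-specific checks and prediction. */
    interface Predictor {
        boolean canExecute(String executionPath, String resource);
        double[] predict(String executionPath, double[] rOld, String resource);
        boolean meetsRestrictions(String executionPath, double[] predicted);
        Node childNode(Node parent, String executionPath, String resource, double[] predicted);
    }
}
```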

Fig. 4. A prediction tree assigning 3 execution paths to 4 resources

This procedure is repeated for all of the mission's execution paths, and as a result of the first step we get a complete prediction tree.

The second step of scheduling - choice of the best leaf. In the second step, the AM has to choose the best leaf and thus the best assignment of execution paths according to the dynamic parameters. Since the AM has to compare the leaves, it combines the parameter vectors of all resources of a leaf into the following combination vector m_leaf:

$$ m_{leaf} := \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_n \end{pmatrix} = \begin{pmatrix} \min\{r_1^{res_1}, r_1^{res_2}, \ldots, r_1^{res_k}\} \\ \min\{r_2^{res_1}, r_2^{res_2}, \ldots, r_2^{res_k}\} \\ \vdots \\ \min\{r_n^{res_1}, r_n^{res_2}, \ldots, r_n^{res_k}\} \end{pmatrix} $$

Each of its entries contains the minimum of the values of one parameter over all robots of a leaf, e.g. the power level. This scheme guarantees a minimum parameter value for each robot of a leaf, calculated in the combination vector, and thus presents a lower bound for the parameters. Using the combination vector of each leaf, the AM compares the leaves to find the best suited assignment of execution paths.
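A minimal Java sketch of this combination step, assuming the resource vectors of a leaf are given as the rows of a matrix:

```java
// Sketch: combine the resource vectors of a leaf into the combination vector m_leaf
// by taking, per parameter, the minimum value over all resources of that leaf.
class CombinationVector {

    static double[] combine(double[][] resourceVectors) {   // [resource][parameter]
        int n = resourceVectors[0].length;
        double[] m = new double[n];
        for (int i = 0; i < n; i++) {
            double min = Double.POSITIVE_INFINITY;
            for (double[] r : resourceVectors) {
                min = Math.min(min, r[i]);
            }
            m[i] = min;                                      // lower bound for parameter i
        }
        return m;
    }
}
```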

We implemented two different strategies to compare the combination vectors:

Comparison by vector length. When using this strategy, the AM compares the lengths of the different leaves' combination vectors. It calculates the length of a combination vector m_leaf by the following formula:

$$ \|m_{leaf}\|_1 = \sum_{i=1}^{n} |m_i| $$

After computing the lengths of the different combination vectors, the AM compares them using the "≤" relation on the real numbers and selects the vector with maximum length. If there are several vectors with the same maximal length, the leaf which comes first in the tree is chosen. This comparison strategy can be used if all parameters have the same priority or if the application has no specific information about the priorities of the parameters.

Comparison by priority. Another strategy is to order the combination vectors by priority. Let us consider the two combination vectors m_leaf and v_leaf of two different leaves:

$$ m_{leaf} = \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_n \end{pmatrix} \quad \text{and} \quad v_{leaf} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} $$

Then m_leaf ≤_P v_leaf if and only if

$$ m_1 < v_1 \;\lor\; (m_1 = v_1 \land m_2 < v_2) \;\lor\; \ldots \;\lor\; (m_1 = v_1 \land \ldots \land m_{n-1} = v_{n-1} \land m_n < v_n) \;\lor\; (m_1 = v_1 \land \ldots \land m_{n-1} = v_{n-1} \land m_n = v_n) $$

This means that the vector with the higher value in the first row is greater than the other vector. If the first value is the same in both vectors, the values in the second row are decisive, and so on. This comparison is useful if the application knows exactly which parameters are the most important ones. It is more restrictive than the first strategy and may allow a better scheduling if the parameters are suited for it.

Organic and real-time properties of the scheduling strategy. The presented scheduling assigns the execution paths to the services and resources autonomously; thus, it is self-configuring. As the current parameter values of the resources are continuously refreshed by a monitoring unit and included in the scheduling decision, the scheduling is also self-optimizing. It is furthermore self-healing, since the dependent execution paths of a mission are rescheduled if a resource fails. The scheduling is also able to meet real-time constraints because the creation of the tree is interruptible, which means that the AM is able to interrupt the tree computation at any point of time (if necessary). It then uses the existing tree to choose the best assignment of execution paths and can use the execution time of the robots to complete the scheduling.
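The two comparison strategies described above can be expressed as comparators over combination vectors: the first compares the sum of the absolute parameter values (the vector length used above), the second compares the parameters lexicographically in priority order. A minimal Java sketch, with the AM assumed to pick the leaf whose vector is maximal under the chosen comparator:

```java
import java.util.Comparator;

// Sketch of the two strategies for comparing combination vectors of leaves.
class LeafComparators {

    /** Strategy 1: compare by vector length, i.e. the sum of absolute parameter values. */
    static final Comparator<double[]> BY_LENGTH =
            Comparator.comparingDouble(LeafComparators::length);

    static double length(double[] m) {
        double sum = 0.0;
        for (double v : m) sum += Math.abs(v);
        return sum;
    }

    /** Strategy 2: compare by priority, i.e. lexicographically over the ordered parameters. */
    static final Comparator<double[]> BY_PRIORITY = (m, v) -> {
        for (int i = 0; i < m.length; i++) {
            int c = Double.compare(m[i], v[i]);
            if (c != 0) return c;                 // the first differing parameter decides
        }
        return 0;                                 // vectors are equal
    };
}
```

On ties under the length comparator, the first such leaf in the tree would be kept, as described above.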

9 Conclusion and Further Work

In this paper, we presented a new idea for realizing a real-time dependable organic middleware. We introduced the idea of execution paths and a real-time scheduling scheme by which the execution paths are assigned to services and resources, respectively. The scheduling scheme includes organic features like self-configuration, self-optimization and self-healing. We are implementing the presented middleware in the SIMON project at the University of Karlsruhe. As future work, we plan to evaluate the scheduling scheme in detail. In particular, we have to categorize the different kinds of parameters in order to derive rules for setting the values in the parameter matrix such that the factory automation scenario is modeled as well as possible.

Acknowledgment. The SIMON project is funded by the Landesstiftung Baden-Wuerttemberg.

References

1. Richard John Anthony, "Emergence: a Paradigm for Robust and Scalable Distributed Applications", Proceedings of the International Conference on Autonomic Computing (ICAC'04), 2004
2. A. Bechina, U. Brinkschulte, F. Picioroaga and E. Schneider, "OSA+ Real-Time Middleware. Results and Perspectives", International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), Vienna, Austria, 2004
3. C. Buschmann, S. Fischer and N. Luttenberger, "Middleware for Swarm-like Collections of Devices", IEEE Pervasive Computing Magazine, Vol. 2, No. 4, 2003
4. IBM, Autonomic Computing, http://www.research.ibm.com/autonomic/
5. J. O. Kephart and D. M. Chess, "The Vision of Autonomic Computing", IEEE Computer, pp. 41-50, 2003
6. Mathias Pacher, Alexander von Renteln and Uwe Brinkschulte, "Towards an Organic Middleware for Real-Time Applications", ISORC 2006, Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing, Korea, 2006
7. Florentin Picioroaga, "Scalable and Efficient Middleware for Real-time Embedded Systems. A Uniform Open Service Oriented Microkernel Based Architecture", PhD thesis, Strasbourg, 2004
8. Wolfgang Trumler, Jan Petzold, Faruk Bagci and Theo Ungerer, "AMUN - An Autonomic Middleware for the Smart Doorplate Project", UbiSys '04 - System Support for Ubiquitous Computing Workshop at the Sixth Annual Conference on Ubiquitous Computing, 2004
9. The SIMON project, University of Karlsruhe (TH), http://simon.ira.uka.de
10. Object Management Group: The Common Object Request Broker: Architecture and Specification. Revision 3.0, July 2002
11. Sun Microsystems: Java Remote Method Invocation Specification. Revision 1.8, 2002, http://java.sun.com/j2se/1.4/docs/guide/rmi/
12. G. Eddon and H. Eddon, "Inside Distributed COM", Microsoft Press, 1998
13. Microsoft Corporation, The .NET Framework, http://www.microsoft.com/net/default.mspx
14. VDE/ITG (editor), "VDE/ITG/GI-Positionspapier Organic Computing: Computer- und Systemarchitektur im Jahr 2010", GI, ITG, VDE, 2003
15. Randall Whitaker, "Self-Organization, Autopoiesis, and Enterprises", http://www.acm.org/sigs/sigois/auto/Main.html