Distributed Discrete Event Simulation: Optimistic Protocols With Probabilistic Time Window Adaptation (Summary)

Johannes Lüthi

Diploma thesis (Diplomarbeit) submitted for the academic degree of Magister rerum naturalium at the Faculty of Formal and Natural Sciences of the Universität Wien. Reviewer: Univ. Prof. Dr. Günter Haring. Supervision and preliminary review: Dr. Alois Ferscha.

1 Introduction

There are multiple motivations to use computer simulation. Many models of physical systems are of a complexity hindering analytical evaluation. Often, experiments in physics are not feasible because they would be too dangerous or too expensive. Sometimes, such experiments are simply impossible because they would take too much time, or because (e.g. for weather prediction or astrophysical 'experiments') the whole world would have to be the laboratory. The use of simulation for decision making in competitive business enables cheap and risk-free trial-and-error experiments trying various parameters for e.g. workflow organization or resource management [Thes 89, Thes 90, Kivi 91]. But of course, using simulation for experiments, prediction methods, or decision making is not without time-consuming challenges itself. First, it requires great skill, concerning both knowledge about the simulated model and experimental techniques, to gain advantages from simulation. Among the factors that play an important role for the successful application of simulation, Page and Nance identify the following [Page 94]: (i) an adequate understanding of the problem to be solved, (ii) a correct model, (iii) a correct program, (iv) experimental design, and (v) interpretation of results. Second, simulation experiments can take exceedingly long to execute. As many simulations imply partially random behavior, many simulation runs may be necessary for statistical reasons to obtain experimental results with sufficiently small confidence intervals. Furthermore, many simulation experiments are performed with the objective to search large parameter spaces by doing multiple simulation runs with every parameter constellation. But even single simulation runs for complex models may exceed the time and memory resources of contemporary computers. One approach to overcome simulation problems concerning computer resource restrictions is to use computer systems with multiple processors operating in parallel. For the execution of multiple simulation runs with varying input parameters for one simulation model, the use of multiprocessor computers is straightforward. But also concerning single simulation runs, for many models consisting of sufficiently divisible components operating in parallel in the real system (e.g. machines in a factory, airplanes in air traffic simulation, components of a computer network), it should be possible to exploit this inherent model parallelism to make the use of multiprocessor environments effective. This kind of model decomposition (distributed discrete event simulation, DDES) directly leads to the concept of logical processes (LPs) in parallel simulation. However, the parallel execution of logical processes that depend on each other poses a nontrivial problem for DDES: the synchronization of logical processes being simulated on distinct processors. Synchronization mechanisms have to be employed to uphold the causality principles applying in sequential simulation (and in the real world, too). Two main streams of synchronization protocols for DDES have been established: (i) the conservative simulation protocol [Chan 79, Chan 81] and (ii) the optimistic or time warp simulation protocol [Jeff 85b]. Both are subject to certain advantages and disadvantages, and for both, many enhancements and optimization techniques have been developed. One technique to combine the advantages of both the conservative and the optimistic protocols is the use of optimistic simulation with time windows.
Two new algorithms employing probabilistic time window adaptation are presented in this work. The thesis is organized as follows: Chapter 1 gives an introduction to the problem and an overview of the thesis. In Chapter 2, the technique of discrete event simulation (DES) is explained as opposed to continuous simulation. The various simulation model decomposition possibilities for parallel DES (PDES) are presented, and the decomposition definitions for DDES and the problems concerning causality constraints in DDES are discussed. In Chapter 3, the two established synchronization protocols, the conservative (Section 3.1) and the optimistic (Section 3.2) protocols, are presented, including several optimization techniques. Particularly detailed consideration is given to approaches for optimistic simulation employing time windows, such as moving time windows [Soko 88, Soko 89, Soko 90], filtered rollback [Luba 89a, Luba 91], adaptive time warp (ATW) [Ball 90], and the local adaptive protocol [Hamn 94], as this is the previous work most closely related to our newly proposed approaches. At the end of this chapter, some additional research activities (such as specialized hardware for optimistic simulation) and ideas (e.g. DDES using time parallelism) are discussed. In Chapter 4, the notion of probabilistic distributed simulation employing self-adaptive LPs as proposed by Ferscha and Chiola [Fers 94b] is presented. The concept of a decision function (DF) for probabilistic simulation is defined. Two concrete DFs are proposed in Chapters 5 and 6. The first one uses an estimate for the distribution of virtual time increments belonging to messages of the message arrival process. Two variants of this approach are proposed. The second DF constructs a probabilistic cost expectation function (PCEF) for a rollback cost model employed by the DF. The objective of the pertinent synchronization protocol is to combine the conservative and the optimistic protocols in such a way that this PCEF is minimized. In Chapter 7, the performance of these two protocols, in comparison to the original optimistic synchronization protocol with unlimited optimism, on the Intel iPSC/860 multiprocessor environment is presented in the context of simulation of generalized stochastic Petri nets (GSPNs) with several example Petri nets (Appendix A gives a short introduction to the notion of Petri nets and also presents the examples used for testing purposes). Conclusions are drawn in Chapter 8. Appendix B explains some additional technical details about the implementation of the experimental programs. In the following, we summarize each chapter of the thesis (excluding the introduction) in a separate section, using the titles of the original chapters.

2 Distributed Discrete Event Simulation

2.1 Discrete Event Simulation

Most simulation studies model the changes of a real system over time. We refer to the simulated time as virtual time. As opposed to continuous simulation, in discrete event simulation state changes of the simulated system are assumed to happen at discrete points of the virtual time and are thus controlled by discontinuous functions, resulting in so-called events (e1, e2, ...). Usually a virtual time stamp t_e is associated with an event e, denoting the virtual time at which e "happens". The occurrence of an event typically causes four actions: (a) progression of the virtual time to the timestamp of the simulated event, (b) changes of the state of the simulated system, (c) scheduling of new events, and (d) descheduling of other events. Thus, the basic data structure of a DES program consists of:

- A virtual simulation clock VT.
- A timestamp-ordered list of pending (scheduled) events: the event list.
- The state variables.

The simulation of an event consists of the following steps (a code sketch of this loop is given at the end of this subsection):

1. Determine the first event e_i (the event with the smallest timestamp) in the event list.
2. Set the virtual clock VT to the timestamp t_{e_i} of e_i.
3. Change the state variables according to the effects of e_i.
4. Remove e_i from the event list.
5. Possibly schedule new events and insert them into the event list.
6. Possibly deschedule previously scheduled events and delete them from the event list.

As no event is allowed to have an impact on any event in the past, always simulating the event with the smallest timestamp guarantees causality in sequential DES. These simulation steps are repeated until either no scheduled events remain in the event list, or the timestamp t_{e_f} of the first event e_f in the event list satisfies t_{e_f} > VT_end, where VT_end is the virtual termination time of the simulation. Examples for DES are air traffic control, military battlefield simulations, or performance analysis of computer systems and parallel programs.
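A minimal Python sketch of this loop (illustrative, not code from the thesis; event actions are modeled as callables returning newly scheduled events, and descheduling, step 6, is omitted for brevity):

    import heapq

    class Event:
        """A scheduled event: a virtual timestamp plus an action to execute."""
        def __init__(self, timestamp, action):
            self.timestamp = timestamp
            self.action = action              # callable: action(state) -> new events
        def __lt__(self, other):              # heap ordering by virtual time
            return self.timestamp < other.timestamp

    def run_des(initial_events, state, vt_end):
        """Sequential DES main loop: always simulate the event with the
        smallest timestamp until the event list is empty or VT exceeds vt_end."""
        event_list = list(initial_events)
        heapq.heapify(event_list)             # the timestamp-ordered event list
        vt = 0.0                              # the virtual simulation clock VT
        while event_list and event_list[0].timestamp <= vt_end:
            event = heapq.heappop(event_list)          # steps 1 and 4
            vt = event.timestamp                       # step 2
            for new_event in event.action(state):      # steps 3 and 5
                heapq.heappush(event_list, new_event)
        return vt, state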

2.2 Parallel Discrete Event Simulation

DES of large models can be enormously time- as well as memory-consuming. As there seems to be a considerable amount of inherent parallelism in most DES models [Righ 89, Lin 92], it is a challenging task to find this parallelism and use it to efficiently implement DES on multiprocessor computer systems and thus increase the performance of DES runs. Methods to use multiprocessor architectures for DES are summarized as parallel discrete event simulation (PDES).

Figure 1. Logical processes for DDES.

There are multiple approaches how parallelization can be applied to DES implementations [Kaud 87, Righ 89], depending on the level at which the sequential simulation algorithm is parallelized:

- A parallelizing compiler can be used.
- Independent simulation runs can be performed on separate processors.
- Subroutines and functions can be computed on separate processors.
- A main processor can maintain a global event list and employ the other processors to simulate single events.
- The simulation model can be decomposed into submodels that are simulated on distinct processors.
- Any combination of these approaches can be applied.

In the thesis, there is a short discussion of each of these possibilities. In this summary we focus on parallelism at the model function level, i.e. model decomposition, referred to as distributed discrete event simulation (DDES); this is the main focus of the thesis.

2.3 Distributed Discrete Event Simulation

In DDES, the simulation model is partitioned into regions (only spatial decomposition is considered here; time-regions are briefly discussed in Section 3.3.3 of the thesis). Each region is simulated by a so-called logical process (LP). As depicted in Figure 1, each LP LP_i consists of [Fers 95]:

- A spatial region R_i of the simulated system.
- A simulation engine SE_i, executing the events belonging to the region R_i.
- A communication interface, enabling the LPs to send messages to and receive messages from other LPs.

These LPs are mapped onto distinct processors with (as an assumption) no common memory. Thus, every LP can only access a subset of the state variables S_i ⊆ S, disjoint from the state variables assigned to the other LPs. The simulation engine SE_i of each LP LP_i processes two kinds of events: internal events, which have no direct causal impact on the state variables held in other LPs, and external events, which have to change the state variables in one or more other LPs. If an external event is processed, the LP holding the state variables that are to be changed is informed through a message sent by the CI of LP_i. The message routing between the LPs is done by a communication system connecting the LPs. Incoming messages are stored in input queues IQ_{i,j}, one for each sending process (a minimal code sketch of this structure is given at the end of this subsection). Due to different virtual time progression within the various LPs, the causality principle is difficult to guarantee, and special considerations have to be made to obtain the same simulation results from DDES as from sequential DES. Subsection 2.3.2 of the thesis presents a simple example that shows the need for synchronization mechanisms in DDES. The two most commonly used synchronization protocols in DDES are the following:

- The conservative (or Chandy-Misra) synchronization protocol developed by Chandy and Misra [Chan 79, Chan 81].
- The optimistic (or time warp) simulation protocol based on the virtual time paradigm proposed by Jefferson [Jeff 85b].
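A minimal sketch of this LP structure in Python (a reading aid under assumed names, not an implementation prescribed by the thesis):

    from collections import deque
    from dataclasses import dataclass, field

    @dataclass
    class Message:
        """An external event exchanged between LPs, carrying a VT timestamp."""
        timestamp: float
        sender: int
        payload: object = None

    @dataclass
    class LogicalProcess:
        """An LP: a spatial region's state, a simulation engine's event list,
        and a communication interface with one input queue per sending LP."""
        lp_id: int
        state: dict = field(default_factory=dict)          # S_i: local state variables
        event_list: list = field(default_factory=list)     # pending internal events
        input_queues: dict = field(default_factory=dict)   # IQ_{i,j}, keyed by sender j
        vt: float = 0.0                                    # local virtual clock

        def receive(self, msg: Message):
            """Communication interface: queue an incoming message by sender."""
            self.input_queues.setdefault(msg.sender, deque()).append(msg)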

3 Established Synchronization Protocols

3.1 The Conservative Protocol

The basic idea of the conservative simulation protocol is to strictly avoid the occurrence of causality violations. This is done by only allowing an event e with VT t_e to be processed if it can be guaranteed that no message with a VT lower than t_e will be received in the future. Under the assumption of FIFO (first-in, first-out) message transport, this is achieved by only simulating an event if its VT is lower than the minimum of the timestamps of all events in all input queues. A more detailed formal description of the corresponding implementation rules can be found in the thesis. A serious problem arising in conservative simulation is the possibility of deadlocks [Holt 72] (an example of a deadlock situation is depicted in Figure 3.1 of the thesis). Two deadlock resolution schemes, deadlock avoidance via the use of NULL-messages [Chan 79] and deadlock detection and recovery [Chan 81], are discussed in the thesis. In Section 3.1.2, several optimization techniques for the conservative simulation protocol are discussed, such as the carrier NULL-message approach [Cai 90, Wood 94], NULL messages on request [Bain 88], lookahead computation [Grov 88, Cota 90], and local deadlock detection [Prak 88, Ruko 91].

Figure 2. (a) Data structure for optimistic simulation, (b) message processing in optimistic simulation.
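A sketch of the safety check described above, reusing the LogicalProcess sketch from Section 2.3 (NULL-message handling is omitted; note that an empty input queue forces blocking, which is exactly where the deadlocks discussed above originate):

    def safe_to_process(lp):
        """Conservative rule: the next local event may be simulated only if no
        input channel can still deliver a message with a lower timestamp."""
        if not lp.event_list:
            return False
        if any(not q for q in lp.input_queues.values()):
            return False                  # an empty queue gives no guarantee: block
        next_event_vt = min(e.timestamp for e in lp.event_list)
        # With FIFO channels, the head of each input queue bounds all future arrivals.
        channel_bound = min(q[0].timestamp for q in lp.input_queues.values())
        return next_event_vt < channel_bound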

3.2 The Optimistic Protocol

In contrast to the conservative protocol, there is no blocking mechanism in the optimistic protocol. An event is simulated even if it is not safe to process. Thus, causality errors are allowed to occur, but they have to be detected and corrected. To guarantee causality, a mechanism called time warp or rollback is implemented. At a rollback, the VT of the LP LP_rb that receives a message from its virtual past (i.e. t_m < VT_LP) is set back to the timestamp t_m of that message. Such a message, causing a time warp, is called a straggler message. Additionally, the effects of all events e with timestamp t_e > t_m that have been processed in LP_rb have to be undone. These effects include (a) changes of the state variables, (b) changes of the event list, and (c) messages sent to other LPs. To allow recovery of the state variables and the event list, a copy of these is stored after the processing of every event. Sent messages are canceled by sending a corresponding anti-message for every sent message with timestamp greater than t_m (a code sketch of this rollback mechanism is given at the end of this subsection). To enable these corrections, the data structure for optimistic simulation has to be extended as depicted in Figure 2a. The processing of incoming messages and anti-messages with respect to the VT of the message is illustrated in Figure 2b. The main problems of optimistic simulation are (a) the memory cost, because the simulation state has to be stored frequently to enable time warp, (b) global termination detection, and (c) rollback cascades, because sent anti-messages may cause rollbacks in other processes and so forth. Thus, in Section 3.2.2, several optimization considerations are discussed. This includes enhancements in memory management, such as considerations on the state saving frequency [Clea 94, Ronn 94], fossil collection, cancelback protocols [Jeff 85b, Gafn 88], and artificial rollback [Lin 91c]; techniques for global virtual time computation [Chan 82, Chan 85, Bell 90, Bald 91, Matt 93, DSou 94]; techniques to reduce rollback cascades, such as filters [Prak 91], WOLF calls [Madi 88, Madi 89], or hierarchical rollback [Gima 89]; and techniques to avoid multiple computation of the same results due to rollbacks, such as lazy cancellation [Gafn 88] and lazy re-evaluation [Fuji 90a]. Special attention is paid to optimistic protocols with time windows (Section 3.2.3). Basically, these are techniques to bound the local VT progression, leading to limited optimism. The strategies discussed in the thesis include moving time windows [Soko 88, Soko 89, Soko 90], filtered rollback [Luba 89b, Luba 89a, Luba 91], adaptive time warp concurrency control (ATW) [Ball 90], and the local adaptive protocol [Hamn 94].
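A compressed sketch of rollback handling under the same assumed LP structure (with additional assumed attributes checkpoints and sent_messages; state saving is shown as full copies after every event, as described above):

    import copy

    def save_checkpoint(lp):
        """Store a copy of state and event list after every processed event."""
        lp.checkpoints.append((lp.vt, copy.deepcopy(lp.state), list(lp.event_list)))

    def rollback(lp, straggler, send):
        """Undo the effects of all events with timestamp greater than the
        straggler's VT: restore a saved state, then cancel sent messages."""
        while len(lp.checkpoints) > 1 and lp.checkpoints[-1][0] > straggler.timestamp:
            lp.checkpoints.pop()
        vt, state, events = lp.checkpoints[-1]
        lp.vt, lp.state, lp.event_list = vt, copy.deepcopy(state), list(events)
        # Send an anti-message for every message sent into the rolled-back future;
        # these may trigger rollbacks in other LPs (rollback cascades).
        for sent in [m for m in lp.sent_messages if m.timestamp > straggler.timestamp]:
            send(Message(sent.timestamp, lp.lp_id, payload=("anti", sent)))
            lp.sent_messages.remove(sent)
        lp.event_list.append(straggler)   # finally, schedule the straggler itself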

4 An Adaptive Time Window Approach: The Probabilistic Protocol

Chapter 4 of the thesis discusses the problems with both conservative and optimistic protocols along the lines of a small but illustrative example and points out the potential benefits of a hybrid approach with limited optimism.

The general concept of a probabilistic decision function (DF) is presented and formally defined as follows:

Definition 1 (Probabilistic Decision Function (DF)). Let s_clock denote the CPU time elapsed since the start of the simulation run. Let H be the set of all possible simulation histories and let E be the set of all possible events. Finally, let τ(e) denote the VT timestamp of a given event e. A DF is defined as a function

    DF : ℝ+ × E × H → {0, 1}

    DF(s_clock, e, h) = 0, if the simulation is to be blocked for one step,
    DF(s_clock, e, h) = 1, if the event e is to be processed immediately,

which is monotonically decreasing with respect to the VT timestamps of events, i.e.

    ∀ s_clock ∈ ℝ+, e1, e2 ∈ E, h ∈ H:
        DF(s_clock, e1, h) = 0 ∧ τ(e1) < τ(e2) ⟹ DF(s_clock, e2, h) = 0,

and deadlock free, i.e.

    ∀ h ∈ H, e ∈ E  ∃ s_proc ∈ ℝ+  such that  DF(s_proc, e, h) = 1.

Note that this concept includes pure conservative as well as pure optimistic simulation, but additionally allows the implementation of various hybrid approaches, depending on the strategy defined by the implemented decision function (a code sketch of this interface follows below). Two strategies, based on different aspects of the simulation and rollback history, are presented in the following sections.
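A sketch of this interface in Python (illustrative names; the point is only that both pure protocols are special cases of a DF):

    def df_optimistic(s_clock, event, history):
        """Pure time warp: never block, process every event immediately."""
        return 1

    def make_conservative_df(is_safe):
        """Pure conservative simulation as a DF: block until the caller-supplied
        predicate is_safe(event, history) proves the event safe to process."""
        def df(s_clock, event, history):
            return 1 if is_safe(event, history) else 0
        return df

    def simulation_step(lp, df, s_clock, history, process, block_one_step):
        """Drive an LP by a DF: either process the next event or block one step."""
        event = min(lp.event_list, key=lambda e: e.timestamp)
        if df(s_clock, event, history) == 1:
            process(lp, event)
        else:
            block_one_step(lp)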

5 VT of Next Message Distribution Estimation

One approach proposed in this thesis is to consider the VT increment between succeeding messages received at an LP as a random variable with a certain distribution. Section 5.1 of the thesis gives the motivation for this approach. Parametric as well as non-parametric density estimation is considered. If there is reason to assume a specific type of distribution, such as e.g. a normal distribution, this raises the need for parameter estimation. Incremental parameter calculation without and with application of exponential smoothing techniques is discussed in Section 5.2 and is applied to the illustrative example of triangular density functions (Subsection 5.2.3). Section 5.3 discusses histogram-based non-parametric density estimation for the message VT increment distributions (a code sketch follows below).

Let t_i be the VT of the i-th received message, and let ∆e = t_e − t_n be the VT difference between the VT t_e of the next event e and the timestamp t_n of the last received message m_n. We assume that we have a series of functions

    p_rb^(n) : ℝ → [0, 1],  n = 0, 1, 2, ...
    p_rb^(0)(∆e) = 0  for all ∆e ∈ ℝ
    p_rb^(n)(∆e) = P(t_{n+1} − t_n < ∆e),  n > 0, ∆e ∈ ℝ

estimating the probability that an event e with VT increment ∆e over the last received message m_n has to be rolled back due to the arrival of the next message m_{n+1} with VT t_{n+1}. Because we use the probability that the next message will cause a rollback, it only makes sense to block until the receipt of the next message. This is a possible source of deadlocks, because there will not necessarily be a next message (especially if other LPs are blocking as well). Thus, we additionally bound the CPU blocking time by the average CPU-time interval between the receipt of two succeeding messages. To decide on the best trade-off between blocking and rollback risk, we need to know the average cost (in real clock time) of a rollback. As not only the rollback operation itself but also the time spent simulating events that have to be rolled back is unproductively spent time, the average rollback cost is defined as

    c_rb = (total time for rollbacks)/(total number of rollbacks)
         + (total time for event simulation)/(total number of simulated events)
           · (total number of rolled back events)/(total number of rollbacks).    (1)

We denote the real clock time of the receipt of message m_i by s_{m_i}, the real clock time at the moment of the decision by s_clock, and the average real clock time between the receipt of two succeeding messages after n messages by

    s̄_m(n) = (total program run time)/n.

Thus, the maximum real clock blocking time before simulation of an event e is (see Figure 3)

    s_block^(n)(e) = s_{m_n} + s̄_m(n) − s_clock.
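A sketch of the histogram-based variant of p_rb^(n) from Section 5.3 (grid layout and names are illustrative; the estimate is the empirical distribution function of observed increments, accurate up to the bin resolution):

    import bisect

    class IncrementHistogram:
        """Estimates p_rb^(n)(de) = P(t_{n+1} - t_n < de) from the VT increments
        observed between succeeding messages, on a fixed grid of bin edges."""
        def __init__(self, bin_edges):
            self.bin_edges = sorted(bin_edges)      # e.g. 10 or 50 grid points
            self.counts = [0] * (len(bin_edges) + 1)
            self.n = 0

        def observe(self, increment):
            """Record the VT increment between two succeeding messages."""
            self.counts[bisect.bisect_right(self.bin_edges, increment)] += 1
            self.n += 1

        def p_rollback(self, delta_e):
            """Empirical probability that the next message arrives with a VT
            increment below delta_e, i.e. that the event would be rolled back."""
            if self.n == 0:
                return 0.0                          # p_rb^(0) = 0 by definition
            k = bisect.bisect_right(self.bin_edges, delta_e)
            return sum(self.counts[:k]) / self.n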

Figure 3. Maximum real clock blocking time.

Figure 4. Constructing the probabilistic cost expectation function (PCEF).

Simulation is blocked whenever

    c_rb · p_rb^(n)(∆e) > s_block^(n)(e)    (2)

and

    s_clock < s_{m_n} + s̄_m(n).    (3)

If inequality (2) holds, the expected value of unproductively spent time without blocking is higher than the potential blocking time, while (3) guarantees the absence of deadlocks.
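Putting the pieces together, a sketch of the resulting DF (lp is assumed to carry the histogram estimator above plus bookkeeping values t_last_msg, s_last_msg, avg_msg_interval, and c_rb, all illustrative names):

    def df_increment_estimation(s_clock, event, lp):
        """Block while inequalities (2) and (3) hold; otherwise process."""
        delta_e = event.timestamp - lp.t_last_msg            # VT increment over m_n
        deadline = lp.s_last_msg + lp.avg_msg_interval       # s_{m_n} + avg interval
        s_block = deadline - s_clock                         # maximum blocking time
        expected_cost = lp.c_rb * lp.histogram.p_rollback(delta_e)
        if expected_cost > s_block and s_clock < deadline:   # (2) and (3)
            return 0   # expected rollback cost exceeds the potential blocking time
        return 1       # process the event optimistically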

6 Adaptive Minimization of Rollback Costs

While the approach presented in Section 5 basically uses univariate densities to obtain rollback probabilities, the probabilistic cost expectation function (PCEF) proposed in this section tries to estimate rollback probabilities on the two-dimensional space spanned by the VT and CPU-time increments between an event and the last received message. Let ∆vt_e denote the VT increment between the event e which is to be simulated next and the VT timestamp of the last received message. Furthermore, let ∆cpu_e be the CPU time elapsed between the receipt of the last message and the wall clock time at which e is to be simulated, and let ∆cpu_clock be the CPU time since the last message arrival. Assuming a function p_rb(∆cpu_e, ∆vt_e) describing the probability that an event e with (∆cpu_e, ∆vt_e) will have to be rolled back, and the average CPU-time cost c_rb for a rollback (as defined in equation (1)), we construct a PCEF depending on the varying real clock blocking time s as follows:

    E^e_cost(s) = s + p_rb(∆cpu_clock + s, ∆vt_e) · c_rb

As depicted in Figure 4, the expected cost (i.e. unproductively spent time) of blocking for s seconds consists of the blocking time s plus the average faulted time per rollback, weighted with the rollback probability under the conditions of the current event simulation. If we find the optimal blocking time s_opt with E^e_cost(s_opt) = min! and block for s_opt CPU seconds before simulating the event e, this should minimize the expected value of unproductively spent time. Estimates for the rollback probabilities p_rb(∆cpu_e, ∆vt_e) are obtained in the following way: the VT/CPU-time grid is partitioned into rectangles. (In this work these rectangles are all of equal size; in [Fers 94c], Ferscha and Lüthi propose a version of the PCEF approach using logarithmically spaced grids. To be consistent with the bibliography of the thesis, the original reference is used in this summary; the paper has meanwhile been published as: A. Ferscha and J. Lüthi. "Estimating Rollback Overhead for Optimism Control in Timewarp". In: Proceedings of the 28th Annual Simulation Symposium, Phoenix, AZ, USA, April 9-11, 1995, pp. 2-12, IEEE Computer Society Press, April 1995.) Each simulated event is stored with an additional signature recording the time rectangle it belongs to. If an event is rolled back, the corresponding counter is incremented. The ratio of rolled back to simulated events in each VT/CPU-time rectangle is used as an estimate for the corresponding rollback probability in that region.
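A sketch of the grid-based estimator and a discrete search for s_opt (equal-sized rectangles as in the thesis; grid dimensions and all names are illustrative):

    class PCEFGrid:
        """Rollback-probability estimate on a CPU-time x VT grid of equal
        rectangles: p_rb(cell) = rolled_back(cell) / simulated(cell)."""
        def __init__(self, cpu_bins, vt_bins, cpu_max, vt_max):
            self.cpu_bins, self.vt_bins = cpu_bins, vt_bins
            self.cpu_w, self.vt_w = cpu_max / cpu_bins, vt_max / vt_bins
            self.simulated = [[0] * vt_bins for _ in range(cpu_bins)]
            self.rolled_back = [[0] * vt_bins for _ in range(cpu_bins)]

        def _cell(self, d_cpu, d_vt):
            i = min(int(d_cpu / self.cpu_w), self.cpu_bins - 1)
            j = min(int(d_vt / self.vt_w), self.vt_bins - 1)
            return i, j

        def record_event(self, d_cpu, d_vt):
            """Count a simulated event; the returned cell is its signature."""
            i, j = self._cell(d_cpu, d_vt)
            self.simulated[i][j] += 1
            return (i, j)

        def record_rollback(self, signature):
            """Increment the rollback counter of the event's rectangle."""
            i, j = signature
            self.rolled_back[i][j] += 1

        def p_rb(self, d_cpu, d_vt):
            i, j = self._cell(d_cpu, d_vt)
            n = self.simulated[i][j]
            return self.rolled_back[i][j] / n if n else 0.0

    def optimal_blocking_time(grid, d_cpu_clock, d_vt, c_rb, s_max, steps=50):
        """Minimize E_cost(s) = s + p_rb(d_cpu_clock + s, d_vt) * c_rb over a
        discrete set of candidate blocking times in [0, s_max]."""
        candidates = [s_max * k / steps for k in range(steps + 1)]
        return min(candidates,
                   key=lambda s: s + grid.p_rb(d_cpu_clock + s, d_vt) * c_rb)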

7 Experimental Results and Comparison of the Probabilistic Approaches

As an example for DES, the distributed simulation of timed transition Petri nets (TTPNs) is considered. A short introduction to the notion of TTPNs is given in Appendix A of the thesis.



Figure 5. Machine/repair model ("simple"-net) partitioned into two LPs.

Figure 6. Performance results for the simple(λ=1, t=2)-net (left) and the simple(λ=1/16, t=16)-net (right); the plotted categories are event simulation (evt), communication (comm), rollback (rb), blocking (block), and idle time (idle).

The implementation of the various simulation protocols described in this work (including the original time warp DDES protocol without optimism control) has been done on a distributed memory multiprocessor architecture (a Hypercube iPSC/860 with 16 nodes), following the performance-comparable implementation design proposed in [Chio 93b]. This implementation design provides a "fair" performance comparison of the candidate protocols. Several Petri nets of varying complexity have been used for testing purposes. In this summary, we only present results obtained from simulation of the simplest net, depicted in Figure 5, modelling a failure/repair system. It consists of two places and two transitions. The net is partitioned into two LPs, each of them simulating one transition and one place. Two important aspects can easily be controlled by two parameters of this net: the number of tokens t in the net controls the potential parallelism, whereas the ratio λ of the firing times of the two transitions controls the workload balance of the two LPs. The experiments were done for all combinations of the parameter sets t ∈ {1, 2, 4, 8, 16} and λ ∈ {1, 1/2, 1/4, 1/8, 1/16}. The following protocols and their variants have been tested (the respective abbreviations used in the figures are given in parentheses):

- Original time warp simulation without optimism control (opt).
- Estimation of VT increments between messages using triangular density functions
  - without geometric smoothing (tri 1.0),
  - using geometric smoothing with parameter α = 0.9 (tri 0.9).
- Non-parametric estimation of VT increments between messages
  - using only a 10 data point grid for density estimation (num 10),
  - using a 50 data point grid for density estimation (num 50).
- PCEF approach
  - using a data grid with 20 points on the CPU-time axis and 5 points on the VT axis (cost 20×5),
  - using a data grid with 40 points on the CPU-time axis and 25 points on the VT axis (cost 40×25).

In this summary, only results for the simple(λ=1, t=2)-net and the simple(λ=1/16, t=16)-net are presented to illustrate the influence of the two parameters parallelism and imbalance.



Figure 7. Performance results for simulation of simple(t=16)-nets (i.e. high degree of potential parallelism), plotted over λ = 1, 1/2, 1/4, 1/8, 1/16 for the protocols opt, tri 1.0, tri 0.9, num 10, num 50, cost 20, and cost 50: time spent with rollbacks or in faulted state (first column), number of rollbacks per committed event (second column), and average rollback distance (third column). The first row depicts results for LP1; results for LP2 are depicted in the second row.