Time Slack-Based Techniques for Robust Project Scheduling Subject to Resource Uncertainty

Abstract

The resource-constrained project scheduling problem (RCPSP) has been the subject of a great deal of research during the previous decades. This is not surprising given the high practical relevance of this scheduling problem. Nevertheless, extensions are needed to cope with situations arising in practice, such as multiple activity execution modes, activity duration changes and resource breakdowns. In this paper we analytically determine the impact of unexpected resource breakdowns on activity durations. Using this information, we develop an approach for inserting explicit idle time into the project schedule in order to protect it as well as possible from disruptions caused by resource unavailabilities. This strategy is compared to a traditional simulation-based procedure and to a heuristic developed for the case of stochastic activity durations.

Keywords: robust scheduling, project scheduling, resource breakdowns, proactive, time buffering

1 Introduction

Most of the research in project scheduling deals with the generation of an initial project schedule (baseline schedule) in a static and deterministic environment with complete information. For an extensive overview we refer to Brucker et al. (1999), Herroelen et al. (1998) and Demeulemeester and Herroelen (2002). Unfortunately, these assumptions do not always hold in practice. In the real world, a project manager often has to deal with a stochastic and dynamic scheduling environment. The initial baseline schedule has to be protected from the adverse effects of possible disruptions, because project activities are often subcontracted or executed by resources that are not exclusively reserved for the current project. A change in the starting times of such activities can lead to additional costs due to required subcontractor flexibility and to schedule nervousness. A possible measure for the deviation between the initial schedule and the realized schedule is the weighted instability cost. It is calculated as the sum of the expected weighted absolute deviations between the planned and the actually realized activity starting times. The weight $w_i$ assigned to each activity $i$ reflects the importance of starting that activity at its planned starting time in the initial schedule. More specifically, $w_i$ denotes the marginal cost of deviating from the planned starting time of activity $i$ during project execution. The instability weights can be quantified using, for example, the computer-supported risk management system of Schatteman et al. (2008). Recent research by Leus (2004), Herroelen and Leus (2004), Leus and Herroelen (2004) and Van de Vonder et al. (2005, 2006, 2007b and 2008) considers the weighted instability cost objective for project scheduling with stochastic activity durations. Other possible causes of uncertainty in project execution include project content changes, bad weather conditions and the unavailability of resources. In this paper we study the last of these causes. Resource breakdowns have been cited by numerous authors as one of the most important sources of disruption in practical project management (see, amongst others, Yu and Qi (2004)). We only consider renewable resources. Each resource type $k$ ($k = 1, \dots, R$) is modeled as a set of individual resource units. In the deterministic case these resource units are assumed to be available throughout the project on a period-per-period basis. In the stochastic case, on the other hand, breakdowns may occur. Whenever a resource unit breaks down, it has to be repaired before it becomes available again. The time between the end of a repair period and a new failure of resource unit $m$ of resource type $k$ is modeled by a stochastic variable $X_{mk}$.
The time needed to repair resource unit $m$ of type $k$ is likewise represented by a stochastic variable $Y_{mk}$. Several options are available for coping with uncertainty. Proactive scheduling focuses on the construction of predictive schedules that use statistical knowledge of the uncertainties with the aim of increasing schedule robustness. A schedule is considered robust if it can absorb anticipated disruptions without affecting planned external activities while maintaining high shop performance (O’Donovan et al., 1999). Robustness can be induced by allocating extra resource capacity and/or execution time to each activity, so that its execution uncertainty can be compensated to a certain extent without the need for rescheduling. Our proactive approach is depicted in Figure 1.

——————————— Insert Figure 1 ———————————

First, one has to decide whether to schedule the project using the maximal, deterministic resource availability $a_k$ or a buffered availability $a^*_k$. This buffered availability can be calculated by taking the expected value of the steady-state availabilities. The idea is that resource breakdowns can be compensated to a certain extent by not fully utilizing the available capacity, effectively using resource redundancy. Next, an initial schedule is generated either by an optimal procedure minimizing the project makespan or by scheduling activities with a high Cumulative Instability Weight (CIW) as early as possible. The aim of the latter approach, which we adopt from Lambrechts et al. (2008), is to reduce the probability that high-impact activities are delayed by disruptions of activities scheduled earlier in time. The CIW of an activity is the sum of its own instability weight and the instability weights of all its successors, immediate as well as transitive. After an initial schedule $S^u$ has been constructed using one or more of these procedures, time buffering can be added to it. Our aim is to improve the robustness of $S^u$ by inserting explicit idle time into the project schedule. Slack inserted in front of activities allows the schedule to absorb potential disruptions caused by earlier resource breakdowns and the resulting activity shifts. Our objective is to insert a time buffer of size $B_i$ in front of the starting time $s_i$ of each activity $i$ so that the expected instability costs are minimized without exceeding the project deadline. The next section briefly introduces our scheduling problem together with a number of definitions and concepts used in the remainder of the paper.
The assumptions made in Section 3 allow us to analytically translate resource breakdowns into activity duration increases, as shown in Section 4. This information is used in Section 5 to develop an approach for strategically inserting explicit idle time into the schedule. In a computational experiment, this approach is compared with a simulation-based approach and with a dedicated approach for minimizing the instability costs in the case of stochastic activity durations. The results of this experiment are given in Section 6. Note that all of these approaches assume that the distributions of the times to failure and the repair times are known. If this is not the case, one can resort to robust scheduling techniques that do not exploit this information, but that are consequently also less efficient and effective. For an overview of proactive strategies based on the free slack measure we refer the interested reader to Lambrechts et al. (2007b). Unfortunately, no matter how much care is taken in constructing a proactive schedule, disruptions can never be totally prevented. If an activity is delayed, for example by an unforeseen resource breakdown, the schedule may become infeasible. A reactive procedure must then be used to repair the schedule. The aim of this reactive procedure is to restore schedule feasibility in such a way that some objective function (such as the deviation from the baseline schedule) is optimized. Here, however, we focus only on the construction of a robust predictive schedule. For more information on the reactive phase, we refer the interested reader to Van de Vonder et al. (2007a) for the stochastic duration case and to Lambrechts et al. (2007a) for the stochastic resource availability case.
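The two quantities introduced in this section, the buffered availability $a^*_k$ and the CIW, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume the standard single-unit steady-state availability $MTTF_k / (MTTF_k + MTTR_k)$ (mean time to failure over mean cycle length) and round $a_k$ times this value down to an integer, which is our own choice. The network is passed as a hypothetical successor dictionary, and the function names are illustrative.

```python
import math


def buffered_availability(a_k, mttf_k, mttr_k):
    """Expected number of the a_k units that are up in steady state.

    Assumes the single-unit steady-state availability MTTF / (MTTF + MTTR);
    rounding down to an integer is an assumption of this sketch.
    """
    return math.floor(a_k * mttf_k / (mttf_k + mttr_k))


def cumulative_instability_weights(weights, succ):
    """CIW of every activity: its own weight plus the weights of all of its
    immediate and transitive successors, each successor counted once.

    weights: dict activity -> instability weight w_i
    succ: dict activity -> list of immediate successors (hypothetical format)
    """
    ciw = {}
    for i in weights:
        reached, stack = set(), list(succ.get(i, []))
        while stack:  # depth-first search over the successor relation
            j = stack.pop()
            if j not in reached:
                reached.add(j)
                stack.extend(succ.get(j, []))
        ciw[i] = weights[i] + sum(weights[j] for j in reached)
    return ciw
```

With the resource parameters of the example in Section 3 (eight units, mean time to failure 15, mean time to repair 3), `buffered_availability(8, 15, 3)` returns 6, since $8 \cdot 15/18 \approx 6.67$. Collecting the successors in a set ensures that an activity reachable along several paths contributes its weight only once to the CIW.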

2 Problem Statement

The aim of the proactive baseline scheduling problem is to generate a project schedule that is feasible as well as robust. We represent the project using the activity-on-node representation: the digraph $G = (N, A)$ contains a set of nodes $N$ and a set of arcs $A$. The nodes represent the activities constituting the project, whereas the arcs represent the finish-start, zero-lag precedence relations. Whenever $(i, j) \in A$, we say that activity $i$ ($i = 1, \dots, n$) is an immediate predecessor of activity $j$, implying that activity $j$ cannot start before activity $i$ has finished:

$$s_i + d_i \le s_j \qquad \forall (i, j) \in A \qquad (1)$$

with $s_i$ the starting time of activity $i$ and $d_i$ its deterministic duration. For ease of reference, Table 1 summarizes the notation used throughout this paper.

——————————— Insert Table 1 ———————————

In resource-constrained project scheduling we also have to take the renewable resource constraints into account. As indicated in Section 1, we assume that a finite amount $a_k$ of each resource type $k$ is available on a period-per-period basis. Resource feasibility then implies that, for each time period $t$ and each resource type $k$, the sum of the resource requirements $r_{ik}$ of the activities in progress during period $t$ ($\mathit{Busy}_t$) cannot exceed the availability $a_k$:

$$\sum_{i \in \mathit{Busy}_t} r_{ik} \le a_k \qquad \forall t, \forall k \qquad (2)$$

Finally, a given project deadline $\delta$ has to be respected:

$$s_n \le \delta \qquad (3)$$

with $n$ the dummy end activity, which has a duration and a resource usage equal to 0 and represents project completion. Our objective then becomes the generation of a baseline schedule $(s_1, \dots, s_n)$ that respects constraints (1), (2) and (3) and minimizes the expected instability costs:

$$\sum_{i \in N} w_i \, |E(S_i) - s_i| \qquad (4)$$
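Constraints (1) to (3) and objective (4) translate directly into code. The sketch below assumes integer time periods, with activity $i$ in progress during periods $s_i, \dots, s_i + d_i - 1$; the dictionary-based data layout and the function names are hypothetical, and $E(S_i)$ is approximated by a sample mean over simulated realizations rather than computed exactly.

```python
from collections import defaultdict


def is_feasible(s, d, arcs, r, a, deadline, end):
    """Check constraints (1)-(3) for a baseline schedule with integer times.

    s, d: dict activity -> start time s_i / duration d_i
    arcs: iterable of precedence pairs (i, j) in A
    r: dict activity -> {resource type k: r_ik}; a: dict k -> a_k
    end: the dummy end activity n
    """
    # (1) finish-start, zero-lag precedence relations
    if any(s[i] + d[i] > s[j] for i, j in arcs):
        return False
    # (2) activity i belongs to Busy_t for t = s_i, ..., s_i + d_i - 1
    usage = defaultdict(int)  # (period, resource type) -> units in use
    for i in s:
        for k, req in r.get(i, {}).items():
            for t in range(s[i], s[i] + d[i]):
                usage[t, k] += req
    if any(used > a[k] for (t, k), used in usage.items()):
        return False
    # (3) project deadline
    return s[end] <= deadline


def instability_cost(s, w, scenarios):
    """Objective (4), with E(S_i) estimated by the mean realized start over
    simulated scenarios (each a dict activity -> S_i)."""
    m = len(scenarios)
    return sum(w.get(i, 0) * abs(sum(S[i] for S in scenarios) / m - s[i]) for i in s)
```

Under the railroad rule adopted in the next paragraph, $S_i \ge s_i$ in every scenario, so $|E(S_i) - s_i|$ coincides with the expected absolute deviation $E|S_i - s_i|$ used in the verbal definition of the weighted instability cost.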

The real starting times $S_i$ in Equation 4 are stochastic variables that depend on the realization of the stochastic resource availabilities ($A_{kt}$), on the planned starting times ($s_i$) and on the reactive policy used to repair a disrupted schedule. We assume that a railroad scheduling approach is used, meaning that activities are never started before their planned starting time ($S_i \ge s_i$). Lambrechts et al. (2007a) describe exact and suboptimal reactive scheduling procedures. The exact procedure relies on branch-and-bound, while the heuristics involve tabu search and a so-called scheduled order list procedure. In this paper, we restrict ourselves to the scheduled order list procedure to deal with schedule disruptions. This policy reschedules activities in the order in which they were started in the baseline schedule, while respecting the precedence and resource constraints. If an activity has to be interrupted because of a resource breakdown, it either has to be restarted from scratch (preempt-repeat) or is simply resumed from the point where its execution was halted (preempt-resume).
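A heavily simplified sketch of a scheduled order list repair is given below. It is our own illustration, not the procedure of Lambrechts et al. (2007a): we assume that breakdowns have already been translated into realized integer durations $D_i$ (so the exact timing of each breakdown is ignored), that activities are numbered topologically, that only aggregate capacities $a_k$ are checked, and that every $r_{ik} \le a_k$ (otherwise the right-shift loop would never end). Activities are processed in baseline order, started no earlier than planned (railroad rule) and after their predecessors, and shifted right until the per-period resource profile fits.

```python
from collections import defaultdict


def sol_reschedule(s, D, preds, r, a):
    """Simplified scheduled-order-list repair (illustrative sketch).

    s: baseline starts s_i; D: realized integer durations D_i
    preds: dict activity -> immediate predecessors
    r: dict activity -> {resource type k: r_ik}; a: dict k -> a_k
    Returns the realized starting times S_i.
    """
    order = sorted(s, key=lambda i: (s[i], i))  # baseline order, lowest number first
    usage = defaultdict(int)  # (period, resource type) -> units in use
    S = {}
    for i in order:
        # railroad rule (S_i >= s_i) and finish-start precedence relations
        t = max([s[i]] + [S[j] + D[j] for j in preds.get(i, [])])
        # shift right until the activity fits into the resource profile
        while any(usage[u, k] + q > a[k]
                  for u in range(t, t + D[i])
                  for k, q in r.get(i, {}).items()):
            t += 1
        for u in range(t, t + D[i]):
            for k, q in r.get(i, {}).items():
                usage[u, k] += q
        S[i] = t
    return S
```

For instance, if activity 2 in a small three-unit project overruns by one period, an activity scheduled right after it on the same units is pushed back by one period, while activities that still fit into the resource profile keep their planned start.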

3 Assumptions

In this paper, we focus on the construction of a robust baseline schedule by inserting explicit idle time into the project schedule. To do this efficiently, we need a more thorough understanding of the nature of resource breakdowns and their impact on the duration of the disrupted activity. First, we state the assumptions that allow us to determine this impact:

Exponentially distributed times to failure and repair times
We chose to model times to failure and repair times using the exponential distribution. This choice is supported by empirical evidence as well as by mathematical arguments (Barlow and Proschan, 1996). A further advantage of the exponential distribution is that it is fully defined by its expected value. This means that we only need to know the mean time to failure and the mean time to repair of each resource type $k$ ($MTTF_k$ and $MTTR_k$) to know the failure and repair distribution functions. The formulas derived in the following section can easily be adapted to non-exponentially distributed repair times. For the times to failure, however, this modification is not so straightforward. Nevertheless, we are confident that the use of exponentially distributed times to failure does not pose a serious limitation to the use of our approach in practice.

Resource allocations are fixed in advance
The baseline schedule determines the starting time of each activity and consequently also the number of resource units of each resource type required in each time period. The project manager can decide to allocate specific resource units to individual activities in advance. This choice has two advantages. First, rescheduling becomes far easier, as the affected activities simply have to be right-shifted to restore schedule feasibility. Second, fixing resource allocations enables us to predict the impact of resource breakdowns on activity duration increases. We illustrate these two observations with a small example. Suppose that we wish to plan the project depicted in Figure 2, consisting of eight real activities (activity 1 is the dummy start activity and activity 10 the dummy end activity).
Above each activity we indicate its duration, its requirement of a single renewable resource type with a per-period availability of eight units, and its instability weight. The project deadline is equal to 18 time units. Finally, we assume that each unit of the considered resource type has a mean time to failure of 15 and a mean time to repair of 3.

——————————— Insert Figure 2 ———————————

Consider the minimal makespan schedule for this project, given in Figure 3 for the case in which resource allocations are fixed.

——————————— Insert Figure 3 ———————————

The horizontal bands represent the allocation of each resource unit. Resource unit 5, for example, is allocated to activity 2 in time interval [0, 2], to activity 7 in interval [7, 13] and to activity 9 in interval [13, 15]. For the remaining time periods it is idle. Imagine that we are interested in the impact of a resource breakdown on activity 4. If resource allocations were free, a number of things could happen. If only one unit broke down during the execution of activity 4, nothing would happen, as sufficient idle resource capacity is available to deal with this disruption. If more units broke down, activity 3 and/or 4 would have to be preempted, depending on the total magnitude of the breakdown and the best rescheduling action. It is therefore practically impossible to predict the impact of a resource breakdown on the effective duration of an activity without fixed resource allocations. The case of fixed resource allocations is far easier to analyze. Activity 4 only has to be preempted if resource units 1, 2, 3 and/or 4 break down while being used by activity 4. Imagine the situation depicted in Figure 4. Resource unit 4 breaks down after 5 time periods ($X_{41} = 5$) and its repair takes 2 time periods ($Y_{41} = 2$). Because resource allocations are fixed, only the activity using resource unit 4 between time points 5 and 7 is affected. The result is a preemption of activity 4 until time point 7. In the preempt-repeat case, activity 4 has to be restarted from scratch after the resource unit is repaired, as shown in the figure. Preempting activity 4 postpones activities 6, 7 and 9, resulting in a large increase in the instability costs.
——————————— Insert Figure 4 ———————————

In summary, if activity $i$ requires $r_i$ resource units, only these specific resource units are used to execute $i$, and a breakdown of one or more of these units translates directly into an interruption of $i$. The example also shows the main drawback of fixing resource allocations: a loss of rescheduling flexibility. Neither idle resource units nor resource units allocated to less important activities can be used to overcome a resource shortage for activity $i$. In our example, the duration increase of activity 4 caused by the breakdown of resource unit 4 could easily have been prevented by replacing it with resource unit 5 if resource allocations had been free. However, dropping this assumption renders the calculation of expected duration increases practically impossible, because it is difficult to predict the rescheduling action that will be taken whenever a resource unit breaks down. Because this rescheduling action depends on the state of the schedule at the disruption time, hoping to predict it in advance seems unreasonable.

Both failure and repair time distributions are identical for each resource unit m of a resource type k
This assumption implies that all resource units of a given resource type are identical with respect to their failure and repair characteristics. This is in no way restrictive, as non-identical resource units can (and should) be modeled as separate resource types.

Failure and repair times are mutually independent
The mutual independence of failure times and repair times is often assumed in theory, but is unfortunately not always realistic in practice. Severe problems causing a resource breakdown are likely to occur less frequently but to require longer repair times. This would imply a positive correlation between interfailure times and repair times, and thus a violation of the assumption. Nevertheless, the assumption is needed to obtain our theoretical results.

Resource units can only break down when they are being used
If resource units can also break down when not in use (e.g. a worker falls ill during the weekend), it becomes very difficult to link resource outages to effective activity duration increases, because activities may have to be postponed by breakdowns before they have even started. Furthermore, the possibility that a second resource unit used by the considered activity breaks down while the first unit is still being repaired renders the translation of resource breakdowns into activity duration extensions intractable, because of the number of scenarios one has to condition upon. Therefore, we assume that resources only break down when in use.

4 Breakdown Process

In this section, we show how resource breakdowns affect an activity’s real duration under various scenarios. Throughout this analysis, we use the assumptions introduced in the previous section. The preempt-repeat case is studied in Section 4.1, the preempt-resume case in Section 4.2. In each section, we first analyze the impact of resource breakdowns on an activity’s duration when only one resource type is used to execute that activity. This analysis is then extended to multiple resource types.

4.1 Preempt-Repeat

4.1.1 Single resource type

Because of interruptions such as resource breakdowns, the real duration of activity $i$ becomes a stochastic variable $D_i$ consisting of a deterministic part $d_i$, corresponding to the duration of the activity when no interruption occurs (the final execution, after which $i$ is completed), and a stochastic part $\sigma_i$, corresponding to the total failed execution time $X_i$ (i.e. execution not resulting in activity completion) together with the total repair time $Y_i$. If we denote the length of the $r$-th failed execution and of the $r$-th repair time by $F_{ir}$ and $R_{ir}$, respectively, and the number of interruptions by $N_i$, we can define:

$$D_i = d_i + \sigma_i \qquad (5)$$
$$\sigma_i = X_i + Y_i \qquad (6)$$
$$X_i = F_{i1} + \dots + F_{iN_i} \qquad (7)$$
$$Y_i = R_{i1} + \dots + R_{iN_i} \qquad (8)$$

We can calculate the expected value of $\sigma_i$ as follows:

$$E[\sigma_i] = E[N_i]\,E[F_i] + E[N_i]\,E[R_i] \qquad (9)$$

because we assume that the interfailure times are independent of the repair times. Unfortunately, the interfailure times are not independent of the number of interruptions: whenever the probability of large interfailure times is relatively high, fewer interruptions occur on average. Nevertheless, simulation results obtained from an experiment in which the average duration increase was calculated for a wide variety of problem parameters (resource usage, activity duration, failure and repair time distributions) show that assuming independence in Equation 9 does not significantly alter the results. If we only consider one resource type, the time to restart execution of an interrupted activity $i$ equals the time to repair a resource unit used by $i$, i.e. $R_i = Y$. This does not hold for $F_i$, which differs from $X$ in two important respects. First, whereas $X$ represents the time to failure of a single resource unit, $F_i$ represents the time to interruption of activity $i$, which uses $r_i$ resource units. This distinction is important because $i$ is interrupted as soon as one of the resource units used to execute $i$ breaks down. Second, whereas $X$ can take on values larger than $d_i$, such values make no sense in our analysis, as they do not correspond to a failed execution but to the completion of $i$; the probability density function (pdf) of $F_i$ should therefore be conditioned on this fact. The distribution function of the time to interruption of activity $i$ can be derived from the failure time distribution of the $r_i$ resource units used to execute $i$ using Properties 1 and 2 (Blumenfeld, 2001).

Property 1. Let $X_1, X_2, \dots, X_n$ be independently and identically distributed stochastic variables with cumulative distribution function (cdf) $F(x)$. Their minimum $Z$ then has cdf

$$G(z) = 1 - [1 - F(z)]^n$$

Property 2. Let $X$ be a stochastic variable with probability density function (pdf) $f(x)$. This pdf can be modified into a pdf constrained between 0 and $m$ as follows:

$$g(x \mid x < m) = \frac{f(x)}{\Pr(X < m)}, \qquad 0 \le x < m$$

Apart from the expected values of the time to interruption and the repair time, we also need the expected number of interruptions $N_i$ experienced by activity $i$ during its execution. $E[N_i]$ can be calculated using Lemma 1.

Lemma 1. If $\psi_i$ represents the probability that activity $i$ is interrupted in a preempt-repeat scenario, the expected number of interruptions until $i$ finishes is given by:

$$E[N_i] = \frac{\psi_i}{1 - \psi_i} \qquad (10)$$

Proof. If $\psi_i$ represents the probability that activity $i$ is interrupted, the number of interruptions is distributed with probability mass function:

$$h(N_i) = (1 - \psi_i)\,\psi_i^{N_i} \qquad (N_i = 0, 1, 2, \dots) \qquad (11)$$

Notice that (11) is the probability mass function of a geometric distribution (Blumenfeld, 2001):

Observation 1. The random variable $X$ has a geometric distribution if its probability mass function is given by $P(x) = p(1 - p)^x$ ($x = 0, 1, 2, \dots$). The expected value of the geometric distribution is

$$E[x] = \frac{1 - p}{p}$$

Substituting $p$ with $(1 - \psi_i)$ and $x$ with $N_i$ yields the result in Lemma 1. The parameter $\psi_i$ can easily be calculated using Property 1. All of the above leads to the following result:

Theorem 1. In a preempt-repeat environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_i$ of a single renewable resource type, for which the time to failure of each resource unit is exponentially distributed with parameter $\lambda$ and the time to repair is exponentially distributed with parameter $\mu$, is given by:

$$E[\sigma_i] = \frac{\psi_i}{1 - \psi_i}\left(\frac{1}{\lambda r_i} + \frac{1}{\mu}\right) - d_i \qquad (12)$$

with $\psi_i = 1 - e^{-\lambda r_i d_i}$.

Proof. First, we need the expected value $E[N_i]$. Using Equation 10 and Property 1 we get:

$$E[N_i] = \frac{1 - e^{-\lambda r_i d_i}}{e^{-\lambda r_i d_i}} \qquad (13)$$

The expected value of the time to interruption is obtained using Properties 1 and 2:

$$E[F_i] = \int_{0}^{d_i} x\, f'(x \mid x < d_i)\,dx = \frac{1}{\psi_i}\int_{0}^{d_i} x\,\lambda r_i\, e^{-\lambda r_i x}\,dx = \frac{1}{\lambda r_i} - \frac{d_i(1 - \psi_i)}{\psi_i} \qquad (14)$$

The repair time is exponentially distributed with parameter $\mu$, so that:

$$E[Y] = \frac{1}{\mu} \qquad (15)$$

Substituting (13), (14) and (15) in (9) yields Equation 12.

4.1.2 Multiple resource types

The case of multiple resource types is somewhat more complicated. We now use $R$ different resource types. In each time period of its execution, activity $i$ requires $r_{ik}$ units of resource type $k$. We extend the notation of the time to failure and the repair time with a subscript $k$ to represent the considered resource type. The main differences with the single resource type case lie in the calculation of $\psi_i$ and $E[Y_i]$. The time to interruption of activity $i$ ($F_i$) is now determined by the minimum time to failure over all resource units of all resource types, constrained between 0 and $d_i$. Let $X_{mk}$ represent the time to failure of resource unit $m$ of resource type $k$. Using Property 1 we can write:

$$\psi_i = \Pr(i \text{ interrupted}) = \Pr\left[\min\left(X_{11}, X_{21}, \dots, X_{r_{i1}1}, \dots, X_{1R}, X_{2R}, \dots, X_{r_{iR}R}\right) \le d_i\right] = 1 - [1 - F_1(d_i)]^{r_{i1}} \cdots [1 - F_R(d_i)]^{r_{iR}} \qquad (16)$$

Allowing for multiple resource types does not change the distribution function of $N_i$, so Equation 10 remains valid. Deriving the expected value of $R_i$ is more complicated, as its distribution depends on the resource type that causes the disruption; that resource type determines the length of the downtime.

Theorem 2. In a preempt-repeat environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_{ik}$ of renewable resource type $k$, for which the time to failure of each resource unit is exponentially distributed with parameter $\lambda_k$ and the time to repair is exponentially distributed with parameter $\mu_k$, is given by:

$$E[\sigma_i] = \frac{\psi_i}{(1 - \psi_i)\left(\sum_k \lambda_k r_{ik}\right)}\left(1 + \sum_k \frac{\lambda_k r_{ik}}{\mu_k}\right) - d_i \qquad (17)$$

with $\psi_i = 1 - e^{-d_i \sum_k \lambda_k r_{ik}}$.

Proof. If we let $p_k$ represent the probability that the breakdown is caused by resource type $k$, so that the repair time is exponentially distributed with parameter $\mu_k$, we can write:

$$E[R_i] = \sum_k p_k \frac{1}{\mu_k} \qquad (18)$$

The probability $p_k$ is the probability that the minimum time to failure over all resource units of resource type $k$ is smaller than the minimum time to failure over all resource units of all resource types $l \ne k$. The property of competing exponentials is used in the calculation of $p_k$:

Property 3. Let $X$ and $Y$ be independent stochastic variables that are exponentially distributed with parameters $\lambda$ and $\mu$, respectively. The probability that $X$ is smaller than $Y$ is then:

$$\Pr(X < Y) = \frac{\lambda}{\lambda + \mu}$$

We use Properties 1 and 3 to determine $p_k$:

$$p_k = \Pr\left(X_k^{\min} < \min_{l \ne k} X_l^{\min}\right) = \frac{\lambda_k r_{ik}}{\lambda_k r_{ik} + \sum_{l \ne k} \lambda_l r_{il}} = \frac{\lambda_k r_{ik}}{\sum_l \lambda_l r_{il}} \qquad (19)$$

Combining (9), (10), (16), (18), (19) and (14), with $\lambda r_i$ replaced by $\sum_k \lambda_k r_{ik}$, yields Equation 17.
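Theorems 1 and 2 are easy to evaluate and to sanity-check numerically. The Python sketch below (function names are ours) computes Equation 17, which reduces to Equation 12 for a single resource type, and compares it with a Monte Carlo simulation of the preempt-repeat process under the assumptions of Section 3: units fail only while in use, the time to interruption is the minimum of all unit failure times, the failing type is drawn with probability $p_k$ from Equation 19, and the activity restarts from scratch after each repair. The sketch assumes that the activity uses at least one resource unit. In the usage example, $d_i = 5$ and $r_i = 4$ are hypothetical, while $MTTF = 15$ and $MTTR = 3$ are taken from the example in Section 3.

```python
import math
import random


def repeat_extension(d, r, lam, mu):
    """Expected extension E[sigma_i] under preempt-repeat (Equation 17).

    r, lam, mu are lists indexed by resource type k: units r_ik, failure
    rate lambda_k = 1/MTTF_k and repair rate mu_k = 1/MTTR_k. With a single
    resource type this reduces to Equation 12.
    """
    rate = sum(l * q for l, q in zip(lam, r))  # sum_k lambda_k r_ik
    psi = 1.0 - math.exp(-d * rate)  # probability of an interruption
    repair_term = sum(l * q / m for l, q, m in zip(lam, r, mu))
    return psi / ((1.0 - psi) * rate) * (1.0 + repair_term) - d


def simulate_repeat(d, r, lam, mu, runs=100_000, seed=1):
    """Monte Carlo estimate of E[sigma_i] under preempt-repeat."""
    rng = random.Random(seed)
    rates = [l * q for l, q in zip(lam, r)]  # competing exponential rates
    total_rate = sum(rates)
    total = 0.0
    for _ in range(runs):
        elapsed = 0.0
        while True:
            f = rng.expovariate(total_rate)  # first failure among all units
            if f >= d:
                elapsed += d  # uninterrupted execution completes the activity
                break
            k = rng.choices(range(len(r)), weights=rates)[0]  # failing type
            elapsed += f + rng.expovariate(mu[k])  # lost work plus repair
        total += elapsed - d
    return total / runs
```

For `repeat_extension(5, [4], [1 / 15], [1 / 3])`, Equation 12 gives an expected extension of about 13.86 time units, well above the nominal duration, which shows how costly restarts from scratch can be; the simulated estimate should agree within sampling error. Splitting the four units over two identical resource types gives the same value, as Equation 17 requires.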

4.2 Preempt-Resume

4.2.1 Single resource type

Again, due to breakdowns, the real duration of activity $i$ is a stochastic variable $D_i$ consisting of a deterministic part $d_i$, corresponding to the duration of the activity when no interruption occurs, and a stochastic part $\sigma_i$. The difference with the preempt-repeat case is that $\sigma_i$ now only includes the total repair time $Y_i$, because no execution time is lost. Keeping the notation of Section 4.1, Equations 5 through 8 become:

$$D_i = d_i + \sigma_i \qquad (20)$$
$$\sigma_i = Y_i \qquad (21)$$
$$Y_i = R_{i1} + \dots + R_{iN_i} \qquad (22)$$

From Equation 9 we know that $E[\sigma_i]$ can be calculated when we know the expected number of interruptions $E[N_i]$ and the expected repair duration $E[Y]$. The expected number of interruptions can be calculated by dividing the duration of activity $i$ by the expected value of the time to interruption. The time to interruption is distributed as the minimum of $r_i$ independently and identically distributed variables $X$; by Property 1 it is exponentially distributed with parameter $\lambda r_i$, so its expected value is $1/(\lambda r_i)$ and $E[N_i] = \lambda r_i d_i$.

Theorem 3. In a preempt-resume environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_i$ of a single renewable resource type, for which the time to failure of each resource unit is exponentially distributed with parameter $\lambda$ and the time to repair is exponentially distributed with parameter $\mu$, is given by:

$$E[\sigma_i] = \frac{\lambda r_i d_i}{\mu} \qquad (23)$$

4.2.2 Multiple resource types

These results can easily be extended to the multiple resource type case. When calculating the expected number of interruptions, we now need to consider the minimum time to failure over all resource units of all resource types. Furthermore, we should weight the mean time to repair of each resource type by the probability that the disruption is caused by that resource type. This yields the expression for the case of preempt-resume with multiple resource types:

Theorem 4. In a preempt-resume environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_{ik}$ of renewable resource type $k$, for which the time to failure of each resource unit is exponentially distributed with parameter $\lambda_k$ and the time to repair is exponentially distributed with parameter $\mu_k$, is given by:

$$E[\sigma_i] = d_i \sum_k \frac{\lambda_k r_{ik}}{\mu_k} \qquad (24)$$
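Equations 23 and 24 can be checked numerically in the same way as the preempt-repeat results. The Python sketch below (function names are ours) evaluates Equation 24 and simulates the preempt-resume process under the assumptions of Section 3: because the exponential distribution is memoryless, a fresh failure clock can be drawn after every repair, the work already done is kept, and only repair times extend the activity. The parameter values in the usage note and test are hypothetical apart from $MTTF = 15$ and $MTTR = 3$, which come from the example in Section 3.

```python
import random


def resume_extension(d, r, lam, mu):
    """Equation 24 (Equation 23 for a single type): d_i * sum_k lambda_k r_ik / mu_k."""
    return d * sum(l * q / m for l, q, m in zip(lam, r, mu))


def simulate_resume(d, r, lam, mu, runs=100_000, seed=7):
    """Monte Carlo check: processing already done is kept after each repair,
    so only repair times extend the activity."""
    rng = random.Random(seed)
    rates = [l * q for l, q in zip(lam, r)]  # competing exponential rates
    total_rate = sum(rates)
    total = 0.0
    for _ in range(runs):
        left = float(d)  # remaining processing time
        while True:
            f = rng.expovariate(total_rate)  # memoryless: a fresh failure clock
            if f >= left:
                break  # the activity completes before the next failure
            left -= f
            k = rng.choices(range(len(r)), weights=rates)[0]
            total += rng.expovariate(mu[k])  # repair time of the failing type
    return total / runs
```

For a hypothetical activity with $d_i = 5$ that uses four units with MTTF 15 and MTTR 3, Equation 23 gives $\lambda r_i d_i / \mu = (4/15) \cdot 5 \cdot 3 = 4$ time units, compared with roughly 13.9 time units under preempt-repeat according to Equation 12.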

5 Time Buffering

The time buffering step consists of two separate phases. The resource allocation phase deals with the construction of a resource flow network that describes non-technological relations between activities (this topic is dealt with in detail in Section 5.2). In the second phase, explicit idle time is inserted into the initial precedence-, resource- and deadline-feasible schedule $S^u$, so that the preemption of a disrupted activity does not translate into the disruption of a related activity. The decision variable $B_i$ indicates how many time units activity $i$ should be started beyond its earliest feasible starting time. The vector $B$ of buffer decisions can be decoded into a buffered schedule $S^b$ using Algorithm 1. In this algorithm, we first construct a list $L$ that contains the activities ordered by non-decreasing starting times in the initial schedule $S^u$, with $L_i$ the activity in position $i$ of list $L$. The activities are then scheduled in the order of list $L$. Each activity is scheduled at its earliest feasible starting time increased by its buffer amount $B_{L_i}$. This earliest feasible starting time is obtained from the starting time of activity $L_i$ in $S^u$ and the finish times of its immediate predecessors ($\mathit{PRED}_{L_i}$).

Algorithm 1 Decoding procedure
1: $L$ := activities $i \in N$ in non-decreasing order of $s^u_i$ (tie-break: lowest number)
2: $s^b_1 := 0$
3: for $i := 2$ to $n$ do
4:     $s^b_{L_i} := \max\left(s^u_{L_i}, \max_{j \in \mathit{PRED}_{L_i}} (s^b_j + d_j)\right) + B_{L_i}$

Note that this procedure does not need to consider resource feasibility explicitly, as the resource arcs added by the resource flow network resolve any resource conflicts that might have been present. This is a clear advantage of our approach, as it substantially speeds up the buffer insertion step.
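Algorithm 1 can be transcribed almost line by line. In the Python sketch below (function and argument names are ours), `preds` is assumed to hold the immediate predecessors of each activity in the network extended with the resource flow arcs, so that no explicit resource check is needed; activities are assumed to be numbered topologically, so that every predecessor appears before its successors in $L$.

```python
def decode(B, s_u, d, preds):
    """Decode buffer sizes B into a buffered schedule S^b (Algorithm 1).

    s_u: dict activity -> start time in the initial schedule S^u
    d: dict activity -> duration; B: dict activity -> buffer B_i (default 0)
    preds: dict activity -> immediate predecessors, including resource arcs
    """
    # Line 1: non-decreasing s^u_i, ties broken by the lowest activity number
    L = sorted(s_u, key=lambda i: (s_u[i], i))
    s_b = {L[0]: 0}  # line 2: the dummy start activity starts at time 0
    for i in L[1:]:  # lines 3-4
        earliest = max([s_u[i]] + [s_b[j] + d[j] for j in preds.get(i, [])])
        s_b[i] = earliest + B.get(i, 0)
    return s_b
```

With `s_u = {1: 0, 2: 0, 3: 2, 4: 5}`, `d = {1: 0, 2: 2, 3: 3, 4: 0}` and a chain of predecessors, one unit of buffer in front of activity 3 (`B = {3: 1}`) yields `{1: 0, 2: 0, 3: 3, 4: 6}`; the buffered schedule respects the deadline if the start of the dummy end activity does not exceed $\delta$.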


In what follows, we present a steepest-descent time buffering procedure that estimates the objective function value by means of simulation. However, because this approach is computationally quite demanding, we also present a heuristic that uses the expected duration increases due to resource breakdowns, calculated as shown in Section 4. Finally, since the procedures in Section 4 allow us to translate resource breakdowns into activity duration increases, it becomes possible to use approaches developed for the resource-constrained project scheduling problem subject to stochastic activity durations, such as the Starting Time Criticality (STC) heuristic by Van de Vonder et al. (2008).

5.1 Simulation-based time buffering

The easiest and most reliable way to estimate the quality of a buffered schedule with respect to the weighted instability cost objective is simulation. Note that, except in the buffer insertion procedure, we do not need a resource flow network in this case, because the average instability cost over a number of simulation runs for a given reactive policy gives an accurate representation of the real costs. The schedule is iteratively buffered as follows. In each iteration, every activity (except the dummy start activity) is considered for buffering. We buffer the activity that yields the largest improvement in the objective function value while respecting the deadline constraint. If no such activity can be found, the procedure terminates. The pseudocode of this steepest-descent approach is given in Algorithm 2, with $S(B)$ the schedule corresponding to the buffer decisions $B$ and $I(S(B))$ its simulated average instability cost. The main advantage of simulation is its reliability. As we will see in Section 6, simulation-based time buffering is the best-performing time buffering procedure. However, using simulation also has two drawbacks. First, it does not deliver any new insights into the problem structure. Second, simulation is computationally very demanding. This could become a problem in practice whenever very large projects are considered or whenever the decision maker wants to perform what-if analyses for a wide variety of scenarios. If, on the other hand, computation time is unimportant, simulation is clearly the most attractive approach.


Algorithm 2 Time buffering heuristic using simulation

    z_best := Simulate(I(S(B)))
    while improvement found do
        i* := 0
        for i := 2 to n do
            B_i := B_i + 1
            z := Simulate(I(S(B)))
            if s_n ≤ δ AND z < z_best then
                i* := i, z_best := z
            B_i := B_i − 1
        if i* ≠ 0 then
            B_{i*} := B_{i*} + 1
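Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the simulator and the construction of S(B) are hidden behind two hypothetical callables, `simulate_cost(B)` (standing in for Simulate(I(S(B)))) and `end_time(B)` (returning s_n of S(B)), and the loop stops as soon as a full pass finds no improving, deadline-feasible one-unit buffer increase.

```python
from typing import Callable, List


def steepest_descent_buffering(
    n: int,
    simulate_cost: Callable[[List[int]], float],
    end_time: Callable[[List[int]], int],
    deadline: int,
) -> List[int]:
    """Steepest descent time buffering in the spirit of Algorithm 2.

    Activities are numbered 1..n (1 = dummy start, n = dummy end), so index 0
    of the buffer vector B is unused. `simulate_cost(B)` and `end_time(B)` are
    hypothetical callables: the simulated average instability cost of S(B)
    and the planned start s_n of the dummy end activity in S(B).
    """
    B = [0] * (n + 1)
    z_best = simulate_cost(B)
    while True:
        i_star = 0
        for i in range(2, n + 1):      # every activity except the dummy start
            B[i] += 1                  # tentatively add one unit of buffer
            z = simulate_cost(B)
            if end_time(B) <= deadline and z < z_best:
                i_star, z_best = i, z
            B[i] -= 1                  # undo the tentative move
        if i_star == 0:                # no improving, deadline-feasible move
            return B
        B[i_star] += 1                 # execute the best move of this pass
```

Because `z_best` is updated inside the scan, each candidate must beat the best move found so far in the same pass, which is what makes the descent "steepest".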

5.2

Time buffering using surrogate measures

Surrogate measures aim to estimate the real instability costs efficiently and effectively, using information on activity characteristics and resource breakdown parameters. Section 4 showed how resource breakdowns can be translated into activity duration increases under various scenarios. This information is used in this section to calculate two surrogate robustness measures. In addition, a third measure, based on the estimated probability that the start of an activity has to be postponed, is introduced. All of these surrogate measures use uncertainty information to determine the impact of a duration change of the considered activity on activities that are planned later in time. To calculate this impact accurately, we need to introduce the concept of resource flow networks.

5.2.1

Resource flow networks

It is easy to see that the starting time of an activity can be disrupted by a disruption of one of its predecessors. However, such a disruption can also be caused by a non-precedence-related activity planned earlier in time with which the considered activity shares a scarce resource. Imagine a project without any precedence relations in which only a limited number of activities can be executed in parallel because of scarce resources. In such a case, non-precedence-related activities clearly have an impact on one another. Disruptions therefore propagate through the network along technological precedence relations as well as along resource-driven precedence relations. Resource flow networks offer an elegant way to represent these resource-driven relations. A resource flow network extends the set of arcs A with a set of resource flow arcs A_R. These arcs are precedence relations that are added to fix the resource allocations, which makes the explicit resource constraints redundant. Artigues and Roubellat (2000) introduce a simple method for generating a resource flow network. Its main advantage is that it is very fast, but unfortunately it does not take schedule robustness into account. The Myopic Activity Based Optimization heuristic (MABO) by Van de Vonder (2006) does take robustness into account. We nevertheless use the procedure of Artigues and Roubellat because it is faster and because pairwise tests for our problem setting showed no significant difference between the results obtained with this procedure and with MABO. Note, moreover, that the resource allocations are only kept fixed during the time buffering step. Fixing resource allocations during execution would unnecessarily restrict rescheduling flexibility, resulting in instability costs that are on average more than 100 times higher than those obtained when resource allocations are released.

5.2.2

Surrogate measures

The first surrogate objective (Surr1) is calculated as follows:

$$\mathrm{Surr1} = \sum_{j \in N} \sum_{i \in PRED^*_j} w_j \max\left(0,\ s_i + d_i + LPL_{ij} + E[\sigma_i] - s_j\right) \qquad (25)$$

For each activity $j$, all of its immediate and transitive predecessors $i \in PRED^*_j$ in the current extended network (technological as well as resource flow arcs) are considered. For each such predecessor, the expected impact of a duration increase of $i$ on the starting time of $j$ is calculated; these values are weighted with the instability weight of activity $j$ and summed. In this equation, $LPL_{ij}$ represents the length of the longest path between activities $i$ and $j$. This longest path is determined in the extended network for the given activity durations using full enumeration.
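The text determines these longest paths by full enumeration. Assuming the extended network is acyclic, an alternative is one topological-order pass per source activity, as in the Python sketch below. The sketch assumes that the length of a path is the sum of the durations of the activities strictly between i and j (so a direct arc has length 0), which is the convention under which s_i + d_i + LPL_ij ≤ s_j holds in every precedence-feasible schedule. Pairs without a path (the "/" entries of Table 2) are absent from the result; the function name is hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Optional, Set, Tuple


def longest_path_lengths(
    d: Dict[int, int], arcs: Set[Tuple[int, int]]
) -> Dict[Tuple[int, int], int]:
    """Return LPL_ij for every pair (i, j) connected by a path in the network.

    `d` maps each activity to its duration and `arcs` holds the technological
    plus resource flow arcs. Path length = sum of the durations of the
    activities strictly between i and j (assumed convention).
    """
    preds: Dict[int, Set[int]] = {v: set() for v in d}
    succs: Dict[int, Set[int]] = {v: set() for v in d}
    for i, j in arcs:
        preds[j].add(i)
        succs[i].add(j)
    topo = list(TopologicalSorter(preds).static_order())  # predecessors first

    lpl: Dict[Tuple[int, int], int] = {}
    for i in d:
        # best[v]: longest i -> v path length found so far (None = unreachable)
        best: Dict[int, Optional[int]] = {v: None for v in d}
        for j in succs[i]:
            best[j] = 0                 # a direct arc has no intermediate activity
        for v in topo:
            if best[v] is None:
                continue
            for w in succs[v]:
                cand = best[v] + d[v]   # v becomes an intermediate activity
                if best[w] is None or cand > best[w]:
                    best[w] = cand
        for j, length in best.items():
            if length is not None:
                lpl[(i, j)] = length
    return lpl
```

Unlike full enumeration, this pass never lists individual paths, so its cost grows with the number of arcs rather than with the number of paths.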

The second surrogate objective (Surr2) is quite similar to Surr1:

$$\mathrm{Surr2} = \sum_{j \in N} \max_{i \in PRED^*_j} w_j \max\left(0,\ s_i + d_i + LPL_{ij} + E[\sigma_i] - s_j\right) \qquad (26)$$
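Equations (25) and (26) translate directly into code. The Python sketch below assumes that LPL_ij is available for every pair (i, j) with i ∈ PRED*_j, so the keys of the `lpl` dictionary define the predecessor sets; `exp_increase[i]` stands for E[σ_i]. The function name and argument layout are illustrative only.

```python
from typing import Dict, Tuple


def surrogate_measures(
    s: Dict[int, float],
    d: Dict[int, float],
    w: Dict[int, float],
    exp_increase: Dict[int, float],
    lpl: Dict[Tuple[int, int], float],
) -> Tuple[float, float]:
    """Return (Surr1, Surr2) following Equations (25) and (26).

    s, d, w: planned starting times, durations and instability weights;
    exp_increase[i] = E[sigma_i]; lpl[(i, j)] = LPL_ij for i in PRED*_j.
    """
    surr1 = 0.0
    surr2 = 0.0
    for j in s:
        # weighted expected start delay of j caused by each predecessor i
        delays = [
            w[j] * max(0.0, s[i] + d[i] + lpl[(i, j)] + exp_increase[i] - s[j])
            for (i, jj) in lpl
            if jj == j
        ]
        surr1 += sum(delays)               # Eq. (25): sum over predecessors
        surr2 += max(delays, default=0.0)  # Eq. (26): worst predecessor only
    return surr1, surr2
```

When two parallel predecessors can each push activity j back by one period, Surr1 counts both contributions while Surr2 counts only one, which is the difference between the two measures.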

Compared with Surr1, Surr2 retains, for each activity $j$, only the largest weighted starting time disruption over all of its predecessors instead of their sum. Finally, we consider a third measure (Surr3) that is inspired by the starting time criticality (STC) heuristic (Van de Vonder et al., 2008):

$$\mathrm{Surr3} = \sum_{j \in N} STC_j \qquad (27)$$

The Starting Time Criticality (STC) heuristic is an elegant approach for generating time buffered schedules when faced with stochastic activity durations. It exploits information about the weights of the activities as well as about the probability distributions of the activity durations. The authors define the starting time criticality of activity $j$ as follows:

$$STC_j = w_j \Pr(S_j > s_j) \qquad (28)$$

The starting time of activity $j$ is disturbed whenever the duration increase of one of its (immediate or transitive) predecessors $i$ is large enough to force a delay of $j$ in order to maintain precedence feasibility. Based on this observation, the authors calculate the probability that activity $j$ cannot start at its planned starting time $s_j$ as follows:

$$\Pr(S_j > s_j) = \sum_{i \in PRED^*_j} \Pr(s_i + LPL_{ij} + D_i > s_j) \qquad (29)$$
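Equations (28) and (29) can be evaluated directly once a discrete distribution of each D_i is available. The Python sketch below is a hypothetical helper, not the code of Van de Vonder et al. (2008): `dist[i]` maps each possible duration of activity i to its probability, the keys of `lpl` define the predecessor sets PRED*_j, and `immediate_preds` is used only for the convention that STC_j = 0 when j is the dummy start activity (assumed to be activity 1) or has the dummy start as its sole predecessor. Because Equation (29) adds the probabilities of the individual predecessors, the computed value is an approximation and can exceed w_j.

```python
from typing import Dict, Set, Tuple


def stc_values(
    s: Dict[int, int],
    w: Dict[int, float],
    dist: Dict[int, Dict[int, float]],
    lpl: Dict[Tuple[int, int], int],
    immediate_preds: Dict[int, Set[int]],
) -> Dict[int, float]:
    """STC_j = w_j * Pr(S_j > s_j), with Pr approximated by Equation (29)."""
    stc: Dict[int, float] = {}
    for j in s:
        if j == 1 or immediate_preds.get(j, set()) <= {1}:
            stc[j] = 0.0                 # dummy start, or preceded only by it
            continue
        prob = 0.0
        for (i, jj), length in lpl.items():
            if jj != j:
                continue
            # j is pushed back if s_i + LPL_ij + D_i exceeds s_j
            slack = s[j] - s[i] - length
            prob += sum(p for dur, p in dist[i].items() if dur > slack)
        stc[j] = w[j] * prob
    return stc
```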

Equation (29) assumes that predecessors start at their baseline starting times and that only one activity at a time delays $S_j$. If $j$ is the dummy start activity, or if the dummy start activity is its sole predecessor, the starting time criticality of $j$ equals 0. Because the probability distributions of the activity durations under resource breakdowns are very hard to calculate analytically, we approximate them by simulation, assuming that resource allocations are fixed. For each activity, 1000 simulation runs are executed to determine the probability of each possible duration outcome, given the mean time to failure and the mean time to repair of each resource type used by that activity.
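The simulation of duration outcomes can be sketched as follows, under assumptions the text does not spell out: all units allocated to the activity are up when it starts, units do not fail while the activity is halted for a repair, and a realised duration is rounded up to a whole period. Because times to failure are exponential, the time until the first failure among the allocated units is exponential with the sum of their failure rates, and the failing unit is chosen with probability proportional to its rate. The function name and signature are hypothetical; the default of 1000 runs matches the number used in the text.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple


def simulate_duration_distribution(
    d: int,
    units: List[Tuple[float, float]],
    mode: str = "repeat",
    runs: int = 1000,
    seed: int = 0,
) -> Dict[int, float]:
    """Estimate Pr(D_i = t) for an activity with fixed resource allocations.

    `units` lists (MTTF, MTTR) for every resource unit allocated to the
    activity; `mode` is "repeat" (preempt-repeat) or "resume" (preempt-resume).
    """
    if mode not in ("repeat", "resume"):
        raise ValueError("mode must be 'repeat' or 'resume'")
    rng = random.Random(seed)
    rates = [1.0 / mttf for mttf, _ in units]
    total_rate = sum(rates)
    counts: Counter = Counter()
    for _ in range(runs):
        elapsed, remaining = 0.0, float(d)
        while True:
            # first failure among the allocated units: exponential(total_rate)
            ttf = rng.expovariate(total_rate) if total_rate > 0 else math.inf
            if ttf >= remaining:
                elapsed += remaining              # the activity completes
                break
            elapsed += ttf
            # pick the failing unit proportionally to its failure rate
            _, mttr = rng.choices(units, weights=rates)[0]
            elapsed += rng.expovariate(1.0 / mttr)  # repair time
            if mode == "resume":
                remaining -= ttf                  # keep the work already done
            # preempt-repeat: remaining stays d, so execution restarts from scratch
        counts[math.ceil(elapsed)] += 1           # assumed: round up to whole periods
    return {t: c / runs for t, c in sorted(counts.items())}
```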

These surrogate measures are used in the tabu search metaheuristic of Algorithm 3 to evaluate the intermediate schedules. The neighbourhood considered by this tabu search procedure consists of decreasing (if possible) or increasing the buffer amount of each non-dummy activity by one time unit. After each neighbourhood evaluation, the move leading to the largest improvement in the objective function value (calculated with one of the above surrogate measures for the schedule S(B) corresponding to B) is selected and executed, and its reverse move is made tabu for the next T iterations. Furthermore, as long as the globally best solution B^best does not improve, executed moves are recorded in the frequency-based memory FREQ, which is reset whenever B^best improves.

Algorithm 3 Tabu search based buffer insertion procedure

    B^best := B := (0, 0, ..., 0), z^best := z := Surr(S(B))
    TABU_i^+ := TABU_i^- := FREQ_i^+ := FREQ_i^- := 0 (i := 1, ..., n)
    T := n, iter := 0
    while (iter < MAXITER) do
        i* := 0, z := ∞
        for i := 2 to n − 1 do
            if B_i > 0 then
                B_i := B_i − 1
                if ((Surr(S(B)) < z^best OR iter > TABU_i^-) AND Surr(S(B)) + FREQ_i^- < z AND s_n ≤ δ) then
                    i* := −i, z := Surr(S(B))
                B_i := B_i + 1
            B_i := B_i + 1
            if ((Surr(S(B)) < z^best OR iter > TABU_i^+) AND Surr(S(B)) + FREQ_i^+ < z AND s_n ≤ δ) then
                i* := i, z := Surr(S(B))
            B_i := B_i − 1
        if i* ≠ 0 then
            if (i* < 0) then B_{−i*} := B_{−i*} − 1, TABU^+_{−i*} := iter + T, FREQ^-_{−i*} := FREQ^-_{−i*} + 1
            if (i* > 0) then B_{i*} := B_{i*} + 1, TABU^-_{i*} := iter + T, FREQ^+_{i*} := FREQ^+_{i*} + 1
            if (z < z^best) then z^best := z, B^best := B, FREQ^+ := FREQ^- := (0, 0, ..., 0)
        iter := iter + 1
    generate buffered schedule S^b corresponding to B^best
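A compact Python rendering of the tabu search buffer insertion procedure is given below. It is a sketch under assumptions: `surr(B)` and `end_time(B)` are hypothetical callables that evaluate one of the surrogate measures and s_n for the schedule S(B), and the frequency counter is added to the surrogate value without any scaling factor when candidate moves are compared.

```python
import math
from typing import Callable, List


def tabu_buffering(
    n: int,
    surr: Callable[[List[int]], float],
    end_time: Callable[[List[int]], int],
    deadline: int,
    max_iter: int = 1000,
) -> List[int]:
    """Tabu search over buffer vectors B (index 0 unused, dummies 1 and n)."""
    B = [0] * (n + 1)
    best_B, z_best = B[:], surr(B)
    tabu_plus, tabu_minus = [0] * (n + 1), [0] * (n + 1)
    freq_plus, freq_minus = [0] * (n + 1), [0] * (n + 1)
    tenure = n                                      # T := n
    for it in range(max_iter):
        i_star, z = 0, math.inf
        for i in range(2, n):                       # non-dummy activities only
            for step, tabu, freq in ((-1, tabu_minus, freq_minus),
                                     (+1, tabu_plus, freq_plus)):
                if B[i] + step < 0:                 # buffers cannot be negative
                    continue
                B[i] += step
                cost = surr(B)
                allowed = cost < z_best or it > tabu[i]  # aspiration or not tabu
                if allowed and cost + freq[i] < z and end_time(B) <= deadline:
                    i_star, z = step * i, cost
                B[i] -= step                        # undo the tentative move
        if i_star == 0:
            continue
        i = abs(i_star)
        if i_star < 0:
            B[i] -= 1
            tabu_plus[i] = it + tenure              # forbid undoing the decrease
            freq_minus[i] += 1
        else:
            B[i] += 1
            tabu_minus[i] = it + tenure             # forbid undoing the increase
            freq_plus[i] += 1
        if z < z_best:                              # new global best: reset memory
            z_best, best_B = z, B[:]
            freq_plus, freq_minus = [0] * (n + 1), [0] * (n + 1)
    return best_B
```

The move tuples are rebuilt for every activity, so a reset of the frequency lists takes effect from the next neighbourhood evaluation onwards.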

We now illustrate the use of Surr1 for generating a robust schedule starting from the minimal makespan schedule $S^u = (0, 0, 2, 4, 0, 7, 7, 9, 13, 18)$ in Figure 3. Assuming a preempt-repeat scenario with exponentially distributed times to failure (MTTF = 15) and repair times (MTTR = 3), the expected duration increases can be calculated using Theorem 1. Rounded to the nearest integer, this yields $E[\sigma] = (0, 1, 17, 5, 9, 24, 7, 6, 3, 0)$. Using the longest path values in Table 2, which correspond to the resource flow network in Figure 5, the surrogate objective function values obtained by buffering activities 2 through 9 are (1457, 1447, 1520, 1549, 1452, 1501, 1449, 1460), respectively. The lowest value (1447) is obtained by buffering activity 3, which yields the schedule $S = (0, 0, 3, 4, 0, 7, 7, 10, 14, 18)$. The procedure finally terminates with the schedule in Figure 6.

[Insert Table 2]
[Insert Figure 5]
[Insert Figure 6]

5.3

Time buffering using the STC heuristic

In addition to the approaches presented in the previous two subsections, we also implemented the STC heuristic. This heuristic iteratively buffers the activity with the highest STC value whose buffering still respects the deadline constraint, and it stops when no activity with a positive STC value can be buffered without making the schedule deadline infeasible. The STC values are calculated as shown in Equations (28) and (29). For more details regarding its operation, we refer the reader to Van de Vonder et al. (2008).
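The STC loop described in the previous paragraph can be sketched as follows. The sketch assumes that each buffering step adds one time unit, and it hides the construction of S(B) and of its STC values behind two hypothetical callables, `stc_of(B)` and `end_time(B)`; it is not the reference implementation of Van de Vonder et al. (2008). If the activity with the highest STC value cannot be buffered without violating the deadline, the next most critical one is tried.

```python
from typing import Callable, Dict, List


def stc_buffering(
    n: int,
    stc_of: Callable[[List[int]], Dict[int, float]],
    end_time: Callable[[List[int]], int],
    deadline: int,
    max_iter: int = 10_000,
) -> List[int]:
    """Iteratively buffer the most starting-time-critical activity."""
    B = [0] * (n + 1)                   # index 0 unused, activities 1..n
    for _ in range(max_iter):
        stc = stc_of(B)
        candidates = sorted((j for j in stc if stc[j] > 0), key=lambda j: -stc[j])
        for j in candidates:            # highest STC value first
            B[j] += 1
            if end_time(B) <= deadline:
                break                   # keep this buffer, recompute STC values
            B[j] -= 1                   # deadline violated: try the next one
        else:
            return B                    # no positive-STC activity can be buffered
    return B
```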

6

Computational Experiment

In order to compare the performance of simulation-based time buffering, time buffering using surrogate measures and the STC heuristic, we set up an extensive computational experiment. As a test set we used the 480 30-activity instances of the PSPLIB test set developed by Kolisch and Sprecher (1997). Mean times to failure were drawn from a uniform distribution between the minimal project makespan (obtained by solving the deterministic RCPSP) and twice this minimal makespan, whereas mean repair times were drawn from a uniform distribution between 1 and 5. The instability weights were drawn from a triangularly shaped distribution between 1 and 10. The weight of the dummy end activity, however, was set to 10 times the average of this distribution in order to reflect the higher importance of finishing the project on time compared with meeting individual milestones. Finally, the project deadline was set to the project’s minimal makespan increased by 30%.

Baseline schedules were constructed using the flowchart in Figure 1. One first chooses either to use resource buffering (res buf) or not (no res buf), and then generates an initial schedule using either minimal makespan scheduling (RCPSP) or ‘highest CIW first’ scheduling (CIW). This schedule can then be made robust using simulation-based time buffering (SIM), surrogate measure based time buffering (SURR1, SURR2, SURR3) or the STC heuristic (STC), or it can be left without time buffering (None). Average instability costs are calculated by means of simulation using 10 repetitions per instance, yielding the results in Figures 7 and 8. In our experiment, we terminated the tabu search procedure used for time buffering based on surrogate measures after 1000 iterations; further increasing the number of iterations did not significantly improve our results. Furthermore, we restricted the number of simulation runs in the simulation-based heuristic to 100, in order to keep computation times at an acceptable level and because simulation-based time buffering is presented here mostly as a benchmark for assessing the performance of the proposed surrogate measures.

[Insert Figure 7]
[Insert Figure 8]

Figures 7 and 8 show that the best performance is obtained when resource buffered (res buf) minimal makespan scheduling (RCPSP) is combined with simulation-based time buffering (SIM). The worst result, on the other hand, corresponds to minimal makespan scheduling without any form of buffering.
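The instance parameters of this experiment can be sampled as in the Python sketch below. Only the ranges come from the text; the use of continuous draws, the mode of the triangular weight distribution (left as the hypothetical parameter `weight_mode`) and the function name are assumptions.

```python
import random
from typing import Dict, List


def sample_instance_parameters(
    makespan: int,
    resource_types: List[int],
    real_activities: List[int],
    weight_mode: float = 5.5,
    seed: int = 0,
) -> Dict[str, object]:
    """Draw breakdown parameters, instability weights and the project deadline.

    Assumptions: all draws are continuous, and the mode of the triangular
    weight distribution (not given in the text) defaults to the midpoint of [1, 10].
    """
    rng = random.Random(seed)
    mttf = {k: rng.uniform(makespan, 2 * makespan) for k in resource_types}
    mttr = {k: rng.uniform(1, 5) for k in resource_types}
    weights = {j: rng.triangular(1, 10, weight_mode) for j in real_activities}
    # dummy end weight: 10 times the mean (low + high + mode) / 3 of the weights
    end_weight = 10 * (1 + 10 + weight_mode) / 3
    deadline = 1.3 * makespan           # minimal makespan increased by 30%
    return {"mttf": mttf, "mttr": mttr, "weights": weights,
            "end_weight": end_weight, "deadline": deadline}
```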
Combining minimal makespan scheduling with resource buffering and simulation-based time buffering improves on plain minimal makespan scheduling without buffering by 91% for preempt-repeat and by 93% for preempt-resume. With the best non-simulation time buffering heuristic (Surr1 for preempt-repeat and STC for preempt-resume), these improvements drop to 84% and 89%, respectively. Let us also consider the impact of resource buffering and of initial schedule generation. As long as neither resource buffering nor time buffering is used, ‘highest CIW first’ scheduling performs better than minimal makespan scheduling. This changes whenever a form of buffering is included. Resource buffering always performs better than no resource buffering. This is especially noticeable if no time buffering is used, in which case the performance difference is a factor of 3 on average. Moreover, the results of resource buffering without time buffering are almost never outperformed by those of time buffering without resource buffering. The one exception is simulation-based time buffering, which performs on average 2 times better than pure resource buffering.

[Insert Table 3]

The average computation times are given in Table 3. Those for the proactive strategies without time buffering are negligibly small. As expected, ‘highest CIW first’ scheduling is very fast because it uses the quick serial schedule generation scheme, whereas minimal makespan scheduling requires significantly more time because of its more time-intensive branch-and-bound procedure. Observe that resource buffered scheduling is slightly slower than scheduling without resource buffering. This is not caused by the average availability calculation, which is very fast, but by the fact that the schedule may have to be recalculated if the average availability is insufficient to meet the project deadline. The results for the surrogate measures show that our tabu search procedure is fast: its computation times are of the same order of magnitude as those of the steepest descent STC heuristic. However, using tabu search for the simulation-based time buffering procedure caused computation times to explode. Even with the simple steepest descent heuristic, simulation-based time buffering is on average a factor of 6 slower than the surrogate measure based procedures. For our instances, these computation times are still acceptable, but this will not necessarily be the case for practical project networks consisting of 300 or more activities.

7

Conclusions

We can conclude that time buffering is a very interesting alternative for incorporating robustness into a schedule. We gave an overview of analytical approaches for determining the expected duration increase an activity experiences due to resource breakdowns. These results were used to create an effective and efficient algorithm for inserting explicit idle time into an initial, unbuffered schedule in order to protect it from the propagation of disruptions throughout the project network. It was shown that time buffering based on simulation performs clearly better than time buffering based on surrogate objective functions, but the reader should keep its higher computational demands in mind. Especially in practical project scheduling, those computational demands will often become prohibitive. We therefore suggest using either time buffering based on the first surrogate objective function (Surr1) or the STC heuristic. STC offers the additional advantage that it has proven to be a good buffering strategy when stochastic activity durations are considered. It would therefore be an interesting topic for further research to develop an integrated approach combining uncertain activity durations with unexpected machine breakdowns. The advantages of robust project scheduling for practical project management are clear. Less rescheduling and replanning decreases the costs resulting from those actions. Furthermore, the project manager will be able to quote reliable milestone delivery dates, facilitating negotiations with customers and subcontractors.

References

Artigues, C. and Roubellat, F. (2000). A polynomial activity insertion algorithm in a multi-resource schedule with cumulative constraints and multiple modes. European Journal of Operational Research, 127:294–316.
Barlow, R. and Proschan, F. (1996). Mathematical Theory of Reliability. John Wiley & Sons Inc, New York.
Blumenfeld, D. (2001). Operations research calculations handbook. CRC Press LLC, Florida.
Brucker, P., Drexl, A., Möhring, R., Neumann, K., and Pesch, E. (1999). Resource-constrained project scheduling: Notation, classification, models and methods. European Journal of Operational Research, 112:3–41.
Demeulemeester, E. and Herroelen, W. (2002). Project scheduling - A research handbook, volume 49 of International Series in Operations Research & Management Science. Kluwer Academic Publishers, Boston.
Herroelen, W., De Reyck, B., and Demeulemeester, E. (1998). Resource-constrained scheduling: A survey of recent developments. Computers and Operations Research, 25:279–302.
Herroelen, W. and Leus, R. (2004). The construction of stable project baseline schedules. European Journal of Operational Research, 156(3):550–565.
Kolisch, R. and Sprecher, A. (1997). PSPLIB - A project scheduling library. European Journal of Operational Research, 96:205–216.
Lambrechts, O., Demeulemeester, E., and Herroelen, W. (2007a). Exact and suboptimal reactive strategies for resource-constrained project scheduling with uncertain resource availabilities. Research Report KBI0702, Department of Decision Sciences and Information Management, K.U.Leuven, Belgium, 36 pp.
Lambrechts, O., Demeulemeester, E., and Herroelen, W. (2007b). A tabu search procedure for developing robust predictive project schedules. International Journal of Production Economics, 111(2):493–508.
Lambrechts, O., Demeulemeester, E., and Herroelen, W. (2008). Proactive and reactive strategies for resource-constrained project scheduling with uncertain resource availabilities. Journal of Scheduling, 11(2):121–136.
Leus, R. (2004). The generation of stable project plans. 4OR, The Quarterly Journal of the Belgian, French and Italian Operations Research Societies, 2(3):251–254.
Leus, R. and Herroelen, W. (2004). Stability and resource allocation in project planning. IIE Transactions, 36(7):667–682.
O’Donovan, R., Uzsoy, R., and McKay, K. (1999). Predictable scheduling of a single machine with breakdowns and sensitive jobs. International Journal of Production Research, 37(18):4217–4233.
Schatteman, D., Herroelen, W., Van de Vonder, S., and Boone, A. (2008). A methodology for integrated risk management and proactive scheduling of construction projects. Journal of Construction Engineering and Management, 134(11):885–893.
Van de Vonder, S. (2006). Proactive-reactive procedures for robust project scheduling. PhD thesis, Faculty of Economics and Applied Economics, Department of Decision Sciences and Information Management, K.U.Leuven, Belgium.
Van de Vonder, S., Ballestin, F., Demeulemeester, E., and Herroelen, W. (2007a). Heuristic procedures for reactive project scheduling. Computers & Industrial Engineering, 52(1):11–28.
Van de Vonder, S., Demeulemeester, E., and Herroelen, W. (2007b). Heuristic procedures for generating stable project baseline schedules. European Journal of Operational Research, to appear.
Van de Vonder, S., Demeulemeester, E., and Herroelen, W. (2008). Proactive heuristic procedures for robust project scheduling: an experimental analysis. European Journal of Operational Research, 189(3):723–733.
Van de Vonder, S., Demeulemeester, E., Herroelen, W., and Leus, R. (2005). The use of buffers in project management: The trade-off between stability and makespan. International Journal of Production Economics, 97(2):227–240.
Van de Vonder, S., Demeulemeester, E., Herroelen, W., and Leus, R. (2006). The trade-off between stability and makespan in resource-constrained project scheduling. International Journal of Production Research, 44(2):215–236.
Yu, G. and Qi, X. (2004). Disruption Management - Framework, models and applications. World Scientific, New Jersey.


Table 1: Notation

d_i          deterministic duration of activity i
σ_i          stochastic duration increase of i due to breakdowns and repairs
D_i          stochastic duration of activity i due to breakdowns and repairs
r_ik         per period usage of resource type k for activity i
PRED_i       set of immediate predecessors of activity i
PRED*_i      set of immediate and transitive predecessors of activity i
w_i          instability weight of activity i
LPL_ij       length of the longest path between activities i and j
δ            project deadline
L_p          element in position p of list L
s_i          planned starting time of activity i
Schedule     vector containing the baseline starting times s_i
S_i          real, stochastic starting time of activity i
SCHEDULE     vector containing the real starting times S_i
B_i          buffer amount for i
Busy_t       set of activities in progress during period t
a_k          number of renewable resource units of type k allocated to the project
A_kt         real availability of resource type k in period t
X_mk         stochastic time to failure of resource unit m of resource type k
MTTF_k       expected value of X_mk (λ_k = 1/MTTF_k)
Y_mk         stochastic repair time of resource unit m of resource type k
MTTR_k       expected value of Y_mk (μ_k = 1/MTTR_k)
X_i          stochastic part of σ_i attributable to failed executions
Y_i          stochastic part of σ_i attributable to repairs
N_i          stochastic number of interruptions experienced by i before completion
F_ip         stochastic duration of the p'th failed execution experienced by i
R_ip         stochastic duration of the p'th repair experienced by i
ψ_i          probability that i is interrupted before it is terminated

Table 2: Longest path lengths LPL_ij for all (i, j) ("/" = no path from i to j)

i\j   1    2    3    4    5    6    7    8    9    10
1     /    0    2    4    0    7    7    9    13   15
2     /    /    0    0    /    3    3    7    11   13
3     /    /    /    /    /    /    /    0    4    6
4     /    /    /    /    /    0    0    /    6    8
5     /    /    /    0    /    3    3    0    9    11
6     /    /    /    /    /    /    /    /    /    0
7     /    /    /    /    /    /    /    /    0    2
8     /    /    /    /    /    /    /    /    0    2
9     /    /    /    /    /    /    /    /    /    0
10    /    /    /    /    /    /    /    /    /    /

Table 3: Computation times

Time buffering   Resource buffering   Initial schedule   preempt-repeat   preempt-resume   average
no time buf      no res buf           RCPSP              0.06 s           0.06 s           0.06 s
no time buf      no res buf           CIW                0.00 s           0.00 s           0.00 s
no time buf      res buf              RCPSP              0.08 s           0.08 s           0.08 s
no time buf      res buf              CIW                0.00 s           0.00 s           0.00 s
Surr1            no res buf           RCPSP              0.41 s           0.43 s           0.42 s
Surr1            no res buf           CIW                0.62 s           0.64 s           0.63 s
Surr1            res buf              RCPSP              0.53 s           0.55 s           0.54 s
Surr1            res buf              CIW                0.85 s           0.87 s           0.86 s
Surr2            no res buf           RCPSP              0.43 s           0.46 s           0.44 s
Surr2            no res buf           CIW                0.65 s           0.67 s           0.66 s
Surr2            res buf              RCPSP              0.55 s           0.58 s           0.56 s
Surr2            res buf              CIW                0.88 s           0.90 s           0.89 s
Surr3            no res buf           RCPSP              0.57 s           0.55 s           0.56 s
Surr3            no res buf           CIW                0.78 s           0.76 s           0.77 s
Surr3            res buf              RCPSP              0.69 s           0.67 s           0.68 s
Surr3            res buf              CIW                1.02 s           0.99 s           1.00 s
STC              no res buf           RCPSP              0.28 s           0.23 s           0.25 s
STC              no res buf           CIW                0.50 s           0.45 s           0.47 s
STC              res buf              RCPSP              0.41 s           0.36 s           0.38 s
STC              res buf              CIW                0.74 s           0.69 s           0.72 s
SIM              no res buf           RCPSP              3.80 s           3.41 s           3.70 s
SIM              no res buf           CIW                3.04 s           2.71 s           2.87 s
SIM              res buf              RCPSP              4.44 s           3.86 s           4.15 s
SIM              res buf              CIW                3.64 s           3.14 s           3.39 s


Figure 1: Robust schedule construction


Figure 2: Example project network

Figure 3: Minimal makespan schedule with fixed allocations


Figure 4: Minimal makespan schedule with fixed allocations after a resource breakdown (preempt-repeat)

Figure 5: Resource flow network


Figure 6: Buffered schedule

Figure 7: Instability costs for preempt-repeat


Figure 8: Instability costs for preempt-resume


1

Introduction

Most of the research in project scheduling deals with the generation of an initial project schedule (baseline schedule) in a static and deterministic environment with complete information. For an extensive overview we refer to Brucker et al. (1999), Herroelen et al. (1998) and Demeulemeester and Herroelen (2002). Unfortunately, these underlying assumptions simply do not always hold in practice. In the real world, a project manager often has to deal with a stochastic and dynamic scheduling environment. He has to protect the initial baseline schedule from the adverse effects of possible disruptions because often project activities are subcontracted or executed by resources that are not exclusively reserved for the current project. A change in the starting times of such activities could lead to additional costs due to required subcontractor flexibility and due to schedule nervousness. A possible measure 1

for the deviation between the initial schedule and the realized schedule is the weighted instability cost. It can be calculated by taking the sum of the expected weighted absolute deviations between the planned and the actually realized activity starting times. The weight wi , assigned to each activity i, reflects that activity’s importance of starting it at its planned starting time in the initial schedule. More specifically, wi denotes the marginal cost of deviating from the planned starting time of activity i during project execution. The instability weight can be quantified using for example the computer supported risk management system by Schatteman et al. (2008). Recent research by Leus (2004), Herroelen and Leus (2004), Leus and Herroelen (2004) and Van de Vonder et al. (2005, 2006, 2007b and 2008) considers the weighted instability cost objective for the case of project scheduling with stochastic activity durations. Other possible causes for uncertainty in project execution might be, amongst others, project content changes, bad weather conditions or unavailability of resources. In this paper we study the last of these possible causes. Resource breakdowns have been cited by numerous authors as one of the most important sources of disruptions in practical project management (see amongst others Yu and Qi (2004)). We only consider renewable resources. This means that each resource type k (k = 1, ..., R) is modeled as a set of individual resource units. In the deterministic case these resource units are assumed to be available throughout the project on a period-per-period basis. In the stochastic case, on the other hand, breakdowns may occur. Whenever a resource unit breaks down, it has to be repaired before it becomes available again. The time between the end of a repair period and a new failure for resource unit m of resource type k is modeled by means of a stochastic variable Xmk . 
The time needed to repair a resource unit m of type k is also represented by means of a stochastic variable Ymk . In order to cope with uncertainty, one has several options at one’s disposal. Proactive scheduling focuses on the construction of predictive schedules that use statistical knowledge of the uncertainties with the aim of increasing schedule robustness. A schedule is considered to be robust if it can absorb anticipated disruptions without affecting planned external activities while maintaining high shop performance (O’Donovan et al., 1999). Robustness can be induced by allocating extra resource capacity and/or execution time to each activity so that its execution uncertainty can be compensated to a certain extent without the need for rescheduling. Our proactive approach is depicted in Figure 1. 2

——————————— Insert Figure 1 ——————————— First, one has to decide whether to schedule the project using the maximal, deterministic resource availability ak or to use a buffered availability a∗k . This buffered availability can be calculated by taking the expected value of the steady state availabilities. The idea being that resource breakdowns can be compensated to a certain extent by not fully utilizing the available capacity, effectively using resource redundancy. Next, an initial schedule is generated using either an optimal approach for minimizing the project makespan or by scheduling activities having a high Cumulative Instability Weight (CIW) as early as possible in time. The aim of the latter approach, that we adopt from Lambrechts et al. (2008), is to reduce the probability that high-impact activities are delayed due to disruptions of activities taking place earlier in time. The CIW of an activity can be calculated by taking the sum of its instability weight and the instability weights of all of its successors, immediate as well as transitive. After constructing an initial schedule S u using one or more of the procedures we just introduced, time buffering can be added to this initial schedule. Our aim will be to improve the robustness of S u by inserting explicit idle time into the project schedule. The inclusion of slack time in front of activities allows the schedule to absorb potential disruptions caused by earlier resource breakdowns and the resulting activity shifts. Our objective is to insert a time buffer of size Bi in front of the starting times si of each activity i so that the expected instability costs are minimized without exceeding the project deadline. The next section will briefly introduce our scheduling problem together with a number of definitions and concepts that will be of importance in the remainder of the paper. 
The assumptions made in Section 3 allow us to analytically translate resource breakdowns into activity duration increases as we will see in Section 4. This information will be used in Section 5 to develop an approach for strategically inserting explicit idle time in the schedule. In a computational experiment this approach is compared with a simulation-based approach and with a dedicated approach for minimizing the instability costs in case of stochastic activity durations. The results of this experiment are given in Section 6. Note that all of these approaches assume that we know the distribution of the times to failure and the repair times. If this is not the case, one can resort to robust scheduling techniques that do not exploit this information but that are consequently also less efficient and effective. For an overview of proactive strategies based on the free slack measure we would 3

like to refer the interested reader to Lambrechts et al. (2007b). Unfortunately, no matter how much care is taken in constructing a proactive schedule, disruptions can never be totally prevented. In case an activity is delayed due to for example an unforeseen resource breakdown, the schedule may become infeasible. A reactive procedure must then be used to repair the schedule. The aim of this reactive procedure is to restore schedule feasibility in such a way that some objective function (such as the deviation from the baseline schedule) is optimized. However, here, we only focus on the construction of a robust predictive schedule. For more information regarding the reactive phase, we would like to refer the interested reader to Van de Vonder et al. (2007a) for the stochastic duration case and to Lambrechts et al. (2007a) for the stochastic resource availability case.

2

Problem Statement

The aim of the proactive baseline scheduling problem is to generate a project schedule that is feasible as well as robust. We represent the project using the activity-on-node representation: the digraph G = (N, A) contains a set of nodes N and a set of arcs A. The nodes represent the activities constituting the project whereas the arcs represent the finish-start, zero-lag precedence relations. Whenever (i, j) ∈ A we say that activity i (i = 1, ..., n) is an immediate predecessor of activity j, implying that activity j cannot start before activity i has finished: ∀(i, j) ∈ A

s i + di 6 s j

(1)

with si representing the starting time of activity i and di the deterministic duration of activity i. For ease of reference, Table 1 is included summarizing the notation that will be used throughout this paper. ——————————— Insert Table 1 ——————————— In resource-constrained project scheduling we also have to take the renewable resource constraints into account. As we indicated in Section 1, we assume that a finite amount ak of each resource type k is available on a period-per-period basis. Resource feasibility then implies that for each time period t and for each resource type k the sum of the resource requirements rik of the activities that are in progress during period t (Busyt ) cannot exceed

the availability $a_k$:

$$\sum_{i \in Busy_t} r_{ik} \le a_k \qquad \forall t, \forall k \qquad (2)$$

Finally, a given project deadline δ has to be respected:

$$s_n \le \delta \qquad (3)$$

with n the dummy end activity, which has a duration and a resource usage equal to 0 and represents project completion. Our objective then becomes the generation of a baseline schedule $(s_1, \ldots, s_n)$ that respects constraints (1), (2) and (3) and minimizes the expected instability costs:

$$\sum_{i \in N} w_i \, |E(S_i) - s_i| \qquad (4)$$
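To make constraints (1) to (3) and objective (4) concrete, the following Python sketch checks a candidate baseline schedule and evaluates the instability cost from a set of simulated realized starting times. It is an illustration under our own assumptions, not part of the original method: activities are indexed from 0 with the dummy end activity last, time is discrete, activity i occupies periods $s_i, \ldots, s_i + d_i - 1$, and $E(S_i)$ is estimated as a sample mean. The function names are hypothetical.

```python
def is_feasible(s, d, r, a, arcs, deadline):
    """Check constraints (1)-(3).

    s, d: start times and durations; r[i][k]: per-period requirement of
    activity i for resource type k; a[k]: per-period availability;
    arcs: finish-start precedence pairs (i, j). The last index is the
    dummy end activity.
    """
    # Constraint (1): zero-lag finish-start precedence relations.
    if any(s[i] + d[i] > s[j] for i, j in arcs):
        return False
    # Constraint (3): the dummy end activity respects the deadline.
    if s[-1] > deadline:
        return False
    # Constraint (2): in every period, usage never exceeds availability.
    horizon = max(s[i] + d[i] for i in range(len(s)))
    for t in range(horizon):
        busy = [i for i in range(len(s)) if s[i] <= t < s[i] + d[i]]
        for k in range(len(a)):
            if sum(r[i][k] for i in busy) > a[k]:
                return False
    return True


def instability_cost(w, s, realizations):
    """Objective (4): sum_i w_i |E(S_i) - s_i|, with E(S_i) estimated as the
    mean over simulated realizations of the starting-time vector."""
    runs = len(realizations)
    cost = 0.0
    for i in range(len(s)):
        mean_start = sum(run[i] for run in realizations) / runs
        cost += w[i] * abs(mean_start - s[i])
    return cost
```

Because of the railroad policy described next, every realized start satisfies $S_i \ge s_i$, so the absolute value in (4) only ever measures delays.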

The real starting times $S_i$ in Equation 4 are stochastic variables that depend on the realization of the stochastic resource availabilities ($A_{kt}$), on the planned starting times ($s_i$) and on the reactive policy used to repair a disrupted schedule. We assume that a railroad scheduling approach is used, meaning that activities are never started before their planned starting time ($S_i \ge s_i$). Lambrechts et al. (2007a) describe exact and suboptimal reactive scheduling procedures. The exact procedure relies on branch-and-bound, while the heuristics involve tabu search and a so-called scheduled order list procedure. In this paper, we restrict ourselves to the scheduled order list procedure to deal with schedule disruptions. This policy reschedules activities in the order in which they were started in the baseline schedule while respecting precedence and resource constraints. If an activity has to be interrupted due to a resource breakdown, it either has to be restarted from scratch (preempt-repeat) or is simply resumed from the point where execution was halted (preempt-resume).
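The scheduled order list policy can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the procedure of Lambrechts et al. (2007a): it assumes a single resource type, realized durations that are already known, a per-period availability profile `avail` (its last value is assumed to hold afterwards), activity indices that are topologically ordered, and a serial scheme that places each activity, in baseline order, at the earliest time that is railroad-, precedence- and resource-feasible.

```python
from collections import defaultdict


def scheduled_order_list_repair(baseline, durations, req, avail, preds):
    """Simplified, hypothetical sketch of a scheduled-order-list repair.

    baseline[i]: planned start s_i; durations[i]: realized duration;
    req[i]: units of the single resource type needed by i; avail[t]: units
    available in period t (the last entry is assumed to hold afterwards and
    must cover every req[i], otherwise the search below never ends);
    preds[i]: immediate predecessors of i (technological arcs only).
    """
    def cap(t):
        return avail[t] if t < len(avail) else avail[-1]

    used = defaultdict(int)  # units already claimed in each period
    start = [None] * len(baseline)
    # Baseline order: non-decreasing planned start, ties by lowest index.
    for i in sorted(range(len(baseline)), key=lambda a: (baseline[a], a)):
        # Railroad: never before s_i; precedence: after all predecessors.
        t = max([baseline[i]] + [start[j] + durations[j] for j in preds[i]])
        # Right-shift until every execution period has enough capacity.
        while any(used[u] + req[i] > cap(u) for u in range(t, t + durations[i])):
            t += 1
        for u in range(t, t + durations[i]):
            used[u] += req[i]
        start[i] = t
    return start
```

For example, if activity 1 of a four-activity project takes one period longer than planned, the sketch right-shifts the resource-dependent activity 2 and the dummy end accordingly, while undisrupted activities keep their planned starts.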

3

Assumptions

In this paper, we focus on the construction of a robust baseline schedule by inserting explicit idle time into the project schedule. In order to be able to do this in an efficient way, we need a more thorough understanding of the nature of resource breakdowns and their impact on the duration of the disrupted activity. Before doing so, we need to state some assumptions that allow us to determine this impact:

Exponentially distributed times to failure and repair times. We chose to model times to failure and repair times using the exponential distribution. This choice is supported by empirical evidence as well as by mathematical arguments (Barlow and Proschan, 1996). A further advantage of the exponential distribution is that it is fully determined by its expected value. This means that we only need to know the mean time to failure and the mean time to repair of each resource type k ($MTTF_k$ and $MTTR_k$) to know the failure and repair distribution functions, whose rate parameters are $\lambda_k = 1/MTTF_k$ and $\mu_k = 1/MTTR_k$. The formulas derived in the following section can easily be adapted to non-exponentially distributed repair times. For the times to failure, however, this modification is not so straightforward. Nevertheless, we are confident that the use of exponentially distributed times to failure does not seriously limit the use of our approach in practice.

Resource allocations are fixed in advance. The baseline schedule determines the starting time of each activity and consequently also the number of resource units of each resource type required in each time period. The project manager can decide to allocate specific resource units to individual activities in advance. This choice has two advantages. First, rescheduling becomes far easier, as the affected activities simply have to be right-shifted to restore schedule feasibility. Second, fixing resource allocations enables us to predict the impact of resource breakdowns on activity duration increases. We illustrate these two observations by means of a small example. Suppose that we wish to plan the project depicted in Figure 2, consisting of eight real activities (activity 1 being the dummy start activity and activity 10 the dummy end activity).
Above each activity we indicate its duration, its requirement for a single renewable resource type with a per-period availability of eight units, and its instability weight. The project deadline is 18 time units. Finally, we assume that each unit of the considered resource type has a mean time to failure of 15 and a mean time to repair of 3. ——————————— Insert Figure 2 ——————————— Consider the minimal makespan schedule for this project given in Figure 3 for the case in which resource allocations are fixed. ——————————— Insert Figure 3 ——————————— The horizontal bands represent the allocation of each resource unit. Resource unit 5, for example, is allocated to activity 2 in time interval [0, 2], to activity 7 in interval [7, 13] and to activity 9 in interval [13, 15]. For the remaining time periods it is idle. Suppose that we are interested in the impact of a resource breakdown on activity 4. If resource allocations were free, a number of things could happen. If only one unit broke down during the execution of activity 4, nothing would happen, as sufficient idle resource capacity is available to absorb this disruption. If more units broke down, activity 3 and/or activity 4 would have to be preempted, depending on the total magnitude of the breakdown and the best rescheduling action. Without fixed resource allocations, it is therefore practically impossible to predict the impact of a resource breakdown on the effective duration of an activity. The case of fixed resource allocations is far easier to analyze. Activity 4 would only have to be preempted if resource units 1, 2, 3 and/or 4 broke down while being used by activity 4. Consider the situation depicted in Figure 4. Resource unit 4 experiences a breakdown after 5 time periods ($X_{41} = 5$) and its repair takes 2 time periods ($Y_{41} = 2$). Because resource allocations are fixed, only the activity using resource unit 4 between time points 5 and 7 is affected. The result is a preemption of activity 4 until time point 7. In the preempt-repeat case, activity 4 has to be restarted from scratch after the resource unit is repaired, as shown in the figure. Preempting activity 4 postpones activities 6, 7 and 9, resulting in a large increase of the instability costs.
——————————— Insert Figure 4 ——————————— In summary, if activity i requires $r_i$ resource units, only these specific units are used to execute i, and a breakdown of one or more of them translates directly into an interruption of i. This example also shows the main drawback of fixing resource allocations: a loss of rescheduling flexibility. Neither idle resource units nor resource units allocated to less important activities can be used to overcome a resource shortage for activity i. In our example, the duration increase of activity 4 due to the breakdown of resource unit 4 could easily have been prevented by replacing it with resource unit 5 if resource allocations were free. However, dropping this assumption renders the calculation of expected duration increases practically impossible, because it is difficult to predict the rescheduling action that will be taken whenever a resource unit breaks down. Since this action depends on the state of the schedule at the time of the disruption, predicting it in advance seems unreasonable.

Failure and repair time distributions are identical for all resource units m of a resource type k. This assumption implies that all resource units of a given resource type are identical with respect to their failure and repair characteristics. This is in no way restrictive, as non-identical resource units can (and should) be modeled as separate resource types.

Failure and repair times are mutually independent. The mutual independence of failure times and repair times is often assumed in theory, but is unfortunately not always realistic in practice. Severe problems causing a resource breakdown are likely to occur less frequently but to require longer repair times. This would imply a positive correlation between interfailure times and repair times, and thus a violation of the assumption. Nevertheless, the assumption is needed here to obtain our theoretical results.

Resource units can only break down while in use. If resource units could also break down when not in use (e.g. a worker falls ill during the weekend), it would become very difficult to link resource outages to effective activity duration increases, because activities might have to be postponed due to breakdowns before they have even started. Furthermore, the possibility of a second resource unit used by the considered activity breaking down while the first unit is still being repaired would render the translation of resource breakdowns into activity duration extensions intractable, because of the number of scenarios one would have to condition upon. Therefore, we assume that resources only break down while in use.

4

Breakdown Process

In this section, we show how resource breakdowns affect an activity’s real duration under various scenarios, using the assumptions introduced in the previous section. The preempt-repeat case is studied in Section 4.1, the preempt-resume case in Section 4.2. In each section, we first analyze the impact of resource breakdowns on an activity’s duration when only one resource type is used to execute that activity. This analysis is then extended to multiple resource types.

4.1

Preempt-Repeat

4.1.1

Single resource type

Because of interruptions such as resource breakdowns, the real duration of activity i becomes a stochastic variable $D_i$ consisting of a deterministic part $d_i$, the length of the final, uninterrupted execution after which i completes, and a stochastic part $\sigma_i$, consisting of the total failed execution time $X_i$ (i.e. execution time that does not result in activity completion) together with the total repair time $Y_i$. Denoting the length of the r-th failed execution and of the r-th repair by $F_{ir}$ and $R_{ir}$ respectively, and the number of interruptions by $N_i$, we can define:

$$\begin{aligned}
D_i &= d_i + \sigma_i && (5)\\
\sigma_i &= X_i + Y_i && (6)\\
X_i &= F_{i1} + \dots + F_{iN_i} && (7)\\
Y_i &= R_{i1} + \dots + R_{iN_i} && (8)
\end{aligned}$$

We can calculate the expected value of $\sigma_i$ as follows:

$$E[\sigma_i] = E[N_i]\,E[F_i] + E[N_i]\,E[R_i] \qquad (9)$$

because we assume that the interfailure times are independent of the repair times. Unfortunately, the interfailure times and the number of interruptions are not independent: whenever the probability of large interfailure times is relatively high, fewer interruptions occur on average. Nevertheless, simulation results obtained from an experiment in which the average duration increase was calculated for a wide variety of problem parameters (resource usage, activity duration, failure and repair time distributions) show that assuming independence in this equation does not significantly alter the results. When only one resource type is considered, the time needed to restart an interrupted activity i equals the repair time of a resource unit used by i, i.e. $R_i = Y$. This does not hold for $F_i$, which differs from $X$ in two important respects. First, whereas $X$ represents the time to failure of a single resource unit, $F_i$ represents the time to interruption of activity i, which uses $r_i$ resource units. This distinction is important because i is interrupted as soon as one of the resource units used to execute i breaks down. Second, whereas $X$ can take on values larger than $d_i$, such values do not correspond to a failed execution but to a completion of i; the probability density function (pdf) of $F_i$ should therefore be conditioned on this fact. The distribution function of the time to interruption of activity i can be derived from the failure time distribution of the $r_i$ resource units used to execute i by means of Properties 1 and 2 (Blumenfeld, 2001).

Property 1. Let $X_1, X_2, \ldots, X_n$ be independently and identically distributed stochastic variables with cumulative distribution function (cdf) $F(x)$. Their minimum $Z$ then has cdf:

$$G(z) = 1 - [1 - F(z)]^n$$

Property 2. Let $X$ be a stochastic variable with probability density function (pdf) $f(x)$. This pdf can be truncated to the interval between 0 and $m$ as follows:

$$g(x \mid x < m) = \frac{f(x)}{\Pr(X < m)}$$

Apart from the expected values of the time to interruption and the repair time, we also need the expected number of interruptions $E[N_i]$ experienced by activity i during its execution, which can be calculated using Lemma 1.

Lemma 1. If $\psi_i$ represents the probability that activity i is interrupted in a preempt-repeat scenario, the expected number of interruptions until i finishes is given by:

$$E[N_i] = \frac{\psi_i}{1 - \psi_i} \qquad (10)$$

Proof. If $\psi_i$ represents the probability that activity i is interrupted, the number of interruptions is distributed with pdf:

$$h(N_i) = (1 - \psi_i)\,\psi_i^{N_i} \qquad (N_i = 0, 1, 2, \ldots) \qquad (11)$$

Notice that this is the pdf of a geometric distribution (Blumenfeld, 2001):

Observation 1. The random variable x has a geometric distribution if P(x) is given by

$$P(x) = p\,(1 - p)^x \qquad (x = 0, 1, 2, \ldots)$$

The expected value of the geometric distribution is given by

$$E[x] = \frac{1 - p}{p}$$

Substituting p with $1 - \psi_i$ and x with $N_i$ yields the result in Lemma 1. The parameter $\psi_i$ can easily be calculated using Property 1. All of the above leads to the following result:

Theorem 1. In a preempt-repeat environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_i$ of a single renewable resource type, for which the time to failure of each resource unit is exponentially distributed with parameter λ and the time to repair is exponentially distributed with parameter µ, is given by:

$$E[\sigma_i] = \frac{\psi_i}{1 - \psi_i}\left(\frac{1}{\lambda r_i} + \frac{1}{\mu}\right) - d_i \qquad (12)$$

with $\psi_i = 1 - e^{-\lambda r_i d_i}$.

Proof. First, we need the expected number of interruptions $E[N_i]$. Using Equation 10 and Property 1 we get:

$$E[N_i] = \frac{1 - e^{-\lambda r_i d_i}}{e^{-\lambda r_i d_i}} \qquad (13)$$

The expected value of the time to interruption is obtained using Properties 1 and 2, with $f'$ the pdf of the minimum of the $r_i$ unit failure times:

$$E[F_i] = \int_{0}^{d_i} x\, f'(x \mid x < d_i)\,dx = \frac{1}{\psi_i}\int_{0}^{d_i} x\,\lambda r_i\, e^{-\lambda r_i x}\,dx = \frac{1}{\lambda r_i} - \frac{d_i(1 - \psi_i)}{\psi_i} \qquad (14)$$

The repair time is exponentially distributed with parameter µ, so that:

$$E[Y] = \frac{1}{\mu} \qquad (15)$$

Substituting (13), (14) and (15) in (9) yields Equation 12.

4.1.2

Multiple resource types

The case of multiple resource types is somewhat more complicated. We now use R different resource types. In each time period of its execution, activity i requires $r_{ik}$ units of resource type k. We extend the notation of the time to failure and the repair time with a subscript k to represent the considered resource type. The main differences with the single resource type case lie in the calculation of $\psi_i$ and $E[Y_i]$. The time to interruption of activity i ($F_i$) is now determined by the minimum time to failure over all resource units of all resource types, constrained between 0 and $d_i$. Let $X_{mk}$ represent the time to failure of resource unit m of resource type k. Using Property 1 we can write:

$$\psi_i = \Pr(i \text{ interrupted}) = \Pr\left[\min\left(X_{11}, X_{21}, \ldots, X_{r_{i1}1}, \ldots, X_{1R}, X_{2R}, \ldots, X_{r_{iR}R}\right) \le d_i\right] = 1 - [1 - F_1(d_i)]^{r_{i1}} \cdots [1 - F_R(d_i)]^{r_{iR}} \qquad (16)$$

Allowing for multiple resource types does not change the distribution function of $N_i$, so that Equation 10 remains valid. Deriving the expected value of $R_i$ is more complicated, as its distribution depends on the resource type that causes the disruption, since that resource type determines the length of the downtime.

Theorem 2. In a preempt-repeat environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_{ik}$ of renewable resource type k, for which the time to failure of each resource unit is exponentially distributed with parameter $\lambda_k$ and the time to repair is exponentially distributed with parameter $\mu_k$, is given by:

$$E[\sigma_i] = \frac{\psi_i}{(1 - \psi_i)\sum_k \lambda_k r_{ik}}\left(1 + \sum_k \frac{\lambda_k r_{ik}}{\mu_k}\right) - d_i \qquad (17)$$

with $\psi_i = 1 - e^{-d_i \sum_k \lambda_k r_{ik}}$.

Proof. If we let $p_k$ represent the probability that the breakdown is caused by resource type k, so that the repair time is exponentially distributed with parameter $\mu_k$, we can write:

$$E[R_i] = \sum_k p_k \frac{1}{\mu_k} \qquad (18)$$

The probability $p_k$ is then the probability that the minimum time to failure over all resource units of resource type k is smaller than the minimum time to failure over all resource units of all resource types $l \ne k$. The property of competing exponentials is used in the calculation of $p_k$:

Property 3. Let X and Y be independent stochastic variables that are exponentially distributed with parameters λ and µ, respectively. The probability that X is smaller than Y is then:

$$\Pr(X < Y) = \frac{\lambda}{\lambda + \mu}$$

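We use Properties 1 and 3 to determine $p_k$. By Property 1, the minimum time to failure $X_k^{\min}$ over the $r_{ik}$ units of resource type k is exponentially distributed with parameter $\lambda_k r_{ik}$, so that:

$$p_k = \Pr\left(X_k^{\min} < \min_{l \ne k} X_l^{\min}\right) = \frac{\lambda_k r_{ik}}{\lambda_k r_{ik} + \sum_{l \ne k} \lambda_l r_{il}} = \frac{\lambda_k r_{ik}}{\sum_l \lambda_l r_{il}} \qquad (19)$$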

Combining (9), (10), (16), (18), (19) and (14) yields Equation 17.
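Theorems 1 and 2 translate directly into code. The sketch below is ours, not the authors' implementation: it evaluates Equation 17 (which reduces to Equation 12 for a single resource type) and compares it with a small Monte Carlo simulation of the preempt-repeat process under the assumptions of Section 3. The usage of each resource type is passed as hypothetical `(r_ik, lam_k, mu_k)` tuples, with $\lambda_k = 1/MTTF_k$ and $\mu_k = 1/MTTR_k$.

```python
import math
import random


def expected_extension_repeat(d, usage):
    """E[sigma_i] under preempt-repeat (Theorem 2; Theorem 1 for one type).

    d: deterministic duration d_i.
    usage: list of (r_ik, lam_k, mu_k), lam_k = 1/MTTF_k, mu_k = 1/MTTR_k.
    """
    lam_tot = sum(r * lam for r, lam, _ in usage)   # sum_k lambda_k r_ik
    psi = 1.0 - math.exp(-lam_tot * d)              # Pr(i is interrupted)
    repair = sum(r * lam / mu for r, lam, mu in usage)
    return psi / ((1.0 - psi) * lam_tot) * (1.0 + repair) - d


def simulate_extension_repeat(d, usage, runs=100_000, seed=1):
    """Monte Carlo estimate of E[sigma_i] under preempt-repeat, fixed allocations."""
    rng = random.Random(seed)
    lam_tot = sum(r * lam for r, lam, _ in usage)
    total = 0.0
    for _ in range(runs):
        lost = 0.0
        while True:
            # Time to the first failure among all allocated units (Property 1).
            t_fail = rng.expovariate(lam_tot)
            if t_fail >= d:
                break  # an uninterrupted run completes the activity
            # The failing type is drawn with probability p_k (Property 3, Eq. 19).
            u = rng.random() * lam_tot
            for r, lam, mu in usage:
                u -= r * lam
                if u <= 0:
                    break
            lost += t_fail + rng.expovariate(mu)  # wasted work plus repair
        total += lost
    return total / runs
```

For instance, with the failure and repair parameters of the example in Section 3 (mean time to failure 15, mean time to repair 3) and an illustrative activity with $d_i = 4$ and $r_i = 2$ (values chosen here, not taken from Figure 2), Equation 12 gives an expected extension of about 3.40 time units, and the simulated estimate should lie close to it.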

4.2

Preempt-Resume

4.2.1

Single resource type

Again, due to breakdowns, the real duration of activity i is a stochastic variable $D_i$ consisting of a deterministic part $d_i$, the duration of the activity when no interruption occurs, and a stochastic part $\sigma_i$. The difference with the preempt-repeat case is that $\sigma_i$ now only has to include the total repair time $Y_i$, since no execution time is lost. Preserving the notation of Section 4.1, Equations 5 through 8 become:

$$\begin{aligned}
D_i &= d_i + \sigma_i && (20)\\
\sigma_i &= Y_i && (21)\\
Y_i &= R_{i1} + \dots + R_{iN_i} && (22)
\end{aligned}$$

Since $\sigma_i = Y_i$, Equation 9 reduces to $E[\sigma_i] = E[N_i]\,E[R_i]$, so that $E[\sigma_i]$ can be calculated once the expected number of interruptions $E[N_i]$ and the expected repair duration $E[Y]$ are known. The expected number of interruptions can be calculated by dividing the duration $d_i$ of activity i by the expected value of the time to interruption. The time to interruption is distributed as the minimum of $r_i$ independently and identically distributed variables X, i.e. exponentially with parameter $\lambda r_i$ (Property 1), so that $E[N_i] = \lambda r_i d_i$:

Theorem 3. In a preempt-resume environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_i$ of a single renewable resource type, for which the time to failure of each resource unit is exponentially distributed with parameter λ and the time to repair is exponentially distributed with parameter µ, is given by:

$$E[\sigma_i] = \frac{\lambda r_i d_i}{\mu} \qquad (23)$$

4.2.2

Multiple resource types

These results can easily be extended to the multiple resource type case. When calculating the expected number of interruptions, we now need to consider the minimum time to failure over all resource units of all resource types. Furthermore, the mean time to repair of each resource type should be weighted by the probability that the disruption is caused by that resource type (Equations 18 and 19). This yields the following expression for the case of preempt-resume with multiple resource types:

Theorem 4. In a preempt-resume environment with fixed resource allocations, the expected duration extension due to breakdowns for an activity with duration $d_i$ and resource usage $r_{ik}$ of renewable resource type k, for which the time to failure of each resource unit is exponentially distributed with parameter $\lambda_k$ and the time to repair is exponentially distributed with parameter $\mu_k$, is given by:

$$E[\sigma_i] = d_i \sum_k \frac{\lambda_k r_{ik}}{\mu_k} \qquad (24)$$

5

Time Buffering

The time buffering step consists of two separate phases. The resource allocation phase deals with the construction of a resource flow network describing non-technological relations between activities (this topic is dealt with thoroughly in Section 5.2). In the second phase, explicit idle time is inserted into the initial precedence, resource and deadline feasible schedule $S^u$, so that the preemption of a disrupted activity does not translate into the disruption of a related activity. The decision variable $B_i$ indicates by how many time units an activity should be started beyond its earliest precedence feasible starting time. The vector B comprising the buffer decisions can be decoded into a buffered schedule $S^b$ using Algorithm 1. In this algorithm, we first construct a list L containing the activities in order of non-decreasing starting times in the initial schedule $S^u$, with $L_i$ the activity in position i of list L. The activities are then scheduled in the order of list L. Each activity is started at its earliest precedence feasible starting time (but not before its starting time in $S^u$), increased by its buffer amount $B_{L_i}$. The earliest precedence feasible starting time is obtained from the finish times of the immediate predecessors of activity $L_i$ ($PRED_{L_i}$).

Algorithm 1 Decoding procedure
    L := (i ∈ N in order of non-decreasing s^u_i; ties broken by lowest activity number)
    s^b_1 := 0
    for i := 2 to n do
        s^b_{L_i} := max(s^u_{L_i}, max_{j ∈ PRED_{L_i}} (s^b_j + d_j)) + B_{L_i}

Note that this procedure does not explicitly need to consider resource feasibility, as the resource arcs dictated by the resource flow network resolve any resource conflicts that might have been present. This is a clear advantage of our approach, as it considerably speeds up the buffer insertion step.
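A direct transcription of Algorithm 1 into Python might look as follows. It is a sketch assuming 0-based indices (activity 0 is the dummy start), activity numbers that are topologically ordered so that ties in $s^u$ never place a successor before its predecessor, and a hypothetical `preds[i]` holding the predecessors of i in the extended network, i.e. including the resource flow arcs.

```python
def decode(B, s_u, d, preds):
    """Algorithm 1: turn buffer sizes B into a buffered schedule s_b.

    s_u: start times of the unbuffered schedule S^u; d: durations;
    preds[i]: immediate predecessors of i in the extended network
    (technological arcs plus resource flow arcs). Index 0 is the dummy start.
    """
    n = len(s_u)
    # List L: non-decreasing s^u, ties broken by lowest activity number.
    order = sorted(range(n), key=lambda a: (s_u[a], a))
    s_b = [0] * n  # s^b of the dummy start stays 0
    for a in order[1:]:
        # Earliest precedence-feasible start, never before the S^u start,
        earliest = max([s_u[a]] + [s_b[j] + d[j] for j in preds[a]])
        # plus the buffer assigned to the activity.
        s_b[a] = earliest + B[a]
    return s_b
```

Because `preds` already contains the resource arcs, the function never inspects resource usage, which mirrors the remark above.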


In what follows, we present a steepest descent time buffering procedure that estimates the objective function value by means of simulation. However, because this approach is computationally quite demanding, we also present a heuristic that uses information regarding expected duration increases due to resource breakdowns that are calculated as shown in Section 4. Finally, since the procedures in Section 4 allow us to translate resource breakdowns into activity duration increases, it will become possible to use approaches developed for the resource-constrained project scheduling problem subject to stochastic activity durations such as the Starting Time Criticality (STC) heuristic by Van de Vonder et al. (2008).
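The heuristics below take the expected duration increases of Section 4 as input. For the preempt-resume case, Theorems 3 and 4 reduce to a one-line formula. The sketch below (ours, with hypothetical function names and the same assumed `(r_ik, lam_k, mu_k)` tuples as before) evaluates Equation 24 and checks it against a simulation in which progress is kept across interruptions and, by memorylessness, a fresh failure clock is drawn after each repair.

```python
import random


def expected_extension_resume(d, usage):
    """Theorems 3 and 4: E[sigma_i] = d_i * sum_k lam_k r_ik / mu_k.

    usage: list of (r_ik, lam_k, mu_k), lam_k = 1/MTTF_k, mu_k = 1/MTTR_k.
    """
    return d * sum(r * lam / mu for r, lam, mu in usage)


def simulate_extension_resume(d, usage, runs=100_000, seed=1):
    """Monte Carlo estimate of E[sigma_i] under preempt-resume, fixed allocations."""
    rng = random.Random(seed)
    lam_tot = sum(r * lam for r, lam, _ in usage)
    total = 0.0
    for _ in range(runs):
        remaining, downtime = d, 0.0
        while True:
            t_fail = rng.expovariate(lam_tot)  # memoryless: fresh clock after repair
            if t_fail >= remaining:
                break
            remaining -= t_fail  # work done so far is kept
            u = rng.random() * lam_tot  # failing type drawn with probability p_k
            for r, lam, mu in usage:
                u -= r * lam
                if u <= 0:
                    break
            downtime += rng.expovariate(mu)
        total += downtime
    return total / runs
```

With the illustrative values $d_i = 4$, $r_i = 2$, λ = 1/15 and µ = 1/3, Equation 23 gives 1.6 time units, noticeably less than the preempt-repeat value for the same activity because no work is lost.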

5.1

Simulation-based time buffering

The easiest and most reliable way to estimate the quality of a buffered schedule with respect to the weighted instability cost objective function is simulation. Note that, apart from the buffer insertion procedure, we do not need a resource flow network in this case, because the average instability costs over a number of simulation runs for a given reactive policy give an accurate representation of the real costs. The schedule is iteratively buffered as follows. In each iteration, every activity (except the dummy start activity) is considered for buffering. We buffer the activity leading to the largest improvement in the objective function value that yields a schedule respecting the deadline constraint. If no such activity can be found, the procedure terminates. The pseudocode for this steepest descent approach is given in Algorithm 2, with S(B) the schedule corresponding to the buffer decisions B and I(S(B)) its simulated average instability cost. The main advantage of simulation is its reliability: as we will see in Section 6, simulation-based time buffering is the best performing time buffering procedure. However, simulation also has two drawbacks. First, it does not deliver any new insights into the problem structure. Second, simulation is computationally very demanding. This could become a problem in practice whenever very large projects are considered or whenever the decision maker wants to perform what-if analyses for a wide variety of scenarios. If, on the other hand, computation times are unimportant, simulation is clearly the most attractive approach.

Algorithm 2 Time buffering heuristic using simulation
    z^best := Simulate(I(S(B)))
    while improvement found do
        i* := 0
        for i := 2 to n do
            B_i := B_i + 1
            z := Simulate(I(S(B)))
            if s_n ≤ δ AND z < z^best then i* := i, z^best := z
            B_i := B_i − 1
        if i* ≠ 0 then B_{i*} := B_{i*} + 1
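Algorithm 2 can be sketched in Python with the cost estimate left abstract. Here `cost(B)` is a hypothetical stand-in for the simulated $I(S(B))$ (for example, the mean instability cost over a number of runs of a reactive policy), and `deadline_ok(B)` is a hypothetical check of $s_n \le \delta$ for $S(B)$; index 0 is the dummy start. Unlike the pseudocode, the cost is only evaluated for deadline-feasible moves, which saves simulation effort without changing the outcome.

```python
def steepest_descent_buffering(n, cost, deadline_ok):
    """Algorithm 2 with a pluggable cost estimate.

    cost(B) stands in for the simulated instability cost I(S(B)) and
    deadline_ok(B) checks s_n <= delta for S(B); both are hypothetical
    callables supplied by the caller. Index 0 is the dummy start.
    """
    B = [0] * n
    z_best = cost(B)
    while True:
        best_i = None
        for i in range(1, n):  # every activity except the dummy start
            B[i] += 1  # tentatively add one unit of buffer
            if deadline_ok(B):
                z = cost(B)
                if z < z_best:
                    best_i, z_best = i, z
            B[i] -= 1  # undo the tentative move
        if best_i is None:  # no improving, deadline-feasible move left
            return B, z_best
        B[best_i] += 1
```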

5.2

Time buffering using surrogate measures

Surrogate measures try to estimate the real instability costs in an efficient and effective manner and are calculated using information regarding activity characteristics and resource breakdown parameters. In Section 4 it was shown how resource breakdowns can be translated into activity duration increases under various scenarios. This information will be used in this section to calculate two surrogate robustness measures. Furthermore, a third measure, based on the estimated probability that the start of an activity has to be postponed, is introduced. All of these surrogate measures use uncertainty information to determine the impact a duration change of the considered activity has on activities that are planned later in time. In order to calculate this impact accurately, we need to introduce the concept of resource flow networks.

5.2.1

Resource flow networks

It is easy to see that a disruption of the starting time of an activity can be caused by a disruption of one of its predecessors. However, such a disruption can also be caused by a disruption of a non-precedence-related activity, planned earlier in time, with which the considered activity shares a scarce resource. Imagine a project in which no precedence relations exist, but in which only a limited number of activities can actually be executed in parallel due to scarce resources. In such a case, it is not hard to see that non-precedence-related activities can indeed have an impact on one another. The propagation of disruptions throughout the network is thus determined by technological precedence relations as well as by resource-driven precedence relations. Resource flow networks offer an elegant way to represent those resource-driven relations. A resource flow network extends the set of arcs A with the resource flow arcs $A_R$. The resource flow arcs are precedence relations that are added in order to fix resource allocations and that eliminate the explicit resource constraints. Artigues and Roubellat (2000) introduce a simple method for generating a resource flow network. The main advantage of this approach is that it is very fast, but unfortunately it does not take schedule robustness into account. The Myopic Activity Based Optimization heuristic (MABO) by Van de Vonder (2006) does take robustness into account. However, we decided to use the procedure of Artigues and Roubellat, as it is faster and because pairwise tests of the differences between using their procedure and MABO did not yield significantly different results in our problem setting. Note, moreover, that the resource allocations are only kept fixed during the time buffering step. The reason behind this decision is that fixing resource allocations during execution unnecessarily restricts rescheduling flexibility, causing instability costs that are on average more than 100 times higher than those obtained when resource allocations are released.

5.2.2

Surrogate measures

The first surrogate objective (Surr1) is calculated as follows:

$$Surr1 = \sum_{j \in N} \sum_{i \in PRED_j^*} w_j \max\left(0,\; s_i + d_i + LPL_{ij} + E[\sigma_i] - s_j\right) \qquad (25)$$

For each activity j, all predecessors, immediate as well as transitive ($i \in PRED_j^*$), are considered in the current extended network (technological as well as resource arcs). For each such predecessor, the expected impact of a duration increase of i on the starting time of j is calculated; these values are weighted with the instability weight of activity j and summed. In this equation, $LPL_{ij}$ represents the length of the longest path between activities i and j. This longest path is determined on the extended network with the given activity durations, using a full enumeration approach.
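Surr1 only requires the longest path lengths $LPL_{ij}$ and the expected extensions $E[\sigma_i]$. The sketch below makes one interpretive assumption explicit: $LPL_{ij}$ is read as the summed duration of the activities strictly between i and j, so that it is 0 for an immediate predecessor, which is consistent with the form of Equations 25 and 29. Instead of full enumeration, it computes these lengths by dynamic programming over a topological order of the extended network (via Python's standard `graphlib`), a substitution on our part that yields the same longest paths on an acyclic network. The function names are hypothetical.

```python
from graphlib import TopologicalSorter


def longest_path_lengths(succs, d):
    """LPL[i][j] for every transitive successor j of i in the extended network.

    succs[i]: immediate successors of i (technological plus resource arcs).
    LPL is read here as the summed duration of the activities strictly
    between i and j (0 for an immediate successor), an assumption of ours.
    """
    n = len(succs)
    graph = {v: [u for u in range(n) if v in succs[u]] for v in range(n)}
    topo = list(TopologicalSorter(graph).static_order())  # predecessors first
    lpl = {}
    for i in range(n):
        reach = {i: -d[i]}  # offset so that immediate successors get 0
        for m in topo:
            if m in reach:
                for k in succs[m]:
                    reach[k] = max(reach.get(k, float("-inf")), reach[m] + d[m])
        lpl[i] = {k: v for k, v in reach.items() if k != i}
    return lpl


def surr1(s, d, w, exp_sigma, lpl):
    """Equation (25): sum over pairs (i, j) with i in PRED*_j of the
    weighted expected start delay of j caused by a late finish of i."""
    total = 0.0
    for i, row in lpl.items():
        for j, length in row.items():
            total += w[j] * max(0.0, s[i] + d[i] + length + exp_sigma[i] - s[j])
    return total
```

Taking, for each j, the maximum instead of the sum over its predecessors gives Surr2 (Equation 26).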

The second surrogate objective (Surr2) looks quite similar:

$$Surr2 = \sum_{j \in N} \max_{i \in PRED_j^*} w_j \max\left(0,\; s_i + d_i + LPL_{ij} + E[\sigma_i] - s_j\right) \qquad (26)$$

The main difference is that now, for each activity j, only the maximum starting time disruption over all of its predecessors is taken into account. Finally, we consider a third measure (Surr3) that is inspired by the starting time criticality (STC) heuristic (Van de Vonder et al., 2008):

$$Surr3 = \sum_{j \in N} STC_j \qquad (27)$$

The Starting Time Criticality (STC) heuristic is an elegant approach for generating time buffered schedules when faced with stochastic activity durations. It exploits information about the weights of the activities as well as about the probability distributions of the activity durations. The authors define the starting time criticality of activity j as follows:

$$STC_j = w_j \Pr(S_j > s_j) \qquad (28)$$

Using the observation that the starting time of activity j is disturbed whenever the duration increase of one of its (immediate or transitive) predecessors i is large enough to force the delay of activity j in order to maintain precedence feasibility, the authors calculate the probability that activity j cannot start at its planned starting time $s_j$ as follows:

$$\Pr(S_j > s_j) = \sum_{i \in PRED_j^*} \Pr(s_i + LPL_{ij} + D_i > s_j) \qquad (29)$$

Here, it is assumed that predecessors start at their baseline starting times and that only one activity at a time changes $S_j$. If activity j is the dummy start activity, or if its sole predecessor is the dummy start activity, the starting time criticality of j equals 0. Because it is very hard to calculate the probability distributions of the durations under resource breakdowns analytically, we approximate them by simulation, assuming that resource allocations are fixed. For each activity, 1000 simulation runs are executed to determine the probability of each possible duration outcome, given the mean time to failure and the mean time to repair of each resource type used by that activity.
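Equations 28 and 29 can be estimated from simulated durations in the way described above. The sketch below assumes a single resource type per activity (so that $\lambda r_i$ and µ suffice), preempt-repeat with fixed allocations, and the same reading of $LPL_{ij}$ as used for Surr1; `lpl[i][j]` is a hypothetical nested mapping over the pairs with $i \in PRED_j^*$. Following Equation 29, the per-predecessor probabilities are summed, so the estimate of $\Pr(S_j > s_j)$ is an approximation that can exceed 1.

```python
import random


def sample_durations_repeat(d, lam_r, mu, runs=1000, seed=0):
    """Simulated realized durations D_i = d_i + sigma_i (preempt-repeat,
    fixed allocations, one resource type; lam_r = lambda * r_i)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        extra = 0.0
        while lam_r > 0:
            t_fail = rng.expovariate(lam_r)
            if t_fail >= d:
                break  # uninterrupted run: the activity completes
            extra += t_fail + rng.expovariate(mu)
        samples.append(d + extra)
    return samples


def stc(j, s, w, lpl, dur_samples):
    """Equations (28)-(29) with empirical probabilities:
    STC_j = w_j * sum_{i in PRED*_j} Pr(s_i + LPL_ij + D_i > s_j)."""
    prob = 0.0
    for i, row in lpl.items():
        if j in row:  # i is an immediate or transitive predecessor of j
            slack = s[j] - s[i] - row[j]
            xs = dur_samples[i]
            prob += sum(1 for x in xs if x > slack) / len(xs)
    return w[j] * prob
```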

These surrogate measures are now used in the tabu search metaheuristic shown in Algorithm 3 to evaluate the performance of the intermediate schedules. The neighbourhood considered by this tabu search procedure consists of a decrease (if possible) or an increase of the buffer amount of each non-dummy activity. After each neighbourhood evaluation, the move leading to the lowest objective function value (calculated with one of the above surrogate measures for the schedule S(B) corresponding to B) is selected and executed, and its reverse move is made tabu for the next T iterations. Furthermore, if no improvement of the globally best solution $B^{best}$ is found, the move is stored in the frequency-based memory FREQ.

Algorithm 3 Tabu search based buffer insertion procedure
    B^best := B := (0, 0, ..., 0), z^best := z := Surr(S(B))
    TABU+_i := TABU−_i := FREQ+_i := FREQ−_i := 0 (i := 1, ..., n)
    T := n, iter := 0
    while (iter < MAXITER) do
        i* := 0, z := ∞
        for i := 2 to n − 1 do
            if B_i > 0 then
                B_i := B_i − 1
                if ((Surr(S(B)) < z^best OR iter > TABU−_i) AND Surr(S(B)) + FREQ−_i < z AND s_n ≤ δ) then i* := −i, z := Surr(S(B))
                B_i := B_i + 1
            B_i := B_i + 1
            if ((Surr(S(B)) < z^best OR iter > TABU+_i) AND Surr(S(B)) + FREQ+_i < z AND s_n ≤ δ) then i* := i, z := Surr(S(B))
            B_i := B_i − 1
        if i* ≠ 0 then
            if (i* < 0) then B_{−i*} := B_{−i*} − 1, TABU+_{−i*} := iter + T, FREQ−_{−i*} := FREQ−_{−i*} + 1
            if (i* > 0) then B_{i*} := B_{i*} + 1, TABU−_{i*} := iter + T, FREQ+_{i*} := FREQ+_{i*} + 1
            if (z < z^best) then z^best := z, B^best := B, FREQ+ := FREQ− := (0, 0, ..., 0)
        iter := iter + 1
    generate buffered schedule S^b corresponding to B^best
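For readers who prefer executable code, a Python transcription of Algorithm 3 is sketched below. `surr(B)` (any of Surr1 to Surr3 evaluated on S(B)) and `deadline_ok(B)` are hypothetical caller-supplied callables, and indices are 0-based with the dummies at 0 and n − 1. As in Algorithm 3, a move is admissible if it is not tabu or if it improves the global best, and candidates are compared on their surrogate value plus their frequency count.

```python
def tabu_buffering(n, surr, deadline_ok, max_iter=1000):
    """Sketch of Algorithm 3. surr(B) evaluates a surrogate measure for S(B)
    and deadline_ok(B) checks s_n <= delta; both are hypothetical callables.
    Activities 0 and n - 1 are the dummies."""
    B = [0] * n
    B_best, z_best = B[:], surr(B)
    tabu_plus, tabu_minus = [0] * n, [0] * n  # iteration until which a move is tabu
    freq_plus, freq_minus = [0] * n, [0] * n  # frequency-based memory
    T = n  # tabu tenure
    for it in range(max_iter):
        move, z = None, float("inf")
        for i in range(1, n - 1):  # non-dummy activities
            for delta, tabu, freq in ((-1, tabu_minus, freq_minus),
                                      (+1, tabu_plus, freq_plus)):
                if B[i] + delta < 0:
                    continue  # a buffer cannot become negative
                B[i] += delta
                if deadline_ok(B):
                    val = surr(B)
                    # Aspiration: a tabu move is allowed if it beats the global best.
                    if (val < z_best or it > tabu[i]) and val + freq[i] < z:
                        move, z = (i, delta), val
                B[i] -= delta
        if move is None:
            continue
        i, delta = move
        B[i] += delta
        if delta < 0:
            tabu_plus[i] = it + T  # forbid re-adding the removed buffer unit
            freq_minus[i] += 1
        else:
            tabu_minus[i] = it + T  # forbid removing the added buffer unit
            freq_plus[i] += 1
        if z < z_best:
            z_best, B_best = z, B[:]
            freq_plus, freq_minus = [0] * n, [0] * n
    return B_best, z_best
```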

We now illustrate the use of Surr1 for generating a robust schedule, starting from the minimal makespan schedule $S^u$ = (0, 0, 2, 4, 0, 7, 7, 9, 13, 18) in Figure 3. Assuming a preempt-repeat scenario with exponentially distributed times to failure (mttf = 15) and repair times (mttr = 3), we can calculate the expected duration increases using Theorem 1. This yields (rounded to the nearest integer): E(σ) = (0, 1, 17, 5, 9, 24, 7, 6, 3, 0). Using the longest path values shown in Table 2, which correspond to the resource flow network in Figure 5, we can calculate the surrogate objective function values obtained by buffering activities 2 through 9, yielding (1457, 1447, 1520, 1549, 1452, 1501, 1449, 1460). The lowest value is obtained when buffering activity 3, yielding the schedule S = (0, 0, 3, 4, 0, 7, 7, 10, 14, 18). The procedure finally terminates with the schedule in Figure 6. ——————————— Insert Table 2 ——————————— ——————————— Insert Figure 5 ——————————— ——————————— Insert Figure 6 ———————————

5.3

Time buffering using the STC heuristic

In addition to the approaches presented in the previous two subsections, we also implemented the STC heuristic. This heuristic iteratively buffers the activity with the highest STC value for which the buffered schedule still respects the deadline, and stops when no activity with a positive STC value can be buffered without making the schedule deadline infeasible. The STC values are calculated as shown in Equations 28 and 29. For more details on its operation, we refer the reader to Van de Vonder et al. (2008).
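The greedy loop just described can be sketched in Python as follows. Because Equations 28 and 29 are not reproduced in this section, `stc(B, i)` is a hypothetical callable returning the current STC value of activity i under buffer vector B, and `end_start(B)` returns s_n of the buffered schedule. The one-unit step per iteration and the re-ranking after every step are our assumptions, not details taken from Van de Vonder et al. (2008); the toy data are invented.

```python
from typing import Callable, List, Sequence


def greedy_stc_buffering(
    n: int,
    stc: Callable[[Sequence[int], int], float],
    end_start: Callable[[Sequence[int]], float],
    deadline: float,
) -> List[int]:
    """Iteratively buffer the feasible activity with the highest positive STC value.

    stc(B, i) and end_start(B) receive a 1-indexed buffer list (B[0] unused).
    Termination assumes that STC values fall as buffers grow or that s_n rises
    with every added buffer unit, as in the toy example below.
    """
    B = [0] * (n + 1)
    while True:
        # rank the non-dummy activities 2, ..., n-1 by their current STC value
        ranked = sorted(range(2, n), key=lambda i: stc(B, i), reverse=True)
        for i in ranked:
            if stc(B, i) <= 0:
                return B[1:]  # no remaining activity has a positive STC value
            B[i] += 1  # tentatively add one unit of buffer (assumed step size)
            if end_start(B) <= deadline:
                break  # keep the buffer and re-rank
            B[i] -= 1  # deadline violated: undo and try the next activity
        else:
            return B[1:]  # no activity can be buffered without missing the deadline


# Toy usage (hypothetical data): serial chain 1 -> 2 -> 3 -> 4, dummies 1 and 4.
d = [0, 0, 3, 2, 0]  # durations, index 0 unused
w = [0, 0, 3, 1, 0]  # hypothetical weights
e = [0, 0, 2, 1, 0]  # hypothetical expected disruptions


def toy_stc(B: Sequence[int], i: int) -> float:
    # hypothetical stand-in for Equations 28-29 that shrinks as B_i grows
    return w[i] * (e[i] - B[i])


def toy_end_start(B: Sequence[int]) -> float:
    # in a serial chain, s_4 is the sum of durations and buffers of activities 1-3
    return sum(d[i] + B[i] for i in range(1, 4))


print(greedy_stc_buffering(4, toy_stc, toy_end_start, deadline=7))
# -> [0, 2, 0, 0]
```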

6

Computational Experiment

In order to compare the performance of simulation-based time buffering, time buffering using surrogate measures and the STC heuristic, we set up an extensive computational experiment. As a test set we used the 480 instances with 30 activities from the PSPLIB set of test instances developed by Kolisch and Sprecher (1997). Mean times to failure were drawn from a uniform distribution between the minimal project makespan (obtained by solving the deterministic RCPSP) and twice this minimal makespan, whereas mean repair times were drawn from a uniform distribution between 1 and 5. The instability weights were drawn from a triangularly shaped distribution between 1 and 10. The weight of the dummy end activity, however, was set to 10 times the average of this distribution to reflect the relatively higher importance of finishing the project on time compared with meeting individual milestones. Finally, the project deadline was set at the project's minimal makespan increased by 30%.

Baseline schedules were constructed using the flowchart in Figure 1. This means that one either uses resource buffering (res buf) or not (no res buf), followed by the generation of an initial schedule using minimal makespan scheduling (RCPSP) or 'highest CIW first' scheduling (CIW). This schedule can then be made robust using simulation-based time buffering (SIM), surrogate measure based time buffering (Surr1, Surr2, Surr3), the STC heuristic (STC) or no time buffering at all (None). Average instability costs are calculated by simulation using 10 repetitions per instance, yielding the results in Figures 7 and 8. In our experiment, we terminated the tabu search procedure used for time buffering based on surrogate measures after 1000 iterations; further increasing the number of iterations did not significantly improve our results. Furthermore, we restricted the number of simulation runs in the simulation-based heuristic to 100, both to keep computation times at an acceptable level and because simulation-based time buffering is presented here mostly as a benchmark for assessing the performance of the surrogate measures proposed in the previous sections.

——————————— Insert Figure 7 ———————————
——————————— Insert Figure 8 ———————————

These figures show that the best performance is obtained when resource-buffered (res buf) minimal makespan scheduling (RCPSP) is combined with simulation-based time buffering (SIM). The worst result, on the other hand, corresponds to minimal makespan scheduling without any form of buffering.
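The instance parameters described at the start of this section could be drawn along the lines of the Python sketch below. Several details are our assumptions rather than statements from the text: mean times to failure and mean repair times are drawn once per resource type k as continuous uniforms; the 'triangularly shaped' weight distribution is taken to be the discrete distribution P(w = q) = (21 − 2q)/100 for q = 1, ..., 10, which decreases linearly over 1 to 10; and the deadline is left unrounded. The function name and dictionary keys are hypothetical.

```python
import random
from typing import Dict, List


def draw_instance_parameters(min_makespan: int, n: int, num_resource_types: int,
                             rng: random.Random) -> Dict[str, object]:
    # MTTF_k ~ U[minimal makespan, 2 * minimal makespan], MTTR_k ~ U[1, 5]
    mttf = [rng.uniform(min_makespan, 2 * min_makespan) for _ in range(num_resource_types)]
    mttr = [rng.uniform(1, 5) for _ in range(num_resource_types)]

    # assumed discrete triangular weights: P(w = q) = (21 - 2q) / 100, q = 1..10
    values = list(range(1, 11))
    probs = [(21 - 2 * q) / 100 for q in values]
    # weights for activities 1, ..., n-1
    weights: List[float] = [float(q) for q in rng.choices(values, weights=probs, k=n - 1)]
    mean_weight = sum(q * p for q, p in zip(values, probs))  # 3.85 for this distribution
    weights.append(10 * mean_weight)  # dummy end activity n: 10 times the average weight

    deadline = 1.3 * min_makespan  # minimal makespan increased by 30% (rounding unspecified)
    return {"mttf": mttf, "mttr": mttr, "weights": weights, "deadline": deadline}


# example: 30 real activities plus two dummies, four resource types, hypothetical makespan 40
params = draw_instance_parameters(min_makespan=40, n=32, num_resource_types=4, rng=random.Random(7))
print(round(params["weights"][-1], 2), params["deadline"])  # -> 38.5 52.0
```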
Combining minimal makespan scheduling with resource buffering and simulation-based time buffering improves on minimal makespan scheduling without buffering by 91% for preempt-repeat and 93% for preempt-resume. Using the best non-simulation time buffering heuristic (Surr1 for preempt-repeat and STC for preempt-resume) lowers these percentages to 84% and 89%, respectively. Let us also consider the impact of resource buffering and of initial schedule generation. As long as neither resource buffering nor time buffering is used, 'highest CIW first' scheduling performs better than minimal makespan scheduling. This changes whenever a form of buffering is included. Resource buffering always performs better than no resource buffering. This is especially noticeable if no time buffering is used: in that case, the performance difference is a factor of 3 on average. Moreover, the results of resource buffering without time buffering are almost never outperformed by those of time buffering without resource buffering. The one exception is simulation-based time buffering, which performs on average twice as well as pure resource buffering.

——————————— Insert Table 3 ———————————

The average computation times are given in Table 3. Those for the proactive strategies without time buffering are negligibly small. As expected, 'highest CIW first' scheduling is very fast because it uses the quick serial schedule generation scheme, whereas minimal makespan scheduling requires significantly more time because of its more time-intensive branch-and-bound procedure. Observe that resource-buffered scheduling is slightly slower than scheduling without resource buffering. This is not caused by the average availability calculation, which is very fast, but by the fact that the schedule may have to be recalculated when the average availability is insufficient to meet the project deadline. The results for the surrogate measures show that our tabu search procedure is very fast: its computation times are of the same order as those of the steepest descent STC heuristic. Using tabu search for the simulation-based time buffering procedure, however, caused computation times to explode. Even now, using the simple steepest descent heuristic, simulation-based time buffering is on average a factor of 6 slower than the surrogate measure based time buffering procedures. For our instances, these computation times are still acceptable, but this will not necessarily be the case for practical project networks consisting of 300 or more activities.

7

Conclusions

We can conclude that time buffering is a very interesting alternative for incorporating robustness into a schedule. We gave an overview of analytical approaches for determining the expected duration increase an activity experiences due to resource breakdowns. These results were used to create an effective and efficient algorithm for inserting explicit idle time into an initial, unbuffered schedule in order to protect it from the propagation of disruptions throughout the project network. We showed that time buffering based on simulation performs far better than time buffering based on surrogate objective functions, but the reader should keep its higher computational demands in mind. In practical project scheduling in particular, these computational demands will often become prohibitive. We therefore suggest using either time buffering based on the first surrogate objective function or the STC heuristic. STC offers the additional advantage that it has proven to be a good buffering strategy when stochastic activity durations are considered. It would therefore be an interesting topic for further research to develop an integrated approach that combines uncertain activity durations with unexpected machine breakdowns.

The advantages of robust project scheduling for practical project management are clear. Less rescheduling and replanning reduces the costs resulting from these actions. Furthermore, the project manager will be able to quote reliable milestone delivery dates, which facilitates negotiations with customers and subcontractors.

References

Artigues, C. and Roubellat, F. (2000). A polynomial activity insertion algorithm in a multi-resource schedule with cumulative constraints and multiple modes. European Journal of Operational Research, 127:294–316.

Barlow, R. and Proschan, F. (1996). Mathematical Theory of Reliability. John Wiley & Sons Inc, New York.

Blumenfeld, D. (2001). Operations research calculations handbook. CRC Press LLC, Florida.

Brucker, P., Drexl, A., Möhring, R., Neumann, K., and Pesch, E. (1999). Resource-constrained project scheduling: Notation, classification, models and methods. European Journal of Operational Research, 112:3–41.

Demeulemeester, E. and Herroelen, W. (2002). Project scheduling - A research handbook, volume 49 of International Series in Operations Research & Management Science. Kluwer Academic Publishers, Boston.

Herroelen, W., De Reyck, B., and Demeulemeester, E. (1998). Resource-constrained scheduling: A survey of recent developments. Computers and Operations Research, 25:279–302.

Herroelen, W. and Leus, R. (2004). The construction of stable project baseline schedules. European Journal of Operational Research, 156(3):550–565.

Kolisch, R. and Sprecher, A. (1997). PSPLIB - A project scheduling library. European Journal of Operational Research, 96:205–216.

Lambrechts, O., Demeulemeester, E., and Herroelen, W. (2007a). Exact and suboptimal reactive strategies for resource-constrained project scheduling with uncertain resource availabilities. Research Report KBI0702, Department of Decision Sciences and Information Management, K.U.Leuven, Belgium, 36 pp.

Lambrechts, O., Demeulemeester, E., and Herroelen, W. (2007b). A tabu search procedure for developing robust predictive project schedules. International Journal of Production Economics, 111(2):493–508.

Lambrechts, O., Demeulemeester, E., and Herroelen, W. (2008). Proactive and reactive strategies for resource-constrained project scheduling with uncertain resource availabilities. Journal of Scheduling, 11(2):121–136.

Leus, R. (2004). The generation of stable project plans. 4OR, The Quarterly Journal of the Belgian, French and Italian Operations Research Societies, 2(3):251–254.

Leus, R. and Herroelen, W. (2004). Stability and resource allocation in project planning. IIE Transactions, 36(7):667–682.

O'Donovan, R., Uzsoy, R., and McKay, K. (1999). Predictable scheduling of a single machine with breakdowns and sensitive jobs. International Journal of Production Research, 37(18):4217–4233.

Schatteman, D., Herroelen, W., Van de Vonder, S., and Boone, A. (2008). A methodology for integrated risk management and proactive scheduling of construction projects. Journal of Construction Engineering and Management, 134(11):885–893.

Van de Vonder, S. (2006). Proactive-reactive procedures for robust project scheduling. PhD thesis, Faculty of Economics and Applied Economics, Department of Decision Sciences and Information Management, K.U. Leuven, Belgium.

Van de Vonder, S., Ballestin, F., Demeulemeester, E., and Herroelen, W. (2007a). Heuristic procedures for reactive project scheduling. Computers & Industrial Engineering, 52(1):11–28.

Van de Vonder, S., Demeulemeester, E., and Herroelen, W. (2007b). Heuristic procedures for generating stable project baseline schedules. European Journal of Operational Research, to appear.

Van de Vonder, S., Demeulemeester, E., and Herroelen, W. (2008). Proactive heuristic procedures for robust project scheduling: an experimental analysis. European Journal of Operational Research, 189(3):723–733.

Van de Vonder, S., Demeulemeester, E., Herroelen, W., and Leus, R. (2005). The use of buffers in project management: The trade-off between stability and makespan. International Journal of Production Economics, 97(2):227–240.

Van de Vonder, S., Demeulemeester, E., Herroelen, W., and Leus, R. (2006). The trade-off between stability and makespan in resource-constrained project scheduling. International Journal of Production Research, 44(2):215–236.

Yu, G. and Qi, X. (2004). Disruption Management - Framework, models and applications. World Scientific, New Jersey.


Table 1: Notation

d_i         deterministic duration of activity i
σ_i         stochastic duration increase of i due to breakdowns and repairs
D_i         stochastic duration of activity i due to breakdowns and repairs
r_ik        per-period usage of resource type k for activity i
PRED_i      set of immediate predecessors of activity i
PRED*_i     set of immediate and transitive predecessors of activity i
w_i         instability weight of activity i
LPL_ij      length of the longest path between activities i and j
δ           project deadline
L_p         element in position p of list L
s_i         planned starting time of activity i
Schedule    vector containing the baseline starting times s_i
S_i         real, stochastic starting time of activity i
SCHEDULE    vector containing the real starting times S_i
B_i         buffer amount for i
Busy_t      set of activities in progress during period t
a_k         number of renewable resource units of type k allocated to the project
A_kt        real availability of resource type k in period t
X_mk        stochastic time to failure of resource unit m of resource type k
MTTF_k      expected value of X_mk (λ_k = 1/MTTF_k)
Y_mk        stochastic repair time of resource unit m of resource type k
MTTR_k      expected value of Y_mk (μ_k = 1/MTTR_k)
X_i         stochastic part of σ_i attributable to failed executions
Y_i         stochastic part of σ_i attributable to repairs
N_i         stochastic number of interruptions experienced by i before completion
F_ip        stochastic duration of the p-th failed execution experienced by i
R_ip        stochastic duration of the p-th repair experienced by i
ψ_i         probability that i is interrupted before it is terminated

Table 2: Longest path lengths for all (i, j)

i\j  1    2    3    4    5    6    7    8    9    10
1    /    0    2    4    0    7    7    9    13   15
2    /    /    0    0    /    3    3    7    11   13
3    /    /    /    /    /    /    /    0    4    6
4    /    /    /    /    /    0    0    /    6    8
5    /    /    /    0    /    3    3    0    9    11
6    /    /    /    /    /    /    /    /    /    0
7    /    /    /    /    /    /    /    /    0    2
8    /    /    /    /    /    /    /    /    0    2
9    /    /    /    /    /    /    /    /    /    0
10   /    /    /    /    /    /    /    /    /    /

Table 3: Computation times (in seconds)

Time buffering   Resource buffering   Initial schedule   preempt-repeat   preempt-resume   average
no time buf      no res buf           RCPSP              0.06             0.06             0.06
no time buf      no res buf           CIW                0.00             0.00             0.00
no time buf      res buf              RCPSP              0.08             0.08             0.08
no time buf      res buf              CIW                0.00             0.00             0.00
Surr1            no res buf           RCPSP              0.41             0.43             0.42
Surr1            no res buf           CIW                0.62             0.64             0.63
Surr1            res buf              RCPSP              0.53             0.55             0.54
Surr1            res buf              CIW                0.85             0.87             0.86
Surr2            no res buf           RCPSP              0.43             0.46             0.44
Surr2            no res buf           CIW                0.65             0.67             0.66
Surr2            res buf              RCPSP              0.55             0.58             0.56
Surr2            res buf              CIW                0.88             0.90             0.89
Surr3            no res buf           RCPSP              0.57             0.55             0.56
Surr3            no res buf           CIW                0.78             0.76             0.77
Surr3            res buf              RCPSP              0.69             0.67             0.68
Surr3            res buf              CIW                1.02             0.99             1.00
STC              no res buf           RCPSP              0.28             0.23             0.25
STC              no res buf           CIW                0.50             0.45             0.47
STC              res buf              RCPSP              0.41             0.36             0.38
STC              res buf              CIW                0.74             0.69             0.72
SIM              no res buf           RCPSP              3.80             3.41             3.70
SIM              no res buf           CIW                3.04             2.71             2.87
SIM              res buf              RCPSP              4.44             3.86             4.15
SIM              res buf              CIW                3.64             3.14             3.39

Figure 1: Robust schedule construction


Figure 2: Example project network

Figure 3: Minimal makespan schedule with fixed allocations


Figure 4: Minimal makespan schedule with fixed allocations after a resource breakdown (preempt-repeat)

Figure 5: Resource flow network


Figure 6: Buffered schedule

Figure 7: Instability costs for preempt-repeat


Figure 8: Instability costs for preempt-resume
