Blind Fair Routing in Large-Scale Service Systems

Mor Armony¹    Amy R. Ward²

February 19, 2010

Abstract

In a call center, arriving customers must be routed to available servers, and servers that have just become available must be scheduled to help waiting customers. These dynamic routing and scheduling decisions are very difficult, because customers have different needs and servers have different skill levels. A further complication is that it is preferable that these decisions are made blindly; that is, they depend only on the system state and not on system parameter information such as call arrival rates and service speeds. This is because this information is generally not known with certainty. Ideally, a dynamic control policy for making routing and scheduling decisions balances customer and server needs, by keeping customer delays low, but still fairly dividing the workload amongst the various servers. In this paper, we propose a blind dynamic control policy that routes according to a longest weighted idle server first rule, and schedules according to a generalized cµ rule. We show that this policy is asymptotically optimal in the Halfin-Whitt many-server heavy traffic limit with respect to a finite time horizon problem that requires that a fair division of idle time amongst servers be maintained at all times, and also performs well for a relaxed version of this problem that only requires that server fairness is achieved in the long run.

Acknowledgement: We thank Itay Gurvich and Avi Mandelbaum for many valuable discussions.

¹ Stern School of Business, New York University, [email protected].
² Marshall School of Business, University of Southern California, [email protected].


1 Introduction

Large-scale call centers and other parallel server service systems with heterogeneous customer and server populations have created the need for skill-based control policies that dynamically match waiting customers and available servers. Indeed, many such dynamic control policies have been developed in industry and academia over the last two decades. Traditionally, these policies have been customer-centric in the sense that they have focused on customer-related goals, such as assigning customers to servers whose expertise best matches the needs of these customers and minimizing customer wait time. More recently, the recognition that employee satisfaction is important for business success has led to policies that also consider server-centric goals, such as considering server preference when assigning customers to servers and making sure all servers have some idle time. This leads to the desire for dynamic control policies that balance customer and server goals. For example, the control policies in Armony [2], Gurvich and Whitt [15], and Dai and Tezcan [12] asymptotically minimize customer delay cost, but this is at the expense of the faster servers having a heavier workload, and the slower servers experiencing almost all of the idle time. This is unfair. At the same time, fairness has been shown by psychologists to be a key component of employee satisfaction [10, 11]. This motivates defining a fair policy and having a revised goal of minimizing customer delay cost within the smaller class of fair policies. It is also the case that model parameters may not be known. In particular, parameters such as arrival rates and mean service times can have large forecast errors associated with them. However, many of the dynamic control policies for parallel server systems that have been proposed rely on the assumption that these parameters are known. This calls for policies that either estimate these parameters in real time, or do not use parameter information at all.
We refer to the latter as blind policies. We are interested in finding a blind dynamic control policy that minimizes customer delay cost subject to fairness constraints with respect to how the system idleness is divided among servers. We would like the policy to be simple and easily implementable. The difficulty is that finding dynamic control policies for parallel server systems that are optimal in some sense is a notoriously hard problem due to the "curse of dimensionality". Furthermore, in general, optimal policies that are simple and easily implementable can be found only for relatively simple models. For larger and more complex models, there is a need for simple and easily implementable policies

that are approximately optimal. We will follow the approach of looking for such policies by considering heavy traffic asymptotics. More specifically, we will evaluate system performance in the many-server Halfin-Whitt heavy traffic regime. There are two components to any control policy: a routing component that specifies which server should handle an arriving call when more than one server is available, and a scheduling component that specifies which call a newly available server should take when more than one customer is waiting. We propose a policy that routes newly arrived customers to the server whose weighted idle time is the longest, and schedules newly available servers based on a generalized version of the familiar cµ rule. We call this policy the LWI-Gcµ policy, for longest weighted idleness and generalized cµ. The LWI-Gcµ policy is blind, simple, and easily implementable. We show in this paper that it performs well both for a finite time horizon version and a steady-state version of the aforementioned optimization problem that minimizes customer delay cost subject to fairness constraints. The model formulation we consider here generalizes the one studied in Armony and Ward [3]. In that paper, the model is the inverted-V model, in which customers are homogeneous, and servers are heterogeneous with respect to the speed at which they serve customers. More specifically, there is a single class of customers and multiple server pools that are determined by service speed. The problem considered here has multiple customer classes, and server pools that may also be differentiated by the set of customer classes the servers in a particular pool have the skill to serve.
In the setting of [3], we showed that a threshold routing policy is asymptotically optimal in the Halfin-Whitt many-server heavy traffic regime with respect to a steady-state problem formulation, where customers are assigned to server pools based on a set of thresholds on the total number of customers in the system. Here, the same policy is optimal in the limit, with the additional generalized cµ scheduling component, and we call this policy the TR-Gcµ policy. One main drawback of the TR-Gcµ policy is that it is highly dependent on the system parameters, and is, therefore, not blind. We show that the LWI-Gcµ policy is asymptotically optimal in the Halfin-Whitt limiting regime with respect to a finite horizon problem formulation, in which fairness is required to be maintained at each point in time, with high probability, and performs well compared to the non-blind TR-Gcµ policy with respect to another, more relaxed, steady-state formulation. Interestingly, we show that, in the limit, the percentage cost increase between LWI-Gcµ and TR-Gcµ does not depend on the actual cost function. This is significant, as it suggests that the blind LWI-Gcµ performs well across the board for large systems.

The key to establishing the asymptotic optimality of LWI-Gcµ with respect to the finite horizon problem is to show that fairness is indeed asymptotically achieved at each point in time. In essence, this means that, for every pair of server pools, there is a fixed ratio between the idle times of the servers that have been idle the longest in those two pools. This is a form of state-space collapse (SSC) that holds in the limit, as knowing the longest idle time in one pool implies knowledge of the longest idle time in all other pools. To prove this state-space collapse, our paper builds on the methodology developed in [14] and [15], where SSC is first shown to hold for a sequence of stopped processes. Then the corresponding stopping times are shown to diverge to infinity, and hence are inconsequential in the limit. In order to establish that the LWI-Gcµ policy performs well with respect to the steady-state problem formulation, we must have a comparison policy. To see that the TR-Gcµ policy is an appropriate comparison policy, we solve the limiting steady-state problem that emerges in the Halfin-Whitt many-server heavy traffic regime. This is possible because there is a separability that emerges in that regime. The routing component determines how to minimize the total number in queue, subject to fairness. The scheduling component divides this queue length between the various customer classes so as to minimize delay cost. In particular, the routing ignores the cost function, and the scheduling ignores the fairness constraints. One theme that emerges in this paper is the duality between customers and servers in the QED regime. More specifically, there is a duality between customer delay and server idleness. Customer delay is the time between a customer's arrival and the start of his service, while server idleness is the time between a service completion by a server and the start of a new service by the same server.
In the QED regime, customer delay and server idleness (a) are both of an order of magnitude that is inversely proportional to the square root of the system arrival rate, (b) both satisfy the snapshot principle, which is an asymptotic form of Little's law that holds at each time point, and (c) in a Markovian system, both satisfy an asymptotic PASTA property. We complete this introduction with a brief review of the most relevant literature. Then, in Section 2, we specify the details of our model and formulate the finite time and steady state optimization problems with fairness constraints that we would like to solve. We present the LWI-Gcµ policy in Section 3, and the many-server heavy-traffic limiting regime in Section 4. We prove the LWI-Gcµ policy is asymptotically optimal with respect to the finite time horizon problem in Section 5. In Section 6, we establish that the TR-Gcµ policy is optimal for the limiting version of the steady-state problem, and compare the performance of the LWI-Gcµ and TR-Gcµ policies. We end with some concluding remarks and directions for future research

in Section 7.

1.1 Literature Review

The literature on skill-based routing is extensive. A recent survey of this literature is included in Aksin, Armony, and Mehrotra [1]. Here we survey only the most closely related stream of literature. We begin with the paper by Gurvich and Whitt [15]. That paper shows that a generalized cµ scheduling rule combined with a fastest-server-first routing rule stochastically minimizes customer delay costs over a finite horizon in a parallel server system with multiple customer classes and multiple server pools when the delay cost functions are strictly convex in the Halfin-Whitt many-server limiting regime.³ The generalized cµ scheduling rule follows Van Mieghem [21] and Mandelbaum and Stolyar [20], who proved asymptotic optimality of this scheduling rule in the conventional heavy traffic limit. Under fastest-server-first routing, nearly all of the system idle time is experienced by the servers from the pool with the slowest service speed. From the server perspective, this is unfair. Our finite time horizon problem formulation is exactly the customer delay cost minimization problem in [15] with an added fairness constraint.

It is also true in the closely related work by Dai and Tezcan [12] that nearly all the system idle time is experienced by the servers from the pool with the slowest service speed. Their model is a parallel server system model that also allows for customer reneging, and they show an asymptotic optimality result that is similar to [15], but uses different proof techniques. The holding and reneging costs are restricted to be linear, and the reneging rates are assumed to be ordered so that classes that are less expensive to hold also have higher reneging rates.

In addition to the paper [3] discussed in the Introduction, there are several recent papers that address the server fairness issue, all in the context of the inverted-V model. Atar [6] proposes a blind policy that routes newly arrived customers to the server that has been idle the longest. The routing component of our proposed LWI-Gcµ policy extends his policy by weighting the server pools, and routing calls to the server that has experienced the longest weighted idle time. This extension is significant because it allows for great flexibility in terms of the fairness related parameters, and also introduces new technical challenges. Atar, Shaki, and Shwartz [7] analyze the blind policy that routes newly arrived customers to the pool that has the longest weighted cumulative idleness process, where the cumulative idleness process for a particular pool is defined by summing over the idle times each server in that pool has experienced since time 0. The policy proposed in Tseytlin [25] balances the number of idle servers from each pool by routing newly arrived customers randomly, with the probability that a given pool is chosen being proportional to the number of idle servers in that pool divided by the total number of idle servers in the system. To the best of our knowledge, this is the first paper that considers the server fairness issue in a more general parallel server system that has both multiple server pools and multiple customer classes.

³ Actually, the proposed scheduling rule in [15] is more general than generalized cµ, and their optimality results require the less restrictive condition that the delay cost functions are convex. However, it is only in the case of strictly convex delay cost functions that the proposed scheduling rule is asymptotically equivalent to a generalized cµ rule.

2 Model and Problem Formulation

Consider a Parallel-Server System (PSS), as shown in Figure 1, with fixed sets I = {1, . . . , I} representing customer classes and J = {1, . . . , J} representing server types (each type in its own server pool). Customers arrive exogenously according to independent Poisson processes A_i, i ∈ I. The class i arrival rate is λ_i. The total arrival rate to the system is λ := Σ_{i=1}^{I} λ_i. Customers from each class enter service in the order of their arrival. Service times are independent and exponential, and the average service time of a customer served by a server from pool j ∈ J is 1/µ_j. There are N_j servers in pool j. We let N = Σ_{j∈J} N_j be the total number of servers and ~N = (N_1, . . . , N_J) denote the staffing vector. Here and elsewhere, ~x is used to denote a vector whose elements are x_1, x_2, . . .. Note that we follow the notation in [14].

[Figure 1: An Example Parallel Server System. Arrival streams λ_1, . . . , λ_I are matched to server pools 1, . . . , J by the routing and scheduling control.]


The set of possible assignments of customers to servers in this system can be represented as the set 𝔼 of edges in the bipartite graph formed from the set of nodes I ∪ J, where 𝔼 = {(i, j) ∈ I × J}. An edge (i, j) corresponds to pool j being able to serve class i customers. Given an assignment graph G := (I ∪ J, E) with E ⊆ 𝔼, we let

  I(j) := {i ∈ I : (i, j) ∈ E}  and  J(i) := {j ∈ J : (i, j) ∈ E}.

In words, I(j) is the set of classes that a pool j server can serve, and J(i) is the set of all server pools that can serve class i.

A control policy has two components: a routing component that specifies what to do when a customer arrives to the system, and a scheduling component that specifies what to do when a server completes service and becomes available. Denote by π := π(~λ, ~N) a policy that operates in the system with arrival rate vector ~λ and staffing vector ~N (in general, we omit the arguments ~λ and ~N when it is clear from the context which arguments should be used). Let t ≥ 0 be an arbitrary time point. We denote the queue length of class i customers by Q_i(t; π) and the number of idle servers in pool j by I_j(t; π). Then, the total number of customers queueing is Q_Σ(t; π) := Σ_{i∈I} Q_i(t; π), and the total number of idle servers is I_Σ(t; π) := Σ_{j∈J} I_j(t; π). Let Z_j(t; π) be the number of busy servers in pool j. (Note that we do not need to track which servers are serving which customer classes, because we have assumed the rate at which a server serves a customer depends only on his pool, and not on the customer's class.) It follows that I_j(t; π) = N_j − Z_j(t; π) and I_Σ(t; π) = N − Σ_{j∈J} Z_j(t; π). We let S_j, j ∈ J, be independent, unit-rate Poisson processes, also independent of A_i, i ∈ I, so that

  D_j(t; π) := S_j( µ_j ∫_0^t Z_j(s; π) ds ) = S_j( µ_j ∫_0^t (N_j − I_j(s; π)) ds )

represents the cumulative number of service completions by pool j servers. The overall number of customers in the system is

  X_Σ(t; π) := Σ_{i∈I} Q_i(t; π) + Σ_{j∈J} Z_j(t; π).

We let W_i(t; π) be the waiting time of the customer in class i ∈ I that has been waiting the longest and Y_j(t; π) be the idle time of the server in pool j ∈ J that has been idle the longest. Also, let V_i(t; π) denote the virtual class i waiting time and U_j(t; π) denote the virtual pool j idle time. That is, V_i(t; π) (U_j(t; π)) is the amount of time a class i customer (pool j server) that arrived (became idle) at time t would have to wait to receive service (become busy).

We omit the time argument when we refer to an entire process. We use t = ∞ when we refer to a process in steady-state. Also, we omit π from the notation unless it is necessary to avoid confusion between different routing policies. Throughout the paper we assume, for simplicity, that Q_i(0) = I_j(0) = 0, i ∈ I, j ∈ J. This, in particular, implies that W_i(0) = Y_j(0) = 0, i ∈ I, j ∈ J.

Let Π be the set of non-anticipating, non-preemptive policies under which a steady-state exists for X_Σ, Q_i, i ∈ I, and Z_j, j ∈ J (which implies a steady-state exists for V_i, i ∈ I, and U_j, j ∈ J). Non-anticipating (roughly speaking) means that the policy cannot require knowledge of the future. By non-preemptive, we mean that once a call is assigned to a particular server, it cannot be transferred to another server of a different pool, nor can it be preempted by another call. We also assume that any policy π ∈ Π serves customers first-come first-served within each customer class and, similarly, that servers become busy in accordance with a first-idle first-busy policy within each server pool. It follows from this assumption that W_i(t) is identical to the waiting time of the customer at the head of class i at time t, and that Y_j(t) is identical to the idle time of the server at the "head" of pool j. We call a policy π ∈ Π an admissible policy, and restrict our attention to such policies.

A good control policy results in both low delays for customers and a fair division of idle time between the servers. Let C_i : [0, ∞) → [0, ∞), i ∈ I, denote the class i delay cost functions, and let (1) denote the finite-horizon problem of minimizing the expected average delay cost over [0, T], subject to the idle time being divided among the pools in the proportions η_j at every time point in [0, T], up to a small tolerance. Here, W_{i,k} is the waiting time experienced, up to time T, by the kth class i customer who has arrived after time 0, and U_Σ(t; π) := Σ_{j∈J} U_j(t; π). By convention, we set U_j(t; π)/U_Σ(t; π) := η_j whenever U_Σ(t; π) = 0. We call the constraint in (1) the fairness constraint, and the parameters η_j, j ∈ J, the fairness parameters.

An ideal control policy π ∈ Π is a blind policy that solves (1). By blind, we mean that the policy does not require knowledge of system parameters such as arrival rates, service rates, and pool sizes. Our first objective is to find a blind policy π ∈ Π that solves (1).

The problem (1) intends to minimize the finite horizon average delay cost, subject to fairness with respect to idling time at each point in time over this horizon. This raises the following questions. What happens if one is only concerned with obtaining fairness on average, or in the long run? How well does a policy that is optimal for (1) perform for a steady-state formulation? Our second objective is to find a blind policy π ∈ Π that performs well for a steady-state formulation.

Note that, in general, we cannot expect that a policy that is optimal for (1) is also optimal for a steady-state formulation. This is because it is a relaxation of the constraint in (1) to require that fairness is only achieved in the long run; that is, that E U_j(∞; π) = η_j E U_Σ(∞; π). In the absence of the fairness constraint, we expect that any policy that minimizes the finite horizon average delay cost (the objective function in (1)) for any T > 0 will also minimize the steady-state delay cost; that is, such a policy will also minimize

  Σ_{i=1}^{I} a_i E[C_i(V_i(∞; π))],

for a_i := λ_i/λ, i ∈ I, under the technical condition of tightness.

The discussion in the preceding paragraph leads us to also consider the following problem:

  minimize_{π∈Π}  Σ_{i=1}^{I} a_i E[C_i(V_i(∞; π))]                        (2)
  subject to:     E U_j(∞; π) = η_j E U_Σ(∞; π),  j ∈ J.

Then, we can compare the performance between a policy that solves (1) and a policy that solves (2).

3 A Blind Routing and Scheduling Policy

We propose a policy that schedules servers using a generalized cµ-rule, and routes according to a weighted longest-idle-server first policy. We will show that this policy is asymptotically optimal in the Halfin-Whitt many-server heavy traffic regime [16] for the finite time horizon problem (1), and that it also performs well for the steady-state problem (2).


Definition 3.1. The Longest Weighted Idleness and Delay Generalized cµ Scheduling (LWI-Gcµ) Policy: Upon the arrival of a class i customer at time t, the customer will be routed to an available server in pool j*, where

  j* := j*(t) = max argmax_{j∈J(i), Y_j(t)>0} {Y_j(t)/η_j};

i.e., the customer will be routed to the server that has the longest weighted idleness. If there are no servers available, the customer waits in queue i, to be served in the order of arrival. Upon service completion by a type j server at time t, the server will admit to service the customer from the head of queue i*, where

  i* := i*(t) = max argmax_{i∈I(j), Q_i(t)>0} {a_i C_i′(W_i(t))};

i.e., the server prioritizes classes using a generalized cµ rule that ranks classes according to the marginal cost of the waiting time of the customer that has been waiting the longest in each class. The LWI-Gcµ policy is blind in the sense that it does not require knowledge of the system parameters such as the arrival rates, the service rates, and the pool sizes. The only parameters that are required for its implementation are the customer class proportions, a_i, i ∈ I. If these parameters are not known, one can replace them by the proportion of arrivals of each class over a certain time window. Also note that, while the performance in (1) is stated in terms of the variables W_{i,k} and U_j, the LWI-Gcµ policy instead uses the head-of-the-line waiting time and idleness variables, W_i and Y_j. This is because a policy in Π is required to be non-anticipative, and the values of the actual delay and virtual idleness are not known at time t. It is difficult to solve either (1) or (2) exactly. However, this is possible in the Halfin-Whitt many-server heavy traffic regime (defined in Section 4). This is because in that regime there is the following separability in the routing and scheduling. The routing component controls the idleness in each server pool, and additionally fixes the total number of customers waiting for service. The scheduling component then decides how that total number of customers waiting for service is divided amongst the different classes. We perform a simulation study that illustrates the aforementioned separability. Specifically, for the N-model shown in Figure 2, Figure 3 presents the results of a simulation study that compares the queue sizes under two different policies that route identically but schedule differently. In the N-model, routing is only

relevant for class 1, and both policies route as in the LWI-Gcµ rule; that is,

  j*(t) = argmax_{j=1,2} {Y_1(t)/η_1, Y_2(t)/η_2}.   (3)

[Figure 2: The N-model used in our simulations. Class 1 can be served by pool 1 or pool 2; class 2 can be served only by pool 2.]

Scheduling is only relevant for pool 2. The longest delay (LD) policy is identical to the generalized cµ rule in the LWI-Gcµ policy for quadratic cost functions that are identical among customer classes, and has

  i*(t) = argmax_{i=1,2} {W_1(t), W_2(t)}.   (4)
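The two decision rules of Definition 3.1 (route to the pool whose head server has the largest weighted idleness; schedule the nonempty class with the largest a_i C_i′(W_i)) can be sketched in a few lines of Python. This is our own illustrative code; the function names and data layout are not from the paper:

```python
def lwi_route(idle_times, eta, eligible):
    """Routing: among eligible pools j with an idle server (Y_j > 0),
    pick the pool maximizing the weighted idleness Y_j / eta_j;
    ties go to the largest pool index, mirroring the 'max argmax'."""
    candidates = [j for j in eligible if idle_times[j] > 0]
    if not candidates:
        return None  # no idle server anywhere: the customer joins its queue
    best = max(idle_times[j] / eta[j] for j in candidates)
    return max(j for j in candidates if idle_times[j] / eta[j] == best)


def gcmu_schedule(head_waits, a, marginal_cost, eligible):
    """Scheduling: among eligible classes i with a nonempty queue
    (head_waits[i] is the head-of-line wait W_i, or None if the queue
    is empty), pick the class maximizing a_i * C_i'(W_i); ties to the
    largest class index."""
    candidates = [i for i in eligible if head_waits[i] is not None]
    if not candidates:
        return None  # nobody waiting: the server idles
    score = {i: a[i] * marginal_cost[i](head_waits[i]) for i in candidates}
    best = max(score.values())
    return max(i for i in candidates if score[i] == best)
```

For example, with pools 1 and 2 idle for 2.0 and 3.0 time units and weights η = (1, 2), the routing rule picks pool 1 (weighted idleness 2.0 versus 1.5); with identical quadratic costs and equal class proportions, a freed server takes the class whose head-of-line customer has waited longest, as in the LD rule (4).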

We arbitrarily compare LD to the static priority (SP) policy that gives priority to class 1 whenever there are waiting customers from both classes. Note that the total average number of customers waiting for service is close under both policies (within 6%). However, the number of customers waiting from each class is divided evenly under LD, while SP mostly maintains only class 2 customers waiting. The reported results here, and in every simulation study in this paper, unless otherwise noted, show the average over 100 simulation runs, where each run has a 100,000 arrival "warm-up" period (in which statistics are not recorded), and then 500,000 subsequent arrivals (for which statistics are recorded).

[Figure 3: A simulation comparison of the performance of the policy LD that schedules according to (4) and the SP policy that gives static priority to class 1 when scheduling. Both policies route according to (3). Panel (a) shows the total number of customers waiting for service as a function of η_2 for fixed η_1 = 1; panel (b) shows the number of class 1 and class 2 customers waiting for service as a function of η_2 for fixed η_1 = 1.]

The reason separability is important is that it allows us to ignore the constraints in the problem formulations (1) and (2) when solving for a scheduling policy that minimizes the objective function. Therefore, we can be confident that a scheduling policy that is optimal asymptotically, in the Halfin-Whitt many-server limit regime, schedules according to the generalized cµ rule. This is because the generalized cµ rule was shown in [15] to be asymptotically optimal for the finite-time horizon problem formulation (1) when there are no constraints, and we also expect that the generalized cµ rule is asymptotically optimal for the steady-state problem formulation (2) when there are no constraints, under the technical condition that there is tightness. Therefore, we need only focus on the routing component to establish how well the LWI-Gcµ policy performs for the problem formulations (1) and (2).
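As a concrete, much smaller illustration of the kind of simulation study described above, the sketch below simulates a toy N-model and compares LD with SP scheduling at the flexible pool. All rates and pool sizes are illustrative placeholders (not the parameters of Figure 2), and the routing is deliberately crude (lowest-index idle pool) so the code stays short:

```python
import heapq
import random


def simulate(policy, horizon=2000.0, seed=1):
    """Toy N-model: class-1 jobs may be served by pool 1 or pool 2,
    class-2 jobs only by pool 2.  `policy` selects how a freed pool-2
    server chooses a queue: "SP" = static priority to class 1,
    "LD" = serve the class whose head-of-line customer waited longest."""
    random.seed(seed)
    lam = {1: 4.0, 2: 4.0}   # illustrative arrival rates
    mu = {1: 1.0, 2: 1.0}    # illustrative service rates
    n = {1: 3, 2: 7}         # pool sizes (capacity 10 > offered load 8)
    busy = {1: 0, 2: 0}
    queue = {1: [], 2: []}   # FIFO queues of arrival epochs
    waits = {1: [], 2: []}
    events = []              # heap of (time, kind, class-or-pool)
    for c in (1, 2):
        heapq.heappush(events, (random.expovariate(lam[c]), "arr", c))

    def start_service(c, pool, now):
        busy[pool] += 1
        heapq.heappush(events, (now + random.expovariate(mu[pool]), "dep", pool))

    while events:
        t, kind, key = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arr":
            heapq.heappush(events, (t + random.expovariate(lam[key]), "arr", key))
            pools = (1, 2) if key == 1 else (2,)
            idle = [p for p in pools if busy[p] < n[p]]
            if idle:
                waits[key].append(0.0)
                start_service(key, idle[0], t)
            else:
                queue[key].append(t)
        else:                 # a departure frees one server in pool `key`
            busy[key] -= 1
            classes = (1,) if key == 1 else (1, 2)
            nonempty = [c for c in classes if queue[c]]
            if nonempty:
                if policy == "SP":
                    c = min(nonempty)   # static priority to class 1
                else:                   # "LD", as in (4)
                    c = max(nonempty, key=lambda c: t - queue[c][0])
                waits[c].append(t - queue[c].pop(0))
                start_service(c, key, t)
    return waits
```

Comparing the per-class mean waits under the two policies reproduces the qualitative picture reported above: SP shifts essentially all of the waiting onto class 2, while LD divides it more evenly between the classes.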

4 The Many-Server Heavy Traffic Limiting Regime

It is difficult to solve either (1) or (2) exactly. Our approach is to solve the problems asymptotically in the Halfin-Whitt many-server heavy-traffic limit regime [16]; i.e., for large systems (systems with many servers and large demand) that are heavily loaded. Specifically, we consider a family of systems indexed by the aggregate arrival rate λ and let λ → ∞. The service rates µ_j, j ∈ J, the routing graph G, and the ratios a_i = λ_i/λ are all held fixed. The associated family of staffing vectors is N^λ := (N_1^λ, . . . , N_J^λ). Our convention is to superscript all processes and quantities associated with the system having arrival rate λ by λ. We also define the scaled processes

  Q̂_i^λ(t) := Q_i^λ(t)/√λ,  Ŵ_i^λ(t) := √λ W_i^λ(t),  V̂_i^λ(t) := √λ V_i^λ(t),  i ∈ I,
  Î_j^λ(t) := I_j^λ(t)/√λ,  Ŷ_j^λ(t) := √λ Y_j^λ(t),  Û_j^λ(t) := √λ U_j^λ(t),  j ∈ J,
  Ŵ_{i,k}^λ := √λ W_{i,k}^λ,  i ∈ I,  k = 1, 2, . . . ,

and

  Q̂_Σ^λ(t) := Q_Σ^λ(t)/√λ,  Î_Σ^λ(t) := I_Σ^λ(t)/√λ,  and  X̂_Σ^λ(t) := (X_Σ^λ(t) − N^λ)/√λ,

where N^λ := Σ_{j∈J} N_j^λ.
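The √λ scalings above reflect square-root safety staffing: when roughly R + β√R servers are provided for an offered load R, the probability that an arriving customer must wait approaches a constant strictly between 0 and 1 as the system grows [16]. A quick single-pool (J = 1) numeric check with the standard Erlang-C formula illustrates this; the parameters below are ours, chosen purely for illustration:

```python
import math


def erlang_c(offered_load, n_servers):
    """P(wait > 0) in an M/M/N queue, computed via the Erlang-B recursion
    to avoid factorial overflow.  Requires offered_load < n_servers."""
    b = 1.0
    for k in range(1, n_servers + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / n_servers
    return b / (1.0 - rho * (1.0 - b))


def delay_probs(beta=1.0, mu=1.0, lams=(100.0, 400.0, 2500.0)):
    """Delay probability under square-root staffing N = R + beta*sqrt(R),
    for a range of arrival rates."""
    out = []
    for lam in lams:
        r = lam / mu                             # offered load
        n = math.ceil(r + beta * math.sqrt(r))   # square-root safety staffing
        out.append(erlang_c(r, n))
    return out
```

With β = 1 the three delay probabilities cluster in a narrow band (roughly 0.2 to 0.3) rather than drifting to 0 or 1 as λ grows from 100 to 2500; this non-degenerate limit is the defining feature of the Halfin-Whitt (QED) regime.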

We assume that the number of servers in each pool is of the same order as the arrival rate, and that the system is heavily loaded.

Assumption 4.1. There is a strictly positive vector ν that satisfies Σ_{j=1}^{J} µ_j ν_j = 1 such that

  lim_{λ→∞} N_j^λ/λ = ν_j,  j ∈ J.

Furthermore,

  lim_{λ→∞} (N_j^λ − ν_j λ)/√λ = θ_j,  j ∈ J,

for θ ∈ ℝ^J with β := Σ_{j∈J} µ_j θ_j > 0.

Assumption 4.1 implies that

  Σ_{j=1}^{J} µ_j N_j^λ = λ + β√λ + o(√λ)  as λ → ∞.

When J = 1, this is the Halfin-Whitt many-server heavy-traffic condition that appears in (2.2) in [16]. We also require a resource pooling condition.

Assumption 4.2. There exists a vector x ∈ ℝ₊^{I×J}, with x_{ij} > 0 only if (i, j) ∈ E, such that Σ_{j∈J(i)} x_{ij} µ_j ν_j = a_i for all i ∈ I, Σ_{i∈I(j)} x_{ij} = 1 for all j ∈ J, and the graph with node set I ∪ J and edge set {(i, j) ∈ E : x_{ij} > 0} is a connected graph.

Assumption 4.2 guarantees that with multiple customer classes each class has access to more than the minimal capacity that it requires; that is, that Σ_{j∈J(i)} µ_j ν_j > a_i (with strict inequality). This local excess capacity condition guarantees that if all the capacity in the set of pools J(i) is directed to serve the class i queue, the queue can be drained extremely fast, and practically instantaneously as the system size grows. For the remainder of this paper, we will assume that a vector x as in Assumption 4.2 exists. We say that a sequence of controls is asymptotically efficient if, for every T > 0,

  sup_{0≤t≤T} Q̂_Σ^λ(t) ∧ Î_Σ^λ(t) ⇒ 0  as λ → ∞,   (7)

and also

  Q̂_Σ^λ(∞) ∧ Î_Σ^λ(∞) ⇒ 0  as λ → ∞.

We let Π_e be the family of asymptotically efficient control sequences; i.e., Π_e := {π^λ : π^λ ∈ Π for every value of λ and the limit in (7) holds}. When the system is operated under an asymptotically efficient control, there cannot be a significant number of customers in any queue while there are idle servers in some of the server pools.

In order to state our results, it is necessary to define the notation "⇒" that we use to mean weak convergence. For each positive integer m, we let D^m be the set of all functions ω : [0, ∞) → ℝ^m that are right-continuous with left limits.

Lemma 5.1. Fix T > 0. Suppose that for a sequence of systems that operates under an asymptotically efficient and asymptotically feasible control (with respect to the problem (1)) we have that X̂_Σ^λ ⇒ X̂ as λ → ∞. Then Î_j^λ ⇒ Î_j for every j ∈ J, as λ → ∞, where

  Î_j(t) := f_j X̂(t)⁻.


As it turns out, the requirement in Lemma 5.1 that the process X̂_Σ^λ is convergent is not restrictive. The following lemma shows that the process X̂_Σ^λ must be tight, and therefore, that it has a convergent subsequence.

Lemma 5.2. Fix T > 0. Then, for any sequence of systems that operates under an asymptotically efficient and asymptotically feasible control (with respect to the problem (1)) we have that X̂_Σ^λ is tight.

Next, to show that the LWI-Gcµ policy is asymptotically optimal, we first show that any convergent subsequence has the limit X̂_Σ. In particular, the drift of the underlying diffusion is completely determined by the routing rule, because the routing rule determines whether or not the policy is asymptotically feasible. Then, the total number of customers in the system is fixed, and, to find an asymptotically optimal policy, we must find a policy that schedules so that the number of customers waiting for service in each class minimizes the total delay cost. This is exactly what Gcµ scheduling (i.e., scheduling according to the generalized cµ rule) does. In summary, the LWI-Gcµ rule is asymptotically feasible and has Gcµ scheduling; hence, we expect it to be asymptotically optimal.

Theorem 5.2. The blind policy LWI-Gcµ is asymptotically optimal with respect to the problem (1) among all asymptotically efficient policies.
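The fairness mechanism behind this result can be illustrated numerically. The sketch below simulates a two-pool inverted-V system under longest-weighted-idleness routing and records how much idle time each pool accumulates: with a larger weight η_2, a pool-2 server must build up proportionally more idleness before being selected, so pool 2 ends up carrying a larger share of the total idle time. Everything here (rates, pool sizes, weights) is an illustrative toy of our own, not the paper's setup:

```python
import heapq
import random


def simulate_lwi(eta, horizon=5000.0, seed=2):
    """Inverted-V toy: one customer class, two pools of exponential servers.
    An arrival is routed to the pool whose longest-idle server maximizes
    the weighted idleness Y_j / eta_j; within a pool, the longest-idle
    server is used first.  Returns cumulative idle time accrued per pool."""
    random.seed(seed)
    lam = 9.0                 # arrival rate (capacity below is 10, so stable)
    mu = {1: 1.0, 2: 1.0}
    n = {1: 5, 2: 5}
    idle_since = {j: [0.0] * n[j] for j in (1, 2)}  # min-heaps of idle-start times
    for j in (1, 2):
        heapq.heapify(idle_since[j])
    events = [(random.expovariate(lam), "arr", 0)]
    queue = []
    idle_accrued = {1: 0.0, 2: 0.0}

    while events:
        t, kind, j = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arr":
            heapq.heappush(events, (t + random.expovariate(lam), "arr", 0))
            avail = [p for p in (1, 2) if idle_since[p]]
            if avail:
                # route to the pool with the largest weighted head idleness
                p = max(avail, key=lambda p: (t - idle_since[p][0]) / eta[p])
                idle_accrued[p] += t - heapq.heappop(idle_since[p])
                heapq.heappush(events, (t + random.expovariate(mu[p]), "dep", p))
            else:
                queue.append(t)
        else:                 # a pool-j server finishes service
            if queue:
                queue.pop(0)  # FIFO: the same server takes the next customer
                heapq.heappush(events, (t + random.expovariate(mu[j]), "dep", j))
            else:
                heapq.heappush(idle_since[j], t)
    return idle_accrued
```

For example, `simulate_lwi({1: 1.0, 2: 4.0})` gives pool 2 a visibly larger cumulative idle share than pool 1, even though the two pools are otherwise identical.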

6 The Steady-State Problem

Theorem 5.2 establishes that the LWI-Gcµ policy is near optimal for the finite-time horizon problem formulation (1). In this section, we show that, while not optimal for the steady-state problem formulation (2), LWI-Gcµ performs well for this problem as well. We do this by comparing the performance of the LWI-Gcµ policy to a policy that has near optimal performance. In Section 6.1, we informally motivate this near optimal policy in the context of the N-system shown in Figure 2. We then conduct a simulation study that compares the performance of the LWI-Gcµ policy to the performance of the aforementioned near optimal policy. To find the near optimal policy, we follow the general approach outlined by Harrison [17]. In Section 6.2, we derive the diffusion control problem that arises when formally passing to the limit in the control problem (2) having arrival rate λ. In Section 6.3, we solve that diffusion control problem. In Section 6.4, we show how to translate the solution to the diffusion control problem to a near optimal policy for routing and scheduling. Finally, we end in Section 6.5 with a calculation that quantifies the limiting percentage cost increase when using the LWI-Gcµ policy instead of the aforementioned near optimal policy.

6.1 The Performance of the LWI-Gcµ Policy for the Steady-State Problem

A policy with near optimal performance can be found by recalling that the routing and scheduling problems are separable in the Halfin-Whitt many-server heavy traffic limit. Then, the routing component should be the threshold policy introduced in Armony and Ward [3] that determines server pool priorities based on the total number of customers in the system. Theorem 3 in [3] shows that a continuous modification of that policy is asymptotically optimal for the pure-routing inverted-V system, when the objective in (2) is to minimize expected customer wait time. Since the constraints in (2) uniquely identify the threshold routing policy, and define the expected number of customers waiting in steady-state, separability suggests that this same routing policy is asymptotically optimal for (2), even under the modified network structure and objective function. Next, the scheduling component should be the generalized cµ rule that was shown to be asymptotically optimal in Theorem 3.4 in [15] for the finite time horizon problem formulation (1) with no constraints, and is exactly as in the LWI-Gcµ policy. Since we expect that, under some additional technical restrictions, the same scheduling policy that minimizes the finite time horizon problem objective (1) also minimizes the steady-state problem objective (2), we have a logical argument that motivates that a policy that combines threshold routing and generalized cµ scheduling is asymptotically optimal for the problem formulation (2). We call this policy the TR-Gcµ policy.

For the purpose of this discussion, we only describe the TR-Gcµ policy for systems with two server pools (J = 2). The policy is defined more generally in Section 6.4. When J = 2, the routing is such that when the total number of customers in the system is above (below) a threshold level L, all newly arrived customers will be served by a fast (slow) server, whenever possible. In the limit, this implies that all fast (slow) servers are busy when the total number in the system is above (below) the threshold. The threshold level should be set so that the fairness constraint is satisfied in steady state, and this can be achieved asymptotically, exactly as in [3]. The scheduling component is identical to the scheduling component of LWI-Gcµ.

Figure 4 presents the results of a simulation study that compares the performance of the LWI-Gcµ and TR-Gcµ policies. The parallel server system structure that we used is the N-system shown in Figure 2. Routing under the LWI-Gcµ policy is as in (3), and, under the TR-Gcµ policy, has
$$
j^\star(t) = \begin{cases} 1 & \text{if } X_\Sigma(t) \le L, \\ 2 & \text{if } X_\Sigma(t) > L, \end{cases}
$$

when $I_1(t) > 0$ and $I_2(t) > 0$, and $L$ is a function of $\eta_1$ and $\eta_2$ (to be defined precisely in Section 6.4).
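For concreteness, the J = 2 routing decision just described can be sketched as follows. This is an illustrative sketch only: the function and variable names (`route_tr`, `x_total`, `threshold`) are ours, not the paper's, and the fallback to the other pool when the preferred pool has no idle server is our reading of "whenever possible".

```python
def route_tr(x_total, threshold, idle1, idle2):
    """Threshold routing (TR) for a two-pool system.

    Prefer a pool-1 (slow) server when the total number in system is at
    or below the threshold L, and a pool-2 (fast) server otherwise.
    idle1 / idle2 are the numbers of idle servers in each pool.
    Returns the pool to route to, or None if no server is idle.
    """
    preferred = 1 if x_total <= threshold else 2
    if preferred == 1:
        # Fall back to pool 2 if no slow server is idle (our assumption).
        return 1 if idle1 > 0 else (2 if idle2 > 0 else None)
    return 2 if idle2 > 0 else (1 if idle1 > 0 else None)
```

When both pools have idle servers this reduces exactly to the displayed rule for $j^\star(t)$; when only one does, the arrival is served there.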





[Figure 4 appears here. Panel (a): the mean delay cost as a function of $\eta_1$ when $C_1(x) = C_2(x) = (40x)^{1.1}$. Panel (b): the mean delay cost as a function of $\eta_1$ when $C_1(x) = (40x)^{1.1}$ and $C_2(x) = 2(40x)^{1.2}$.]

Figure 4: A simulation comparison of the performance of the TR-Gcµ and LWI-Gcµ policies, having parameters so that the fairness constraint in (2) is satisfied, for two different cost functions.

Scheduling is the same under both policies, but depends on the cost function. Figure 4(a) assumes that the two classes have identical cost functions $C_1(x) = C_2(x) = (40x)^{1.1}$. Then, the scheduling policy gives priority to the customer that has experienced the longest delay, as specified in (4). Figure 4(b) assumes that the cost functions are $C_1(x) = (40x)^{1.1}$ and $C_2(x) = 2(40x)^{1.2}$. Then, the scheduling rule has
$$
i^\star(t) \in \operatorname*{arg\,max}\left\{ \tfrac{1}{2}{C_1^\lambda}'\big(W_1^\lambda(t)\big),\ \tfrac{1}{2}{C_2^\lambda}'\big(W_2^\lambda(t)\big) \right\} = \operatorname*{arg\,max}\left\{ 1.1\,(40W_1(t))^{0.1},\ 2.4\,(40W_2(t))^{0.2} \right\}.
$$
In both Figures 4(a) and (b), the mean delay cost is shown as a function of $\eta_1$, noting that $\eta_2 = 1 - \eta_1$. In both figures it is apparent that, while the policy LWI-Gcµ is near optimal for the finite horizon problem formulation (1), it is not near optimal for the steady-state problem formulation (2). However, the simulated cost increase when using the LWI-Gcµ policy instead of the TR-Gcµ policy is small; in particular, the maximum simulated percentage cost increase is a little under 7%.⁴ Also note that there is not too much difference (less than 6%) between the simulated percentage cost increases shown in Figures 4(a) and (b), even though the cost function used when generating Figure 4(b) has a strictly higher class 2 delay cost. There is a theoretical reason for this (that we explain in Section 6.5): for any given system structure and parameters, the percentage cost increase when using the LWI-Gcµ policy over the TR-Gcµ policy does not depend on the cost function.

In all the simulation runs, we calculate the mean idle time of servers in pools 1 and 2, and find that the constraint in (2) is satisfied with a tolerance of 0.01. To calculate the mean idle time for a given pool j, each time a server in that pool becomes idle, we record the amount of time he idles, and then take the average of those times at the end of the simulation run. The server PASTA property in Theorem 4.2 ensures that this method is valid.

⁴The maximum predicted percentage cost increase is 7.59%. However, there is enough variability in the simulation runs that we are unable to reliably predict the simulated percentage cost increase. In general, we predict the expected delay cost under either the TR-Gcµ or the LWI-Gcµ policy within 10% relative error.
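For the Figure 4(b) cost functions, the scheduling rule reduces to comparing two marginal-cost terms. The following is a minimal sketch of that comparison (our own illustrative code, not the authors' simulation; the tie-breaking choice when the two terms are equal is our assumption, since the argmax does not specify one):

```python
def gcmu_class(w1, w2):
    """Generalized c-mu scheduling for the Figure 4(b) cost functions.

    Serve the class with the larger marginal delay cost, comparing
    1.1*(40*w1)**0.1 against 2.4*(40*w2)**0.2, where w_i is the delay of
    the customer at the head of queue i.  Common constant factors in the
    derivatives C_i' are dropped, since they do not affect the argmax.
    """
    m1 = 1.1 * (40.0 * w1) ** 0.1
    m2 = 2.4 * (40.0 * w2) ** 0.2
    # Tie-break in favor of class 1 (our assumption).
    return 1 if m1 >= m2 else 2
```

Because the exponent on class 2 is larger, class 2 wins once its head-of-line delay is long enough, even though class 1 wins when delays are comparable and small on the class-2 side.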

Figure 5: The fraction of slow servers that are idle, under both the TR-Gcµ and LWI-Gcµ policies, on one run, as a function of time.

The percentage cost increase is not the only factor in the decision of whether to use the LWI-Gcµ policy or the TR-Gcµ policy. The difference in the problem formulations (1) and (2) raises the following question: does it matter how steady-state fairness is achieved? Figure 5 shows that the LWI-Gcµ policy maintains fairness at all times, whereas the TR-Gcµ policy only maintains fairness in the long run. This is evidenced by the fact that the fraction of idle servers that are from pool 1 is mostly constant under the LWI-Gcµ policy, but is much more variable under the TR-Gcµ policy. This is because, under the TR-Gcµ policy, the pool priority changes as a function of the number of customers in the system, so that slow servers idle when there are many customers in the system and fast servers idle when there are few customers in the system. It is exactly because the LWI-Gcµ policy maintains fairness at all times that its cost is higher than that of the TR-Gcµ policy. However, in many situations, it may be more appropriate to maintain fairness at all times.


6.2 The Diffusion Control Problem

We informally derive the diffusion control problem that arises as an approximation to (2) as λ becomes large. This is the first step in deriving the near optimal TR-Gcµ policy used for comparison purposes in Figure 4. It follows from Assumption 4.3 that
$$
C_i^\lambda\big(V_i^\lambda(\infty;\pi)\big) = C_i\big(\sqrt{\lambda}\,V_i^\lambda(\infty;\pi)\big).
$$
Hence, the objective function in (2) is equivalent to
$$
\operatorname*{minimize}_{\pi\in\Pi} \ \sum_{i=1}^I a_i E\big[C_i\big(\hat V_i^\lambda(\infty;\pi)\big)\big].
$$
The snapshot principle implies that for large λ
$$
\lambda_i V_i^\lambda(\infty;\pi) \approx Q_i^\lambda(\infty;\pi), \quad i \in \mathcal I,
$$
or, in terms of the scaled processes,
$$
a_i \hat V_i^\lambda(\infty;\pi) \approx \hat Q_i^\lambda(\infty;\pi), \quad i \in \mathcal I.
$$
The constraint in (2) can be written in terms of the scaled processes as
$$
E\hat U_j^\lambda(\infty;\pi)/\eta_j = E\hat U_k^\lambda(\infty;\pi)/\eta_k, \quad j,k \in \mathcal J.
$$
The server snapshot principle in Theorem 4.1 implies that
$$
\nu_j \mu_j \hat U_j^\lambda(\infty;\pi) \approx \hat I_j^\lambda(\infty;\pi),
$$
so that the constraint in (2) can be rewritten as
$$
\frac{E\hat I_j^\lambda(\infty;\pi)}{E\hat I_\Sigma^\lambda(\infty;\pi)} = f_j, \quad j \in \mathcal J,
$$
where $f_j$ is as defined in Section 5. We conclude that the problem
$$
\begin{aligned}
\operatorname*{minimize}_{\pi\in\Pi} \quad & \sum_{i=1}^I a_i E\Big[C_i\Big(\tfrac{1}{a_i}\hat Q_i^\lambda(\infty;\pi)\Big)\Big] \\
\text{subject to:} \quad & \frac{E\hat I_j^\lambda(\infty;\pi)}{E\hat I_\Sigma^\lambda(\infty;\pi)} = f_j, \quad j \in \mathcal J,
\end{aligned}
\qquad (10)
$$
may be regarded as an asymptotic analog of (2).

Ideally, given $\hat X_\Sigma^\lambda$, we would like to find functions $p$ [...]. When $L_{j-1}^\star \le -\hat X_\Sigma^\star(t) < L_j^\star$, it follows that
$$
N^\lambda - L_j^\star\sqrt{\lambda} \le X_\Sigma^\lambda(t) < N^\lambda - L_{j-1}^\star\sqrt{\lambda}. \qquad (21)
$$
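The band structure in (21) can be read as a lookup: given the diffusion-scaled total idleness $-\hat X_\Sigma$ and thresholds $0 = L_0^\star \le L_1^\star \le \cdots \le L_J^\star$, find the pool whose servers idle. The sketch below is our illustrative code, not from the paper; the convention of returning `None` outside the bands is ours.

```python
import bisect

def idling_pool(neg_x_hat, thresholds):
    """Return the pool j with L*_{j-1} <= -X_hat < L*_j.

    neg_x_hat  : the value -X_hat (positive when servers are idle)
    thresholds : the sorted list [L*_1, ..., L*_J] (L*_0 = 0 implicit)
    In the limit, only pool-j servers idle on that band; return None
    when -X_hat < 0 (queues present) or -X_hat >= L*_J.
    """
    if neg_x_hat < 0:
        return None
    j = bisect.bisect_right(thresholds, neg_x_hat)  # count of L*_i <= value
    return j + 1 if j < len(thresholds) else None
```

For example, with thresholds `[1.0, 2.0]`, an idleness level of 0.5 falls in the first band (pool 1) and 1.5 in the second (pool 2).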

When routing occurs as specified in the TR-Gcµ policy, we expect that in the limit, when (21) holds, the only idle servers are from pool j, because these servers have lowest priority. This is consistent with the drift function in the limit diffusion.

Next, for the scheduling component, first observe that $p^\star$ is an admissible ratio function as defined in Definition 2.2 in [14]. Hence we expect that the scheduling policy that has, upon service completion by a type j server at time t, that server next admitting to service the customer from the head of queue $i^\star$, where
$$
i^\star := i^\star(t) \in \operatorname*{arg\,max}_{i \in \mathcal I(j),\, \hat Q_i^\lambda(t) > 0} \left\{ \hat Q_i^\lambda(t) - \big[\hat X_\Sigma^\lambda(t)\big]^+ p_i^\star\Big(\big[\hat X_\Sigma^\lambda(t)\big]^+\Big) \right\},
$$
when there are waiting customers, will achieve the desired queue lengths, for which
$$
\hat Q_i^\lambda(t) \approx \big[\hat X_\Sigma^\lambda(t)\big]^+ p_i^\star\Big(\big[\hat X_\Sigma^\lambda(t)\big]^+\Big) \quad \text{for all } i \in \mathcal I \text{ and } t > 0.
$$
Then, it follows similarly as in the proof of Theorem 3.2 in [15] that letting
$$
i^\star(t) \in \operatorname*{arg\,max}_{i \in \mathcal I(j),\, \hat Q_i^\lambda(t) > 0} a_i {C_i^\lambda}'\left(\frac{Q_i^\lambda(t)}{a_i\lambda}\right) = \operatorname*{arg\,max}_{i \in \mathcal I(j),\, \hat Q_i^\lambda(t) > 0} a_i C_i'\left(\frac{1}{a_i}\hat Q_i^\lambda(t)\right)
$$
produces the same result. Since we expect that an asymptotic Little's law holds for both $\hat W_i^\lambda$ and $\hat V_i^\lambda$, so that
$$
\hat W_i^\lambda(t) = \hat V_i^\lambda(t) = \frac{1}{a_i}\hat Q_i^\lambda(t)
$$
for all $i \in \mathcal I$ and $t > 0$, letting
$$
i^\star(t) \in \operatorname*{arg\,max}_{i \in \mathcal I(j),\, \hat Q_i^\lambda(t) > 0} a_i C_i'\big(\hat W_i^\lambda(t)\big) = \operatorname*{arg\,max}_{i \in \mathcal I(j),\, \hat Q_i^\lambda(t) > 0} a_i {C_i^\lambda}'\big(W_i^\lambda(t)\big),
$$
the scheduling component of the TR-Gcµ policy follows.

The analysis in Sections 6.2-6.4 suggests that (subject to some technical conditions, as in [3]) the TR-Gcµ policy is asymptotically optimal with respect to the problem (2). Therefore, the numerical comparison performed in Section 6.1 suggests that the asymptotic performance of the LWI-Gcµ policy is close to optimal with respect to the steady-state problem (2).
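Assuming $p^\star$ is available as a function, the queue-imbalance argmax above can be sketched as follows. This is our illustrative code (names such as `ratio_schedule` and the list-of-floats representation of the scaled queue lengths are ours), not an implementation from the paper.

```python
def ratio_schedule(q_hat, x_hat, p_star, servable):
    """Pick the class a freed type-j server admits next.

    q_hat    : list of diffusion-scaled queue lengths, one per class
    x_hat    : diffusion-scaled total customer count X_hat
    p_star   : p_star(i, x_plus) -> target fraction for class i
    servable : the classes I(j) that this server pool can serve
    Among servable classes with a nonempty queue, choose the one whose
    scaled queue most exceeds its target share [x_hat]^+ * p_i([x_hat]^+).
    """
    x_plus = max(x_hat, 0.0)
    candidates = [i for i in servable if q_hat[i] > 0]
    if not candidates:
        return None  # no waiting customer this server can help
    return max(candidates, key=lambda i: q_hat[i] - x_plus * p_star(i, x_plus))
```

Serving the most over-target queue pushes each $\hat Q_i^\lambda$ toward its share $[\hat X_\Sigma^\lambda]^+ p_i^\star([\hat X_\Sigma^\lambda]^+)$, which is the state-space collapse the argument relies on.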

6.5 Performance Comparison

We end by explaining why the simulated percentage cost increase shown in Figure 4 does not depend on the cost function. This explanation also provides a method for determining the predicted percentage cost increase when using the LWI-Gcµ policy instead of the TR-Gcµ policy for any parallel server system configuration.

Given any cost functions that satisfy Assumption 4.3, and associated $C_\Sigma^\star$, we can estimate the percentage increase in expected delay cost as follows. Let $\hat X_\Sigma(\cdot)$ satisfy (9), and let $\hat X_\Sigma^\star(\cdot)$ satisfy (12) with infinitesimal drift $m = m^L$ defined in (18) and threshold levels $L_1^\star, \ldots, L_J^\star$ defined in (19). Then, when λ is large, we can approximate system behavior under the LWI-Gcµ policy using $\hat X_\Sigma(\cdot)$ by Theorem 5.1. Furthermore, the analysis in Sections 6.2-6.4 suggests that in the limit the minimum attainable cost occurs under the TR-Gcµ policy, and that system behavior under that policy can be approximated using $\hat X_\Sigma^\star(\cdot)$. It follows from Browne and Whitt [9] that
$$
E\Big[C_\Sigma^\star\big(\hat X_\Sigma(\infty)^+\big)\Big] = \int_0^\infty b\, C_\Sigma^\star(x)\,\beta e^{-\beta x}\,dx,
$$
for
$$
b := \frac{\sqrt{f_\Sigma}\,\phi\big(\beta/\sqrt{f_\Sigma}\big)}{\beta\,\Phi\big(\beta/\sqrt{f_\Sigma}\big)}\left[1 + \frac{\sqrt{f_\Sigma}\,\phi\big(\beta/\sqrt{f_\Sigma}\big)}{\beta\,\Phi\big(\beta/\sqrt{f_\Sigma}\big)}\right]^{-1}, \quad \text{with } f_\Sigma := \sum_{j=1}^J \mu_j f_j,
$$
and
$$
E\Big[C_\Sigma^\star\big(\hat X_\Sigma^\star(\infty)^+\big)\Big] = \int_0^\infty \frac{r_{J+1}}{\sum_{j=1}^J r_j}\, C_\Sigma^\star(x)\,\beta e^{-\beta x}\,dx,
$$
for
$$
r_1 := 1, \qquad r_i := \prod_{j=2}^i \frac{g_{j-1}(-L_{J-j+1})}{g_j(-L_{J-j+1})}, \quad i \in \{2,\ldots,J+1\},
$$
where
$$
g_j(x) = \frac{\dfrac{\beta}{\sqrt{\mu_{J-j+1}}}\,\phi\Big(\dfrac{\beta}{\sqrt{\mu_{J-j+1}}} + x\sqrt{\mu_{J-j+1}}\Big)}{\Phi\Big(\dfrac{\beta}{\sqrt{\mu_{J-j+1}}} - L_{J-j}\sqrt{\mu_{J-j+1}}\Big) - \Phi\Big(\dfrac{\beta}{\sqrt{\mu_{J-j+1}}} - L_{J-j+1}\sqrt{\mu_{J-j+1}}\Big)}, \quad j = 1,\ldots,J.
$$
Therefore, we find that as λ becomes large, the percentage increase in cost of the LWI-Gcµ policy over the TR-Gcµ policy is
$$
\frac{E\Big[C_\Sigma^\star\big(\hat X_\Sigma(\infty)^+\big)\Big] - E\Big[C_\Sigma^\star\big(\hat X_\Sigma^\star(\infty)^+\big)\Big]}{E\Big[C_\Sigma^\star\big(\hat X_\Sigma^\star(\infty)^+\big)\Big]} = \frac{b - \dfrac{r_{J+1}}{\sum_{j=1}^J r_j}}{\dfrac{r_{J+1}}{\sum_{j=1}^J r_j}}. \qquad (22)
$$

It is interesting to observe that the percentage increase in (22) does not depend on the cost function. This is especially surprising given that the Gcµ scheduling policy strongly depends on the cost function. The reason is separability; in particular, it is because the scheduling part is the same for both policies, and the routing part only affects the expected cost through $P\big(\hat X_\Sigma(\infty) > 0\big)$. In fact, a similar analysis shows that the percentage increase in delay cost does not depend on the cost function for any two policies that have different routing components, but the same scheduling component, under which asymptotic efficiency holds and there is state-space collapse.
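As a hedged numerical sketch of (22): assuming $b = u/(1+u)$ with $u = \sqrt{f_\Sigma}\,\phi(\beta/\sqrt{f_\Sigma})/(\beta\,\Phi(\beta/\sqrt{f_\Sigma}))$ (our reading of the displayed constant), and given the ratios $r_1, \ldots, r_{J+1}$, the limiting percentage cost increase can be computed as below. All names are ours, and only the standard normal pdf/cdf are needed.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pct_cost_increase(beta, f_sigma, r):
    """Limiting fractional cost increase (22) of LWI-Gcmu over TR-Gcmu.

    beta    : the drift parameter of the exponential tail beta*exp(-beta*x)
    f_sigma : sum_j mu_j * f_j
    r       : the list [r_1, ..., r_{J+1}] built from the g_j ratios
    Note the result does not involve the cost function C_Sigma at all,
    which is the point made in the text.
    """
    ratio = math.sqrt(f_sigma) * norm_pdf(beta / math.sqrt(f_sigma)) / (
        beta * norm_cdf(beta / math.sqrt(f_sigma)))
    b = ratio / (1.0 + ratio)
    rho = r[-1] / sum(r[:-1])  # r_{J+1} / sum_{j=1}^{J} r_j
    return (b - rho) / rho
```

Because $C_\Sigma^\star$ cancels between numerator and denominator of (22), the function takes no cost-function argument: only $b$ and the $r_i$ matter.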

7 Conclusions and Future Research Directions

We have proposed a simple, blind, and fair control policy for large parallel server systems that have multiple customer classes and multiple server pools. We named the policy we proposed the LWI-Gcµ policy because it routes newly arrived customers to available servers in accordance with a longest weighted idle server first policy, and schedules newly available servers to waiting customers in accordance with a generalized cµ policy. The policy is blind because it requires no information regarding system parameters, such as arrival or service rates. It is fair because it fairly divides the total server idle time amongst the various server pools. We have established that the LWI-Gcµ policy is asymptotically optimal in the Halfin-Whitt limit regime for a finite time horizon optimization problem having the objective to minimize convex delay costs subject to maintaining a fair allocation of idle time amongst the various servers at all time points. We have further shown that the LWI-Gcµ policy also performs well for a steady-state relaxation of the aforementioned optimization problem in which fairness need only be attained in the long run. We were able to evaluate the performance of the LWI-Gcµ policy because we made the important realization that the routing and scheduling parts of the control policy are separable in the Halfin-Whitt many-server heavy-traffic limit.

An interesting direction for future research is to incorporate customer abandonment. When there is no fairness constraint, the incorporation of customer abandonment changes the form of the optimal control policy; that is, in general the generalized cµ policy will no longer be optimal. This can be seen from previous work on parallel server systems with reneging in the many-server heavy traffic limit [18], and in the conventional heavy traffic limit [4, 13].
Hence we expect that the incorporation of customer abandonment will also change the nature of the optimal control for parallel server systems that have fairness constraints. We would also like to remove the restriction that the service rates are pool-dependent, because in many applications it is natural for the service speed to depend on both the server pool and the customer class. This appears to be technically challenging, because in order to know the departure rate from a given server pool we must also track the number of customers in each class being served by that server pool.

Another interesting direction for future research is to look at a joint staffing and dynamic control problem. The question to address is to what extent the incorporation of fairness constraints changes upfront staffing decisions. Finally, there are potentially many ways to define fair policies. The constraints in (1) and (2) represent two natural definitions. Another approach is to not define fairness through a constraint, but, rather, to let the concept arise naturally, because the server compensation mechanism induces the desired server effort.

References

[1] Z. Aksin, M. Armony, and V. Mehrotra. The modern call-center: A multi-disciplinary perspective on operations management research. Production and Operations Management, Special Issue on Service Operations in honor of John Buzacott (ed. G. Shantikumar and D. Yao), 16(6):655–688, 2007.

[2] M. Armony. Dynamic routing in large-scale service systems with heterogeneous servers. Queueing Systems, 51(3-4):287–329, 2005.

[3] M. Armony and A. R. Ward. Fair dynamic routing in large-scale heterogeneous-server systems, 2009. Forthcoming in Operations Research.

[4] B. Ata and M. Rubino. Dynamic control of a make-to-order parallel server system with cancellations, 2008. Forthcoming in Operations Research.

[5] R. Atar. Scheduling control for queueing systems with many servers: Asymptotic optimality in heavy traffic. The Annals of Applied Probability, 15(4):2606–2650, 2005.

[6] R. Atar. Central limit theorem for a many-server queue with random service rates. The Annals of Applied Probability, 18(4):1548–1568, 2008.

[7] R. Atar, Y. Y. Shaki, and A. Schwartz. A blind policy for equalizing cumulative idleness, 2009. Working paper.

[8] P. Billingsley. Convergence of Probability Measures. Second edition. John Wiley & Sons, New York, 1999.

[9] S. Browne and W. Whitt. Piecewise-linear diffusion processes. In J. Dshalalow, editor, Advances in Queueing: Theory, Methods, and Open Problems, pages 463–480. CRC Press, Boca Raton, FL, 1995.

[10] Y. Cohen-Charash and P. E. Spector. The role of justice in organizations: A meta-analysis. Organizational Behavior and Human Decision Processes, 86(2):278–321, 2001.

[11] J. A. Colquitt, D. E. Conlon, M. J. Wesson, C. O. L. H. Porter, and K. Y. Ng. Justice at the millennium: A meta-analytic review of 25 years of organizational justice research. Journal of Applied Psychology, 86(3):425–445, 2001.

[12] J. G. Dai and T. Tezcan. Optimal control of parallel server systems with many servers in heavy traffic. Queueing Systems, 59:95–134, 2008.

[13] S. Ghamami and A. R. Ward. Dynamic scheduling of an N system with reneging, 2009. Working paper.

[14] I. Gurvich and W. Whitt. Queue-and-idleness-ratio controls in many-server service systems, 2009.

[15] I. Gurvich and W. Whitt. Scheduling flexible servers with convex delay costs in many-server service systems, 2009.

[16] S. Halfin and W. Whitt. Heavy-traffic limits for queues with many exponential servers. Operations Research, 29(3):567–588, 1981.

[17] J. M. Harrison. Brownian models of queueing networks with heterogeneous customer populations. In W. Fleming and P. L. Lions, editors, Stochastic Differential Systems, Stochastic Control Theory and Applications, volume 10 of IMA Volumes in Mathematics and its Applications, pages 147–186. Springer-Verlag, New York, 1988.

[18] J. M. Harrison and A. Zeevi. Dynamic scheduling of a multiclass queue in the Halfin and Whitt heavy traffic regime. Operations Research, 52:243–257, 2004.

[19] T. Ibaraki and N. Katoh. Resource Allocation Problems: Algorithmic Approaches. MIT Press, Cambridge, MA, 1988. No. 4 in Foundations of Computing Series.

[20] A. Mandelbaum and A. L. Stolyar. Scheduling flexible servers with convex delay costs: Heavy-traffic optimality of the generalized cµ-rule. Operations Research, 52(6):836–855, 2004.

[21] J. A. Van Mieghem. Dynamic scheduling with convex delay costs: The generalized cµ rule. The Annals of Applied Probability, 5(3):809–833, 1995.


[22] G. Pang, R. Talreja, and W. Whitt. Martingale proofs of many-server heavy-traffic limits for Markovian queues. Probability Surveys, 4:193–267, 2007.

[23] M. Patriksson. A survey on the continuous nonlinear resource allocation problem, 2006. Dept. Mathematics, Chalmers University of Technology, Gothenburg, Sweden. Available at: http://www.cs.chalmers.se/ mipat/LATEX/survey610.pdf.

[24] A. Puhalskii. On the invariance principle for the first passage time. Mathematics of Operations Research, 19(4):946–954, 1994.

[25] Y. Tseytlin. Queueing systems with heterogeneous servers: Improving patients' flow in hospitals. Master's thesis, Technion, 2007. Available on http://iew3.technion.ac.il/serveng/References/proposal Yulia.pdf.

[26] W. Whitt. Some useful functions for functional limit theorems. Mathematics of Operations Research, 5(1):67–85, 1980.

[27] W. Whitt. Stochastic Process Limits. Springer, New York, 2002.

[28] P. H. Zipkin. Simple ranking methods for the allocation of one resource. Management Science, 26(1):34–43, 1980.
