Canadian Conference on Broadband Research (CCBR'97), Ottawa, April 16-17, 1997, pp.172-183.

BROADBAND NETWORK DESIGN WITH CONTROLLED EXPLOITATION OF FLOW CONVERGENCE OVERLOADS IN ATM VP-BASED RESTORATION

Y. Zheng, W. Grover, M. MacGregor
TRLabs & University of Alberta, Dept. of Electrical and Computer Engineering, Edmonton, T6G 2E1, Canada

Abstract: This paper studies the capacity placement problem arising in the design of ATM VP-based restoration networks. Previous work on this problem has been heuristic in nature and/or has treated the ATM spare capacity design problem in a manner that is essentially the same as for STM path restorable networks. In this paper, we develop an exact optimization approach which lets us exploit the inherently statistical nature of the traffic in ATM in capacity planning for restoration. This is based on formulation of a backup-VP restoration capacity design that permits a controlled maximum of convergent flow overloads on spans during restoration. Results show that significant capacity savings can be obtained relative to STM if ATM restoration is allowed even modest restoration-induced oversubscription of bandwidth on surviving spans. We give two Integer Program (IP) formulations and a pair of algorithms for upper and lower bounding of the spare capacity required for ATM backup-VP restoration. These methods should facilitate further exploration of ATM restoration strategies which exploit the intrinsic differences between ATM and STM transport.

1. INTRODUCTION

1.1. Background

Work to date on capacity design for backup VP restoration in ATM has either been heuristic in nature [1] or, where exact [2], has treated the ATM capacity design problem in a manner that is essentially the same as for STM path restorable networks [3]. By this we mean that the spare capacity plan aims to support all restoration demands with an exact match of restoration bandwidth to failed working VP bandwidth. This is certainly a valid and defensible basis for planning a practical ATM network today. But one can observe that this treats the ATM spare capacity problem as essentially equivalent to STM planning, in that failed VPs are rerouted over backup VPs of exactly equal bandwidth allocation regardless of actual VP utilization. This is analogous to STM-type restoration of STS-n signals as integral entities regardless of their actual payload fill. There is no way to take signal fill into account in STM restoration: each signal unit must be replaced exactly or all services borne on the affected transport signals experience hard outage.

This hard outage aspect of STM does not apply in ATM. Because ATM uses statistical multiplexing, two or more VPs of a unit bandwidth allocation could indeed be rerouted for restoration and converge on a link of unit spare bandwidth. Both VPs are functionally rerouted; the link bandwidth is technically oversubscribed at this point; and the services in both VPs may have undergone a degradation in Quality of Service (QoS). This degradation might be severe but, unlike STM, it is a soft, continuous form of degradation that arises if the replacement bandwidth is not an exact match to the failed working bandwidth. Moreover, the actual degradation that occurs depends on the actual VP utilizations at failure time. If utilizations are low, then the oversubscription of bandwidth on restoration may not cause QoS to degrade below acceptable service levels.
This means that ATM restoration planning could, if we wish to consider it, exploit a domain that is not available to STM: bandwidth planning that does not support strictly perfect replacement of each VP's initial bandwidth allocation. While not dismissing or minimizing the potential impact on service, which could be severe if oversubscription effects are uncontrolled, we think it could be valuable to at least inspect how the network capacity requirement depends on a limited, designed-in allowance for bandwidth oversubscription upon restoration. Specifically, our interest is in recognizing the inherently statistical nature of the traffic flows in ATM and formulating the backup VP design process to permit a controlled maximum amount of convergent flow overloads on spans during restoration.

A partial analogy for this line of thinking is found in the airline business: most flights are slightly overbooked as part of an overall optimum economic policy for revenue maximization. Most often the overbooking is unseen by users because there are almost always some "no-shows". Similarly in an ATM network: could we not slightly (or even aggressively) overbook the restoration capacity we design into the network? Unless a failure occurs right when working VP utilizations are simultaneously at their peaks, the slight overbooking of restoration capacity may go unnoticed by customers. Indeed, if the trade-off of net capacity versus tolerable overload is steep, and/or if mechanisms can be built in to prioritize VPs when restoration-induced congestion is manifest, then ATM networks with limited amounts of designed-in bandwidth oversubscription upon restoration may well be part of an economically optimum overall strategy.

1.2. Logical View of VP-based ATM Restoration

A logical view of what happens at a surviving span during ATM restoration is illustrated in Figure 1. Span j has some total installed bandwidth allocation that is based on its nominal working load and a reservation of spare capacity for restoration. Unlike an STM network, these working and spare bandwidth allocations are not necessarily physically distinct integral transmission subunits. Rather, each link's total bandwidth is viewed as having been planned as two allocations from the total bandwidth present. During restoration, some working VPs on the failed span may be rerouted to their backup VPs which traverse the present span. These backup VPs were logically present on span j prior to failure, but consumed no bandwidth. Only upon failure does the redirected cell stream appear in each backup VP. Additionally, part of the restoration reaction of the network may give span j a reduction in cell traffic: this occurs if one or more of the working VPs on span j is also affected by the failure event, either upstream or downstream of span j on the path of these working VPs. Therefore, in general, surviving span j may see both a disappearance of cell flows from some of its working VPs and a sudden onset of new cell streams for activated backup VPs that traverse it.

[Figure 1 shows a surviving ATM span before restoration (working VPs and backup VPs) and after restoration (working VPs and stub release VPs).]

Figure 1: Logical View of VP-based ATM Restoration

These considerations allow us to define the restoration-induced overload factor of a span j in response to failure of another span i as the ratio of total traffic after restoration to the total installed capacity of the span. The overload factor can be expressed as follows:


$$X_{j,i} \equiv \frac{W_j - \mathit{Rs}_{j,i} + \mathit{Rr}_{j,i}}{W_j + S_j} \tag{1}$$

where Rrj,i is the total allocated bandwidth of VPs on span i whose backup route crosses span j. Rsj,i is the total allocated bandwidth of VPs which disappear from span j because they traverse the failed span upstream or downstream of span j. (This is called the stub release traffic.) Wj is the total allocated bandwidth of working VPs on span j before failure. Sj is the total spare bandwidth allocation on span j.

Note that Xj,i can have two slightly different but important interpretations. As the variables are defined above, Xj,i is based on allocated bandwidths of working VPs throughout. In this case Xj,i is more precisely a measure of oversubscription of bandwidth, not overload per se. The actual cell-level overload that occurs depends on the actual utilization of each VP involved, not the bandwidth allocations to the VPs. Therefore, if each term in the numerator is multiplied by a known cell-level utilization, a true overload measure results. For planning purposes, however, the worst-case overload is the same as the oversubscription factor, so we carry on referring to Xj,i as the restoration-induced overload factor, since a value Xtol will represent the designed-in maximum oversubscription of bandwidth and hence the maximum cell-level overload that could occur in the network as designed.

It can be appreciated that Xj,i ≤ 1 ∀(j,i) is a basic property of STM restoration because this implies that the total bandwidth of paths available for replacement of failed transport signals is always equal to or greater than the failed bandwidth. In STM, there is no option of 'partly' replacing one or more failed STS-n signals. Either each is replaced exactly by a matching restoration path or all services borne by the given STS-n experience immediate total outage. In ATM, however, Xj,i > 1 is definitely conceivable and technically meaningful. As argued, it simply means that span j's total bandwidth is technically oversubscribed when span i fails.
Unlike STM, this is a state of 'partial' restoration in which all services may be affected to a degree in terms of cell loss and delay, but no service is immediately terminated or disconnected because the restoration path bandwidths did not exactly match the pre-failure bandwidths. Whether cell-level performance objectives are still met under Xj,i > 1 will depend on the VP utilizations and traffic parameters at the time of failure. The maximum acceptable level of restoration-induced overload would depend on whether worst-case or average-case VP utilizations and traffic statistics are assumed for determining such a guideline. It is also in part a policy or business issue: if there is to be strictly no degradation on restoration, then max(Xj,i) = Xtol = 1.0 and the planning is equivalent to STM (i.e., perfect bandwidth replacement). But in a network that is lightly loaded in terms of cell-level utilization of the installed bandwidth, some X > 1 could clearly be tolerated before affecting QoS guarantees. An alternate business point of view might be that all VPs can be allowed to suffer to a degree during a network restoration event. The QoS impact also depends on the time of the failure relative to the busy period and the equipment provisioning interval.

All considered, a relatively high max(Xj,i) might be practical, although at present we cannot stipulate an acceptable level of restoration-induced overload. In practice the aggressiveness of each network provider in designing ATM restorable networks would be expected to vary in this regard. Some quantitative guidelines as to the acceptable Xtol may be obtained from sub-studies of the theoretical queuing delay and cell loss increase effects for different merging traffic types. Our continuing work is addressing this. What is useful at this stage, however, is to provide a design formulation that allows us to explore the capacity savings that are obtainable in ATM restoration depending on the maximum restoration-induced overload factor that is considered admissible.
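As a small illustration of Eq. (1), the following Python sketch evaluates the overload factor for a single (j, i) pair. The function name, the argument names and the numbers in the usage lines are made up for illustration; the two optional utilization arguments are a simplifying assumption (one utilization for the surviving working traffic and one for the restored traffic) standing in for the per-VP utilizations mentioned in the text.

```python
def overload_factor(working, spare, stub_release, restored,
                    u_working=1.0, u_restored=1.0):
    """Restoration-induced overload factor X_{j,i} in the sense of Eq. (1).

    working      : W_j, allocated working bandwidth on span j before failure
    spare        : S_j, spare bandwidth allocation on span j
    stub_release : Rs_{j,i}, bandwidth leaving span j because of the failure of i
    restored     : Rr_{j,i}, bandwidth rerouted onto span j by backup VPs
    u_working, u_restored : assumed cell-level utilizations (hypothetical
        simplification); with both at 1.0 the result is the pure
        bandwidth oversubscription factor.
    """
    capacity = working + spare
    if capacity <= 0:
        raise ValueError("span has no installed capacity")
    load = (working - stub_release) * u_working + restored * u_restored
    return load / capacity


# Illustrative numbers only: W_j = 10, S_j = 5, Rs = 2, Rr = 9.
x_alloc = overload_factor(10, 5, 2, 9)            # (10 - 2 + 9) / 15, about 1.13
x_cells = overload_factor(10, 5, 2, 9, 0.6, 0.6)  # about 0.68 at 60% utilization
```

With all utilizations at 1.0 the value is the oversubscription factor used for planning; lower utilizations show how the same design can remain below 1.0 at the cell level.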

1.3. Prior Work Involving Uncontrolled Overload Effects

A heuristic algorithm for capacity placement to support backup VP restoration in ATM was previously proposed in [1]. It was by studying this algorithm that we first realized how severe the restoration-induced overloads could be in some proposals for restorable ATM capacity planning. The main issue in [1], described below, is that while every working VP is assigned a backup VP route which is coordinated with other VPs to yield a near-minimum in total backup capacity allocations, there is no designed-in control to coordinate the backup VPs with respect to overloads arising from the set of working VPs that are cut by the same physical failure. The result is that while a logical replacement route exists to functionally replace each failed working VP, the total cell-level traffic impinging on other network spans is uncontrolled. Nevertheless, this work has created expectations in the industry of ATM network spare capacity ratios as low as 30%. We therefore discuss this algorithm further here, both to develop the planning issues involved in backup VP capacity planning and to set the stage for our formulation in which overload effects can still be allowed, but their maximum intensity is limited by design.

In the algorithm of [1], hereafter called H-Alg, the shortest route is first set as the initial backup route for each working VP. Then the algorithm substitutes an alternative backup route for one working VP. The spare capacity is calculated with this set of backup routes. If a smaller total amount of spare capacity is achieved by using this substitution, it is kept. Every VP is tested in this manner to find which of all its possible disjoint alternate routes requires the least additional spare capacity given the current state of spare bandwidth allocations already placed for previously decided backup VP routes. This process is repeated until no improvement can be made.
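The substitution loop just described can be sketched in Python as follows. This is not the published algorithm of [1], only a minimal reading of the description above, under two assumptions: each VP comes with a pre-computed list of candidate backup routes (first entry taken as the shortest), and a span's spare capacity is sized by the largest VP whose backup route crosses it. The names `h_alg_sketch`, `total_spare` and the toy data are hypothetical.

```python
def total_spare(choice, bandwidth):
    """Total spare capacity when each span is sized by the largest backup VP on it."""
    spare = {}
    for vp, route in choice.items():
        for span in route:
            spare[span] = max(spare.get(span, 0), bandwidth[vp])
    return sum(spare.values())


def h_alg_sketch(bandwidth, candidates):
    """Greedy route substitution: keep any swap that lowers total spare capacity."""
    # Start from the first (assumed shortest) candidate backup route of every VP.
    choice = {vp: routes[0] for vp, routes in candidates.items()}
    best = total_spare(choice, bandwidth)
    improved = True
    while improved:                      # repeat until no substitution helps
        improved = False
        for vp, routes in candidates.items():
            for route in routes:
                trial = {**choice, vp: route}
                cost = total_spare(trial, bandwidth)
                if cost < best:          # strictly better, so the loop terminates
                    choice, best, improved = trial, cost, True
    return choice, best


# Toy data (illustrative): two VPs, spans named by their end nodes.
bandwidth = {"VP_a": 5, "VP_b": 7}
candidates = {
    "VP_a": [("cd",), ("ab",)],   # shortest candidate first
    "VP_b": [("ab",)],
}
choice, best = h_alg_sketch(bandwidth, candidates)
```

On this toy input the sketch moves VP_a onto span ab, because span ab already carries 7 units of spare for VP_b and the 5 units on span cd can then be dropped. Nothing in the loop checks whether VP_a and VP_b fail together, which is the coordination gap discussed in this section.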
In the resultant design the spare capacity of a span is forced by the largest VP whose backup route traverses it. This procedure is conceptually similar to the "max-latching" heuristic in [4], except that in H-Alg each working VP is considered individually, while in [4] all working paths involved in each possible physical span cut are considered for rerouting as a group in a manner that requires the least incremental spare capacity allocation. Both H-Alg and max-latching [4] are heuristics which depend on the order in which spans or working VPs are considered.

Figure 2 illustrates the capacity minimization principle in [1] and how it results in uncontrolled overloads. In the example, span ab serves on the backup routes for both VPi (capacity 5) and VPj (capacity 7). Assume H-Alg has first considered VPj and accordingly assigned span ab a spare capacity of 7 units.

[Figure 2 shows nodes a, b, x and y. Span ab: working = 9, spare to be assigned. VPi (capacity = 5) and VPj (capacity = 7) are drawn as working VPs with their backup VPs.]

Figure 2: H-Alg Backup VP Capacity Allocation

Once span ab has 7 units of restoration capacity assigned to it, H-Alg will later find that it is efficient to route the backup VP for working VPi over span ab as well, because more than enough capacity is already reserved on ab to serve VPi, which needs only 5 units of bandwidth. This reuse of span ab in the example assumes that H-Alg also finds that the rest of the backup VP route for VPi is suitably efficient on other spans. H-Alg chooses a complete set of backup VPs which are efficient in this sense of reuse of capacity. Thus, functionally speaking, a logical backup VP is planned for each working VP. Such backup VPs would be fully adequate if one VP failed at a time.

What is missing, however, is consideration that if VPi and VPj happen to share the same physical span, say xy, then in case of its failure VPi and VPj will be rerouted simultaneously onto backup VPs which traverse span ab. Therefore, omitting any 'stub release' effects for the example, but assuming a Wj of 9 units on ab, the result of span cut xy is a restoration-induced total load of 9 + 5 + 7 = 21 units of capacity on span ab, which only has a total capacity allocation of 9 + 7 = 16 units. So the restoration-induced overload factor is Xab,xy = 21/16 = 1.31. H-Alg therefore lacks any step that coordinates the set of backup VPs from each physical span failure as a simultaneously instantiated group. The result has been extremely attractive and widely publicized predictions of very low spare capacity levels. But we now see that this will be accompanied by essentially uncontrolled restoration overloads on the surviving spans.

To reproduce the predictions of very low spare capacity with H-Alg and validate our concerns about uncontrolled restoration overloads, four networks and demand matrices previously studied for STM restoration [3] were used to test H-Alg. Net 1 is a U.S. metropolitan area model (also known as the "Bellcore" study network). Net 2 is a metropolitan area model of Calgary, Canada. The Net 3 and Net 4 topologies and demands are based on European and U.S. interexchange networks, respectively. Table 1 shows the spare capacity requirement with H-Alg (which is similar to the results in [1]) and the new data for the consequent overload effects.
The overload data were obtained by actual rerouting experiments on each designed network based on all possible span cuts, rerouting each working VP to its designed backup VP and then applying Eq. (1) to obtain the restoration-induced overload on all spans for each span cut. The spare capacity predictions are indeed much lower than required by STM networks. This finding has gained much industry attention, contributing to a general notion that ATM-based restoration will require very much less capacity than STM-based restoration. It is important therefore to note that these particularly low spare capacity levels are accompanied by significant and strictly uncontrolled overload effects on surviving spans. With the levels of overload reaching 3 to 10 times nominal utilization, cell loss and cell delay in ATM networks would very likely be intolerable for many applications.

TABLE 1. Spare Capacity and Overload Factors in Designs with H-Alg

Network   H-Alg spare capacity   Average overload   Maximum overload
Net 1     51.72%                 1.39               3.00
Net 2     54.33%                 1.37               3.32
Net 3     31.16%                 1.37               4.16
Net 4     38.59%                 1.46               10.00
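The rerouting experiment behind the overload columns of Table 1 can be sketched as below. This is a minimal Python illustration, not the authors' program: the `VP` record, the function name and the data layout are assumptions. The toy input reuses the numbers of the Figure 2 example, with the 9 units of working load on span ab modelled as one background VP that is given no backup route here because the figure does not specify one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VP:
    bandwidth: float
    working: frozenset   # spans on the working route
    backup: frozenset    # spans on the assigned backup route


def overload_factors(vps, spare):
    """Apply Eq. (1) to every (surviving span j, failed span i) pair."""
    spans = set(spare)
    W = {j: sum(v.bandwidth for v in vps if j in v.working) for j in spans}
    X = {}
    for i in spans:                                   # span that is cut
        failed = [v for v in vps if i in v.working]   # VPs hit by the cut
        for j in spans - {i}:                         # surviving spans
            capacity = W[j] + spare[j]
            if capacity == 0:
                continue                              # nothing installed on j
            stub = sum(v.bandwidth for v in failed if j in v.working)
            restored = sum(v.bandwidth for v in failed if j in v.backup)
            X[(j, i)] = (W[j] - stub + restored) / capacity
    return X


vps = [
    VP(5, frozenset({"xy"}), frozenset({"ab"})),      # VPi
    VP(7, frozenset({"xy"}), frozenset({"ab"})),      # VPj
    VP(9, frozenset({"ab"}), frozenset()),            # background load (assumed)
]
spare = {"ab": 7, "xy": 0}                            # H-Alg style sizing of ab
X = overload_factors(vps, spare)
# X[("ab", "xy")] is 21/16 = 1.3125, the value worked out in the text.
```

Taking the mean and the maximum of the resulting values over all (j, i) pairs would give figures of the kind reported in the last two columns of Table 1.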

This is the starting point for our new work on this problem. The capacity savings that Table 1 implies relative to STM networks are very attractive, but the uncontrolled overload implications are probably unacceptable in practice. Our aim therefore is to formulate optimal capacity allocation methods that still gain as much ATM-related capacity savings as is safely possible by giving us a controlling input on the maximum extent of the restoration-induced overloads. Results show that this is possible: capacity savings similar to those in Table 1 may indeed be obtained at fairly aggressive levels of design overload factor, yet these designed-in limits are considerably lower than the uncontrolled peak instances that occur in designs with H-Alg.


2. IP FORMULATIONS FOR SPARE CAPACITY ALLOCATION WITH DESIGN LIMITS ON RESTORATION OVERLOAD

2.1. IP-1: Minimize Spare Capacity with Given Maximum Overload

The IP formulation presented here optimizes the spare capacity placement of a restorable ATM network given a maximum allowable overload factor in the network. The objective function is:

$$\text{Minimize:} \quad \sum_{j=1}^{S} C_j \cdot S_j \tag{2}$$

Subject to:

1. Sparing is sufficient to keep restoration overload at or below the design limit, Xtol, for all failures:

$$X_{j,i} \le X_{tol} \quad \forall (i,j) \in S \ (i \ne j) \tag{3}$$

2. Backup VPs are sufficient to meet the target restoration levels ($R^{r,q}$) for all working VPs:

$$f_k^{r,q} = g^{r,q} \, \alpha_k^{r,q} \cdot R^{r,q} \quad \forall k \in P^{r,q}, \ \forall (r,q) \tag{4}$$

3. Only one backup VP can be used for each working VP, i.e. VP flows are not split:

$$\sum_{k \in P^{r,q}} \alpha_k^{r,q} = 1 \quad \forall (r,q) \tag{5}$$

where $\alpha_k^{r,q} = 1$ if the kth route for backup of $g^{r,q}$ is chosen, otherwise $\alpha_k^{r,q} = 0$.

The definition of variables is as follows. First, the inputs to the IP are:

$C_j$ = the cost of span j per bandwidth unit (the length of a span may be included here)

$W_j$ = the working capacity bandwidth allocation on span j

$S$ = the number of spans in the network

$g^{r,q}$ = the working VP on path q for demand pair r; in the equations the same symbol stands for the bandwidth of that VP

$P^{r,q}$ = the set of all distinct backup VP routes eligible for restoration of working VP $g^{r,q}$

$\zeta_i^{r,q}$ = 1 if the route of working VP $g^{r,q}$ crosses span i, otherwise 0

$\delta_{k,j}^{r,q}$ = 1 if the kth route available for backup of $g^{r,q}$ crosses span j, otherwise 0

$R^{r,q}$ = the target restorability level of working VP $g^{r,q}$ (1.0 used here)

The main output variables are $S_j$, the spare capacity bandwidth allocation on all spans. Also obtained in the solution is the set of values $f_k^{r,q}$, which are the total bandwidth used on restoration route k for working VP $g^{r,q}$. The $f_k^{r,q}$ information effectively details the restoration plan for the whole network which accompanies the optimal spare capacity values. The $f_k^{r,q}$ values stipulate, for the qth VP serving part of the total demand on relation r, which of the k possible routes for its backup VP is actually used in the design.

To implement Constraint 1 on $X_{j,i}$, the overload level on span j in response to failure of span i is represented in the IP in terms of the primary variables above. Substituting these variables for $X_{j,i}$ in Constraint 1, Eq. (3), we get:

$$X_{j,i} = \frac{W_j - \left( \sum_{(r,q)} g^{r,q} \, \zeta_i^{r,q} \, \zeta_j^{r,q} \right) + \left( \sum_{(r,q)} \sum_{k \in P^{r,q}} f_k^{r,q} \, \delta_{k,j}^{r,q} \, \zeta_i^{r,q} \right)}{W_j + S_j} \tag{6}$$

The IP also outputs the values $\alpha_k^{r,q}$, which are the portion of traffic restored on the kth possible backup route for working VP flow $g^{r,q}$. For this particular formulation, every $\alpha$ is either 0 or 1.
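Because $S_j$ is a decision variable and appears in the denominator of Eq. (6), Constraint 1 is not linear as written. The source does not spell out how it is passed to a solver; one standard rearrangement, given here as an assumption about the intended implementation, multiplies both sides of Eq. (3) by $W_j + S_j$ (which is positive for any span with installed capacity):

$$W_j - \sum_{(r,q)} g^{r,q} \, \zeta_i^{r,q} \, \zeta_j^{r,q} + \sum_{(r,q)} \sum_{k \in P^{r,q}} f_k^{r,q} \, \delta_{k,j}^{r,q} \, \zeta_i^{r,q} \;\le\; X_{tol} \, (W_j + S_j) \quad \forall (i,j) \in S \ (i \ne j)$$

With $X_{tol}$, $W_j$, $g^{r,q}$, $\zeta$ and $\delta$ all given as inputs, this inequality is linear in the variables $S_j$ and $f_k^{r,q}$, and by Eq. (4) each $f_k^{r,q}$ is in turn linear in $\alpha_k^{r,q}$.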

2.2. IP-2: Minimum Overload with Given Spare Capacity

A related formulation applies to the case where an existing set of spare capacity allocations has been given and the problem is to find a set of backup VP allocations that results in the smallest maximum overload factor of a restorable ATM network working within the given pattern and amounts of available spare capacity. All the variables are the same as in IP-1, but $S_j$ is now an input. This formulation can be used in general to minimize the greatest impact of restoration in situations where there is not enough spare capacity for complete restoration. The objective function is:

$$\text{Minimize:} \quad \left\{ \max \left( X_{j,i} \right) \quad \forall (i,j) \in S \ (i \ne j) \right\} \tag{7}$$

where max(Xj,i) is the largest restoration-induced overload resulting over all spans for all span cuts from the assignment of backup VP routes amidst the given spare capacity allocations. Xj,i is given by Eq. (6).

Subject to:

1. Backup VPs are sufficient to meet the target restoration for all working VPs:

$$f_k^{r,q} = g^{r,q} \, \alpha_k^{r,q} \cdot R^{r,q} \quad \forall k \in P^{r,q}, \ \forall (r,q) \tag{8}$$

2. Only one backup VP can be used for each working VP, i.e. VP flows are not split:

$$\sum_{k \in P^{r,q}} \alpha_k^{r,q} = 1 \quad \forall (r,q) \tag{9}$$

where $\alpha_k^{r,q} = 1$ if the kth route for backup of $g^{r,q}$ is chosen, otherwise $\alpha_k^{r,q} = 0$.
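For very small instances, both formulations can be checked by exhaustive enumeration of the backup route choices, without an IP solver. The Python sketch below is such a check and not the solution method used in the paper: it assumes R = 1.0, real-valued spare capacities, candidate backup routes that are span-disjoint from the working route, and a made-up three-span instance. All function and field names are hypothetical.

```python
from itertools import product


def span_loads(vps, choice, spans):
    """Return W[j] and the post-restoration load on span j when span i fails."""
    W = {j: sum(v["bw"] for v in vps.values() if j in v["working"]) for j in spans}
    load = {}
    for i in spans:
        for j in spans:
            if i == j:
                continue
            total = W[j]
            for name, v in vps.items():
                if i not in v["working"]:
                    continue              # VP unaffected by the failure of span i
                if j in v["working"]:
                    total -= v["bw"]      # stub release on span j
                if j in choice[name]:
                    total += v["bw"]      # restored flow on span j (R = 1.0)
            load[(j, i)] = total
    return W, load


def assignments(vps):
    """Enumerate every way of picking exactly one backup route per working VP."""
    names = list(vps)
    for combo in product(*(vps[n]["backups"] for n in names)):
        yield dict(zip(names, combo))


def solve_ip1(vps, spans, cost, x_tol):
    """Least-cost sparing such that every X_{j,i} <= x_tol (exhaustive search)."""
    best = None
    for choice in assignments(vps):
        W, load = span_loads(vps, choice, spans)
        # Smallest S_j satisfying load <= x_tol * (W_j + S_j) for every failure i.
        spare = {
            j: max([0.0] + [load[(j, i)] / x_tol - W[j] for i in spans if i != j])
            for j in spans
        }
        total = sum(cost[j] * spare[j] for j in spans)
        if best is None or total < best[0]:
            best = (total, spare, choice)
    return best


def solve_ip2(vps, spans, spare):
    """Backup assignment minimizing the largest X_{j,i} for fixed sparing."""
    best = None
    for choice in assignments(vps):
        W, load = span_loads(vps, choice, spans)
        worst = 0.0
        for (j, i), value in load.items():
            capacity = W[j] + spare[j]
            if capacity > 0:
                worst = max(worst, value / capacity)
            elif value > 0:
                worst = float("inf")      # traffic sent to a span with no capacity
        if best is None or worst < best[0]:
            best = (worst, choice)
    return best


# Hypothetical toy instance: three spans, three working VPs.
spans = ["ab", "cd", "xy"]
vps = {
    "VP_a": {"bw": 5, "working": ("xy",), "backups": [("ab",), ("cd",)]},
    "VP_b": {"bw": 7, "working": ("xy",), "backups": [("ab",), ("cd",)]},
    "VP_c": {"bw": 9, "working": ("ab",), "backups": [("cd",)]},
}
cost = {j: 1.0 for j in spans}
total, spare, choice = solve_ip1(vps, spans, cost, x_tol=1.0)
worst, choice2 = solve_ip2(vps, spans, {"ab": 7, "cd": 9, "xy": 0})
```

The enumeration grows exponentially with the number of VPs, so it is only useful for checking a formulation on toy cases; realistic networks need the IP formulations themselves.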

3. RELATED BOUNDS FOR ATM NETWORK SPARE CAPACITY

In addition to H-Alg and the two IP formulations above, two simpler algorithms are presented here to calculate reasonably tight upper and lower bounds on the required spare capacity of a backup VP-based restorable ATM network. These bounds provide a check on the IP-based results to follow. They may also be generally useful as relatively quick procedures yielding fairly tight bounds on the sparing requirements of a given network and working VP routings in advance of detailed optimization. The lower bounding procedure in particular may be useful to rapidly generate starting-point designs for large networks, from which the IP-based optimization can then reach a final complete design.

The upper bound is based on H-Alg with a simple modification to strictly eliminate any restoration-induced overloads. The spare capacity on each span is set to the sum of the capacities of all working VPs whose backup routes traverse it, rather than the maximum of such values. For example, in Figure 2, the upper bounding algorithm derived from H-Alg says that span ab needs 7 + 5 = 12 units of spare capacity, rather than max(7, 5) = 7 units as in H-Alg. This results in an over-provisioned design with a guaranteed maximum overload of 1.0.

The lower bounding algorithm is based on IP-1 with the constraint in Eq. (5) relaxed to allow real-valued α. This converts the Mixed Integer Program as presented into a real-valued Linear Program (LP) which can be solved much more quickly in general. While serving as an LP relaxation of an IP problem, it also represents conceptually a class of restoration system where VPs would be arbitrarily decomposable for restoration rerouting. The closest physical meaning for this would be letting individual VCs in a VP take different routes in restoration. Thus the LP formulation assumes that several backup VPs may be used to handle the total flow of each working VP. The sparing achieved in this way is a lower bound for the practical case where only one backup VP is available to restore each working VP.
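The difference between the H-Alg sizing rule and the upper-bound sizing rule can be shown in a few lines of Python. This is a minimal sketch using the Figure 2 numbers; the function name and data layout are assumptions made for illustration.

```python
def spare_by_rule(bandwidth, backup_route, rule):
    """Size each span from the bandwidths of the VPs whose backup routes cross it.

    rule=max reproduces the H-Alg sizing described in the text;
    rule=sum gives the upper-bound sizing that rules out any overload.
    """
    per_span = {}
    for vp, route in backup_route.items():
        for span in route:
            per_span.setdefault(span, []).append(bandwidth[vp])
    return {span: rule(values) for span, values in per_span.items()}


bandwidth = {"VPi": 5, "VPj": 7}
backup_route = {"VPi": ("ab",), "VPj": ("ab",)}

h_alg = spare_by_rule(bandwidth, backup_route, max)   # {"ab": 7}
upper = spare_by_rule(bandwidth, backup_route, sum)   # {"ab": 12}
```

With 12 units of spare on span ab, the simultaneous rerouting of VPi and VPj adds exactly the reserved amount, so the overload factor of Eq. (1) cannot exceed 1.0 on that span.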

4. RESULTS AND DISCUSSION

4.1. Comparative Capacity and Overload Results

Table 2 summarizes the results of using the four different capacity design and bounding algorithms.

TABLE 2. Comparative Spare Capacity Requirements

Network   H-Alg     IP-1 @ Xtol=1   Upper bound   Lower bound @ Xtol=1
Net 1     51.72%    74.78%          78.36%        71.42%
Net 2     54.33%    82.54%          88.45%        76.85%
Net 3     31.16%    81.49%          86.88%        78.72%
Net 4     38.59%    91.94%          92.89%        91.38%

The results with IP-1 are based on an allowable overload factor Xtol of 1.0. H-Alg has the minimum spare capacity but has severe and frequent span overload cases. These are shown in detail in Figure 3. Figure 3 (and Figures 4 and 5) have the following fine structure: for each span considered as the failure span i on the x-axis, the (S-1) Xj,i values experienced by the other spans are plotted left to right, with a vertical line for each value. In a network of 10 spans, there would be ten clusters of nine Xj,i values displayed side by side to form the plot. The horizontal lines at 1.2 and 3.04 on Figure 3 mark, respectively, the peak overloads achieved in the subsequent IP-1 design (Figure 4) and IP-2 design (Figure 5), for comparison to H-Alg.

Figure 4 shows the corresponding plot for Net-3 designed with IP-1 at Xtol = 1.2. Note the scale change between these figures. The IP-1 design at Xtol = 1.2 (Figure 4) has about 50% more spare capacity than the H-Alg design, although this is still about 30% less than the equivalent STM design. The Xj,i data in each of these plots were obtained from separate programs that conducted restoration experiments for each span failure using the assigned backup VP routes in each design. Thus, the tight clamp on Xj,i values at 1.2 in Figure 4 validates IP-1 for its intended properties.

[Plot: Xj,i overload factor (y-axis, 0 to 4.16) versus cut span (x-axis, 1 to 56), with horizontal reference lines at 3.04 (IP-2 after H-Alg) and 1.2 (IP-1 at Xtol=1.2).]

Figure 3: Overload Factors in H-Alg design for Net-3 (31% spare capacity)

[Plot: Xj,i overload factor (y-axis, 0.6 to 1.2) versus cut span (x-axis, 1 to 56).]

Figure 4: Overload Factors in IP-1 design for Net-3 with Xtol=1.2 (65.9% spare capacity)


Since H-Alg allocates the least total spare capacity, it is of interest to see how low the maximum overload can be capped within this sparing if IP-2 is applied to the H-Alg spare capacity design to improve the coordination of backup VP assignments. Figure 5 shows this result for Net-3, with exactly the same spare capacity placement that H-Alg produced in the first instance. By rearranging the backup VP assignments, IP-2 reduces the peak overload of the H-Alg design from 4.16 to 3.04 while retaining 31% spare capacity.

[Plot: Xj,i overload factor (y-axis, 0 to 3.04) versus cut span (x-axis, 1 to 56).]

Figure 5: Overload Factors in IP-2 design for Net-3 with sparing from H-Alg (31% spare capacity)

When IP-2 is similarly applied to the H-Alg result for Net 1, the maximum overload drops from 3.00 to 2.75 and the average drops from 1.39 to 1.28. The side effect of reallocating backup VPs to reduce the maximum overload is that there are more individual overload cases. When we squeeze the maximum overload down by applying IP-2, the restoration flows are distributed more extensively over all spans and more spans suffer from overloading combinations.

4.2. Spare Capacity versus Tolerable Overload Design Trade-off

Using IP-1 it is possible to explore how the total spare capacity of the network responds to increasing Xtol. Table 3 summarizes the designs for each of our four test networks for Xtol ranging up to 2.0. For comparative presentation, all spare capacity totals are normalized to that of the Xtol = 1.0 case for each network. The total spare capacity decreases rather quickly as the design tolerance for restoration-induced overload increases. With a 10% design maximum oversubscription of bandwidth on restoration (Xtol = 1.1), spare capacity is reduced by 17% to 19%. At a more aggressive Xtol = 1.5, the reduction in spare capacity is roughly 57% to 69%.

Xtol is, however, only the strict maximum overload level that we will tolerate in the IP-1 designs. This maximum Xj,i = Xtol may occur for only one specific combination of failure span and restoration span in the design. It is, therefore, worth inspecting the number of spans that actually experience a given level of overload within a design tolerance of Xtol. Figure 6 considers this in terms of the 90th percentile of actual overloads experienced by spans over all span cuts versus the design Xtol. The data show, for example, that at Xtol = 1.4, 90% of the spans actually experience overloads no greater than 1.06, 1.08, 1.21 and 1.28 in Nets 3, 4, 2 and 1, respectively. This adds to the expectation that fairly significant capacity savings could be possible in practice without severe restoration-induced side effects, through judicious choice of Xtol as a parameter for the basic design of the network.

TABLE 3. Spare Capacity Requirement vs. Allowable Overload Factor

Design Xtol   Net 1     Net 2     Net 3     Net 4
1.00          100%      100%      100%      100%
1.05          91.5%     90.0%     90.0%     90.1%
1.10          82.9%     82.3%     80.6%     81.1%
1.15          75.0%     75.5%     76.9%     72.7%
1.20          68.0%     69.0%     65.9%     65.4%
1.25          62.0%     65.0%     59.3%     58.3%
1.30          57.3%     58.9%     53.6%     52.1%
1.40          48.6%     49.9%     43.7%     40.9%
1.50          40.9%     43.3%     34.9%     31.1%
1.75          26.5%     30.4%     22.7%     13.8%
2.00          17.3%     23.2%     14.2%     5.4%
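The reduction figures quoted from Table 3 can be reproduced with a short Python sketch. The values below are copied from Table 3; the dictionary layout and the function name are illustrative choices.

```python
# Normalized spare capacity (percent of the Xtol = 1.0 design), from Table 3.
# Columns are Net 1, Net 2, Net 3, Net 4.
TABLE3 = {
    "1.00": (100.0, 100.0, 100.0, 100.0),
    "1.05": (91.5, 90.0, 90.0, 90.1),
    "1.10": (82.9, 82.3, 80.6, 81.1),
    "1.15": (75.0, 75.5, 76.9, 72.7),
    "1.20": (68.0, 69.0, 65.9, 65.4),
    "1.25": (62.0, 65.0, 59.3, 58.3),
    "1.30": (57.3, 58.9, 53.6, 52.1),
    "1.40": (48.6, 49.9, 43.7, 40.9),
    "1.50": (40.9, 43.3, 34.9, 31.1),
    "1.75": (26.5, 30.4, 22.7, 13.8),
    "2.00": (17.3, 23.2, 14.2, 5.4),
}


def reduction_range(x_tol):
    """Smallest and largest spare capacity reduction (percent) across the four nets."""
    reductions = [round(100.0 - value, 1) for value in TABLE3[x_tol]]
    return min(reductions), max(reductions)


low_110, high_110 = reduction_range("1.10")   # 17.1 and 19.4
low_150, high_150 = reduction_range("1.50")   # 56.7 and 68.9
```

The same function applied to the other rows shows how quickly the savings grow between Xtol = 1.0 and Xtol = 2.0.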

[Plot: X* such that P(Xj,i < X*) = 0.90 at the given Xtol (y-axis, 0.9 to 1.5) versus Xtol, the design overload limit (x-axis, 1 to 2), with one curve each for Net1, Net2, Net3 and Net4.]

Figure 6: 90th Percentile Actual Overload vs. Design Maximum Overload
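The percentile statistic plotted in Figure 6 can be computed from the Xj,i values of a design as sketched below. The paper does not state which percentile convention was used; this Python sketch assumes the simplest one (the smallest observed value that at least the given fraction of observations do not exceed), and the sample values are invented for illustration.

```python
import math


def percentile_overload(x_values, fraction=0.90):
    """Smallest observed X* such that at least `fraction` of the values are <= X*."""
    ordered = sorted(x_values)
    if not ordered:
        raise ValueError("no overload values given")
    rank = math.ceil(fraction * len(ordered))   # 1-based rank of the percentile
    return ordered[max(rank, 1) - 1]


# Invented Xj,i sample for one design: ten (j, i) pairs.
sample = [0.8, 0.9, 0.95, 1.0, 1.0, 1.05, 1.1, 1.1, 1.2, 1.4]
x_star = percentile_overload(sample)            # 1.2: nine of ten values are <= 1.2
```

In this sample the design limit would be 1.4 (the maximum), while nine of the ten values lie at or below 1.2, which is the kind of gap between Xtol and typical overload that Figure 6 reports.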


5. CONCLUDING DISCUSSION

Designing for controlled convergence of restoration flows is a proposed approach which would let the network planner mediate a controlled trade-off of temporary post-restoration ATM performance for significantly reduced network capacity. More work is required to determine safe and acceptable restoration overload design factors. This needs to be based on experience with real ATM traffic. The benefit of the proposed design framework is, however, that it allows a network operator first to determine an acceptable restoration stress level and, once this is determined, to design exactly for that grade of restoration performance with a known minimum of total capacity for restoration. We think this design approach contributes to recognizing and enabling the exploitation of the intrinsic difference between ATM and STM transport methods from a restoration viewpoint.

The potential compromise in QoS that is inherent in designing to accept restoration-induced oversubscription of bandwidth could be further alleviated by a restoration-oriented priority congestion control. In this approach the network spare capacity design could be based on a reasonably aggressive Xtol value to obtain significant capacity savings. Then, at the time of an actual failure, each surviving span would assess its actual cell-level utilization after allowing enough time for backup VP switching to occur. If utilizations were low enough to provide restoration for all services, it would do nothing. Otherwise, it would mark the lower priority VPs traversing it with a throttling indication to be acted upon either by the VP sources themselves or by neighbouring switches. This gives several attractive properties. Regardless of the number of logical VPs traversing the span after restoration, all VPs will inherently enjoy transparent continuation of service if actual conditions permit restoration of all VPs. On the other hand, if the net cell-level utilization does constitute a sufficient overload, then priority VPs can be restored selectively without QoS reduction by throttling lower priority service class VPs. In this way the benefits of ATM capacity design to exploit restoration-induced oversubscription of bandwidth can be pursued with a protective mechanism to ensure QoS for selected services, while still granting all services restoration on a best-efforts basis whenever actual network circumstances permit.

Our further work in this area is oriented towards sub-studies of the cell-level performance degradation of merging restoration flows as a function of traffic types and number of VPs. Our aim is to produce quantitative guidelines for input to the decision as to what Xtol value to use in designing a given backup VP-restorable ATM network. These data will complete the new framework for ATM backup VP capacity design: the network operator would determine the traffic assumptions they wish to adopt and the acceptable QoS impacts during an assumed busy-hour restoration event. This leads to an Xtol recommendation. Once Xtol is determined, IP-1 can realize the corresponding minimum capacity restorable network.
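The per-span decision in the priority congestion control idea above can be sketched as follows. This is only an illustration of the concept, not a mechanism specified in the paper: the priority encoding (a larger number means a less important VP), the measured-load inputs and the assumption that a throttled VP contributes no load are all hypothetical simplifications.

```python
def vps_to_throttle(capacity, vp_loads):
    """Pick the VPs a surviving span would mark for throttling after restoration.

    capacity : installed bandwidth of the span (working plus spare)
    vp_loads : list of (name, priority, measured_load); a larger priority
               number means a less important VP (assumed encoding)
    """
    total = sum(load for _, _, load in vp_loads)
    marked = []
    # Visit the least important VPs first and stop as soon as the load fits.
    for name, _, load in sorted(vp_loads, key=lambda v: v[1], reverse=True):
        if total <= capacity:
            break
        marked.append(name)
        total -= load          # simplification: a throttled VP sends nothing
    return marked


# Span with 16 units installed, as in the Figure 2 example.
busy = [("gold", 0, 8.0), ("silver", 1, 6.0), ("bronze", 2, 5.0)]
quiet = [("gold", 0, 4.0), ("silver", 1, 3.0), ("bronze", 2, 2.0)]
marked_busy = vps_to_throttle(16, busy)     # ["bronze"]
marked_quiet = vps_to_throttle(16, quiet)   # []
```

When measured utilizations are low, nothing is marked and every VP continues transparently; only under actual overload are the lowest priority VPs throttled.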

6. REFERENCES

[1] R. Kawamura, K. Sato, and I. Tokizawa, “Self-healing ATM networks based on virtual path concept”, IEEE Journal on Selected Areas in Communications, Vol. 12, No. 1, 1994, pp. 120-127.

[2] Y. Xiong, L. Mason, “Restoration strategies and spare capacity requirements in self-healing ATM Networks”, to appear in Infocom 97, Kobe, Japan, April 1997.

[3] R. R. Iraschko, M. H. MacGregor, W. D. Grover, “Optimal Capacity Placement for Path Restoration in Mesh Survivable Networks”, IEEE ICC’96, June 1996, pp. 1568-1574.

[4] W. D. Grover, V. Rawat, M. MacGregor, “A Fast Heuristic Principle for Spare Capacity Placement in Mesh-Restorable SONET / SDH Transport Networks”, accepted for publication in Electronics Letters, Jan. 7, 1997.