Multiobjective design of survivable IP networks

Ann Oper Res (2006) 147:235–253 DOI 10.1007/s10479-006-0067-y

Multiobjective design of survivable IP networks

Peter Broström · Kaj Holmberg

Published online: 22 August 2006 © Springer Science + Business Media, LLC 2006

Abstract Modern communication networks often use Internet Protocol routing and the intradomain protocol OSPF (Open Shortest Path First). The routers in such a network calculate the shortest path to each destination and send the traffic on these paths, using load balancing. The issue of survivability, i.e. the question of how much traffic the network will be able to accommodate if components fail, is increasingly important. We consider the problem of designing a survivable IP network, which also requires determining the routing of the traffic. This is done by choosing the weights used for the shortest path calculations.

Keywords Internet protocol · OSPF · Network design · Survivability · Weight optimization

The Internet consists of a huge number of communication networks, which are connected to each other in order to enable data communication around the globe. Internet Protocol (IP) is the method by which data is sent from one computer to another on the Internet. Networks within an Autonomous System (AS) share routing information using an Interior Gateway Protocol (IGP). When several systems are involved in the communication, routers at the borders use Border Gateway Protocol (BGP) for exchanging routing information.

The design of an IP network is a long-term investment where many aspects must be taken into consideration. Some of them are the cost of the design, network survivability and aspects regarding traffic engineering. In this work, we address the problem of designing IP networks with high survivability at minimal monetary costs. A network is said to be survivable if it is able to handle disturbances caused by network failures to a given extent. In order to distinguish between different networks with respect to their capability of handling failures,

P. Broström · K. Holmberg () Division of Optimization, Department of Mathematics, Linköping Institute of Technology, SE-581 83 Linköping, SWEDEN e-mail: [email protected] P. Broström e-mail: [email protected]

we introduce measures of survivability. After discussing advantages and disadvantages of different measures, one of them is selected for further use.

An IP network consists of a number of routers, which are connected to each other by a set of links. Each link has a bandwidth that determines the amount of communication that this link can perform. Congestion occurs when the traffic on a link approaches the bandwidth of the link. Most IP networks handle communication in a satisfactory way when all components of the network are functional, but the performance may be worse when network components fail. The most common failures in an IP network are link failures. When a link fails, traffic on the failed link is routed through the network using other paths, which might cause congestion on other links. If each pair of routers is still connected by functional links, such rerouting will be performed. When this is not the case, the network has been divided into disjoint parts, and routers located in different parts can not communicate until the link is repaired. A network is more reliable when there are several disjoint paths between every pair of routers and when large amounts of bandwidth are installed on each link. It is easy to realize that reliable networks are more costly to design. We will require that the networks are capable of satisfying all traffic demands without exceeding the bandwidth on any link when all network components are functional, but not at failures.

The OSPF (Open Shortest Path First) protocol is one of the most commonly used IGPs and is the routing protocol used in this work. It is a dynamic protocol where every link has a link weight set by the network operator. A metric is a vector containing all link weights of the network. Every router contains an updated database with the metric and information regarding the status of each link (functional or non-functional).
OSPF determines the routing by shortest path computations with respect to the metric, using only functional links. Since each router has access to an updated database, it can perform shortest path computations locally. If several paths to a destination are equally short, they are all used and the flow is evenly balanced on these outgoing links (the so-called equal-cost multi-path principle, ECMP). For a more detailed description of the OSPF protocol, see Moy (1998).

Traffic engineering in an existing network (i.e. without the design part) is a closely related problem. Fortz and Thorup (2000) showed that optimizing the OSPF link weights for a given network is an NP-hard problem. Their results show that a network can become capable of handling more demand when the link weights are carefully chosen. This is because the network becomes more evenly used. Ericsson, Resende, and Pardalos (2002) and Buriol et al. (2003) used genetic algorithms on the same problem. An extension of this optimization problem is to optimize weights for a fixed network when link failures are taken into consideration, and this problem is considered in Fortz and Thorup (2003).

Bley, Grötschel, and Wessäly (2000) considered the problem of designing survivable IP networks. Their model ensures that a chosen percentage of each demand can be sent during any single failure. They use OSPF as routing protocol and require that each demand is sent on a unique path. This restriction is not made by Holmberg and Yuan (2004), which to our knowledge was the first paper describing OSPF with load balancing in a mathematical optimization model. In their work, however, aspects regarding network survivability are not included.

We present a multiobjective mixed-integer model for designing survivable IP networks. The first objective function minimizes the design cost, while the second maximizes network survivability. The model also contains a new set of constraints used for modeling the load balancing in OSPF.
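As an illustration of the ECMP principle, the following sketch computes an OSPF/ECMP routing for a single demand on a small hypothetical graph. This is a minimal sketch with made-up names and weights (a plain textbook Dijkstra, not the routers' actual code):

```python
import heapq
from collections import defaultdict

def dijkstra(nodes, arcs, w, dest):
    # Shortest distance from every node TO dest (run over reversed arcs).
    dist = {n: float("inf") for n in nodes}
    dist[dest] = 0.0
    rev = defaultdict(list)
    for (i, j) in arcs:
        rev[j].append(i)
    pq = [(0.0, dest)]
    while pq:
        d, j = heapq.heappop(pq)
        if d > dist[j]:
            continue
        for i in rev[j]:
            nd = d + w[(i, j)]
            if nd < dist[i]:
                dist[i] = nd
                heapq.heappush(pq, (nd, i))
    return dist

def ecmp_route(nodes, arcs, w, origin, dest, demand):
    # Split the demand evenly over all outgoing shortest-path links (ECMP).
    dist = dijkstra(nodes, arcs, w, dest)
    inflow = defaultdict(float)
    inflow[origin] = demand
    flow = defaultdict(float)
    # Visit nodes in order of decreasing distance to dest: a topological
    # order of the shortest-path DAG when all weights are positive.
    for i in sorted(nodes, key=lambda n: -dist[n]):
        if inflow[i] == 0 or i == dest:
            continue
        succ = [j for (a, j) in arcs if a == i
                and abs(w[(i, j)] + dist[j] - dist[i]) < 1e-9]
        for j in succ:
            flow[(i, j)] += inflow[i] / len(succ)
            inflow[j] += inflow[i] / len(succ)
    return flow

# Example: a unit-weight square; the demand splits evenly over two paths.
nodes = [1, 2, 3, 4]
arcs = [(1, 2), (1, 3), (2, 4), (3, 4)]
w = {a: 1.0 for a in arcs}
flow = ecmp_route(nodes, arcs, w, origin=1, dest=4, demand=8.0)
# flow[(1, 2)] == flow[(1, 3)] == 4.0
```

Splitting evenly at every node over the outgoing shortest-path links reproduces the per-node balancing described above, rather than splitting over entire paths.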

Solutions to the problem are generated by a heuristic solution method. Our solution method is capable of yielding both reliable networks that require large investments and unreliable networks obtainable by small investments, as well as solutions in between these extreme cases. A network provider can choose the one fulfilling his or her desired balance between survivability and investments.

The outline of the paper is as follows. Section 1 specifies the problem in detail. Different ways of measuring network survivability are discussed in Section 2. A multiobjective model is presented in Section 3, while Section 4 contains the solution method. Results from numerical experiments are presented in Section 5. Finally, we draw conclusions and suggest directions for further research in Section 6.

1. Problem formulation

The basic conditions for our work are as follows. Pairs of routers can be connected by optical fibers. We mainly consider the case when routers are only allowed to be connected to routers in the immediate surroundings. Let N be the set of routers and A the set of potential links. A is symmetric and contains pairs of links, (i, j) and (j, i). Our aim is to design a network from the potential graph G = (N, A).

The cost for link (i, j) depends on its traffic and on the number of optical fibers installed. Fibers can only be installed when a conduit is installed, and this is modeled by a fixed charge p_ij that corresponds to manual work, ground-maintenance, machinery etc. Once a link connection is open, a number of optical fibers with capacity u_ij can be installed at unit cost f_ij. The parameter t_ij limits the number of optical fibers installed on the link connection between routers i and j. There is also an operating cost c_ij that depends on the usage of the link. Since the normal state of a network is that all components are functional, the operating costs are based on this state. It follows that the cost function is piece-wise linear with an initial large fixed charge and minor uniform steps for capacity extensions.

A set of commodities C is given. Each commodity represents a demand of data communication. A triplet (o(k), d(k), d^k) describes the origin node, the destination node and the demand of commodity k ∈ C. Each demand should be routed from the origin to the destination according to the OSPF protocol.

Our task is to design a network from the potential network by allocating capacities and weights to each link that is opened. Link weights should be chosen such that all demands can be routed through the network without exceeding the link capacities on any link. However, traffic disturbances are allowed when traffic is rerouted during link failures.
The aims are to minimize the cost of the network and to minimize the traffic disturbances during link failures.
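The piece-wise linear cost structure of a single link connection can be sketched as follows. This is a hypothetical helper, not from the paper; the parameter names mirror p_ij, f_ij, u_ij, c_ij and t_ij:

```python
import math

def link_cost(traffic, p, f, u, c, t_max):
    """Total cost of one link connection in the perfect state.

    traffic: perfect-state flow on the link, p: fixed charge for opening
    the conduit, f: cost per optical fiber, u: capacity per fiber,
    c: operating cost per unit of traffic, t_max: fiber limit.
    """
    if traffic == 0:
        return 0.0                      # link connection not opened
    fibers = math.ceil(traffic / u)     # smallest number of fibers that fits
    if fibers > t_max:
        raise ValueError("traffic exceeds the maximum installable capacity")
    return p + f * fibers + c * traffic
```

The cost jumps by p at the first unit of traffic and then by f at every u-th unit: a large initial fixed charge followed by minor uniform steps, as described above.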

2. Survivability of an existing IP network

We have already mentioned that IP networks can get congested when network components break down. In the worst case, data communication between different parts of the network may be lost until the failure is repaired. A network is said to be survivable if it is able to reroute all traffic demand without exceeding the bandwidth of too many links. The number of saturated links is then related to the level of survivability. Survivability might be more or less important in different cases. This may depend on the type of customers and the importance of the traffic. This section describes failures that can occur in IP networks and introduces a measure of network survivability.

There are mainly two different types of failures in IP networks. Link failures are most common and correspond to interruptions in the communication between two routers. Node failures correspond to broken routers and can be considered as several simultaneous link failures, where all links connected to the broken node are considered to fail. In this work we only consider link failures. They happen due to technical errors, but also when fibers are cut off during construction work. Since a non-functional link in most cases is repaired before the next failure occurs, we assume that at most one link fails at a time. This assumption is also made by Fortz and Thorup (2003). Here, we want to point out that our solution method can easily be implemented for the case of several simultaneous link failures as well. The main reason for disregarding node failures is that link failures occur much more frequently.

When a link fails, the shortest paths to certain destinations change and traffic is redistributed accordingly. Links might become overloaded when traffic is redirected, and this causes disturbances in network performance. The following definitions state how a failure is defined, how congestion is defined and when a network is considered as fully survivable.

Definition 1. We say that a network is at perfect state when all links in the network are functional. When a link is non-functional, we say that the network is at failure state.

Definition 2. A link becomes congested when the traffic of the link exceeds its bandwidth.

Definition 3. A network capable of rerouting all demands without exceeding the bandwidth on any link, for all failure states (i.e. for any single link failure), is called fully survivable. A network which is not fully survivable is called partly survivable.

The reason for introducing the concept of partly survivable networks is that we want to be able to compare disturbances in different networks.
This is accomplished by measuring the amount of demand that is affected by congestion and interruptions. The effect of interruptions is measured by summing up the total amount of demand which can not be routed during a failure. An alternative way would be to count the number of commodities affected by a failure, but then no distinction would be made between large and important customers and smaller customers that might be less important. We let γ_gh be the sum of interrupted traffic when the communication link between routers g and h fails.

It is more difficult to introduce an appropriate measure of link congestion. Four different functions that can be used for this purpose are given by F1–F4, and we will see that these measures favor different types of networks. The notation used in these functions is as follows. We let x_ij be the amount of traffic on link (i, j) and U_ij the bandwidth of the same link. For notational convenience, we introduce (x)_+ = max{x, 0}.

σ^1(x, u) = (1/|A|) Σ_{(i,j)∈A} (x_ij − U_ij)_+                 (F1)

σ^2(x, u) = max_{(i,j)∈A} (x_ij − U_ij)_+                       (F2)

σ^3(x, u) = (1/|A|) Σ_{(i,j)∈A} ((x_ij − U_ij)/U_ij)_+          (F3)

σ^4(x, u) = max_{(i,j)∈A} ((x_ij − U_ij)/U_ij)_+                (F4)
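The four measures can be computed directly from the formulas. A minimal sketch (the dict-of-links data layout is our own choice, not the paper's):

```python
def overload_measures(x, U):
    """Compute (F1, F2, F3, F4) for link flows x and bandwidths U,
    both given as dicts keyed by link (i, j)."""
    pos = lambda v: max(v, 0.0)
    A = list(x)
    over = [pos(x[a] - U[a]) for a in A]            # absolute overload
    rel = [pos((x[a] - U[a]) / U[a]) for a in A]    # relative overload
    return (sum(over) / len(A),   # F1: mean overload
            max(over),            # F2: maximum overload
            sum(rel) / len(A),    # F3: mean relative overload
            max(rel))             # F4: maximum relative overload
```

For example, with flows {(1,2): 15, (2,3): 5} and bandwidths of 10 on both links, only link (1, 2) is congested and the four measures are 2.5, 5, 0.25 and 0.5.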

Functions F1 and F2 measure the mean value and the maximum value of the link overload, while functions F3 and F4 measure the mean value and the maximum value of the link utilization. They all give the value zero when no links are congested.

First, we compare the measures based on link overload, F1 and F2. In F1, the mean value is used, and we obtain the same reduction in measure by the same increase of bandwidth at any of the congested links. One disadvantage with this measure is that additional bandwidth will always be most suitable to install on the congested link where this is cheapest. This means that F1 may give a network where some links are not congested, while other links are heavily congested. This will not be the case when F2, the function based on maximum link overload, is used. Here, the measure can only be decreased by installing additional capacity on the most congested link, no matter how expensive this will be. This means that F2 will give a network where links are overloaded by approximately the same amount.

Next, we compare functions F3 and F4. Using this kind of measure, an overload of fixed size is more serious when the available bandwidth is small. It follows that priority is given to links with small bandwidth, since large reductions in σ are obtainable from the same size of bandwidth extension. This can be illustrated by considering an example of two links. We assume that one link has twice as much bandwidth installed as the other link and that both of these links have the same value of their corresponding σ. To obtain the same decrease in σ, twice as much bandwidth needs to be installed on the link with large capacity. If the cost for capacity extension is of the same magnitude, a doubled investment would be required.

From this discussion follows that the choice of measure could affect the structure of the network (as long as the network is partly survivable).
To obtain a network with desired characteristics one should pay attention to this choice. It can in many ways be reasonable to measure congestion using link utilization. However, this kind of measure makes a design model with variable link capacity non-linear, since the link capacities are included in the denominators of F3 and F4. It is also clear from the discussion that F1 behaves in an unpleasant way, since some parts can become heavily congested while other parts of the network perform well. The remaining measure is function F2. This measure gives an evenly congested network and can therefore be considered as equally fair to all users of the network. Such behavior is exactly what we were aiming for. Later in this paper, we will see that this measure is suitable to model and to implement in an iterative search method. Therefore, we let σ_gh denote the maximum overload on any link when the link connection between routers g and h has failed.

Now we can sum up this section by concluding how network survivability is measured. When the link connecting routers g and h is non-functional, the traffic disturbance is computed as Δ_gh = σ_gh + γ_gh. We introduce Δ as the level of traffic disturbance during the most critical failure, i.e. Δ = max_{(g,h)∈A} Δ_gh. When Δ has a small value, only a small part of the demand is affected by any single failure. We note that Δ = 0 corresponds to a fully survivable network. We will later also use the relative traffic disturbance, Δ^R, which is the traffic disturbance divided by the total demand.

Let us finally make the remark that when Fortz and Thorup (2000) optimized the OSPF link weights for a fixed network and fixed capacities, an objective function similar to F3 was used. Their aim is to use the installed bandwidths evenly such that the networks become capable of handling larger amounts of data. For this purpose, their objective function is a sum
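Under these definitions, the overall disturbance can be sketched as follows (a minimal sketch; σ_gh and γ_gh are assumed to have been computed per failure state already, in dicts keyed by the failed link connection):

```python
def traffic_disturbance(sigma, gamma):
    """Delta = max over failure states (g, h) of sigma_gh + gamma_gh,
    where sigma_gh is the maximum link overload and gamma_gh the
    interrupted demand in state (g, h)."""
    return max(sigma[s] + gamma[s] for s in sigma)
```

A value of zero means that every single link failure can be absorbed without congestion or interruption, i.e. the network is fully survivable.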

over the link utilization. Instead of using a linear function, the cost for each link is determined by a convex piece-wise linear function. The idea is to heavily penalize usage of links with large utilization, especially when the utilization approaches 100%.

3. Mathematical formulation

In this section we describe a multiobjective mixed-integer formulation of the Survivable IP Network Design and routing problem (SIPND). The objective functions are to minimize the design cost and to minimize the traffic disturbance during the most critical link failure. The SIPND model is inspired by the IPND model presented in Holmberg and Yuan (2004), in that “reduced costs” for a shortest path problem are used for modeling the OSPF protocol. The main differences between these models are the way in which load balancing is modeled and that the SIPND model treats survivability aspects, which were not considered in the IPND model.

3.1. Notation

The SIPND model is based on different network states, which are defined as follows. The state (g, h) is defined as the scenario obtained when the link connection between routers g and h has failed. A network can be in |A|/2 different failure states, since state (g, h) is identical to state (h, g). The state when no link connection has failed (previously defined as the perfect state) is denoted by state (0, 0). We let Ā be the set of states, where Ā = {(i, j) ∈ A : i < j} ∪ {(0, 0)}. Links (g, h) and (h, g) can not be used for routing traffic when the network is in state (g, h). We let A_gh = A \ {(g, h), (h, g)} be the set of functional links in state (g, h) ∈ Ā. One may note that A_00 = A, since A does not contain link (0, 0).

The SIPND model is mainly used for designing networks, but could also be used for expanding the capacity of an existing network. When an existing network is expanded, the initial bandwidth of link (i, j) is given by the parameter e_ij.
This parameter is set equal to zero when no capacity has previously been installed. The remaining parameters are summarized below. The amount to be distributed from node o(k) to node d(k) is d^k units. The fixed charge for opening a link connection between nodes i and j is given by p_ij. The cost for installing an optical fiber on link (i, j) is f_ij, and at most t_ij fibers of capacity u_ij can be installed on the link connection between nodes i and j. The operating cost for link (i, j) is c_ij per unit.

The same metric is used when all links are functional as well as in failure scenarios. However, even though the same weights are used in the different states, shortest paths are affected by link failures, and this is why some groups of variables in the mathematical model must have state related indices. The variables are given below.

w_ij = the OSPF weight of link (i, j),

y^l_ijgh = 1 if link (i, j) belongs to a shortest path between node i and node l in state (g, h), and 0 otherwise,

π^l_igh = the shortest path distance from node i to node l with respect to w in state (g, h),


z_ij = the number of optical fibers installed on link (i, j),

q_ij = 1 if the link connection between nodes i and j is opened, and 0 otherwise,

x^k_ijgh = the fraction of the demand of commodity k using link (i, j) in state (g, h),

s^k_gh = the fraction of the demand of commodity k that is unsatisfied in state (g, h),

σ_gh = the maximum link overload in state (g, h),

γ_gh = the total amount of unsatisfied demand in state (g, h),

Δ = the traffic disturbance during the most critical link failure.

3.2. The SIPND model

The SIPND model is obtained by combining constraints from a multicommodity network design problem, see for example Holmberg and Yuan (2000), with constraints describing the OSPF protocol. Constraints are defined for the perfect state network and for all failure scenarios. The model contains two objective functions, which are given below. The first objective function minimizes the fixed charges for installation of conduits, the costs for installation of fibers and the operating costs in the perfect state network. The second objective function minimizes the traffic disturbance during the most critical failure, which is the same as maximizing network survivability as defined in Section 2.

min  Σ_{(i,j)∈A: i<j} p_ij q_ij + Σ_{(i,j)∈A} f_ij z_ij + Σ_{(i,j)∈A} Σ_{k∈C} c_ij d^k x^k_ij00        (1)

min  Δ                                                                                                (2)

Let us now describe the multicommodity fixed charge network design part of the model. This part contains node balance constraints, capacity constraints and design constraints. 

Σ_{j:(i,j)∈A_gh} x^k_ijgh − Σ_{j:(j,i)∈A_gh} x^k_jigh =
    1 − s^k_gh    if i = o(k),
    −1 + s^k_gh   if i = d(k),
    0             otherwise,
                                                  ∀(g, h) ∈ Ā, ∀k ∈ C                  (3)

Σ_{k∈C} d^k s^k_gh = γ_gh                         ∀(g, h) ∈ Ā                          (4)

Σ_{k∈C} d^k x^k_ijgh ≤ u_ij z_ij + e_ij + σ_gh    ∀(i, j) ∈ A_gh, ∀(g, h) ∈ Ā          (5)

z_ij + z_ji ≤ t_ij q_ij                           ∀(i, j) ∈ A : i < j                  (6)

γ_gh + σ_gh ≤ Δ                                   ∀(g, h) ∈ Ā                          (7)

The node balance constraints, set 3, ensure that each demand is routed from the origin to the destination. The model contains |A|/2 + 1 groups of such node balance constraints, since node balance must hold in any network state. Interruptions are modeled by allowing demand not to be satisfied, and letting the variables s in the right-hand-side of set 3 contain the (fractions of) unsatisfied demand. The total amount of unsatisfied demand during state (g, h) is given by γ_gh due to set 4, and we ensure that all demand is satisfied in state (0, 0) by requiring that γ_00 = 0.

The next two sets of constraints regard installation of fibers and conduits. The right-hand-side of set 5 consists of previously available capacity (e_ij), newly added capacity (u_ij z_ij) and link overload (σ_gh). This set ensures that the amount of traffic exceeding the link capacity is stored as link overload. We ensure that links are not congested in state (0, 0) by requiring that σ_00 = 0. Set 6 ensures that a link is opened when capacity is installed in either direction of the link connection. The sum of link overload and unsatisfied demand during a single failure is bounded by Δ because of constraint set 7. Δ will be equal to the traffic disturbance during the most critical failure, since this variable is minimized in the second objective function.

The second part of the model ensures that the traffic is routed in accordance with the OSPF protocol. This part consists of OSPF decision constraints, sets 8–10, and load balancing constraints, sets 11–12.

w_ij − π^l_igh + π^l_jgh ≥ 0                     ∀(i, j) ∈ A_gh, ∀(g, h) ∈ Ā, ∀l ∈ N    (8)

y^l_ijgh + w_ij − π^l_igh + π^l_jgh ≥ 1          ∀(i, j) ∈ A_gh, ∀(g, h) ∈ Ā, ∀l ∈ N    (9)

y^l_ijgh + (w_ij − π^l_igh + π^l_jgh)/M ≤ 1      ∀(i, j) ∈ A_gh, ∀(g, h) ∈ Ā, ∀l ∈ N    (10)
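The logic behind the OSPF decision constraints can be checked link by link. A sketch with scalar stand-ins for one link and one destination (in the model, y is forced to this value by sets 9 and 10 with a big-M; here we simply read it off):

```python
def ospf_decision(w_ij, pi_i, pi_j):
    """y for one link and one destination: the link is on a shortest path
    (y = 1) exactly when its reduced cost c_bar = w_ij - pi_i + pi_j is
    zero; dual feasibility (set 8) requires c_bar >= 0."""
    c_bar = w_ij - pi_i + pi_j
    assert c_bar >= -1e-9, "set 8 (dual feasibility) violated"
    return 1 if c_bar < 1e-9 else 0
```

With weight 2 and shortest path distances 5 and 3 at the two endpoints, the reduced cost is zero and the link lies on a shortest path; with weight 4 it does not.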

The aim of the OSPF decision constraints is to find a suitable metric w and to ensure that y^l_ijgh is one when link (i, j) belongs to a shortest path between node i and node l, and zero if not. This is modeled using dual feasibility conditions, 8, and complementary slackness conditions, 9–10, for the shortest path problems. Every link in a shortest path has a reduced cost equal to zero and every other link has a positive reduced cost. This is stated by set 8, in which the left-hand-side, from this point denoted by c̄^l_ij, can be interpreted as the reduced cost for link (i, j), and is valid for every commodity k with d(k) = l. Moreover, by requiring that π^l_lgh = 0, the variable π^l_igh expresses the shortest path distance from node i to node l in state (g, h) with respect to w. The complementary slackness conditions ensure that y obtains correct values from w and π. Set 9 ensures that y^l_ij = 1 when c̄^l_ij = 0, while set 10 ensures that y^l_ij = 0 when c̄^l_ij ≥ 1 (under the assumption that the parameter M is larger than all reduced costs).

The remaining part of the model ensures that commodities only use shortest paths from origins to destinations and that the traffic from each node is evenly divided on all pairs of outgoing links that belong to a shortest path. This makes the traffic balanced in accordance with the OSPF protocol, since all links emanating from the same node either will have the same traffic or will not be used at all.

x^k_ijgh ≤ y^{d(k)}_ijgh                            ∀(i, j) ∈ A_gh, ∀(g, h) ∈ Ā, ∀k ∈ C    (11)

y^{d(k)}_isgh + y^{d(k)}_itgh ≤ 2 − x^k_isgh + x^k_itgh    ∀s : (i, s) ∈ A_gh, ∀t : (i, t) ∈ A_gh, ∀i ∈ N, ∀(g, h) ∈ Ā, ∀k ∈ C    (12)
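The pairing argument behind set 12 can be verified numerically. A sketch with scalar stand-ins for the variables of one node, one commodity and one failure state (variable names are ours):

```python
def pair_constraints_hold(y_s, y_t, x_s, x_t):
    # The two constraints of set 12 for the pair (i, s), (i, t) and its
    # reverse: y_s + y_t <= 2 - x_s + x_t  and  y_t + y_s <= 2 - x_t + x_s.
    eps = 1e-9
    return (y_s + y_t <= 2 - x_s + x_t + eps and
            y_t + y_s <= 2 - x_t + x_s + eps)

# Both links on shortest paths: only equal flow fractions are feasible.
assert pair_constraints_hold(1, 1, 0.5, 0.5)
assert not pair_constraints_hold(1, 1, 0.6, 0.4)
# At most one link on a shortest path: the constraints are redundant.
assert pair_constraints_hold(1, 0, 1.0, 0.0)
```

When both y variables are one, the two inequalities collapse to x_s ≤ x_t and x_t ≤ x_s, forcing the even split; otherwise the right-hand-side is at least one and nothing is restricted.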

Bounding flow variables with OSPF decision variables prohibits links that do not belong to a shortest path from being used. This is modeled by constraints 11. Constraint set 12 models load balancing as follows. There are two constraints for each node i and for every pair of outgoing links, (i, j1) and (i, j2), and they ensure that x^k_ij1gh = x^k_ij2gh when y^{d(k)}_ij1gh and y^{d(k)}_ij2gh both equal one. This follows since this set contains one constraint for s = j1 and t = j2 and another constraint for s = j2 and t = j1. One may note that the right-hand-side of set 12 is bounded from below by one. This makes constraints from this set redundant for pairs of links with at most one y variable set to one, and this is why the traffic is evenly divided only on pairs of links that both belong to shortest paths.

This completes the description of the SIPND model. The model is very large and can not be expected to be solved to optimality in a reasonable amount of time. Therefore, a heuristic solution method is proposed in the next section.

4. Solution method

One of the approaches used by Holmberg and Yuan (2004) for the design and routing problem is a weight-based search method in a Simulated Annealing framework. We will also here use the idea of searching in the space of weights, using a few different strategies for the search. Additional steps need to be included in the method, as described below. Due to the fact that the optimization problem has two objective functions, the solution method must be able to find solutions that require low investments and solutions that are survivable. For this purpose, we have developed a two-phase search method where the first phase is used for finding solutions with low design cost, while the second phase is used for finding survivable solutions. Before describing the algorithm in detail, we discuss what kind of solutions we are looking for.

4.1. Pareto-optimality

The aim in multiobjective optimization is to find Pareto-optimal solutions, which are solutions that are not dominated by other solutions. We say that a solution S1 dominates a solution S2 if S1 is better than S2 in at least one objective function and not worse with respect to the other objective functions. In our case, no solution can have a lower design cost and less traffic disturbance than a Pareto-optimal solution.

When solutions are generated by our search method, their objective values can be plotted in a cost/survivability diagram. This kind of diagram is shown in Fig. 1, where circles represent solutions found by the algorithm. The dashed line connects the ones which might

Fig. 1 Cost/survivability diagram (vertical axis: traffic disturbance; horizontal axis: design cost)

be Pareto-optimal, while the remaining solutions are dominated and are located above or to the right of the dashed line. We can not be sure that non-dominated solutions found by the algorithm actually are Pareto-optimal. So let us by the term “undominated” mean solutions that are not dominated by any known solution. This means that an undominated solution can become dominated if we find a new solution located below or to the left of the dashed line.

4.2. The algorithm

It is easy to decide how OSPF routes the traffic when the weights are known. Dijkstra's method is used for computing shortest paths in the first iteration of the algorithm, while spanning in-trees of shortest paths are maintained and updated in later iterations. A shortest path in-tree has a directed path from each node to the root node. These paths are shortest with respect to the metric, and a vector of node prices allows us to find alternative shortest paths. Each shortest path tree has a thread index that enables a preorder traversal of the nodes of the tree. By traversing a part of a tree and by checking all entering/leaving links, the shortest path tree can be updated efficiently. This data structure is further described in Ahuja, Magnanti, and Orlin (1993) (Section 11.3).

Shortest path trees are almost identical in subsequent iterations. This is because a single weight is changed in an iteration, which in general has minor impact on the shortest paths. The above-mentioned data structure allows us to update a shortest path tree by traversing nodes of the tree. One tree link is replaced by a non-tree link each time a shortest path tree is traversed, and since at most |N| − 1 links need to be replaced, the time complexity for updating a tree will be O(|N|^2). This is the same time complexity as in Dijkstra's method. However, since usually only a few links need to be replaced, if any, substantial computational savings are obtained compared to using Dijkstra's method in every iteration.
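The bookkeeping of undominated solutions from Section 4.1 can be sketched as follows (a minimal sketch; solutions are represented as (design cost, traffic disturbance) pairs and both objectives are minimized):

```python
def dominates(a, b):
    """a and b are (design cost, traffic disturbance) pairs; a dominates b
    if it is at least as good in both objectives and not identical."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def undominated(solutions):
    # Keep the solutions that are not dominated by any known solution.
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions)]
```

For instance, among (10, 5), (12, 3), (13, 4) and (11, 6), the last two are dominated: (13, 4) by (12, 3) and (11, 6) by (10, 5).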
The algorithm is initialized by assigning a value to each link weight in the potential network and by computing shortest path trees using Dijkstra's method. The vector of link weights in iteration t is denoted by w^t.

The first phase of the algorithm is to compute the perfect state traffic with respect to w^t. The shortest path tree rooted at node d(k) allows us to find all shortest paths between the origin and the destination of commodity k. When all these paths are known, it is easy to compute how the demand of commodity k is routed in accordance with the OSPF protocol. The perfect state traffic is obtained by repeating these computations for each k ∈ C. Since σ_00 is fixed to zero, i.e. links may not be overloaded in state (0, 0), these computations also provide lower bounds for the capacity installations. By installing just as much bandwidth as required for feasibility, the cheapest feasible solution for this metric is obtained. Solutions obtained from this strategy are not likely to be survivable, since links have very little spare capacity. However, such solutions are still of interest since they can be Pareto-optimal. Searching in the metric and installing capacities “tight” is the first phase of the method.

The second phase of the algorithm improves the survivability of solutions obtained from phase one. This is done by evaluating failure scenarios and by installing additional capacity on overloaded links. Additional links are not allowed to be opened in this phase, and this is modeled by temporarily removing them from the potential network. We evaluate a link failure on link connection (g, h) by rerouting its traffic using other paths. This is done by temporarily increasing w_gh and w_hg and by temporarily updating the shortest path trees. If links (g, h) and (h, g) are not part of any shortest path tree after this change, then other paths have become shorter and demands are rerouted on these paths.
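The “tight” capacity installation of phase one can be sketched as follows (a hypothetical helper; the perfect-state load per link is assumed to have been computed by the routing step described above):

```python
import math

def tight_capacities(load, u, e):
    """Install just enough fibers for the perfect-state load.

    load[(i, j)]: perfect-state traffic on the link, u[(i, j)]: capacity
    per fiber, e[(i, j)]: pre-existing bandwidth.  Returns the number of
    fibers z per link, so that no link is overloaded in state (0, 0)."""
    z = {}
    for a, x in load.items():
        missing = max(x - e.get(a, 0.0), 0.0)   # capacity still needed
        z[a] = math.ceil(missing / u[a]) if missing > 0 else 0
    return z
```

Rounding up to whole fibers is what makes the solution cheapest for the given metric: any fewer fibers would violate the capacity constraint in the perfect state.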


However, if the network is divided into disjoint parts when state (g, h) occurs, there is no possibility of rerouting all demand; the demand between the disjoint parts can then not be satisfied. These computations are repeated for each failure scenario. The one that causes the largest traffic disturbance is called "the most critical failure", while the scenario causing the second largest traffic disturbance is called "the second most critical failure". The levels of traffic disturbance for these instances are denoted by Δ and Δ̂, respectively. Here, Δ̂ may be obtained from the same failure scenario as Δ, but not from the same overloaded link. This is because Δ̂ is supposed to be the maximum traffic disturbance after installing additional capacity on the most overloaded link, and Δ and Δ̂ would be decreased by the same amount if they originated from the same link.

Network survivability is improved by installing 1 + ⌊(Δt − Δ̂t)/uij⌋ fibers on the most overloaded link for the most critical failure. This completes the second phase of the algorithm and ensures that the second most critical failure becomes the most critical failure if the second phase is repeated. The second phase is repeated until a fully survivable network is obtained or until Δ exclusively originates from interruptions. When this happens, additional improvements in survivability cannot be made, since this would require additional links to be opened.

A special approach for evaluating failure scenarios is used in Fortz and Thorup (2003). They consider a subset of links as "critical links", and obtain a significant speed-up by considering failure scenarios from this set only. A similar approach could be adapted for our problem. However, the Δ value will then be invalid when the most critical failure is not included in the critical set. One has to verify that undominated solutions actually are undominated by evaluating all possible failures whenever such solutions are found.
This could probably result in a speed-up of the procedure, but has presently not been implemented.

The second phase of the algorithm generates several feasible solutions with different link capacities for the same set of open links and for the same metric. The cost/survivability diagram in Fig. 2 shows that the relative traffic disturbance, ΔR, can be reduced by small investments in the beginning of the phase, while larger investments are needed later in the phase. This is because it suffices to install fibers on a single link at the beginning of the phase, while the same reduction in ΔR requires improvements on several links at the end of the phase.

Fig. 2 Cost/survivability diagram (design cost ×10^5 vs. relative traffic disturbance) showing the solution found by phase one (cross) and the solutions found by phase two (circles)
This phase of the algorithm is used for finding undominated solutions for different levels of traffic disturbance, but since the design cost increases as more fibers are installed, a starting solution with a very high design cost is less likely to yield undominated solutions. By introducing an intensification parameter, β, and by maintaining a vector of undominated solutions, we can speed up the procedure by only performing the second phase when undominated solutions are more likely to be found. Phase two is therefore only performed for solutions that are less than β times more costly than the undominated solution with a similar level of traffic disturbance. Moreover, phase two is not performed when the current solution is more costly than the undominated solution with Δ = 0; performing phase two on such a solution cannot result in an undominated solution. Phase two may suggest that more than tij fibers should be installed on a link. In this case, we install as many fibers as possible and terminate the second phase, since further reductions in Δ would violate constraint (6).

The metric is modified when further improvements in survivability cannot be made. This is done by changing a single weight in one of the two following ways. The first way of changing weights is due to Holmberg and Yuan (2004), and uses randomness both for selecting the link and for determining whether the weight is increased or decreased. The weight is changed such that some origin/destination pair is affected, and this information is obtainable from the reduced costs in the shortest path trees. The second way of changing weights aims at finding network topologies with very few open links. This is done by increasing a randomly chosen weight such that no origin/destination pair uses the link. Weights are increased in this manner until some weight becomes greater than the threshold value wmax. When this happens, we start decreasing weights.
This is done by decreasing a randomly chosen weight such that some origin/destination pair is affected. When some weight becomes smaller than the threshold value wmin, we once again turn to the phase of increasing link weights.

We consider κ different changes of weights each time the metric is modified. A metric obtained in this way is said to be part of the neighborhood of the current metric, and κ is therefore referred to as the neighborhood size. The first phase of the algorithm is performed for each metric in the neighborhood, and the metric with the smallest design cost is chosen as the next iterate. One may note that a small value of the neighborhood parameter, κ, results in a search method where the changes of the weights are very random. On the other hand, too large a κ results in a search method that is likely to get stuck in local optima with small design costs. This is why we have chosen to use small and moderate values of κ.

We conclude this section with a summary of the algorithm, which assumes given values of κ, β, wmin and wmax.

Step 0—Initialization: Let t = 0 and initialize the vector of link weights, wt. (We use the value 5 for each component in wt.) Compute the shortest paths using Dijkstra's algorithm.

Step 1—Compute the perfect state traffic: Use the shortest path trees for computing the perfect state traffic with respect to wt. Allocate

capacities and open links such that a feasible solution Dt is obtained, i.e. let zij = ⌈(Σk∈C dk xijk00 − eij)/uij⌉, ∀(i, j) ∈ A, let qij = 1 if zij > 0 and let qij = 0 if zij = 0. Compute the objective function value (1) for Dt and denote it by Ct.

Step 2a—Evaluate link failures: Choose a failure scenario and evaluate the level of traffic disturbance. Repeat this step for all failure scenarios. Denote the traffic disturbances for the most and second most critical failures by Δt and Δ̂t, respectively.


Step 2b—Update the frontier of undominated solutions: Store Dt if this solution is undominated. If Ct ≥ βCk, where Ck is the cost of the undominated solution with a similar traffic disturbance, go to Step 3.

Step 2c—Improve network survivability: Let zij := zij + 1 + ⌊(Δt − Δ̂t)/uij⌋ for the most congested link in the most critical failure. Let t := t + 1, wt := wt−1 and store the design as Dt. Let Ct := Ct−1 + ΔC, where ΔC is the cost of the additional bandwidth. Go to Step 2a.

Step 3—Modify the metric: Update the metric in κ different ways in accordance with the chosen strategy for changing link weights. Update the shortest path trees and perform Step 1 for each of these candidate metrics. Let t := t + 1, let wt be the metric with the lowest design cost when Step 1 was performed, and set Dt and Ct accordingly. Go to Step 2a.

The algorithm is terminated in Step 3 when a specified time limit is reached.
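As an illustration, the survivability loop of Steps 2a–2c can be sketched as follows. The failure evaluator, link data and cost figures are hypothetical stand-ins (delta and delta_hat correspond to the paper's Δ and Δ̂); the point is the increment 1 + ⌊(Δt − Δ̂t)/uij⌋ that makes the second most critical failure become the most critical one:

```python
import math

def phase_two(evaluate_failures, u, z, fiber_cost, max_iter=100):
    """Sketch of the survivability phase (Steps 2a-2c), under assumptions.

    evaluate_failures(z) is assumed to return (delta, delta_hat, worst_link):
    the largest and second largest traffic disturbances over all failure
    scenarios, and the most overloaded link of the most critical failure.
    u and z map links to per-fiber capacity and installed fiber count.
    Yields every intermediate design with its extra cost, since each one
    is a candidate for the cost/survivability frontier.
    """
    for _ in range(max_iter):
        delta, delta_hat, worst_link = evaluate_failures(z)
        if delta == 0:                                  # fully survivable
            break
        # Step 2c: install enough fibers to close the gap to delta_hat.
        extra = 1 + math.floor((delta - delta_hat) / u[worst_link])
        z[worst_link] += extra
        yield dict(z), extra * fiber_cost[worst_link]
```

In the actual algorithm the loop also stops when Δ originates only from interruptions, i.e. when no capacity installation on the open links can reduce it further.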

5. Computational results

In this section we describe the test problems, specify parameter settings and present results obtained from computational experiments. The computational results are obtained on a Pentium 4 with a 1.6 GHz CPU and 1 GB of physical memory. The running time is limited to 30 minutes, unless some other time limit is specified. Since the design of IP networks is a matter of tactical planning, this solution time can be regarded as fairly short.

5.1. Potential networks

Potential networks in the range of 10–30 nodes and 48–148 directed links have been generated. Each node is connected to at least four other nodes in its immediate surroundings. Nodes located at the center of the potential networks are in general connected to a larger number of adjacent nodes. Some nodes are considered to be area border/AS boundary routers, and we assume that most of the demands have such nodes as origin or destination. The cost of opening and allocating capacity to a link depends on the length of the link; pairs of routers that are located far from each other are in general more costly to connect. One of our aims is to determine whether our solution method performs well for different types of link costs. Some link data therefore consists of fixed charges that are large compared to the total cost of the capacity installations, while in other cases the fixed charges are less dominating. The parameters tij are in these test problems set to large values, so in practice there is no limit on the number of fibers that can be installed on a link.

5.2. Parameter settings

Three different strategies are used in our computational experiments. Apart from Step 3 of the algorithm, these methods are identical. The main difference is the way in which link weights are modified, but there are also minor differences regarding the size of the neighborhood. Strategy A randomly decides the direction of a change of a link weight. This strategy uses κ = 5. Strategy B modifies link weights in sequences.
This strategy uses κ = 5, wmin = 2 and wmax = 1000. If one weight becomes smaller than wmin and another weight becomes greater than wmax at the same time, then the metric is set equal to the initial values used in Step 0.


Table 1 Computational results obtained in 30 minutes, with large fixed charges

                 Lowest design cost                Average design cost
Str.   β      ΔR=0    0.05    0.10    0.20      ΔR=0    0.05    0.10    0.20
A      1.05   454.2   441.3   421.2   373.6     487.8   477.9   446.6   395.6
B      1.05   428.2   417.0   409.4   372.0     498.2   478.2   460.8   378.5
C      1.05   459.9   420.2   413.4   376.6     480.3   443.1   435.7   380.4
A      1.15   451.7   431.6   413.2   378.1     462.4   448.3   432.4   385.4
B      1.15   452.0   439.0   428.6   377.5     459.2   447.1   433.8   385.2
C      1.15   430.5   417.4   409.0   372.0     449.2   436.4   422.7   380.8
Strategy C is similar to strategy B. The only difference is that strategy C uses κ = 1 when weights are decreased. The smaller neighborhood will probably imply that a larger number of iterations can be performed in a given time.

5.3. Experiments on a smaller network

Our preliminary experiments use a potential network that consists of 10 nodes and 48 directed links. Computations are performed both with larger and smaller (i.e. more and less dominating) fixed charges. Table 1 summarizes the computational results obtained when the fixed charges dominate the costs of capacity. Each strategy has been used five times for each value of β. The first two columns show the strategy used and the value of the intensification parameter β. The next four columns show the minimum design cost for different levels of traffic disturbance (in thousands). The relative traffic disturbance is given in the row labeled ΔR, where for example 0.10 means that at most 10% of the total demand is allowed to be congested or not satisfied during a single link failure. The last four columns show the average design cost for the same levels of traffic disturbance.

In Table 1, strategy C with β = 1.15 (denoted by C(1.15)) and strategy B with β = 1.05 (denoted by B(1.05)) yield the lowest design costs. If we rely mostly on the averages, C(1.15) seems to be the most promising method. Furthermore, in the cases when B(1.05) is the best, C(1.15) is nearly as good, but when C(1.15) is best, B(1.05) is significantly worse. Strategy A, which is a more purely random search method, is never better than the other two.

Computational results are also visualized in Fig. 3. This cost/traffic disturbance diagram shows all frontiers of undominated solutions found in the five runs with C(1.15) (as circles) and the five runs with C(1.05) (as crosses). The diagram shows that solutions obtained with C(1.15) dominate those obtained with C(1.05) for most levels of traffic disturbance.
The figure also indicates that C(1.05) sometimes performs very poorly for low levels of traffic disturbance. This happens when the algorithm generates a network with very few open links in an early iteration. Such networks are likely to be undominated for larger values of Δ, since links are costly to open, but cannot be made fully survivable, since this requires link-disjoint paths between origins and destinations. The frontier of undominated solutions will therefore be of high quality for larger values of Δ, and of worse quality for smaller values of Δ. Improving the frontier for small values of Δ can only be done if networks with a larger number of open links are generated. Improvements are found in the second phase of the algorithm, since the first phase sets the link capacities close to the perfect state traffic, which gives


Fig. 3 Frontiers of undominated solutions found when strategy C is used in 30 minutes, β = 1.05 represented by crosses and β = 1.15 represented by circles (design cost ×10^5 vs. relative traffic disturbance)

Fig. 4 The potential network, the least and the most survivable solution (least survivable: design cost 284912, relative traffic disturbance 0.507752; most survivable: design cost 430506, relative traffic disturbance 0)
large Δ values. However, phase two is only performed if the network is less than β times more costly than the undominated solution with a similar Δ value. Due to the large fixed charges in this problem, the second phase of the algorithm will not be performed for all networks if β is too small. This explains why β = 1.15 seems to be preferable for small values of Δ.

Figure 4 shows the topology of the potential network and the topologies of two undominated solutions found by the algorithm, namely the least and the most survivable solution, when the fixed charges are large compared to the costs for capacity. With this cost structure it is not surprising that these graphs form a tree and a 2-edge-connected graph, respectively. The structure of these networks indicates that reasonable solutions are generated in terms of network topology.

Table 2 summarizes the computational results obtained when less dominating fixed charges are used. Design costs are given in thousands. Here we find that strategy C is more dominating. The method C(1.15) is best or very close to best in all cases, while C(1.05) and B(1.15) are best in a few cases but clearly worse in others. The differences are however smaller than in the case with larger fixed charges. Figure 5 shows the frontiers of undominated solutions obtained from strategy C, and indicates that the difference between the frontiers obtained with C(1.05) (crosses) and C(1.15) (circles) is less significant than in Fig. 3. However, C(1.15) seems to give the best frontier also here. Figure 6 shows the topologies of the least and most survivable solutions found by strategy C, and as expected the most survivable graph contains a larger number of open links than the corresponding graph in Fig. 4.


Table 2 Computational results obtained in 30 minutes, with smaller fixed charges

                 Lowest design cost                Average design cost
Str.   β      ΔR=0    0.05    0.10    0.20      ΔR=0    0.05    0.10    0.20
A      1.05   116.9   109.8    99.6    88.5     123.5   113.2   103.0   891.7
B      1.05   116.3   107.0    99.1    87.1     121.4   111.6   102.1   900.4
C      1.05   116.7   107.7    98.3    86.9     120.7   111.6   102.4   883.3
A      1.15   117.1   109.8   100.7    87.5     119.4   111.2   102.4   887.3
B      1.15   115.4   107.1    98.5    88.1     117.6   109.4   100.1   893.3
C      1.15   114.6   106.2    98.8    87.1     116.9   108.4   100.4   881.8

Fig. 5 Frontiers of undominated solutions found when strategy C is used in 30 minutes, β = 1.05 represented by crosses and β = 1.15 represented by circles

Fig. 6 The least and the most survivable solution (least survivable: design cost 69394, relative traffic disturbance 0.539171; most survivable: design cost 114575, relative traffic disturbance 0)

There is a large difference in the number of iterations performed by the different strategies. Strategy A performs around 200,000 iterations in 30 minutes of running time, approximately four times as many as strategy B and three times as many as strategy C. This is because very few shortest path trees are affected by a small change of a weight, while a larger number of shortest path trees are affected by larger changes. The reason why strategy C can do a larger number of iterations than strategy B in a given time is that a smaller κ is used when weights are decreased.

We summarize these initial experiments by concluding that the best results are obtained using strategy C, i.e. when link weights are changed in sequences and a smaller neighborhood is used when weights are decreased. This strategy increases weights by larger values, which makes it possible to investigate more different parts of the search space. This advantage seems


Table 3 Computational results of the 30 node instance in 30 minutes running time

                  Lowest design cost              Average design cost
Str.    β      ΔR=0   0.05   0.10   0.20      ΔR=0   0.05   0.10   0.20     Iter
A       1.05   1.54   1.24   1.17   1.15      1.55   1.32   1.25   1.22     8835
B1000   1.05   1.50   1.28   1.22   1.20      1.53   1.32   1.29   1.24     4129
C1000   1.05   1.45   1.26   1.18   1.13      1.48   1.28   1.21   1.18     5538
A       1.15   1.49   1.33   1.25   1.21      1.53   1.34   1.26   1.23     8447
B1000   1.15   1.46   1.24   1.18   1.15      1.48   1.29   1.22   1.19     4071
C1000   1.15   1.50   1.23   1.16   1.14      1.51   1.29   1.23   1.20     5556
A       2.00   1.48   1.30   1.25   1.24      1.52   1.34   1.25   1.24     8895
B100    2.00   1.46   1.25   1.19   1.17      1.48   1.28   1.26   1.19     7701
C100    2.00   1.43   1.24   1.17   1.14      1.46   1.28   1.22   1.18     8565

to compensate for the disadvantage that a smaller number of iterations can be performed. Furthermore, β = 1.15 seems to give better results than β = 1.05.

5.4. Results for a larger instance

A potential network of more realistic size is used in the remainder of this paper. The potential network consists of 30 nodes and 148 links, with demands for 144 commodities. The fixed charges for the links are fairly large compared to the cost for bandwidth, but not as dominating as in the first case in the previous section. Our earlier investigations show that the solution method sometimes performs poorly when β = 1.05. This is the main reason for extending our computations to include β = 2, which in practice means that intensification is hardly used. Moreover, strategies B and C may use a different value of wmax in these experiments. The value of wmax is used as a subscript, e.g. B100 means that strategy B is used with wmax = 100.

Computational results are displayed in Table 3, where design costs are given in millions. The last column in this table shows the number of iterations performed by the algorithm (median value over 5 runs). The number of iterations performed is obviously affected by the size of the potential network, but the results obtained with β = 1.05 and β = 1.15 indicate that the number of iterations is not affected very much by the intensification parameter. If the parameter wmax is decreased, the size of the weight space is decreased, and each iteration takes less time. On the other hand, certain solutions might be removed from consideration, since the corresponding weight settings are not allowed. We have tried reducing wmax from 1000 to 100 and increasing β from 1.15 to 2.00. This results in an increased number of iterations and, more importantly, often better results. In general the results obtained by strategies B and C are better than those obtained by strategy A.
It seems that undominated solutions are found mainly when weights are increased in sequence. Moreover, using a smaller neighborhood when weights are decreased implies that a larger part of the limited running time is used in the sequence of increasing weights. This favors strategy C over strategy B. Based on the average values in Table 3, we find that C100(2.00) yields the best results.

Settings A, B1000 and C1000 have also been used with a running time of 10 hours, and the frontiers of undominated solutions are shown in Fig. 7. This figure verifies our earlier conclusion that strategy A is dominated by strategies B and C. Furthermore, strategies B and


Fig. 7 Frontiers of undominated solutions found within 10 hours running time, for strategies A, B and C (design cost ×10^6 vs. relative traffic disturbance)

Fig. 8 Comparison between the best frontier obtained when strategy C is used in 30 minutes with the one obtained in 10 hours
C have similar performance for low levels of traffic disturbance, while strategy C is better than strategy B for higher levels of traffic disturbance.

In Fig. 8 we compare the best frontier obtained when strategy C was used in 30 minutes with the one obtained in 10 hours. Running the method for a longer time yields improvements for high and low levels of traffic disturbance, but for intermediate levels the differences are quite small. The randomness of the method even makes the solutions obtained in 30 minutes better than those obtained in 10 hours in some cases. Solutions with very large values of traffic disturbance are probably less interesting in practice, so we conclude that in some cases a running time of 30 minutes may be enough to yield useful solutions.

There are presently no other known methods for this problem, so we have no other results to compare with. The resulting networks have reasonable topologies for different cost structures. Strategy C has throughout these computations given good solutions compared to the other strategies, so our conclusion is that strategy C constitutes a reasonable solution method for this hard problem.


6. Conclusions and future work

We consider the problem of designing survivable IP networks in which demands are routed in accordance with the OSPF protocol. Four different measures of survivability are introduced, and one of them is used in a multiobjective model. A new set of constraints for modeling load balancing within OSPF is presented. Test problems with different types of cost structures have been generated, and feasible solutions are found using a weight-based multiobjective search method. Three different strategies have been tested, and the best results are obtained when link weights are changed in sequences by larger values. The solutions found by the algorithm look reasonable from the viewpoint of network topology, so we claim that we have a useful solution method for the problem.

Future work aims at developing other solution methods for this problem, as well as considering other mathematical formulations. Another line of research is to obtain lower bounds for the objective function values, as they would allow us to verify the quality of the solutions, and possibly insert the method into a branch-and-bound framework.

Acknowledgments The authors would like to acknowledge funding from the Swedish Research Council for this research project.

References

Ahuja, R.K., T.L. Magnanti, and J.B. Orlin. (1993). Network Flows: Theory, Algorithms and Applications. Prentice Hall.
Bley, A., M. Grötschel, and R. Wessäly. (2000). "Design of Broadband Virtual Private Networks: Model and Heuristics for the B-WiN." In N. Dean, D.F. Hsu, and R. Ravi (Eds.), Robust Communication Networks: Interconnection and Survivability, volume 53 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, AMS, pp. 1–16.
Buriol, L., M. Resende, C. Ribeiro, and M. Thorup. (2003). "A Hybrid Genetic Algorithm for the Weight Setting Problem in OSPF/IS-IS Routing." Technical Report TD-5NTN5G, AT&T Labs Research, USA.
Ericsson, M., M. Resende, and P. Pardalos. (2002). "A Genetic Algorithm for the Weight Setting Problem in OSPF Routing." Journal of Combinatorial Optimization, 6, 299–333.
Fortz, B. and M. Thorup. (2000). "Internet Traffic Engineering by Optimizing OSPF Weights." In Proceedings of IEEE INFOCOM '00, vol. 2, pp. 519–528.
Fortz, B. and M. Thorup. (2003). "Robust Optimization of OSPF/IS-IS Weights." In Proceedings of INOC 2003, pp. 225–230.
Holmberg, K. and D. Yuan. (2000). "A Lagrangean Heuristic Based Branch-and-Bound Approach for the Capacitated Network Design Problem." Operations Research, 48, 461–481.
Holmberg, K. and D. Yuan. (2004). "Optimization of Internet Protocol Network Design and Routing." Networks, 43(1), 39–53.
Moy, J. (1998). OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley.
