Mathematical Decomposition Techniques for Distributed Cross-Layer Optimization of Data Networks

Björn Johansson and Mikael Johansson

Technical Report Stockholm, Sweden 2005

IR-S3-RT-0501


Mathematical Decomposition Techniques for Distributed Cross-Layer Optimization of Data Networks
Björn Johansson, Student Member, IEEE, and Mikael Johansson, Member, IEEE

Abstract—Network performance can be increased if the traditionally separated network layers are jointly optimized. Recently, network utility maximization has emerged as a powerful framework for studying such cross-layer issues. In this paper we review and explain three distinct techniques that can be used to engineer utility-maximizing protocols: primal, dual, and cross decomposition. The techniques suggest layered, but loosely coupled, network architectures and protocols where different resource allocation updates should be run at different timescales. The decomposition methods are applied to the design of fully distributed protocols for two wireless network technologies: networks with orthogonal channels and network-wide resource constraints, as well as wireless networks where the physical layer uses spatial-reuse TDMA. Numerical examples are included to demonstrate the power of the approach.

Index Terms—Wireless networks, optimization, congestion control, power control, scheduling, cross-layer protocol design.

I. INTRODUCTION

(This paper is partially based on research presented in [1], [2] and [3]. This research was sponsored in part by the European Commission, the Swedish Research Council, and Wireless@KTH. The authors are with the School of Electrical Engineering, Royal Institute of Technology (KTH), 100 44 Stockholm, Sweden. Email: {bjorn.johansson | mikael.johansson}@s3.kth.se)

In order to meet the increased demand for high-bandwidth network services, it has become increasingly important to develop network control mechanisms that utilize the full capabilities of all of the layers. However, in many network technologies, including optical and wireless networks, there is an inherent coupling between different layers. Changing the resource allocation in the physical layer alters the average link rates, influences the optimal routing, and determines the achievable network utility. Under such coupling, optimizing within layers only will not be enough to achieve the optimal network performance; the congestion control, routing, and radio layer control mechanisms need to be jointly designed. Recently, network utility maximization (NUM) has emerged as a powerful framework for studying such cross-layer issues (e.g., [4], [5], [6], [7], [8]). Although utility maximization is a mature subject in disciplines such as economics (e.g., [9]), its application to data networks was pioneered by Kelly et al. [4] and by Low and Lapsley [5]. The initial work in the networking literature focused on understanding various network control schemes (e.g., TCP/AQM variants) in the fixed Internet as algorithms for solving a performance optimization problem, but it has also been used to engineer new congestion

control schemes, notably TCP FAST. The literature on utility maximization for networks with fixed link capacities is vast, and it is fair to say that there is now a relatively complete understanding of both equilibrium properties as well as the dynamics of Internet congestion control (see, e.g., [10] for a recent survey). During the last couple of years, the basic model has been extended to include the effects of the physical layer and a number of cross-layer optimal protocols have been suggested for different wireless technologies (e.g., [8], [7], [11], [2], [3]). However, one may argue that there has been limited innovation in terms of theoretical tools; almost all protocols have been designed using variations of the dual decomposition techniques employed in the initial work by Low and Lapsley. One of the key contributions of this paper is to extend the theoretical toolbox available for studying NUM problems by giving an accessible, yet relatively comprehensive, overview of three alternative decomposition schemes from mathematical programming. We demonstrate how the decomposition schemes suggest network architectures and protocols with different properties in terms of convergence speed, coordination overhead and the time-scale on which various updates should be carried out. Moreover, the techniques allow us to find distributed solutions to problems where the dual decomposition approach is not immediately applicable. Although these ideas have been pursued by the authors in a sequence of papers [1], [2], [3], we note that similar ideas have recently and independently been put forward by Palomar and Chiang [12]. The second key contribution is to show how the alternative decomposition techniques can be applied to design novel distributed protocols for two wireless network technologies: networks with orthogonal channels and network-wide resource constraints, and wireless networks where the physical layer uses spatial-reuse TDMA. The paper is organized as follows. 
In Section II, we present two networking problems which will be solved in the paper. Section III contains a review of three alternative decomposition principles from mathematical programming applied to the network utility maximization problem. Section IV briefly reviews the main points of congestion control for networks with fixed link capacities. Section V discusses different architectural considerations and protocol properties associated with the different decomposition techniques, while Sections VI and VII demonstrate how the techniques can be used to devise distributed algorithms for the two network scenarios. Considerable attention is given to the design of distributed resource allocation


problems that work in harmony with the congestion control in order to maximize network utility. Finally, Section VIII concludes the paper. Mathematical background and proofs are collected in the appendix.

II. NETWORK UTILITY MAXIMIZATION

We consider a communication network formed by a set of nodes located at fixed positions. Each node is assumed to have infinite buffering capacity and can transmit, receive and relay data to other nodes across communication links. The network performance then depends on the interplay between end-to-end rate selection in the transport layer, routing in the network layer, and resource allocation in the physical layer. We model the network topology as a graph with L directed links shared by P source-destination pairs. To each source, we associate an increasing and strictly concave function u_p(s_p) which measures the utility source p has of sending at rate s_p, and let u(s) = Σ_{p=1}^{P} u_p(s_p) denote the aggregate utility. We assume that data is routed along fixed paths, represented by a routing matrix R = [r_lp] with entries r_lp = 1 if source p uses link l and 0 otherwise. The optimal network operation can be found by solving the network utility maximization problem

   maximize    u(s)
   subject to  Rs ⪯ c,   s ∈ S,   c ∈ C                       (1)

in variables s and c. In words, the problem is to maximize aggregate utility by jointly choosing s and c, subject to the constraint that the total traffic across each link must be below the offered link rate (Rs ⪯ c) and restrictions on the admissible end-to-end rates and link transmission rates (s ∈ S, c ∈ C). Specifically, the vector of end-to-end rates s must lie in a convex set S, typically of the form S = {s | s_min ⪯ s ⪯ s_max} or S = {s | s_min ⪯ s}, while the capacity vector must lie in the (convex) multi-user capacity region C of the system. Any pair (s, c) that satisfies the constraints of (1) is said to be feasible, and corresponds to an admissible network operation. We will make the following technical assumptions.

ASSUMPTION A
 i) The network is connected.
 ii) The utility functions u_p(s_p) are strictly concave, differentiable, and increasing, with lim_{s_p → 0} u_p(s_p) = −∞.
 iii) The problem is feasible and a strictly interior point exists.

The class of problems that fit into (1) is rather large, and to arrive at specific results we will focus on two (still quite broad) particular cases of (1). These problem classes are practically relevant and have an underlying structure that allows us to go all the way from new theoretical tools to novel distributed solutions to the utility maximization problem.

A. Example: Network-wide resource constraints

The model (1) is rather abstract, as it hides the complexity of optimizing the link rate vector, e.g., allocating communications resources such as time-slots in a transmission schedule, transmission rates, powers, and bandwidths. In some cases it is therefore more natural to be explicit about the dependence of the link rates on the resources, and use a model of the form

   maximize    u(s)
   subject to  Rs ⪯ c(ϕ),   s_min ⪯ s
               Σ_l ϕ_l ≤ ϕ_tot,   0 ⪯ ϕ                        (2)

where s and ϕ have to be found jointly to maximize the aggregate utility, and where the link capacities are assumed to depend on a resource with a network-wide constraint (Σ_l ϕ_l ≤ ϕ_tot). If the resources are local, it is easy to find a distributed algorithm that solves the problem. More surprisingly, it is also possible to solve the optimization problem in a distributed way even with network-wide resource constraints [2]. The problem (2) is interesting since the equations can describe a wireless or optical network that uses orthogonal channels and supports dynamic allocation of spectrum between transmitters. This type of network is the main driver for solving (2).
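To fix ideas, the ingredients of (2) can be written out for a small hypothetical instance. The topology, the capacity model c(ϕ) = log(1 + ϕ), and all numbers below are our own illustrations, not data from the paper:

```python
import numpy as np

# Hypothetical 2-link, 2-source instance of problem (2).
R = np.array([[1.0, 1.0],    # link 1 is used by both sources
              [0.0, 1.0]])   # link 2 is used by source 2 only
phi_tot = 4.0                # network-wide resource budget

def c(phi):
    # Strictly concave, increasing link capacities with c(0) = 0.
    return np.log(1.0 + phi)

def u(s):
    # Aggregate log-utility (proportional fairness).
    return np.sum(np.log(s))

def feasible(s, phi):
    # Check the constraints of (2): Rs <= c(phi), sum(phi) <= phi_tot, phi >= 0.
    return bool(np.all(R @ s <= c(phi) + 1e-12)
                and phi.sum() <= phi_tot + 1e-12
                and np.all(phi >= 0))

s = np.array([0.4, 0.4])
phi = np.array([2.0, 2.0])
print(feasible(s, phi), u(s))
```

The point of the model is that s and ϕ are coupled only through the constraint Rs ⪯ c(ϕ); the decomposition techniques of Section III exploit exactly this structure.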

B. Example: Distributed scheduling

In other cases it is fruitful to keep the abstract structure of (1) but to restrict the capacity region to a smaller set. This can be done in the following way:

   maximize    u(s)
   subject to  Rs ⪯ c,   s_min ⪯ s,   c ∈ C_1                  (3)

where s and c have to be found jointly to maximize the aggregate utility, and where C_1 is a convex polytope, the convex hull of a finite set of points in R^L. This looks very similar to the original problem, but the switch to a convex polytope as the feasible region, instead of only demanding the region to be convex, will prove to be crucial. The problem (3) is interesting since the equations can describe a network which employs (possibly spatial-reuse) TDMA. This is the type of network we have in mind when we solve (3).

III. MATHEMATICAL DECOMPOSITION TECHNIQUES

In this section we will review some basic decomposition techniques from mathematical programming and demonstrate how they can be applied to the network utility maximization problem. Although several surveys of decomposition techniques have been written in the mathematical programming community, e.g., [13], [14], [15], their focus is typically on exploiting problem structure to improve computational efficiency. Our focus is different: we use mathematical decomposition techniques as a guiding principle for protocol engineering. Rather than trying to subdivide the network utility maximization into subproblems that can be solved efficiently (say, in terms of memory or CPU cycles), we use decomposition techniques to divide the optimization of a network-wide performance measure into functions that can be executed in a distributed manner, preferably using existing protocols (or slight variations thereof). A crucial observation is that the link rates c (or the resource allocation ϕ) are complicating variables in the sense that if the link rates are fixed, the problem is simply a network flow problem which can be solved using the techniques from [5],


[4]. To emphasize this fact, we will sometimes refer to the link rate constraints as coupling constraints. This terminology is also natural from a communications point-of-view, as these constraints couple the physical layer and the network layer. Below, we will review three classes of decomposition principles: dual, primal, and primal-dual. We use primal and dual in their mathematical programming meaning: primal indicates that the optimization problem is solved using the original formulation and variables, and dual indicates that the original problem has been rewritten using Lagrangian relaxation. Contrary to most literature on mathematical programming, which focuses on convex minimization problems, we present all results in the framework of concave maximization.

A. Dual decomposition

Dual decomposition is sometimes referred to as price directive decomposition. The name comes from the economic interpretation that the network is directed towards its optimal operation by pricing the common resources. The optimal operation of the network layer is then to maximize utility minus the total resource cost, while the physical layer should attempt to maximize total revenue. Constraints on the common resource are not explicitly enforced, but demand is aligned with supply using a simple pricing strategy: increase the prices on resources that are in shortage and decrease the prices of resources that are in excess. From a mathematical perspective, we apply Lagrange duality to the coupling constraint of (1), Rs ⪯ c, and form the partial Lagrangian

   L(s, c, λ) = u(s) − λᵀRs + λᵀc

The dual function is defined as

   g(λ) = max_{s∈S, c∈C} L(s, c, λ)

We have g(λ) ≥ u⋆, where u⋆ denotes the primal optimal objective value. Intuitively, the coupling constraint is not present as such, but accounted for using the pricing scheme. We can think of g(λ) as an optimistic estimate of the total utility. Note that g(λ) is separable and can be written as

   g(λ) = max_{s∈S} {u(s) − λᵀRs}  +  max_{c∈C} {λᵀc}
          (network subproblem)        (resource allocation subproblem)

Thus, to evaluate the dual function for a given price vector we need to solve a network subproblem (which coincides with the end-to-end rate allocation problem in optimization flow control) and a resource allocation subproblem. The optimal prices are obtained by solving the dual problem

   minimize    g(λ)
   subject to  λ ⪰ 0

The interpretation is that the network should adjust prices to their optimal values. The minimization tries to recreate the effect of link-rate constraints on the relaxed problem. The optimal objective value of the dual problem is denoted d⋆. If the primal optimal value equals the dual optimal value,

d⋆ = u⋆, then there is no duality gap, strong duality holds, and it does not matter which problem we solve. One condition that guarantees strong duality is Slater's condition: if there exists a strictly feasible point, then strong duality holds. See [16, chapter 5] and [17] for more details on duality.

PROPOSITION 1 Let λ(k) ⪰ 0 be given, and let (s(k), c(k)) be associated optimal solutions to the network and resource allocation subproblems, respectively. Then a subgradient of g at λ(k) is given by Rs(k) − c(k).
Proof: See [17, section 8.1].

Thus, the dual problem can be solved using the following subgradient iteration.

PROPOSITION 2 Let α(k) be a sequence of diminishing stepsizes according to (31), and let the subgradients be bounded by some constant C. Then the subgradient iteration

   λ(k+1) = P_Λ{λ(k) + α(k)(Rs(k) − c(k))}                      (4)

where P_Λ{·} denotes projection on the nonnegative orthant, converges in the sense that lim_{k→∞} g(λ(k)) = u⋆.
Proof: See the subgradient section in the appendix.

From now on, P_K{·} will denote projection on the convex set K. There are a number of issues in applying dual decomposition in a distributed setting. The first one comes from the fact that the above result only holds for diminishing step-size sequences. From the appendix, we know that convergence can also be obtained for constant step-sizes provided that the dual function is differentiable and Lipschitz continuous. The following result describes differentiability properties of the NUM dual.

PROPOSITION 3 Let S and C be compact sets. If the maximization defining the dual function g(λ) has a unique solution (s, c) for every λ ⪰ 0, then the dual function is everywhere continuously differentiable.
Proof: See [17, proposition 8.1.1].

Another, maybe even more critical, issue is that the primal variables (the end-to-end rate and link rate allocations) obtained for a given link price vector λ may not be feasible, even if the dual variables are set to their optimal values [15].
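As an illustration of iteration (4), consider a hedged toy instance of our own: two log-utility sources sharing a single link whose capacity region is the interval C = [0, 1]. All numbers are hypothetical:

```python
import numpy as np

# Toy dual decomposition: two sources share one link, C = [0, c_max].
c_max = 1.0
s_min, s_max = 1e-3, 5.0

lam = 1.0                       # initial link price
for k in range(2000):
    # Network subproblem: s_p = argmax { log(s) - lam*s } = 1/lam, clipped to S.
    s = np.clip(1.0 / lam, s_min, s_max) * np.ones(2)
    # Resource allocation subproblem: argmax_{c in C} lam*c = c_max for lam > 0.
    c = c_max
    # Price update (4) with diminishing step size.
    alpha = 2.0 / (k + 1)
    lam = max(lam + alpha * (s.sum() - c), 0.0)

# The price converges toward lam = 2, where each source sends s_p = 0.5
# and demand exactly matches the offered capacity.
print(lam, s)
```

Note that in this example the resource subproblem always returns the extreme point c_max; the price iteration alone reconciles demand with supply, which is exactly the pricing interpretation given above.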
The reason for this is that the dual function may be non-smooth, in which case the optimal primal solution is typically a nontrivial convex combination of extreme subproblem solutions. Within linear programming, this property has been referred to as the non-coordinability phenomenon. In off-line schemes, this problem can be circumvented if one can devise a method for supplying a feasible primal solution (see [6] for a simple primal heuristic for the problem at hand). The following result gives conditions for convergence of the subsystem solutions.

PROPOSITION 4 If the Lagrangian is maximized by a unique (s, c) for every λ ⪰ 0, then the subsystem solutions (s(k), c(k)) produced by (4) converge to their primal optimal values.


Proof: See the proof section in the appendix.

There are several approaches for attaining primal convergence in dual decomposition schemes. One approach is to add a strictly concave term to the maximization objective, as is done in proximal point methods (see, e.g., [18]). The original problem is then replaced by the equivalent formulation

   maximize    u(s) − ε‖c − c̃‖₂²
   subject to  Rs ⪯ c,   c ∈ C,   c̃ ∈ R^L

This makes the dual function smooth, and convergence of the primal variables in the limit follows. For centralized optimization problems, one may also impose primal convergence by solving a master problem (see [19]); however, since the master problem is typically a large-scale convex optimization problem, the approach appears less applicable to protocol engineering. Another alternative is based on forming weighted averages of subproblem solutions in a way that guarantees convergence to the optimal primal variables in the limit [20]; however, since the iterates themselves may not be well-behaved, this approach is not always desirable.

B. Primal decomposition

Primal decomposition is also called resource-directive decomposition. Rather than introducing a pricing scheme for the common resources, the primal decomposition approach sequentially updates the resource allocation to maximize the total network utility. To perform the redistribution of resources, links need to estimate the marginal improvement of network performance they could provide, given an increase in resource allocation. It turns out that this information can be obtained from the Lagrange multipliers, or shadow prices, of the optimization flow control problem: a high shadow price indicates that a large increase in utility could be obtained by allocating more resources to the link; a small shadow price indicates the opposite. Mathematically, we rewrite the optimization problem in terms of the primal function

   ν(c) = max {u(s) | Rs ⪯ c, s ∈ S}

The primal function is simply the stationary performance of optimization flow control for the given resource allocation. Under the explicit model (2), it is more natural to consider the primal function as a function of ϕ, i.e.,

   ν(ϕ) = max {u(s) | Rs ⪯ c(ϕ), s ∈ S}

Note that the primal function is a pessimistic estimate of the achievable network utility, since the resource allocation may be fixed at sub-optimal values.
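A minimal numerical sketch of this resource-directive idea, under the simplifying assumption of two parallel links with one log-utility source each, so that the inner flow problem and its shadow prices have closed forms (all data hypothetical):

```python
import numpy as np

# Resource-directive sketch: two parallel links, one log-utility source per
# link, and a total-capacity budget c1 + c2 = c_tot.
c_tot = 2.0

def inner_flow_control(c):
    # With u_p = log and one source per link, the inner problem
    # nu(c) = max { sum log s_p | s_p <= c_p } is solved by s = c,
    # and the shadow prices are lam_p = u_p'(s_p) = 1/c_p.
    s = c.copy()
    lam = 1.0 / c
    return s, lam

c = np.array([0.5, 1.5])        # initial resource split
alpha = 0.05
for _ in range(500):
    s, lam = inner_flow_control(c)
    # Subgradient step along the shadow prices, then projection back onto
    # the affine set {c : c1 + c2 = c_tot} by removing the mean excess.
    c = c + alpha * lam
    c -= (c.sum() - c_tot) / c.size

# The outer update equalizes shadow prices, driving c toward (1, 1).
print(c)
```

The outer loop moves capacity from the link with the low shadow price to the link with the high one, which is precisely the redistribution behavior described above.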
The optimal network utility can be found by solving the primal problem

   maximize    ν(c)
   subject to  c ∈ C

Although the primal function is potentially non-smooth, a subgradient of the primal function is given by the following proposition.

PROPOSITION 5 Let λ be a vector of optimal dual variables for the optimization flow control problem, and assume that the allocated capacity c is such that there exists a strictly feasible flow vector s, i.e., ∃s ∈ S with Rs ≺ c. Then a subgradient of ν at c is given by λ.
Proof: See [17, section 6.5.3].

Hence, we can invoke the subgradient results from the appendix and state the following result.

PROPOSITION 6 Let α(k) be a sequence of diminishing stepsizes fulfilling (31), and let the subgradients be bounded by some constant C. Then the subgradient iteration

   c(k+1) = P_C{c(k) + α(k)λ(k)}

converges in the sense that lim_{k→∞} ν(c(k)) = u⋆ and (c(k), s(k)) → (c⋆, s⋆).

Proof: See the subgradient section in the appendix.

Contrary to dual decomposition, primal decomposition guarantees primal convergence by construction. However, differentiability and Lipschitz continuity are still important if we would like to guarantee convergence to the optimal point with a constant step-size.

PROPOSITION 7 The primal function ν(c) is differentiable at c with derivative λ if the corresponding dual problem has a unique solution λ.
Proof: See [18, proposition 6.2.1].

C. Primal-dual decomposition

In primal-dual decomposition schemes, one tries to exploit both primal and dual problem structure. One class of methods, sometimes called mixed decomposition, applies price- and resource-directive decomposition to different components within the same system [21]. We will make use of an alternative decomposition scheme, called cross decomposition [22]. In essence, cross decomposition is an alternating price-directive and resource-directive decomposition approach: one alternates between the primal and dual subproblems (stated as optimization problems below), and there is no master problem involved. In general, the pure cross decomposition approach does not converge. However, mean value cross decomposition (MVC), in which the averages of all previous subproblem solutions are used, does converge, as has recently been established in [23]. The MVC algorithm [23] solves the following problem:

   maximize    u(s) + v(c)
   subject to  A1(s) + B1(c) ⪯ b1                              (5)
               A2(s) + B2(c) ⪯ b2                              (6)
               s ∈ S,   c ∈ C

where u(s) and v(c) are concave, A1(s), A2(s), B1(c), and B2(c) are convex functions, and S and C are convex and compact sets. It is also assumed that for any c ∈ C there exists a strictly interior point, implying that strong


duality holds for the two coupling constraints. Define the partial Lagrangian as

   L(s, c, λ) = u(s) + v(c) − λᵀ(A1(s) + B1(c) − b1)

and define K(c, λ) for any c ∈ C and λ ⪰ 0 as

   K(c, λ) = max_{s∈S} {L(s, c, λ) | A2(s) + B2(c) ⪯ b2}

The primal subproblem is defined as

   minimize    K(c, λ)
   subject to  λ ⪰ 0

and the dual subproblem is defined as

   maximize    K(c, λ)
   subject to  c ∈ C                                           (7)

Using strong duality (applicable by assumption), the primal subproblem can be rewritten as

   maximize    u(s) + v(c)
   subject to  A1(s) + B1(c) ⪯ b1
               A2(s) + B2(c) ⪯ b2
               s ∈ S                                           (8)

The MVC algorithm is now as follows:

ALGORITHM 1 (MVC) Let c̄(0) ∈ C and λ̄(0) ⪰ 0. At step k:
 • Solve the primal subproblem (8) for c̄(k−1) to obtain λ(k).
 • Solve the dual subproblem (7) for λ̄(k−1) to obtain c(k).
 • Update the averages via

      λ̄(k) = (1/k) Σ_{i=1}^{k} λ(i),    c̄(k) = (1/k) Σ_{i=1}^{k} c(i)

 • Go to step k + 1.

PROPOSITION 8 Under the assumptions given, the MVC algorithm 1 converges to the optimal point, i.e., lim_{k→∞} c̄(k) = c⋆ and lim_{k→∞} λ̄(k) = λ⋆.
Proof: See [23].

Moreover, for the cross-layer network utility maximization problem, convergence can still be established if one uses the latest dual variables and only averages over the subproblem solutions [3].

D. Saddle-point computations and min-max games

An alternative path to finding primal-dual optimal solutions to the cross-layer NUM problem goes via the saddle-point characterization of optimal points. By weak duality, we have

   u̲ = max_{s∈S, c∈C} min_{λ⪰0} L(s, c, λ) ≤ u⋆ ≤ min_{λ⪰0} max_{s∈S, c∈C} L(s, c, λ) = ū

This inequality, known as the max-min inequality, simply states that the primal problem underestimates the optimal value, while the dual problem overestimates it. Under strong duality, the inequality holds with equality as the primal and dual optimal values are equal. The optimal point can then be given the following alternative characterization: (s⋆, c⋆, λ⋆) is a primal-dual optimal point to the cross-layer NUM problem if and only if (s⋆, c⋆) ∈ S × C, λ⋆ ⪰ 0 and (s⋆, c⋆, λ⋆) forms a saddle point of the Lagrangian, in the sense that

   L(s, c, λ⋆) ≤ L(s⋆, c⋆, λ⋆) ≤ L(s⋆, c⋆, λ)

for all (s, c) ∈ S × C, λ ⪰ 0 (cf. [18, Proposition 5.1.6]). In other words, λ⋆ minimizes L(s⋆, c⋆, λ) while (s⋆, c⋆) maximizes L(s, c, λ⋆). One of the most well-known algorithms for finding saddle points is the algorithm due to Arrow and Hurwicz [24]:

   s(k+1) = P_S{s(k) + α(k)∇_s L(s(k), c(k), λ(k))}
   c(k+1) = P_C{c(k) + α(k)∇_c L(s(k), c(k), λ(k))}
   λ(k+1) = P_Λ{λ(k) − α(k)∇_λ L(s(k), c(k), λ(k))}

where α(k) is a step-length parameter. Although the iteration is not guaranteed to converge (unless one imposes the additional requirement of strict convexity-concavity [25]), it provides a unified view of the primal and dual decomposition methods. In particular, the decomposition schemes can be interpreted as methods that run the above updates on different time-scales. Primal decomposition lets the s and λ dynamics run on a fast time-scale (essentially, until convergence) while the resource updates are run on a slow time-scale. Similarly, dual decomposition can be seen as letting the s and c updates run on a fast time-scale, while the λ variables are updated slowly.

Further insight into the decomposition schemes can be gained from the following zero-sum game interpretation of the max-min inequality: consider a game where the dual player selects λ, while the primal player picks s, c and collects L(s, c, λ) dollars from the dual player. If the dual player goes first, he will try to minimize the amount that he can be forced to pay, i.e., he will let λ = arg min_{λ⪰0} max_{s∈S, c∈C} L(s, c, λ), resulting in the payoff ū. Conversely, if the primal player goes first, she will try to maximize the amount that the dual player is guaranteed to pay and thus let (s, c) = arg max_{s∈S, c∈C} min_{λ⪰0} L(s, c, λ), leading to the payoff u̲. The min-max inequality simply states that it is best to go second, ū ≥ u̲. In convex games with strong duality there is no advantage in going second, since the inequality holds with equality. The mean value cross decomposition can be seen as a repeated zero-sum game where the dual and primal players act alternatingly. During the course of the game, the players remember earlier moves and decide their own strategy under the assumption that the other will use the average strategy over the history of the game.

IV. CONGESTION CONTROL WITH FIXED LINK CAPACITIES

Since traditional congestion control, or its optimization flow control formulation, will be a basic building block for our novel schemes, this section contains a brief review of some of the main points of congestion control with fixed capacities. A mathematical formulation of a distributed end-to-end flow control scheme over TCP/IP has been developed in [4], [5].
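Before specializing to fixed link capacities, the mean value cross decomposition of Algorithm 1 can be illustrated on a one-link toy instance of our own (log utilities, v(c) = 0, C = [0, 1]; all numbers hypothetical):

```python
import numpy as np

# MVC sketch: two log-utility sources share one link, v(c) = 0, C = [0, 1],
# and the coupling constraint is s1 + s2 <= c.
c_max = 1.0
cbar, lambar = 0.5, 1.0          # initial averages cbar(0), lambar(0)
lams, cs = [], []

for k in range(1, 101):
    # Primal subproblem (8) with c fixed at cbar: maximize log s1 + log s2
    # s.t. s1 + s2 <= cbar. The capacity splits evenly, and the constraint's
    # multiplier is lam = u'(s_p) = 2 / cbar.
    lams.append(2.0 / cbar)
    # Dual subproblem (7) with lam fixed at lambar: maximize lambar * c over
    # C, which picks the extreme point c = c_max since lambar > 0.
    cs.append(c_max)
    # Update the running averages (the "mean value" step).
    lambar = np.mean(lams)
    cbar = np.mean(cs)

# The averages approach the saddle point: cbar -> 1, lambar -> 2.
print(cbar, lambar)
```

Notice that the raw dual-subproblem iterates are always extreme points; it is the averaging that produces convergent primal-dual trajectories, in line with Proposition 8.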


The objective is to maximize the aggregate network utility subject to the link capacity constraints. For wired networks the problem takes the following form:

   maximize    u(s)
   subject to  Rs ⪯ c,   s ∈ S                                 (9)

where the variables are collected in the end-to-end rate vector s, while the link capacity vector is assumed to be fixed. A distributed algorithm for this problem can be derived via dual decomposition, assuming that there exists a strictly interior point. Introducing Lagrange multipliers λ for the capacity constraints, we can form the Lagrangian

   L(s, λ) = u(s) + λᵀ(c − Rs)

Since the Lagrangian is separable in the end-to-end rates s_p, the dual function

   g(λ) = max_{s∈S} L(s, λ)

can be evaluated by letting sources optimize their rates individually based on the total congestion price, i.e., by letting

   s_p = arg max_{z∈S} {u_p(z) − q_p z}                        (10)

where q_p = [Rᵀλ]_p. Moreover, the dual problem to (9),

   minimize    g(λ)
   subject to  λ ⪰ 0

can be solved by the projected gradient iteration

   λ_l(t+1) = P_Λ{λ_l(t) + α(t)([Rs(t)]_l − c_l)}

where {α(t)} is a step length sequence. Note that links can update their congestion prices based on local information: if the traffic demand across link l exceeds capacity, the congestion price increases; otherwise it decreases. Convergence of the dual algorithm has been established in [5]. As argued in [5], [26], the equilibrium points of a wide range of TCP protocols can be interpreted in terms of sources maximizing their marginal utilities (utilities minus resource costs). Link algorithms generate prices to align the sources' selfish strategies with the global optimum. Most of the common TCP/AQM variants can be identified with different utility functions and different laws for updating the link prices.

V. DECOMPOSITION AS GUIDING PRINCIPLE FOR DISTRIBUTED CROSS-LAYER PROTOCOL DESIGN

Modern networked systems are designed for a trade-off between a multitude of objectives, including optimality of performance, simplicity and flexibility in implementation, operation and maintenance, as well as robustness to uncertainties and variations. Trying to master this complex trade-off often results in highly structured designs and implementations, such as the layered architecture of the OSI reference model. The approach advocated in this work relies on a mathematical network model that exposes the key interconnections between the network layers. Based on this model, we formulate the optimal network operation under user cooperation

and cross-layer coordination as a global network utility maximization problem. To transform the centralized optimization problem into distributed protocols, we must find efficient ways of guiding different functional modules and network elements towards the common goal. Inspiration for such coordination schemes can be found in mathematical decomposition techniques: applying mathematical decomposition to the global optimization problem allows us to identify critical information that needs to be communicated between nodes and across layers, and suggests how network elements should react to this information in order to attain the global optimum. In many cases, the underlying structure is such that the optimal rate and resource allocation schemes suggested by the decomposition methods reside in separate networking layers. The layers are only loosely coupled via a set of critical control parameters. As it turns out, the basic analysis suggests that these critical parameters are the Lagrange multipliers of the optimization problem or, in terms of physical parameters, the backlogs of the individual transmitter queues. An underlying assumption of this work is that if a solution procedure of decomposition type converges mathematically, then it corresponds to a possible network architecture. We might even go a step further and conjecture that a computationally efficient solution method corresponds to a better way of organizing the networking stack than a less computationally efficient method does. The different decomposition schemes allow us to develop network architectures and protocols with different properties in terms of convergence speed, coordination overhead, and the time-scale on which various updates should be carried out. In addition, they allow us to find distributed solutions to problems where the dual decomposition approach is not immediately applicable. We now demonstrate these ideas on the two problem classes described in Section II.

VI.
EXAMPLE I: NETWORKS WITH ORTHOGONAL CHANNELS AND NETWORK-WIDE RESOURCE CONSTRAINT

As a first application, we consider the design of utility-maximizing protocols for systems with orthogonal channels and a global resource constraint in the physical layer.

A. Optimality Conditions and Decomposition Approaches

The problem formulation (2) can be simplified using the properties of an optimal point. We will make the following assumptions.

ASSUMPTION B
 i) The channel capacities c_l(ϕ_l) are strictly concave, twice differentiable, and increasing, with c(0) = 0.
 ii) Every column and every row in the routing matrix R contains at least one positive entry.
 iii) The problem is feasible: Σ_{l=1}^{L} c_l^{−1}( Σ_{p=1}^{P} r_lp (s_min + ε) ) < ϕ_tot, where ε is a small positive number.

The routing matrix assumption implies that all sources are assumed to be sending and all links are used by at least one source. The assumption on s_min and ϕ_tot means that ϕ_tot is large enough to allow all sources to send just above the


minimum sending rate using the links indicated by the routing matrix. The optimal point can then be characterized as follows:

PROPOSITION 9 Under assumptions A and B, the optimal point, (s*, ϕ*, λ*), of (2) is characterized by

  Rs* = c(ϕ*),  Σ_l ϕ*_l = ϕtot
  λ*_l c'_l(ϕ*_l) = ν*,  ϕ*_l ≥ ϕmin,  l = 1, ..., L        (11)
  s*_p ≥ smin,  p = 1, ..., P

Proof: See the proof section in the appendix.

Thus, in the optimal solution to (2), the common resource is fully utilized, all links are bottlenecks, and the marginal link revenues λ*_l c'_l(ϕ*_l) are equal. One consequence of this is that it is possible to consider a simpler, but equivalent, problem

  maximize   u(s)
  subject to Rs ⪯ c(ϕ),  Σ_l ϕl = ϕtot        (12)
             smin ⪯ s ⪯ smax,  ϕmin ⪯ ϕ

The problems are equivalent in the sense that they share the same optimal solution. The crucial change is that Σ_l ϕl ≤ ϕtot has been changed to Σ_l ϕl = ϕtot. Moreover, some bounds have been introduced; the upper bound, smax, on s and the lower bound, ϕmin, on ϕ are technical conditions that do not change the optimal point but make the analysis simpler. This simpler problem will be solved using two approaches: dual and primal decomposition. We also introduce the sets S = {s | smin ⪯ s ⪯ smax} and Φ = {ϕ | ϕmin ⪯ ϕ, Σ_l ϕl = ϕtot}.

1) Dual approach: Introducing Lagrange multipliers λl, l = 1, ..., L, for the capacity constraints in (12), we form the partial Lagrangian

  L(s, ϕ, λ) = Σ_p u_p(s_p) − λ^T (Rs − c(ϕ))

and the associated dual function

  g(λ) = max_{s∈S} [ Σ_p u_p(s_p) − λ^T Rs ] + max_{ϕ∈Φ} λ^T c(ϕ)

Thus, the dual function decomposes into a network subproblem and a resource allocation subproblem. The network subproblem is identical to the source algorithm in optimization flow control [5], while the second subproblem can be dealt with using the algorithms in Section VI-B if one identifies Σ_l λ_l c_l(ϕ_l) with Σ_l f_l(ϕ_l). The corresponding Lagrange dual problem is

  minimize   g(λ)
  subject to λ ⪰ 0

Since the link capacities are assumed to be strictly concave, the partial Lagrangian is strictly concave in (s, ϕ) and the dual function is differentiable [18, Proposition 6.1.1], with

  ∇g(λ) = c(ϕ*(λ)) − Rs*(λ)

The dual variables could be updated using a projected gradient method if the gradient were known to be Lipschitz continuous. However, the gradient cannot be guaranteed to be Lipschitz, and we will therefore use a subgradient method with diminishing step size (see the subgradient section in the appendix):

  λ_l^(t+1) = P_Λ[ λ_l^(t) − α^(t)( c_l(ϕ_l^(t)) − [Rs^(t)]_l ) ]        (13)

This update can be carried out locally by each link based on its current excess capacity. In summary, the dual algorithm is:

ALGORITHM 2 (DUAL)
• In step k, find ϕ* by solving the resource allocation problem using the method in Section VI-B.
• Find s* by solving the network subproblem using the source algorithm (10) in Section IV.
• Use these optimal values to compute the gradient and update λ with (13). Go to step k + 1.
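To make the time-scale structure concrete, the following is a minimal numerical sketch of Algorithm 2 on a hypothetical two-link, two-source instance with log-utilities and capacities c_l(ϕ_l) = log(1 + ϕ_l); the topology, bounds, and step-size rule are illustrative assumptions, not part of the original design. Both subproblems are solved to optimality in every outer iteration before the prices are updated with (13).

```python
import numpy as np

# Toy instance: 2 links, 2 sources, u_p = log, c_l(phi_l) = log(1 + phi_l).
R = np.array([[1.0, 0.0],      # link 1 carries source 1
              [1.0, 1.0]])     # link 2 carries sources 1 and 2
phi_tot, phi_min = 10.0, 0.1
s_min, s_max = 1e-3, 10.0

def network_subproblem(lam):
    # max_s sum_p log(s_p) - lam^T R s  =>  s_p = 1/(R^T lam)_p, clipped to [s_min, s_max].
    return np.clip(1.0 / np.maximum(R.T @ lam, 1e-9), s_min, s_max)

def resource_subproblem(lam):
    # max_phi sum_l lam_l*log(1 + phi_l)  s.t.  sum(phi) = phi_tot, phi >= phi_min.
    # Optimality: lam_l/(1 + phi_l) = nu  =>  phi_l = max(lam_l/nu - 1, phi_min);
    # bisect on the multiplier nu of the sum constraint.
    lo, hi = 1e-9, max(lam.max(), 1e-9)
    for _ in range(60):
        nu = 0.5 * (lo + hi)
        phi = np.maximum(lam / nu - 1.0, phi_min)
        lo, hi = (lo, nu) if phi.sum() < phi_tot else (nu, hi)
    return phi

lam = np.ones(2)
for t in range(1, 2001):                       # Algorithm 2 main loop
    s = network_subproblem(lam)
    phi = resource_subproblem(lam)
    grad = np.log1p(phi) - R @ s               # c(phi*) - R s*, cf. (13)
    lam = np.maximum(lam - grad / t, 1e-9)     # projected step, diminishing size 1/t

print("rates s*  ~", s)
print("resources ~", phi)
```

The step sizes 1/t satisfy the diminishing-step-size conditions (31), and each price update uses only that link's own excess capacity, mirroring the local update (13).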

PROPOSITION 10 Under assumptions A and B, the dual algorithm (Algorithm 2) with step sizes according to (31) converges to the optimal solution, i.e., lim_{t→∞} ϕ^(t) = ϕ*, lim_{t→∞} s^(t) = s*, lim_{t→∞} λ^(t) = λ*.
Proof: See the proof section in the appendix.

Note that the optimal resource allocation and source rates can be found in parallel, but the optimal solutions to both subproblems must be found before the dual variables are updated. From a practical perspective, this approach has the disadvantage that resource allocations have to be carried out on a fast time-scale and that the resource allocation algorithm (at least in the most basic analysis) has to be executed to optimality before the dual variables can be updated.

2) Primal approach: We now continue with the primal decomposition approach. We re-write (12) as

  maximize   ν(ϕ)
  subject to Σ_{l=1}^L ϕl = ϕtot,  ϕmin ⪯ ϕ        (14)

where we have introduced

  ν(ϕ) = max_{s∈S} { u(s) | Rs ⪯ c(ϕ) }        (15)

Note that ν(ϕ) is simply the optimal network utility that can be achieved by optimization flow control under the resource allocation ϕ. Consequently, to evaluate ν(ϕ) we can simply fix the resource allocation and execute the distributed congestion control algorithm. Before attempting to solve the problem (14), we establish some basic properties of ν(ϕ).

PROPOSITION 11 Under assumptions A and B, ν(ϕ) is concave, and a subgradient, h(ϕ), of ν(ϕ) at ϕ is given by

  h(ϕ) = [ λ*_1 c'_1(ϕ_1)  ···  λ*_L c'_L(ϕ_L) ]

where λ*_l are the optimal Lagrange multipliers for the capacity constraints in (15).
Proof: See the proof section in the appendix.

Since a subgradient of ν is available, it is natural to use a projected subgradient algorithm

  ϕ^(t+1) = P_Φ[ ϕ^(t) + α^(t) h(ϕ^(t)) ]        (16)


with diminishing stepsize α^(t). Here P_Φ{·} denotes distributed projection onto the set Φ, i.e., it solves the following projection problem in a distributed fashion:

  maximize   −||ϕ − ϕ0||²_2
  subject to Σ_{l=1}^L ϕl = ϕtot,  ϕmin ⪯ ϕ        (17)

The projection problem has a separable concave objective function, since −||ϕ − ϕ0||²_2 = Σ_{l=1}^L −(ϕl − ϕ0_l)², and such problems will be addressed in Section VI-B. The primal algorithm can be summarized as follows:

ALGORITHM 3 (PRIMAL)
• In step k, find the optimal s* by solving the optimization flow control problem using the method in Section IV.
• Use these optimal values to compute the subgradient. Execute the distributed projection using the methods in Section VI-B, and use this to update ϕ with (16).
• Go to step k + 1.
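A minimal numerical sketch of Algorithm 3 on a hypothetical two-link, two-source instance with log-utilities and c_l(ϕ_l) = log(1 + ϕ_l); all concrete numbers are illustrative assumptions. The inner loop is a (here centralized) stand-in for optimization flow control that returns the link prices needed for the subgradient, and the projection (17) is performed by bisection rather than the distributed methods of Section VI-B:

```python
import numpy as np

# Toy instance: 2 links, 2 sources, u_p = log, c_l(phi_l) = log(1 + phi_l).
R = np.array([[1.0, 0.0],
              [1.0, 1.0]])
phi_tot, phi_min = 10.0, 0.1

def flow_control(c, iters=2000):
    # Evaluate nu(phi) for fixed capacities c: dual-ascent stand-in for
    # optimization flow control; returns rates s and link prices lambda.
    lam = np.ones(len(c))
    for t in range(1, iters + 1):
        s = 1.0 / np.maximum(R.T @ lam, 1e-9)          # u_p = log => s_p = 1/q_p
        lam = np.maximum(lam + (R @ s - c) / t, 0.0)   # price update on excess demand
    return s, lam

def project(phi0):
    # Euclidean projection onto {phi : sum(phi) = phi_tot, phi >= phi_min},
    # i.e. problem (17), via bisection on the sum-constraint multiplier.
    lo, hi = phi0.min() - phi_tot, phi0.max()
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        p = np.maximum(phi0 - mu, phi_min)
        lo, hi = (lo, mu) if p.sum() < phi_tot else (mu, hi)
    return p

phi = project(np.ones(2))
for t in range(1, 151):                # Algorithm 3 main loop (slow time-scale)
    s, lam = flow_control(np.log1p(phi))
    h = lam / (1.0 + phi)              # subgradient h_l = lambda_l * c_l'(phi_l)
    phi = project(phi + h / t)         # projected subgradient step (16)

print("resources ~", phi)
print("rates s*  ~", s)
```

In line with the discussion of the primal method, every iterate ϕ produced this way is primal feasible, since the projection enforces the resource constraint at each step.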

PROPOSITION 12 Under assumptions A and B, the primal algorithm (Algorithm 3) with step sizes according to (31) converges to the optimal solution, i.e., lim_{t→∞} ϕ^(t) = ϕ*, lim_{t→∞} s^(t) = s*.
Proof: See the proof section in the appendix.

The primal method relies on solving the optimization flow control problem on a fast time-scale and performing incremental updates of the resource allocation in an ascent direction of the total network utility on a slower time-scale. The source rate and link price updates are carried out in a distributed way, similarly to optimization flow control. As we will show next, the resource update can also be performed in a distributed manner that relies only on communication and resource exchanges between direct neighbors. Put together, this results in a completely distributed algorithm for the network utility maximization problem.

B. Solving the Resource Allocation Subproblem

The simple resource allocation problem

  maximize   Σ_{l=1}^L f_l(ϕl)
  subject to Σ_l ϕl = ϕtot,  ϕl ≥ 0        (18)

is a standard problem in economics, and several solution approaches exist. We assume that f_l(ϕl) is twice differentiable and strictly concave. The problem is central to both the primal and the dual approach, since it appears at the physical layer of both algorithms. To see this, let f̃(ϕ̃) = f(ϕ̃ + ϕmin) and Σ_l ϕ̃l = ϕtot − L·ϕmin; then ϕ = ϕ̃ + ϕmin. The optimal point of (18) can be characterized by the KKT conditions¹

  f'_i(ϕ*_i) = ψ*,  ϕ*_i > 0
  f'_i(ϕ*_i) ≤ ψ*,  ϕ*_i = 0        (19)
  Σ_i ϕ*_i = ϕtot

¹For this problem, the conditions are also known as Gibbs' lemma, after J. Willard Gibbs, who laid the foundations of modern chemical thermodynamics in "On the Equilibrium of Heterogeneous Substances", 1876.

These properties can be used to devise a way to find the optimal point, as we will see later. We will present two algorithms that solve the resource allocation problem: the weighted gradient approach and the direct negotiation approach.

1) Weighted gradient approach: The algorithms in [27], [28] solve (18) under the assumptions that the f_i are concave and twice continuously differentiable, with the second derivative bounded below and above, m_i ≤ f''_i(ϕ_i) ≤ n_i, with n_i < 0 and m_i known. The algorithm presented in [27] can also handle non-negativity constraints on the resources, by identifying the ϕ_i that will be zero at optimality and finding an appropriate starting point. The resource updates rely on nearest-neighbor communication only, and can be written in vector form as

  ϕ^(t+1) = ϕ^(t) + W ∇f(ϕ^(t))        (20)

The condition 1^T W = 0 implies that the new resource allocation ϕ^(t+1) remains feasible. A simple way of guaranteeing convergence is to let W satisfy the following conditions (the Metropolis weight scheme from [28]):

  W_ij = −min{ 1/(|N(i)| m_i), 1/(|N(j)| m_j) } + ε,  j ∈ N(i)
  W_ii = −Σ_{j∈N(i)} W_ij
  W_ij = 0,  otherwise

where ε is a small positive constant and N(i) is the set of links neighboring link i. Note that the limitation that links should only be allowed to communicate and exchange resources with their neighbors turns up as a sparsity constraint on W. The idea that nodes should give away resources to neighbors that have better use for them is very intuitive. In, e.g., [29], a similar scheme, based on heuristics, is suggested for use in dynamic synchronous transfer mode (DTM) optical networks. In DTM networks the nodes have tokens that give them the right to a certain bandwidth. The nodes are suggested to transfer tokens to a neighbor that has more use for the bandwidth; more precisely, a token is transferred if the expression |(priority of node i)·(free channels of node i+1) − (priority of node i+1)·(free channels of node i)| decreases by the transfer. The weighted gradient algorithm gives a justification for this heuristic, and provides a precise definition of the priorities.

2) Direct negotiation approach: As an alternative, the resource allocation subproblem can be solved via direct negotiation. This scheme requires the network to be ordered in a ring structure (or, in fact, any other structure providing order). This is not a major restriction, since a similar structure is also needed for determining a starting point that guarantees non-negativity of the iterates of the weighted gradient method [27]. The approach is based on the so-called waterfilling method, see e.g., [16].
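Returning to the weighted gradient update (20): a toy sketch on a three-link network (a complete graph, so |N(i)| = 2) with the illustrative choice f_l(ϕ_l) = w_l log(1 + ϕ_l). The curvature bounds, the safety factor standing in for the ε-adjustment, and all numbers are assumptions for this example; the sign convention is chosen so that resources flow toward links with higher marginal utility.

```python
import numpy as np

# Toy instance of (18): f_l(phi) = w_l*log(1 + phi), three links, complete graph.
w = np.array([1.0, 2.0, 3.0])
phi_tot = 6.0
grad = lambda p: w / (1.0 + p)          # f_l'(phi_l)

# Metropolis-style weights: symmetric, zero row/column sums (1^T W = 0 keeps
# the iterates feasible), off-diagonal magnitude bounded via M_l >= |f_l''|
# on [0, phi_tot] (here M_l = w_l, attained at phi = 0), with a 0.9 safety factor.
M, deg = w, 2
W = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            W[i, j] = -0.9 * min(1.0 / (deg * M[i]), 1.0 / (deg * M[j]))
    W[i, i] = -W[i].sum()               # diagonal makes each row sum to zero

phi = np.full(3, phi_tot / 3)           # feasible start
for _ in range(2000):
    phi = phi + W @ grad(phi)           # update (20): nearest-neighbor exchanges

print("allocation ~", phi)              # marginals f_l'(phi_l) equalize at optimum
```

For this instance the optimum equalizes the marginals w_l/(1 + ϕ_l), giving the allocation (0.5, 2, 3.5).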
We start by defining

  h_i(ψ) = ϕtot,             ψ < f'_i(ϕtot)
           (f'_i)^{-1}(ψ),   f'_i(ϕtot) ≤ ψ < f'_i(0)        (21)
           0,                f'_i(0) ≤ ψ

which is a continuous function that is decreasing in ψ; the inverse of f'_i(ϕ_i) is well defined since f_i(ϕ_i) is strictly concave. Also introduce the sum

  h(ψ) = Σ_i h_i(ψ)        (22)


which is a sum of decreasing continuous functions, and is therefore itself continuous and decreasing. Now, the waterfilling method is to find ψ* such that

  h(ψ*) = ϕtot        (23)
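As a concrete (centralized) sketch of the waterfilling computation, take the illustrative choice f_l(ϕ) = w_l log(1 + ϕ), for which f'_l(ϕ) = w_l/(1 + ϕ) and (f'_l)^{-1}(ψ) = w_l/ψ − 1; clipping to [0, ϕtot] implements the three branches of (21), and bisection solves (23):

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])      # f_l(phi) = w_l*log(1 + phi), illustrative
phi_tot = 6.0

def h_i(psi):
    # Per-link demand curve (21): (f_l')^{-1}(psi) clipped to [0, phi_tot].
    return np.clip(w / psi - 1.0, 0.0, phi_tot)

lo = (w / (1.0 + phi_tot)).min()   # lower bound: min_l f_l'(phi_tot)
hi = w.max()                       # upper bound: max_l f_l'(0)
for _ in range(60):                # binary search for h(psi*) = phi_tot, eq. (23)
    psi = 0.5 * (lo + hi)
    lo, hi = (lo, psi) if h_i(psi).sum() < phi_tot else (psi, hi)

print("water level psi* ~", psi)
print("allocation       ~", h_i(psi))
```

For this instance the search converges to ψ* = 2/3 and the allocation (0.5, 2, 3.5), where all marginal utilities are equal.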

A lower bound for ψ* is min_i f'_i(ϕtot), and an upper bound is max_i f'_i(0). Thus, we know that ψ* lies between these bounds, and since h(ψ) is a continuous and decreasing function of ψ, we can use binary search to find the ψ* that fulfills (23). Start by taking ψ as the midpoint of the interval. If h(ψ) is below ϕtot, then set the upper endpoint of the interval to ψ; if h(ψ) is above ϕtot, then set the lower endpoint to ψ. Repeat until (23) is satisfied with the desired accuracy. The algorithm can be executed in a distributed way: one node takes the lead and announces the initial value of ψ. The information that needs to be communicated to the next node in the ordered structure is the total amount of reserved resource and the current value of ψ. After a complete cycle, the total sum can be evaluated and the interval can be cut in half as described above. This process is repeated until the desired accuracy is reached.

C. Numerical Results

To illustrate the performance of the approaches, we apply the dual and the primal algorithms to a sample 8-node network used in [2]. The channel capacities are chosen to be the Shannon capacities and the utility function is log(s). The routing matrix does not have full rank, but every row and column contain at least one positive entry. We solve the optimization problem with the dual and the primal algorithm; see the numerical results in Fig. 1. Both algorithms converge to the optimal point. The dual algorithm descends in a smoother fashion, which can be explained by the fact that the dual function is differentiable. However, the dual algorithm seems to be sensitive to the starting point and the initial step size (this cannot be seen in the current plot), and its iterates are only primal feasible in the limit. If either is badly adjusted, the convergence rate is significantly reduced. The primal algorithm exhibits the typical subgradient method behavior, i.e., the descent is not smooth.
However, it seems to be more robust with respect to the starting point and the initial step size, and its iterates are always primal feasible.

VII. EXAMPLE II: CROSS-LAYER OPTIMIZED SCHEDULING IN S-TDMA WIRELESS NETWORKS

Our second application considers network utility maximization for wireless networks that employ spatial-reuse TDMA (S-TDMA). S-TDMA is a collision-free access scheme that allows spatially separated radio terminals to transmit simultaneously when the interference they incur on each other is not too severe. We will consider a particular instance of S-TDMA networks that offers a single communication rate, ctgt, to all links that obey both primary and secondary interference constraints. The primary interference constraints require that a node communicates with at most one other node at a time (these constraints are typically imposed by the underlying


Fig. 1. The norm of the resource allocation minus the optimal resource allocation versus main loop iterations for the dual and primal algorithms.

technology, e.g., nodes equipped with omnidirectional antennas and no multi-user detectors). The secondary interference constraints require that the signal-to-interference-and-noise ratio (SINR) at the receiver of each active link exceeds a target value:

  G_ll P_l / ( σ_l + Σ_{j≠l} G_lj P_j ) ≥ γtgt

Here, P_l is the power used by the transmitter of link l, σ_l is the thermal noise power at the receiver of link l, and G_lj denotes the effective power gain from the transmitter of link j to the receiver of link l. We say that a subset L' ⊆ {1, ..., L} is a feasible transmission group if all links in L' obey the primary and secondary interference constraints when P_k = 0 for k ∉ L'. Associated with each feasible transmission group is a feasible transmission rate vector, where c_l = ctgt for l ∈ L' and c_l = 0 otherwise. By time-sharing over a large number of time slots, we can achieve any average link-rate vector in the convex hull of the feasible transmission rate vectors. Thus, the problem is of the form (3).

A. Decomposition Approach

It is important to understand that the classical approach of dual decomposition cannot be used for computing a schedule. The main reason is that the dual function

  g(λ) = max_{s∈S} { u(s) − λ^T Rs } + max_{c∈C} { λ^T c }
is linear in c_l and will return a single transmission group in each iteration. Since each transmission group only activates a few links, many links will have zero rates until the next iteration of the algorithm can be carried out by the system and a new transmission group activated. If the communication overhead for solving the scheduling subproblem is negligible, it may be viable to perform the scheduling computations in each time slot. However, as we will see below, negotiating for transmission rights and adjusting transmission powers may require substantial overhead in interference-limited systems. It is then more attractive to maintain a transmission schedule with multiple time slots and update the schedule less frequently. As it turns out, a distributed algorithm for computing


an asymptotically optimal schedule can be derived via mean-value cross decomposition. For technical reasons (compactness), we have to add the requirement c ⪰ cmin to (3), i.e., to consider

  maximize   u(s)
  subject to Rs ⪯ c,  s ∈ S        (24)
             cmin ⪯ c,  c ∈ C₁

If cmin is chosen sufficiently small (this requires that smin is sufficiently small as well), then the modified problem has the same optimal solution as the original one. Simulations indicate that if cmin is small, it can in fact be neglected. We now use a mean-value cross decomposition approach to solve this modified problem. It is also possible to use a primal decomposition approach [3], and that algorithm has somewhat better convergence, but it requires more assumptions to work. It is also instructive to show how to apply cross decomposition techniques to a NUM problem. Recall that mean-value cross decomposition alternates between the primal and dual subproblems, using the average values of the computed primal and dual variables as inputs. From the primal subproblem

  maximize   u(s)
  subject to Rs ⪯ c̄^(k),  s ∈ S        (25)

we extract the optimal Lagrange multipliers λ^(k) for the capacity constraints, while the (relevant part of the) dual subproblem

  maximize   c^T λ̄^(k)
  subject to cmin ⪯ c,  c ∈ C₁        (26)

yields c^(k). Since the primal subproblem is an instance of optimization flow control, mean-value cross decomposition suggests the following approach for solving the network utility maximization problem (3): based on an initial schedule, we run the TCP/AQM scheme until convergence (this may require us to apply the schedule repeatedly). We refer to this phase as the data transmission phase. Nodes then record the associated equilibrium link prices for their transmitter queues and maintain their average values in memory. During the subsequent negotiation phase, we try to find the transmission group with the largest average congestion-weighted throughput, and augment the schedule with the corresponding transmission rate vector (effectively increasing the number of time slots in the schedule by one). If the time slots are of equal length, the offered link rates of the resulting schedule will equal the average of all computed transmission groups. The procedure is then repeated with the revised schedule. Our algorithm can be summarized as follows:

ALGORITHM 4 (CROSS) Let k = k₀ and c̄^(k₀) ⪰ cmin.
• In step k, evaluate the primal problem by solving the optimization flow control problem (25) with the capacity c̄^(k−1), and let λ^(k) be the associated equilibrium link prices. Compute the associated average λ̄^(k).
• Compute a new transmission rate vector c^(k) by solving the scheduling subproblem (26) with the link prices λ̄^(k). Augment the schedule with this transmission group and compute the associated c̄^(k). Go to step k + 1.

Note that an initial schedule can be constructed by letting k₀ = L and using a pure TDMA schedule. The algorithm is a mean-value cross decomposition algorithm, and convergence follows as shown in the following proposition.

PROPOSITION 13 Under assumption A, Algorithm 4 converges to the optimal solution, i.e., lim_{k→∞} c̄^(k) = c*.
Proof: See the proof section in the appendix.

Our theoretical analysis applies to the case where we augment the schedule indefinitely, while in practice one would like to use schedules of limited frame length. Although we do not have any theoretical results for computing schedules of fixed frame length, simulations reported in [3] indicate that the method can indeed be adapted to this case.

B. Solving the Resource Allocation Subproblem

The final component of a distributed solution to the NUM problem for S-TDMA networks is a distributed mechanism for solving the resource allocation subproblem in each negotiation phase. Although a fully distributed scheme that solves the subproblem to optimality appears out of reach, a suite of suboptimal schemes has been proposed and investigated in [3]. We will outline one of these approaches below. Since the scheduling subproblem (26) is linear in c_l, an optimal solution can always be found at a vertex of the capacity region, i.e., among the feasible transmission groups. We will consider a distributed solution that is based on two logical steps: first, a candidate transmission group is formed by trying to maximize the objective function subject to primary interference constraints only; then, transmitters adjust their powers to allow the most advantageous subset of candidate links to satisfy the secondary interference constraints. Clearly, some links may need to drop out of the candidate set during the power negotiation phase, and the resulting transmission group may be suboptimal.
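Returning to Algorithm 4, the negotiation-phase bookkeeping can be sketched numerically. The instance below is entirely hypothetical: three links, each carrying one log-utility flow (R = I, so optimization flow control gives s_l = c̄_l and equilibrium price λ_l = 1/c̄_l), and a hard-coded list of feasible transmission groups in which links 1 and 3 can be active simultaneously:

```python
import numpy as np

# Feasible transmission rate vectors (rate 1 per active link), hypothetical:
groups = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 1]], dtype=float)   # links 1 and 3 are compatible

schedule = [g for g in groups[:3]]            # initial pure-TDMA schedule (k0 = L)
lam_hist = []
for k in range(300):                          # negotiation phases
    cbar = np.mean(schedule, axis=0)          # average offered link rates
    lam = 1.0 / np.maximum(cbar, 1e-9)        # equilibrium prices (R = I, log utility)
    lam_hist.append(lam)
    lam_bar = np.mean(lam_hist, axis=0)       # averaged prices, as in MVC
    best = groups[np.argmax(groups @ lam_bar)]  # scheduling subproblem (26)
    schedule.append(best)                     # augment the schedule by one slot

print("average link rates ~", np.mean(schedule, axis=0))
```

Time-sharing 2/3 of the slots on the group {1, 3} and 1/3 on {2} maximizes Σ_l log c̄_l for this instance, and the augmented schedule's average rates approach (2/3, 1/3, 2/3).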
A remarkable feature of the solution is that it is completely distributed and that no single node has global knowledge of the transmission schedule. Each transmitter only keeps a local schedule of the time slots in which it is supposed to transmit and receive data. The candidate group formation is based on the observation that the primary constraints are satisfied if only one link in each two-hop neighborhood is activated. In an attempt to maximize the objective of the dual subproblem, the link with the highest average link price in a two-hop neighborhood will assign itself membership in the candidate set. To allow links to make this decision, we assume that the transmitter of each link forwards information about its link price to the receiving node. Each node maintains the maximum link price of its incoming and outgoing links. By collecting the maximum link prices from its neighbors, each node can decide whether one of its own transmitters should enter the candidate set. Once a link has identified itself as a member of the candidate set, it will start contending for transmission rights.


In our approach, links enter the transmission group one by one, adjusting their transmit powers to maximize the number of links in the transmission group. The approach exploits the properties of distributed power control with active link protection (DPC/ALP) [30]. The DPC/ALP algorithm is an extension of the classical distributed power control algorithms (e.g., [31]) which maintains the quality of service of operational links (link protection) while allowing inactive links to gradually power up in order to try to enter the transmission group. As interference builds up, active links sustain their quality while new ones may be blocked and denied channel access. The DPC/ALP algorithm exploits local measurements of the SINRs at the receivers and runs iteratively over a sequence of time slots. To describe the algorithm in detail, we introduce A^(t) and I^(t) as the sets of active and inactive links at time t, respectively, and let γ_l^(t) be the measured SINR on link l at time t. The DPC/ALP algorithm operates by updating the transmit powers P_l^(t) according to

  P_l^(t) = δ P_l^(t−1) γtgt / γ_l^(t),  if l ∈ A^(t−1)        (27)
            δ P_l^(t−1),                 if l ∈ I^(t−1)

where δ > 1 is a control parameter. Links change status from inactive to active when their measured SINR exceeds the target. Inactive nodes that consistently fail to observe any SINR improvement enter a voluntary drop-out phase and go silent (see [30] for details). The negotiation phase is initialized by letting I^(0) equal the candidate set. Links then execute the DPC/ALP algorithm for a short period of time, divided into T mini-slots corresponding to the iteration times above. At the end of such a period, A^(T) constitutes a feasible transmission group. To increase the likelihood of forming a transmission group with high congestion-weighted throughput, we can let links wait a random time before starting to execute the DPC/ALP algorithm. If the waiting probability is a decreasing function of the link price, highly loaded links will tend to start ramping up their transmit powers before lightly loaded ones (and thus have increased chances to obtain transmission rights).

C. Numerical Results

To demonstrate the typical performance of the approach, we apply Algorithm 4 to a hypothetical indoor wireless LAN scenario used in [19], and compare it with other algorithms. Figure 2 shows the objective value versus the number of slots. The straight line is the optimal performance computed using the off-line approach described in [19], the dashed line is the optimal (variable time-slot length) TDMA schedule, the dash-dotted line is a variant of the Cross algorithm without averaging of the link prices, combined with the DPC/ALP approach to solve the resource allocation subproblem (see [3]), and the dotted line is the Cross algorithm with the subproblem solved to optimality. The Cross-based algorithms perform significantly better than the optimal (variable time-slot length) TDMA schedule and improve with every iteration.

VIII. CONCLUSIONS

This paper has presented three distinct techniques that can be used to engineer utility-maximizing protocols: primal, dual,


Fig. 2. Network utility as a function of schedule length for a number of alternative schemes.

and cross decomposition. These techniques extend the theoretical toolbox available for studying network utility maximization problems, motivate alternative network architectures, and suggest protocols where resource allocation updates are run at different time-scales. In addition, we have demonstrated how these techniques can be used to design protocols that try to maximize network utility for two different networking technologies. Although the theory for network utility maximization is evolving quickly, much work remains to be done. This includes better tools for analyzing protocol dynamics and guaranteeing stable and efficient protocol behavior, as well as better support for analyzing the dependencies that are introduced between the networking layers. Finally, we hope that some of the NUM-designed cross-layer protocols will face and stand the ultimate test of practical implementation.

APPENDIX

A. Subgradients

A more thorough background on the methods presented here is given in [18], [16], [17]. In this section we focus on

  maximize   f(s)
  subject to s ∈ S        (28)

where f(s) is concave and S is convex and closed. The most common methods for finding solutions to (28) are gradient algorithms. The gradient of f(s) points in the direction of maximum increase, and a step should be taken in this direction. However, this only works if f(s) is differentiable. Sometimes the function f(s) is nondifferentiable, but so-called subgradients exist. A vector µ is a subgradient of the concave function f at y if

  f(s) ≤ f(y) + µ^T(s − y),  ∀s ∈ S        (29)

Thus, the affine functions defined by subgradients are overestimators of a concave function. If the subgradient at a point is unique, then the function is differentiable there. The simplest example of a continuous nondifferentiable function is probably f(s) = |s|. This function is nondifferentiable at s = 0, and the set of all subgradients at s = 0 is {a | −1 ≤ a ≤ 1}; see Fig. 3.
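A quick numerical illustration of the subgradient set for f(s) = |s|: since |s| is convex, the subgradient inequality runs in the opposite direction to the concave case (29), with the affine function underestimating f.

```python
import numpy as np

s = np.linspace(-2.0, 2.0, 401)

# Every a in [-1, 1] is a subgradient of |s| at 0: |s| >= |0| + a*(s - 0) for all s.
for a in np.linspace(-1.0, 1.0, 21):
    assert np.all(np.abs(s) >= a * s - 1e-12)

# A slope outside [-1, 1] violates the inequality somewhere, so it is not a subgradient.
assert np.any(np.abs(s) < 1.5 * s)
print("subgradient inequality holds exactly for slopes a in [-1, 1]")
```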


Subgradient methods use a subgradient in place of the gradient and proceed according to

  s^(t+1) = P_S[ s^(t) + α^(t) µ^(t)(s^(t)) ]        (30)

However, the function value does not need to increase in every step, as it does in gradient methods. Instead, the distance between the iterate and the optimal solution decreases. To show asymptotic convergence, it is enough to use a diminishing stepsize and that the subgradients are bounded. The stepsizes that we will use fulfill

  Σ_{t=1}^∞ α^(t) = ∞,  Σ_{t=1}^∞ (α^(t))² < ∞        (31)

0 ∀ i. Using the assumption that the columns of R have positive entries, together with Σ_i λ*_i R_ik − u'_k(s*_k) ≥ 0 and u'_k(s*_k) > 0, gives that at least one λ*_i is positive. This, with ν* − λ*_l c'_l(ϕ*_l) ≥ 0 and c'_l(ϕ*_l) > 0, gives that ν* > 0. This, with ϕ*_l > 0 and ϕ*_l(ν* − λ*_l c'_l(ϕ*_l)) = 0, gives that λ*_l > 0 and ν* = λ*_l c'_l(ϕ*_l) ∀l. ν* > 0 and λ*_l > 0 ∀l give Σ_l ϕ*_l = ϕtot and Rs* = c(ϕ*). Finally, since Σ_p R_lp s*_p ≥ Σ_p R_lp smin, we have ϕ*_l ≥ ϕmin = c_l^{-1}( Σ_p R_lp smin ), ∀l.


Proof: [Convergence proof of the dual algorithm, Proposition 10] We start by showing that the subgradient is bounded. The subgradient, which is also the gradient, is given by

  ∇g(λ) = c(ϕ*(λ)) − Rs*(λ)

where s* lies in the interval smin ⪯ s* ⪯ c(ϕtot) and ϕ* lies in the interval ϕmin ⪯ ϕ* ⪯ ϕtot. This implies that the subgradient is bounded as follows:

  ||∇g(λ)||₂ ≤ ||c(ϕtot) + R c(ϕtot)||₂

With assumptions A and B, convergence now follows from [17, Proposition 8.2.6].

Proof: [Proof of concavity and subgradient, Proposition 11] By strong duality,

  ν(ϕ) = inf_{λ⪰0} sup_{smin⪯s} Σ_{p=1}^P ( u_p(s_p) − s_p q_p ) + Σ_{l=1}^L λ_l c_l(ϕ_l)
       = inf_{λ⪰0} g̃(s*(λ)) + Σ_{l=1}^L λ_l c_l(ϕ_l)

with q_p = Σ_{l=1}^L r_lp λ_l. Thus, since ν(ϕ) is the pointwise infimum of concave functions, it is concave. Let λ* be the optimal Lagrange multipliers for a resource allocation vector ϕ. For any other resource allocation ϕ̃, it holds that

  ν(ϕ̃) ≤ sup_{smin⪯s} Σ_{p=1}^P ( u_p(s_p) − s_p q*_p ) + Σ_{l=1}^L λ*_l c_l(ϕ̃_l)
       ≤ ν(ϕ) + Σ_{l=1}^L λ*_l { c_l(ϕ_l) + c'_l(ϕ_l)(ϕ̃_l − ϕ_l) − c_l(ϕ_l) }
       = ν(ϕ) + Σ_{l=1}^L λ*_l c'_l(ϕ_l)(ϕ̃_l − ϕ_l)

with q*_p = Σ_{l=1}^L r_lp λ*_l. This, by the definition of a subgradient, concludes the proof.

Proof: [Convergence proof of the primal algorithm, Proposition 12] We start by showing that the subgradient is bounded. The subgradient is given by

  h(ϕ) = [ λ_1 c'_1(ϕ_1)  ···  λ_L c'_L(ϕ_L) ]

and for all ϕ the norm of the subgradient, ||h(ϕ)||₂, is bounded: since ϕ lies in a compact and closed set, a maximum exists by the Weierstrass theorem. Define D as

  D = sup_{ϕmin ⪯ ϕ ⪯ ϕtot} ||h(ϕ)||₂        (34)

The subgradient is thus bounded by D, and with assumptions A and B, convergence now follows from [17, Proposition 8.2.6].

Proof: [Convergence proof of the cross algorithm, Proposition 13] The key is to identify the algorithm as a mean-value cross decomposition. We make the following identifications (with MVC notation on the left and S-TDMA algorithm notation on the right):

  u(s) = u(s),   v(c) = 0
  A₁(s) = Rs,    A₂(s) = 0
  B₁(c) = −c,    B₂(c) = 0
  b₁ = 0,        b₂ = 0

The MVC primal subproblem (8) is identified with the S-TDMA primal subproblem (25), and the MVC dual subproblem (7) is identified with the S-TDMA dual subproblem (26). Hence, the S-TDMA algorithm is an MVC algorithm, and convergence follows from [23].

REFERENCES

[1] B. Johansson and M. Johansson, "Decomposition and time-scale design for distributed cross-layer optimization under network-wide resource constraints," in INFOCOM Student Workshop, Miami, USA, 2005, poster.
[2] ——, "Primal and dual approaches to distributed cross-layer optimization," in 16th IFAC World Congress, Prague, Czech Republic, July 2005.
[3] P. Soldati, B. Johansson, and M. Johansson, "Distributed cross-layer coordination of congestion control and resource allocation in S-TDMA wireless networks," in IEEE Infocom, 2006, submitted.
[4] F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, "Rate control for communication networks: shadow prices, proportional fairness and stability," Journal of the Operational Research Society, vol. 49, pp. 237–252, 1998.
[5] S. H. Low and D. E. Lapsley, "Optimization flow control – I: Basic algorithm and convergence," IEEE/ACM Transactions on Networking, vol. 7, no. 6, pp. 861–874, 1999.
[6] L. Xiao, M. Johansson, and S. Boyd, "Simultaneous routing and resource allocation in wireless networks," IEEE Transactions on Communications, vol. 52, no. 7, pp. 1136–1144, July 2004.
[7] M. Chiang, "Balancing transport and physical layers in wireless multihop networks: jointly optimal congestion control and power control," IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 104–116, January 2005.
[8] X. Lin and N. Shroff, "The impact of imperfect scheduling on cross-layer rate control in multihop wireless networks," in IEEE INFOCOM, 2005.
[9] K. J. Arrow and L. Hurwicz, Essays in Economics and Econometrics, 1960, ch. Decentralization and computation in resource allocation, pp. 34–104.
[10] R. Srikant, The Mathematics of Internet Congestion Control. Birkhäuser, 2004.
[11] J. Wang, L. Li, S. H. Low, and J. C. Doyle, "Cross-layer optimization in TCP/IP networks," IEEE/ACM Transactions on Networking, vol. 13, no. 3, pp. 582–595, 2005.
[12] D. Palomar and M. Chiang, "On alternative decompositions and distributed algorithms for network utility problems," in IEEE Globecom, 2005, submitted.
[13] L. Lasdon, Optimization Theory for Large Systems. Macmillan, 1970.
[14] O. E. Flippo and A. H. G. Rinnooy Kan, "Decomposition in general mathematical programming," Mathematical Programming, vol. 60, pp. 361–382, 1993.
[15] K. Holmberg, "Primal and dual decomposition as organizational design: price and/or resource directive decomposition," Department of Mathematics, Linköping Institute of Technology, LiTH-MAT-R-94-03, December 1996.
[16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[17] D. Bertsekas, A. Nedić, and A. Ozdaglar, Convex Analysis and Optimization. Athena Scientific, 2003.
[18] D. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
[19] M. Johansson and L. Xiao, "Cross-layer optimization of wireless networks using nonlinear column generation," Department of Signals, Sensors and Systems, KTH, Stockholm, Sweden, Technical Report IR-S3-REG-0302, November 2003.
[20] T. Larsson, M. Patriksson, and A.-B. Strömberg, "Ergodic, primal convergence in dual subgradient schemes for convex programming," Mathematical Programming, vol. 86, pp. 283–312, 1999.
[21] B. Obel, "A note on mixed procedures for decomposing linear programming problems," Mathematische Operationsforschung und Statistik, Series Optimization, vol. 9, pp. 537–544, 1978.
[22] T. J. Van Roy, "Cross decomposition for mixed integer linear programming," Mathematical Programming, vol. 25, pp. 46–63, 1983.
[23] K. Holmberg and K. Kiwiel, "Mean value cross decomposition for nonlinear convex problems," Department of Mathematics, Linköping Institute of Technology, Research Report LiTH-MAT-R-2003-10, 2003.
[24] K. J. Arrow, L. Hurwicz, and H. Uzawa, Studies in Linear and Nonlinear Programming. Stanford University Press, 1958.

[25] A. S. Nemirovski and D. B. Judin, "Cesari convergence of the gradient method of approximating saddle points of convex-concave functions," Soviet Math. Dokl., vol. 19, no. 2, pp. 482–486, 1978.
[26] S. H. Low, "A duality model of TCP and queue management algorithms," IEEE/ACM Transactions on Networking, vol. 11, no. 4, pp. 525–536, August 2003.
[27] Y. C. Ho, L. Servi, and R. Suri, "A class of center-free resource allocation algorithms," Large Scale Systems, vol. 1, pp. 51–62, 1980.
[28] L. Xiao and S. Boyd, "Fast distributed algorithms for optimal redistribution," Journal of Optimization Theory and Applications, 2003, submitted.
[29] C. Antal, J. Molnár, S. Molnár, and G. Szabó, "Performance study of distributed channel allocation techniques for a fast circuit switched network," Computer Communications, vol. 21, no. 17, pp. 1597–1609, 1998.
[30] N. Bambos, S. C. Chen, and G. Pottie, "Channel access algorithms with active link protection for wireless communication networks with power control," IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 583–597, October 2000.
[31] G. Foschini and Z. Miljanic, "A simple distributed autonomous power control algorithm and its convergence," IEEE Transactions on Vehicular Technology, vol. 42, no. 4, pp. 641–646, November 1993.