
A heterogeneity based clustering heuristic for mobile ad hoc networks

Benoit Latré, Jeroen Hoebeke*, Liesbeth Peters*, Tom Van Leeuwen, Ingrid Moerman, Bart Dhoedt, Piet Demeester
Department of Information Technology (INTEC), Ghent University - IMEC
Sint Pietersnieuwstraat 41, B-9000 Gent, Belgium
{benoit.latre, jeroen.hoebeke, tom.vanleeuwen, liesbeth.peters, ingrid.moerman, bart.dhoedt, piet.demeester}@intec.UGent.be
Tel: +32-(0)9264 9970 Fax: +32-(0)9264 9960

Abstract—An ad hoc network is an autonomous system of heterogeneous, mobile nodes that communicate with each other over wireless links. Routing protocols for these networks are inherently based on broadcasting control information and therefore consume a lot of bandwidth. To limit the amount of routing information that individual nodes have to store and maintain, clustering is used: the network is partitioned into non-overlapping sub networks, referred to as clusters, and one node per cluster, the clusterhead, takes a leading role in the dissemination of control information. In this paper we formulate the problem of finding an optimal partition that explicitly takes the heterogeneity of the network into account as an integer linear programming (ILP) problem. In a second phase, we develop a new heuristic that approximates the ILP solution and use it in our clustering algorithm. It is shown that this heuristic tends to be more stable than existing clustering techniques that are based solely on ID number and/or connectivity and that do not take the heterogeneity of the network into account.

Keywords—Ad hoc networks, clustering, routing, ILP, heterogeneity.

I. INTRODUCTION

A mobile ad hoc network (MANET) is an autonomous network that consists of mobile nodes that communicate with each other over wireless links with a limited bandwidth [1]. In the absence of a fixed infrastructure, nodes have to cooperate in order to provide the necessary network functionality. Routing is one of the primary functions each node has to perform, in order to make connections possible between nodes that are not directly within each other's send range.

Developing efficient routing protocols is a nontrivial task because of the specific characteristics of a MANET environment. First, since nodes are free to move arbitrarily, the network topology may change randomly and rapidly at unpredictable times, and an efficient routing protocol should be able to react to these topology changes. Second, the available bandwidth is limited and can vary due to fading, noise and interference, so the amount of control information generated by the protocol should be limited. Finally, the nodes that form the network can be very heterogeneous, ranging from small battery powered devices with limited processing capacity to full-fledged computers connected to the power grid and with ample processing power.

Consequently, routing in ad hoc networks is a challenging task and much research has been done in this field [2][3]. The most commonly known routing protocols are the proactive and reactive ones. Proactive routing protocols attempt to have at all times an up-to-date route from each node to every possible destination. This requires the continuous propagation of control information throughout the entire network. Reactive protocols only set up routes when needed by broadcasting a route request within the network (route discovery) and keep them up-to-date as long as needed (route maintenance). A common characteristic of both types of protocols is that they are flat, i.e. all nodes play an equal role, and they both rely on the broadcasting of control information throughout the entire network for their routing functionality. In the presence of bandwidth limited links and resource limited nodes, such an approach consumes a lot of bandwidth and limits the scalability of the protocols.

Therefore, in order to reduce the overhead for the update of the routing tables and to improve the scalability, hierarchical routing protocols were introduced. Hierarchical routing is based on the idea of organizing the network into different groups of nodes and assigning different functionalities to nodes inside and outside a group. These protocols aim to reduce the routing overhead by maintaining only partial routing information at each node or by exchanging routing information only between a number of elected nodes. As a result, the size of the routing tables and update packets, and thus the control overhead, can be reduced. One example of hierarchical protocols are those that make use of clustering [4].

In the next section, the principle of clustering is explained and the shortcomings of a frequently used clustering technique are discussed. In section III, we present our approach to tackle these problems through the solution of an ILP problem. Section IV discusses our new clustering heuristic and section V evaluates the performance of this approach. Finally, conclusions are drawn in section VI.

II. CLUSTERING

One of the most popular ways of building a hierarchy in the network is to group nodes geographically close to each other into non-overlapping sub networks, referred to as clusters. Each cluster has a leading node, called the clusterhead, and a number of cluster members. The clusterhead plays a leading role in the dissemination of control information by grouping routing information from within its cluster and communicating this information to other clusterheads on behalf of the cluster [5].

* Research Assistant of the Fund for Scientific Research - Flanders (F.W.O.-V., Belgium)

There are numerous ways to divide the network into clusters, but finding an optimal clustering can be difficult, especially in mobile and heterogeneous networks. Typically, a clusterhead is more burdened than its members and could easily become a bottleneck of the system if not appropriately chosen. Also, during the lifetime of the network, the clustering can change because of topological changes, which incurs an additional overhead. In order to limit this overhead, it is important to have a certain level of stability in the network, i.e. we do not want to continuously switch clusterheads.

An example of a clustering algorithm is the k-hop lowest ID clustering algorithm [6]. It is assumed that all nodes have an ID and are aware of their k-hop neighbors (i.e. all neighbors at a distance of at most k hops). Also, each node in the network broadcasts its clustering decision only once. The algorithm is initiated by the nodes that have the lowest ID in their k-hop neighborhood: each of them broadcasts to its k-hop neighbors its decision to create a cluster with itself as clusterhead. If a node hears one or more such decisions, it joins the cluster of the clusterhead with the lowest ID and makes this decision known to its k-hop neighbors. On the other hand, if all k-hop neighbors with a lower ID have sent their decision and none declared itself as clusterhead, the node decides to create its own cluster and become a clusterhead. Each node thus broadcasts its decision after all its k-hop neighbors with a lower ID, which results in non-overlapping clusters, each having one clusterhead.

This algorithm uses the ID as the basis of the clustering process. A variant of this algorithm uses the highest k-hop connectivity (i.e. the number of k-hop neighbors) instead of the lowest ID in order not to have more clusters than necessary. Because this variant does not work well when multiple nodes have the same connectivity, the k-hop connectivity ID clustering algorithm (CONID) was introduced [7]. This algorithm is based on the same principles, but uses the node connectivity as the primary key and the ID as the secondary key in cluster decisions. Fig. 1 shows a simple example of the clusters produced by 1-hop CONID clustering.
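The election rule described above can be sketched as a centralized simulation. This is only an illustration under stated assumptions, not the distributed message exchange of [6] or [7]: nodes are processed in global priority order (highest k-hop connectivity first, lowest ID as tie-breaker), which guarantees that a node decides only after all its higher-priority k-hop neighbors have decided. The function names and the example topology are hypothetical and do not correspond to Fig. 1.

```python
from collections import deque


def k_hop_neighbors(adj, node, k):
    """Return all nodes within k hops of `node` (excluding itself) using BFS."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        cur, hops = frontier.popleft()
        if hops == k:
            continue  # do not expand beyond k hops
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    seen.discard(node)
    return seen


def conid_clustering(adj, k=1):
    """Map every node to its clusterhead (connectivity first, then lowest ID)."""
    nbrs = {n: k_hop_neighbors(adj, n, k) for n in adj}

    def priority(n):
        # Smaller tuple = higher priority: more k-hop neighbors, then lower ID.
        return (-len(nbrs[n]), n)

    head_of = {}
    for n in sorted(adj, key=priority):
        # Neighbors that already declared themselves clusterhead.
        heads = [m for m in nbrs[n] if head_of.get(m) == m]
        # Join the best clusterhead heard, otherwise create a new cluster.
        head_of[n] = min(heads, key=priority) if heads else n
    return head_of


# Hypothetical 6-node topology (undirected adjacency sets).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
clusters_1hop = conid_clustering(adj, k=1)
clusters_2hop = conid_clustering(adj, k=2)
```

Replacing the `priority` key by the plain node ID would give the lowest ID variant; the rest of the loop stays the same.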

Figure 1. 1-hop CONID clustering

The previous discussion shows that existing clustering algorithms choose clusterheads in a rather arbitrary manner, without taking into account the specific characteristics of the nodes involved in the clustering process. For instance, when only ID or connectivity is used, nodes that are heavily loaded could become clusterhead, which would place an additional burden on these nodes. Also, nodes whose energy is almost drained could be elected as clusterhead. This would incur additional cluster maintenance overhead if such a node suddenly leaves the network because its battery power is exhausted. When lowest ID clustering is used, a highly mobile node with a low ID could become the leader of a cluster. The movement of this clusterhead will result in reduced protocol stability and additional maintenance overhead.

The previous examples make clear that it is beneficial to consider the heterogeneity of the network in the process of clusterhead election. Therefore, in our approach we explicitly take this heterogeneity into account by including characteristics such as the memory and battery capacity, the traffic load and the mobility of the nodes in the process of cluster formation. Our solution consists of two phases. In the first phase, the problem of finding appropriate clusters is formulated as an ILP problem. This is described in the next section. In the second phase, a new heuristic is developed in order to approximate the optimal, but computationally infeasible, solution of our ILP formulation. This is the subject of section IV.

III. ILP SOLUTION

A. Routing and clustering assumptions

In the remainder of the text we use the following assumptions and definitions concerning the clustering:
• A cluster is a collection of nodes. Each node is part of only one cluster and each cluster has only one clusterhead. As a consequence, each node has only one clusterhead.
• A cluster member is a cluster node that is not a clusterhead.
• The k-neighborhood of a node n is the set of all nodes located at a maximum distance of k hops from node n, node n itself included.

Because we intend to use our clustering algorithm as a basis for a new routing protocol, we make the following assumptions about the routing protocol:
• Each clusterhead collects local link state information from its cluster members.
• Each clusterhead exchanges this information with other clusterheads.
• Each clusterhead computes and stores routes for its members (e.g. using Dijkstra’s shortest path). This is the proactive component of the protocol we assume.
• Each member gets a route when needed from its clusterhead. This part of the protocol can be considered as reactive.

B. Network representation and notations

A wireless ad hoc network can be modeled as a directed graph G=(V,E), in which V is the set of nodes and E the set of connections between nodes. A unidirectional link exists between two nodes n and n’ if n’ is within the send range of n and n is within the receive range of n’. In the remainder, the following notations are used:
• N = |V| = number of nodes in the network;
• dist(n,n’) = number of hops between n and n’ (0 if n = n’, ∞ if there is no route from n to n’);
• (n,n’) = path from n to n’;
• (n,n’) ∈ (m,m’) if the path from n to n’ is part of the path from m to m’;
• l(n,n’) = 1 if there is a 1-hop link from n to n’, 0 otherwise;
• O(n) = cost for a cluster member n of asking its clusterhead for routes. This cost depends on the number of times routes are asked, but is taken constant here;
• A(n) = set of nodes within the receive range of n;
• k(n,n’,n’’) = 1 if n lies on the route from n’ to n’’, 0 otherwise;
• memory(n): memory class node n belongs to. Each node has a limited amount of memory, which can be used for route computation and storage;
• battery(n): battery class node n belongs to. Each clusterhead should possess sufficient battery power in order to collect and disseminate control information;
• capacity(n): fraction of the total bandwidth that is available for routing (i.e. for control information);
• load(n): network load experienced by node n, expressed as a fraction of the available bandwidth.

C. The ILP formulation

The ILP formulation is used to determine the optimal partitioning of the network at one instant in time. This means that the solution is calculated for a static network and that mobility is not taken into account in the ILP problem. However, mobility is introduced in the development of the heuristic.

1) Variables

Two types of binary variables are introduced. For each type there exist N^3 - N^2 variables that need to be solved.

$$y(n,s,d) = \begin{cases} 1, & \text{if node } n \text{ stores the path from } s \text{ to } d,\ s \neq d \\ 0, & \text{otherwise} \end{cases}$$

$$z(n,s,d) = \begin{cases} 1, & \text{if } s \text{ gets a route to } d \text{ from clusterhead } n,\ s \neq d \\ 0, & \text{otherwise} \end{cases}$$

The first variable type (the y variables) determines which nodes are able to store paths between nodes. This type is related to the proactive part of the routing protocol we assume, namely the computation and storage of routes for other nodes. These variables are only concerned with the ability of nodes to compute and store routes, and their value will be influenced by the available memory and battery capacity. The second variable type (the z variables) determines the actual clustering and is related to the reactive part of the routing protocol we assume, namely the clusterhead that provides its cluster members with routes in an on-demand manner. The value of these variables will be mainly influenced by the distance between the clusterhead and its members and the available bandwidth of the clusterhead.

2) ILP constraints

a) General constraints:
• We assume each node is aware of its direct neighbors:

$$y(s,s,d) = 1 \quad \forall s, \forall d \text{ with } l(s,d) = 1$$

• If a clusterhead n provides a member s of its cluster with a route to another node d, n should also compute and store this route. This means that if the z variable is equal to 1 for a triplet (n,s,d), the corresponding y variable should also be equal to 1. Or, mathematically:

$$z(n,s,d) \le y(n,s,d) \quad \forall n, s, d$$

• A clusterhead stores a complete route from a source (that is a member of its cluster) to a destination. This means that the clusterhead automatically stores all subroutes of this route, or mathematically:

$$y(n,s,d) \le y(n,x_1,x_2) \quad \forall s, d, n \text{ with } d \neq s,\ \forall (x_1,x_2) \in (s,d)$$

b) Clustering constraints:
• We assumed that each node has only one clusterhead. This means that if a clusterhead n provides a node s with a route to a destination, it should provide s with routes to all other destinations, otherwise it would be possible for a node to have two clusterheads.

$$z(n,s,d_1) = z(n,s,d_2) \quad \forall n, s, d_1, d_2 \text{ with } d_1 \neq d_2,\ s \neq d_1,\ s \neq d_2$$

As a result, a clusterhead either stores all routes for a node, which is then a member of its cluster, or stores no routes for that node, which then has to be a member of another cluster.
• We also assume that a node s asks only one node n, its clusterhead, for a route to a destination d, or

$$\sum_{n \in V} z(n,s,d) = 1 \quad \forall s, d \text{ with } s \neq d$$

• According to our definition of clusters and clusterheads, a clusterhead is always a member of its own cluster, which means that a node that is elected as clusterhead should always store its own routes, or

$$z(n,s,d) \le z(n,n,d) \quad \forall n, s, d \text{ with } s \neq n,\ d \neq n,\ d \neq s$$
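To make the role of the z variables concrete, the following minimal sketch derives them from a given cluster assignment and checks the three clustering constraints. It is an illustration only: the helper names and the `head_of` mapping (node to clusterhead) are assumptions introduced here, and the y variables and the resource constraints are not covered.

```python
from itertools import product


def build_z(nodes, head_of):
    """z[(n, s, d)] = 1 if s gets its route to d from clusterhead n (s != d)."""
    return {(n, s, d): int(head_of[s] == n)
            for n, s, d in product(nodes, repeat=3) if s != d}


def satisfies_clustering_constraints(nodes, z):
    """Check the clustering constraints on a complete set of z variables."""
    # Each source s asks exactly one node n for a route to each destination d.
    one_head = all(sum(z[n, s, d] for n in nodes) == 1
                   for s, d in product(nodes, repeat=2) if s != d)
    # Node n serves either all destinations of s or none of them.
    all_or_none = all(len({z[n, s, d] for d in nodes if d != s}) <= 1
                      for n, s in product(nodes, repeat=2))
    # A clusterhead is a member of its own cluster: z(n,s,d) <= z(n,n,d).
    own_member = all(z[n, s, d] <= z[n, n, d]
                     for n, s, d in product(nodes, repeat=3)
                     if s != n and d != n and d != s)
    return one_head and all_or_none and own_member


nodes = [1, 2, 3, 4]
valid = build_z(nodes, {1: 1, 2: 1, 3: 3, 4: 3})    # clusters {1,2} and {3,4}
invalid = build_z(nodes, {1: 2, 2: 1, 3: 3, 4: 3})  # 1 and 2 point at each other
num_variables = len(valid)                           # N^3 - N^2 for N = 4
```

The second assignment violates the last constraint: node 2 would serve node 1 without being a member of its own cluster.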

c) Memory and battery constraints: A node that stores routes should have enough memory capacity, which leads to the following constraint:

$$\sum_{s,d \in V} y(n,s,d)\, l(s,d)\, M \le \text{memory}(n) \quad \forall n \in V$$

We assume a fixed memory capacity M is needed for each link that needs to be stored. The battery constraint is expressed as follows:

$$\sum_{s,d \in V} y(n,s,d)\, B \le \text{battery}(n) \quad \forall n \in V$$

In this case, for each path that needs to be stored a fixed battery cost B is charged, because the storage and calculation of each path requires battery power.

d) Bandwidth constraints: With this constraint, we take into account the overhead generated by clusterheads for providing their cluster members with routes. We do not take into account the data traffic, because this will strongly depend on the routing protocol. The available node bandwidth is influenced in four ways:
• A cluster member asks its clusterhead for a route
• A node hears the information sent by a clusterhead within its receive range
• A node can act as a router for forwarding control information
• A node can hear information forwarded by a neighboring non-clusterhead node

These four aspects are collected in the following equation:

$$\begin{aligned}
&\sum_{s,d \in V,\, s \neq d} z(n,s,d)\, O(s)
+ \sum_{n' \in A(n)} \sum_{s,d \in V,\, s \neq d} z(n',s,d)\, O(s) \\
&+ \sum_{n'' \in V} \sum_{s,d \in V,\, s \neq d} z(n'',s,d)\, k(n,n'',s)\, O(s)
+ \sum_{n' \in A(n)} \sum_{n'' \in V} \sum_{s,d \in V,\, s \neq d} z(n'',s,d)\, k(n',n'',s)\, O(s) \\
&\le \text{capacity}(n) \quad \forall n \in V
\end{aligned}$$

The total bandwidth used for routing may not exceed the capacity available for the control information.

3) The cost function

The solution of the ILP problem can be found by minimizing the following cost function:

$$\sum_{s,d \in V} \sum_{n \in V} y(n,s,d)\, c_1(n) + \sum_{s,d \in V} \sum_{n \in V} z(n,s,d)\, c_2(n,s) + \sum_{n \in V} \sum_{d \in V} z(n,n,d)\, c_3$$

with cost functions c1(n), c2(n,s) and c3 as follows:

$$c_1(n) = \frac{w^{ilp}_{bat}}{\text{battery}(n)} + \frac{w^{ilp}_{mem}}{\text{memory}(n)}$$

$$c_2(n,s) = w^{ilp}_{dist}\, \frac{\text{dist}(n,s)}{N} + w^{ilp}_{load}\, \text{load}(n)$$

$$c_3 = \frac{w^{ilp}_{clust}}{N-1}$$
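As a rough illustration, the cost of a candidate solution can be evaluated directly from these three terms. The sketch below is not the solver setup used in this paper; the function name, the dictionary layout of y and z (only entries equal to 1 are listed) and all numeric values are assumptions chosen for the example. Note that a clusterhead n has z(n,n,d) = 1 for each of its N-1 destinations, so the third term adds the clustering weight once per clusterhead.

```python
def ilp_cost(nodes, y, z, battery, memory, load, dist, w):
    """Evaluate the three-term cost function for given binary variables.

    y and z map triplets (n, s, d) to 0/1; missing triplets count as 0.
    """
    N = len(nodes)

    def c1(n):
        # Storage cost: lower for nodes with more battery and memory.
        return w["bat"] / battery[n] + w["mem"] / memory[n]

    def c2(n, s):
        # Clustering cost: distance to the clusterhead plus its load.
        return w["dist"] * dist[n][s] / N + w["load"] * load[n]

    c3 = w["clust"] / (N - 1)  # per-destination share of the cluster cost

    storage = sum(v * c1(n) for (n, s, d), v in y.items())
    assignment = sum(v * c2(n, s) for (n, s, d), v in z.items())
    clusters = sum(v * c3 for (n, s, d), v in z.items() if n == s)
    return storage + assignment + clusters


# Hypothetical line topology 1 - 2 - 3 with node 2 as the only clusterhead.
nodes = [1, 2, 3]
z = {(2, s, d): 1 for s in nodes for d in nodes if s != d}
# y covers every route the clusterhead serves plus each node's direct links.
y = dict(z)
y.update({(1, 1, 2): 1, (3, 3, 2): 1})
battery = {1: 2, 2: 4, 3: 2}
memory = {1: 2, 2: 4, 3: 2}
load = {1: 0.5, 2: 0.0, 3: 0.5}
dist = {2: {1: 1, 2: 0, 3: 1}}  # only distances from the clusterhead are needed
w = {"bat": 1.0, "mem": 1.0, "dist": 1.0, "load": 1.0, "clust": 1.0}

cost = ilp_cost(nodes, y, z, battery, memory, load, dist, w)
```

With these values the storage term is 5, the assignment term is 4/3 and the cluster term is 1.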

We will now explain the meaning of the different terms that form the cost function:
• The first term calculates the cost for the computation and storage of routes and takes into account the battery and memory capacity. Weights are used to stress the importance of these capacities.
• The second term is used for the actual clustering. The first part of the cost c2 expresses that cluster members should be as close as possible to their clusterhead. The second part takes into account the data traffic routed by this node and should prevent heavily loaded nodes from being elected as clusterhead.
• The last part of the cost function is concerned with the number of clusters in the network (z(n,n,d) = 1 if n is a clusterhead, because a clusterhead stores its own routes).

Each term can have different weights assigned to it, and the values of the weights depend on the physical or MAC layer that is used or on the preferences of the network users. If one wants to obtain a low energy usage in the network, one can assign a greater value to the corresponding weight.

IV. CLUSTERING HEURISTIC

The optimal solution of our clustering problem can be found by minimizing the ILP cost function from the previous section. However, computing the optimal solution is too complex and requires global knowledge of the network topology and the nodes’ attributes. Hence, an appropriate distributed clustering heuristic should be developed that leads to a solution as close as possible to the optimal one. Further, in our heuristic we also take node movements into account by using the number of link changes as a measure of node mobility.

Instead of using connectivity and ID as clustering criteria, each node is assigned a weight [7] [8], which expresses its ability to become a clusterhead (the higher the weight, the better). The weights are assigned according to the following formula:

$$\text{weight} = w^{h}_{mem}\, mem + w^{h}_{bat}\, bat + w^{h}_{nr\_of\_neighbors}\, nr\_of\_neighbors + w^{h}_{link\_changes}\, (1 - link\_changes) + w^{h}_{load}\, (1 - load)$$

This weight takes into account the memory and battery capacity, the number of k-hop neighbors, the number of link changes and the load of a node. The number of link changes is used to determine the mobility of the node and is calculated as the sum of the number of link breaks and the number of new links formed since the last pass of the heuristic. The formula above is created to match the ILP formulation of the previous section. The same properties of the nodes can be found, e.g. we evaluate the memory and battery capacity of the nodes (mem of node n is the same as memory(n)). Also, nr_of_neighbors corresponds to dist(n,s): when we want to reduce the distance between the cluster members and the clusterhead, a node with a lot of neighbors can be a good choice. The different weights in the formula can be chosen in the same way as in the ILP formulation. The clustering algorithm works in the same way as k-hop CONID, except that weights are used instead of connectivity and ID number.

As an example of the heuristic we use the same network as in Figure 1, with the node properties given in Table I. The values indicate the memory or battery class to which the node belongs and range from 1 to 8. A small value means low battery or memory. By choosing appropriate values for the different weights, different solutions can be obtained, as can be seen in Fig. 2.

TABLE I. PROPERTIES OF THE NODES

node    memory    battery
1       8         2
2       6         5
3       4         3
4       7         6
5       6         6
6       4         2
7       6         5
8       2         1

Figure 2. Influence of the different weights. Panels: Wbat and Wmem (nodes with high values for memory and battery); Wload (nodes with high remaining bandwidth); Wneighbor (nodes with most neighbors); All (compromise that takes into account all parameters).
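The weight computation and the link change count can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the paper: all inputs are taken to be normalized to the range [0, 1] (the memory and battery classes of Table I are divided by 8 here), and the function and dictionary names are hypothetical.

```python
def link_changes(prev_neighbors, cur_neighbors):
    """Number of link breaks plus new links since the last pass."""
    # The symmetric difference holds neighbors that were lost or gained.
    return len(prev_neighbors ^ cur_neighbors)


def node_weight(mem, bat, nr_of_neighbors, link_chg, load, w):
    """Heuristic weight of a node; inputs assumed normalized to [0, 1]."""
    return (w["mem"] * mem
            + w["bat"] * bat
            + w["nr_of_neighbors"] * nr_of_neighbors
            + w["link_changes"] * (1 - link_chg)
            + w["load"] * (1 - load))


# Memory and battery classes from Table I (node: (memory, battery)).
table1 = {1: (8, 2), 2: (6, 5), 3: (4, 3), 4: (7, 6),
          5: (6, 6), 6: (4, 2), 7: (6, 5), 8: (2, 1)}

# Only memory and battery count in this example; other weights are zero,
# so the remaining inputs can be set to 0 without affecting the result.
w = {"mem": 1.0, "bat": 1.0, "nr_of_neighbors": 0.0,
     "link_changes": 0.0, "load": 0.0}

weights = {n: node_weight(m / 8, b / 8, 0.0, 0.0, 0.0, w)
           for n, (m, b) in table1.items()}
best = max(weights, key=weights.get)      # node with the highest weight
changed = link_changes({2, 3}, {3, 4})    # link to 2 broke, link to 4 is new
```

With only the memory and battery weights active, node 4 obtains the highest weight among the eight nodes; which nodes actually become clusterheads additionally depends on the topology.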

In order to have a certain level of stability in the network, i.e. to avoid continuously switching clusterheads, we add an extra value ∆ to the weight used in our heuristic when the node is already a clusterhead. This increases its chance of being reelected as clusterhead when changes in topology or node properties occur. This modified heuristic is referred to below as heur +∆.

V. PERFORMANCE COMPARISON

A. Simulation environment

The MAC layer was abstracted away by providing a direct link between two nodes that are in each other's send range. This means that there is no contention for the medium when sending data. The main purpose of this approach is to separate the effects of our clustering algorithm from those of the MAC layer. The ILP problem was solved by the Cplex package, using the simplex algorithm [9].

B. ILP versus heuristic

The weights of our heuristic were tuned in order to approximate the optimal ILP solution. In a next step, the solutions obtained with our heuristic were put into the ILP cost function and the resulting cost was calculated. Figure 3 shows how well our heuristic approximates the optimal ILP solution.

Figure 3. ILP versus heuristic

C. Heuristic versus k-hop CONID

In this section we compare the performance of our heuristic with k-hop CONID, which was explained in section II, with the emphasis on mobility. The networks we evaluate consist of 50, 100, 150 and 200 nodes respectively, which move according to a random waypoint mobility model [10] with a pause time of 5s for a duration of 500s (+ 100s initial warmup) and within a simulation area of 500 by 500m. The speed of the nodes varies from 0 to 5m/s if not stated otherwise. All simulations were run 5 times and the results were averaged.

In a first simulation, we measured the percentage of nodes that were elected as clusterheads. Table II shows the results of the tests that were performed both for 1-hop and 2-hop clustering. As can be seen, there are only small differences between CONID and our heuristic, with CONID performing slightly better.

TABLE II. PERCENTAGE ELECTED CLUSTERHEADS

           1 hop                       2 hop
# nodes    heur    heur +∆    CONID    heur    heur +∆    CONID
50         20      20         18       11      10         10
100        21      22         20       12      12         11
150        22      22         20       12      12         11
200        21      21         20       12      12         11

Table III shows the percentage of reelected clusterheads during the simulations. Due to topological changes, clusters have to change and new clusterheads are chosen, which implies an additional overhead.

TABLE III. PERCENTAGE REELECTED CLUSTERHEADS

           1 hop                       2 hop
# nodes    heur    heur +∆    CONID    heur    heur +∆    CONID
50         82      90         46       78      87         42
100        80      87         44       74      85         37
150        78      85         42       71      83         35
200        75      83         38       70      80         31

Now we can clearly see differences between our heuristic and CONID. With our heuristic, roughly 75 to 80% of the clusterheads are reelected, which increases to roughly 85 to 90% when the additional value ∆ is used, whereas with CONID only roughly 40 to 50% of the clusterheads are reelected (see Table III for the exact values). This result corresponds with [5] and is caused by the fact that CONID does not take into account the mobility of the nodes. As a consequence, CONID will require much more overhead in a mobile network to keep the clusters up to date.

In the next simulations we investigated the impact of nodes moving with different speeds, ranging from 0 to vmax, where vmax is varied from 0 to 20m/s.

Figure 4. Average number of link changes of the elected clusterheads for varying speeds

Figure 4 clearly shows that when mobility is taken into account, i.e. the number of link changes, the chosen clusterheads experience fewer link changes and are thus less mobile.

Finally, simulations were performed for networks having a fixed percentage of static nodes. Figure 5 shows the percentage of reelected clusterheads. Again, there is a clear difference between our heuristic and CONID, with our heuristic performing clearly better in mobile networks. In a completely static network, performance is almost the same. However, it should be stressed that in this case our heuristic can outperform CONID depending on the heterogeneity of the network (memory capacity, battery capacity and network load).

Figure 5. Percentage reelected clusterheads for partly static networks

VI. CONCLUSIONS

Traditional clustering algorithms cannot take the heterogeneity of the network into account. In this paper we have developed a new clustering heuristic which takes into account various network and node characteristics. The development of the heuristic is based on the optimal solution provided by an ILP formulation of the clustering problem. Simulation results show that clustering that takes heterogeneity into account can indeed be beneficial, especially in reducing the overhead for maintaining the clusters during the lifetime of the network. It also provides a basis for the development of a clustering based ad hoc routing protocol.

ACKNOWLEDGEMENT

This research is funded by the Belgian Federal Science Policy Office through the IAP V/11 contract, by the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT) through contract No. 020152 and by the Fund for Scientific Research – Flanders (F.W.O.-V., Belgium).

REFERENCES

[1] S. Corson and J. Macker, “Mobile Ad hoc Networking (MANET): Routing Protocol Performance Issues and Evaluation Considerations”, RFC 2501, Jan. 1999, http://www.ietf.org/rfc/rfc2501.txt
[2] X. Hong, K. Xu and M. Gerla, “Scalable Routing Protocols For Mobile Ad Hoc Networks”, IEEE Network, Vol. 16, No. 4, pp. 11-21, July 2002
[3] E. Royer and C. Toh, “A Review of Current Routing Protocols for Ad-Hoc Mobile Wireless Networks”, IEEE Personal Communications, pp. 46-55, April 1999
[4] C.E. Perkins, “Ad Hoc Networking”, Chapter 4: Cluster-Based Networks, pp. 75-138, Addison Wesley, 2001
[5] M. Gerla and J.T.-C. Tsai, “Multicluster, Mobile, Multimedia Radio Network”, ACM/Baltzer Journal of Wireless Networks, Vol. 1, No. 3, pp. 255-265, 1995
[6] C.R. Lin and M. Gerla, “Adaptive Clustering for Mobile Wireless Networks”, IEEE Journal on Selected Areas in Communications, Vol. 15, pp. 1265-1275, 1997
[7] F. Garcia, J. Solano and I. Stojmenovic, “Connectivity Based K-hop Clustering in Wireless Networks”, Telecommunication Systems, Vol. 22, 1-4, pp. 205-220, 2003
[8] S. Basagni, “Distributed Clustering for Ad Hoc Networks”, Proc. Int. Symp. Parallel Algorithms, Architectures and Networks (ISPAN’99), pp. 310-315, Australia, 1999
[9] ILOG CPLEX, www.cplex.com
[10] A. Jardosh, E. M. Belding-Royer, K. C. Almeroth and S. Suri, “Towards Realistic Mobility Models for Mobile Ad hoc Networks”, Proc. of the 9th annual int. conf. on Mobile computing and networking, San Diego, CA, pp. 217-229, September 14-19, 2003