An Optimal Distributed Trigger Counting Algorithm for Large-scale Networked Systems

Seokhyun Kim¹, Jaeheung Lee¹, Yongsu Park*², and Yookun Cho¹

¹ School of Computer Science & Engineering, Seoul National University, Seoul, Korea.
² Division of Computer Science & Engineering, Hanyang University, Seoul, Korea.

Received: 31 May 2012. Revised: 9 February 2013. Accepted: 12 February 2013.

Abstract

Distributed trigger counting (DTC) is the problem of detecting w triggers with n nodes in large-scale distributed systems that have the general characteristics of complex adaptive systems. The triggers come from an external source, and no a priori information about them is given. DTC algorithms can be used for distributed monitoring and global snapshots. When designing an efficient DTC algorithm, two goals should be considered: minimizing the overall message complexity and distributing the trigger-detection load among the nodes. In this paper, we propose a randomized algorithm called TreeFill, whose message complexity is O(n log(w/n)) with high probability. The maximum number of messages received by any node while detecting w triggers is O(log(w/n)) with high probability. These results match the lower bounds of the DTC problem. We prove the upper bounds of TreeFill. The performance of TreeFill is also evaluated by means of an agent-based simulation using NetLogo. The simulation results show that TreeFill uses about 54~69% of the messages used by a previous algorithm called CoinRand. The maximum number of messages received by each node in TreeFill is also smaller than in previous work.

Keywords: Distributed trigger counting, Distributed algorithm, Complex adaptive systems, Multi-agent systems, Randomized algorithm, Data aggregation, Distributed systems

* Corresponding author: Yongsu Park, Ph.D. Associate Professor, Division of Computer Science and Engineering, College of Engineering, Hanyang University. Address: 801 IT/BT Center, 222 Wangsimni-ro, Seongdong-gu, Seoul 133-791, Korea. Office: +82-2-2220-2382, FAX: +82-2-2220-1886, Email: [email protected]


1 Introduction

Complex adaptive systems (CAS) are networks of large numbers of heterogeneous agents that behave according to relatively simple rules. Despite the simplicity of these rules, the collective behavior of CAS is very complex due to the nonlinear interactions of agents [1]. Great numbers of agents act autonomously, and small changes in some agents can initiate successive changes in others; as a result, a small local change can produce a big change in the entire system. It is currently considered difficult to predict the holistic behavior of CAS [2].

Agent-based modeling (ABM) is a useful tool for modeling and simulating the complex behaviors of CAS. In ABM, each agent acts individually according to local rules and interacts nonlinearly with other agents. Using ABM, we can analyze and track the complex behaviors of CAS. ABM is a general simulation paradigm suitable for modeling CAS, and it is also used for simulations in various scientific areas such as sociology, biology, and ecology.

In this paper, we propose an efficient distributed monitoring algorithm for large-scale networked systems that have the properties of CAS. Wireless sensor networks and peer-to-peer systems are typical examples of such large-scale networked systems. From the perspective of algorithm design, our algorithm is a monitoring algorithm for multi-agent systems (MAS), although it can also be used for real-world distributed systems having the properties of CAS. Generally, the agents of a MAS cooperate toward the common goal of the system [3, 4, 5], and various algorithms are designed and implemented in MAS. In general, the properties of CAS are studied through ABM, the usual goal being to obtain insight into CAS. Thus, our algorithm is a monitoring algorithm for large-scale MAS having the properties of CAS.
Hereafter, when we use the term CAS, we refer to large-scale distributed systems that have the properties of CAS, such as wireless sensor networks and peer-to-peer systems. In general, a monitoring algorithm for CAS has to satisfy the following requirements:

• A monitoring algorithm for CAS should accurately represent recent aspects of the CAS.
• A monitoring algorithm for CAS should impose minimal overhead on the CAS.
• The influence of a monitoring algorithm on the CAS should be minimized.
• A monitoring algorithm should respect the self-organizing property and nested structure of CAS.

The behavior patterns of CAS are determined adaptively by the nonlinear interactions of agents. When a monitoring algorithm imposes much overhead on an agent, the behavior of the agent, e.g., its response time to a message from another agent, can change subtly. Small changes in individual agents can lead to a big difference in the CAS. Thus, the additional work imposed on a single agent should be small enough not to change the behavior pattern of that agent. For similar reasons, monitoring overheads should not be concentrated
at a certain agent; concentrated overheads would change the behavior pattern of that agent, which could in turn influence the entire system.

A monitoring algorithm for CAS should have the self-organizing property. Since there is little or no central control in CAS, the monitoring task is done through the nonlinear interactions of agents. Agents send and receive monitoring information autonomously, and the adaptive behavior of each agent contributes to the monitoring task. To obtain global information on the entire system, some information aggregation among agents is needed. It is generally accepted that CAS are nested systems [6]; CAS are composed of systems that are themselves systems of smaller systems. We can use this nested structure of CAS to design an efficient monitoring algorithm: agents autonomously send monitoring information to other agents, and, using the nested structure, global information can be aggregated through communication among the agents.

Distributed monitoring techniques are widely used to monitor an environmental world or the internal states of distributed systems such as grids, clusters, wireless sensor networks, and peer-to-peer systems [7, 8, 9, 10, 11]. The main objectives of wireless sensor networks are typically environmental monitoring, surveillance, and tracking. In grids and clusters, system monitoring to acquire a global view of the entire system is important for its management. One example is Ganglia, widely used system-state monitoring software for grids and clusters [12]. Distributed monitoring is one of the core functionalities of distributed systems.

Generally, the techniques used for distributed monitoring are centralized processing, gossip-based aggregation, and tree aggregation [7, 10]. In centralized processing, a central node gathers status data from all other nodes.
Obviously, the monitoring overhead is then concentrated at the central node; thus, centralized processing has a scalability problem for large-scale distributed systems. If the entire system is highly dynamic and the membership of participating nodes changes continuously, it is difficult to construct a static data-aggregation network. In such a case, gossip-based aggregation is a good choice for estimating global properties of distributed systems. If a static data-aggregation network can be constructed, tree aggregation works efficiently; it needs only O(n) messages to aggregate system-wide information. When tree aggregation is used, data aggregation typically occurs periodically, which is sufficient for some applications. We can also use periodic tree aggregation for CAS; it needs only O(n) messages per aggregation, and if we choose a sufficiently long aggregation period, the overhead on the CAS can be reduced enough that the influence of the monitoring algorithm on the CAS can be ignored. However, when the timely tracking of the global view of a distributed system is important, periodic data aggregation has drawbacks. Below are examples of scenarios in which timely tracking is important:

• Raising an alarm when 100 enemy soldiers are passing through a strategic area.
• Monitoring the number of users of each cloud service and, if users are concentrated on one particular service, allocating more computing nodes to that service.


• An operator of a traffic management center wants to be notified when the number of cars passing through an intersection exceeds a predefined threshold.

When periodic tree aggregation is used for distributed monitoring, the data-aggregation period should be short in order to obtain timely monitoring information. However, a short period incurs a large data-aggregation overhead. If the period is sufficiently long, the overhead becomes moderate, but the gathered data can be somewhat dated.

Distributed trigger counting (DTC) algorithms are useful when an alarm should be raised rapidly once w predefined triggers have been detected by n nodes. Consider the military surveillance scenario mentioned above: distributed sensor nodes monitor the movements of enemy soldiers on a battlefield, and an alarm should be raised as soon as 100 soldiers have been detected by the sensor nodes in a certain area. DTC algorithms can be used in such cases. Using DTC algorithms for CAS monitoring, a user can obtain the desired monitoring accuracy by adjusting the number of triggers to detect, w, while reducing system overheads. By repeating the process of detecting w triggers, a user is notified whenever w triggers are detected by the agents of the CAS and can observe the trend of the entire system.

Another application of DTC algorithms is taking a global snapshot of a large-scale distributed system [13, 14, 15]. When conventional global snapshot algorithms are applied to large-scale distributed systems, high communication costs can arise [13, 16, 17]. The main reason is that the cost of recording channel states in conventional global snapshot algorithms is typically O(n²), where n is the number of nodes [18, 19, 20]. With DTC algorithms, the cost of channel-state recording can be greatly reduced in large-scale distributed systems [13].

Consider a distributed system with n nodes.
From external sources, w triggers arrive at the n nodes. The DTC problem is to raise an alarm when the n nodes have detected w triggers. No statistical information about the triggers is given to the system in advance. Generally, w satisfies w ≫ n, because if w ≤ n the problem becomes rather trivial: the n nodes simply forward the received triggers, and the total number of messages to detect w triggers is then O(n). The communication channels among the n nodes are assumed to form a complete graph. To compare the performance of DTC algorithms, the following parameters should be considered:

• Message complexity: the total number of messages exchanged among the nodes to detect w triggers.
• Maximum number of received messages (MaxRcv): the maximum number of messages received by any single node.

In a distributed system, the number of messages needed to detect triggers should be small. Especially in wireless sensor networks¹, the number of communication

¹ Generally, the topology of a wireless sensor network is not a complete graph. However, if sensor nodes are clustered based on their transmission radius, the nodes in a cluster can communicate with each other; our DTC algorithm can then be used within a cluster of sensor nodes. We plan to extend our algorithm to clusters of sensor nodes.
messages is proportional to the energy consumption of the sensor nodes. Thus, an algorithm with low message complexity is better for distributed systems such as wireless sensor networks. MaxRcv is a metric that shows whether a DTC algorithm scales well: it is the maximum message-receiving load among the nodes when a DTC algorithm is used. If MaxRcv is low, the message overhead is distributed well among all nodes, so a DTC algorithm with low MaxRcv scales well to a large number of nodes.

In this paper, we propose a DTC algorithm, TreeFill, which matches the lower-bound message complexity of the DTC problem with high probability. Garg et al. showed that the lower bound on the message complexity of the DTC problem is Ω(n log(w/n)) [13]; consequently, the lower bound on MaxRcv is Ω(log(w/n)) [15]. In our algorithm, the message complexity is O(n log(w/n)) with high probability and MaxRcv is O(log(w/n)) with high probability (unless otherwise stated, the base of the logarithm is 2 in this paper). In the literature, the best previous results for message complexity and MaxRcv are O(n(log n + log w)) and O(log n + log w), respectively [15].

The remainder of this paper is organized as follows. Section 2 summarizes previous works. In Section 3, the TreeFill algorithm is described. The performance of TreeFill is analyzed in Section 4. The simulation results for TreeFill are shown in Section 5, and the paper is concluded in Section 7.

2 Related Works

The DTC algorithms in the literature are summarized in Section 2.1. DTC algorithms can be used for global snapshots of large-scale distributed systems. Several global snapshot algorithms for large-scale distributed systems are introduced in Section 2.2.

2.1 Distributed Trigger Counting Problem

Garg et al. proposed three algorithms for the DTC problem, namely grid-based, tree-based, and centralized algorithms, and presented a method to combine them [13]. They also proved the lower bound for the DTC problem [13]. The message complexity refers to the total number of messages used by a DTC algorithm; generally, the size of the messages used in DTC algorithms is O(1). The message complexity of the centralized algorithm is O(n log(w/n)), which is optimal. However, in the centralized algorithm, one central node collects all of the messages, so its MaxRcv is quite high. The tree-based algorithm operates in rounds, and in each round a binary tree is used to detect triggers. At the beginning of a round, each node in the binary tree is given tokens that will be consumed by the triggers; the role of the binary tree is to match tokens with triggers. When all of the tokens have been consumed by triggers, the round is completed. The message complexity and MaxRcv of the tree-based algorithm are both O(n log n log(w/n)).


Chakaravarthy et al. proposed the LayeredRand algorithm, whose message complexity is O(n log n log w) and whose MaxRcv is O(log n log w) [14]. In LayeredRand, the nodes in the system form a binary-tree-like structure; unlike in a normal binary tree, a node in LayeredRand has no predefined parent and child nodes. When a node in LayeredRand has received a predefined threshold of messages, it sends another message to one node selected uniformly at random from among the nodes of the upper layer. By means of this randomized scheme, MaxRcv is greatly reduced in LayeredRand.

Emek and Korman proposed DTC algorithms for a more general environment [21]. They assume a tree network in which nodes can communicate only with their neighbor nodes. Their algorithms are referred to as CompTreeRand and CompTreeDet, following the previous work of Chakaravarthy et al. [15]. The message complexity of CompTreeRand is O(n log w(log log n)²), and its MaxRcv is not bounded. The message complexity of CompTreeDet is O(n(log w log n)²), and its MaxRcv is O((log w log n)²).

Chakaravarthy et al. also proposed improved DTC algorithms, CoinRand and RingRand, which show the best results among previous works [15]. The message complexity of CoinRand is O(n(log w + log n)) and its MaxRcv is O(log w + log n). CoinRand uses the binary-tree-like structure of their previous work, LayeredRand, but its randomized message-aggregation scheme is further improved. The message complexity of RingRand is O(n log n log w) and its MaxRcv is O(log n log w). The merit of RingRand is that it bounds MaxSnd, the maximum number of messages sent by each node; no previous DTC algorithm except CompTreeDet guarantees a bound on MaxSnd.

2.2 Global Snapshots in Large-scale Distributed Systems

DTC algorithms can easily be applied to global snapshots of large-scale distributed systems. Garg et al. defined a distributed message counting problem when they designed global snapshot algorithms for large-scale distributed systems; its definition is identical to that of the DTC problem except for some terminology. Recent large-scale distributed systems use thousands of computing units. A global snapshot of a distributed system is the set of processor states and channel states [18]. Conventional global snapshot algorithms typically require O(n²) messages to take a global snapshot [18, 19, 20]. If the processors of a distributed system are interconnected by a spanning tree, their processor states can be recorded with O(n) messages in O(log n) time units. However, the number of messages needed to record channel states is O(n²), which is expensive for a distributed system containing a large number of computing units. DTC algorithms can be applied to the channel-state recording of global snapshots [13], reducing the number of messages significantly. However, if a DTC algorithm is applied to a global snapshot, the lower bound on the worst-case response time is Ω(n log(w/n)). Kshemkalyani proposed a hypercube-based algorithm whose response time is O(log n) [16]. The number of messages used in the hypercube-based algorithm is O(n log n); however, the size of its messages is O(n), whereas the size of the messages in DTC algorithms is O(1). Recently, Tsai generalized the hypercube-based algorithm to general grid interconnection
networks and proved the lower bounds of message complexity for global snapshot algorithms using general grid interconnection networks [17].

3 An Optimal Algorithm for DTC Problem

3.1 Algorithm Description

We assume that the total number of nodes in the system is n and that the number of triggers to be detected is w. The number of triggers is far greater than the number of nodes; i.e., w ≫ n. We also assume that the nodes form a complete graph in which all nodes can communicate with each other.

Our algorithm, TreeFill, operates in rounds. In each round, TreeFill detects half of the triggers that have not yet been detected. For example, TreeFill detects w/2 triggers in the first round, w/4 triggers in the second round, and so forth. Let ŵ_i be the number of triggers that remain undetected at the start of the i-th round. Then we can obtain ŵ_i easily as follows:

ŵ_i = w − Σ_{j=1}^{i−1} w/2^j = w/2^{i−1}  (i = 1, 2, …).

In each round, a node generates one DETECT message whenever it has received ŵ_i/2n triggers, and then reduces its count of received triggers by ŵ_i/2n. When each node generates a message after receiving ŵ_i/2n triggers, at least n messages are generated in each round regardless of the distribution of triggers. The following theorem shows this:

Theorem 1. When ŵ_i/2n > 1 and each node generates a message after it receives ŵ_i/2n triggers, the n nodes generate at least n messages for an arbitrary distribution of triggers in the i-th round.

Proof. In the i-th round, each node can receive ŵ_i/2n − 1 triggers without generating a message. Thus, when the minimum number of messages is generated, every node has received ŵ_i/2n − 1 such triggers. In this case, the number of triggers that have contributed to the generation of messages is

ŵ_i − n(ŵ_i/2n − 1) = ŵ_i/2 + n.

Thus, the number of messages generated in the i-th round is at least

⌊(ŵ_i/2 + n)/(ŵ_i/2n)⌋ = ⌊n + 2n²/ŵ_i⌋ ≥ n.  ∎
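The round sizes and Theorem 1 can be checked numerically. The sketch below (the function names `w_hat` and `min_messages` are ours, not the paper's) plays the adversary from the proof: every node first absorbs ŵ_i/2n − 1 triggers silently, and the number of DETECT messages forced by the remaining triggers is still at least n.

```python
def w_hat(w, i):
    # Triggers still undetected at the start of round i: w / 2^(i-1).
    return w // 2 ** (i - 1)

def min_messages(n, w, i):
    # Adversarial distribution from the proof of Theorem 1: every node
    # absorbs threshold-1 triggers without reporting; the remaining
    # w_hat/2 + n triggers are what forces DETECT messages.
    t = w_hat(w, i) // (2 * n)                 # per-node threshold in round i
    contributing = w_hat(w, i) - n * (t - 1)   # equals w_hat/2 + n
    return contributing // t

n, w = 8, 1024
i = 1
while w_hat(w, i) // (2 * n) > 1:              # rounds covered by Theorem 1
    assert min_messages(n, w, i) >= n          # at least n DETECT messages
    i += 1
```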

As shown in Theorem 1, when i satisfies ŵ_i/2n > 1, at least n DETECT messages are generated by the n nodes in the i-th round. TreeFill detects these n messages using a binary tree, referred to as DetectTree hereafter. The number of triggers detected in the i-th round is n · ŵ_i/2n = ŵ_i/2 = w/2^i. When ŵ_i/2n > 1, the DTC problem is thus reduced to detecting n messages in a distributed manner. When ŵ_i/2n ≤ 1, all of the remaining triggers can also be detected using DetectTree; the details of this case are given in Section 3.2.

DetectTree is a binary tree. We assume that n = 2^{h+1} for some integer h; our algorithm can easily be extended to general cases.


The depth of a node m in DetectTree is the length of the path from the root to m. The set of all nodes at a given depth is called a layer of the tree in this paper. Thus, the root node is in layer-0, and the number of nodes in layer-d is 2^d. The height of DetectTree is h. Thus, the number of nodes in DetectTree is Σ_{i=0}^{h} 2^i = 2^{h+1} − 1 = n − 1. Figure 1 shows an example of DetectTree when n = 2^{2+1} = 8.

Figure 1: Example DetectTree when n = 2^{2+1} = 8.

In each round, the n nodes generate n DETECT messages and send them to the leaf nodes of DetectTree uniformly at random. The number of nodes in the leaf layer of DetectTree is 2^h = 2^{h+1}/2 = n/2. Figure 2 shows the algorithm for generating DETECT messages and sending them to the leaf nodes.

When the first round begins, each node m does:
    m.trgs ← 0.
In the i-th round, each node m does:
    When m receives one trigger:
        m.trgs ← m.trgs + 1.
        If m.trgs ≥ ŵ_i/2n then
            Select a node f among the leaf nodes of DetectTree uniformly at random.
            Send DETECT to f.
            m.trgs ← m.trgs − ŵ_i/2n.

Figure 2: Receiving triggers and generating DETECT messages in each node.

Each leaf node f of DetectTree manages f.dts, the number of DETECT messages received by f. When a leaf node has received 2 DETECT messages, it sends a FULL message to its parent node. Each inner node r of DetectTree manages an array r.full[1..2]. Initially, all entries of r.full[] are false. When the i-th child node of r sends a FULL message to r, r.full[i] is set to true. When r has received 2 FULL messages, all entries of r.full[1..2] are true; r then sends a FULL message to its own parent node.

The DETECT messages are sent to the leaf nodes of DetectTree uniformly at random. Thus, some leaf nodes receive fewer than 2 DETECT messages and other leaf nodes receive more. Suppose that a leaf node f receives 2 + x DETECT messages. We refer to these x messages as excessive DETECT messages, or simply excessive messages. We refer to a leaf node that has received fewer than 2 DETECT messages as a poor node.
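A minimal executable sketch of the per-node rule in Figure 2 (the `send` callback and the list of leaf ids are assumptions of this sketch; in TreeFill they are supplied by the DetectTree):

```python
import random

class TriggerNode:
    """Per-node rule of Figure 2: count triggers, and emit one DETECT
    to a random DetectTree leaf for every threshold-many triggers."""

    def __init__(self, leaf_ids, send):
        self.trgs = 0
        self.leaf_ids = leaf_ids     # the n/2 leaves of DetectTree
        self.send = send             # send(msg, dest): runtime-supplied

    def on_trigger(self, threshold):
        # threshold = w_hat_i / 2n for the current round i
        self.trgs += 1
        if self.trgs >= threshold:
            self.send("DETECT", random.choice(self.leaf_ids))
            self.trgs -= threshold

sent = []
node = TriggerNode(leaf_ids=[0, 1, 2, 3], send=lambda m, d: sent.append(d))
for _ in range(10):
    node.on_trigger(threshold=3)
assert len(sent) == 3 and node.trgs == 1   # 10 triggers, threshold 3
```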

The strategy for detecting the n DETECT messages in DetectTree is to move excessive messages to poor nodes. After n DETECT messages have arrived at DetectTree and all excessive messages have been moved to poor nodes, every leaf node holds exactly 2 DETECT messages; there is no poor node. As mentioned above, a leaf node holding 2 DETECT messages sends one FULL message to its parent node, and an inner node r that has received 2 FULL messages sends one FULL message to its parent. Therefore, when all leaf nodes hold 2 DETECT messages, the root node eventually receives 2 FULL messages and recognizes that (n/2) · 2 = n DETECT messages have arrived at DetectTree.

To move excessive messages to poor nodes, we utilize the full array in each inner node. When an excessive message arrives at a leaf node, it is sent to a node of the upper layer chosen uniformly at random. At an inner node r, if all entries of r.full[] are true, there are no poor nodes in the subtree of r, and the excessive message is again sent to a node of the upper layer chosen uniformly at random. If instead the i-th entry of r.full[] is false, there is at least one poor node in the i-th subtree of r, and r sends the excessive message to its i-th child node. The same procedure is repeated in the child node, and the excessive message is eventually delivered to a poor node. Figure 3 shows the algorithm for moving excessive messages and for detecting the n DETECT messages at the root node of DetectTree.

3.2 Detecting Triggers in the Last Round

In the i-th round, each node m sends a DETECT message to DetectTree once it has received ŵ_i/2n = w/(2^i n) triggers. Thus, when w/(2^i n) ≤ 1, every node generates a DETECT message for every received trigger. Let i_e be the first round that satisfies w/(2^i n) ≤ 1; then i_e = ⌈log(w/n)⌉. In the i_e-th round, the number of as-yet-undetected triggers is ŵ_{i_e} = w/2^{i_e−1} ≤ 2n, so the number of DETECT messages generated in a round may be smaller than n. To keep using DetectTree, which detects exactly n messages per round, we introduce dummy DETECT messages: when a round begins with ŵ < n undetected triggers, n − ŵ dummy DETECT messages are distributed among the leaf nodes of DetectTree, so that the sum of the remaining triggers and the dummy DETECT messages is n and DetectTree can detect these n messages. In the i_e-th round, the number of triggers to be detected is ⌈ŵ_{i_e}/2n⌉ · n = 1 · n = n. Since ŵ_{i_e} ≤ 2n, triggers can remain after the i_e-th round ends, so one more round, padded with dummy DETECT messages in the same way, detects all of the remaining triggers. Consequently, the last round of TreeFill is round ⌈log(w/n)⌉ + 1.
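A quick numeric check of the round count (illustrative; the function name `rounds` is ours): i_e = ⌈log₂(w/n)⌉ is the first round whose per-node threshold w/(2^i n) drops to at most 1, the undetected count ŵ_{i_e} entering that round lies in (n, 2n] for these parameters, and one extra padded round finishes the job.

```python
import math

def rounds(n, w):
    # First round i with w / (2^i * n) <= 1.
    i_e = math.ceil(math.log2(w / n))
    w_hat = w / 2 ** (i_e - 1)       # undetected triggers entering round i_e
    assert n < w_hat <= 2 * n        # lands in (n, 2n]
    return i_e + 1                   # index of TreeFill's last round

assert rounds(8, 1024) == 8          # log2(1024/8) = 7, plus one extra round
assert rounds(8, 500) == 7           # ceil(log2(62.5)) = 6, plus one
```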


When each round begins, each inner node r does:
    Set all entries of r.full[1..2] to false.

A leaf node f of DetectTree does:
    When f receives DETECT:
        f.dts ← f.dts + 1.
        When f.dts first reaches 2:
            Send FULL to the parent of f.
        If f.dts > 2 then
            Forward DETECT to a node of the upper layer chosen uniformly at random.
            f.dts ← f.dts − 1.

An inner node r of DetectTree does:
    When r receives FULL from r.child[i]:
        r.full[i] ← true.
        If all entries of r.full[] are true then
            Send FULL to the parent of r.
    When r receives DETECT:
        Find i such that r.full[i] = false.
        If no such i exists then
            Forward DETECT to a node of the upper layer chosen uniformly at random.
        Else
            Forward DETECT to r.child[i].

The root node r of DetectTree does:
    Perform the procedure for an inner node.
    If all entries of r.full[] are true then
        The detection of ŵ_i/2 triggers is complete; start the next round.

Figure 3: The algorithm for moving excessive messages and detecting n DETECT messages.
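To make one round concrete, here is a self-contained sketch of a TreeFill round under the analysis assumption of Section 4.1 that messages are handled one at a time. The array-based tree layout and the method names are our implementation choices, not the paper's; the logic follows Figures 2 and 3.

```python
import random

class DetectTree:
    """One TreeFill round on a binary DetectTree of height h.
    Array layout: node v has children 2v+1 and 2v+2; the leaves form
    the last layer. The n = 2^(h+1) system nodes appear only implicitly,
    as the senders of the n DETECT messages."""

    def __init__(self, h):
        self.h = h
        self.first_leaf = 2 ** h - 1
        size = 2 ** (h + 1) - 1
        self.dts = [0] * size                      # per-leaf DETECT counts
        self.full = [[False, False] for _ in range(size)]
        self.done = False                          # root saw 2 FULLs

    def rand_at(self, d):                          # random node of layer-d
        return random.randrange(2 ** d - 1, 2 ** (d + 1) - 1)

    def depth(self, v):
        return (v + 1).bit_length() - 1

    def leaf_recv(self, v):                        # a leaf receives DETECT
        self.dts[v] += 1
        if self.dts[v] == 2:
            self.send_full(v)                      # leaf is filled
        elif self.dts[v] > 2:                      # excessive message:
            self.dts[v] -= 1                       # push to the upper layer
            self.route(self.rand_at(self.h - 1))

    def send_full(self, v):                        # v sends FULL to its parent
        p = (v - 1) // 2
        self.full[p][(v - 1) % 2] = True
        if all(self.full[p]):
            if p == 0:
                self.done = True                   # n DETECTs detected
            else:
                self.send_full(p)

    def route(self, r):                            # inner node gets a DETECT
        empty = [i for i in (0, 1) if not self.full[r][i]]
        if not empty:                              # subtree full: go up a layer
            self.route(self.rand_at(self.depth(r) - 1))
        elif (c := 2 * r + 1 + empty[0]) >= self.first_leaf:
            self.leaf_recv(c)                      # a poor leaf is found
        else:
            self.route(c)                          # keep descending

random.seed(7)
t = DetectTree(h=2)                                # n = 2^(h+1) = 8
for _ in range(8):                                 # n DETECT messages arrive
    t.leaf_recv(t.rand_at(t.h))                    # at uniformly random leaves
assert t.done and all(d == 2 for d in t.dts[t.first_leaf:])
```

After the n-th DETECT message has been routed, every leaf holds exactly 2 messages and the root has received 2 FULL messages, regardless of the random choices made.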

4 Analysis

4.1 The Message Complexity of TreeFill

In this section, we show that the number of messages needed to detect n DETECT messages using DetectTree is O(n) per round with high probability. As shown in Section 3.2, the total number of rounds in TreeFill is O(log(w/n)); thus, the total message complexity of TreeFill is O(n log(w/n)).

In each round of TreeFill, at least n DETECT messages are sent to the leaf nodes of DetectTree uniformly at random. To simplify the analysis, we use the following assumptions:

• DETECT messages arrive at DetectTree one at a time.
• Until an excessive message has been forwarded to a poor node, the next DETECT message does not arrive at DetectTree.

With these assumptions, when n DETECT messages arrive at DetectTree, the root node can detect them and start the next round. Within a round of TreeFill, FULL messages are used to detect the arrival of n DETECT messages, and some DETECT messages are forwarded among the nodes of DetectTree, as shown in Section 3.1. Let NumFwd be the total number of forwarded DETECT messages and NumFull the total number of FULL messages in a round of TreeFill. Then the total number of messages used in a round is NumFwd + NumFull.

NumFull is easily evaluated: every node of DetectTree except the root generates one FULL message. Thus,

NumFull = 2^1 + ⋯ + 2^h = 2^{h+1} − 2 = n − 2 = O(n).

The following theorem bounds the probability that NumFwd is at most 4n:

Theorem 2. Pr(NumFwd ≤ 4n) > 1 − e^{−n+2}.

Proof. The detailed proof is given in Appendix A.

Thus, NumFwd is O(n) with high probability (greater than 1 − e^{−n+2}).

Consider the worst case of NumFwd, denoted NumFwd_worst, in which all DETECT messages are sent to a single leaf node of DetectTree. Then

NumFwd_worst = 0·2 + 2·(2^2 − 2^1) + 4·(2^3 − 2^2) + ⋯ + 2h·(2^{h+1} − 2^h)
             = Σ_{i=0}^{h} 2i·(2^{i+1} − 2^i) = Σ_{i=0}^{h} i·2^{i+1}
             = 4(h·2^h − 2^h + 1) = O(n log n),

where the first term corresponds to a single node, the second to a tree of height 1, and the last to the entire tree (the i-th term counts 2i forwards for each of the 2^{i+1} − 2^i messages that fill the subtree of height i). Pr(NumFwd_worst) is the probability that all n DETECT messages are sent to one particular node among the n/2 leaf nodes of DetectTree:

Pr(NumFwd_worst) = (n/2) · (2/n)^n = (2/n)^{n−1},

which is negligible.

In a round of TreeFill, NumFull is O(n) and NumFwd is O(n) with high probability. Thus, the total number of messages used in a round of TreeFill is O(n) with high probability, and the message complexity of TreeFill is O(n log(w/n)) with high probability.
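The closed forms in this section are easy to sanity-check numerically (an illustrative check, with n = 2^{h+1}):

```python
# Check the DetectTree size n - 1, NumFull = n - 2, and the closed form
# of the worst-case forwarding sum, for a range of tree heights.
for h in range(1, 13):
    n = 2 ** (h + 1)
    assert sum(2 ** i for i in range(h + 1)) == n - 1          # tree size
    assert sum(2 ** i for i in range(1, h + 1)) == n - 2       # NumFull
    worst = sum(2 * i * (2 ** (i + 1) - 2 ** i) for i in range(h + 1))
    assert worst == 4 * (h * 2 ** h - 2 ** h + 1)              # NumFwd_worst
```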

4.2 The Maximum Number of Received Messages in a Node

In this section, we show that the maximum number of messages received by each node, MaxRcv, is O(log(w/n)) with high probability. A node receives messages from its parent node and from its child layer (the layer in which its child nodes reside). We define the numbers of these messages as follows:

• RcvUp_i: in each round of TreeFill, the number of messages received by a node of layer-i from its parent node.
• RcvDn_i: in each round of TreeFill, the number of messages received by a node of layer-i from layer-(i + 1), where 0 ≤ i < h.

Let NumRcv_i be the number of messages received by a node of layer-i. Then NumRcv_i = RcvUp_i + RcvDn_i. Let MaxRcv_Rnd be the maximum number of messages received by a node in a round of TreeFill; it can be expressed as MaxRcv_Rnd = max_{0≤i≤h}(NumRcv_i). The number of rounds in TreeFill is O(log(w/n)), as shown in Section 3.2. MaxRcv can therefore be bounded as MaxRcv ≤ O(log(w/n)) · MaxRcv_Rnd. The following theorem shows that MaxRcv_Rnd is O(1) with high probability:

Theorem 3. For δ < 2e − 1, Pr(MaxRcv_Rnd ≤ 12δ) ≥ 1 − e^{−δ²/2}.

Proof. The detailed proof is given in Appendix B.

Thus, TreeFill achieves the optimal MaxRcv, O(log(w/n)), with high probability. In summary, the proposed DTC algorithm, TreeFill, achieves the optimal message complexity and the optimal MaxRcv with high probability. Table 1 summarizes the performance of the DTC algorithms.

Algorithm          | Message Complexity      | MaxRcv
-------------------|-------------------------|------------------------
Centralized [13]   | O(n log(w/n))           | –
Tree-based [13]    | O(n log n log(w/n))     | O(n log n log(w/n))
LayeredRand [14]   | O(n log n log w)        | O(log n log w)
CompTreeRand [21]  | O(n log w (log log n)²) | –
CompTreeDet [21]   | O(n (log w log n)²)     | O((log w log n)²)
CoinRand [15]      | O(n (log w + log n))    | O(log w + log n)
RingRand [15]      | O(n log n log w)        | O(log n log w)
TreeFill           | O(n log(w/n))           | O(log(w/n))

(The bounds for the algorithms of [21] are for arbitrary networks.)

Table 1: Comparison of DTC algorithms.

5 Simulation Results

In this section, we show the simulation results for the proposed algorithm, TreeFill. We used NetLogo [22] for the simulation. NetLogo is a widely used
agent-based simulation environment that can be used for a wide range of topics, such as epidemic protocols, fractals, and topics in the social sciences [22]. In NetLogo, the Logo programming language is used for modeling. The script codes for our simulation can be obtained from the author’s homepage.2 In NetLogo, simulations are run with discrete time steps called ticks. In the simulation of TreeFill, a trigger is generated at each tick of the simulation. Each node is represented as an agent in the simulation. Each node (or agent) executes the algorithms in Figures 2 and 3 at each tick. We assume that a message sent by a node arrives at the destination node instantaneously. The effects of message delay and loss will be handled in future work. TreeFill uses a binary tree of nodes for trigger detection, as explained in Section 3. However, we wrote our simulation code to use TreeFill algorithm in general k-ary tree. The simulation result for various k is shown in Figure 10. In the setup procedure of the simulation, a k-ary tree of nodes is constructed. The number of nodes n is defined as k L , where k and L can be selected by the user. In Figure 4, the T reeF ill simulation when n = 33 and w = 40000 is shown.
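The setup step above can be sketched with the standard array encoding of a complete k-ary tree (an assumed implementation detail for illustration; the NetLogo code may organize its agents differently):

```python
def parent(i, k):
    """Parent index of node i (> 0) in an array-encoded complete k-ary tree."""
    return (i - 1) // k

def children(i, k, n):
    """Child indices of node i among n array-encoded nodes."""
    first = k * i + 1
    return list(range(first, min(first + k, n)))

k, L = 3, 3
n = k ** L                         # n = k^L nodes, as in the simulation setup
assert children(0, k, n) == [1, 2, 3]
assert parent(4, k) == 1
```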

Figure 4: TreeFill simulation using NetLogo when n = 3^3 and w = 40000.

We also implemented the CoinRand algorithm of Chakaravarthy et al. using NetLogo for comparison with TreeFill [14]. Except for TreeFill and the centralized algorithm of Garg et al., CoinRand showed the best message complexity among the DTC algorithms [13, 14].³ For the comparison with TreeFill, CoinRand is briefly explained here.

In CoinRand, the number of nodes is assumed to be n = 2^L, where L ≥ 1, and (L + 1) layers of nodes are used to detect triggers. The number of nodes in layer-k is 2^k; the single node in layer-0 is termed the root node, and layer-L contains 2^L = n nodes. Thus, the nodes of CoinRand have dual roles as leaf nodes and inner nodes: all n nodes are contained in layer-L, while (n − 1) of them also serve as inner nodes from layer-0 to layer-(L − 1). Like TreeFill, CoinRand runs in rounds to detect w triggers. In a round of CoinRand, a node generates a coin when it receives ŵ/(4n) triggers, where ŵ is the number of still-undetected triggers at the beginning of the round. A generated coin is sent to a node of the upper layer chosen uniformly at random. When an inner node receives a coin for the first time, it simply holds the coin; when it receives a further coin afterwards, the newly received coin is sent to a node of the upper layer chosen uniformly at random. When the root node of CoinRand receives a coin, the root broadcasts a trigger-aggregation request through the binary tree of nodes. After the total number of triggers received in the round has been aggregated, the root node updates ŵ, calculates a new coin-generation threshold, and starts the new round. Details of CoinRand are given in the literature [15].

We compare the number of messages needed to detect 40000 triggers in TreeFill and CoinRand. We set the number of nodes in the simulations to 2^i, where 5 ≤ i ≤ 9. For each number of nodes, the simulations are repeated 10 times and the average value is used for comparison. Figure 5 shows the number of messages used in TreeFill and CoinRand.

² http://ssrnet.snu.ac.kr/~shkim/treefill/
³ In the centralized algorithm, the message load is concentrated at the central coordinating node, whereas it is distributed evenly in TreeFill and CoinRand.
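The CoinRand round just described can be sketched as follows. This is our own simplified Python model for illustration, not the authors' NetLogo code; the function and variable names are ours, and the aggregation broadcast at the end of the round is not modeled:

```python
import math
import random

def coinrand_round(n, w_hat, trigger_stream, rng):
    """One CoinRand round (simplified model): layer-k has 2**k nodes for
    0 <= k <= L, the n = 2**L leaves receive triggers, and a leaf emits a
    coin per max(w_hat // (4n), 1) received triggers.  Inner nodes hold the
    first coin they see and forward later coins upward at random.  Returns
    the number of triggers consumed before a coin reaches the root (or all
    triggers, if none does)."""
    L = int(math.log2(n))
    threshold = max(w_hat // (4 * n), 1)
    received = [0] * n                               # triggers per leaf
    holding = [[False] * (2 ** k) for k in range(L)]  # held coins, layers 0..L-1

    def send_up(layer):
        target = layer - 1
        if target == 0:
            return True                              # coin reached the root
        j = rng.randrange(2 ** target)
        if not holding[target][j]:
            holding[target][j] = True                # first coin: hold it
            return False
        return send_up(target)                       # later coin: forward up

    used = 0
    for used, leaf in enumerate(trigger_stream, start=1):
        received[leaf] += 1
        if received[leaf] % threshold == 0 and send_up(L):
            return used
    return used
```

For example, `coinrand_round(64, 40000, (i % 64 for i in range(40000)), random.Random(7))` delivers 40000 triggers round-robin to 64 leaves and reports how many were consumed before the round ended.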


Figure 5: The numbers of messages used in TreeFill and CoinRand when n = 2^i (5 ≤ i ≤ 9).

The number of messages used in TreeFill is about 54 ∼ 69% of that in CoinRand. In CoinRand, when the root node determines that it is time to finish the current round, the accurate number of detected triggers is aggregated through the binary tree of CoinRand, which requires O(n) messages. TreeFill, on the other hand, requires no trigger-aggregation process at the end of each round: when the root node of

TreeFill determines that it is time to finish the current round, it is certain that n DETECT messages have been received by the nodes of TreeFill, as shown in Section 3.1. In the ith round, a node of TreeFill sends a DETECT message when it receives ŵ_i/(2n) triggers, where ŵ_i is the number of as-yet-undetected triggers when the ith round starts. Thus, when the root of TreeFill determines that it is time to end the ith round, it is certain that n · (ŵ_i/(2n)) = ŵ_i/2 triggers have been detected by the nodes. The root of TreeFill calculates ŵ_{i+1} = ŵ_i − ŵ_i/2 = ŵ_i/2, and the next round starts with the new threshold for generating DETECT messages, ŵ_{i+1}/(2n).

In CoinRand, the probability that the root node receives a coin before ŵ/16 triggers are generated is at most 1/2, where ŵ is the number of as-yet-undetected triggers at the beginning of a round [15]. In other words, the probability that only ŵ/16 triggers are detected in a round of CoinRand is not negligible. In any round of TreeFill, on the other hand, ŵ/2 triggers are detected. Thus, TreeFill uses fewer rounds than CoinRand. Figure 6 shows the numbers of rounds needed to detect the triggers in TreeFill and CoinRand. Simulations are repeated 10 times for each number of nodes, and the average, minimum, and maximum numbers of rounds are obtained; the vertical bars in Figure 6 denote the minimum and maximum numbers of rounds in CoinRand. In TreeFill, the same number of rounds is always required for the same numbers of triggers and nodes, as TreeFill guarantees that n(ŵ/(2n)) = ŵ/2 triggers have been detected at the end of any round.
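The halving recurrence above can be checked with a tiny model of our own: each round detects exactly ŵ/2 triggers, and a final round runs once at most n triggers remain. Under this idealization the round count matches the closed form 1 + ⌈log₂(w/n)⌉ from Section 3.2:

```python
import math

def treefill_rounds(w, n):
    """Rounds under the idealized model: each round detects exactly half of
    the remaining undetected triggers, and one final round runs once at most
    n triggers remain."""
    w_hat, rounds = float(w), 0
    while w_hat > n:
        w_hat /= 2          # w_hat_{i+1} = w_hat_i - w_hat_i/2 = w_hat_i/2
        rounds += 1
    return rounds + 1       # final round clears the remaining triggers

# Agrees with the closed form 1 + ceil(log2(w/n)) from Section 3.2.
assert treefill_rounds(40000, 128) == 1 + math.ceil(math.log2(40000 / 128))
```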


Figure 6: The number of rounds needed to detect the triggers in TreeFill and CoinRand.

In Figure 6, when n ≥ 128, the numbers of rounds for TreeFill are slightly higher than 1 + ⌈log(w/n)⌉, the number of rounds for TreeFill derived in Section 3.2. In the ith round of TreeFill, each node sends a DETECT message when it detects ŵ_i/(2n) triggers, where ŵ_i is the number of as-yet-undetected

triggers in the ith round. In the simulation code, each node sends a DETECT message when it detects max(⌊ŵ_i/(2n)⌋, 1) triggers; we omitted this detail above to simplify the explanation of the algorithm. In the results of Figure 6, the numbers of rounds are slightly increased because ⌊ŵ_i/(2n)⌋ ≤ ŵ_i/(2n), so the number of triggers detected in each round is slightly decreased.

Essentially, TreeFill uses more messages than CoinRand in each round, as the number of forwarded DETECT messages is greater than the number of coins in CoinRand. Figure 7 shows the number of messages per round used in TreeFill and CoinRand.
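The effect of this integer threshold can be reproduced with a small model of our own (an idealization that assumes every node reaches the threshold in every round, unlike the randomized trigger delivery of the simulation):

```python
def treefill_rounds_floored(w, n):
    """Rounds when each round uses the integer threshold max(w_hat // (2n), 1)
    and, as an idealization, all n nodes reach it, so n * threshold triggers
    are detected per round."""
    w_hat, rounds = w, 0
    while w_hat > 0:
        threshold = max(w_hat // (2 * n), 1)
        w_hat -= n * threshold
        rounds += 1
    return rounds

# With flooring, detecting 40000 triggers with n = 256 takes 10 rounds,
# one more than the 9 rounds of the exact-halving analysis.
print(treefill_rounds_floored(40000, 256))
```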


Figure 7: The number of messages per round used in TreeFill and CoinRand when n = 2^i (5 ≤ i ≤ 9).

Although the number of messages per round is greater in TreeFill, the overall message complexity is lower, as TreeFill needs fewer rounds and no trigger-aggregation at the end of each round.

Figure 8 compares the MaxRcv values of TreeFill and CoinRand, where MaxRcv denotes the maximum number of messages received by any node during trigger detection. We set the number of triggers w to 40000 and the numbers of nodes to 2^i (5 ≤ i ≤ 9) in the simulations. The vertical bars in Figure 8 represent the minimum and maximum MaxRcv values observed during the simulations. The simulation results show that the nodes of TreeFill receive fewer messages than those of CoinRand, and the ranges between the minimum and maximum MaxRcv are also narrower in TreeFill. Thus, the message-receiving overhead is lower in TreeFill, and it is distributed more evenly than in CoinRand.

We also compare the numbers of messages needed to detect w triggers with 128 nodes in TreeFill and CoinRand, where w ∈ {10^4, 2 · 10^4, 4 · 10^4, 8 · 10^4}.


Figure 8: The MaxRcv of TreeFill-2 and CoinRand.

For each number of triggers, the simulations are repeated 10 times and the average value is used for comparison. Figure 9 shows the number of messages used in TreeFill and CoinRand. The graph of TreeFill is similar to the graph of 6n log(w/n), except when w is 80000. Generally, in the last round of TreeFill, the number of remaining triggers ŵ satisfies ŵ ≤ n. When ŵ < n, dummy DETECT messages are generated and an additional round is needed to complete trigger detection. When w = 80000 and n = 128, ŵ becomes exactly n = 128 in the last round, so no additional round is needed; in this case, the number of messages is less than the expected number.

The TreeFill algorithm can easily be extended to a general k-ary tree; we call the TreeFill algorithm using a k-ary tree (k > 2) TreeFill-k. We measured the numbers of messages needed to detect 40000 triggers with TreeFill with 2^i nodes (5 ≤ i ≤ 9), TreeFill-3 with 3^j nodes (3 ≤ j ≤ 6), TreeFill-4 with 4^k nodes (2 ≤ k ≤ 5), and TreeFill-5 with 5^l nodes (2 ≤ l ≤ 4). Each simulation is repeated 5 times, and the average values are shown in Figure 10. The simulation results in Figure 10 show that TreeFill-5 uses the fewest messages to detect the same number of triggers, and as the number of nodes increases, the gaps between TreeFill-5 and the others also increase. Thus, for applications that require fewer messages, TreeFill-k with a larger k will be more appropriate. However, when TreeFill-k with a larger k is used, MaxRcv increases; for example, if k is n − 1, TreeFill-k becomes a centralized algorithm and the messages are concentrated at the root node. Therefore, when an application uses TreeFill-k, the parameter k should be chosen after considering both the message complexity and MaxRcv. The performance bounds for TreeFill-k need further analysis, which we leave as future work.


Figure 9: The numbers of messages used in TreeFill-2 and CoinRand when 10000 ≤ w ≤ 80000 with 128 nodes.

6 Discussions

We suggested the requirements for monitoring algorithms for CAS in Section 1. In this section, we discuss how TreeFill satisfies these requirements and exhibits the properties of CAS.

In general, ABM is used for the modeling and simulation of CAS. Using ABM, we can explore how the individual behaviors of agents produce the overall behavior patterns of a CAS. A monitoring algorithm for CAS enables us to observe the recent trends of the entire system. In large-scale networked systems, such as wireless sensor networks and peer-to-peer systems, we can design useful system-management algorithms using this monitoring information.

The TreeFill algorithm takes advantage of nested system structures, which are common in CAS. Each node in TreeFill sends local monitoring information to other nodes according to predefined local rules. When w/2^i triggers have been detected by all the nodes in the ith round, the root node of TreeFill is notified of this. The root node exercises no central control; it is merely notified when w/2^i triggers have been detected by all the nodes. The local monitoring information of the nodes is sent to the nodes of the upper layers and aggregated. When the root node is notified of the detection of w/2^i triggers in the ith round, this information is disseminated recursively among the nodes, and all the nodes change their rules for local monitoring. Thus, the nodes of TreeFill act adaptively on the basis of global information shared by all the nodes.

The agents of a CAS interact in random ways, so the spatial and temporal distribution of monitoring triggers can change dynamically. TreeFill has two self-organizing properties to cope with dynamic changes in the trigger distribution. First, the nodes of TreeFill are organized as layers, and the local monitoring information is sent to the upper layers uniformly at random; thus, a spatially uneven distribution of triggers is adaptively evened out by the individual message-sending rules of the nodes. Second, DTC algorithms have merit in dealing with changes in the temporal trigger distribution: when many triggers are generated, a DTC algorithm raises alarms frequently, and when triggers are generated only occasionally, a DTC algorithm will not raise an alarm until w triggers have been detected by all the nodes. Thus, DTC algorithms can achieve the desired monitoring accuracy by adjusting w, irrespective of the temporal trigger distribution. TreeFill is a DTC algorithm and shares this merit in dealing with a dynamically changing temporal trigger distribution.

A monitoring algorithm for CAS should impose low overhead on the CAS, because excessive monitoring overhead can influence the results of simulations. TreeFill satisfies the optimal message complexity for DTC problems with high probability. Therefore, when DTC algorithms are used for the monitoring of CAS, TreeFill will impose low overhead on the CAS. However, for various purposes, we may explore other ways to monitor CAS with different characteristics in future research.

When the TreeFill algorithm is implemented in real networked systems, some problems can arise from message delay or loss. If some messages are delayed, the alarm can also be delayed and the monitoring information can become somewhat dated. If some messages are lost, TreeFill cannot raise an alarm even though w triggers have been detected by all the nodes. Thus, when TreeFill is implemented in a real system, mechanisms to handle message loss will be needed.

Figure 10: The number of messages needed to detect 40000 triggers in TreeFill-2, TreeFill-3, TreeFill-4, and TreeFill-5.

7 Conclusion and Future Work

In this paper, we presented the TreeFill algorithm for the distributed trigger counting (DTC) problem. DTC algorithms can be useful for distributed monitoring and for scalable global snapshots. Garg et al. showed that the lower-bound message complexity of the DTC problem is Ω(n log(w/n)), where n is the number of nodes and w is the number of triggers to be detected. The message complexity of TreeFill is O(n log(w/n)) with high probability; thus, TreeFill is an optimal DTC algorithm with high probability. The maximum number of received messages in each node (MaxRcv) is also an important performance metric for DTC problems, and the lower-bound message complexity implies that the lower-bound MaxRcv of the DTC problem is Ω(log(w/n)). TreeFill satisfies this as well: the MaxRcv of TreeFill is O(log(w/n)) with high probability. Thus, TreeFill matches the lower bounds of the DTC problem with high probability.

CoinRand by Chakaravarthy et al. shows the best performance among the previous DTC algorithms. We compared the performance of CoinRand and TreeFill in agent-based simulations using NetLogo, measuring the numbers of messages needed to detect 40000 triggers. The simulation results show that the number of messages used by TreeFill is 54 ∼ 69% of that used by CoinRand, and the MaxRcv of TreeFill is also smaller than that of CoinRand. We also compared the numbers of messages needed to detect 40000 triggers using TreeFill, TreeFill-3, TreeFill-4, and TreeFill-5, where TreeFill-k is an extended version of TreeFill that uses a k-ary tree. These simulation results show that TreeFill-k with a larger k uses fewer messages to detect the same number of triggers. Thus, for applications that require fewer messages to detect triggers, TreeFill-k with a larger k will be more appropriate.

Our model and simulations have some limitations. We assumed that triggers are generated one by one.
To handle cases in which triggers are generated at a very high rate, our model should be extended. We also did not consider message delay and message loss; we will deal with their effects on DTC algorithms in future research. We plan to apply our algorithm to real systems. DTC algorithms can be used in wireless sensor networks organized into clusters: a DTC algorithm can be applied in each sensor-node cluster, and it will be interesting to determine whether DTC algorithms can reduce the message complexity of environmental monitoring applications. We also plan to use DTC algorithms in cloud monitoring systems.

Acknowledgement

This work was supported by the IT R&D program of MKE/KEIT (10043896, Development of virtual memory system on multi-server and application software to provide realtime processing of exponential transaction and high availability service), by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012R1A1A2007263), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2009-0069740).

20

References

[1] Melanie Mitchell. Complexity: A Guided Tour. Oxford University Press, USA, 2009.
[2] John H. Miller and Scott E. Page. Complex Adaptive Systems: An Introduction to Computational Models of Social Life. Princeton University Press, 2007.
[3] Jacques Ferber. Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence, volume 33. Addison-Wesley, Reading, MA, 1999.
[4] Claudia V. Goldman and Shlomo Zilberstein. Optimizing information exchange in cooperative multi-agent systems. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, pages 137–144. ACM, 2003.
[5] Reza Olfati-Saber, J. Alex Fax, and Richard M. Murray. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1):215–233, 2007.
[6] Marijn Janssen and George Kuk. A complex adaptive system perspective of enterprise architecture in electronic government. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), volume 4, pages 71b–71b. IEEE, 2006.
[7] Matthew L. Massie, Brent N. Chun, and David E. Culler. The Ganglia distributed monitoring system: design, implementation, and experience. Parallel Computing, 30(7):817–840, 2004.
[8] Wensheng Zhang and Guohong Cao. DCTC: dynamic convoy tree-based collaboration for target tracking in sensor networks. IEEE Transactions on Wireless Communications, 3(5):1689–1701, 2004.
[9] KyoungSoo Park and Vivek S. Pai. CoMon: a mostly-scalable monitoring system for PlanetLab. ACM SIGOPS Operating Systems Review, 40(1):65–74, 2006.
[10] Laukik Chitnis, Alin Dobra, and Sanjay Ranka. Aggregation methods for large-scale sensor networks. ACM Transactions on Sensor Networks (TOSN), 4(2):9, 2008.
[11] Changlei Liu and Guohong Cao. Distributed monitoring and aggregation in wireless sensor networks. In INFOCOM, 2010 Proceedings IEEE, pages 1–9. IEEE, 2010.
[12] Ganglia Monitoring System. http://ganglia.sourceforge.net.
[13] Rahul Garg, Vijay K. Garg, and Yogish Sabharwal. Efficient algorithms for global snapshots in large distributed systems. IEEE Transactions on Parallel and Distributed Systems, 21(5):620–630, 2010.
[14] Venkatesan Chakaravarthy, Anamitra Choudhury, Vijay Garg, and Yogish Sabharwal. An efficient decentralized algorithm for the distributed trigger counting problem. In Distributed Computing and Networking, pages 53–64, 2011.
[15] Venkatesan T. Chakaravarthy, Anamitra R. Choudhury, and Yogish Sabharwal. Improved algorithms for the distributed trigger counting problem. In 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), pages 515–523. IEEE, 2011.
[16] Ajay D. Kshemkalyani. Fast and message-efficient global snapshot algorithms for large-scale distributed systems. IEEE Transactions on Parallel and Distributed Systems, 21(9):1281–1289, 2010.
[17] Jichiang Tsai. Flexible symmetrical global-snapshot algorithms for large-scale distributed systems. IEEE Transactions on Parallel and Distributed Systems, 24(3):493–505, 2013.
[18] K. Mani Chandy and Leslie Lamport. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS), 3(1):63–75, 1985.
[19] Ten H. Lai and Tao H. Yang. On distributed snapshots. Information Processing Letters, 25(3):153–158, 1987.
[20] Friedemann Mattern. Efficient algorithms for distributed snapshots and global virtual time approximation. Journal of Parallel and Distributed Computing, 18(4), 1993.
[21] Yuval Emek and Amos Korman. Efficient threshold detection in a distributed environment. In Proceedings of the 29th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC), pages 183–191. ACM, 2010.
[22] NetLogo. http://ccl.northwestern.edu/netlogo/.

Appendix A

In this section, the proof of Theorem 2 is shown. For the proof, we need the following definitions.

Definition 1. X_i is defined as the random variable that represents the number of DETECT messages sent from a node of layer-i to layer-(i−1), where 0 < i ≤ h.

Definition 2. Y is defined as the random variable that represents the number of DETECT messages received in a leaf node of DetectTree while n DETECT messages arrive at DetectTree.

Definition 3. R_i is defined as the random variable that represents the total number of DETECT messages received by all the nodes in layer-i, where 0 ≤ i < h.

As the number of nodes in layer-i of DetectTree is 2^i, R_{i−1} and X_i are related as follows:

    R_{i−1} = Σ_{j=1}^{2^i} X_i = 2^i · X_i.    (1)


The random variable X_h is the number of excessive messages in a leaf node of DetectTree. Thus, X_h and Y are related as follows:

    X_h = Y − 2.    (2)

By the algorithm of TreeFill, a DETECT message sent from layer-(i−1) to layer-i must be forwarded again from layer-i to layer-(i−1) during the procedure to find a poor node. From this observation, we can obtain NumFwd from R_i as follows:

    NumFwd = 2 · (R_{h−1} + R_{h−2} + · · · + R_0) = 2 Σ_{i=0}^{h−1} R_i.    (3)

Thus, if we can obtain R_i, NumFwd is also obtained by Equation 3.

The random variable Y follows the binomial distribution B(n, 2/n). At each round of TreeFill, at least n DETECT messages are sent to the leaf nodes of DetectTree uniformly at random. The number of leaf nodes of DetectTree is 2^h = 2^{h+1}/2 = n/2, so the probability that a given leaf node of DetectTree receives a given DETECT message is 1/(n/2) = 2/n; hence Y follows B(n, 2/n). The expectation and variance of Y are E[Y] = n(2/n) = 2 and Var[Y] = n(2/n)(1 − 2/n) = 2(1 − 2/n).

Now, we can represent R_{h−1} using Y from Equations 1 and 2:

    R_{h−1} = 2^h X_h = 2^h (Y − 2) = 2^h Y − n.    (4)
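This distributional claim is easy to sanity-check empirically: throwing n DETECT messages uniformly at random into the n/2 leaves and counting the arrivals at one fixed leaf samples Y. The following Monte Carlo check (ours, not part of the proof; the parameters are arbitrary) confirms that the sample mean is close to E[Y] = 2:

```python
import random

def sample_Y(n, rng):
    """Throw n DETECT messages into the n/2 leaves uniformly at random and
    count how many land in a fixed leaf; this count is Y ~ B(n, 2/n)."""
    return sum(1 for _ in range(n) if rng.randrange(n // 2) == 0)

rng = random.Random(42)
n, trials = 256, 20000
mean_Y = sum(sample_Y(n, rng) for _ in range(trials)) / trials
print(mean_Y)       # should be close to E[Y] = 2
```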

We can also obtain the relation between R_{i−1} and R_i. To obtain it, we need the following definition:

Definition 4. Let m_d be an inner node of layer-d. SubTree(m_d) is defined as the subtree whose root node is m_d, and NumRcv(m_d) is defined as the total number of DETECT messages received in SubTree(m_d).

The number of leaf nodes in SubTree(m_d) is 2^h/2^d = 2^{h−d}. Thus, NumRcv(m_d) can be represented with Y as NumRcv(m_d) = 2^{h−d} Y. Then, the following theorem holds:

Theorem 4. In a round of TreeFill, an inner node m_d of layer-d forwards DETECT messages to its upper layer when NumRcv(m_d) ≥ 2 · 2^{h−d}. The probability of this case is 1/2.

Proof. By the algorithm of TreeFill, m_d does not forward DETECT messages while NumRcv(m_d) < 2 · 2^{h−d} (in this case, at least one poor node exists in SubTree(m_d)). Pr(NumRcv(m_d) ≥ 2 · 2^{h−d}) is obtained as follows: Pr(NumRcv(m_d) ≥ 2 · 2^{h−d}) = Pr(2^{h−d} Y ≥ 2 · 2^{h−d}) = Pr(Y ≥ 2). Y follows the binomial distribution B(n, 2/n). When the number of nodes is sufficiently large, B(n, 2/n) is approximated well by the normal distribution N(2, 2(1 − 2/n)). Thus, Pr(Y ≥ 2) ≈ Pr(N(2, 2(1 − 2/n)) ≥ 2) = 1/2.

By Theorem 4, an inner node of DetectTree forwards its received DETECT messages to its upper layer with probability 1/2. R_i is the total number of DETECT messages received in layer-i. Therefore, the following relation holds:

    R_{i−1} = (1/2) R_i,  (0 < i < h).    (5)

Now, we can obtain NumFwd using Equations 3, 4, and 5 as follows:

    NumFwd = 2 Σ_{i=0}^{h−1} R_i = 2 Σ_{i=0}^{h−1} 2^{−i} R_{h−1}
           = 2(2^h Y − n) · (1 − 2^{−h})/(1 − 2^{−1}) = 4(1 − 2^{−h})(2^h Y − n)
           = 2n(1 − 2^{−h})Y − 4n(1 − 2^{−h}) = 2n(1 − 2^{−h})Y − E[2n(1 − 2^{−h})Y].

Let µ = E[2n(1 − 2^{−h})Y] = 4n(1 − 2^{−h}). Then, using the Chernoff inequality, we can bound Pr(NumFwd > δµ) for δ ≤ 2e − 1 as follows:

    Pr(NumFwd > δµ) = Pr(2n(1 − 2^{−h})Y − µ > δµ) = Pr(2n(1 − 2^{−h})Y > (1 + δ)µ) < e^{−µδ²/4}.

Thus, 1 − Pr(NumFwd > δµ) = Pr(NumFwd ≤ δµ) > 1 − e^{−µδ²/4}. As µ = 4n(1 − 2^{−h}) ≤ 4n, we have Pr(NumFwd ≤ δµ) ≤ Pr(NumFwd ≤ 4δn). When δ = 1, we obtain the lower bound of Pr(NumFwd ≤ 4n) as follows:

    Pr(NumFwd ≤ 4n) ≥ Pr(NumFwd ≤ µ) > 1 − e^{−µ/4} = 1 − e^{−n+2}.

Therefore, Pr(NumFwd ≤ 4n) > 1 − e^{−n+2}, and the proof of Theorem 2 is completed.

Appendix B

In this section, the proof of Theorem 3 is shown. In Definition 3, we defined R_i as the random variable representing the total number of excessive messages received by all the nodes in layer-i; thus, RcvDn_i = R_i/2^i, because the number of nodes in layer-i is 2^i. We also defined Y as the random variable representing the number of DETECT messages received in a leaf node of DetectTree. Using Y, Equation 4, and Equation 5, RcvDn_i can be represented as follows:

    RcvDn_i = R_i/2^i = 2^{−(h−1−i)} R_{h−1}/2^i = 2^{−(h−1)} (2^h Y − n) = 2(Y − 2).

An inner node m_i of layer-i forwards the excessive messages received from layer-(i+1) to its child nodes. When m_i forwards an excessive message to one of its child nodes, each child node receives it with equal probability, because all the DETECT messages arrive at the leaf nodes of DetectTree uniformly at random. When an inner node receives an excessive message from its parent, the excessive message is likewise forwarded to one of its child nodes with equal probability. Thus, we can represent RcvUp_i with RcvDn_i as follows:

    RcvUp_i = 2^{−1} RcvDn_{i−1} + 2^{−2} RcvDn_{i−2} + · · · + 2^{−i} RcvDn_0
            = Σ_{j=1}^{i} 2^{−j} RcvDn_{i−j} = 2(Y − 2) Σ_{j=1}^{i} 2^{−j}
            ≤ 4(Y − 2).

Thus, NumRcv_i is represented as NumRcv_i = RcvUp_i + RcvDn_i ≤ 6(Y − 2). As shown above, NumRcv_i is at most 6(Y − 2) regardless of i; thus, MaxRcv_Rnd ≤ 6(Y − 2). For δ ≤ 2e − 1, we can bound Pr(MaxRcv_Rnd ≤ 12δ) using the Chernoff inequality as follows:

    Pr(MaxRcv_Rnd ≤ 12δ) = 1 − Pr(MaxRcv_Rnd > 12δ) ≥ 1 − Pr(6(Y − 2) > 12δ)
                         = 1 − Pr(Y − 2 > 2δ) ≥ 1 − e^{−δ²/2}.

Thus, MaxRcv_Rnd ≤ 12δ for δ < 2e − 1 with probability greater than 1 − e^{−δ²/2}, and the proof of Theorem 3 is completed.
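The tail bound used in the last step, Pr(Y − 2 > 2δ) ≤ e^{−δ²/2}, can also be checked numerically against the exact binomial tail (an illustrative check with an assumed n and δ of our own choosing, not a substitute for the Chernoff argument):

```python
import math

n = 256
p = 2 / n                                     # Y ~ B(n, 2/n), E[Y] = 2
delta = 2.0                                   # any delta < 2e - 1 is admissible

def binom_tail_gt(k):
    """Exact Pr(Y > k) for Y ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k + 1, n + 1))

tail = binom_tail_gt(int(2 + 2 * delta))      # Pr(Y - 2 > 2*delta)
bound = math.exp(-delta**2 / 2)               # e^{-delta^2 / 2}
print(tail, bound)
```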

Author biographies Seokhyun Kim received his BS degree in 2001 and MS degree in 2008 from Seoul National University. He also received a PhD degree in computer science and engineering from Seoul National University in 2013. His research interests include distributed systems, multi-agent systems, complex systems, peer-to-peer computing, operating systems, cloud computing and system security. Jaeheung Lee received his BS degree in 2001 and MS degree in 2003 from Seoul National University. He also received a PhD degree in computer science and engineering from Seoul National University in 2013. From January 2003 to February 2007, he worked as an engineer for Samsung Electronics in Korea. His research interests include cryptography, privacy, network security, operating systems and sensor networks. Yongsu Park is currently an associate professor in the Division of Computer Science and Engineering at Hanyang University, Seoul, Korea. His main research interests include computer system security, network security, and cryptography. Yookun Cho has been with the School of Computer Science and Engineering, Seoul National University since 1979, where he is currently a professor. 25

He was president of the Korea Information Science Society during 2001. His research interests include operating systems, algorithms, system security, and fault-tolerant computing systems.
