Group Communication Protocol for Large Group

Makoto Takizawa, Masaoki Takamura, Akihito Nakamura

Dept. of Computers and Systems Engineering, Tokyo Denki University
Ishizaka, Hatoyama, Hiki-gun, Saitama 350-03, JAPAN
e-mail {taki, taka, naka}@takilab.k.dendai.ac.jp

Abstract

Group communication protocols discussed so far can be adopted only for small groups because of their processing and communication overhead. In this paper, we discuss how to provide reliable group communication for a large number of entities interconnected by a high-speed one-channel network. A group of entities is partitioned into disjoint subgroups named component clusters, which are interconnected by gateways, in order to reduce the processing time and the data unit length. The communication protocol is executed in each component cluster and the gateways forward data units to the other component clusters. There exists only one gateway between every two different component clusters in order to prevent the proliferation of data units.

1 Introduction

In distributed applications like teleconferencing and cooperative work [7], group communication [2, 3, 4, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22] among multiple entities is required in addition to the conventional one-to-one communication supported by OSI [10] and TCP/IP [5]. In multimedia applications like teleconferencing, more than one hundred individuals may join a conference. A group of entities is named a flat cluster in this paper. [15, 16, 17, 21, 22] discuss group communication protocols which provide receipt atomicity and some receipt ordering for the entities in the cluster. [21] presents the OP (Order-Preserving broadcast communication) protocol, where each entity receives all the protocol data units (PDUs) broadcast by each entity in the order in which they were sent, but may receive PDUs broadcast by different entities in different orders. Because of the communication and processing overhead, these protocols can be adopted only for small clusters including tens of entities. They cannot be adopted for large clusters which include several tens to hundreds of entities. In this paper, we discuss a broadcast communication protocol for such large clusters.

In the group communication protocols [15, 16, 17, 21, 22], $n$ ($\geq 2$) entities in a cluster are connected by a high-speed one-channel network where each entity receives the PDUs in the same order but may fail to receive some PDUs due to buffer overruns [3]. Each entity has processing time of $O(n^2)$ to atomically receive PDUs in the cluster, and the length of the PDU header is $O(n)$. The transmission speed of a high-speed network [6] like FDDI [1] is faster than the processing speed of each entity. Hence, it is critical to reduce the processing time. In this paper, we discuss how to reduce the processing time of each entity and the PDU length. First, a set of entities is partitioned into disjoint subsets named component clusters, small enough that each entity can process each PDU in reasonable time according to the OP protocol. Since each component cluster includes fewer entities than the flat cluster, each entity has less processing time and each PDU header is shorter in the component cluster. Then, the component clusters are interconnected by gateways. A set of component clusters interconnected by gateways is named a complex cluster.

One problem in broadcasting PDUs among component clusters is the proliferation of PDUs, i.e. duplicate PDUs are broadcast in the complex cluster. In order to resolve the proliferation problem, we consider a complex cluster where there exists only one gateway between every two different component clusters. This means that each entity can send PDUs to every entity either directly or via one gateway, and PDUs which would pass more than one gateway are removed by the gateways. Since there is only one gateway between every two component clusters, no PDU is duplicated while being forwarded from one component cluster to another. We present a protocol named the COP protocol (Complex cluster OP protocol), which provides the OP service for a complex cluster.

In section 2, we overview the OP protocol. In section 3, a model of the complex cluster is presented. In section 4, we present the COP protocol. In section 5, we discuss how to construct a complex cluster and evaluate the performance of the COP protocol compared with the flat cluster.

2 Order-Preserving Broadcast (OP) Protocol

We would like to present briefly the OP protocol [21].

2.1 Service model

The communication system is composed of three layers, i.e. application, system, and network layers. Each application entity takes communication service through a service access point (SAP) supported by a system entity. A cluster $C$ is a set of system SAPs $\{S_1, \ldots, S_n\}$. Each $S_i$ is supported by a system entity $E_i$ ($i = 1, \ldots, n$). $C$ is established by the cooperation of $E_1, \ldots, E_n$ [19, 20]. The cluster concept is an extension of the conventional connection [19] between two SAPs to $n$ ($\geq 2$) ones. Data units exchanged among peer entities of the same layer are protocol data units (PDUs). The OP service is one where each application entity can receive all the PDUs sent at each SAP in $C$ in the sending order, but may receive the PDUs from different SAPs not in the same order. The OP service for $C$ is provided by the cooperation of $E_1, \ldots, E_n$ coordinated by the OP protocol, using a less-reliable one-channel service provided by the underlying network system. Here, $E_1, \ldots, E_n$ are said to support $C$, written as $C = \langle E_1, \ldots, E_n \rangle$.

The one-channel (1C) service is one obtained by abstracting the services provided by high-speed networks. All the entities which use the 1C service can receive the PDUs in the same order but may fail to receive some PDUs due to buffer overruns. In the high-speed network, the transmission speed of the network is higher than the processing speed.

[Example] Suppose that a cluster $A$ includes three application entities $A_1$, $A_2$, and $A_3$. Each $A_i$ has two kinds of logs, i.e. a receipt log $RL_i$ and a sending log $SL_i$, which denote the sequences of PDUs received and sent by $A_i$, respectively. Here, let $\langle p_1 \ldots p_m ]$ denote a log where $p_1$ is the top, $p_m$ is the last, and $p_i$ precedes $p_j$ if $i < j$. Figure 1 shows the sending and receipt logs of $A_1$, $A_2$, and $A_3$. For example, $SL_1$ shows that $A_1$ sends three PDUs $a$, $b$, and $c$, and $RL_1$ shows that $A_1$ receives the PDUs $a$, $p$, $x$, $b$, $c$, $y$ in this order. As shown in Figure 1, every entity receives the PDUs from each entity in the sending order. Let $RL_{ij}$ be the sublog of $RL_i$ which includes the PDUs from $A_j$. Here, $RL_{ij} = SL_j$, e.g. $RL_{13} = \langle x\ y ] = SL_3$. This is an OP service. □

$A_1$   $RL_1$: $\langle a\ p\ x\ b\ c\ y ]$   $SL_1$: $\langle a\ b\ c ]$
$A_2$   $RL_2$: $\langle p\ a\ b\ x\ y\ c ]$   $SL_2$: $\langle p ]$
$A_3$   $RL_3$: $\langle a\ p\ b\ x\ c\ y ]$   $SL_3$: $\langle x\ y ]$

Figure 1: OP service

2.2 Correct receipt concept among multiple entities

Let $C$ be a cluster $\langle E_1, \ldots, E_n \rangle$. Here, $n$ is the cardinality of $C$. Suppose that $E_i$ broadcasts a PDU $p$ in $C$. All the entities receive $p$ based on the following three-phase procedure [19, 20].

[Three-phase procedure]

1. If $p$ arrives at $E_j$, $p$ is accepted by $E_j$.
2. If $E_j$ knows that all the entities in $C$ have accepted $p$, $p$ is pre-acknowledged by $E_j$.
3. If $E_j$ knows that all the entities in $C$ have pre-acknowledged $p$, $p$ is acknowledged by $E_j$. □

At the second, pre-acknowledgment level, some entity may not know whether $E_j$ has accepted $p$ even if $E_j$ knows that all the entities have accepted $p$. For example, $E_i$ may fail to receive a PDU which carries the acceptance confirmation of $p$ from $E_j$. The third step denotes the highest correct level, i.e. atomic receipt of PDUs among multiple entities.

Figure 2: Three-phase procedure

2.3 Data transmission procedure

We would like to present the data transmission procedure of the OP protocol for $C = \langle E_1, \ldots, E_n \rangle$. Each PDU $p$ broadcast by $E_s$ is composed of the fields shown in Figure 3 ($s = 1, \ldots, n$). Here, let $p.F$ denote a field $F$ of $p$. $p.SEQ$ is the sequence number of $p$, $p.CID$ is the cluster identifier of $C$, and $p.SRC$ is the source entity $E_s$. $p.ACK_k$ is the sequence number of the PDU which $E_s$ expects to receive next from $E_k$ ($k = 1, \ldots, n$). Each time $E_s$ broadcasts a PDU, $SEQ$ is incremented by one. $p.DATA$ is data.

CID | SRC | SEQ | $ACK_1, \ldots, ACK_n$ | DATA

Figure 3: OP PDU format

Each $E_k$ has a receipt log $RL_{kj}$ for each $E_j$, which denotes the sequence of PDUs from $E_j$ ($j = 1, \ldots, n$). $RL_{kj}$ is a concatenation of $RRL_{kj}$, $PRL_{kj}$, and $ARL_{kj}$, which are the subsequences of PDUs accepted, pre-acknowledged, and acknowledged, respectively. $E_k$ has a sending log $SL_k$, and uses variables $SEQ$ and $REQ_j$ for $E_j$ ($j = 1, \ldots, n$). When $E_k$ broadcasts $p$, $p.SEQ := SEQ$; $p.ACK_j := REQ_j$ ($j = 1, \ldots, n$); $SEQ := SEQ + 1$. On receipt of $p$ from $E_j$, if $p.SEQ = REQ_j$, $E_k$ accepts $p$ and $REQ_j := REQ_j + 1$. $p$ is stored in $RRL_{kj}$.

$E_k$ has two $n \times n$ matrixes $AL$ and $PAL$. On receipt of $p$ from $E_j$, $AL_{ji} := p.ACK_i$ for $i = 1, \ldots, n$. Let $minAL_i$ be the minimum of $AL_{1i}, \ldots, AL_{ni}$, i.e. every PDU from $E_i$ whose $SEQ < minAL_i$ is accepted by every entity, i.e. pre-acknowledged. For each $RRL_{ki}$, if $SEQ$ of $q$ ($= top(RRL_{ki})$) $< minAL_i$, $q$ is moved from $RRL_{ki}$ to $PRL_{ki}$ ($i = 1, \ldots, n$). Here, $PAL_{ih} := q.ACK_h$ for $h = 1, \ldots, n$. Let $minPAL_h$ be the minimum of $PAL_{1h}, \ldots, PAL_{nh}$, i.e. every PDU from $E_h$ whose $SEQ < minPAL_h$ is pre-acknowledged in every entity, i.e. acknowledged. For $PRL_{kh}$, if $SEQ$ of $r$ ($= top(PRL_{kh})$) $< minPAL_h$, $r$ is moved from $PRL_{kh}$ to $ARL_{kh}$.

Since the 1C service is used, some entity may fail to receive PDUs. Suppose that $E_k$ fails to receive $p$ broadcast by $E_j$. $E_k$ can find the loss of $p$ by checking the $SEQ$ and $ACK_1, \ldots, ACK_n$ fields in the PDUs which $E_k$ receives. First, suppose that $E_j$ broadcasts a PDU $q$ just following $p$. On receipt of $q$, $E_k$ finds the loss of $p$ because $q.SEQ > REQ_j$ ($= p.SEQ$). Next, suppose that some $E_i$ broadcasts a PDU $r$ after $E_i$ receives $p$. On receipt of $r$, $E_k$ finds the loss of $p$ because $r.ACK_j > REQ_j$. Then, $E_k$ requires $E_j$ to broadcast $p$ again. Only the PDUs lost by $E_k$ are broadcast again, i.e. selective retransmission [23] is adopted.

In the OP protocol, the PDU length is $O(n)$ since $ACK_1, \ldots, ACK_n$ are included as shown in Figure 3. Since each entity manipulates $n \times n$ matrixes, the processing time of each PDU is $O(n^2)$.
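The acceptance, pre-acknowledgment, and loss-detection rules of the data transmission procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (`PDU`, `Entity`, `broadcast`, `receive`) are hypothetical, entities are numbered from 0, every entity is assumed to receive its own broadcasts, and the third (acknowledged) level with the $PAL$ matrix is omitted.

```python
from dataclasses import dataclass

@dataclass
class PDU:
    src: int    # index of the sending entity (p.SRC)
    seq: int    # p.SEQ
    ack: list   # p.ACK: next SEQ the sender expects from each entity
    data: str

class Entity:
    def __init__(self, k, n):
        self.k, self.n = k, n
        self.seq = 0                              # SEQ
        self.req = [0] * n                        # REQ_j
        self.al = [[0] * n for _ in range(n)]     # AL[j][i] := p.ACK_i for p from E_j
        self.rrl = [[] for _ in range(n)]         # accepted PDUs, per source
        self.prl = [[] for _ in range(n)]         # pre-acknowledged PDUs, per source
        self.lost = []                            # (source, SEQ) pairs found to be lost

    def broadcast(self, data):
        p = PDU(self.k, self.seq, list(self.req), data)
        self.seq += 1
        return p

    def _note_loss(self, j):
        if (j, self.req[j]) not in self.lost:
            self.lost.append((j, self.req[j]))

    def receive(self, p):
        j = p.src
        # Loss detection through the ACK fields: the sender has seen more of E_i than we have.
        for i, a in enumerate(p.ack):
            if i != j and a > self.req[i]:
                self._note_loss(i)
        if p.seq != self.req[j]:
            if p.seq > self.req[j]:               # gap in E_j's own sequence numbers
                self._note_loss(j)
            return
        self.rrl[j].append(p)                     # p is accepted
        self.req[j] += 1
        self.al[j] = list(p.ack)
        for i in range(self.n):                   # pre-acknowledge what everyone has accepted
            min_al = min(self.al[h][i] for h in range(self.n))
            while self.rrl[i] and self.rrl[i][0].seq < min_al:
                self.prl[i].append(self.rrl[i].pop(0))

# Three entities on an idealised one-channel network.
es = [Entity(k, 3) for k in range(3)]
for sender, data in [(0, "a"), (1, "b"), (2, "c"), (0, "d")]:
    p = es[sender].broadcast(data)
    for e in es:
        e.receive(p)

# E_2 misses the next PDU of E_1 and notices it from the ACK fields of E_0's next PDU.
e_pdu = es[1].broadcast("e")
for e in (es[0], es[1]):
    e.receive(e_pdu)
f_pdu = es[0].broadcast("f")
for e in es:
    e.receive(f_pdu)
```

In this run, PDU `a` moves to the pre-acknowledged log of every entity only after each entity has heard from all three entities, which is the point of keeping one row of $AL$ per sender.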

3 Complex Cluster

A flat cluster means just a set of entities. The OP protocol for the flat cluster cannot be adopted for a large number of entities due to the processing time and PDU length. In this paper, we discuss how to provide an OP service for a large number of entities by partitioning the flat cluster into smaller disjoint subsets of entities $C_1, \ldots, C_h$ ($h \geq 2$) named component clusters. Each entity in each $C_i$ executes the same OP protocol as in the flat cluster. There are gateways which exchange PDUs among the component clusters. A complex cluster is composed of multiple component clusters interconnected by gateways. Communication entities are ones which are not gateways and provide the cluster service for the application entities.

[Example] In the OP protocol for a flat cluster $F = \langle E_{11}, E_{12}, E_{13}, E_{21}, E_{22}, E_{23} \rangle$, each PDU is broadcast to all the entities in $F$. In a complex cluster $C$, the entities are grouped into two component clusters $C_1 = \langle E_{11}, E_{12}, E_{13} \rangle$ and $C_2 = \langle E_{21}, E_{22}, E_{23} \rangle$. In order for $C_1$ and $C_2$ to communicate with each other, there exists a gateway entity $GE$ between $C_1$ and $C_2$. Suppose that $E_{11}$ broadcasts a PDU $p$ in $C_1$. $p$ is accepted by $E_{11}$, $E_{12}$, $E_{13}$, and $GE$. $GE$ broadcasts $p$ in $C_2$. When $GE$ knows that $p$ is accepted in $C_1$ and $C_2$ by using the OP protocol, $GE$ broadcasts $p_1$ and $p_2$, which carry the acceptance confirmation of $p$, in $C_1$ and $C_2$. If $p_1$ and $p_2$ are accepted in $C_1$ and $C_2$, respectively, $p$ is pre-acknowledged. Then, when $p_1$ and $p_2$ are pre-acknowledged in $C_1$ and $C_2$, $p$ is acknowledged. Each entity can execute the OP protocol as if the entity were a member of a flat cluster of cardinality 4. In $F$, each entity executes the OP protocol for a cluster of cardinality 6. Since the processing time of each entity is $O(n^2)$ and the PDU header length is $O(n)$ for the cardinality $n$, it is clear that the entities in $C$ have less processing time and each PDU is shorter than in $F$. □

Figure 4: Complex cluster C

A complex cluster $C$ for communication entities $E_1, \ldots, E_n$ is defined as $\langle C_1, \ldots, C_h, GE_1, \ldots, GE_g \rangle$, where $C_i$ is a component cluster ($i = 1, \ldots, h$ ($\geq 2$)) and $GE_j$ is a gateway entity ($j = 1, \ldots, g$ ($\geq 1$)). Each $C_i$ includes communication entities $E_{i1}, \ldots, E_{im_i}$ ($m_i \geq 1$) where $E_{ij} \in \{E_1, \ldots, E_n\}$. Every $E_k$ is included in only one component cluster, i.e. the component clusters are disjoint. $C_i$ is connected with all the other component clusters by gateways $GE_{i1}, \ldots, GE_{ig_i}$ ($1 \leq g_i \leq g$). That is, $C_i = \langle E_{i1}, \ldots, E_{im_i}, GE_{i1}, \ldots, GE_{ig_i} \rangle$, where $GE_{ij} \in \{GE_1, \ldots, GE_g\}$. $C_i$ includes in total $c_i = m_i + g_i$ communication and gateway entities. $c_i$ is the cardinality of $C_i$. Here, let the term entity mean either a communication or a gateway entity. Let $E_{i,m_i+j}$ denote $GE_{ij}$ ($j = 1, \ldots, g_i$). For each entity $E_{ij}$, $j$ is the entity number in $C_i$. $h$ is the degree of $C$. In this paper, we assume that all the entities are connected by the 1C network.

In this paper, we consider a complex cluster which satisfies the following property.

[Property] There exists only one gateway between every two different component clusters in $C$. □

The property means that there is at most one gateway between every two entities. Each PDU arrives at every destination entity by passing at most one gateway. Figure 5 shows three possible complex clusters of degree 4 ($h = 4$). One problem in broadcasting PDUs among component clusters via gateways is the proliferation of the PDUs. If a gateway $GE_i$ receives a PDU $p$ which has passed one gateway already, $GE_i$ removes $p$. Thus, no PDU ever passes more than one gateway. Furthermore, there is only one way from each component cluster to every other one according to the property. Hence, no duplicate PDUs are delivered to any component cluster.
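The property can be checked mechanically once each gateway is written as the set of component clusters it interconnects. The following Python sketch is only an illustration: the function names are hypothetical, component clusters are identified by the integers 1 to $h$, and the sample configurations are made up for the check rather than taken from Figure 5.

```python
from itertools import combinations

def gateways_per_pair(h, gateways):
    """Count, for every pair of component clusters, the gateways that join them."""
    count = {pair: 0 for pair in combinations(range(1, h + 1), 2)}
    for gw in gateways:
        for pair in combinations(sorted(gw), 2):
            count[pair] += 1
    return count

def satisfies_property(h, gateways):
    """True if exactly one gateway lies between every two different component clusters."""
    return all(c == 1 for c in gateways_per_pair(h, gateways).values())

# Degree h = 4: one gateway of connectivity 4, or six gateways of connectivity 2.
one_big = [{1, 2, 3, 4}]
all_pairs = [set(pair) for pair in combinations(range(1, 5), 2)]
# Two configurations that violate the property.
missing = [{1, 2, 3}, {3, 4}]                 # no gateway between C1 and C4
duplicated = [{1, 2, 3}, {1, 2, 4}, {3, 4}]   # two gateways between C1 and C2
```

A pair with count 0 would leave two component clusters unable to reach each other through a single gateway, and a pair with count 2 would deliver every PDU twice.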

Figure 5: Complex clusters (h = 4)

4 OP Protocol for Complex Cluster

We present the OP protocol for a complex cluster including $n$ entities, which is named the COP protocol.

4.1 Structure of gateway

Each $GE_i$ interconnects $k_i$ ($\geq 2$) component clusters $C_{i1}, \ldots, C_{ik_i}$ ($i = 1, \ldots, g$), written as $GE_i = \langle C_{i1}, \ldots, C_{ik_i} \rangle$. Here, the connectivity of $GE_i$ is the number $k_i$ of the component clusters connected by $GE_i$. $GE_i$ is composed of multiple subgateways $SGE_{i1}, \ldots, SGE_{ik_i}$. Each $SGE_{ij}$ not only executes the OP protocol as a member of $C_{ij}$ ($j = 1, \ldots, k_i$) but also exchanges PDUs among $SGE_{i1}, \ldots, SGE_{ik_i}$. There are multiple ways to realize $GE_i$. In one way, $SGE_{i1}, \ldots, SGE_{ik_i}$ are implemented in one processor and communicate with each other through the memory [Figure 6(a)]. In the second way, the subgateways are distributed to different processors interconnected by the 1C network [Figure 6(b)]. The communication time in the former way is shorter than in the latter since the intra-gateway communication is realized by using the memory. In the latter approach, the overhead of the gateway can be distributed to multiple processors.

Figure 6: Structure of GE

4.2 Data transmission procedure

Let $C$ be a complex cluster $\langle C_1, \ldots, C_h, GE_1, \ldots, GE_g \rangle$ ($h \geq 2$, $g \geq 1$). A PDU $p$ broadcast by $E_{ij}$ in $C_i$ is composed of the following fields ($i = 1, \ldots, h$).

$p.CID$ = cluster identifier, i.e. $C_i$.
$p.SEQ$ = sequence number of $p$.
$p.SRC$ = entity which originally sends $p$.
$p.LSRC$ = entity which sends $p$ in $C_i$, i.e. $E_{ij}$.
$p.ACK_k$ = sequence number of the PDU which $E_{ij}$ expects to receive next from $E_{ik}$ ($k = 1, \ldots, c_i$).
$p.DATA$ = data.

If $E_{ij}$ is a gateway, $p.SRC$ ($\neq p.LSRC$) is a communication entity which originally transmits $p$ in $C_k$ ($\neq C_i$). If $E_{ij}$ is a communication entity, $p.SRC = p.LSRC = E_{ij}$. When a gateway receives $p$, if $p.SRC \neq p.LSRC$, the gateway removes $p$ because $p$ has passed one gateway already. By this simple mechanism, no duplicate PDUs are broadcast among the component clusters.
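The role of the SRC and LSRC fields in suppressing duplicates can be shown with a small Python sketch. It is a simplification under stated assumptions: the `PDU` class and `gateway_forward` function are hypothetical names, the ACK fields are left out, and the renumbering of SEQ by the gateway in the target cluster is not modelled.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PDU:
    cid: str    # p.CID: component cluster in which this copy is broadcast
    src: str    # p.SRC: entity which originally sent p
    lsrc: str   # p.LSRC: entity which sent this copy in cluster cid
    seq: int    # p.SEQ (not renumbered in this sketch)
    data: str

def gateway_forward(p, gateway_name, connected_clusters):
    """Return the copies of p which a gateway would broadcast in its other clusters."""
    if p.src != p.lsrc:
        return []    # p has passed one gateway already, so it is removed
    return [replace(p, cid=c, lsrc=gateway_name)
            for c in connected_clusters if c != p.cid]

# E11 broadcasts p in C1; a gateway "GE" connecting C1 and C2 forwards it once.
p = PDU(cid="C1", src="E11", lsrc="E11", seq=0, data="x")
copies = gateway_forward(p, "GE", ["C1", "C2"])
# A second (hypothetical) gateway attached to C2 sees SRC != LSRC and drops the copy.
second_hop = gateway_forward(copies[0], "GE2", ["C2", "C3"])
```

Because the check compares only two fields of the PDU itself, a gateway needs no history of the PDUs it has already forwarded.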

The communication entities execute the OP protocol presented in the preceding section. Hence, we would like to discuss how each gateway $GE_i$ receives and sends PDUs ($i = 1, \ldots, g$). On receipt of a PDU $p$ in $C_{ij}$, each $SGE_{ij}$ of $GE_i$ forwards $p$ to all the subgateways in $GE_i$ ($j = 1, \ldots, k_i$). On receipt of a PDU $q$ from each $SGE_{it}$ ($t = 1, \ldots, k_i$), $SGE_{ij}$ broadcasts $q$ in $C_{ij}$. When $SGE_{ij}$ knows that $p$ from $E_{ijs}$ is accepted by all the communication entities in $C$, $SGE_{ij}$ broadcasts a PDU $r$ which includes the acceptance confirmation of $p$, i.e. $r.ACK_s > p.SEQ$. Every entity $E_{ijh}$ pre-acknowledges $p$ if $E_{ijh}$ receives $r$ in $C_{ij}$ ($h = 1, \ldots, c_{ij}$).

$SGE_{ij}$ for $C_{ij}$ has the same data structure as in the OP protocol, i.e. $SEQ$, $REQ_1, \ldots, REQ_{c_{ij}}$, $c_{ij} \times c_{ij}$ matrixes $AL$ and $PAL$, and logs $SL$, $RRL_s$, $PRL_s$, $ARL_s$ ($s = 1, \ldots, c_{ij}$) to execute the OP protocol as an entity of $C_{ij}$. Here, $E_{ije_{ij}}$ denotes $SGE_{ij}$ in $C_{ij}$, where $e_{ij}$ is the entity number of $SGE_{ij}$ in $C_{ij}$. Let $minCAL_s$ be the minimum of $AL_{1s}, \ldots, AL_{m_{ij}s}$, which means that $SGE_{ij}$ knows that every communication entity in $C_{ij}$ has accepted the PDUs from $E_{ijs}$ whose $SEQ < minCAL_s$ ($s = 1, \ldots, c_{ij}$). Here, $m_{ij}$ is the number of the communication entities in $C_{ij}$. If $p.SEQ < minCAL_s$, $p$ is referred to as locally pre-acknowledged in $SGE_{ij}$.

$SGE_{ij}$ has the following data structure to communicate with the subgateways in $GE_i$. Here, data units exchanged among the subgateways are referred to as intra-gateway PDUs (GPDUs).

$GSEQ$ = sequence number of the GPDU which $SGE_{ij}$ expects to broadcast next in $GE_i$.
$GREQ_t$ = sequence number of the GPDU which $SGE_{ij}$ expects to receive next from $SGE_{it}$ ($t = 1, \ldots, k_i$).
$GAL_{tu}$ = sequence number of the GPDU which $SGE_{ij}$ knows that $SGE_{it}$ expects to receive next from $SGE_{iu}$ ($t, u = 1, \ldots, k_i$).
$GSL$ = sending log for $SGE_{i1}, \ldots, SGE_{ik_i}$.
$GRL$ = receipt log from $SGE_{i1}, \ldots, SGE_{ik_i}$, where $GRL_t$ holds the GPDUs from $SGE_{it}$.

Let $minGAL_t$ denote $\min(GAL_{1t}, \ldots, GAL_{k_i t})$ ($t = 1, \ldots, k_i$). In the same way as in the OP protocol, $SGE_{ij}$ knows that all the subgateways in $GE_i$ have accepted the GPDUs from $SGE_{it}$ whose $GSEQ < minGAL_t$. Let $top(L)$ denote the top of a log $L$. $enqueue(L, x)$ is a function to enqueue $x$ into $L$. $x := dequeue(L)$ means that $x = top(L)$ is dequeued from $L$. When $SGE_{ij}$ receives a PDU $p$ from $E_{ijs}$ in $C_{ij}$, $SGE_{ij}$ behaves as follows.

[Receipt of $p$ from $E_{ijs}$ in $C_{ij}$]

1. $SGE_{ij}$ accepts $p$ from $E_{ijs}$ according to the OP protocol. That is, $enqueue(RRL_s, p)$ if $p.SEQ = REQ_s$.
2. If $p.SRC \neq p.LSRC$, $SGE_{ij}$ removes $p$ from $RRL_s$. $REQ_s := REQ_s + 1$ and $AL_{st} := p.ACK_t$ ($t = 1, \ldots, c_{ij}$) according to the OP protocol.
3. If $p.SRC = p.LSRC$, $SGE_{ij}$ executes the following steps and broadcasts a GPDU of $p$ to all the subgateways in $GE_i$.
   (a) $SGE_{ij}$ creates the following GPDU $gp$, and then $enqueue(GSL, gp)$. $gp$ has the same format as $p$ except that $gp$ has an additional field $PTR$ which points to $p$.
       $gp.CID := GE_i$; $gp.SRC := SGE_{ij}$; $gp.LSRC := p.SRC$;
       $gp.GSEQ := GSEQ$; $GSEQ := GSEQ + 1$;
       $gp.DATA := p.DATA$; $gp.PTR := p$;
       $gp.ACK_t := GREQ_t$ ($t = 1, \ldots, k_i$);
   (b) $r := top(GSL)$ (where $r.LSRC = E_{ijs}$ and $pr = r.PTR$);
       if ($pr.SEQ < minCAL_s$) {
           /* pr is accepted by all the communication entities. */
           $r := dequeue(GSL)$;
           $r.ACK_t := GREQ_t$ ($t = 1, \ldots, k_i$);
           $r$ is broadcast to all the subgateways of $GE_i$ and (b) is iterated;
       } □

When $SGE_{ij}$ accepts $p$ from $E_{ijs}$ of $C_{ij}$, $p$ is transformed into a GPDU $gp$. By using the intra-gateway sequence number $gp.GSEQ$, $gp$ is accepted and then pre-acknowledged in the subgateways in the same way as in the OP protocol. In the second step, $SGE_{ij}$ removes PDUs which have passed one gateway already. Only PDUs broadcast by communication entities in $C_{ij}$ are forwarded to the other component clusters. There is exactly one gateway between every two different component clusters according to the complex cluster property. Thus, duplicate PDUs are not broadcast. $SGE_{ij}$ executes the following procedure on receipt of a GPDU $gp$ from $SGE_{it}$ ($t = 1, \ldots, k_i$).

[Receipt of $gp$ from $SGE_{it}$]

1. If $gp.GSEQ \neq GREQ_t$, $SGE_{ij}$ has failed to receive GPDUs from $SGE_{it}$, and requires $SGE_{it}$ to rebroadcast the GPDUs which $SGE_{ij}$ has failed to receive.
2. If $gp.GSEQ = GREQ_t$, $SGE_{ij}$ accepts $gp$ and executes the following steps.
   (a) $GAL_{tu} := gp.ACK_u$ ($u = 1, \ldots, k_i$); $enqueue(GRL_t, gp)$; $GREQ_t := GREQ_t + 1$;
   (b) /* inter-gateway communication */
       $gq := top(GRL_v)$ (where $SGE_{iv} = gq.SRC$);
       while ($gq.GSEQ < minGAL_v$) {
           $gq := dequeue(GRL_v)$; /* gq is accepted by all the subgateways */
           if ($gq.SRC = SGE_{ij}$) { $r := gq.PTR$; $r.MARK := ACCEPTED$; }
           $gq := top(GRL_v)$;
       }
   (c) /* communication in $C_{ij}$ */
       for ($s = 1, \ldots, m_{ij}$) {
           $p := top(RRL_s)$ (where $E_{ijs} = p.SRC$);
           while ($p.MARK = ACCEPTED$) {
               /* p is pre-acknowledged according to the OP protocol */
               $p := dequeue(RRL_s)$;
               $PAL_{su} := p.ACK_u$ ($u = 1, \ldots, c_{ij}$);
               $enqueue(PRL_s, p)$;
               $REQ_s := p.SEQ + 1$;
           }
       } □

Even if $SGE_{ij}$ accepts $p$ from $E_{ijs}$, $REQ_s$ is not changed. This means that the acceptance confirmation of $p$ is not carried by the PDUs broadcast by $SGE_{ij}$ in $C_{ij}$ after $SGE_{ij}$ accepts $p$. If $p$ is locally pre-acknowledged in all of $SGE_{i1}, \ldots, SGE_{ik_i}$ of $GE_i$, $SGE_{ij}$ changes $REQ_s$ to $p.SEQ + 1$. Then, if a PDU $q$ is broadcast by $SGE_{ij}$ in $C_{ij}$, $q$ informs all the entities in $C_{ij}$ that $SGE_{ij}$ has accepted $p$, i.e. $q.ACK_s = REQ_s$.

[Example] Suppose that $E_{11}$ in $C_1$ broadcasts $p$ in Figure 7. $SGE_1$ ($= E_{14}$) receives $p$, creates a GPDU $gp$ from $p$, and forwards $gp$ to $SGE_2$ ($= E_{24}$). $SGE_2$ receives $gp$ and creates $q$ from $gp$. $SGE_2$ broadcasts $q$ in $C_2$. If $q$ is locally pre-acknowledged in $SGE_2$, i.e. $q.SEQ < minCAL_4$, $SGE_2$ sends $SGE_1$ a GPDU $gr$ where $gr.ACK_1 = gp.GSEQ + 1$. On receipt of $gr$, since $gp.GSEQ < minGAL_1$, $SGE_1$ marks $p$ in $RRL_1$ as $ACCEPTED$, where $p$ is pointed to by $gp.PTR$. If $SGE_1$ accepts PDUs from all of $E_{11}$, $E_{12}$, and $E_{13}$ which carry the acceptance confirmation of $p$, $REQ_1$ is changed to $p.SEQ + 1$. A PDU $g$ broadcast by $SGE_1$ carries the information that $p$ (from $E_{11}$) is accepted by $SGE_1$, i.e. $g.ACK_1 > p.SEQ$. If each $E_{1j}$ accepts $g$ in $C_1$, $p$ is pre-acknowledged in $E_{1j}$. □

If $SGE_{ij}$ fails to receive some GPDU $p$ from $SGE_{it}$, $SGE_{ij}$ detects the loss of $p$ by checking the $GSEQ$ of the GPDUs as presented for the OP protocol and then requires $SGE_{it}$ to rebroadcast $p$. On receipt of the retransmission request, $SGE_{it}$ broadcasts $p$ again.
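The intra-gateway part of the procedure, i.e. accepting GPDUs in GSEQ order, marking the originating PDU as ACCEPTED once every subgateway has accepted its GPDU, and requesting retransmission on a gap, can be sketched in Python as follows. This is an illustration only, not the authors' implementation. The names `GPDU`, `SubGateway`, `make_gpdu`, and `receive` are hypothetical, subgateways are numbered from 0, a PDU is represented by a plain dictionary, each subgateway is assumed to receive its own GPDUs, and the sketch assumes that a subgateway uses its own $GREQ$ values as its own row of $GAL$. The condition that a GPDU is sent only after local pre-acknowledgment is not modelled.

```python
from dataclasses import dataclass

@dataclass
class GPDU:
    src: int    # index t of the sending subgateway SGE_it
    gseq: int   # gp.GSEQ
    ack: list   # gp.ACK_u: next GSEQ the sender expects from SGE_iu
    ptr: dict   # gp.PTR: the PDU carried by this GPDU

class SubGateway:
    def __init__(self, j, k):
        self.j, self.k = j, k
        self.gseq = 0                              # GSEQ
        self.greq = [0] * k                        # GREQ_t
        self.gal = [[0] * k for _ in range(k)]     # GAL[t][u]
        self.grl = [[] for _ in range(k)]          # GRL_t
        self.requests = []                         # retransmission requests (t, GSEQ)

    def make_gpdu(self, pdu):
        gp = GPDU(self.j, self.gseq, list(self.greq), pdu)
        self.gseq += 1
        return gp

    def min_gal(self, v):
        # Assumption of this sketch: the own row of GAL is read from GREQ.
        return min(self.greq[v] if u == self.j else self.gal[u][v]
                   for u in range(self.k))

    def receive(self, gp):
        t = gp.src
        if gp.gseq != self.greq[t]:                # step 1: out of sequence
            self.requests.append((t, self.greq[t]))
            return
        self.gal[t] = list(gp.ack)                 # step 2(a)
        self.grl[t].append(gp)
        self.greq[t] += 1
        for v in range(self.k):                    # step 2(b)
            while self.grl[v] and self.grl[v][0].gseq < self.min_gal(v):
                gq = self.grl[v].pop(0)            # gq is accepted by all the subgateways
                if gq.src == self.j:
                    gq.ptr["mark"] = "ACCEPTED"

# Two subgateways, as in the example with C1 and C2.
sge1, sge2 = SubGateway(0, 2), SubGateway(1, 2)
p = {"src": "E11", "seq": 0, "mark": None}
gp = sge1.make_gpdu(p)
for s in (sge1, sge2):
    s.receive(gp)
gr = sge2.make_gpdu({"src": "E21", "seq": 0, "mark": None})
for s in (sge1, sge2):
    s.receive(gr)

# sge2 misses one GPDU of sge1 and notices the gap on the following one.
missed = sge1.make_gpdu({"src": "E12", "seq": 0, "mark": None})
sge1.receive(missed)
following = sge1.make_gpdu({"src": "E13", "seq": 0, "mark": None})
sge2.receive(following)
```

As in the example above, the GPDU `gr` carries `ack[0] == gp.gseq + 1`, and its receipt is what lets `sge1` mark `p` as ACCEPTED.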

5 Cluster Construction

In this section, we would like to discuss how to construct a complex cluster $C$ from a set $E$ of communication entities $E_1, \ldots, E_n$.

5.1 Construction of complex cluster

We assume that each entity has the same processing speed. For example, each entity is implemented in the same type of workstation. It is desirable for each entity to have the same processing time for each PDU to be acknowledged in $C$. Hence, we try to construct a complex cluster $C$ from $E = \{E_1, \ldots, E_n\}$ so that each entity has similar processing time. First, $E$ is partitioned into $h$ ($1 \leq h \leq n$) component clusters $C_1, \ldots, C_h$ so that the cardinalities of every two component clusters differ by at most one. That is, for $m = \lfloor n/h \rfloor$ and $h_2 = n$ modulo $h$, $h_1$ ($\geq 1$) $= h - h_2$ component clusters have $m$ communication entities, and $h_2$ ($\geq 0$) component clusters have $m + 1$ communication entities. Let $H$ be the set of component clusters $\{C_1, \ldots, C_h\}$ ($h \geq 1$). Next, $C_1, \ldots, C_h$ in $H$ are interconnected by gateways. The processing time of each gateway depends on the connectivity as presented before. The component clusters are interconnected by gateways whose connectivity is equal to or less than $k$ ($1 \leq k \leq h$). $k$ gives the upper bound of the processing time of the gateway. The component clusters are interconnected by gateways by the following procedure.
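The partitioning step can be written down directly. The Python sketch below is one possible reading of it: the function name `partition` is hypothetical, entities are represented by the integers 1 to $n$, and the order in which the larger component clusters are placed is an arbitrary choice of the sketch.

```python
def partition(n, h):
    """Split entities 1..n into h component clusters whose sizes differ by at most one."""
    m, h2 = divmod(n, h)                    # m = floor(n / h), h2 = n modulo h
    sizes = [m + 1] * h2 + [m] * (h - h2)   # h2 clusters of m + 1, h1 = h - h2 of m
    clusters, start = [], 1
    for size in sizes:
        clusters.append(list(range(start, start + size)))
        start += size
    return clusters

even = partition(100, 10)    # ten component clusters of ten entities each
uneven = partition(10, 3)    # sizes 4, 3, 3
```

For $n = 10$ and $h = 3$, this gives $m = 3$, $h_2 = 1$, and $h_1 = 2$, i.e. one component cluster of four entities and two of three.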

[Interconnection procedure]

1. One component cluster is selected from $H$. It is written as $C_0$.
2. For a given $k$, the other $h - 1$ component clusters are partitioned into groups $G_1, \ldots, G_r$ where $r = \lceil (h-1)/(k-1) \rceil$. Each $G_i$ includes $k - 1$ component clusters for $i = 1, \ldots, p$ and $p = \lfloor (h-1)/(k-1) \rfloor$. $G_{p+1}$ includes $q = (h-1)$ modulo $(k-1)$ component clusters if $q > 0$. Here, the component clusters in each $G_i$ are totally ordered like $C_{i0}, C_{i1}, \ldots, C_{i,k-2}$ for $i \leq p$, and $C_{p+1,0}, C_{p+1,1}, \ldots, C_{p+1,q-1}$ if $q > 0$. Here, it is required that $k \geq p$ if $q = 0$ and $k \geq p + 1$ if $q > 0$.
3. $C_0$ and all the component clusters in each $G_i$ are interconnected by a gateway $\langle C_0, C_{i0}, \ldots, C_{i,k-2} \rangle$ of connectivity $k$ for $i = 1, \ldots, p$, and by a gateway $\langle C_0, C_{p+1,0}, \ldots, C_{p+1,q-1} \rangle$ of connectivity $q + 1$ for $G_{p+1}$ if $q > 0$.
4. $C_{10}, C_{2i}, \ldots, C_{pi}, C_{p+1,i}$ are interconnected by a gateway of connectivity $p + 1$ for $i = 0, 1, \ldots, q - 1$ ($q > 0$). $C_{10}, C_{2i}, \ldots, C_{pi}$ are interconnected by a gateway of connectivity $p$ for $i = q, q + 1, \ldots, k - 2$ ($q \geq 0$).
5. If $q = 0$, $C_{1i}, C_{2j}, C_{3,j+1}, \ldots, C_{p,j+p-2}$ are interconnected by a gateway of connectivity $p$ for $i = 1, \ldots, k - 2$; $j = i, i + 1, \ldots, i + k - 2$. Here, $C_{ij'} = C_{ij}$ if $j = j'$ modulo $(k - 2)$. If $q > 0$, $C_{1i}, C_{2j}, C_{3,j+1}, \ldots, C_{p,j+p-2}, C_{p+1,j+p-2}$ are interconnected by a gateway of connectivity $p + 1$ for $i = 1, \ldots, k - 2$; $j = i, i + 1, \ldots, i + k - 2$, and $j + p - 2 \leq q - 1$. $C_{1i}, C_{2j}, C_{3,j+1}, \ldots, C_{p,j+p-2}$ are interconnected by a gateway of connectivity $p$ for $i = 1, \ldots, k - 2$; $j = i, i + 1, \ldots, i + k - 2$, and $j + p - 2 > q - 1$. □

[Example] Suppose that there are one hundred communication entities, i.e. $n = 100$. They are partitioned into ten ($h = 10$) component clusters $C_1, \ldots, C_{10}$, where each $C_i$ has ten communication entities, i.e. $m_i = 10$ ($i = 1, \ldots, 10$). Here, $H = \{C_1, \ldots, C_{10}\}$.

1. First, one component cluster, say $C_1$, is selected from $H$. Suppose that $k = 4$, i.e. the maximum connectivity of each gateway is 4. $\{C_2, C_3, \ldots, C_{10}\}$ is decomposed into three ordered groups $G_1 = \langle C_2, C_3, C_4 \rangle$, $G_2 = \langle C_5, C_6, C_7 \rangle$, and $G_3 = \langle C_8, C_9, C_{10} \rangle$. $C_2$, $C_3$, and $C_4$ are interconnected with $C_1$ by a gateway $\langle C_1, C_2, C_3, C_4 \rangle$ of connectivity 4. Then, $C_5$, $C_6$, and $C_7$ are interconnected with $C_1$ by a gateway $\langle C_1, C_5, C_6, C_7 \rangle$, and $C_8$, $C_9$, and $C_{10}$ are interconnected with $C_1$ by a gateway $\langle C_1, C_8, C_9, C_{10} \rangle$.
2. $C_2$ is selected from $G_1$. $\langle C_2, C_5, C_8 \rangle$, $\langle C_2, C_6, C_9 \rangle$, and $\langle C_2, C_7, C_{10} \rangle$ are interconnected as shown in Figure 8(a).
3. $C_3$ is selected from $G_1$. $\langle C_3, C_5, C_9 \rangle$, $\langle C_3, C_6, C_{10} \rangle$, and $\langle C_3, C_7, C_8 \rangle$ are interconnected by gateways of connectivity 3.
4. $C_4$ is selected from $G_1$. $\langle C_4, C_5, C_{10} \rangle$, $\langle C_4, C_6, C_8 \rangle$, and $\langle C_4, C_7, C_9 \rangle$ are interconnected by gateways of connectivity 3.

Here, a complex cluster is constructed as shown in Figure 8(b). There exists only one gateway between every two component clusters. □

Figure 7: Three-phase procedure between C1 and C2

By the algorithm, a complex cluster is constructed for a given set of $n$ entities, a number $h$ of component clusters, and a connectivity $k$ of gateways under the constraint $k \geq (h - 1)/(k - 1)$. It is clear that there exists only one gateway between every two different component clusters.
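The first steps of the interconnection procedure are mechanical enough to be sketched in Python and compared with the example for $n = 100$, $h = 10$, $k = 4$. The sketch below covers only steps 1 to 4 and deliberately leaves out step 5. The function name is hypothetical, component clusters are represented by strings, the first cluster in the list is taken as $C_0$, and the sketch assumes $h - 1 \geq k - 1$ so that $G_1$ is a full group.

```python
def interconnect_steps_1_to_4(clusters, k):
    """Groups of step 2 and the gateways produced by steps 3 and 4."""
    c0, rest = clusters[0], clusters[1:]          # step 1: select C_0
    p, q = divmod(len(clusters) - 1, k - 1)       # p full groups, q clusters left over
    groups = [rest[i * (k - 1):(i + 1) * (k - 1)] for i in range(p)]
    if q > 0:
        groups.append(rest[p * (k - 1):])         # G_{p+1} with q component clusters
    gateways = [[c0] + g for g in groups]         # step 3: C_0 with each group
    c10 = groups[0][0]                            # first cluster of G_1
    for i in range(k - 1):                        # step 4: C_10 with the i-th of each other group
        gw = [c10] + [g[i] for g in groups[1:] if i < len(g)]
        if len(gw) >= 2:
            gateways.append(gw)
    return groups, gateways

clusters = [f"C{i}" for i in range(1, 11)]
groups, gateways = interconnect_steps_1_to_4(clusters, 4)
```

For this input the sketch yields the three groups and the first six gateways of the example, i.e. those of its steps 1 and 2. The remaining six gateways of the example come from step 5.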

5.2 Processing time

Let us consider a complex cluster $C = \langle C_1, \ldots, C_h, GE_1, \ldots, GE_g \rangle$ for $n$ communication entities $E_1, \ldots, E_n$, where $C_i = \langle E_{i1}, \ldots, E_{im_i}, GE_{i1}, \ldots, GE_{ig_i} \rangle$ and $C_i$ includes $c_i = m_i + g_i$ entities ($i = 1, \ldots, h$). Since the OP protocol is used in each $C_i$, each $E_{ij}$ ($j = 1, \ldots, c_i$) has $O(c_i^2)$ processing time. Hence, the processing time of $E_{ij}$ is represented as $c_i^2 = (m_i + g_i)^2$. The total processing time $CEP$ of the communication entities in $C$ is

$$CEP = \sum_{i=1}^{h} m_i (m_i + g_i)^2.$$

Each $GE_i$ connects $k_i$ component clusters $\langle C_{i1}, \ldots, C_{ik_i} \rangle$ ($i = 1, \ldots, g$). Each $SGE_{ij}$ in $GE_i$ has $c_{ij}^2 = (m_{ij} + g_{ij})^2$ processing time to execute the OP protocol in $C_{ij}$ ($j = 1, \ldots, k_i$), and $\alpha k_i^2$ processing time in order to exchange PDUs among the subgateways. Hence, the total processing time $GEP$ of all the gateways is

$$GEP = \sum_{i=1}^{g} \sum_{j=1}^{k_i} (c_{ij}^2 + \alpha k_i^2) = \sum_{i=1}^{h} g_i (m_i + g_i)^2 + \alpha \sum_{i=1}^{g} k_i^3,$$

where $0 \leq \alpha \leq 1$. $\alpha$ is the ratio of the intra-gateway communication time to the inter-entity communication time. If the intra-gateway communication is realized by the 1C service as shown in Figure 6(b), $\alpha = 1$. If $GE_i$ is realized in one processor as shown in Figure 6(a), $\alpha < 1$. The total processing time $TEP$ is

$$TEP = CEP + GEP = \sum_{i=1}^{h} m_i (m_i + g_i)^2 + \sum_{i=1}^{g} \sum_{j=1}^{k_i} (c_{ij}^2 + \alpha k_i^2) = \sum_{i=1}^{h} (m_i + g_i)^3 + \alpha \sum_{i=1}^{g} k_i^3.$$

Here, let $F$ be a flat cluster $\langle E_1, \ldots, E_n \rangle$ including the same entities as $C$.
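As a worked instance of these formulas, the following Python sketch evaluates $CEP$, $GEP$, and $TEP$ for the complex cluster of the example in section 5.1 ($n = 100$, $h = 10$, $k = 4$). The function name is hypothetical. The values of $g_i$ are not stated in the text; they are counted here from the twelve gateways listed in that example ($C_1$ is attached to three gateways, every other component cluster to four), and the flat-cluster total of $n \cdot n^2$ is an assumption based on the per-entity processing time $n^2$ used in the evaluation.

```python
def processing_times(m, g, k, alpha):
    """CEP, GEP, and TEP for cluster sizes m_i, gateway counts g_i, connectivities k_i."""
    cep = sum(mi * (mi + gi) ** 2 for mi, gi in zip(m, g))
    gep = (sum(gi * (mi + gi) ** 2 for mi, gi in zip(m, g))
           + alpha * sum(ki ** 3 for ki in k))
    return cep, gep, cep + gep

n = 100
m = [10] * 10           # ten communication entities per component cluster
g = [3] + [4] * 9       # gateways attached to C_1 and to each of C_2..C_10
k = [4] * 3 + [3] * 9   # three gateways of connectivity 4, nine of connectivity 3

cep, gep, tep = processing_times(m, g, k, alpha=1.0)
flat = n * n ** 2       # assumed total for the flat cluster: n entities, n^2 each
ratio = tep / flat
```

With $\alpha = 1$ this gives $CEP = 19330$, $GEP = 7998$, and $TEP = 27328$, i.e. a ratio of about 0.027 to the flat cluster, and the closed form $\sum_i (m_i + g_i)^3 + \alpha \sum_i k_i^3$ gives the same total.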

5.3 Evaluation

Let $E$ be a set $\{E_1, \ldots, E_n\}$ of entities. We consider the following measures for each complex cluster obtained from $E$ by the complex cluster construction algorithm for $n$, $h$, and $k$. Here, $n$ is the number of entities, $h$ is the number of component clusters, and $k$ is the maximum connectivity of gateways.

$CEP/n$ = average processing time of each communication entity.
$GEP/g$ = average processing time of each gateway entity.
$GEP/n$ = average gateway processing time of each communication entity.
$TEP/n$ = average total processing time of each communication entity, i.e. $(CEP + GEP)/n$.

Figure 8: Complex cluster construction (n = 100, h = 10, k = 4)

First, we compute the processing time $TEP/n$ of each entity. As presented in the previous subsection, each complex cluster is given by three parameters $n$, $h$, and $k$. Figure 9 shows the ratios of $TEP/n$, $CEP/n$, $GEP/n$, and $GEP/g$ of $C$ to the processing time $n^2$ of the flat cluster (for $\alpha = 1.0$). Figure 9 shows that the processing time can be reduced to between one tenth and one hundredth of that of the flat cluster. For each $n$, a complex cluster of $h$ and $k$ which gives the minimum $TEP/n$ is computed. Figure 10 shows the minimum $TEP/n$ for $n$ in the cases of $\alpha = 1$ and $\alpha = 0.1$. From Figure 10, it is concluded that each communication and gateway entity has $O(n)$ processing time for $n$. Figure 11 shows $g$ and $h$ which give the minimum $TEP$ for each $n$. As shown in this figure, $TEP$ gets minimum if $g = h$. Here, let $c$ be the average number of entities in each component cluster where $TEP$ is minimum for $n$. Figure 11 also shows $c$, $h$, and $g$ for $n$. Figure 12 shows the ratio of the average number $c$ of the entities in each component cluster to the number $n$ of the entities in the flat cluster for $n$. The header length can be reduced in the complex cluster.

Figure 9: Ratio of processing time to the flat cluster ($\alpha = 1.0$). [Plot of TEP/n^3, CEP/n^3, GEP/n^3, and GEP/gn^2 (processing time, 0.00 to 0.25) against the number of entities n (0 to 300).]

Figure 10: Processing time. [Plot of TEP/n for alpha = 1.0 and alpha = 0.1 (processing time, 0 to 400) against n (0 to 300).]

Figure 11: Number of gateways and component clusters ($\alpha = 1.0$). [Plot of g, h, and c (number, 0 to 45) against n (0 to 300).]

Figure 12: Header length ($\alpha = 1.0$). [Plot of PDU header length/n (0 to 0.5) against n (0 to 300).]

6 Concluding Remarks

In this paper, we have discussed how to provide high-speed group communication for a large number of entities interconnected by a high-speed one-channel network like FDDI. The entities are partitioned into disjoint groups named component clusters. Every component cluster has exactly one gateway for every other component cluster. In the complex cluster, it is easy to prevent PDUs from being broadcast repeatedly. We have shown the COP protocol for the complex cluster, by which each entity can receive PDUs in the sending order. We have evaluated the performance of the COP protocol in the complex cluster compared with the OP protocol in the flat cluster. The COP protocol requires less processing time and shorter PDUs than the OP protocol in the flat cluster.

References

[1] American National Standards Institute, "FDDI Token Ring Media Access Control (MAC)," ANSI X3.139, 1987.
[2] Birman, K., Schiper, A., and Stephenson, P., "Lightweight Causal and Atomic Group Multicast," ACM Trans. on Computer Systems, Vol.9, No.3, 1991, pp.272-314.
[3] Chang, J. M. and Maxemchuk, N. F., "Reliable Broadcast Protocols," ACM Trans. on Computer Systems, Vol.2, No.3, 1984, pp.251-273.
[4] Chanson, S., Neufeld, G., and Liang, L., "A Bibliography on Multicast and Group Communications," ACM SIGOPS Operating Systems Review, Vol.23, No.4, 1989, pp.20-25.
[5] Defense Communications Agency, "DDN Protocol Handbook," Vol.1 - 3, NIC 50004-50005, 1985.
[6] Doeringer, W. A., Dykeman, D., Kaiserswerth, M., Meister, B. W., Rudin, H., and Williamson, R., "A Survey of Light-Weight Transport Protocols for High-Speed Networks," IEEE Trans. on Communications, Vol.38, No.11, 1990, pp.2025-2039.
[7] Ellis, C. A., Gibbs, S. J., and Rein, G. L., "Groupware," Comm. of ACM, No.1, 1991, pp.38-58.
[8] Garcia-Molina, H. and Kogan, B., "An Implementation of Reliable Broadcast Using an Unreliable Multicast Facility," Proc. of the 7th IEEE Symp. on Reliable Distributed Systems, 1988, pp.428-437.
[9] Garcia-Molina, H. and Spauster, A., "Message Ordering in a Multicast Environment," Proc. of the 9th IEEE Int'l Conf. on Distributed Computing Systems, 1989, pp.354-361.
[10] International Standards Organization, "OSI - Connection Oriented Transport Protocol Specification," ISO 8073, 1986.
[11] ISO, "Data Processing - Open Systems Interconnection - Basic Reference Model," ISO 7498, 1987.
[12] Kaashoek, M. F., Tanenbaum, A. S., Hummel, S. F., and Bal, H. E., "An Efficient Reliable Broadcast Protocol," ACM Operating Systems Review, Vol.23, No.4, 1989, pp.5-19.
[13] Luan, S. W. and Gligor, V. D., "A Fault-Tolerant Protocol for Atomic Broadcast," IEEE Trans. on Parallel and Distributed Systems, Vol.1, No.3, 1990, pp.271-285.
[14] Melliar-Smith, P. M., Moser, L. E., and Agrawala, V., "Broadcast Protocols for Distributed Systems," IEEE Trans. on Parallel and Distributed Systems, Vol.1, No.1, 1990, pp.17-25.
[15] Nakamura, A. and Takizawa, M., "Reliable Broadcast Protocol for Selectively Ordering PDUs," Proc. of the 11th IEEE Int'l Conf. on Distributed Computing Systems, 1991, pp.239-246.
[16] Nakamura, A. and Takizawa, M., "Design of Reliable Broadcast Communication Protocol for Selectively Partially Ordered PDUs," Proc. of the IEEE COMPSAC91, 1991, pp.673-679.
[17] Nakamura, A. and Takizawa, M., "Priority-Based Total and Semi-Total Ordering Broadcast Protocols," Proc. of the 12th IEEE Int'l Conf. on Distributed Computing Systems, 1992, pp.178-185.
[18] Schneider, F. B., Gries, D., and Schlichting, R. D., "Fault-Tolerant Broadcasts," Science of Computer Programming, Vol.4, 1984, pp.1-15.

[19] Takizawa, M., \Cluster Control Protocol for Highly Reliable Broadcast Communication," Proc. of the IFIP Conf. on Distributed Processing , 1987, pp.431-445. [20] Takizawa, M., \Design of Highly Reliable Broadcast Communication Protocol," Proc. of IEEE COMPSAC87 , 1987, pp.731-740. [21] Takizawa, M. and Nakamura, A., \Partially Ordering Broadcast (PO) Protocol," Proc. of the 9th IEEE Int'l Conf. on Computer Communications (INFOCOM ), 1990, pp.357364. [22] Takizawa, M. and Nakamura, A., \Reliable Broadcast Communication," Proc. of IPSJ Int'l. Conf. on Information Technology (InfoJapan ), 1990, pp.325-332. [23] Tanenbaum, A. S.: \Computer Networks (2nd ed.)," Prentice-Hall , Inc., 1989.