Detection of Free Distributed Cycles in Large-Scale Networks

Fabrice Le Fessant (Email: Fabrice.Le [email protected]) INRIA Rocquencourt, B.P. 105, 78153 Le Chesnay Cedex, France September 16, 1998

Abstract

Automatic detection of free cycles has always been a challenge in distributed systems. We have already designed and implemented an inexpensive detector [LFPS97, LFPS98] for partitioned systems, based on timestamp propagation. Here, we present a new algorithm for large-scale networks (such as the Internet), based on the propagation of more complicated marks. Compared to our previous algorithm, it collects cycles independently and faster, with full locality. Compared to other detectors, it is far less expensive in terms of both implementation complexity and resource consumption (time, messages and memory), and it detects free cycles all at once.

Our new algorithm combines two novel mechanisms. Min-max marking is a mark propagation technique in which local garbage collections are gathered in pairs, so that each outgoing reference is marked with both the maximal and the minimal marks found on the local roots and incoming references from which it is locally reachable. Sub-generation is the construction of a partial acyclic graph upon a potentially free cycle, by either optimistic or pessimistic back-tracing.

Finally, our algorithm is completely asynchronous and distributed, and it is conservative with respect to communication and space failures. It requires only one message per two local garbage collections for each co-working space, and it accepts incremental local tracing.

Keywords: distributed garbage collection, cycle detection, sub-generation, optimistic back-tracing, min-max marking

Author footnote: full-time student in the PARA and SOR Projects.

1 Introduction

Most detectors of free cycles are either per-cycle algorithms with local detection or global algorithms with partitioned detection. Our detector is a global algorithm with local detection.

2 Model

Our system is composed of a set of spaces, which are the basic units of computation. Each space has its own local memory, managed by a local garbage collector. Spaces cannot access the memory of other spaces; they must communicate through asynchronous messages. The communication medium is assumed to be unreliable: messages may be lost, reordered or duplicated.

Spaces can hold references to objects in other spaces. Such a reference is materialised by two objects, called a stub (or exit item) and a scion (or entry item). Concretely, a reference R from object A in space X to object B in space Y is represented (Figure 1) by a local pointer in X from A to the stub stub_X(R), and a local pointer in Y from the scion scion_Y(R) to B. Each stub contains the identification of the associated scion in a dedicated structure, called the locator.

Figure 1: A remote reference from space X to space Y.

Remote references are created and used by sending such locators in application messages of the form APPL_{X→Y}(target, params), where X is the sender space, Y is the destination space, target is either a NULL pointer (for a reply, for example) or the locator of an object located in Y (thus, no reference is created; instead, an already existing remote reference is used), and params are locators for objects referenced in the parameters of the call. For the latter objects, new remote references may be created, by creating new scions in X before sending the message and creating the associated stubs in Y when the message is received.
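The creation of a remote reference through a message can be illustrated with a minimal sketch. It is not the paper's implementation: the `Locator` layout (space name plus a local number), the `Space` class and the method names `export`, `make_message` and `receive` are all assumptions made for illustration. Stubs are kept in a set so that a duplicated message, which the model allows, creates no extra stub.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Locator:
    """Identifies a scion (hypothetical layout: owning space plus a local number)."""
    space: str
    scion_id: int


@dataclass
class Space:
    name: str
    scions: dict = field(default_factory=dict)  # scion_id -> local object (entry items)
    stubs: set = field(default_factory=set)     # locators of remote scions (exit items)
    _ids: count = field(default_factory=count, repr=False)

    def export(self, obj):
        """Create a scion for a local object and return its locator."""
        scion_id = next(self._ids)
        self.scions[scion_id] = obj
        return Locator(self.name, scion_id)

    def make_message(self, dest, target, params):
        """Build APPL(self -> dest)(target, params); scions are created before sending."""
        return (self.name, dest.name, target, tuple(self.export(p) for p in params))

    def receive(self, message):
        """Create the stub for each locator in params; harmless on duplicated messages."""
        _sender, _dest, _target, params = message
        self.stubs.update(params)


x, y = Space("X"), Space("Y")
msg = x.make_message(y, target=None, params=["A"])
y.receive(msg)
y.receive(msg)  # the medium may duplicate messages
print(x.scions, y.stubs)
```

After the exchange, X holds one scion for object A and Y holds exactly one stub whose locator names that scion, even though the message was delivered twice.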

Figure 2: The basic scheme: the left cycle is reachable, while the right one is unreachable. In the left case, generator G2 cannot receive its mark, while generator G1 receives its mark, but gray (since it was mixed with the mark of G2). In the right case, generator G1 receives its white mark: the cycle is detected and can be collected (by removing the pointer/generator G1).

3 Overview

Our algorithm is based on graph coloration: a white mark is propagated from some special scions and from all local roots, collectively called generators, along chains of remote pointers. When two different marks, i.e. marks from different generators, are propagated to the same stub (this is detected by a simple mechanism called min-max marking), the stub propagates a gray mark to its associated scion. When a scion generator receives its own white mark from its associated stub, it detects that it belongs to a cycle (its own mark has come back to it) containing no other generator (the mark is still white), and thus that it is not reachable from any root (see Figure 2 for a basic example).

However, this basic scheme is not sufficient for all cycles: in particular, all scions in a cycle can be marked with the same mark, while an original gray color may never disappear from a sub-cycle (see Figure 3). In such a case, a second mechanism, called sub-generation, is used to remove orphan gray marks: when the generator receives its own gray mark from its associated stub, it starts a selective back-trace from its stub. If a scion propagates the generator's gray mark to that stub, the scion becomes a sub-generator and starts propagating the generator's white mark. Scions propagating the generator's gray mark to sub-generator stubs also become sub-generators. Such a selective back-trace can be efficiently implemented locally by two different techniques (described in section 5.3). As a consequence of the sub-generation mechanism, orphan gray marks eventually disappear from the cycle. When the generator and all sub-generators receive only the white mark of the generator, the cycle is detected (see Figure 4).

Finally, sub-generation may also abort, if a cycle is reachable from a generator which propagates a smaller mark than the mark propagated by the generator inside the cycle. In such a case, sub-generation is aborted when a greater mark is received, and a new sub-generation is started by the inside generator. In any case, either the cycle becomes unreachable and a sub-generation eventually succeeds, or the cycle remains reachable and all sub-generations abort, preventing erroneous reclamation of the cycle.
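The basic scheme can be tried out on a toy ring of spaces. The sketch below is a strong simplification and not the paper's algorithm: each space holds one scion that locally reaches one stub, ranges, distances, the Black color and sub-generation are ignored, and the function names and the root identifier `"root"` are placeholders. The stub rule follows the three-case comparison of minimal and maximal marks that the paper gives in section 5.2.

```python
WHITE, GRAY = 0, 1  # White < Gray, following the paper's ordering of colors


def stub_mark(marks):
    """Min-max rule for one stub, given the (identifier, color) marks that reach it."""
    lo, hi = min(marks), max(marks)
    if lo[0] == hi[0]:
        # Same identifier: the maximal mark is White only if both marks are White.
        return hi
    # Two different generators met: propagate the maximal mark, turned Gray.
    return (hi[0], GRAY)


def run_ring(n, generators, rooted, rounds=None):
    """Ring of n spaces: scion i locally reaches stub i, which points to scion (i+1) % n.

    `generators` are scion indices emitting their own White mark; `rooted` are
    spaces where a local root also reaches the stub. Returns the generators that
    received their own White mark back, i.e. that detected a free cycle.
    """
    scion = {i: (f"G{i}", WHITE) if i in generators else None for i in range(n)}
    received = {}
    for _ in range(rounds or 2 * n):
        outgoing = {}
        for i in range(n):
            marks = [scion[i]] if scion[i] is not None else []
            if i in rooted:
                marks.append(("root", WHITE))
            if marks:
                outgoing[(i + 1) % n] = stub_mark(marks)
        for j, mark in outgoing.items():
            if j in generators:
                received[j] = mark  # a generator keeps emitting its own mark
            else:
                scion[j] = mark
    return {g for g in generators if received.get(g) == (f"G{g}", WHITE)}


print(run_ring(3, {0}, set()))  # unreachable ring
print(run_ring(3, {0}, {1}))    # ring reachable from a local root in space 1
```

In the first call the generator's White mark travels around the ring untouched and comes back, so the cycle is detected. In the second call the root's mark meets the generator's mark at a stub, the propagated mark is no longer the generator's White mark, and nothing is reclaimed.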

4 Data structures and messages

The algorithm is based on the propagation of mark structures from scion structures to stub structures.

Figure 3: A bad case: at the beginning, two generators G1 and G2 are present on the cycle. Consequently, gray marks appear on the sub-cycle, and will not disappear, even after removal of generator G2. Thus, basic scheme detection is not complete.

Figure 4: Sub-generation: generator G1 receives its own mark, but gray. It starts a sub-generation, by creating the sub-generator sub-G1 in its back-trace. Sub-G1 always propagates the white mark of G1, thus removing the gray color from the sub-cycle. When both G1 and Sub-G1 receive G1 white marks, the cycle is detected and reclaimed (by removing pointers G1 and Sub-G1).

Mark structures contain three fields: an identifier (a mark_ident structure), a color and a distance. The color is White, Gray when another mark has been met, or Black when its sub-generation has been aborted. The distance is the maximal number of stub-scion pairs that the mark can be propagated along. Finally, the identifier (mark_ident structure) contains information on the mark's generator: its creation time (see footnote 1), its mark range, its generator identifier and its sub-generation number. The creation time is used to distinguish two different uses of the same scion as a generator. The mark range is the maximal number of stub-scion pairs from the generator that the mark can be propagated along. The generator identifier is a globally unique identifier for the generator of that mark (a generator is either a scion or a local root), and the sub-generation number is the number of sub-generations aborted by that generator. Marks are sorted in the lexicographical order of their fields, with a strict order on scion identifiers.

    type mark = {
      id : mark_ident;
      color : {White,Gray,Black};
      distance : integer;
    }

    type mark_ident = {
      time : float;
      range : integer;
      generator : scion_id;
      subgen_num : integer;
    }
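The lexicographical ordering of marks can be sketched with ordered dataclasses. This is an illustration under two assumptions that the paper does not spell out here: fields are compared in the order in which the listing declares them, and a scion identifier is modelled as a hypothetical (space name, local number) tuple. The color ordering White < Gray < Black is the one the paper states in a footnote.

```python
from dataclasses import dataclass
from enum import IntEnum


class Color(IntEnum):
    WHITE = 0
    GRAY = 1
    BLACK = 2


@dataclass(frozen=True, order=True)
class MarkIdent:
    time: float       # creation time of the generator
    range: int        # maximal number of stub-scion pairs from the generator
    generator: tuple  # hypothetical scion_id: (space name, local number)
    subgen_num: int   # number of sub-generations aborted by the generator


@dataclass(frozen=True, order=True)
class Mark:
    id: MarkIdent
    color: Color
    distance: int


old = Mark(MarkIdent(1.0, 4, ("X", 7), 0), Color.WHITE, 4)
new = Mark(MarkIdent(2.0, 8, ("X", 7), 0), Color.WHITE, 8)
gray = Mark(old.id, Color.GRAY, 3)

print(max(old, new, gray) == new)  # the later creation time dominates
print(old < gray)                  # same identifier: White sorts before Gray
```

With `order=True`, Python compares instances as tuples of their fields in declaration order, which gives the lexicographical comparison directly.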

The stub and scion structures are extended for cycle detection. Each stub structure contains two marks propagated during local garbage collections: one is the minimal mark and the other is the maximal mark found on the local roots and scions from which the stub is locally reachable. Two other fields are used for sub-generation: one is either NULL, if the stub is not involved in a sub-generation, or the mark identifier of the sub-generation it is involved in; the other contains the list of scions which are sub-generators of the generator and which are associated with this stub.

Each scion structure contains the mark propagated by the current local garbage collection, the next mark, which will be propagated by the next local garbage collection (see footnote 2), and the mark received from its associated stub. For sub-generation, the scion structure also contains two fields: one is either NULL, if the scion is not involved in any sub-generation, or the mark identifier for which it is a sub-generator; the other contains the stub it is associated with in the sub-generation. Finally, two flags are added: one is set if the scion is a generator, and the other if the scion has been detected inside a free cycle.

    type stub = {
      /* stub structure extended with: */
      min_mark : mark;
      max_mark : mark;
      back_gen : NULL or mark_ident;
      back_trace : scion list;
    }

    type scion = {
      /* scion structure extended with: */
      mark : mark;
      next_mark : mark;
      stub_mark : mark;
      back_gen : NULL or mark_ident;
      back_stub : NULL or stub;
      generator : boolean;
      deleted : boolean;
    }

Footnote 1: Time is either received from a real hardware clock, or from a Lamport clock. Using a hardware clock is safe and simpler, but may increase the number of aborted sub-generations when the clocks of the spaces are not sufficiently synchronized.

Footnote 2: In fact, by the next min-max marking, since one min-max marking includes two local garbage collections.

5 The algorithm

5.1 Mark generation

Marks are generated by local roots at each local garbage collection. All local roots propagate the same mark. These marks are propagated to stubs during local garbage collection, and from stubs to scions over the network after each local garbage collection. Then, during subsequent local garbage collections, marks are propagated from both local roots and scions to stubs.

We now extend marks with ranges: each mark is only allowed to propagate along a fixed number (its range) of stub-scion pairs. When a mark has overcome its range, the scion to which it is propagated becomes a generator. Hopefully, most marks never overcome their range, and few generators are created. However, ranges have the desired property that any mark present in an unreachable cycle is bound to overcome its range, so that, in any unreachable cycle, some generators will eventually appear.

Each generator creates its own mark and propagates it to reachable stubs. While marks generated by local roots are all the same (with the exception of their distance field), marks from generators contain the unique identifier of their generator. Marks from a generator always have a range strictly greater than the range of the mark that created the generator. This is important to ensure that, for any cycle, a mark will be created with a range greater than the circumference of the cycle.

5.2 Min-max marking

During local garbage collection, marks are propagated from local roots and scions to reachable stubs. However, each stub can be reachable from several scions or roots with different marks, while only one mark must be propagated to its associated scion. This dilemma is solved in most algorithms, whether they use timestamp propagation [Hug85, LFPS98] or the distance heuristic [ML95, ML97], by marking the stub with the greatest mark found on the roots or scions it is reachable from. This is commonly implemented by sorting local roots and scions before the local garbage collection and tracing them in decreasing order.

Here, we take a somewhat different approach: local garbage collections are associated in pairs, where the first one traces roots and scions in decreasing order, while the second one traces them in increasing order. Between the two local garbage collections of the same pair, marks on scions cannot be modified. Instead, the new mark is stored in the scion, and all scions are updated atomically at the beginning of the next pair of local garbage collections. Consequently, stubs are marked with the maximal mark at the first trace and with the minimal mark at the second trace. Both are stored in the stub structure, for later use at the end of the pair of local garbage collections.

At this point, we recall that each mark contains a color, which is either White, Gray or Black. For each stub, we can compare its minimal and maximal marks at the end of each pair of tracings:

- If both mark identifiers and colors are equal and the color is White, the stub propagates the mark to its associated scion.
- If both mark identifiers are equal, but either the colors are different or one color is not White, the stub propagates the maximal mark to its associated scion (see footnote 3).
- If the mark identifiers are distinct, the stub propagates the maximal mark to its associated scion, after changing the color to Gray.

Consequently, when a generator receives its own White mark from its associated stub, this means that, for all stubs through which it has been propagated, the minimal and maximal mark identifiers were equal and both marks were White. Thus, during its traversal of the graph of pointers, from its generator scion to its generator stub, the mark never met another mark, nor a Gray mark. This is clearly enough for the generator to detect that it belongs to a free distributed cycle (see Figure 2 for such a case).

Footnote 3: Two remarks: White < Gray < Black, and the mark propagated to the scion can take the distance of the minimal mark if it is greater than the distance of the maximal mark, to prevent the mark from overcoming its range too early in a sub-cycle.

5.3 Sub-generation

As shown in Figure 3, the previous mechanism is not sufficient to detect all free cycles with sub-cycles. Thus, we extend the previous algorithm with a new mechanism: a partial back-trace is started when the generator receives its own mark with a Gray color. Each scion in the back-trace is locally associated with a stub and becomes a sub-generator: it then propagates the same mark as the generator, with a White color. Moreover, a color is also associated with each sub-generator: White if all sub-generators associated with the associated stub are also White and the associated stub receives only the White mark of the generator of the back-trace, Gray in other cases. Two techniques are available to associate sub-generators with a stub: optimistic or pessimistic back-tracing.

5.3.1 Pessimistic back-tracing

In pessimistic back-tracing, a scion is associated with a stub involved in the back-trace, and becomes a sub-generator, when it propagates the generator's Gray mark to that stub and is not already a sub-generator. Thus, only one sub-generator is created per stub and per pair of local garbage collections. This technique is slow, but leads to fewer abortions of the sub-generation mechanism, since only scions leading to the generator are included in the back-trace.

5.3.2 Optimistic back-tracing

In optimistic back-tracing, all scions propagating the Gray mark of the generator are immediately inserted in the back-trace of a stub, and become sub-generators for that stub, when this stub is marked with that Gray mark during the local garbage collection. Note that, with this technique, scions can be included in the back-trace of a stub even if that stub is not reachable from those scions. This is safe, since it is equivalent to adding pointers from those scions to the stub, and thus only increases reachability. It is also complete, since such marks are eventually only present on scions which are only reachable from the generator (see footnote 4). This technique is really fast, but leads to more abortions of the sub-generation mechanism.

Footnote 4: This should be further detailed.

5.3.3 Abortions

5.4 Reclamation

6 Implementation issues

6.1 Memory compression

Our marks are quite big: we need to propagate a color, a distance and a generator. The generator itself has a creation time, a range, a local identifier and a space. The space contains an IP address, a port and a creation time.
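As a rough check of the size involved, the following sketch packs an uncompressed mark into a byte string. The field names and bit widths are taken from the paper's listing for this section (2-bit color, 16-bit distance, 64-bit clocks, 4-bit range, 32-bit generator identifier, 128-bit address, 16-bit port). The `pack` and `unpack` helpers, the big-endian layout and the treatment of times as integer ticks are assumptions made for illustration only.

```python
# (field name, width in bits), in the order of the paper's listing
FIELDS = [
    ("color", 2), ("distance", 16),
    ("gen_time", 64), ("gen_range", 4), ("gen_id", 32),
    ("spc_addr", 128), ("spc_port", 16), ("spc_time", 64),
]
TOTAL_BITS = sum(width for _, width in FIELDS)
TOTAL_BYTES = (TOTAL_BITS + 7) // 8  # round up to whole bytes


def pack(values):
    """Concatenate all fields, most significant first, into TOTAL_BYTES bytes."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(TOTAL_BYTES, "big")


def unpack(data):
    """Inverse of pack: peel the fields off from the least significant end."""
    word = int.from_bytes(data, "big")
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


sample = {"color": 1, "distance": 65535, "gen_time": 123456789, "gen_range": 8,
          "gen_id": 42, "spc_addr": 1, "spc_port": 4000, "spc_time": 987654321}
packed = pack(sample)
print(TOTAL_BITS, len(packed))
```

The tally gives 326 bits, which rounds up to 41 bytes per mark when nothing is shared or cached.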

    type mark = {
      color : color;              [2]
      distance : distance;        [16]
      // generator
      gen_time : time;            [64]
      gen_range : range;          [4]
      gen_id : int;               [32]
      // generator space
      spc_addr : addr_inet;       [128]
      spc_port : port;            [16]
      spc_time : time;            [64]
    }                             // [326] = 41 bytes

All this information would take 41 bytes. Thus, sending such a structure for each reference would be particularly expensive. To cope with this problem, we use two techniques.

First, marks propagated from roots do not need so much information. Indeed, they are always gray, with a fixed range, the current time and no useful generator. So, for these marks, we send only the single useful piece of information, i.e. the distance field (at most 2 bytes). Since the lengths of most references do not exceed the range of rooted marks, only two bytes will be sent for most references. Moreover, we can notice that the distance heuristic, which is used as a key point in some other cycle detectors [ML95, ML97], has approximately the same cost.

For non-rooted marks, we can use either cached structures or shared structures. Cached structures can be used efficiently for space identifiers, which are often long-lived structures and not too numerous, and possibly for generators. Each space has a vector containing all space structures. Each time a structure is sent to another space, the sender first tests whether this structure has already been sent to that space, in which case only the index of the structure in the vector is sent. Otherwise, the structure is sent with its index in the vector, and the receiver associates the new index with the new structure. To prevent old structures from staying in the vector, the vector is cleaned from time to time.

Shared structures are available in many high-level languages: communication channels are able to preserve sharing between data structures sent in the same message. As a consequence, each structure identifier and each generator structure will only be sent once in each message. Both techniques can be used together to give the maximal compression performance.

    type space = {
      addr : addr_inet;           [128]
      port : port;                [16]
      time : time;                [64]
    }                             // [208] = 26 bytes

    type gen = {
      time : time;                [64]
      range : range;              [4]
      id : int;                   [32]
      space : space;              [16]
    }                             // [116] = 15 bytes

    type mark = {
      color : color;              [2]
      distance : distance;        [16]
      gen : gen;                  [16]
    }                             // [34] = 5 bytes

Thus, we only send 2 bytes per rooted mark, 5 bytes per non-rooted mark, 15 bytes per different generator in a non-rooted mark, and 26 bytes per different space in a generator sent for the first time (cache miss). All these numbers are upper estimates, which can be slightly decreased in a realistic system (see footnote 5).
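The cached-structure exchange can be sketched as follows. This is a minimal illustration, not the paper's code: the class names `StructCache` and `PeerTable`, the `("def", ...)` and `("ref", ...)` message shapes and the sample space tuple are all hypothetical. The sketch also leaves out the periodic cleaning of the vector and says nothing about what happens when the message carrying a definition is lost, which the paper does not detail in this section.

```python
class StructCache:
    """Sender side: a vector of structures plus what each peer has already received."""

    def __init__(self):
        self.vector = []    # index -> structure
        self.index_of = {}  # structure -> index
        self.sent_to = {}   # peer name -> set of indices already sent

    def encode(self, peer, struct):
        idx = self.index_of.get(struct)
        if idx is None:
            idx = len(self.vector)
            self.vector.append(struct)
            self.index_of[struct] = idx
        seen = self.sent_to.setdefault(peer, set())
        if idx in seen:
            return ("ref", idx)        # cache hit: only the index travels
        seen.add(idx)
        return ("def", idx, struct)    # cache miss: structure sent with its index


class PeerTable:
    """Receiver side: one table per sending space, mapping indices to structures."""

    def __init__(self):
        self.by_index = {}

    def decode(self, item):
        if item[0] == "def":
            _tag, idx, struct = item
            self.by_index[idx] = struct
            return struct
        # Raises KeyError if the definition never arrived; a real system
        # would need some recovery here, which this sketch does not attempt.
        return self.by_index[item[1]]


space_x = ("192.0.2.1", 4000, 123456)  # (address, port, creation time), sample values
cache, table = StructCache(), PeerTable()
first = cache.encode("Y", space_x)
second = cache.encode("Y", space_x)
print(first[0], second[0], table.decode(first) == table.decode(second))
```

The first transmission to a given peer carries the whole structure (the 26-byte case for a space), while later ones carry only the index.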

Footnote 5: For example, we took a maximal distance of 65535 remote references, a 64-bit clock and IPv6 addresses.

6.2 Local garbage collection

7 Related work

Detection of cycles in large asynchronous networks has already been studied by many researchers. In this section, we describe four recent algorithms which have some similarities with our algorithm. The first two are per-cycle algorithms: they only answer the question "does this object belong to a free cycle?", and must be triggered for each suspected cycle. The other two are partitioned algorithms, since they collect all cycles which are inside a partition of the network.

7.1 Per-cycle detectors

These two algorithms have some similarities with our work: the first one is the first algorithm using the distance heuristic to detect suspected objects. We also use this technique, since our marks have a distance (plus a range). The second algorithm uses a back-tracing technique to detect cycles. Contrary to our algorithm, which uses only lazy back-tracing, this algorithm needs a specific, expensive local garbage collector.

7.1.1 Migrations

7.1.2 Back tracing

Fuchs [Fuc95], Maheshwari and Liskov [ML97] and Rodriguez-Rivera and Russo [RRR97] also proposed to detect free cycles using a back-tracing algorithm: when an object is suspected, its space starts a back-trace for it. For each traced entry item, all exit items from which it is reachable are known, thanks to reference listing, and have to be traced. For each traced exit item, either it is locally reachable, which means that the back-trace can stop since the object is reachable, or it is only reachable from entry items, which have to be traced in turn.

The main drawback of back tracing is the need to compute the set of entry items from which an exit item is locally reachable. Indeed, such information is not available with standard local garbage collectors, which must therefore be modified. This problem was not studied by Fuchs (nor were other implementation issues), while Rodriguez-Rivera and Russo have implemented their system using an expensive modification of Boehm's algorithm [BDS91], and Maheshwari and Liskov propose to use a linear tracing algorithm from Tarjan [Tar92]. In both cases, reachability information is only partially computed: only for exit items (Rodriguez-Rivera and Russo) or only for suspected exit items (Maheshwari and Liskov). Nevertheless, both techniques are far more expensive in time than standard local tracing garbage collectors.

7.2 Partitioned systems

These two algorithms also have some similarities with our work: both are inspired by Hughes [Hug85], who first introduced timestamp propagation in a message-passing system. The first one uses timestamps to enable different traces on hierarchical partitions, while the second one only uses timestamps for different traces in the same partition.

7.2.1 Hierarchical tracing collector

The garbage collector of Lang et al. [LQP92] is designed for very large-scale systems. It uses a reference counting algorithm for the collection of acyclic garbage, and a tracing garbage collector on small sets of spaces to collect cyclic garbage. The algorithm proceeds in four phases: group negotiation, initial marking, global marking and global sweeping. Group negotiation is responsible for grouping spaces which may share distributed cycles. Then, initial marking is performed on entry items referenced from outside the group (they use a particular algorithm from ??, related to the reference counting algorithm they use for acyclic collection). Then, each local garbage collection is used to propagate the mark for the global trace from local roots and marked entry items to exit items, and these marks are propagated by messages to other spaces. When there are no more messages in transit and each space has propagated all the marks on its entry items, the global trace on the group is terminated. Each space in the group can then sweep all entry items and exit items which have not been marked, thus collecting all cycles of garbage totally included in the group.

The authors make an interesting improvement to this algorithm for large-scale systems: small groups are included in larger groups, forming a hierarchy of groups. Each group has its own timestamp, increasing from 0 for the universal group to larger timestamps for nested groups. Instead of propagating only a mark, each local garbage collection propagates timestamps (as in Hughes' algorithm) corresponding to the marks for several groups. As a consequence, each local garbage collection participates in the traces for all groups in which the space is included. Thus, all cycles are collected, but at a slightly different speed depending on the size of the group in which the cycle is included.

However, we must make two remarks on this garbage collection algorithm. First, for one group, global traces are not concurrent. As a consequence, a cycle created just after the beginning of one trace will live until the end of the next trace, i.e. two global trace periods. In larger groups, where global trace periods might be very long, this could waste a lot of space. Second, the authors do not specify the termination algorithm used to detect the end of the global marking phase. However, termination algorithms do not scale well (as shown by Hughes' algorithm, which must impose strong requirements on the system). For example, which algorithm should be used to detect the end of the trace for the universal group, containing all spaces in the world?

7.2.2 Centralised cycles detector

The SSPC cycles detector [LFPS98] is inspired by Hughes' garbage collector, slightly modified for asynchronous, faulty and partitioned systems. As in Hughes' algorithm, timestamps are propagated from local roots (marked with the current time of the local Lamport clock [Lam78]) along chains of remote pointers.

8 Conclusion

References

[BDS91] Hans-Juergen Boehm, Alan J. Demers, and Scott Shenker. Mostly parallel garbage collection. ACM SIGPLAN Notices, 26(6):157-164, 1991.

[DW97] Peter Dickman and Paul R. Wilson, editors. OOPSLA '97 Workshop on Garbage Collection and Memory Management, October 1997.

[Fuc95] Matthew Fuchs. Garbage collection on an open network. In Henry Baker, editor, Proceedings of International Workshop on Memory Management, volume 986 of Lecture Notes in Computer Science, Concurrent Engineering Research Center, West Virginia University, Morgantown, WV, September 1995. Springer-Verlag.

[Hug85] R. John M. Hughes. A distributed garbage collection algorithm. In Jean-Pierre Jouannaud, editor, Record of the 1985 Conference on Functional Programming and Computer Architecture, volume 201 of Lecture Notes in Computer Science, pages 256-272, Nancy, France, September 1985. Springer-Verlag.

[Lam78] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978.

[LFPS97] Fabrice Le Fessant, Ian Piumarta, and Marc Shapiro. A detection algorithm for distributed cycles of garbage. In Dickman and Wilson [DW97].

[LFPS98] Fabrice Le Fessant, Ian Piumarta, and Marc Shapiro. An implementation for complete asynchronous distributed garbage collection. In Proceedings of SIGPLAN'98 Conference on Programming Languages Design and Implementation, ACM SIGPLAN Notices, Montreal, June 1998. ACM Press.

[LQP92] Bernard Lang, Christian Queinnec, and José Piquer. Garbage collecting the world. In Conference Record of the Nineteenth Annual ACM Symposium on Principles of Programming Languages, ACM SIGPLAN Notices, pages 39-50. ACM Press, January 1992.

[ML95] Umesh Maheshwari and Barbara Liskov. Collecting cyclic distributed garbage by controlled migration. In Proceedings of PODC'95 Principles of Distributed Computing, 1995. Later appeared in Distributed Computing, Springer Verlag, 1996.

[ML97] Umesh Maheshwari and Barbara Liskov. Collecting cyclic distributed garbage by back tracing. In Proceedings of PODC'97 Principles of Distributed Computing, 1997.

[RRR97] Gustavo Rodriguez-Rivera and Vince Russo. Cyclic distributed garbage collection without global synchronization in CORBA. In Dickman and Wilson [DW97].

[Tar92] R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal of Computing, 1(2), 1992.