Housekeeping for Prefix Coding


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 48, NO. 4, APRIL 2000

Andrew Turpin and Alistair Moffat

Abstract—We consider the problem of constructing and transmitting the prelude for Huffman coding. With careful organization of the required operations and an appropriate representation for the prelude, it is possible to make semistatic coding efficient even when n, the size of the source alphabet, is of the same magnitude as m, the length of the message being coded. The proposed structures are of direct relevance in applications that mimic one-pass operation through the use of semistatic compression on a block-by-block basis.

Index Terms—Adaptive coding, canonical code, Huffman code, minimum-redundancy code, prefix code.

Paper approved by E. Ayanoglu, the Editor for Communication Theory and Coding Application of the IEEE Communications Society. Manuscript received March 12, 1998; revised September 15, 1998. A. Turpin was with the Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Vic. 3052, Australia. He is now with Discoveries In Sight, Devers Eye Institute, Portland, OR 97208-3950 USA (e-mail: [email protected]). A. Moffat is with the Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Vic. 3010, Australia (e-mail: [email protected]). Publisher Item Identifier S 0090-6778(00)03631-X.

I. INTRODUCTION

SEMISTATIC compression involves two passes over a message during encoding: one to gather symbol frequency information, and another to perform the actual coding once a code has been constructed. In order for the decoder to function correctly, the code used during encoding must be transmitted from encoder to decoder as a prelude, the size of which must be added to the cost of the compressed representation of the message. For a message of m symbols over an alphabet of n possible symbols, the cost of transmitting the prelude is minor if n is small relative to m. For example, in character-based compression using a prefix code, any simple scheme—such as a sequence of 256 8-bit integers describing the codeword lengths of the n = 256 symbols in the alphabet, with zero lengths indicating that the specified symbol does not appear in the message—will suffice. On the other hand, if n is large relative to m, creating and transmitting the prelude can be a considerable overhead in terms of both compression effectiveness and throughput. Even zeroing an n-element array to accumulate symbol frequencies becomes a substantial overhead in the processing of an m-symbol message when n is large.

One arrangement which can lead to the coding of messages with n of the same magnitude as m is when semistatic coding is blocked. In such a scenario, the message is broken up into blocks of b symbols, and each block is processed in memory, rather than requiring two passes during encoding. This is the technique used by many common archival tools—gzip, pkzip, lha, zoo, and arj, for example [1]. When blocking is employed, it may well be the case that b < n, and algorithms will be more efficient if they have time and space complexity dependent on b rather than n.

Similarly, the use of blocking increases the number of preludes from one to ⌈m/b⌉, so the prelude representation scheme plays a more significant role in determining overall compression levels.

This paper considers the problem of economically managing the “housekeeping” tasks that must be undertaken by a system that makes use of semistatic minimum-redundancy coding, both with and without blocking. The housekeeping tasks include the process used by the encoder to create its tables, the process used by the decoder to reconstruct those tables, and the mechanism used to communicate a description of those tables from encoder to decoder. The actual coding mechanism assumed throughout this paper is canonical minimum-redundancy coding [2]–[4], long noted for its modest resource requirements and recently improved in other work by the current authors [5]. Canonical coding employs a Huffman-style code [6], but does not rely on an explicit code tree during encoding or decoding. Instead, two small tables are used, each containing a single entry for each distinct codeword length in the code, together with a mapping from each symbol to the index of that symbol in a list sorted by codeword. Individual codeword lengths are sufficient for these three canonical coding tables to be generated; hence, code construction need not involve a binary tree structure [7].

II. REPRESENTING THE PRELUDE

The following information must be conveyed to a canonical decoder, either explicitly, using an agreed representation, or implicitly, as a consequence of the design of the compression system:
1) an integer m, the length (in terms of symbols to be decoded) of the message;
2) an integer n, the maximum symbol identifier that may appear;
3) an integer q, the number of distinct symbols that do appear in the message;
4) an integer L, the maximum codeword length;
5) a list of q integers, each in the range 1 to n, namely the subset of the symbol identifiers that appear in the message—the “subalphabet”;
6) a list of q corresponding codeword lengths, each an integer in the range 1 to L.

Note that when a canonical code is used, it is not necessary for the specific codeword for each symbol to be communicated, only its length in bits; and that it is assumed that the source alphabet used to compose any particular message is the integers 1 to n. To ensure that the compressed message is decodable, the decoder and encoder calculate the codeword assignment using only the codeword lengths, and do not exploit the detailed symbol frequency information that led to the assignment of those codeword lengths.
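As a minimal sketch of how both encoder and decoder can derive identical codewords from the lengths alone—the standard canonical technique, with function and variable names that are ours rather than the paper's—consider:

#include <stdio.h>

#define MAXLEN 16  /* assumed bound on codeword length */

/* Assign canonical codewords given only the codeword lengths.
   len[i] is the codeword length of the ith symbol; code[i] receives
   the codeword value, to be emitted in len[i] bits. */
void canonical_assign(const int *len, unsigned *code, int q)
{
    int count[MAXLEN + 1] = {0};
    unsigned first[MAXLEN + 2], next[MAXLEN + 1];

    for (int i = 0; i < q; i++)          /* count symbols of each length */
        count[len[i]]++;
    first[1] = 0;                        /* smallest codeword of each length */
    for (int l = 1; l <= MAXLEN; l++)
        first[l + 1] = (first[l] + count[l]) << 1;
    for (int l = 1; l <= MAXLEN; l++)
        next[l] = first[l];
    for (int i = 0; i < q; i++)          /* hand out codewords in order */
        code[i] = next[len[i]]++;
}

int main(void)
{
    int len[] = {2, 2, 3, 3, 3, 4, 4};   /* a Kraft-complete code */
    unsigned code[7];
    canonical_assign(len, code, 7);
    for (int i = 0; i < 7; i++) {
        printf("symbol %d: ", i);
        for (int b = len[i] - 1; b >= 0; b--)
            putchar(code[i] >> b & 1 ? '1' : '0');
        putchar('\n');
    }
    return 0;
}

For the lengths {2, 2, 3, 3, 3, 4, 4} this prints the codewords 00, 01, 100, 101, 110, 1110, and 1111; any encoder and decoder running the same loop over the same lengths obtain the same assignment.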


(If the symbol frequency information were to be used, extra effort would be required to ensure that symbols of different frequency but the same codeword length are handled consistently by encoder and decoder.)

Representation of the prelude information—which is dominated by items 5) and 6)—has received little prior attention. McIntyre and Wolff [8] describe an efficient mechanism for transmitting the structure of a decoding tree [item 6)], but do not consider the more expensive component of the prelude [item 5)], which in their case is the symbol label that should be attached to each leaf of the tree. Similarly, Moffat et al. [4] describe a prelude for two-pass Huffman coding where the list of symbols [item 5)] is implicit, being 1 to n. Horspool and Cormack [9] consider the cost of an explicit tree representation, as do Bookstein and Klein [10]. With the exception of these, we are unaware of any other work that describes prelude representations.

The lack of careful investigation is a consequence of the fact that for most conventional applications, ad hoc methods are completely adequate. For example, consider the gzip compression program [11], which is an LZ77-style sliding window compressor [12] implementing the Deflate standard [13]. It compresses a message by coding a sequence of offsets back into the window, with the distance back to each match and the length of the match coded using a combination of a canonical code and binary codes. The code for match lengths is also used as a code for literal characters that are not matched in the window. The canonical code for offset distances is quantized so that n = 30, while the canonical code for lengths and literals has n = 286. A list of codeword lengths is transmitted for each code, with a codeword length of zero indicating that a symbol is unused. Each of the two lists is encoded using a third canonical code, the prelude for which is transmitted as a binary encoded list of codeword lengths (comprising 15 codewords for the codeword lengths 1–15, a codeword for length 0 to signal that a symbol does not appear, and three extra codewords to represent runs of identical codeword lengths). As n is restricted to be small for all three canonical codes, the problem of sending a prelude when n is large and the list of symbols is sparse does not arise.

Using such techniques when n is larger requires more care, as the number of zero codeword lengths that must be transmitted to show that a symbol does not appear can become high. In a message containing q distinct symbols from the range 1 to n, there are n − q such “no-codeword” flags. When n is large and q is small relative to n, it is more economical to transmit the two lists as separate entities; hence the separation of items 5) and 6) in the prelude description given above.

The problem of transmitting a list of integers is well studied, and there are a variety of codes available. Elias [14] described a family of codes including unary, in which integer x is transmitted as x − 1 one bits followed by a zero bit; the Cγ code, in which ⌊log2 x⌋ + 1 is transmitted in unary, followed by x − 2^⌊log2 x⌋ as a binary code of ⌊log2 x⌋ bits; and the Cδ code, in which ⌊log2 x⌋ + 1 is transmitted as a Cγ code, followed by x − 2^⌊log2 x⌋ as a binary code of ⌊log2 x⌋ bits. All of these codes are biased toward small integers and differ only in the extent of that bias and the degree to which they also handle large integers.
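The following self-contained fragment (our illustration; the bit sink is a string purely so that the codewords are visible) realizes the unary and Cγ codes exactly as defined above:

#include <stdio.h>

static char bits[64];
static int nbits;

static void put_bit(int b) { bits[nbits++] = b ? '1' : '0'; }

/* Unary: integer x >= 1 as x - 1 one bits followed by a zero bit. */
static void unary(unsigned x)
{
    while (--x) put_bit(1);
    put_bit(0);
}

/* Elias Cgamma: floor(log2 x) + 1 in unary, then the low
   floor(log2 x) bits of x in binary. */
static void gamma_code(unsigned x)
{
    int logx = 0;
    while ((2u << logx) <= x) logx++;    /* logx = floor(log2 x) */
    unary(logx + 1);
    for (int b = logx - 1; b >= 0; b--)
        put_bit(x >> b & 1);
}

int main(void)
{
    unsigned examples[] = {1, 2, 5, 9};
    for (int i = 0; i < 4; i++) {
        nbits = 0;
        gamma_code(examples[i]);
        bits[nbits] = '\0';
        printf("Cgamma(%u) = %s\n", examples[i], bits);
    }
    return 0;
}

A gap of 1—the common case in a dense subalphabet—costs a single bit; Cδ differs only in coding the ⌊log2 x⌋ + 1 prefix with Cγ rather than unary.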


An alternative code is the Golomb code [15], in which integers are quantized into equal-sized buckets, the bucket size being a parameter of the code. Each codeword is composed of a unary codeword identifying the bucket containing integer x, and a binary codeword describing the position of x within that bucket. For examples of the Elias and Golomb coding methods, the reader is referred to the description given by Witten et al. [16]. Any of these techniques can be applied directly to the subalphabet and codeword length lists of the prelude, since all yield economical representations for small integers, and differences can be taken between successive entries in the sorted subalphabet list. Indeed, the resultant difference list for a dense set of integers contains many unit gaps, each of which requires only 1 bit using the Elias codes.

Another method for encoding the subalphabet list that is particularly effective on both sparse and dense lists is interpolative coding [17]. Using this technique, the middle value in a sorted list of integers, all known to lie in the range lo to hi, is coded as a binary code using either ⌊log2(hi − lo + 1)⌋ or ⌈log2(hi − lo + 1)⌉ bits. The method then recursively codes the list of integers less than the middle value and follows that by a recursively generated representation of the integers greater than the middle value. When q is relatively large compared to n, the entire symbol list might be coded in fewer than q bits, and only in pathological situations are substantially more than q log2(n/q) bits required, irrespective of the arrangement of the symbols.

The size of the second list in the prelude—the codeword length list—can be reduced by encoding the difference L − l_i for each codeword length l_i, where L is the maximum codeword length. A canonical code obeys the Kraft inequality, which requires that Σ_i 2^(−l_i) ≤ 1, so codeword length l cannot appear in a code more than 2^l times. That is, short codewords are necessarily rare, most lengths are close to L, and a more economical representation is achieved by encoding the differences rather than the raw lengths. For example, if the maximum codeword length is L = 10 and a symbol has a codeword length of l = 7, then a unary code for L − l + 1 = 4 is transmitted, namely “1110.” Unary coding the difference list is the technique used in the experiments of Moffat et al. [4].

The first four integers in the prelude—m, n, q, and L—can be represented in a few bytes using any convenient code. For example, for all practical purposes, L can be safely coded as an 8-bit integer, m and n (if not obvious from the context) as Elias-coded values, and q as a binary integer in the range 1 to n.
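To make the recursion of interpolative coding concrete, the following sketch (our code; for brevity it always spends ⌈log2(hi − lo + 1)⌉ bits, omitting the refinement that sometimes saves one bit) codes a sorted subalphabet list:

#include <stdio.h>

/* Emit x (0 <= x < range) as a fixed-width binary code. */
static void put_binary(unsigned x, unsigned range)
{
    int w = 0;
    while ((1u << w) < range) w++;       /* w = ceil(log2(range)) */
    for (int b = w - 1; b >= 0; b--)
        printf("%d", x >> b & 1);
    if (w > 0) printf(" ");
}

/* Interpolative coding of the sorted list a[0..q-1], all values known
   to both encoder and decoder to lie between lo and hi inclusive. */
static void interpolative(const unsigned *a, int q, unsigned lo, unsigned hi)
{
    if (q == 0) return;
    int mid = q / 2;
    /* a[mid] must lie in [lo + mid, hi - (q - 1 - mid)] */
    unsigned base = lo + mid, top = hi - (q - 1 - mid);
    put_binary(a[mid] - base, top - base + 1);
    interpolative(a, mid, lo, a[mid] - 1);
    interpolative(a + mid + 1, q - 1 - mid, a[mid] + 1, hi);
}

int main(void)
{
    /* a mostly dense subalphabet: values forced by their neighbors
       cost zero bits */
    unsigned syms[] = {1, 2, 3, 5, 6, 7, 9};
    interpolative(syms, 7, 1, 10);
    printf("\n");
    return 0;
}

Here the seven symbols cost eight bits in total, and the entries 1 and 6, whose values are forced by their neighbors and the known bounds, cost no bits at all.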

III. MINIMUM-REDUNDANCY DECODING

Now consider the actions that must be carried out by the decoder to interpret the prelude transmission and establish the necessary decoding tables. We assume that a canonical code is being used in the manner described by Moffat and Turpin [5]. The largest table required in the decoder is the decoder mapping, which translates a codeword number in the range 1 to q into a symbol number in the range 1 to n.

Fig. 1. Algorithm MR-DECODE for decoding.

Algorithm MR-DECODE, shown in Fig. 1, describes the steps needed to read the prelude and build the required mapping dec_map. It makes use of two temporary q-element arrays: syms_used, which records the subset of the alphabet in use in the message, and len, which holds the corresponding codeword length of each symbol. With careful space management, the decoder mapping, array dec_map, can occupy the same space as the array syms_used. A further L-element array is used to record the number of symbols of each possible codeword length, taking the total memory requirement to 2q + L words, in addition to the words required for the canonical coding tables constructed by step 3. For more detail, see Moffat and Turpin [5].

Decoding the prelude (step 1) is straightforward and, assuming that each symbol number can be decoded in O(1) time, requires O(q) time. The main effort is in permuting the subalphabet list syms_used in place to make the array dec_map. Step 5 begins the process by overwriting len[i] with the final position of each symbol—its codeword number—in O(q + L) time, by counting off codeword numbers length by length. Step 7b then sorts syms_used, using len[i] as the key for syms_used[i]. As the keys are dense, len[i] is in fact the index of the final position of syms_used[i], so the sort can be effected with a series of swaps. There is a possibility that the loop in step 7b may encounter a symbol that has already been processed before all q symbols are done. To restart the loop, entries of len are flagged as done, and a finger start is kept that, when required to do so, moves to locate the next unprocessed entry (step 7c). Step 2 requires O(q) time. The loop in step 7b is also linear in q, as each of the q symbols is processed only once, and start is incremented no more than q times. Note that, in effect, the sorting process requires only O(q) time, since the objects being sorted
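The swap-based permutation of step 7 can be sketched as follows (our code and names; pos[] plays the role of the len array once it holds final positions, with DONE as the flag value):

#include <stdio.h>

#define DONE (-1)

/* Permute syms[0..q-1] in place so that element i of the input ends
   up at position pos[i].  pos[] is destroyed (entries flagged DONE),
   and the finger start scans forward for unprocessed entries. */
static void permute_in_place(int *syms, int *pos, int q)
{
    int start = 0;
    while (start < q) {
        if (pos[start] == DONE) { start++; continue; }
        int i = start;
        int s = syms[i], p = pos[i];
        pos[i] = DONE;
        while (p != start) {             /* follow one cycle with swaps */
            int s2 = syms[p], p2 = pos[p];
            syms[p] = s; pos[p] = DONE;
            s = s2; p = p2;
        }
        syms[start] = s;
    }
}

int main(void)
{
    int syms[] = {10, 20, 30, 40, 50};
    int pos[]  = {2, 0, 4, 1, 3};        /* target slot of each element */
    permute_in_place(syms, pos, 5);
    for (int i = 0; i < 5; i++) printf("%d ", syms[i]);
    printf("\n");                        /* prints: 20 40 10 50 30 */
    return 0;
}

Each element is moved exactly once and start only ever advances, which is why the whole permutation is linear in q.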

are all integers in a tightly constrained range. Note also that none of the steps requires time that is linear in n. Hence, since q ≤ b, the cost of decoding a message composed of multiple blocks is linear in the overall length of the message, even when very small block sizes are used. Once the tables are built, the space occupied by len can be released—canonical decoding requires only two L-word tables, the q-word array dec_map (which is, at the completion of the process, stored in array syms_used), and a small amount of space for a codeword-prefix lookup table [5].
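For illustration, the following fragment shows one conventional way such canonical tables can drive decoding, using the seven-symbol code of the earlier sketch; the table and routine names are ours, and the implementation of [5] additionally uses a lookup table so that short codewords avoid the bit-by-bit loop:

#include <stdio.h>

/* Tables for the canonical code with lengths {2,2,3,3,3,4,4}:
   first[l] is the smallest codeword value of length l, cnt[l] the
   number of codewords of that length, and offset[l] the position in
   dec_map of the symbol holding that smallest codeword. */
#define MAXLEN 4
static unsigned first[MAXLEN + 1] = {0, 0, 0, 4, 14};
static unsigned cnt[MAXLEN + 1]   = {0, 0, 2, 3, 2};
static int offset[MAXLEN + 1]     = {0, 0, 0, 2, 5};
static int dec_map[7]             = {0, 1, 2, 3, 4, 5, 6};

static const char *input = "011101111";  /* codewords 01, 110, 1111 */
static int get_bit(void) { return *input++ - '0'; }

/* Extend the codeword one bit at a time until its value falls within
   the valid range for its length. */
static int decode_symbol(void)
{
    unsigned v = get_bit();
    int l = 1;
    while (v >= first[l] + cnt[l]) {     /* not yet a complete codeword */
        v = (v << 1) | get_bit();
        l++;
    }
    return dec_map[offset[l] + (v - first[l])];
}

int main(void)
{
    printf("%d %d %d\n", decode_symbol(), decode_symbol(), decode_symbol());
    return 0;                            /* prints: 1 4 6 */
}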

IV. ENCODING

The remaining task is to construct the encoder tables and arrange for transmission of the prelude in the form required by the decoder. Two n-item arrays are used, together with a third array of q items. The first n-element array is len, where len[i] is used first to accumulate the frequency of the ith symbol of the alphabet, and later to record the length of the codeword that must be assigned to symbol i. The second n-element array is enc_map, which maps symbol numbers in 1 to n to ordinal codeword numbers in 1 to q. Only q of the elements of enc_map are actually used but, unless n is especially large relative to q, for fast lookup it is sensible to make use of an n-item sparse array rather than some other q-item dictionary structure with its associated searching cost. The q-element array syms_used stores the set of symbol numbers used in this particular message and describes the set of values for which enc_map is defined.

Fig. 2. Algorithm MR-ENCODE for encoding.

Algorithm MR-ENCODE in Fig. 2 outlines the steps required to calculate and write the prelude, construct enc_map, and then actually encode the message. Algorithm MR-ENCODE uses a further L words for a codeword length count array, so a total of 2n + q + L words of memory are required.

The first two steps of MR-ENCODE accumulate symbol frequencies in the message and establish syms_used as the subalphabet in use for this message. Assuming that array len is pre-initialized to zero, these two steps require O(m) time. The array syms_used is then sorted (step 3) by frequency, using a level of indirection through len. The code generation in step 4 also uses the indirection of syms_used. The in-place Huffman code calculation method of Moffat and Katajainen [7] is used for this step; it requires O(q) time and only constant additional space. Alternatively, if it is necessary to ensure that no codeword exceeds L bits in length for some fixed value of L, then a length-limiting mechanism such as the Package–Merge algorithm can be employed [18], which can also be implemented to run quickly and in modest amounts of additional space [19]. The sort in step 6 prepares syms_used for the coding of the subalphabet “gaps.” The last operation before the actual encoding of the message is the construction of the encoder mapping enc_map, using the same loop control as in Algorithm MR-DECODE so as to ensure the same allocation of codewords. This loop also requires O(q + L) time.
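A paraphrase of the accumulation and sorting steps (ours, with the code-length calculation of [7] elided, and with the C library qsort standing in for whichever sort is preferred) is:

#include <stdio.h>
#include <stdlib.h>

#define N 16                 /* alphabet size, small for illustration */
static long len[N];          /* frequencies now, codeword lengths later */

static int by_freq(const void *a, const void *b)
{
    long fa = len[*(const int *)a], fb = len[*(const int *)b];
    return (fa > fb) - (fa < fb);
}
static int by_symbol(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

int main(void)
{
    int message[] = {3, 7, 3, 12, 7, 3, 7, 7, 12, 3};
    int m = 10, syms_used[N], q = 0;

    /* steps 1 and 2: accumulate frequencies, collect the subalphabet */
    for (int i = 0; i < m; i++)
        if (len[message[i]]++ == 0)
            syms_used[q++] = message[i];

    /* step 3: sort the subalphabet by frequency, indirectly through
       len, ready for the in-place code-length calculation of [7] */
    qsort(syms_used, q, sizeof *syms_used, by_freq);

    /* ... codeword lengths would be computed here, overwriting len ... */

    /* step 6: re-sort by symbol number so the subalphabet gaps can be
       coded */
    qsort(syms_used, q, sizeof *syms_used, by_symbol);
    for (int i = 0, prev = 0; i < q; i++) {
        printf("symbol %d, gap %d\n", syms_used[i], syms_used[i] - prev);
        prev = syms_used[i];
    }
    return 0;
}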

Sorting q items in steps 3 and 6 dominates the time required by the other prelude-construction steps of MR-ENCODE, except for the actual cost of accumulating the symbol frequencies, and so the running time per block is O(q log q + b) if a standard comparison-based sorting mechanism such as Quicksort is used [20]. It should, however, be noted that if a small amount of additional memory space is used, then asymptotically faster sorting methods such as radix sort can be employed, possible because in both of the two steps that require sorting the objects being manipulated are integers in a constrained range.

Once the codes for the block have been transmitted, array len is restored to its original pristine state in a further O(q)-time loop. Earlier it was assumed that len was initialized to zero, without any justification being given. In fact, for the first block of a multiblock message, an explicit initialization is required, taking O(n) time. But thereafter, by using syms_used to reset the values of len that were actually used for each block, the initialization effort can be reduced to O(q) time, a considerable saving when b (and hence q) is small relative to n. Note that it is not necessary to reset values in enc_map, as only values that have been correctly assigned will ever be accessed.

It is possible to reduce the memory requirements of MR-ENCODE to n + q + L words by overwriting len with the encoder mapping, rather than using a separate array enc_map. The cost of doing this is that the codeword lengths are not available for the actual encoding, and the encoding process becomes slower as a result. Finally, note that a b-word buffer to hold the block being coded is, of course, required.
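The selective re-initialization can be as simple as the following sketch (again with our names):

#include <stdio.h>

#define N 1000000            /* large alphabet */
static long len[N];          /* shared frequency/length array */

/* Reset only the q entries touched by the block just coded: an O(q)
   loop rather than the O(n) cost of zeroing the whole array. */
static void reset_len(const int *syms_used, int q)
{
    for (int i = 0; i < q; i++)
        len[syms_used[i]] = 0;
}

int main(void)
{
    int syms_used[] = {3, 7, 12};
    len[3] = 4; len[7] = 2; len[12] = 3; /* lengths left by a block */
    reset_len(syms_used, 3);
    printf("%ld %ld %ld\n", len[3], len[7], len[12]);   /* 0 0 0 */
    return 0;
}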

TABLE I
COMPRESSION IN BITS PER MESSAGE SYMBOL (b/s) ACHIEVED BY VARIOUS CODING SCHEMES ON THE TWO PRELUDE LISTS. ENTRIES OF “–” INDICATE VALUES LESS THAN 0.0005 b/s

V. RESULTS

In order to evaluate the housekeeping algorithms and coding schemes discussed above, several test messages were coded using both blocking and two-pass semistatic canonical coding. The first two messages comprise the words and nonwords parsed from a test file representing about 510 Mbytes of English text from the Wall Street Journal (part of the TREC collection [21]). A textual symbol was assigned an integer according to its order of occurrence in the text; that is, the first distinct word was assigned 1, the second distinct word 2, and so on. Both the WORDS and NON-WORDS messages contain about 86 million integers in total. The second two test messages were composed of the distance and length/literal pointers used on the Wall Street Journal text by gzip, an LZ77-style sliding window compressor. The message LZ-DISTANCE contains about 36 million integers drawn from all the integers in the range 1 to 32 768. The message LZ-LENGTH contains about 57 million integers, 330 of which are distinct.

Table I shows, for each of the four test messages, the number of bits per message symbol used to compress the subalphabet gap list and the codeword length difference list. The Golomb code used on the subalphabet gap list was recomputed for every block, with the parameter stored, Elias-coded, as part of the prelude for that block.

The Huffman code used for the codeword difference list was rebuilt for every prelude, so it provides a bound on the compression achievable by a prefix code on this list. The meta-prelude for this second Huffman code was represented as a list of 33 5-bit binary codewords, the cost of which is included in the figures of Table I.

Interpolative coding is consistently better than both Elias coding and Golomb coding for representing the subalphabet list and, as intimated above, copes well both with sparse sets of symbols and dense sets of symbols. For the codeword length list, unary coding of the differences is superior to the Huffman code when the alphabet is large, once the cost of the meta-prelude is taken into account. The values in Table I also confirm that prelude representation is not an issue for two-pass semistatic coding of large messages, as the encoded message dominates the total size, but that prelude representation is important for small messages and for messages processed as many small blocks.

To test the housekeeping algorithms of Figs. 1 and 2, further experiments were carried out with an implementation that used unary coding for the codeword length lists and interpolative coding for the subalphabet list. The actual coding engine was the semistatic length-limited minimum-redundancy coder described by Moffat and Turpin [5], with a lookup table containing 256 entries for decoding. All experiments were conducted on a 266-MHz Pentium II with 256-Mbyte RAM and 512-kbyte cache running Debian GNU/Linux, with the C source code compiled using gcc version 2.7.2.3 at maximum optimization level. Times are averaged over the last four of five runs, with the initial run discarded to remove any startup effects. The standard deviations of the execution times were insignificant.
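The paper does not spell out the rule used to choose each block's Golomb parameter; one standard possibility, assuming the gaps behave as in a Bernoulli process with density p = q/n, is the following sketch (compile with -lm):

#include <math.h>
#include <stdio.h>

/* Choose a Golomb bucket size for coding the gaps of a subalphabet of
   q symbols drawn from 1..n, under a Bernoulli model with p = q/n.
   This rule is our assumption, not a detail taken from the paper. */
static int golomb_parameter(long q, long n)
{
    double p = (double)q / (double)n;
    if (p >= 1.0) return 1;
    return (int)ceil(log(2.0 - p) / -log(1.0 - p));
}

int main(void)
{
    printf("%d\n", golomb_parameter(1000, 100000)); /* sparse: large bucket */
    printf("%d\n", golomb_parameter(900, 1000));    /* dense: bucket of 1 */
    return 0;
}

For a sparse list this gives a bucket size of roughly 0.69 n/q; for a dense list it collapses to 1, so that the code degenerates gracefully toward unary.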

TABLE II
COMPRESSION EFFECTIVENESS IN BITS PER MESSAGE SYMBOL (b/s), AND THROUGHPUT ACHIEVED IN MILLIONS OF MESSAGE SYMBOLS PER CPU MINUTE. ENTRIES OF “–” INDICATE VALUES LESS THAN 0.0005 b/s; ENTRIES OF “*” INDICATE MEASUREMENTS THAT IT IS INAPPROPRIATE TO REPORT

Table II shows the performance of the pseudoadaptive block coder on the test messages for a range of block sizes. The third column shows the cost of the prelude for each given block size, measured in bits per symbol and averaged over the entire test message. The prelude is expensive for small blocks but, as can be seen from the fourth column, much of the expense is recouped by having a smaller active alphabet and shorter codewords, and the overall compression loss is small.

Small blocks do, however, suffer from increased encoding time, as the two O(q log q)-time sorting operations that are performed on a per-block basis in our implementation mean that the encoding cost is not linear in the length of the message. On the other hand, decoding throughput—arguably more important than encoding throughput—is linear-time and is less sensitive to block size.

The “two pass” row of the table shows the compression achieved when the entire data file is treated as a single block, and shows that there is little to be gained by not breaking the input message into manageable blocks. Indeed, compression levels using blocking can be better than those of full two-pass semistatic coding, as each block is coded using a code tailored for that particular block, and hence capable of exploiting any local frequency variations. For very small blocks this reduction in message size is quite marked but, as was noted above, is offset by a corresponding increase in prelude cost.

For the LZ-LENGTH and LZ-DISTANCE messages, the compression achieved by gzip at its best compression level when applied to the original Wall Street Journal text is shown, split into the two actual symbol streams. Throughput figures are not shown, as it is difficult to isolate the coding time of gzip from its modeling components. The block size of gzip is controlled by the input—64 kbytes of text—so the number of integer symbols coded varies from block to block. For the Wall Street Journal, an average of about 20 000 distance pointers and 24 000 length/literal pointers were used for each block.

As expected, using the blocked coder reduces the number of bits required for the message, as an exact Huffman code is used for the entire alphabets, rather than a combination of Huffman and binary codes. On the other hand, using a small block size increases the prelude size enough to give an overall compression level higher than gzip, and it is clear that the hybrid Huffman/binary code used in gzip is well engineered for situations in which symbol probabilities are correlated. (For example, a distance offset of 1000 will have a probability much closer to the probability of a distance of 999 than it will to a distance of 9.)

VI. SUMMARY

The full range of housekeeping operations necessary to encode a source message using semistatic minimum-redundancy coding has been described. The mechanisms proposed are particularly relevant when pseudoadaptive coding is to be achieved


by breaking the source message into a sequence of smaller fixed-length blocks, since the decoding time is independent of the block size and linear in the length of the overall message. The space requirements of the encoding and decoding processes are also modest.

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their helpful comments and J. Larsson (Lund University, Sweden) for providing useful assistance.

REFERENCES

[1] J.-L. Gailly, Frequently Asked Questions (FAQ) for comp.compression, Jan. 1998. Periodically posted to the Usenet newsgroup comp.compression.
[2] E. S. Schwartz and B. Kallick, “Generating a canonical prefix encoding,” Commun. ACM, vol. 7, no. 3, pp. 166–169, Mar. 1964.
[3] J. B. Connell, “A Huffman-Shannon-Fano code,” Proc. IEEE, vol. 61, pp. 1046–1047, July 1973.
[4] A. Moffat, N. Sharman, I. H. Witten, and T. C. Bell, “An empirical evaluation of coding methods for multi-symbol alphabets,” Inform. Processing Manage., vol. 30, no. 6, pp. 791–804, Nov. 1994.
[5] A. Moffat and A. Turpin, “On the implementation of minimum-redundancy prefix codes,” IEEE Trans. Commun., vol. 45, pp. 1200–1207, Oct. 1997.
[6] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proc. IRE, vol. 40, pp. 1098–1101, Sept. 1952.
[7] A. Moffat and J. Katajainen, “In-place calculation of minimum-redundancy codes,” in Proc. Workshop on Algorithms and Data Structures, Aug. 1995, pp. 393–402. Source code available at http://www.cs.mu.oz.au/~alistair/inplace.c.
[8] D. R. McIntyre and F. G. Wolff, “An efficient implementation of Huffman decode tables,” Congressus Numerantium, vol. 91, pp. 79–92, 1992.
[9] R. N. Horspool and G. V. Cormack, “Technical correspondence on ‘Data compression using static Huffman code-decode tables’,” Commun. ACM, vol. 29, pp. 150–152, 1986.
[10] A. Bookstein and S. T. Klein, “Is Huffman coding dead?,” Computing, vol. 50, no. 4, pp. 279–296, 1993.
[11] J.-L. Gailly. (1993). Gzip Program and Documentation [Online]. Available: ftp://prep.ai.mit.edu/pub/gnu/gzip-*.tar
[12] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Trans. Inform. Theory, vol. IT-23, pp. 337–343, May 1977.
[13] L. P. Deutsch. (1996, May). DEFLATE Compressed Data Format Specification Version 1.3, RFC 1951 [Online]. Available: http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1951.txt and listed at http://www.rfc-editor.org/rfc.html

[14] P. Elias, “Universal codeword sets and representations of the integers,” IEEE Trans. Inform. Theory, vol. IT-21, pp. 194–203, Mar. 1975.
[15] S. W. Golomb, “Run-length encodings,” IEEE Trans. Inform. Theory, vol. IT-12, pp. 399–401, July 1966.
[16] I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd ed. San Francisco, CA: Morgan Kaufmann, 1999.
[17] A. Moffat and L. Stuiver, “Exploiting clustering in inverted file compression,” in Proc. IEEE 1996 Data Compression Conf., J. A. Storer and M. Cohn, Eds., Los Alamitos, CA, Apr. 1996, pp. 82–91.
[18] L. L. Larmore and D. S. Hirschberg, “A fast algorithm for optimal length-limited Huffman codes,” J. ACM, vol. 37, no. 3, pp. 464–473, July 1990.
[19] J. Katajainen, A. Moffat, and A. Turpin, “A fast and space-economical algorithm for length-limited coding,” in Proc. Int. Symp. Algorithms and Computation, Cairns, Australia, Dec. 1995, pp. 12–21.
[20] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.
[21] D. K. Harman, “Overview of the second text retrieval conference (TREC-2),” Inform. Processing Manage., vol. 31, no. 3, pp. 271–289, May 1995.

Andrew Turpin received the Ph.D. degree from the University of Melbourne, Melbourne, Australia, in 1998. He is currently working as a Postdoctoral Fellow at Discoveries In Sight, Devers Eye Institute, Portland, OR. His doctoral dissertation, titled Efficient Prefix Coding, describes improved mechanisms for calculating and employing minimum-redundancy codes and length-limited minimum-redundancy codes.

Alistair Moffat received the Ph.D. degree from the University of Canterbury, Canterbury, New Zealand, in 1986. Since 1986, he has been a member of the academic staff at the University of Melbourne, Melbourne, Australia, and is currently an Associate Professor. His research interests include text and image compression, techniques for indexing and accessing large text databases, and algorithms for sorting and searching. He is a co-author of the book Managing Gigabytes: Compressing and Indexing Documents and Images, (San Francisco, CA: Morgan Kaufmann, 1999, 2nd ed.) and has written more than 90 refereed papers. Dr. Moffat is a member of the ACM and the IEEE Computer Society.