
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 41, NO. 3, MARCH 1993


Transactions Letters

Architectural Tradeoffs for Survivor Sequence Memory Management in Viterbi Decoders

Gennady Feygin and P. G. Gulak

Abstract-In a Viterbi decoder, there are two known memory organization techniques for the storage of survivor sequences from which the decoded information sequence is retrieved, namely the register exchange method and the traceback method. This paper extends previously known traceback approaches, describes two new traceback algorithms, and compares various traceback methods with each other. Memory size, latency and implementation complexity of the survivor sequence management are analyzed for both uniprocessor and multiprocessor realizations of Viterbi decoders. A new one-pointer traceback method is shown to be better than other known traceback methods.

I. INTRODUCTION

There are two known methods for the storage of survivor sequences from which the decoded sequence is retrieved. The register exchange (RE) method is conceptually the simplest and is a commonly used technique [1]-[3]. Because of the large power consumption and large area required in VLSI implementations of the RE method, the traceback (TB) method is the preferred method in the design of large constraint length, high performance Viterbi decoders. Recently, Collins et al. [4] have described the details of a proposed TB implementation in a large multiprocessor Viterbi decoder. An actual implementation of TB was reported in [5]. In this paper we describe two TB methods which are generalizations of the approaches introduced in [5] and [4]. We also introduce a third alternative implementation of TB, which offers certain advantages over the other two TB approaches. Finally, as the fourth alternative we introduce a hybrid technique that combines the techniques of the previous three. The relative merits of the various approaches to TB are compared and contrasted with the RE method.

This paper is organized as follows. In Section II we give a general description of a traceback method that relies on backpropagation of pointers through the memory containing the path information. Section III-A-D describes four alternative implementations of a traceback method, with extensions of two previously known implementations of traceback, as well as two brand new implementations, one of which we call a One-Pointer traceback algorithm and the other a Hybrid traceback algorithm.¹ Four alternative implementations are compared with each other and with the register exchange method in Section IV-A. In Section IV we also concern ourselves with some practical details of a traceback decoder, such as trading off the latency against the number of memory modules into which the traceback memory must be subdivided. Finally, Section V summarizes the main results of this paper.

II. TRACEBACK METHOD

We will omit the details of the register exchange method, which is widely used and is straightforward in concept. Details of the RE method can be found in [7], while a general introduction to the Viterbi algorithm is available in [8]. The traceback method stores path information in the form of an array of recursive pointers, and was originally proposed in [3]. Unfortunately, a direct implementation of the traceback method as described in [3] requires further thought, since it treats memory as infinite in size, while any actual implementation contains only finite memory resources (with a limited number of address and data ports). Furthermore, [3] seems to assume that the add-compare-select (ACS) decisions have been written into the memory before traceback commences; thus the challenge (and opportunity) of various approaches to simultaneously updating and reading the memory has not been addressed.

It is advantageous to think of traceback memory as organized in a two-dimensional structure, with rows and columns. The number of rows is equal to the number of states $N = 2^{\nu}$. Each column stores the results of N comparisons corresponding to one symbol interval or one stage in a trellis diagram. Since the stream of symbols is, in general, semi-infinite, storage locations are periodically reused. There are three types of operations performed inside a TB decoder [4].

Traceback Read (TB)-This is one of the two read operations and consists of reading a bit and interpreting this bit in conjunction with the present state number as a pointer that indicates the previous state number (i.e., state number of the predecessor). Pointer values from this operation are not output as decoded values; instead they are used to ensure that all paths have converged with some high probability, so that actual decoding may take place. The traceback operation is usually


Paper approved by the Editor for Communication Theory of the IEEE Communications Society. Manuscript received October 15, 1990; revised May 20, 1991 and September 26, 1991. This work was supported by the Natural Sciences and Engineering Research Council of Canada, and by the Information Technology Research Center of Ontario. This paper was presented in part at the Third IBM Workshop on ECC, San Jose, 1989, and at the International Conference on Circuits and Systems, Singapore, 1991. The authors are with the Department of Electrical Engineering, University of Toronto, Toronto, Ont. M5S 1A4, Canada. IEEE Log Number 9208032.


¹Of the two versions of the Hybrid traceback algorithm (even and odd), one (even) has been independently discovered by [6].

0090-6778/93$03.00 © 1993 IEEE


run to a predetermined depth T before being used to initiate the decode read operation (described next).

Decode Read (DC)-This operation proceeds in exactly the same fashion as the traceback operation, but operates on older data, with the state number of the first decode read in a memory bank being determined by the previously completed traceback. Pointer values from this operation are the decoded values and are sent to the bit-order reversing circuit. A decode read can serve as a dual decode and traceback read; this allows us to decode read multiple columns using one traceback read operation of T columns.

Writing New Data (WR)-The decisions made by the ACS are written into locations corresponding to the states. The write pointer advances forward as ACS operations move from one stage to the next in the trellis, and data are written to locations just freed by the decode read operation.

For every set of column write operations (N bits wide), an average of one decode read must be performed. The overhead of the T-column traceback read can be spread over one or more column decode read operations, resulting in k read operations per column write (k > 1); k includes both decode read operations and traceback read operations.
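The read operations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trellis convention (the next state is formed by shifting the input bit into the least significant position, so the stored decision bit is the most significant bit of the predecessor) and the names `predecessor` and `trace` are hypothetical choices for this example and are not taken from the paper.

```python
def predecessor(state, decision_bit, nu):
    """Previous state number, given the present state and the stored bit.

    Assumed convention: next_state = ((state << 1) | input_bit) mod 2**nu,
    so the decision bit supplies the most significant bit of the predecessor.
    """
    return (state >> 1) | (decision_bit << (nu - 1))


def trace(decisions, col, state, depth, nu):
    """Follow pointers backward through `depth` columns of the memory.

    `decisions[c][s]` is the bit written by the ACS for state s in column c.
    Columns are addressed modulo the memory length, since storage locations
    are periodically reused. Returns the state reached and the bits read.
    """
    bits = []
    for _ in range(depth):
        bit = decisions[col][state]
        bits.append(bit)
        state = predecessor(state, bit, nu)
        col = (col - 1) % len(decisions)
    return state, bits
```

In this sketch a traceback read would call `trace` and keep only the returned state, while a decode read would start from that state and pass the returned bits on to the bit-order reversing circuit.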

Fig. 1. Survivor sequence update in 3-pointer even method.

III. FOUR TRACEBACK ALGORITHMS

A. The k-Pointer Even Algorithm

The k-pointer even algorithm is a generalization of the implementation of the traceback algorithm implemented in [5]. This generalization was also independently discovered by [6]. Fig. 1 illustrates how, for a particular value of k = 3, read and write operations proceed in parallel in the memory, which is divided into banks. The memory is divided into $2k_2$ memory banks (for the k-pointer algorithms the number of read pointers $k_2$ equals $k$), each of size $\frac{T}{k_2-1}$ columns, as shown in Fig. 1. Each read pointer is used to perform the traceback operation in $k_2 - 1$ memory banks, and the decode read in one memory bank. Every $\frac{T}{k_2-1}$ stages, a new traceback front is started from a fixed state such as all zeros (or the state with the best path metric if best state decoding [9] is used), and a new decode front is started at a location determined by the traceback pointer derived in the previous stage.

Since the traceback depth T must be achieved before decoding can be performed, the number of columns in $k_2 - 1$ memory banks must be greater than or equal to T. This permits us to compute the total amount of memory and latency of the k-pointer even algorithm. Each memory bank is $\frac{T}{k_2-1}$ columns long, and there are a total of $2k_2$ memory banks, for a total of $\frac{2k_2}{k_2-1}T$ columns. The latency of the k-pointer even algorithm is the time delay between the writing of a particular column and the time that column is subjected to decode read. Depending on the position of the column inside the memory bank, the delay can vary from $\frac{2k_2}{k_2-1}T$ for the first column in a memory bank, to $2T$ for the last column in a memory bank. The decoded bits are generated in reverse order, thus a scheme is required for reversing the ordering of the decoded bits. A simple two-stack (LIFO) structure is sufficient to perform the bit order reversal.² Each stack must be $\frac{T}{k_2-1}$ in depth. During the decoding of one memory bank, decoded bits are pushed on one stack, while the bits stored on the other stack are popped. Upon completion of the decoding of a given memory bank, stacks switch from pushing to popping and vice versa. In addition to bit order reversal, a two-stack structure equalizes the latencies of all decoded bits. The overall latency of the k-pointer even algorithm, including the two-stack structure, is $\frac{2k_2}{k_2-1}T$.

B. The k-Pointer Odd Algorithm

The k-pointer odd algorithm is a generalization of the implementation of the traceback algorithm proposed in [4]. Fig. 2 is an illustration of the operation of a 3-pointer odd algorithm. Altogether, there are $2k_2 - 1$ memory banks, each of length $\frac{T}{k_2-1}$, for a total length of $\frac{2k_2-1}{k_2-1}T$. A two-stack LIFO structure is also required to perform bit order reversal. The latency of the k-pointer odd algorithm (including the two-stack structure) is $\frac{2k_2}{k_2-1}T$.

As indicated in Fig. 2, the decode pointer and the write pointer always point to the same column in the memory, although the decode pointer will be used to read only one memory location, while the write pointer will be used to sequentially update memory locations corresponding to all states in a given trellis stage (i.e., all $2^{\nu}$ bits). It is necessary to perform decoding before new data can be written, otherwise the contents of the memory being used to generate the decoded information bit may be overwritten before it is read.
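The two-stack (LIFO) structure used by the k-pointer algorithms can be sketched as follows. This is a minimal software model, assuming one decoded bit arrives per stage and that each bank holds exactly `depth` columns; the class name `TwoStackReverser` is hypothetical and the model says nothing about how the stacks would be realized in hardware.

```python
class TwoStackReverser:
    """Ping-pong pair of stacks that restores the order of decoded bits."""

    def __init__(self, depth):
        self.depth = depth        # columns per memory bank
        self.push_stack = []      # receives bits of the bank being decoded
        self.pop_stack = []       # releases bits of the previous bank

    def step(self, bit):
        """Push one decoded bit and pop one bit of the previous bank.

        Returns None while the first bank is still being collected.
        """
        self.push_stack.append(bit)
        out = self.pop_stack.pop() if self.pop_stack else None
        if len(self.push_stack) == self.depth:
            # Bank finished: the stacks switch from pushing to popping.
            self.push_stack, self.pop_stack = self.pop_stack, self.push_stack
        return out


rev = TwoStackReverser(depth=3)
# Decode reads deliver each bank of three columns in reverse time order.
print([rev.step(t) for t in [2, 1, 0, 5, 4, 3, 8, 7, 6]])
# [None, None, None, 0, 1, 2, 3, 4, 5]
```

The integers in the usage example are time indices standing in for decoded bits, which makes the restored ordering visible in the output.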

Fig. 2. Survivor sequence update in 3-pointer odd method.

Figure legend: path of the traceback pointer (traceback state); written add-compare-select decisions for N states (0 through N-1).

²One of the reviewers of this paper has pointed out that a single memory would suffice if read and write operations are interleaved, or if a dual-port memory is employed.

C. The One-Pointer Algorithm

The one-pointer algorithm [10], [11] differs significantly from the k-pointer odd and even algorithms. Instead of utilizing k read pointers to perform the required k reads for every column write operation, we chose to use a single read pointer, but accelerate read operations, so that every time the write counter advances by one column, k column reads occur. This acceleration of read operations is made possible by the fact that among the three operations (writing new data, traceback read, and decode read), writing new data is by far the most time consuming. This observation is particularly important as $2^{\nu}$ bits are written every stage, as opposed to only k bits being read.

The one-pointer algorithm (with $k_1 = 3$) is illustrated in Fig. 3. Only $k_1 + 1$ memory banks, each $\frac{T}{k_1-1}$ columns long, are required, for a total of $\frac{k_1+1}{k_1-1}T$ columns. A single read pointer produces the decoded bits in bursts. While reading $k_1$ memory banks, no decoded data is available from the first $k_1 - 1$ memory banks. During the decode read operation in the $k_1$th memory bank, decoded bits are generated at a rate of $k_1$ per stage. Fortunately, the two-stack structure discussed above can perform both bit order reversal and burst elimination at the same time. The latency of the one-pointer algorithm, including the two-stack structure, is $\frac{k_1+1}{k_1-1}T$.

Fig. 3. Survivor sequence update in one-pointer method.
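The burst behaviour and the latency of the one-pointer algorithm can be checked with a small timing sketch. The timing model used here is an assumption made for illustration and is not spelled out in the paper: one column is written per stage, the read pointer performs $k_1$ column reads per stage, and each read pass starts as soon as the newest bank has been completely written. The function name `one_pointer_delays` is hypothetical.

```python
def one_pointer_delays(T, k1):
    """Write-to-decode-read delay, in stages, for each column of one bank.

    Assumed timing model: one column write per stage and k1 column reads
    per stage; delays are measured before the two-stack structure.
    """
    if k1 < 2 or T % (k1 - 1) != 0:
        raise ValueError("sketch assumes k1 >= 2 and (k1 - 1) dividing T")
    bank_len = T // (k1 - 1)            # columns per memory bank
    delays = []
    for j in range(bank_len):
        write_stage = j                 # bank 0 is written in stages 0 .. bank_len-1
        # The pass that decodes bank 0 starts at stage k1 * bank_len. It first
        # traces back through the k1 - 1 newer banks, then reads bank 0 backward.
        reads_before = (k1 - 1) * bank_len + (bank_len - 1 - j)
        read_stage = k1 * bank_len + reads_before // k1
        delays.append(read_stage - write_stage)
    return bank_len, delays


bank_len, delays = one_pointer_delays(T=6, k1=3)
print(bank_len, delays)   # 3 [11, 10, 9]
```

In this example all three columns of the bank are decoded in the same stage, which is the burst of $k_1$ bits per stage described above, and the largest delay stays within the $(k_1 + 1)$ bank lengths, i.e. $\frac{k_1+1}{k_1-1}T = 12$ stages.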

D. The Hybrid Algorithm

A hybrid approach to TB, which combines some features of the k-pointer algorithm (either even or odd) and the one-pointer algorithm, is also possible: k column reads per stage are performed using $k_2$ read pointers, each advancing at a rate of $k_1$ columns per stage ($k_1$, $k_2$ are integers such that $k = k_1 k_2$, and as before $k \le T + 1$). The general expressions for the number of memory banks required, the total memory size and the latency are computed as follows: each memory bank is $\frac{T}{k_1 k_2 - 1}$ columns long, and the number of memory banks is $k_2(k_1 + 1)$ for the hybrid of the k-pointer even algorithm and the one-pointer algorithm, and $k_2(k_1 + 1) - 1$ for the hybrid of the k-pointer odd algorithm and the one-pointer algorithm. The latency is $\frac{k_2(k_1+1)}{k_1 k_2 - 1}T$ for the odd hybrid as well as for the even hybrid. The even hybrid approach with $k_1 = k_2 = 2$ is illustrated in Fig. 4. When $k_1$ is set to one, the expressions above agree with the expressions for the k-pointer algorithms; when $k_2$ is set to one, they agree with the expressions for the one-pointer algorithm.
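The expressions for the hybrid algorithm can be evaluated numerically to confirm that they reduce to the k-pointer and one-pointer results. The following is a minimal sketch, assuming the bank length $\frac{T}{k_1 k_2 - 1}$, the bank counts $k_2(k_1+1)$ and $k_2(k_1+1) - 1$, and the latency $\frac{k_2(k_1+1)}{k_1 k_2 - 1}T$; the function name `hybrid_sizing` and its return layout are hypothetical.

```python
from fractions import Fraction


def hybrid_sizing(T, k1, k2, even=True):
    """Bank length, bank count, total columns and latency of a hybrid TB.

    k2 read pointers each advance k1 columns per stage, so k = k1 * k2.
    Exact fractions are returned because T / (k - 1) need not be an integer.
    """
    k = k1 * k2
    if k < 2 or k > T + 1:
        raise ValueError("requires 2 <= k1 * k2 <= T + 1")
    bank_len = Fraction(T, k - 1)                 # columns per bank
    banks = k2 * (k1 + 1) - (0 if even else 1)    # even or odd hybrid
    total_columns = banks * bank_len
    latency = k2 * (k1 + 1) * bank_len            # includes the two stacks
    return bank_len, banks, total_columns, latency


# k1 = 1 gives the 3-pointer even algorithm, k2 = 1 the one-pointer algorithm.
print(hybrid_sizing(T=12, k1=1, k2=3))
print(hybrid_sizing(T=12, k1=3, k2=1))
```

With $k_2 = 1$ and $k_1 = T + 1$ the same function returns a latency of $T + 2$, the minimum quoted for the one-pointer algorithm.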

IV. DISCUSSION

A. Comparative Advantages

The even hybrid approach, though conceptually interesting, offers no significant advantages over the one-pointer traceback algorithm. Latency, total memory requirements and the number of memory banks for a hybrid approach with $k_1 k_2 = k$ will be higher than for the one-pointer approach with a ratio of k column reads per column write. Furthermore, read pointer controls in the case of the hybrid approach will be more complex than in the case of the one-pointer approach, since circuitry must keep track of $k_2$ pointers advancing at a rate of $k_1$ column read operations per column write operation.

The k-pointer odd algorithm offers no significant advantages over either the k-pointer even or the one-pointer algorithm. In the k-pointer odd method, bidirectional column counters are required for both write and read pointers, since the direction of the memory bank accesses alternates, as illustrated in Fig. 2. Furthermore, decode read and write operations share the same set of memory locations. It is necessary to "stall" write operations to allow the decode read operation to complete before the information bits in the set get overwritten. Thus, writing cannot proceed at a uniform pace, and the column access time must be divided into $2^{\nu} + 1$ intervals; the design of a counter for this is difficult. The odd hybrid combines the worst features of the even hybrid and k-pointer odd methods and is of no practical significance.

The k-pointer even algorithm is significantly better. The counter design is very simple, with each read counter and the write counter advancing by exactly one column every stage.

The one-pointer algorithm is the best amongst the known TB methods. It requires approximately half as much memory as either of the k-pointer even or odd algorithms. The latency is similarly reduced by a factor of two. In addition, the number of memory modules required is also half as large as that required by the k-pointer algorithms. The only disadvantage of the one-pointer algorithm is the need to provide separate column counters for the write operations and for the read operations, since the read counter advances by k columns for every one column advance of the write pointer. If k is selected to be a power of two, say $2^b$, then the read counter can be implemented simply by using the b most significant bits of the write row counter as the b least significant bits of the read column counter. Unlike the k-pointer even algorithm, with only a single read operation per pointer per stage, the one-pointer algorithm requires k read operations per pointer per stage. If ACS decisions for a particular stage are computed and written into memory $2^x$ decisions at a time, k should not exceed $2^{\nu-x}$ if we are to avoid introducing a higher-speed clock for read pointer control. Since k is usually a small integer, the limit on k will rarely be exceeded in a practical design.

The characteristics of various traceback approaches and the register exchange approach are summarized in Tables I and II.

TABLE I
BANDWIDTH REQUIREMENTS OF SURVIVOR SEQUENCE MANAGEMENT USING REGISTER EXCHANGE AND TRACEBACK

| Method | Write Bandwidth | Read Bandwidth | Total |
|---|---|---|---|
| Register Exchange | $T \times 2^{\nu}$ | $T \times 2^{\nu}$ | $T \times 2^{\nu+1}$ |
| Traceback | $2^{\nu}$ | $k$ | $2^{\nu} + k$ |

TABLE II
MEMORY REQUIREMENTS AND LATENCY OF SURVIVOR SEQUENCE MANAGEMENT USING REGISTER EXCHANGE AND TRACEBACK

| Method | Memory size | Type of cell required | Latency |
|---|---|---|---|
| Reg. Exch. | $T \times 2^{\nu}$ | Dual-port + mux + wiring | $T$ |
| k-P Even | $\frac{2k_2}{k_2-1}\,T\,2^{\nu}$ | DRAM without refresh for majority of applications | $\frac{2k_2}{k_2-1}\,T$ |
| k-P Odd | $\frac{2k_2-1}{k_2-1}\,T\,2^{\nu}$ | DRAM without refresh for majority of applications | $\frac{2k_2}{k_2-1}\,T$ |
| One-P | $\frac{k_1+1}{k_1-1}\,T\,2^{\nu}$ | DRAM without refresh for majority of applications | $\frac{k_1+1}{k_1-1}\,T$ |
| Even Hybrid | $\frac{k_2(k_1+1)}{k_1 k_2-1}\,T\,2^{\nu}$ | DRAM without refresh for majority of applications | $\frac{k_2(k_1+1)}{k_1 k_2-1}\,T$ |

In order to significantly boost the throughput of Viterbi decoders, researchers are increasingly turning to the use of multiple processing units. Although full details cannot be given in this short paper, we must note that the advantages of the TB method over the RE method become even more pronounced [11] when survivor sequence memory is distributed.

Selection of the proper value of k allows one to trade off latency against the number of memory banks (i.e., complexity of controls). In our discussions we have assumed that k is an integer. It is possible to modify any one of the TB methods to work with rational k, in other words $k = \frac{m}{n}$, where m, n are integers. This is of limited value, since memory becomes fragmented into a large number of memory banks and control becomes unnecessarily complex.
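The counter arrangement suggested for $k = 2^b$ in the one-pointer algorithm can be illustrated with a short sketch. It rests on assumptions that go beyond the text: the ACS decisions of a stage are taken to be written one row at a time, so the write row counter runs from 0 to $2^{\nu} - 1$ within each stage, and the read direction, bank offsets and address wrap-around are all ignored. The function name `read_column_counter` is hypothetical.

```python
def read_column_counter(write_col, write_row, nu, b):
    """Read column count formed by concatenating two existing counters.

    The b most significant bits of the write row counter become the b least
    significant bits of the result, so the count advances 2**b times while
    the write column counter advances once.
    """
    if not 0 <= b <= nu:
        raise ValueError("requires 0 <= b <= nu")
    row_msbs = write_row >> (nu - b)       # top b bits of the row counter
    return (write_col << b) | row_msbs


nu, b = 4, 2                               # 16 states, k = 4 reads per stage
for col in range(2):
    print([read_column_counter(col, row, nu, b) for row in range(1 << nu)])
```

Each printed line covers one trellis stage; under these assumptions the count takes $k = 4$ consecutive values per stage, each held for $2^{\nu - b}$ row writes, without any clock faster than the row write clock.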

V. SUMMARY AND CONCLUSIONS

This paper has presented several new algorithms for traceback memory management in Viterbi decoders. For a high-speed, large constraint length VD the traceback algorithm is advantageous as compared to the register exchange method. The TB method is superior because of the lower bandwidth requirements (lower power dissipation). The TB method is particularly suitable for use in a multiprocessor implementation of the VD with memory distributed among the processors; in a multiprocessor VD the TB method requires significantly less area [11] than the RE method. We have demonstrated a novel one-pointer implementation of the traceback decoder, which is better than previously known implementations, with lower latency, lower total amount of memory required and fewer memory banks, resulting in simpler control circuitry. Latency in this approach can be made as low as T + 2 information bits (when $k_1$ is set to its maximum value of T + 1).

ACKNOWLEDGMENT

The authors are grateful to the anonymous referees for their helpful comments.

REFERENCES

[1] R. M. Orndorf et al., Viterbi Decoder VLSI Integrated Circuit for Bit Error Correction. Anaheim, CA: Rockwell International, Dec. 1981.
[2] C. B. Shung et al., "Implementation issues for the design of a rate 8/10 trellis code for partial response channels," in Third IBM Workshop ECC, San Jose, CA, Sept. 1989.
[3] C. M. Rader, "Memory management in a Viterbi algorithm," IEEE Trans. Commun., vol. 29, pp. 1399-1401, Sept. 1981.
[4] O. Collins and F. Pollara, "Memory management in traceback Viterbi decoders," TDA Prog. Rep. 42-99, Jet Prop. Lab., Pasadena, CA, Nov. 1989.
[5] H. A. Bustamante et al., "Stanford telecom VLSI design of a convolutional decoder," in IEEE Conf. Military Commun., vol. 1, Boston, MA, Oct. 1989, pp. 171-178.
[6] R. Cypher and C. B. Shung, "Generalized traceback techniques for survivor memory management in the Viterbi algorithm," in GLOBECOM, Dec. 1990, pp. 1318-1322.
[7] G. C. Clark and J. B. Cain, Error Correction Coding for Digital Communications. New York: Plenum, 1981, p. 262.
[8] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278, Mar. 1973.
[9] R. J. McEliece and I. M. Onyszchuk, "Truncation effects in Viterbi decoding," in IEEE Conf. Military Commun., vol. 1, Boston, MA, Oct. 1989, pp. 541-545.
[10] G. Feygin, P. G. Gulak, and F. Pollara, "Survivor sequence memory management in Viterbi decoders," in Third IBM Workshop ECC, San Jose, CA, Sept. 1989.
[11] G. Feygin and P. G. Gulak, "Survivor sequence memory management in Viterbi decoders," CSRI Tech. Rep. 262, Univ. of Toronto, Jan. 1991.