IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 53, NO. 2, FEBRUARY 2005


Transactions Letters

Shuffled Iterative Decoding

Juntan Zhang and Marc P. C. Fossorier, Senior Member, IEEE

Abstract—Shuffled versions of iterative decoding of low-density parity-check codes and turbo codes are presented. The proposed schemes have about the same computational complexity as the standard versions, and converge faster. Simulations show that the new schedules offer better performance/complexity tradeoffs, especially when the maximum number of iterations has to remain small.

Index Terms—Belief propagation (BP), iterative decoding, low-density parity-check (LDPC) codes, scheduling, turbo codes.

I. INTRODUCTION

ITERATIVE decoding based on belief propagation (BP) [1] has received significant attention recently, mostly due to its near-Shannon-limit error performance for the decoding of low-density parity-check (LDPC) codes [2] and turbo codes [3]. Like the maximum a posteriori (MAP) probability decoding scheme [4], it is a symbol-by-symbol soft-in/soft-out decoding algorithm. It processes the received symbols recursively to improve the reliability of each symbol based on the constraints that specify the code. In the first iteration, the decoder uses only the channel output as input and generates a soft output for each symbol. Subsequently, the output reliability measures of the decoded symbols at the end of each decoding iteration are used as inputs for the next iteration. The decoding-iteration process continues until a certain stopping condition is satisfied. Then hard decisions are made, based on the output reliability measures of the decoded symbols from the last decoding iteration.

The aim of this letter is to develop shuffled versions of the standard iterative decoding algorithms for both LDPC and turbo codes. A similar approach for low-latency decoding of turbo product codes was proposed in [5]. In [6] and [7], a horizontal partitioning of the parity-check matrix was proposed to serialize the decoding of LDPC codes and, in the process, speed up convergence. In this letter, we consider a vertical partitioning of the parity-check matrix to speed up the decoding. The two approaches are well introduced in [8]. It is interesting to note that the vertical and horizontal schedulings come from two different angles, and eventually achieve similar gains. The vertical scheduling proposed in this letter is an algorithmic approach intended to speed up BP decoding at no cost in complexity. For very-large-scale integration (VLSI) considerations, groups are introduced to preserve some parallel advantages of BP decoding. The horizontal scheduling of [6] and [7] is a hardware approach intended to serialize the fully parallel BP decoding. In this serialization, new updates become available within the same iteration, and speedup is also achieved by using them.

II. ITERATIVE DECODING OF LDPC CODES

A regular binary LDPC code of length $N$ has an $M \times N$ parity-check matrix $H$, with $\gamma$ ones in each column and $\rho$ ones in each row. We denote the set of bits that participate in check $m$ by $N(m) = \{n : H_{mn} = 1\}$, and the set of checks in which bit $n$ participates by $M(n) = \{m : H_{mn} = 1\}$. Assume a codeword is transmitted over an additive white Gaussian noise (AWGN) channel with zero mean and variance $\sigma^2$ using binary phase-shift keying (BPSK) signaling, and let $\mathbf{y} = (y_1, \ldots, y_N)$ be the corresponding received sequence.
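To make this notation concrete, the sets N(m) and M(n) can be read directly off the parity-check matrix. The small (2,3)-regular matrix below is a hypothetical illustration, not the example code used later in this letter:

```python
import numpy as np

# Hypothetical (2,3)-regular parity-check matrix: M = 4 checks, N = 6 bits,
# two ones per column and three ones per row (illustration only).
H = np.array([
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
])

# N(m): bits participating in check m; M(n): checks in which bit n participates.
N_m = {m: [n for n in range(H.shape[1]) if H[m, n]] for m in range(H.shape[0])}
M_n = {n: [m for m in range(H.shape[0]) if H[m, n]] for n in range(H.shape[1])}

print(N_m[0])  # bits in check 0
print(M_n[1])  # checks containing bit 1
```

For a regular code, every N(m) has size ρ (here 3) and every M(n) has size γ (here 2).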

Paper approved by P. Hoeher, the Editor for Coding and Communication Theory of the IEEE Communications Society. Manuscript received November 3, 2003; revised May 7, 2004; August 20, 2004; and September 21, 2004. This work was supported by the National Science Foundation under Grant CCR-00-98029 and Grant CCF-04-30576. This paper was presented in part at the 36th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 2002. The authors are with the Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, HI 96822 USA (e-mail: [email protected]. hawaii.edu). Digital Object Identifier 10.1109/TCOMM.2004.841982

A. Standard BP for Iterative Decoding of LDPC Codes

Based on [1], let $F_n$ be the log-likelihood ratio (LLR) of bit $n$, and initially set $z_{mn}^{(0)} = F_n$. Let $\varepsilon_{mn}^{(i)}$ and $z_{mn}^{(i)}$ be the LLRs of bit $n$ which are sent from check node $m$ to bit node $n$, and from bit node $n$ to check node $m$, respectively, at iteration $i$. Let $z_n^{(i)}$ denote the a posteriori LLR of bit $n$. The standard BP algorithm [1] is carried out as follows.

Initialization: Set $i = 1$ and the maximum number of iterations to $I_{\max}$. For each $m$ and each $n \in N(m)$, set $z_{mn}^{(0)} = F_n$.

Step 1: a) Horizontal Step, for $1 \le m \le M$ and each $n \in N(m)$, process

$$\tau_{mn}^{(i)} = \prod_{n' \in N(m) \setminus n} \tanh\!\left(\frac{z_{mn'}^{(i-1)}}{2}\right) \quad (1)$$

$$\varepsilon_{mn}^{(i)} = \log\frac{1 + \tau_{mn}^{(i)}}{1 - \tau_{mn}^{(i)}} \quad (2)$$

b) Vertical Step, for $1 \le n \le N$ and each $m \in M(n)$, process

$$z_{mn}^{(i)} = F_n + \sum_{m' \in M(n) \setminus m} \varepsilon_{m'n}^{(i)} \quad (3)$$

$$z_n^{(i)} = F_n + \sum_{m \in M(n)} \varepsilon_{mn}^{(i)} \quad (4)$$
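A minimal Python sketch of one iteration of the standard two-step schedule described above, on a hypothetical small code with hypothetical channel LLRs (an illustration of the schedule, not the authors' implementation):

```python
import numpy as np

# Hypothetical (2,3)-regular parity-check matrix (M = 4, N = 6).
H = np.array([
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
])

def bp_iteration(H, F, z):
    """One standard BP iteration: horizontal step (1)-(2), then vertical step (3)-(4).

    z[m, n] holds the bit-to-check LLR from the previous iteration;
    returns the updated bit-to-check LLRs and the a posteriori LLRs."""
    M, N = H.shape
    eps = np.zeros((M, N))
    for m in range(M):                                  # a) horizontal step
        idx = np.flatnonzero(H[m])
        for n in idx:
            tau = np.prod(np.tanh(z[m, [n2 for n2 in idx if n2 != n]] / 2))
            tau = np.clip(tau, -0.999999, 0.999999)     # guard atanh domain
            eps[m, n] = 2 * np.arctanh(tau)             # log((1+tau)/(1-tau))
    z_new = np.zeros_like(z, dtype=float)
    z_post = np.zeros(N)
    for n in range(N):                                  # b) vertical step
        checks = np.flatnonzero(H[:, n])
        z_post[n] = F[n] + eps[checks, n].sum()         # a posteriori LLR
        for m in checks:
            z_new[m, n] = z_post[n] - eps[m, n]         # exclude own check
    return z_new, z_post

# Hypothetical received LLRs for an all-zero codeword, with one unreliable bit.
F = np.array([3.2, 4.4, 3.6, -0.8, 4.0, 2.8])
z = H * F                                               # initialization
z, z_post = bp_iteration(H, F, z)
c_hat = (z_post < 0).astype(int)                        # hard decision
print(c_hat)                                            # the weak bit is corrected
```

In a full decoder this iteration would be wrapped in the stopping test described next (syndrome check or maximum iteration count).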


Step 2: Hard decision and stopping criterion test.
a) Create $\hat{\mathbf{c}} = (\hat{c}_1, \ldots, \hat{c}_N)$ such that $\hat{c}_n = 0$ if $z_n^{(i)} \ge 0$, and $\hat{c}_n = 1$ if $z_n^{(i)} < 0$.
b) If $H\hat{\mathbf{c}}^T = \mathbf{0}$ or the maximum iteration number $I_{\max}$ is reached, stop the decoding iteration and go to Step 3. Otherwise, set $i := i + 1$ and go to Step 1.
Step 3: Output $\hat{\mathbf{c}}$ as the decoded codeword.

B. Shuffled BP for Iterative Decoding of LDPC Codes

At the $i$th iteration of the standard BP algorithm, first all values of the check-to-bit messages are updated by using the values of the bit-to-check messages obtained at the $(i-1)$th iteration, i.e., each $\varepsilon_{mn}^{(i)}$ is updated by using the values $z_{mn'}^{(i-1)}$. Then, all values of the bit-to-check messages are updated by using the values of the check-to-bit messages newly obtained at the $i$th iteration, i.e., each $z_{mn}^{(i)}$ is updated from the values $\varepsilon_{m'n}^{(i)}$. In general, for both the check-to-bit and bit-to-check messages, the more independent information is used to update the messages, the more reliable they become. Iteration $i$ of the standard two-step implementation of the BP algorithm uses all values $z_{mn'}^{(i-1)}$ computed at the previous iteration in (1). However, certain values $z_{mn'}^{(i)}$ could already be computed in (3) based on a partial computation of the values $\varepsilon_{mn}^{(i)}$ obtained from (2), and then be used instead of $z_{mn'}^{(i-1)}$ in (1) to compute the remaining values $\varepsilon_{mn}^{(i)}$. This suggests a shuffling of the horizontal and vertical steps of the standard BP decoding. Hence, we refer to this new version as shuffled BP decoding. Note that the updating procedure in shuffled BP is bit-based.

In the shuffled BP algorithm, the initialization, stopping criterion test, and output steps remain the same as in the standard BP algorithm. The only difference between the two algorithms lies in the updating procedure. Step 1 of the shuffled BP algorithm is modified as: for $1 \le n \le N$ and each $m \in M(n)$, process the horizontal step and vertical step jointly, with (1) modified as

$$\tau_{mn}^{(i)} = \prod_{\substack{n' \in N(m) \setminus n \\ n' < n}} \tanh\!\left(\frac{z_{mn'}^{(i)}}{2}\right) \prod_{\substack{n' \in N(m) \setminus n \\ n' > n}} \tanh\!\left(\frac{z_{mn'}^{(i-1)}}{2}\right) \quad (5)$$

We observe, however, that while one iteration of the standard BP algorithm can be fully processed in parallel, that of the shuffled BP algorithm becomes totally serial. To decrease decoding delay and preserve the parallelism advantages of the standard BP algorithm, a parallel shuffled decoding scheme named "group shuffled BP" is developed next. In the group shuffled BP algorithm, the code length is divided into a number of groups. In each group, the updating of messages is processed in parallel, but the processing of groups remains sequential. More precisely, assume the $N$ bits of a codeword are divided into $G$ groups, and each group contains $N_G = N/G$ bits (assuming $G$ divides $N$ for simplicity). Step 1 of the group shuffled BP algorithm is carried out as follows.

Step 1) For $g = 1, \ldots, G$, process jointly the following two steps.
a) Horizontal Step, for $1 \le m \le M$ and each $n \in N(m)$ with $(g-1)N_G < n \le gN_G$, process

$$\tau_{mn}^{(i)} = \prod_{\substack{n' \in N(m) \setminus n \\ n' \le (g-1)N_G}} \tanh\!\left(\frac{z_{mn'}^{(i)}}{2}\right) \prod_{\substack{n' \in N(m) \setminus n \\ n' > (g-1)N_G}} \tanh\!\left(\frac{z_{mn'}^{(i-1)}}{2}\right) \quad (6)$$

$$\varepsilon_{mn}^{(i)} = \log\frac{1 + \tau_{mn}^{(i)}}{1 - \tau_{mn}^{(i)}} \quad (7)$$

b) Vertical Step, for $(g-1)N_G < n \le gN_G$ and each $m \in M(n)$, process

$$z_{mn}^{(i)} = F_n + \sum_{m' \in M(n) \setminus m} \varepsilon_{m'n}^{(i)} \quad (8)$$

$$z_n^{(i)} = F_n + \sum_{m \in M(n)} \varepsilon_{mn}^{(i)} \quad (9)$$

For $G = 1$, the group shuffled BP becomes the standard BP, while the group shuffled BP with $G = N$ is the previously proposed shuffled BP.¹ As an example, consider the $N = 6$ code with parity-check matrix $H$ given in (10). The decoding process for one iteration of the group shuffled BP is illustrated in Fig. 1 with $G = 1$ (standard BP), $G = 2$, and $G = 6$ (original shuffled BP).

Fig. 1. Group shuffled BP with G = 1, 2, 6 for decoding a code with N = 6. (a) G = 1 (standard BP). (b) G = 2. (c) G = 6 (original shuffled BP).

The shuffled BP algorithm for the decoding of LDPC codes keeps the computational advantages of the forward-backward implementations of the standard BP decoding, and requires the same computational complexity [10].² Furthermore, when the Tanner graph of the LDPC code is acyclic and connected, the proposed method is optimal in the sense of MAP decoding and converges faster (or at least, no more slowly) than the standard BP algorithm [10] (the proofs follow from the fact that shuffled BP is simply a new scheduling on the same graph). It is also straightforward to generalize the shuffled approach to various suboptimum versions of BP decoding.

¹A scheme similar to this totally sequential approach was proposed simultaneously in [11].
²An increase in complexity due to more control logic may result.

C. Simulation Results

Fig. 2 depicts the word-error rate (WER) of iterative decoding of an (8000,4000) (3,6) LDPC code with the group shuffled BP algorithm, for $G = 1$ (standard BP), 2, 8, 100, and 8000 (original shuffled BP). We observe that, at the WER shown and with a maximum of 20 iterations, the original shuffled BP algorithm performs about 0.2 dB better than the standard BP algorithm, and the larger the value of $G$, the better the error performance. However, for the larger values of $G$, no significant further difference is observed. Fig. 3 depicts the corresponding average number of iterations. We observe that the average number of iterations of the original shuffled BP algorithm is about half that of the standard BP algorithm, with the same type of differences as in Fig. 2 with respect to the values of $G$. Both standard and shuffled BP decoding achieve the same error performance with a maximum of 2000 iterations, which indicates that the speedup is not achieved at the expense of a poorer achievable error performance. Similar gains were also achieved by combining suboptimum versions of BP decoding with the shuffled decoding approach.

Fig. 2. Error performance for iterative decoding of the (8000,4000) (3,6) LDPC code with the group shuffled BP algorithm, for G = 1, 2, 8, 100, and 8000, and at most 20 iterations.

Fig. 3. Average number of iterations for iterative decoding of the (8000,4000) (3,6) LDPC code with group shuffled BP for G = 1, 2, 8, 100, and 8000, and at most 20 iterations.

III. ITERATIVE DECODING OF TURBO CODES

A turbo code [3] encoder comprises the concatenation of two (or more) convolutional encoders, and its decoder consists of two (or more) soft-in/soft-out convolutional decoders which feed reliability information back and forth to each other. For simplicity, we consider a turbo code that consists of two recursive systematic convolutional codes with encoders in feedback form. Let $\mathbf{u} = (u_1, \ldots, u_K)$ be an information block of length $K$, and let $\mathbf{c} = (c_1, \ldots, c_K)$ be the corresponding coded sequence, where $c_k$ is the output code block at time $k$. Suppose BPSK transmission over an AWGN channel, with the transmitted symbols all taking values in $\{+1, -1\}$. Let $\mathbf{y} = (y_1, \ldots, y_K)$ be the received sequence, where $y_k$ is the received block at time $k$. Let $\hat{u}_k$ denote the estimate of $u_k$. Let $s_k$ denote the encoder state at time $k$. Following [4], define the forward, backward, and branch metrics $\alpha_k(s)$, $\beta_k(s)$, and $\gamma_k(s', s)$, and let $\alpha_k^{(j)}$, $\beta_k^{(j)}$, and $\gamma_k^{(j)}$ be the corresponding values computed in component decoder $j$, with $j \in \{1, 2\}$. Let $L_{e,j}^{(i)}(u_k)$ denote the extrinsic value of the estimated information bit $u_k$ delivered by component decoder $j$ at the $i$th iteration.


A. Standard Serial and Parallel Turbo Decoding

The decoding approach proposed in [3] operates in serial mode, i.e., the component decoders take turns generating the extrinsic values of the estimated information symbols, and each component decoder uses the extrinsic messages delivered by the previous component decoder as the a priori values of the information symbols. The disadvantage of this scheme is its high decoding delay. In the parallel turbo decoding algorithm [9], all component decoders operate in parallel at any given time. After each iteration, each component decoder delivers extrinsic messages to the other decoder(s), which use these messages as a priori values at the next iteration.

B. Shuffled Turbo Decoding

Although parallel turbo decoding overcomes the drawback of the high decoding delay of serial decoding, the extrinsic messages are not taken advantage of as soon as they become available, because they are delivered to the component decoders only after each iteration is completed. The aim of shuffled turbo decoding is to use the most reliable extrinsic messages available at each time. Let $\tilde{\mathbf{u}} = (\tilde{u}_1, \ldots, \tilde{u}_K)$ be the sequence obtained by permuting the original information sequence $\mathbf{u}$ with the interleaver, according to $\tilde{u}_k = u_{\pi(k)}$ for $1 \le k \le K$, where $\pi$ denotes the interleaver mapping. There is a unique corresponding inverse mapping $\pi^{-1}$, with $u_k = \tilde{u}_{\pi^{-1}(k)}$ for $1 \le k \le K$. In shuffled turbo decoding, the two component decoders operate simultaneously, as in the parallel turbo decoding scheme, but the way messages are updated and delivered is different. We further assume that the two component decoders deliver extrinsic messages synchronously, i.e., $t_1(k) = t_2(k)$, where $t_1(k)$ and $t_2(k)$ denote the times at which decoders 1 and 2 deliver the extrinsic values of the $k$th estimated symbol of the original information sequence $\mathbf{u}$ and of the interleaved sequence $\tilde{\mathbf{u}}$, respectively. The shuffled turbo decoding scheme processes the backward recursion followed by the forward recursion.

Let us first consider the forward recursion stage at the $i$th iteration of component decoder 1. After time $k$, the values $\alpha_k^{(1)}$ should be updated, and the values $\gamma_k^{(1)}$ are needed. There are two possible cases. The first case is $t_2(\pi^{-1}(k)) > t_1(k)$, which means the extrinsic value of the information bit $u_k$ is not available yet. Then the values $\gamma_k^{(1)}$, which were stored in the backward-recursion stage of the current iteration, are used to update the values $\alpha_k^{(1)}$ and $L_{e,1}^{(i)}(u_k)$. The second case is $t_2(\pi^{-1}(k)) < t_1(k)$, which means the extrinsic value of the information bit $u_k$ has already been delivered by decoder 2. Then this newly available extrinsic value is used to compute (and store) the values $\gamma_k^{(1)}$, $\alpha_k^{(1)}$, and $L_{e,1}^{(i)}(u_k)$. The backward recursion in decoder 1, as well as both recursions in decoder 2, is realized based on the same principle. After $I_{\max}$ iterations, the shuffled turbo decoding algorithm outputs the hard decisions $\hat{u}_k$ based on the final a posteriori values, which differ from those in the standard serial turbo decoding [3]. The decoding processes of the standard serial, parallel, and shuffled turbo decoding are illustrated in Fig. 4.
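The case distinction above depends only on the interleaver: under the synchronous assumption, decoder 1 finds a freshly updated extrinsic value for bit $k$ whenever decoder 2 has already visited position $k$, i.e., whenever $\pi^{-1}(k) < k$. A minimal sketch of this bookkeeping, with a hypothetical random interleaver (illustration only, not the authors' implementation):

```python
import random

def fresh_fraction(pi):
    """Fraction of positions where decoder 1 can use a freshly updated
    extrinsic value from decoder 2 within the same iteration (case 2 above),
    assuming both decoders advance synchronously.

    pi[j] = original position processed by decoder 2 at time j."""
    inv = [0] * len(pi)
    for j, n in enumerate(pi):
        inv[n] = j                       # inv = pi^{-1}
    fresh = sum(1 for k in range(len(pi)) if inv[k] < k)
    return fresh / len(pi)

random.seed(0)
pi = list(range(1000))                   # hypothetical length-1000 interleaver
random.shuffle(pi)
print(fresh_fraction(pi))                # roughly 0.5 for a random interleaver
```

For a uniformly random interleaver, about half of the positions fall into the second case on average, which is why fresher information circulates within each iteration.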

Fig. 4. Serial, parallel, and shuffled turbo decoding.

Fig. 5. Bit-error performance of three-component turbo code with interleaver size 16384, for parallel decoding (dashed line) and shuffled decoding (solid line).

It is straightforward to generalize shuffled turbo decoding to multiple turbo codes which consist of more than two component codes. Based on the above description, the total computational complexity of shuffled turbo decoding for multiple turbo codes at each decoding iteration is the same as that of parallel turbo decoding, and each of them has a decoding delay which is about $1/Q$ of the decoding delay of serial turbo decoding, where $Q$ is the number of component codes.

C. Simulation Results

We observed that shuffled turbo decoding does not present an advantage over standard decoding for turbo codes with two component codes. A possible reason is that with an increasing


number of component codes, the proportion of newly updated extrinsic messages taken advantage of by each component decoder also increases. It is also known that parallel decoding outperforms serial decoding for turbo codes with more than two component codes [9]. Fig. 5 depicts the bit-error performance of a turbo code with three component codes (rate-1/4) and interleaver size 16384, with standard parallel decoding and shuffled decoding. After five iterations, the shuffled turbo decoder outperforms its parallel counterpart by several tenths of a decibel.

REFERENCES

[1] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol. 45, pp. 399–431, Mar. 1999.
[2] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[3] C. Berrou and A. Glavieux, "Near-optimum error-correcting coding and decoding: Turbo-codes," IEEE Trans. Commun., vol. 44, pp. 1261–1271, Oct. 1996.
[4] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol. IT-20, pp. 284–287, Mar. 1974.
[5] C. Argon and S. McLaughlin, "A parallel decoder for low-latency decoding of turbo product codes," IEEE Commun. Lett., vol. 6, pp. 70–72, Feb. 2002.
[6] E. Yeo, P. Pakzad, B. Nikolic, and V. Anantharam, "High throughput low-density parity-check decoder architectures," in Proc. Global Telecommun. Conf., Nov. 2001, pp. 3019–3024.
[7] M. M. Mansour and N. R. Shanbhag, "Turbo decoder architecture for low-density parity-check codes," in Proc. Global Telecommun. Conf., Nov. 2002, pp. 1383–1388.
[8] F. Guilloud, "Generic architecture for LDPC codes decoding," Ph.D. dissertation, ENST Paris, Paris, France, 2004.
[9] D. Divsalar and F. Pollara, "Multiple turbo codes for deep-space communications," JPL TDA Prog. Rep., pp. 71–78, May 1995.
[10] J. Zhang and M. Fossorier, "Shuffled belief propagation decoding," in Proc. 36th Annu. Asilomar Conf. Signals, Syst., Computers, Nov. 2002, pp. 8–15.
[11] H. Kfir and I. Kanter, "Parallel versus sequential updating for belief propagation decoding," Phys. Rev. E, submitted for publication.