ECCTD 2005 - European Conference on Circuit Theory and Design, Cork, Ireland, 29 August - 2 September 2005

An Early Decision Decoding Algorithm for LDPC Codes Using Dynamic Thresholds

Anton Blad, Oscar Gustafsson, and Lars Wanhammar*

* Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, e-mail: {antonb, oscarg, larsw}@isy.liu.se, tel: +46 13 284059, fax: +46 13 139282

Abstract — Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide range of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of its main disadvantages, however, is the large memory requirement for interim storage of decoding data. In this paper, we investigate a modification to the decoding algorithm that makes early decisions for bits with high reliabilities. This reduces the number of messages passed by the algorithm, which can be expected to reduce the switching activity of a hardware implementation. While a direct application of the modification results in severe performance penalties, we show how to adapt the algorithm to reduce the impact, resulting in a negligible decrease in error correction performance.

1 INTRODUCTION

Low-density parity-check (LDPC) codes were invented by Robert G. Gallager [1] in the early 1960s. However, despite excellent performance and a practical decoding algorithm, they were largely neglected for over 30 years, until they were rediscovered by MacKay and Neal [2] in the mid 1990s. Since then, LDPC codes have been found to perform comparably to, and in some cases better than, Turbo codes. The main problem in coding theory is to build a practical and efficient decoder. The decoding algorithm for LDPC codes is inherently parallelizable, and thus suitable for communication at high speeds. One of its main disadvantages, however, is the large memory requirement of implementations that are not fully parallel. A typical code with a length of 6000 bits might require up to a million memory accesses for the decoding of a single word. As 95% of the power is consumed in the memories [3], reducing the number of memory accesses becomes an important concern, especially in low-power hand-held devices.
2 LOW-DENSITY PARITY-CHECK CODES

LDPC codes are defined as the null space of a very sparse parity-check matrix H. H has dimensions M × N, and the set of valid code words is defined by the relation Hx = 0. Thus code words have length N, and each code word contains (assuming H has full rank) M parity symbols. An LDPC code is typically represented by a bipartite graph between N variable nodes and M check nodes, called a Tanner graph.



Figure 1: Tanner graph corresponding to the code given by H in (1). The variable nodes $v$ and the check nodes $c$ correspond to the columns and rows of H, respectively. There is an edge between $v_n$ and $c_m$ if and only if $H_{m,n} = 1$.


An LDPC code is regular if all variable nodes have a constant degree j and all check nodes have a constant degree k. Equation (1) shows an example code with parameters $(N, j, k) = (10, 2, 4)$. Thus the number of parity bits is $M = Nj/k = 5$, and the code has rate $R = 1/2$. However, H actually has one dependent row, which may be removed in order to slightly increase the rate. This always happens when the column weight j is even, as well as with many systematically constructed codes, but it becomes a minor issue as the code length grows.

$$
H = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1
\end{pmatrix} \qquad (1)
$$

Associated with H is the Tanner graph shown in Fig. 1, where the variable nodes have a constant degree of j = 2 and the check nodes have a constant degree of k = 4. The girth of a graph is the length of its shortest cycle. This concept is important, as cycles introduce dependencies in the decoding process, thereby reducing the decoder's performance.
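To make the representation concrete, the following minimal Python sketch (our illustration, not part of the original paper) builds the Tanner graph adjacency lists for the example code in (1) and checks code word membership:

```python
import numpy as np

# Parity-check matrix H from (1): rows are check nodes, columns are variable nodes.
H = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0, 1, 1],
])

# Tanner graph: edge between c_m and v_n iff H[m, n] == 1 (cf. Fig. 1).
check_neighbors = [np.flatnonzero(row) for row in H]   # degree k = 4 per check node
var_neighbors = [np.flatnonzero(col) for col in H.T]   # degree j = 2 per variable node

def is_codeword(x):
    """A word x is a valid code word iff Hx = 0 over GF(2)."""
    return not np.any(H @ x % 2)
```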

3 SUM-PRODUCT DECODER

The sum-product decoding algorithm [4] works by passing messages representing bit probabilities along the edges of the Tanner graph. As the graph is bipartite, there are two types of messages. The variable-to-check message $q_{mn,i}^a$ is meant to represent the likelihood that variable n has the value a, given the information received from the check node neighbors other than m in iteration i. The check-to-variable message $r_{mn,i}^a$ is meant to denote the probability that check node m is satisfied, given that variable n is locked at a and that m's other neighbors have separate distributions given by their respective messages to check node m. The probabilities are exact if the Tanner graph has no cycles [5]. However, cycles are needed for the code to be usable [6], and with cycles the messages are accurate only in the first few iterations. Decoding is done in several phases:

Initialization: For every variable node, the prior likelihoods $p_n^a$ are set to the likelihood that variable n has the value a, given the channel information. Then the variable-to-check messages $q_{mn,0}^a$ are set to the prior likelihoods $p_n^a$.

Horizontal step: The check-to-variable messages $r_{mn,i}^a$ are computed according to the relation (2), where $n_1, \ldots, n_l$ denote the neighbors of check node m other than n, $[P]$ denotes a function that is 1 if P is true and 0 otherwise, and the sum $a + x_{n_1} + \cdots + x_{n_l}$ is taken modulo 2:

$$r_{mn,i}^a = \sum_{(x_{n_1}, \ldots, x_{n_l}) \in \{0,1\}^l} [a + x_{n_1} + \cdots + x_{n_l} = 0] \cdot q_{mn_1,i-1}^{x_{n_1}} \cdots q_{mn_l,i-1}^{x_{n_l}} \qquad (2)$$

Vertical step: The variable-to-check messages $q_{mn,i}^a$ are computed as

$$q_{mn,i}^a = \alpha_{n,i} \cdot p_n^a \cdot r_{m_1 n,i}^a \cdots r_{m_l n,i}^a \qquad (3)$$

where $m_1, \ldots, m_l$ denote the neighbors of variable node n other than m, and $\alpha_{n,i}$ is a normalizing constant chosen such that $q_{mn,i}^0 + q_{mn,i}^1 = 1$.

Decision step: The pseudo-posterior likelihoods $q_{n,i}^a$ are computed by the same equation (3) as the variable-to-check messages, except that the product is taken over all neighbors of n. A tentative decoding $\hat{x}$ is created by choosing each bit individually as its most likely value, based on the pseudo-posterior likelihoods $q_{n,i}^a$. If $H\hat{x} = 0$, then $\hat{x}$ is reported as the decoded word. Otherwise the algorithm continues from the horizontal step until the maximum number of iterations is reached.
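For concreteness, the following Python sketch (our illustration; the interface and the closed-form evaluation of the check node sum, which is equivalent to the explicit enumeration in (2), are our own choices) renders the four phases:

```python
import numpy as np

def sum_product_decode(H, prior1, max_iter=100):
    """Illustrative probability-domain sum-product decoder (a sketch, not
    the paper's implementation).
    H: (M, N) binary parity-check matrix; prior1[n] = p_n^1 from the channel."""
    M, N = H.shape
    checks = [np.flatnonzero(H[m]) for m in range(M)]      # neighbors of each check node
    vchecks = [np.flatnonzero(H[:, n]) for n in range(N)]  # neighbors of each variable node
    q1 = {(m, n): prior1[n] for m in range(M) for n in checks[m]}  # initialization
    x_hat = (prior1 > 0.5).astype(int)
    for _ in range(max_iter):
        # Horizontal step, closed form of (2): the probability that the mod-2
        # sum of the other neighbors equals 1 is (1 - prod(1 - 2 q^1)) / 2.
        r1 = {}
        for m in range(M):
            for n in checks[m]:
                prod = 1.0
                for n2 in checks[m]:
                    if n2 != n:
                        prod *= 1.0 - 2.0 * q1[(m, n2)]
                r1[(m, n)] = 0.5 * (1.0 - prod)
        # Vertical step (3): prior times all r-messages except the target check's.
        for n in range(N):
            for m in vchecks[n]:
                p0, p1 = 1.0 - prior1[n], prior1[n]
                for m2 in vchecks[n]:
                    if m2 != m:
                        p0 *= 1.0 - r1[(m2, n)]
                        p1 *= r1[(m2, n)]
                q1[(m, n)] = p1 / (p0 + p1)    # alpha enforces q^0 + q^1 = 1
        # Decision step: pseudo-posteriors use the product over all neighbors.
        for n in range(N):
            p0, p1 = 1.0 - prior1[n], prior1[n]
            for m in vchecks[n]:
                p0 *= 1.0 - r1[(m, n)]
                p1 *= r1[(m, n)]
            x_hat[n] = int(p1 > p0)
        if not np.any(H @ x_hat % 2):          # H x = 0: valid code word found
            return x_hat, True
    return x_hat, False
```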

4 EARLY DECISION DECODER

At the end of each iteration, the pseudo-posterior likelihoods $q_{n,i}^a$ are computed for every variable node n. Generally, the values converge quickly towards either 0 or 1. We exploit this fact by deciding variables with high likelihoods early. We define the reliability of variable n in iteration i as

$$c_{n,i} = \left| \log \frac{q_{n,i}^0}{q_{n,i}^1} \right| \qquad (4)$$

For high reliabilities, either $q_{n,i}^0$ or $q_{n,i}^1$ is close to one, and $c_{n,i}$ can be approximated as

$$c_{n,i} \approx \max(-\log q_{n,i}^0, -\log q_{n,i}^1) \qquad (5)$$

We may also state the probability that deciding a variable causes an error, which is the lesser of the two likelihoods:

$$e_{n,i} \approx \min(q_{n,i}^0, q_{n,i}^1) \qquad (6)$$

We define a threshold t, and whenever the reliability of a variable is greater than t, we declare the variable decided and stop updating its messages. In later iterations, we use $q_{n,i}^a \in \{0, 1\}$ for a decided variable, which can be implemented efficiently in hardware. To determine the potential gain of this decoding scheme, we define a measure of the efficiency as the ratio of the number of decided variables to the number of variable nodes, summed over all iterations, as given by (7), where I denotes the number of iterations required to decode a word and $d_i$ is the number of decided bits in iteration i:

$$E = \sum_{i=0}^{I} \frac{d_i}{N} \qquad (7)$$

The efficiency E is equal to the reduction of communication between the nodes during the decoding of a received word, and should roughly correspond to the reduction in power consumption of a hardware implementation of an LDPC decoder. It does not, however, account for the cost of the additional logic needed to implement the scheme.
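A minimal sketch of the decision rule and the efficiency bookkeeping (our illustration; the paper gives no code, and the function names are assumptions):

```python
import math

def try_decide(q0, q1, t):
    """Early decision test for one variable node: returns the decided bit,
    or None if the node stays active. Assumes 0 < q1 < 1 and q0 + q1 = 1."""
    c = abs(math.log(q0 / q1))      # reliability c_{n,i}, equation (4)
    if c > t:
        return 0 if q0 > q1 else 1  # freeze q^a at 0 or 1 from now on
    return None

def efficiency(decided_per_iter, N):
    """Efficiency E of equation (7): decided bits d_i, summed over the
    iterations and normalized by the number of variable nodes N."""
    return sum(d_i / N for d_i in decided_per_iter)
```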

If the Tanner graph of the code is cycle-free, the likelihoods $q_{n,i}^a$ are exact. Cycles, however, introduce dependencies between the messages arriving at a node, causing the reliabilities $c_{n,i}$ as defined by (4) to increase faster than the real reliabilities of the nodes. Even for a girth-8 code, the messages are dependent already in the second iteration. Moreover, by deciding variables we increase their reliabilities, which causes higher reliabilities to be propagated to nearby nodes. This is called “belief pushing” by Zimmermann et al. [7], although they used a different measure of reliability. While they found that better performance can be obtained by leaving the likelihoods $q_{n,i}^a$ at the values they had at decision time, this also severely limits the efficiency of the decoder, as the likelihoods still have to be read from memory in each iteration. Thus we consider only the scheme where we force the likelihoods to 0 and 1. Another problem with early decision decoding is that the decoder is bound to have an error floor. From (5) and (6), we can express the error probability of a variable as a function of its reliability:

$$e_{n,i} \approx \min(q_{n,i}^0, q_{n,i}^1) \approx \exp(-c_{n,i}) \qquad (8)$$
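As a numerical illustration (our example; the values are chosen to match the convention of Section 5, where thresholds are multiplied by $\log 10$): with $t = 6 \cdot \log 10$, each decision has error probability $\exp(-t) = 10^{-6}$, so a decision ratio of $\delta = 0.3$ gives an error floor of

$$\delta \exp(-t) = 0.3 \cdot 10^{-6} = 3 \cdot 10^{-7}.$$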

Figure 2: Performance of the threshold decoder. [Plot: bit error probability (left axis) and efficiency (right axis) versus $E_b/N_0$ (dB); curves for probability propagation BEP, threshold BEP with $t_0 = 6$ and $t_0 = 8$, and the corresponding efficiencies.]

Figure 3: Dynamic threshold decoding, $t_d = 0.7$. [Plot: bit error probability and efficiency versus $E_b/N_0$ (dB); curves for probability propagation and for initial thresholds $t_0 = 4$ and $t_0 = 5$.]

We decide every variable with a reliability of at least t. According to (8), we thus decide every variable with an error probability of less than $\exp(-t)$. If a fraction $\delta$ of all variables is decided at a given SNR, $\delta \exp(-t)$ will be a lower bound for the error probability. Increasing the SNR increases the average reliability of the variables, thus increasing the decision ratio $\delta$. Hence, an error floor of $\delta \exp(-t)$ can be expected. To counter the effect of the cycles, we consider a modification to the discussed decoding scheme: we define a dynamic threshold as

$$t = t_0 + t_d \cdot i \qquad (9)$$

where $t_0$ and $t_d$ are constants and i is the current iteration. Thus the threshold increases linearly with the number of iterations. While this allows higher performance, the initial threshold $t_0$ must be lowered, making the error floor a bigger concern at higher SNR. In each iteration, we also re-enable every bit for which any of its checks is unsatisfied. To decrease the effect of the error floor, we consider a threshold decoder combined with a regular sum-product decoder. LDPC decoders very seldom report an erroneous code word; rather, they reach the maximum number of iterations and report that they are unable to find a valid decoding. This property is not altered by the threshold scheme. Thus, as long as the word error rate is satisfactorily low, the failed words may be submitted to an ordinary sum-product decoder for decoding, without the efficiency dropping significantly.
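The dynamic threshold, the re-enabling rule, and the combined fallback are all simple to express; the sketch below is our illustration, with assumed function names and interfaces:

```python
import numpy as np

def dynamic_threshold(t0, td, i):
    """Threshold schedule of equation (9): grows linearly with iteration i."""
    return t0 + td * i

def reenable(decided, H, x_hat):
    """Re-enabling rule: clear the decision of every bit that participates
    in an unsatisfied check."""
    for m in np.flatnonzero(H @ x_hat % 2):   # checks with odd parity
        decided[np.flatnonzero(H[m])] = False
    return decided

def decode_combined(y, threshold_decoder, sum_product_decoder):
    """Combined scheme: try the efficient threshold decoder first; submit
    the (rare) failed words to an ordinary sum-product decoder."""
    x_hat, ok = threshold_decoder(y)
    return x_hat if ok else sum_product_decoder(y)
```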

5 SIMULATION RESULTS

We present results from simulations using a girth-8 integer lattice LDPC code [8] with $(N, j, k) = (6702, 3, 6)$. The maximum number of iterations was set to 100. Similar results have been achieved using other rate-1/2 codes, but those results are not presented here. All parameter values given in the figures should be multiplied by $\log 10$. Figure 2 shows the performance of the threshold algorithm with constant thresholds of $t = 6 \cdot \log 10$ and $t = 8 \cdot \log 10$. Communication reductions of 35% are achievable, but performance suffers.

Figure 4: Dynamic threshold decoding, $t_0 = 5$. [Plot: bit error probability and efficiency versus $E_b/N_0$ (dB); curves for probability propagation and for dynamic parameters $t_d$ = 0.4, 0.7, and 1.0.]

Figure 5: Average number of iterations; $t_d = 0.7$ for the threshold algorithms. [Plot: average number of iterations versus $E_b/N_0$ (dB); curves for probability propagation, threshold decoding with $t_0 = 6$, and dynamic thresholds with $t_0 = 4$ and $t_0 = 5$.]

Employing dynamic thresholds yields the performance shown in Fig. 3. The dynamic parameter was set to $t_d = 0.7 \cdot \log 10$. With $t_0 = 5 \cdot \log 10$, the algorithm performs adequately down to a bit error rate of almost $10^{-6}$. Trying to increase the efficiency by lowering the initial threshold to $t_0 = 4 \cdot \log 10$ results in a higher bit error rate; the error floor is clearly visible. The effect of the dynamic parameter is shown in Fig. 4.

Figure 5 shows the average number of iterations for the different decoding schemes. As can be seen, both modifications increase the decoding times by a small to moderate amount, which is included in the efficiency measures. The curves are not directly comparable, though, as a constant SNR yields different error rates for the different algorithms. It can be noted that the dynamic threshold algorithm with $t_0 = 5 \cdot \log 10$, which achieves an efficiency of 30% at SNR = 3 dB, incurs only an insignificant increase in the average number of iterations. Simulations of a combined decoder are shown in Fig. 6. The definition of efficiency is not directly applicable here, but setting E = -1 for a failed decoding gives a lower bound for the gain that is possible with a combined decoder. The performance penalty relative to the sum-product decoder is negligible, while significant communication reductions are still achieved.

Figure 6: Combined threshold-SP decoding. [Plot: bit error probability and efficiency versus $E_b/N_0$ (dB); curves for probability propagation, threshold decoding with $t_0 = 6$, and dynamic thresholds with $t_0 = 4$ and $t_0 = 5$.] The threshold curve requires a 50% increase in the average number of iterations in the region of 2 to 2.5 dB, so the gains may be lost due to the increased clock frequency of the decoder.

6 IMPLEMENTATION CONSIDERATIONS

In a hardware implementation of a sum-product decoder, messages are usually passed as signed-magnitude numbers in the log-likelihood domain. Thus the reliabilities, as defined by (4), are already available. This is in contrast to the definitions in [7] and [9], which are not easily evaluated in hardware. The architecture proposed in [10] can easily be adapted to threshold decoding. A second memory containing the bit decisions is added to the variable node processing units. It is updated during variable node processing, and is used to completely disable further variable node processing. For check node processing, the hard-decision value must still be read and communicated to the check node processing unit. The efficiency as defined in this paper directly corresponds to the reduction in the number of memory accesses during the decoding of a received word. The size of the decision memory is N bits, while the extrinsic memory is about 20N bits. Thus the overhead imposed by storing early-decided bits is small compared to the gains achievable by avoiding extrinsic memory accesses, and the overall number of memory accesses can be significantly reduced.
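As an illustration of the bookkeeping involved (our sketch, not the architecture of [10]), the decision memory holds one flag per variable node and gates all further processing of that node:

```python
class DecisionMemory:
    """One decision flag and one hard-decision bit per variable node; the
    decision flags add N bits, compared to roughly 20N bits of extrinsic
    message memory."""
    def __init__(self, N):
        self.decided = [False] * N
        self.hard = [0] * N

    def decide(self, n, bit):
        self.decided[n] = True
        self.hard[n] = bit

    def skip_update(self, n):
        # A decided variable node performs no message updates and causes no
        # extrinsic memory accesses; only self.hard[n] is still read for
        # check node processing.
        return self.decided[n]
```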

7 CONCLUSION

In this paper we have investigated a modification of the sum-product decoding algorithm for low-density parity-check codes that exploits the different convergence rates of the bits involved in the decoding process. By prematurely deciding bits with high reliabilities, we achieve a reduction in the message passing. However, two problems with the threshold decoding scheme are the existence of an error floor and reduced performance due to cycles in the code's graph. We present an approach to reduce the effect of the cycles: using a dynamic threshold, which increases linearly with the number of iterations, we achieve a 5 to 10% increase in efficiency at bit error rates below $10^{-5}$. Combining the threshold decoding schemes with a regular sum-product decoder restores the error correction performance almost to the level of a regular sum-product decoder alone. The overhead imposed by the multiple decoding attempts is negligible, as very few words are failed by the threshold decoder at a bit error rate of $10^{-5}$.

REFERENCES

[1] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inform. Theory, vol. IT-8, pp. 21–28, Jan. 1962.
[2] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
[3] Y. Li, M. Elassal, and M. Bayoumi, “Power efficient architecture for (3,6)-regular low-density parity-check code decoder,” in Proc. IEEE Int. Symp. Circuits Syst., Vancouver, Canada, vol. 4, pp. 81–84, May 2004.
[4] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[5] N. Wiberg, Codes and Decoding on General Graphs, PhD thesis, Linköping University, Linköping, Sweden, 1996.
[6] T. Etzion, A. Trachtenberg, and A. Vardy, “Which codes have cycle-free Tanner graphs?,” IEEE Trans. Inform. Theory, vol. 45, no. 6, pp. 2173–2181, Sept. 1999.
[7] E. Zimmermann, G. Fettweis, P. Pattisapu, and P. K. Bora, “Reduced complexity LDPC decoding using forced convergence,” in Proc. 7th Int. Symp. Wireless Personal Multimedia Communications, Sept. 2004.
[8] B. Vasic, K. Pedagani, and M. Ivkovic, “High-rate girth-eight low-density parity-check codes on rectangular integer lattices,” IEEE Trans. Commun., vol. 52, no. 8, pp. 1248–1252, Aug. 2004.
[9] R. Bresnan, W. Marnane, and M. Sala, “Efficient low-density parity-check decoding,” in Proc. Irish Signals Syst. Conf., June 2004.
[10] T. Zhang and K. K. Parhi, “Joint (3, k)-regular LDPC code and decoder/encoder design,” IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1065–1079, Apr. 2004.