Coded Decision Feedback Signal Processing: An Alternative to Iterative (Turbo) Signal Processing

Krishna Narayanan
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
[email protected]

Nitin Nangare
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
[email protected]

Abstract—A non-iterative receiver is proposed to achieve near-capacity performance on channels with memory. The proposed design has two main ingredients: i) a signal processing block that produces optimal soft estimates of the channel inputs given all the observations from the channel and exact knowledge of some past symbols, and ii) an encoder structure that, through the use of a capacity-achieving code for a memoryless channel, ensures that these past symbols are available to the signal processing block error free. The advantages and disadvantages of this receiver compared to turbo signal processing are discussed, and it is shown that when the latency constraint is not too strict, the proposed receiver can be a viable alternative to turbo signal processing.

I. INTRODUCTION

We consider the problem of coding and signal processing for channels with memory when the input to the channel is constrained to be from a finite alphabet (such as a PSK or QAM constellation). Examples include an inter-symbol interference (ISI) channel, a channel with timing offset, a correlated fading channel without channel state information at the receiver, and channels with additive correlated noise. To deal with the memory in the channel, a front-end signal processing block such as an equalizer, timing offset estimator, channel estimator or noise whitener is typically used. Since the introduction of the turbo principle, iteratively passing soft information between the signal processing block and a decoder has become a popular paradigm [1]–[3]. Several papers have shown that it is possible to carefully design codes for use with iterative signal processing techniques in order to achieve near-capacity performance on channels with memory [4].

In this paper, we show that turbo or iterative signal processing is not necessary and demonstrate an alternate approach to achieving near-capacity performance. We present a simple receiver structure whose complexity is only that of one iteration of the turbo receiver and yet achieves capacity (by which we mean the information rate corresponding to the input constellation and input distribution). The proposed receiver performs joint decoding and signal processing through a decision feedback (DF) mechanism. We consider two examples, coding and equalization for ISI channels and timing recovery in the presence of timing offset and ISI, to explain the main ideas. However, the approach can be used with other signal processing functions as well.

(This work was supported by a grant from Seagate Research and by the National Science Foundation under grant CCR-0093020.)

Fig. 1. Encoder structure. The coded data are arranged in an n × m matrix; each column is an LDPC codeword (bits x_{i,j}), the first L columns contain known bits, and the data is transmitted row-wise.

In comparison to turbo signal processing (or a turbo receiver), the advantages of the proposed approach include low complexity, simplified code design and universality. This is discussed in more detail in a later section. The disadvantage is the requirement of long latency. This paper is related to previous works in [5]–[8]; a detailed comparison to these works is omitted due to space limitations. In a nutshell, the optimality of decision feedback signal processing for inputs from a finite alphabet is novel. This paper is also an extension of our previous work [9]. Due to space limitations, all proofs are omitted; they can be found in a longer version [10].

II. JOINT EQUALIZATION AND DECODING FOR ISI CHANNELS

In this section, we consider the example of coding and equalization for an ISI channel to explain the main ideas behind coded decision feedback signal processing. The idea extends easily to any channel that can be described using a finite-state Markov model.

A. Encoder Structure and Notation

The encoder structure is shown in Fig. 1. The data is encoded with a low-density parity-check (LDPC) code of rate r and length n. The codewords are arranged in the form of a data-matrix of size n × m bits. The first L columns are a sequence of zero bits or known bits. We will assume binary

phase shift keying (BPSK) as the modulation; the extension to other memoryless modulations is straightforward. During transmission, bits are sent sequentially along the rows, i.e., from the 1st row to the nth row. Let X = [X_1, X_2, ..., X_{nm}] denote the sequence of transmitted bits and Y the entire received sequence. We also use X_{i,j} and X_{(i-1)m+j} interchangeably to refer to the transmitted bit in the ith row and jth column of the data-matrix in Fig. 1. The sequence of L bits [X_{i,1}, X_{i,2}, ..., X_{i,L}] is known ∀i. The overall rate is R = rn(m − L)/(nm) = r(1 − L/m); as m → ∞, the rate approaches r. The sequence of bits {X_k} is transmitted through an (L + 1)-tap ISI channel (memory L). The received sequence {Y_k} at the output is given by

Y_k = \sum_{i=0}^{L} h_i X_{k-i} + N_k,    (1)

where the h_i's are the tap coefficients (assumed to be complex) and N_k is complex uncorrelated Gaussian noise.
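Before describing the receiver, a minimal sketch may help fix the notation: it builds the n × m data-matrix of Fig. 1 with the first L columns known, transmits it row-wise over the channel (1), and reports the rate-loss factor (1 − L/m). The matrix size, noise level, and column-code rate below are illustrative values of our own (only the three-tap response is taken from the simulations of Section III), and the noise is real rather than complex.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 1000, 20                       # data-matrix: n rows (codeword length) x m columns
h = np.array([0.407, 0.815, 0.407])   # (L+1)-tap ISI response (the channel used in Sec. III)
L = len(h) - 1
sigma = 0.5                           # noise standard deviation (placeholder)

# Each column is nominally one LDPC codeword of rate r; here the matrix is simply
# filled with random bits, and the first L columns are forced to known (zero) bits.
X = rng.integers(0, 2, size=(n, m))
X[:, :L] = 0                          # known columns
s = 1.0 - 2.0 * X                     # BPSK mapping: bit 0 -> +1, bit 1 -> -1

# Transmit row-wise: X[i, j] is the ((i-1)m + j)-th channel input (1-based indexing).
x = s.reshape(-1)                     # row-major flattening == row-wise transmission
y = np.convolve(x, h)[: x.size] + sigma * rng.standard_normal(x.size)   # Eq. (1), real noise

r = 0.5                               # placeholder column-code rate
print("rate-loss factor:", 1 - L / m, " overall rate R:", r * (1 - L / m))
```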

We will use a superscript to refer to the vector of signals in a column, i.e., X^j is the vector of bits in the jth column, X^j = [X_{1,j}, X_{2,j}, ..., X_{n,j}]. Let us denote the achievable information rate for this system by C_{i.i.d.}, emphasizing that the input to the channel is a sequence of i.i.d. bits. The ISI channel can be represented using a trellis with 2^L states, where the state before the transmission of X_{i,j} is denoted by S_{i,j} and corresponds to the past L bits [X_{(i-1)m+j-L}, X_{(i-1)m+j-L+1}, ..., X_{(i-1)m+j-1}]. It is assumed that the channel state is reset to the zero state at the end of the transmission of the last row of a data-matrix.

B. Proposed Receiver Structure

The receiver begins by computing optimal soft outputs for each of the bits in the (L + 1)th column of the data-matrix shown in Fig. 1. We assume that a trellis-based equalizer is used for equalization. Note that for each X_{i,L+1}, ∀i, the previous L bits (X_{i,1}, X_{i,2}, ..., X_{i,L}) are known. Hence, the optimal soft output for the bit in the (i, L + 1)th position is the likelihood ratio

Λ_{i,L+1} = \frac{P(X_{i,L+1} = +1 \mid Y, X_{i,1}, X_{i,2}, \ldots, X_{i,L})}{P(X_{i,L+1} = -1 \mid Y, X_{i,1}, X_{i,2}, \ldots, X_{i,L})}    (2)

Since knowing the past L bits perfectly is equivalent to fixing the state in the trellis of the equalizer, the optimal soft output is equivalently

Λ_{i,L+1} = \frac{P(X_{i,L+1} = +1 \mid Y, S_{i,L+1})}{P(X_{i,L+1} = -1 \mid Y, S_{i,L+1})}    (3)

Using (3), we can calculate the soft outputs Λ_{i,L+1} for 1 ≤ i ≤ n, which are then input to an LDPC decoder to decode the (L+1)th column (codeword). Assuming that n → ∞ and that the code X^{L+1} achieves capacity on the equivalent channel characterized by the input X_{i,L+1} and output Λ_{i,L+1}, we can decode the (L+1)th column perfectly. The receiver then uses these decisions as perfect decision feedback and proceeds to generate optimal soft outputs for the bits in the (L + 2)th column, again assuming the previous L bits are perfectly known.

This process continues: for the jth column, the equalizer produces optimal soft outputs Λ_{i,j} for all the bits in that column, assuming the previous L bits are perfectly known, i.e.,

Λ_{i,j} = \frac{P(X_{i,j} = +1 \mid Y, S_{i,j})}{P(X_{i,j} = -1 \mid Y, S_{i,j})}    (4)

We now show how to generate the optimal soft outputs Λ_{i,j} in (4) using just one pass of the BCJR algorithm.

C. The BCJR-DFE Algorithm

Due to space limitations, we give only an outline of the algorithm; the formal derivation can be found in [10]. The idea is to run n BCJR equalizers in parallel, one for each row. In each row, we first perform the backward recursion of the BCJR algorithm to generate the β's for each state. Then, in the forward pass, for each j, we use the decisions for the past L bits provided by the decoder to exactly identify the state S_{i,j} = s_{i,j}. Thus the α's for all states other than s_{i,j} can be set to zero and Λ_{i,j} is computed. For every coded bit, we need only one backward pass of the BCJR algorithm and a trivial forward pass to produce Λ_{i,j}; hence the equalization complexity is nearly half that of a full BCJR algorithm. The following important properties of this receiver are proved in [10]: (i) the equivalent channel seen by the LDPC decoder for each column is perfectly memoryless and is statistically identical for all columns regardless of the ISI channel tap coefficients; (ii) the capacity of the equivalent memoryless channel is exactly C_{i.i.d.}, the achievable information rate for the ISI channel. This shows that the scheme is optimal if codes that achieve capacity on the equivalent memoryless channel are used.
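The per-row computation can be summarized in code. The following is a minimal sketch of the BCJR-DFE idea under simplifying assumptions of our own: real-valued taps, BPSK, perfect decision feedback over the whole row, an all-(+1) state before the first bit of each row, and no trellis termination. The function and variable names are ours, not from [10].

```python
import numpy as np
from itertools import product

def build_trellis(h):
    """Enumerate the 2^L states (the last L symbols) and transitions of the ISI trellis."""
    L = len(h) - 1
    states = list(product((+1.0, -1.0), repeat=L))        # (x_{k-L}, ..., x_{k-1})
    index = {s: i for i, s in enumerate(states)}
    nxt = np.zeros((len(states), 2), dtype=int)            # next state for input +1 / -1
    out = np.zeros((len(states), 2))                        # noiseless channel output
    for si, s in enumerate(states):
        for b, x in enumerate((+1.0, -1.0)):
            out[si, b] = h[0] * x + np.dot(h[1:], s[::-1])  # h_0 x_k + sum_i h_i x_{k-i}
            nxt[si, b] = index[s[1:] + (x,)]
    return index, nxt, out

def bcjr_dfe_row(y, h, sigma, fed_back):
    """One row of the BCJR-DFE: a single backward recursion, then a 'trivial'
    forward pass in which the state at each time is pinned by the L fed-back
    past symbols.  `fed_back` holds the decisions for the whole row (here taken
    to be the true symbols, i.e. perfect feedback).  Returns one LLR per bit."""
    m, L = len(y), len(h) - 1
    index, nxt, out = build_trellis(h)
    S = len(out)
    # Branch metrics gamma[k, state, input], up to a constant factor.
    gamma = np.exp(-(y[:, None, None] - out[None, :, :]) ** 2 / (2.0 * sigma ** 2))
    # Backward recursion (normalized at each step to avoid underflow).
    beta = np.ones((m + 1, S))
    for k in range(m - 1, -1, -1):
        for si in range(S):
            beta[k, si] = (gamma[k, si, 0] * beta[k + 1, nxt[si, 0]]
                           + gamma[k, si, 1] * beta[k + 1, nxt[si, 1]])
        beta[k] /= beta[k].sum()
    # Forward pass: the state at time k is known exactly from decision feedback
    # (an all-(+1) state is assumed before the row starts).
    padded = (1.0,) * L + tuple(float(v) for v in fed_back)
    llr = np.zeros(m)
    for k in range(m):
        si = index[padded[k:k + L]]                         # S_{i,k} fixed by past decisions
        num = gamma[k, si, 0] * beta[k + 1, nxt[si, 0]]     # x_k = +1
        den = gamma[k, si, 1] * beta[k + 1, nxt[si, 1]]     # x_k = -1
        llr[k] = np.log(num + 1e-300) - np.log(den + 1e-300)
    return llr

# Tiny demo: one row over the Sec. III channel with perfect feedback.
rng = np.random.default_rng(0)
h = np.array([0.407, 0.815, 0.407])
bits = 1.0 - 2.0 * rng.integers(0, 2, 200)
y_row = np.convolve(np.concatenate(([1.0, 1.0], bits)), h)[2:2 + bits.size] \
        + 0.3 * rng.standard_normal(bits.size)
llr = bcjr_dfe_row(y_row, h, sigma=0.3, fed_back=bits)
print("fraction of LLR signs agreeing with the transmitted bits:",
      np.mean(np.sign(llr) == bits))
```

In the full receiver, the fed-back symbols for bit position j come from the already decoded columns 1, ..., j−1 (the LLR at position j uses only past decisions); running this routine over all n rows produces the inputs Λ_{i,j} for the column-j LDPC decoder, one column at a time.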

D. Comparison to Iterative Receivers

Compared to turbo equalization, the proposed receiver has the following advantages. (i) The complexity is significantly smaller, since only one iteration of the BCJR equalizer is required, as discussed above. (ii) Since the channel seen by each column decoder is memoryless, there is no need to specially design codes for ISI channels, such as is required for turbo equalization in [4]; codes optimized for memoryless channels (such as the AWGN channel) suffice. (iii) Regardless of the coefficients of the ISI channel, the equivalent channel seen by the decoder is memoryless; however, the exact distribution of the noise in this memoryless channel depends on the ISI channel coefficients. Thus, an LDPC code ensemble that achieves universally good performance on all memoryless channels can be used to achieve universally good performance on all ISI channels for which C_{i.i.d.} ≥ r. The main point is that it is easy to design nearly universally good codes for channels without memory. For turbo equalization, we can show (using EXIT charts and the underlying Gaussian assumptions) that it is impossible to design one code that is good for several ISI channels with the same C_{i.i.d.}; that is, the LDPC code degree profiles must be carefully optimized for each channel in order to achieve near-capacity performance. Hence, when the channel is not known at the transmitter, iterative processing cannot be universal, whereas the DF approach is nearly universal. The disadvantage of the proposed approach is that for a given delay of nm, the codewords can only be of length n, whereas in the case of turbo equalization the codewords can be of length nm. With codes like LDPC codes, whose performance improves with length, this can be a disadvantage of the proposed structure. It also means that error propagation can be severe in the BCJR-DFE algorithm if n is small.

E. Multi-rate BCJR-DFE

In order to make the rate loss due to the L known bits in each row small, m needs to be large, which increases both the latency and the probability of error propagation. We can reduce m and alleviate error propagation by replacing the known bits with low-rate codes and by using different rates for different columns. In the absence of known bits, none of the previous bits are known for the bits in the first column of a data-matrix, so we set the code rate for the first column to r_1 = I(X_i; Y). For the second column, we assume that the first column has been decoded successfully, and hence the code rate can be set to r_2 = I(X_i; Y | X_{i-1}). In general, the rate for the jth column can be set to r_j = I(X_i; Y | X_{i-1}, X_{i-2}, ..., X_{i-j+1}) for 1 ≤ j ≤ L. Also note that for the last few columns, both previous and future bits are known; for example, for the last column we can set r_m = I(X_i; Y | X_{i+L}, ..., X_{i+1}, X_{i-1}, ..., X_{i-L}). This allows us to increase the code rate for the last few columns of a data-matrix, which can compensate for the low code rates in the first L columns. Thus we obtain a new encoder structure without any known bits, in which the code rate gradually increases from the first column to the last column of a data-matrix.

The decoding is slightly more complicated, since the channel seen by the decoder is no longer perfectly memoryless. We first perform a single large backward pass from the last received bit of the last row to the first received bit of the first row of a data-matrix, storing the backward state probabilities (β's) for all bits. In the forward direction, we use n parallel equalizers, one per row. For the first column, we assign equal forward probabilities (α's) to all starting states of the n trellises, since none of the previous bits are known, and compute the LLRs for the first column (j = 1). These LLRs are input to the LDPC decoder for the first column, and the hard-decision outputs are fed back to the n parallel equalizers. Assuming the first column is decoded successfully, we can compute the LLRs of the second column using the fact that the previous bit is now known for every bit in the second column; only those forward states consistent with the decoded first column remain active. We proceed in the same way until the first L columns are decoded. Once the first L columns are decoded successfully, we can treat them as known columns and switch back to the original BCJR-DFE receiver. When the channel is not known at the transmitter, we need to make sure that the code rates for the first L columns satisfy r_j < min_{H_k} I(X_i; Y | X_{i-1}, X_{i-2}, ..., X_{i-j+1}, H = H_k), where the minimum is over the class of channels H_k ∈ {H_1, H_2, ...}.
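As a sanity check on the rate bookkeeping (our own accounting, not an equation from the paper), note that with column codes of length n the overall rate is simply the average of the per-column rates. Writing r for the rate of the middle columns, which see a fully known past as in the original scheme, the multi-rate construction gives, schematically,

```latex
\bar{R} \;=\; \frac{1}{m}\sum_{j=1}^{m} r_j
       \;=\; \frac{1}{m}\Bigg(\underbrace{\sum_{j=1}^{L} I\big(X_i; Y \mid X_{i-1},\ldots,X_{i-j+1}\big)}_{\text{reduced-rate first } L \text{ columns}}
       \;+\; (m-2L)\,r
       \;+\; \underbrace{\sum_{j=m-L+1}^{m} r_j}_{\text{boosted last } L \text{ columns}}\Bigg)
```

so the surplus from the last L columns, whose conditioning includes future symbols, offsets at least part of the deficit from the first L columns; this is why m can be made much smaller than in the known-bit construction without paying the full (1 − L/m) rate-loss factor.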

III. SIMULATION RESULTS

In Fig. 2, simulation results for the proposed receiver and for turbo equalization are shown for a three-tap ISI channel with impulse response H = [0.407 0.815 0.407]. The LDPC code was chosen to be optimized for the AWGN channel in both cases. We emphasize that if the code is carefully optimized, then turbo equalization can perform as well as the proposed receiver; however, when the channel is not known at the transmitter, it is not possible to match the code to the channel. For long latencies, the BCJR-DFE outperforms the turbo receiver and has significantly lower complexity. To show that the proposed receiver can provide good gains even for small latency, Fig. 2 also includes results for the multi-rate BCJR-DFE algorithm on the same channel. Even for a latency of 50000 bits, the proposed receiver outperforms turbo equalization and has lower complexity.

Fig. 2. Simulation results (BER vs. Eb/No) for the three-tap channel H = [0.407 0.815 0.407]. Curves shown: BCJR-DFE with a 10^4 × 100 data-matrix (latency 10^6 bits); turbo equalization, 15th iteration, n = 10^6 (latency 10^6 bits); multi-rate BCJR-DFE with a 10^4 × 5 data-matrix (latency 5 × 10^4 bits); and turbo equalization, 15th iteration, n = 5 × 10^4 (latency 5 × 10^4 bits). The i.i.d. capacity limit is at 1.44 dB, the BCJR-DFE threshold at 1.65 dB, and the turbo equalization threshold at 2.41 dB.

IV. JOINT TIMING RECOVERY AND DECODING

In this section, we discuss how the decision feedback principle can be used instead of the turbo principle for joint timing recovery (synchronization) and decoding in the presence of ISI. Consider transmission of the data-matrix in Fig. 1 through a channel whose output analog signal is

y(t) = \sum_k x_k \, h(t - kT - \tau_k) + n(t),

where h(t) = \sum_l h_l \, g(t - lT) and g(t) = \sin(\pi t/T)/(\pi t/T) is an ideal zero-excess-bandwidth Nyquist pulse. Here \tau_k is a timing offset modelled as a random walk [11]:

\tau_{k+1} = \tau_k + N(0, \sigma_w^2).    (5)
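As an illustration of this signal model (with parameter values that are ours and purely illustrative, not those of the simulations in Fig. 3), the following sketch generates received samples y(iT) when the receiver applies no timing correction at all:

```python
import numpy as np

rng = np.random.default_rng(1)

T = 1.0
h_taps = np.array([1.0, 2.0, 1.0])     # ISI response H(D) = 1 + 2D + D^2 (the channel of Fig. 3)
sigma_w = 0.01 * T                     # random-walk step std (illustrative)
sigma_n = 0.3                          # AWGN std (illustrative)
N = 2000                               # number of BPSK symbols

x = 1.0 - 2.0 * rng.integers(0, 2, N)                 # BPSK symbols
tau = np.cumsum(sigma_w * rng.standard_normal(N))     # Eq. (5): tau_{k+1} = tau_k + N(0, sigma_w^2)

def g(t):                              # ideal zero-excess-bandwidth Nyquist pulse
    return np.sinc(t / T)              # np.sinc(u) = sin(pi*u)/(pi*u)

def h(t):                              # overall pulse h(t) = sum_l h_l g(t - lT)
    return sum(hl * g(t - l * T) for l, hl in enumerate(h_taps))

# Receiver samples at the nominal instants t = iT (no timing correction here).
# The sinc tails are truncated to a window of W symbols on each side.
W = 10
y = np.empty(N)
for i in range(N):
    k = np.arange(max(0, i - W), min(N, i + W + 1))
    y[i] = np.sum(x[k] * h(i * T - k * T - tau[k])) + sigma_n * rng.standard_normal()
```

Without tracking, the accumulated offset eventually moves the sampling instants far from the symbol centers, which is what the PLL-based processing described next must prevent.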

A phase-locked loop (PLL) is typically used to track the timing offset. Equalization and timing recovery can be performed jointly (as front-end signal processing) using a per-survivor processing (PSP) approach [11], and turbo processing can then be used to iterate between a soft-output decoder and the timing recovery block. The computational complexity of such iterative processing is very high, especially when a large number of iterations is required to achieve good performance. Since the received signal without noise cannot be modelled as the output of a hidden Markov model with a finite number of states, we cannot use the BCJR-DFE algorithm directly. We can nevertheless perform joint timing recovery, equalization and decoding using a decision feedback approach as follows. We first perform a large backward pass of the BCJR algorithm over the entire nm time instants, using a per-survivor processing approach that maintains an estimate of τ_k, via a PLL, for each state during the backward recursion. During the forward recursion, each row is processed separately by running n parallel equalizers as in Section II. We decode one column at a time, and the key ingredient is that as each column of the data-matrix is decoded, we use the hard decisions to produce updated estimates of τ_k. Assuming no error propagation, increasingly good estimates of τ_k are obtained in the forward recursion as k increases, so that the scheme performs very well with the complexity of one iteration of the turbo timing recovery algorithm in [11].

Fig. 3 compares the performance of the turbo approach of [11] with the proposed DF approach. LDPC codes of length n = 5000 and rate R = 0.89, optimized for the AWGN channel, were used, and the number of iterations within the LDPC decoder was set to 100. At the receiver, samples at different timing offsets were obtained using a 21-tap sinc interpolation filter g(t) = sin(πt/T)/(πt/T). A very simple Mueller and Müller detector was used in the PLL, in both the forward and backward passes, for both the turbo and decision feedback approaches. The decision feedback approach performs very well with drastically smaller complexity. In this case, the latency in terms of decoding time is the same for the two approaches, but the latency in terms of number of bits is larger for the decision feedback approach. A more detailed discussion can be found in [12].
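For concreteness, the following is a minimal sketch of a decision-directed first-order PLL driven by a Mueller and Müller timing error detector, the building block used (per state or per survivor) in the backward and forward passes above. The waveform interpolator, loop gain, and the ISI-free toy signal at the end are our own illustrative choices, not the paper's exact setup.

```python
import numpy as np

def mm_pll(waveform, decisions, T=1.0, mu=0.05, tau0=0.0):
    """Track a slowly varying timing offset with a Mueller & Muller detector.
    waveform(t): callable returning the (interpolated) received signal at time t
    decisions:   fed-back symbol decisions a_k in {+1, -1}
    Returns the per-symbol timing estimates tau_hat[k]."""
    N = len(decisions)
    tau_hat = np.full(N, float(tau0))
    y_prev, a_prev = 0.0, 0.0
    for k in range(N):
        y_k = waveform(k * T + tau_hat[k])          # sample at the current estimate
        a_k = decisions[k]
        e_k = a_prev * y_k - a_k * y_prev           # M&M timing error detector
        if k + 1 < N:
            tau_hat[k + 1] = tau_hat[k] + mu * e_k  # first-order loop update
        y_prev, a_prev = y_k, a_k
    return tau_hat

# Toy check: an ISI-free BPSK waveform with a constant true offset of 0.2 T and
# perfect decision feedback; the loop should settle near 0.2.
rng = np.random.default_rng(2)
a = 1.0 - 2.0 * rng.integers(0, 2, 400)
true_tau = 0.2

def waveform(t):                                    # sum_k a_k g(t - kT - true_tau), T = 1
    return np.sum(a * np.sinc(t - np.arange(a.size) - true_tau))

print("final timing estimate:", mm_pll(waveform, a)[-1])
```

In the per-survivor backward pass of the proposed receiver, one such loop runs per trellis state (with tentative decisions taken from the surviving path), while in the forward pass the decisions come from the already decoded columns.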

Fig. 3. Comparison of turbo and DF signal processing (BER vs. Eb/No) over the H(D) = 1 + 2D + D^2 channel with σ_w/T = 0.5. Curves shown: turbo synchronization after 1, 2, 5 and 20 iterations (latency 5000 bits) and per-survivor BCJR-DFE with a 5000 × 22 data-matrix (latency 11 × 10^4 bits).

V. EXTENSIONS AND CONCLUSIONS

A general framework was presented in which coding is performed vertically (along columns) and signal processing horizontally (along rows) using past decisions. Although only two examples were presented in this paper, the idea is quite general and can be used on many channels with memory. When the received signal can be expressed as the output of a hidden Markov model with a finite number of states (as in the first example), the proposed receiver is optimal. In other cases, we can assume that all past decisions are exact; this leads to a new information-theoretic lower bound on the capacity of the channel with memory, namely I(X_k; Y | X_{k-1}, ..., X_0) [13]. The past decisions can be used to simplify signal processing

and, equivalently, to present a better-behaved equivalent channel to the decoder. This idea can be used to estimate fading channels, or to handle correlated and even data-dependent noise, providing an alternative to turbo signal processing.

REFERENCES

[1] C. Douillard, M. Jezequel, C. Berrou, A. Picart, P. Didier, and A. Glavieux, "Iterative correction of intersymbol interference: Turbo equalization," European Trans. on Telecommunications, vol. 6, no. 5, pp. 507–511, Sept./Oct. 1995.
[2] M. Tüchler, R. Otnes, and A. Schmidbauer, "Performance of soft iterative channel estimation in turbo equalization," in Proc. IEEE Int. Conf. Commun. (ICC), Apr./May 2002, pp. 1858–1862.
[3] J. Barry, A. Kavčić, S. W. McLaughlin, A. Nayak, and W. Zeng, "Iterative timing recovery," IEEE Signal Processing Mag., vol. 21, no. 1, pp. 89–102, Jan. 2004.
[4] N. Varnica and A. Kavčić, "Optimized low-density parity-check codes for partial response channels," IEEE Communications Letters, vol. 7, no. 4, pp. 168–170, Apr. 2003.
[5] J. M. Cioffi, G. P. Dudevoir, M. V. Eyuboglu, and G. D. Forney, Jr., "MMSE decision-feedback equalizers and coding–Part I: Equalization results," IEEE Trans. Commun., vol. 43, no. 10, pp. 2582–2594, Oct. 1995.
[6] ——, "MMSE decision-feedback equalizers and coding–Part II: Coding results," IEEE Trans. Commun., vol. 43, no. 10, pp. 2595–2604, Oct. 1995.
[7] M. Varanasi and T. Guess, "Optimum decision feedback multiuser equalization with successive decoding achieves the total capacity of the Gaussian multiple-access channel," in Proc. Asilomar Conf. on Signals, Systems and Computers, 1997, pp. 1405–1409.
[8] M. Eyuboglu, "Detection of coded modulation signals on linear, severely distorted channels using decision-feedback noise prediction with interleaving," IEEE Trans. Commun., pp. 401–409, Apr. 1988.
[9] K. R. Narayanan and N. Nangare, "A BCJR-DFE based receiver for achieving near capacity performance on ISI channels," in Proc. 42nd Allerton Conference, Monticello, Illinois, Sept. 2004. [Online]. Available: http://www.tamu.edu/commtheory
[10] ——, "A BCJR-DFE based receiver for achieving near capacity performance on ISI channels," to be submitted to IEEE Trans. Commun. [Online]. Available: http://www.tamu.edu/commtheory
[11] P. Kovintavewat, J. Barry, M. Erden, and E. Kurtas, "Per-survivor iterative timing recovery for coded partial response channels," in Proc. IEEE Globecom '04, Dallas, Texas, Nov. 2004, pp. 2604–2608.
[12] N. Nangare, K. R. Narayanan, X. Yang, and E. Kurtas, "Joint timing recovery, ISI equalization and decoding using per-survivor BCJR-DFE," in Proc. IEEE Globecom '05, Nov. 2005.
[13] V. Sethuraman, B. Hajek, and K. R. Narayanan, "Capacity bounds for noncoherent channels with peak constraint," in Proc. IEEE ISIT '05, Melbourne, Australia, Sept. 2005.