
IEEE COMMUNICATIONS LETTERS, VOL. 9, NO. 2, FEBRUARY 2005

MIMO Doubly-Iterative Receivers: Pre- vs. Post-Cancellation Filtering

Ezio Biglieri, Fellow, IEEE, Alessandro Nordio, Member, IEEE, and Giorgio Taricco, Senior Member, IEEE

Abstract— We describe and analyze two receivers for turbo space–time codes. These are doubly-iterative, and include iterations for turbo decoding as well as for spatial-interference canceling. The first receiver has a minimum-mean-square-error filter preceding the canceler, while in the second one the filter follows the canceler. The latter outperforms the former at the price of a modest complexity increase.

Index Terms— MIMO systems, space-time codes, iterative receivers.

I. INTRODUCTION

Recently, multiple-antenna techniques have been recognized to be capable of greatly increasing the spectral efficiency of wireless systems (see, e.g., [6] and the references therein). A major issue is reducing the complexity of optimum decoding. This can be achieved by receiver interfaces that retain close-to-optimum performance at moderate complexity, which would remove the practical restriction to small signal constellations or few antennas. In [1], an iterative spatial-interference canceler was shown to provide a good tradeoff between complexity and performance. In the receiver interface, the signals are first combined through a linear Minimum Mean Square Error (MMSE) filter, then spatial interference is reduced by feeding back hard decisions provided by the decoder. The receiver is doubly iterative, in the sense that preliminary results obtained from a few iterations of the turbo decoding algorithm are used to reduce spatial interference. After this reduction, further turbo-decoding iterations are performed in order to improve on the interference cancellation, and so on.

In this Letter, we elaborate on the concept of doubly-iterative decoder by examining two different architectures: in the first one, MMSE filtering precedes interference cancellation, while in the second MMSE filtering follows it. It is expected (and confirmed by our results) that the second architecture outperforms the first one, at the price of some additional complexity, which under certain conditions might be marginal. The rationale behind this expectation follows from results described, in the context of multiuser detection (MUD), in [3]–[5], [8]. Both architectures improve considerably on suboptimum space–time turbo decoders previously presented in the literature [7] when the number of antennas is large.

Manuscript received April 28, 2004. The associate editor coordinating the review of this letter and approving it for publication was Dr. Sarah Kate Wilson. This work was supported by CERCOM. The authors are with Politecnico di Torino, Italy (e-mail: [email protected]). Digital Object Identifier 10.1109/LCOMM.2005.02032.

II. SYSTEM MODEL

We consider a multiple-input, multiple-output (MIMO) system with $t$ transmit and $r$ receive antennas described over $N$ consecutive symbol intervals by the equation

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{Z} \tag{1}$$

where $\mathbf{X} \in \mathbb{C}^{t \times N}$ represents a code word of transmitted symbols, $\mathbf{Z} \in \mathbb{C}^{r \times N}$ is the noise matrix with iid entries distributed as $\mathcal{N}_c(0, N_0)$, and $\mathbf{H} = (\mathbf{h}_1, \ldots, \mathbf{h}_t) \in \mathbb{C}^{r \times t}$ is the channel random matrix, whose entries $(\mathbf{H})_{ij}$ are iid circularly-symmetric complex Gaussian random variables with zero mean and unit variance, i.e., $\sim \mathcal{N}_c(0, 1)$. Matrix $\mathbf{H}$ is independent of both $\mathbf{X}$ and $\mathbf{Z}$, and remains constant during the transmission of a code word. We assume that the receiver knows the realizations of the channel matrix $\mathbf{H}$, i.e., it has perfect channel state information (CSI) available. Finally, $\mathbf{Y} \in \mathbb{C}^{r \times N}$ is the received signal.

A binary turbo channel encoder of rate $R_c$, followed by an interleaver, maps an information bit vector $\mathbf{b}$ onto an interleaved encoded bit vector $\mathbf{c}$. The latter is serial-to-parallel converted, and fed to $t$ modulators that output the matrix code word

$$\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_N) = (x_{i,\ell})_{i,\ell=1}^{t,N} \tag{2}$$

transmitted on the MIMO channel. The row index of $\mathbf{X}$ indicates space and the column index indicates time. Thus, the element $x_{i,\ell}$ is transmitted from antenna $i \in \{1, \ldots, t\}$ at time $\ell \in \{1, \ldots, N\}$. Let $E_s \triangleq \mathbb{E}[|x_{i,\ell}|^2]$ denote the average transmitted symbol energy. Splitting the encoded vector $\mathbf{c}$ into the $tN$ vectors $\mathbf{c}_{i,\ell} \triangleq (c_{(i-1)m+1,\ell}, \ldots, c_{im,\ell})^T$ ($i = 1, \ldots, t$ and $\ell = 1, \ldots, N$) and denoting the $i$th antenna modulator map as $f_i(\cdot)$, we have $x_{i,\ell} = f_i(\mathbf{c}_{i,\ell})$. Then, the transmitted signal vector at time $\ell$ can be written as $\mathbf{x}_\ell = f(\mathbf{c}_\ell) = (f_1(\mathbf{c}_{1,\ell}), \ldots, f_t(\mathbf{c}_{t,\ell}))^T$. Eq. (1) can be specialized as follows:

$$\mathbf{y}_\ell = \mathbf{H}\mathbf{x}_\ell + \mathbf{z}_\ell, \qquad \ell = 1, \ldots, N \tag{3}$$
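The model in (1)–(3) can be sketched numerically as follows. This is only an illustration under stated assumptions: random bits stand in for the interleaved turbo-coded bits, the modulator map is taken to be Gray-mapped QPSK, and the function name `simulate_mimo_block` and the parameter values are hypothetical.

```python
import numpy as np

def simulate_mimo_block(t, r, N, Es, N0, rng):
    """Draw one realization of Y = H X + Z, eq. (1), with QPSK symbols."""
    # Channel: iid circularly-symmetric complex Gaussian entries, unit variance.
    H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
    # Assumption: random bits replace the interleaved turbo-coded bits c.
    bits = rng.integers(0, 2, size=(t, N, 2))
    # Assumption: the modulator map is Gray-mapped QPSK (m = 2) with energy Es.
    X = np.sqrt(Es / 2) * ((1 - 2 * bits[..., 0]) + 1j * (1 - 2 * bits[..., 1]))
    # Noise: iid entries distributed as Nc(0, N0).
    Z = np.sqrt(N0 / 2) * (rng.standard_normal((r, N)) + 1j * rng.standard_normal((r, N)))
    Y = H @ X + Z
    return H, X, Z, Y

rng = np.random.default_rng(0)
H, X, Z, Y = simulate_mimo_block(t=4, r=4, N=130, Es=1.0, N0=0.5, rng=rng)

# Column l of Y is the per-interval model (3): y_l = H x_l + z_l.
l = 0
y_l = H @ X[:, l] + Z[:, l]
print(Y.shape, np.allclose(Y[:, l], y_l))
```

Each column of `Y` corresponds to one signalling interval, which is why the receivers described next can be written for a single column at a time.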

III. RECEIVERS

We compare the performance of the two receiver interfaces illustrated in Figs. 1 and 2. The former, referred to in the following as MMSE+IC, has the MMSE filter located before the interference cancellation (IC) loop. The latter (IC+MMSE) has the MMSE filter located inside the IC loop. IC+MMSE is more complex than MMSE+IC, since its MMSE filter has to be recomputed at each iteration.

1089-7798/05$20.00 © 2005 IEEE


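Fig. 1. MMSE+IC receiver block diagram. (Blocks: MMSE filter, IC, deinterleaver, LLR computation, turbo decoder, interleaver.)

Fig. 2. IC+MMSE receiver block diagram. (Blocks: IC, bank of MMSE filters, LLR computation, deinterleaver, turbo decoder, interleaver.)

A. MMSE+IC Receiver

This receiver was studied by the authors in [2]. The received signal is first passed through an MMSE filter whose output is

$$\tilde{\mathbf{Y}} = \mathbf{G}\mathbf{Y} = \mathbf{X} + \mathbf{L}\mathbf{X} + \mathbf{G}\mathbf{Z} \tag{4}$$

where

$$\mathbf{G} \triangleq \mathbf{D}^{-1}\mathbf{A} \tag{5}$$

and $\mathbf{A} \triangleq (\mathbf{H}^\dagger\mathbf{H} + \delta_s\mathbf{I}_t)^{-1}\mathbf{H}^\dagger$, $\mathbf{D} \triangleq \mathrm{diag}(\mathbf{A}\mathbf{H})$, $\mathbf{L} \triangleq (\mathbf{D}^{-1}\mathbf{A}\mathbf{H} - \mathbf{I}_t)$, and $\delta_s = N_0/E_s$. The turbo decoder provides soft estimates $\hat{\mathbf{X}}^{(k)}$ of the transmitted signal $\mathbf{X}$ ($k = 1, 2, \ldots$ is an iteration index). The output of the IC block at iteration $k$ is

$$\tilde{\mathbf{Y}}^{(k)} = \tilde{\mathbf{Y}} - \mathbf{L}\hat{\mathbf{X}}^{(k)} \tag{6}$$

B. IC+MMSE Receiver

This interface is illustrated in Fig. 2. It is based on a bank of $t$ MMSE filters, one for each transmit antenna, located inside the IC loop and hence to be updated at each iteration step. This is expected to outperform MMSE+IC, since now the filter can also mitigate the residual interference. In the following, we assess the validity of the latter statement, the amount of improvement achieved with practical space–time codes, and the complexity increase needed.

Consider the $\ell$th signalling interval, and let the output at iteration $k$ of the interference canceller corresponding to antenna $i = 1, \ldots, t$ be given by

$$\mathbf{y}_i^{(k)} = \mathbf{H}\left(\mathbf{x} - \hat{\mathbf{x}}^{(k)}\right) + \hat{x}_i^{(k)}\mathbf{h}_i + \mathbf{z}$$

where $\hat{\mathbf{x}}^{(k)} = [\hat{x}_1^{(k)}, \ldots, \hat{x}_t^{(k)}]^T$ is the $\ell$th generic column of $\hat{\mathbf{X}}^{(k)}$ and represents the decoder output at iteration $k$ for $k > 0$. For $k = 0$ we set $\hat{\mathbf{x}}^{(0)} \triangleq \mathbf{0}$. The corresponding MMSE filter, specified by the vector $\mathbf{f}$, is obtained by minimizing the mean square error (MSE) $\varepsilon^2 = \mathbb{E}[|x_i - \mathbf{f}^\dagger\mathbf{y}_i^{(k)}|^2]$ with respect to the vector $\mathbf{f}$. Thus,

$$\mathbf{f}_i^{(k)} = \mathbb{E}[|x_i|^2]\left[\sum_{j \neq i}\mathbb{E}\left[|x_j - \hat{x}_j^{(k)}|^2\right]\mathbf{h}_j\mathbf{h}_j^\dagger + \mathbb{E}[|x_i|^2]\,\mathbf{h}_i\mathbf{h}_i^\dagger + N_0\mathbf{I}_r\right]^{-1}\mathbf{h}_i$$

Assuming, as in [5], that

$$\mathbb{E}\left[x_j \mid \hat{\mathbf{x}}^{(k)}\right] = \hat{x}_j^{(k)}, \qquad \mathbb{E}\left[|x_j|^2 \mid \hat{\mathbf{x}}^{(k)}\right] = E_s \tag{7}$$

yields

$$\mathbb{E}\left[|x_j - \hat{x}_j^{(k)}|^2\right] = E_s\left(1 - v_j^{(k)}\right) \tag{8}$$

where $v_j^{(k)} = \frac{1}{E_s}\mathbb{E}\left[|\hat{x}_j^{(k)}|^2\right]$. We define the $i$th normalized MMSE filter vector $\mathbf{f}_i^{(k)}$ as

$$\mathbf{f}_i^{(k)} = \frac{1}{\alpha_i^{(k)}}\left[\sum_{j \neq i}\left(1 - v_j^{(k)}\right)\mathbf{h}_j\mathbf{h}_j^\dagger + \mathbf{h}_i\mathbf{h}_i^\dagger + \delta_s\mathbf{I}_r\right]^{-1}\mathbf{h}_i \tag{9}$$

To compute the normalization constant $\alpha_i^{(k)}$ we notice that

$$\mathbf{f}_i^{(k)\dagger}\mathbf{y}_i^{(k)} = \mathbf{f}_i^{(k)\dagger}\mathbf{h}_i x_i + \sum_{j \neq i}\mathbf{f}_i^{(k)\dagger}\mathbf{h}_j\left(x_j - \hat{x}_j^{(k)}\right) + \mathbf{f}_i^{(k)\dagger}\mathbf{z}$$

where the first term is the useful signal, the second term is the residual spatial interference, and the third term is the filtered noise. Under the unbiasedness constraint [4] we normalize the filter $\mathbf{f}_i^{(k)}$ such that $\mathbf{f}_i^{(k)\dagger}\mathbf{h}_i = 1$. Hence, the normalization constant is given by

$$\alpha_i^{(k)} = \mathbf{h}_i^\dagger\left[\sum_{j \neq i}\left(1 - v_j^{(k)}\right)\mathbf{h}_j\mathbf{h}_j^\dagger + \mathbf{h}_i\mathbf{h}_i^\dagger + \delta_s\mathbf{I}_r\right]^{-1}\mathbf{h}_i$$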
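The two interfaces can be compared side by side in a short numerical sketch of (4)–(6) and (9). It is not the authors' implementation: the function names `mmse_ic_matrices` and `ic_mmse_filter` are hypothetical, the soft estimates are a scaled copy of the true symbols rather than decoder outputs, and $v_j$ is replaced by its per-realization value $|\hat{x}_j|^2/E_s$.

```python
import numpy as np

def mmse_ic_matrices(H, delta_s):
    """Filter G of (5) and interference matrix L for the MMSE+IC interface."""
    t = H.shape[1]
    Hh = H.conj().T
    A = np.linalg.solve(Hh @ H + delta_s * np.eye(t), Hh)  # (H^H H + delta_s I_t)^{-1} H^H
    d = np.diag(A @ H)                                     # diagonal of D = diag(AH)
    G = A / d[:, None]                                     # G = D^{-1} A
    L = G @ H - np.eye(t)                                  # L = D^{-1} A H - I_t
    return G, L

def ic_mmse_filter(H, v, i, delta_s):
    """Normalized per-antenna filter f_i of (9) and its constant alpha_i."""
    r = H.shape[0]
    w = 1.0 - v                                   # residual-interference weights 1 - v_j
    w[i] = 1.0                                    # the desired symbol is not cancelled
    M = (H * w) @ H.conj().T + delta_s * np.eye(r)  # sum_j w_j h_j h_j^H + delta_s I_r
    u = np.linalg.solve(M, H[:, i])               # M^{-1} h_i
    alpha = np.real(H[:, i].conj() @ u)           # alpha_i = h_i^H M^{-1} h_i
    return u / alpha, alpha

rng = np.random.default_rng(1)
t, r, Es, N0 = 4, 6, 1.0, 0.5
delta_s = N0 / Es
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
x = np.sqrt(Es / 2) * (rng.choice([-1.0, 1.0], t) + 1j * rng.choice([-1.0, 1.0], t))
z = np.sqrt(N0 / 2) * (rng.standard_normal(r) + 1j * rng.standard_normal(r))
y = H @ x + z

x_hat = 0.8 * x                  # assumption: stand-in for decoder soft estimates
v = np.abs(x_hat) ** 2 / Es      # assumption: per-realization stand-in for v_j

# MMSE+IC: filter once with G, then cancel with L as in (6).
G, L = mmse_ic_matrices(H, delta_s)
y_mmse_ic = G @ y - L @ x_hat

# IC+MMSE: cancel first, then apply the per-antenna filter of (9).
i = 2
y_i = y - H @ x_hat + x_hat[i] * H[:, i]   # equals H(x - x_hat) + x_hat_i h_i + z
f_i, alpha_i = ic_mmse_filter(H, v, i, delta_s)
y_ic_mmse = f_i.conj() @ y_i

unbiased = np.isclose(f_i.conj() @ H[:, i], 1.0)   # constraint f_i^H h_i = 1
f0, _ = ic_mmse_filter(H, np.zeros(t), i, delta_s)
same_at_start = np.allclose(f0.conj(), G[i])       # with all v_j = 0, f_i^H is row i of G
print(unbiased, same_at_start)
```

The sketch recomputes `ic_mmse_filter` for every antenna and every IC iteration, whereas `mmse_ic_matrices` is evaluated once per channel realization; this is the source of the complexity difference between the two interfaces.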

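The above equations can be rewritten in a more convenient form by defining the diagonal matrix

$$\left(\mathbf{\Sigma}_i^{(k)}\right)^2 = \mathrm{diag}\left(1 - v_1^{(k)}, \ldots, 1 - v_{i-1}^{(k)}, 1, 1 - v_{i+1}^{(k)}, \ldots, 1 - v_t^{(k)}\right) \tag{10}$$

Hence, (9) can be given the form

$$\mathbf{f}_i^{(k)} = \frac{1}{\alpha_i^{(k)}}\left[\mathbf{H}\left(\mathbf{\Sigma}_i^{(k)}\right)^2\mathbf{H}^\dagger + \delta_s\mathbf{I}_r\right]^{-1}\mathbf{h}_i \tag{11}$$

This expression requires the inversion of an $r \times r$ matrix. In order to reduce the computational complexity when $r > t$ we can use the following property

$$\left(\mathbf{A}\mathbf{A}^\dagger + \delta_s\mathbf{I}_r\right)^{-1}\mathbf{A} = \mathbf{A}\left(\mathbf{A}^\dagger\mathbf{A} + \delta_s\mathbf{I}_t\right)^{-1} \tag{12}$$

which holds for a generic $r \times t$ matrix $\mathbf{A}$ and a scalar $\delta_s > 0$. In our case, by defining the matrix $\mathbf{A} = \mathbf{H}\mathbf{\Sigma}_i^{(k)}$ we get

$$\mathbf{f}_i^{(k)} = \frac{1}{\alpha_i^{(k)}}\left(\mathbf{A}\mathbf{A}^\dagger + \delta_s\mathbf{I}_r\right)^{-1}\mathbf{a}_i \tag{13}$$

Since $(\mathbf{\Sigma}_i^{(k)})_{ii} = 1$ by definition, the $i$th column of $\mathbf{A}$ is $\mathbf{a}_i = \mathbf{h}_i$. By applying (12) to (13) we obtain

$$\mathbf{f}_i^{(k)} = \frac{1}{\alpha_i^{(k)}}\mathbf{H}\mathbf{\Sigma}_i^{(k)}\mathbf{g}_i \tag{14}$$

where $\mathbf{g}_i$ is the $i$th column of the matrix $\mathbf{G}$ and

$$\mathbf{G} = \left(\mathbf{\Sigma}_i^{(k)}\mathbf{H}^\dagger\mathbf{H}\mathbf{\Sigma}_i^{(k)} + \delta_s\mathbf{I}_t\right)^{-1} \tag{15}$$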
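The step from (11) to (14) can be checked numerically. The sketch below verifies property (12) for a random matrix and confirms that the $t \times t$ form (14)–(15) gives the same unnormalized filter as the $r \times r$ form (11). The reliabilities `v` are drawn at random purely for illustration, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
t, r, delta_s, i = 3, 8, 0.5, 1
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
v = rng.uniform(0.0, 1.0, t)     # assumption: arbitrary reliabilities v_j in [0, 1)

sigma = np.sqrt(1.0 - v)         # diagonal of Sigma_i, from (10)
sigma[i] = 1.0                   # (Sigma_i)_ii = 1 by definition
A = H * sigma                    # A = H Sigma_i, an r x t matrix

# Property (12): (A A^H + delta_s I_r)^{-1} A = A (A^H A + delta_s I_t)^{-1}
lhs = np.linalg.solve(A @ A.conj().T + delta_s * np.eye(r), A)
G = np.linalg.inv(A.conj().T @ A + delta_s * np.eye(t))   # (15): only a t x t inverse
rhs = A @ G
identity_holds = np.allclose(lhs, rhs)

# Unnormalized filter computed both ways.
f_rxr = np.linalg.solve(A @ A.conj().T + delta_s * np.eye(r), H[:, i])  # as in (11), (13)
f_txt = A @ G[:, i]                                                     # as in (14): H Sigma_i g_i
forms_agree = np.allclose(f_rxr, f_txt)

# Normalization from the unbiasedness constraint f^H h_i = 1.
alpha = np.real(f_txt.conj() @ H[:, i])
f_i = f_txt / alpha
print(identity_holds, forms_agree, np.isclose(f_i.conj() @ H[:, i], 1.0))
```

With $r = 8$ and $t = 3$ as in this sketch, the second form inverts a $3 \times 3$ matrix instead of an $8 \times 8$ one, which is the saving the text refers to when $r > t$.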


Fig. 3. Performance comparison of both MMSE receivers on a block fading channel. The Frame Error Rate (FER) is plotted versus the signal-to-noise ratio $E_b/N_0$ for a system with $t = r = 16$. The solid line without markers represents the outage-probability ideal limit; the solid lines with markers describe the performance of our MMSE receivers for $k = 0, 1, 4$ interference-cancellation iterations. The LLRs are computed by approximating $\mathbf{K}^{(k)}$ with a diagonal matrix. (Axes: FER from $10^{0}$ down to $10^{-3}$ versus $E_b/N_0$ in dB from $-12$ to $-2$. Curves: Outage; IC+MMSE, $k = 1, 4$; MMSE+IC, $k = 0, 1, 4$.)

and similarly the normalization constant is $\alpha_i^{(k)} = \mathbf{g}_i^\dagger\mathbf{\Sigma}_i^{(k)}\mathbf{H}^\dagger\mathbf{h}_i$. Notice that (15) only requires the inversion of a $t \times t$ matrix. Also, for $k = 0$, $\mathbf{\Sigma}_i^{(k)} = \mathbf{I}_t$, and (14) is precisely the $i$th column of the filter matrix corresponding to $\mathbf{G}$ in (5). Hence, for $k = 0$ the two receivers are equivalent.

The signal at the output of the $i$th MMSE filter at iteration $k$ is then given by $\tilde{y}_i^{(k)} = \mathbf{f}_i^{(k)\dagger}\mathbf{y}_i^{(k)} = x_i + \nu_i^{(k)}$, where $\nu_i^{(k)} = \sum_{j \neq i}\mathbf{f}_i^{(k)\dagger}\mathbf{h}_j\left(x_j - \hat{x}_j^{(k)}\right) + \mathbf{f}_i^{(k)\dagger}\mathbf{z}$ is the residual noise plus spatial interference term. Interestingly, $\nu_i^{(k)}$ can be rewritten as

$$\nu_i^{(k)} = \left(\mathbf{f}_i^{(k)\dagger}\mathbf{H} - \mathbf{e}_i\right)\left(\mathbf{x} - \hat{\mathbf{x}}^{(k)}\right) + \mathbf{f}_i^{(k)\dagger}\mathbf{z} \tag{16}$$

where the row vector $\mathbf{e}_i = [0, \ldots, 0, 1, 0, \ldots, 0]$ has zero entries except for a 1 at position $i$. Hence, the outputs of the $t$ filters at time $\ell$ and at iteration $k$ can be grouped into a vector $\tilde{\mathbf{y}}^{(k)}$ as follows:

$$\tilde{\mathbf{y}}^{(k)} = \mathbf{x} + \left(\mathbf{F}^{(k)\dagger}\mathbf{H} - \mathbf{I}_t\right)\left(\mathbf{x} - \hat{\mathbf{x}}^{(k)}\right) + \mathbf{F}^{(k)\dagger}\mathbf{z}$$

where $\mathbf{F}^{(k)} = [\mathbf{f}_1^{(k)}, \ldots, \mathbf{f}_t^{(k)}]$. Notice also that $\mathbf{I}_t = [\mathbf{e}_1^T, \ldots, \mathbf{e}_t^T]$. The SISO decoder computes the LLR of the coded bits according to the following expression

$$\Lambda(c_{i,l}) = \log\frac{\displaystyle\sum_{\mathbf{x} = f(\mathbf{c}):\, c_{i,l} = 1} e^{-(\tilde{\mathbf{y}}^{(k)} - \mathbf{x})^\dagger(\mathbf{K}^{(k)})^{-1}(\tilde{\mathbf{y}}^{(k)} - \mathbf{x})}}{\displaystyle\sum_{\mathbf{x} = f(\mathbf{c}):\, c_{i,l} = 0} e^{-(\tilde{\mathbf{y}}^{(k)} - \mathbf{x})^\dagger(\mathbf{K}^{(k)})^{-1}(\tilde{\mathbf{y}}^{(k)} - \mathbf{x})}} \tag{17}$$

where

$$\mathbf{K}^{(k)} = E_s\mathbf{S}^{(k)}\left(\mathbf{\Sigma}^{(k)}\right)^2\mathbf{S}^{(k)\dagger} + N_0\mathbf{F}^{(k)\dagger}\mathbf{F}^{(k)} \tag{18}$$

is the covariance matrix of the filtered noise and residual spatial interference and $\mathbf{S}^{(k)} = \mathbf{F}^{(k)\dagger}\mathbf{H} - \mathbf{I}_t$. In (18) we assumed independence between the noise term and the residual spatial interference term, and $(\mathbf{\Sigma}^{(k)})^2 = \mathrm{diag}(1 - v_1^{(k)}, \ldots, 1 - v_t^{(k)})$. Eq. (17) has a computational complexity that is exponential in $mt$. To reduce it, we may substitute for $\mathbf{K}^{(k)}$ a diagonal matrix obtained by nulling its off-diagonal entries [2].

IV. COMPARISONS

In Fig. 3 we compare the performance of both receivers for $t = r = 16$. QPSK ($m = 2$) is used, along with a rate-1/2 turbo code obtained by parallel concatenation and puncturing of two identical rate-1/2, 4-state recursive systematic convolutional codes. At the receiver, $q = 8$ turbo decoder iterations are performed for each IC iteration. The code word length is $N = 130$.

The complexity of the two receivers differs in the computation and the operation of their MMSE filters. The former requires the inversion of a $t \times t$ matrix, which has a complexity proportional to $t^3$. This computation is done only once in MMSE+IC, and $(k + 1)t$ times in IC+MMSE, where $k$ is the number of IC iterations. If $N \gg t, k$, the resulting complexity can be neglected. In IC+MMSE the signal is filtered $k + 1$ times, but only once in MMSE+IC. This leads to a complexity increase. Yet, most of the complexity is in the decoder [2], which is common to both receivers. Detailed calculations show that the overall complexity increase of IC+MMSE does not exceed 20% (in the example of Fig. 3, this is about 5%).

V. CONCLUSIONS

We have compared two suboptimal doubly-iterative receivers for turbo space-time codes, characterized by pre- and post-interference cancellation filtering. We have shown that the second receiver outperforms the first, at the price of a moderate complexity increase.

REFERENCES

[1] E. Biglieri, A. Nordio, and G. Taricco, “Suboptimum receiver interfaces and space-time codes,” IEEE Trans. Signal Processing, vol. 51, pp. 2720-2728, Nov. 2003.
[2] E. Biglieri, A. Nordio, and G. Taricco, “Doubly-iterative decoding of space time turbo codes with a large number of antennas,” IEEE Int. Conf. Commun. (ICC 2004), Paris, France, pp. 473-477.
[3] J. Boutros and G. Caire, “Iterative multiuser joint decoding: unified framework and asymptotic analysis,” IEEE Trans. Inform. Theory, vol. 48, pp. 1772-1793, July 2002.
[4] G. Caire, R. Müller, and T. Tanaka, “Iterative multiuser joint decoding: optimal power allocation and low-complexity implementation,” IEEE Trans. Inform. Theory, vol. 50, pp. 1950-1973, Sept. 2004.
[5] A. Lampe et al., “A novel iterative multiuser detector for complex modulation schemes,” IEEE J. Select. Areas Commun., vol. 20, pp. 339-350, Feb. 2002.
[6] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge, U.K.: Cambridge University Press, 2003.
[7] A. Stefanov and T. M. Duman, “Turbo-coded modulation for systems with transmit and receive antenna diversity over block fading channels: system model, decoding approaches, and practical considerations,” IEEE J. Select. Areas Commun., vol. 19, pp. 958-968, May 2001.
[8] X. Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Trans. Commun., vol. 47, pp. 1046-1061, July 1999.
