Space-Time Block Coding for Single-Carrier Block ... - CiteSeerX

8 downloads 0 Views 382KB Size Report
Nov 1, 2002 - Section III generalizes the case of NT = 2 transmit antennas to more than ...... [2] I. Ghauri and D.T.M. Slock, “Linear receivers for the DS-CDMA ...
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (REVISED)

Space-Time Block Coding for Single-Carrier Block Transmission DS-CDMA Downlink Frederik Petr´e 1? Marc Engels 1§ 1

Geert Leus 2‡ Marc Moonen 2

Luc Deneire 3 Hugo De Man 1§

Interuniversity Micro-Electronics Center (IMEC) Wireless Systems (WISE) Kapeldreef 75, B-3001 Leuven, Belgium

E-mail : {frederik.petre,marc.engels,hugo.deman}@imec.be 2

Katholieke Universiteit Leuven (KULeuven)

Department of Electrical Engineering (ESAT) Kasteelpark Arenberg 10, B-3001 Leuven, Belgium E-mail : {geert.leus,marc.moonen}@esat.kuleuven.ac.be 3

University of Nice I3S Laboratory

Route des Lucioles 2000, F-06903 Sophia-Antipolis Cedex, France E-mail : [email protected]

Submission date : November 1, 2002 Special Issue : MIMO Systems and Applications Topic of Interest : MIMO ST Coding and Signal Processing Algorithms

?

Also Ph.D. student at KULeuven, E.E. Dept., ESAT/INSYS, supported by the I.W.T.-Flanders (Belgium). Postdoctoral Fellow of the F.W.O.-Flanders (Belgium). § Also Professor at KULeuven, E.E. Dept., ESAT/INSYS



November 4, 2002

DRAFT

1

Abstract To combat fading and achieve high performance over wireless channels, multi-antenna techniques, and more specifically Space-Time Block Coding (STBC) techniques, enable multi-antenna diversity gains over Multiple-Input Multiple-Output (MIMO) channels. Besides, Direct-Sequence Code Division Multiple Access (DS-CDMA) has emerged as the predominant multiple access technique for 3G systems because it increases capacity and facilitates network planning in a cellular network. Hence, the combination of STBC and DS-CDMA has the potential to increase the performance of multiple users in a cellular network. However, this combination introduces multiple access aspects that have hardly been addressed in current research on MIMO communications. Indeed, if not carefully designed, the resulting transmission scheme suffers from increased MultiUser Interference (MUI), which dramatically deteriorates the performance. To tackle this MUI problem in the downlink, we combine two specific DS-CDMA and STBC techniques, namely Single-Carrier Block Transmission (SCBT) DS-CDMA and Time-Reversal (TR) STBC. The resulting transmission scheme allows for deterministic Maximum Likelihood (ML) user separation through low-complexity code-matched filtering as well as deterministic ML transmit stream separation through linear processing. Moreover, it can achieve maximum diversity gains of N T NR (L + 1) for every user in the system, irrespective of the system load, where NT is the number of transmit antennas, NR the number of receive antennas, and L the order of the underlying multi-path channels. In addition, it turns out that a low-complexity linear receiver based on frequency-domain equalization comes close to extracting the full diversity in reduced as well as full load settings. In this perspective, we also develop two (Recursive) Least Squares methods for direct equalizer estimation. Both methods exploit the presence of a Code Division Multiplexed Pilot (CDMP) but differ in the amount of additional a-priori information they assume to determine the equalizer coefficients. Simulation results demonstrate the outstanding performance of the proposed transceiver compared to competing alternatives. Keywords MIMO, Space-Time Block Coding, DS-CDMA, Block Transmission, Frequency-selective fading channels, Equalizer estimation

I. I NTRODUCTION Direct-Sequence Code Division Multiple Access (DS-CDMA) has emerged as the predominant multiple access technique for 3G cellular systems because it increases capacity and facilitates network planning in a cellular network. In the downlink, these systems rely on the orthogonality of the spreading codes to separate the different user signals. However, for increasing chip rates, the time-dispersive nature of the multi-path channel destroys the orthogonality amongst users, giving rise to Multi-User Interference (MUI). Chip-level equalization completely or partially restores the orthogonality and hence allows to suppress the MUI; see e.g. [1–4]. However, this requires multiple receive antennas (two for chip rate sampling in a single-cell context) to guarantee a Zero-Forcing (ZF) solution for any well-conditioned channels [5]. On the other hand, Single-Carrier Block Transmission (SCBT) DS-CDMA, recently proNovember 4, 2002

DRAFT

2

posed as Chip-Interleaved Block-Spread (CIBS) CDMA in [6], only requires a single receive antenna because it effectively deals with the frequency-selectivity of the channel through Zero Padding (ZP) the chip blocks. Moreover, SCBT-DS-CDMA preserves the orthogonality amongst users regardless of the underlying multi-path channel, which enables deterministic Maximum Likelihood (ML) user separation through low-complexity code-matched filtering [6]. Increased equalization flexibility and reduced complexity are other benign properties that favor SCBT-DS-CDMA for broadband downlink transmission compared to conventional DS-CDMA [7]. However, the spectral efficiency and hence the user data rate of Single-Input Single-Output (SISO) SCBT-DS-CDMA systems is limited by the received signal-to-noise ratio. As opposed to SISO systems, Multiple-Input Multiple-Output (MIMO) systems that employ N T transmit and NR receive antennas, have been shown to realize an Nmin -fold capacity increase in rich scattering environments [8–10], where Nmin = min {NT , NR } is called the multiplexing gain. Both Spatial Multiplexing and Space-Time Coding are popular MIMO communication techniques that do not require any Channel State Information (CSI) at the transmitter. On the one hand, Spatial Multiplexing (SM), a.k.a. BLAST, achieves high spectral efficiencies by transmitting independent data streams from the different transmit antennas [11] (see also [12]). SM requires, however, at least as many receive as transmit antennas NR ≥NT , which seriously impairs cost-efficient implementations at the mobile station. On the other hand, Space-Time Coding (STC) achieves high Quality-of-Service (QoS) through diversity and coding gains by introducing both spatial and temporal correlations between the transmitted signals [13–15]. As opposed to SM, STC enables cheap implementations at the mobile station, since it supports any number of receive antennas. In this perspective, Space-Time Block Coding (STBC) techniques, introduced in [14] for NT = 2 transmit antennas, and later generalized in [15] for any number of transmit antennas, are particularly appealing because they facilitate ML detection with simple linear processing. However, these STBC techniques have been originally designed for frequency-flat fading channels exploiting only multi-antenna diversity of order NT ·NR . Therefore, Time-Reversal (TR) STBC techniques, originally proposed in [16] for single-carrier serial transmission, have been recently combined with Single-Carrier Block Transmission (SCBT) in [17] and [18] for signalling over frequency-selective fading channels. While [17] only exploits multi-antenna diversity, [18] exploits multi-antenna as well as multi-path diversity and achieves maximum diversity gains of order NT NR (L + 1) over frequency-selective fading channels, where L is the order of the underlying multi-path channels. Up till now, research on STBC techniques has mainly focused on point-to-point MIMO communication November 4, 2002

DRAFT

3

links, thereby neglecting the multiple access technique in the design of the transmission scheme. Given the recent success of DS-CDMA, this clearly justifies the need for a transceiver that combines STBC with DS-CDMA. However, if not designed properly, the resulting communication scheme suffers from an NT -fold increase of the MUI, which has a detrimental effect on the performance. In this paper, we combine TR-STBC with SCBT-DS-CDMA for point-to-multipoint (downlink) frequency-selective fading MIMO communication links. The proposed transceiver preserves the orthogonality amongst users as well as transmit streams regardless of the underlying multi-path channels. This allows for deterministic ML user separation through low-complexity code-matched filtering as well as deterministic ML transmit stream separation through linear processing. Moreover, applying ML Viterbi equalization for every transmit stream separately guarantees symbol recovery. Therefore, maximum diversity gains of NT ·NR ·(L + 1) can be achieved for every user in the system, irrespective of the system load. We will pursue, however, a slightly suboptimal approach that combines the transmit stream separation and equalization steps into one single linear equalization step. This enables the design of low-complexity equalizer estimation algorithms, that circumvent the need for explicit channel estimation. In this perspective, we propose two (Recursive) Least Squares ((R)LS) methods for direct equalizer estimation that act on a per-tone basis in the Frequency Domain (FD). Both methods exploit the presence of a Code Division Multiplexed Pilot (CDMP) (similar to the Common PIlot CHannel (CPICH) in 3G systems [19]) but differ in the amount of additional a-priori information they assume to determine the equalizer coefficients. Moreover, they benefit from the ML user separability of the proposed transceiver. Some parts of this work were presented in our earlier publication [20]. Another alternative to remove MUI deterministically in a ST coded multi-user setup has been recently reported in [21] for single-rate and in [22] for multi-rate services. The transceiver of [21] and [22] combines Generalized Multi-Carrier (GMC) CDMA, originally developed in [23], with the STBC techniques of [15] but implemented on a per-carrier basis. However, similar to all MC systems, single-user GMCCDMA transmissions generally suffer from high Peak-to-Average-Power-Ratio (PAPR), which requires expensive RF front-ends and/or additional signal processing to compensate for the non-linear effects introduced by the front-end 1 . As opposed to [21] and [22], our transceiver is based on SCBT-DS-CDMA which allows for perfectly Constant-Modulus (CM) single-user transmissions, at least if we replace the zeros by known symbols [6, 24]. The paper is organized as follows. Section II details the downlink transceiver design for the case of 1

There exist special cases of GMC-CDMA that have no PAPR problem, but they constitute a special SCBT-DS-CDMA design, see e.g. [23]

November 4, 2002

DRAFT

4

NT = 2 transmit antennas, showing that a ST block-coded multi-user detection problem can be succesfully transformed into a number of independent ST block-coded single-user equalization problems. Section III generalizes the case of NT = 2 transmit antennas to more than two transmit antennas NT > 2. Section IV develops the different Least Squares (LS) equalizer estimation methods. Simulation results are presented in Section V, while conclusions are drawn in Section VI. Notation : We use roman letters to represent scalars, lower boldface letters to denote vectors (i.e. blocks) and upper boldface letters to denote matrices (i.e. a collection of blocks). (·) ∗ , (·)T , and (·)H represent conjugate, transpose, and Hermitian, respectively. Further, | · | and k · k represent the absolute value and Frobenius norm, respectively. We reserve E{·} for expectation, b·c for integer flooring and ? for linear convolution. Subscripts nt and nr point to the nt -th transmit and the nr -th receive antenna respectively. Superscripts m, p and d point to the m-th user, the pilot and the data respectively. Arguments i and n denote symbol (block) and chip (block) indices respectively. At the transmitter, letter s is devoted to symbols, c to codes, x to “internal” chips and u to transmitted chips. h denotes the channel, w the noise and v the received signal. y and z respectively denote the received signal and the noise after the receive matrix R. B is the number of data symbols per block and K is the number of transmitted symbols per block. Tilded symbols x˜ denote frequency domain signals, upperlined symbols x¯ denote space-time block ´ group signals on a per-tone basis. coded signals and acuted vectors x II. D OWNLINK

TRANSCEIVER DESIGN

Let us consider the downlink of a ST block-coded SCBT-DS-CDMA system with M active mobile stations. The base station is equipped with NT transmit (TX) antennas whereas the mobile station of interest is equipped with NR receive (RX) antennas. In the following, we will first describe the base station transmitter design followed by the channel model and the mobile station receiver design. A. Transmitter design The block diagram in Fig. 1 describes the ST block-coded downlink SCBT-DS-CDMA transmission scheme (where only the m-th user is explicitly shown), that transforms the M user data symbol sep quences {sm [i]}M m=1 and the pilot symbol sequence s [i] into NT ST block-coded multi-user chip seT quences {unt [n]}N nt =1 with a rate

1 . Tc

For conciseness, we limit ourselves to the case of NT = 2 transmit

antennas and treat the more general case of NT > 2 transmit antennas in Section III. Referring back to Fig. 1, the m-th user’s data symbol sequence sm [i] (similarly for the pilot symbol sequence sp [i]) is

November 4, 2002

DRAFT

5

 NT m first demultiplexed into NT parallel lower rate sequences sm nt [i] := s [iNT + nt − 1] nt =1 . Unlike the traditional approach of symbol spreading that operates on a single symbol, we apply here block spreading

that operates on a block of symbols; block spreading has also been used in e.g. [25] and [6]. Specifically, NT  each of the sequences sm nt [i] nt =1 is serial-to-parallel converted into blocks of B symbols, leading to the n  T o NT  m m [(i + 1)B − 1] that are subsequently spread [iB], . . ., s [i] := s symbol block sequences sm nt nt nt nt =1

by a factor N with the user composite code sequence cm [n] (pilot composite code sequence cp [n]). For example, cm [n] is the multiplication of a short orthogonal Walsh-Hadamard spreading code that is mobile station specific (pilot specific) and a long overlay scrambling code that is base station specific. At this point, it is important to note that all NT parallel symbol block sequences are spread by the same composite code sequence. For each of the NT parallel streams, the different user chip block sequences and the pilot chip block sequence are added, resulting into the nt -th multi-user chip block sequence : xnt [n] =

M X

m=1

m p p sm nt [i]c [n] + snt [i]c [n],

i=

jnk N

.

(1)

For future discussions, we find it convenient to also define the m-th user’s total data symbol block seh iT T m T quence sm [i] := sm [i] , s [i] (similarly for the total pilot symbol block sequence sp [i]) and the total 1 2 h iT multi-user chip block sequence x[n] := x1 [n]T , x2 [n]T . Unlike the traditional approach of performing ST encoding at the symbol level (see e.g. [14] and [15]),

we perform ST encoding at the block level (see also Fig. 1); this was also done in e.g. [17, 26] and [18]. Following the TR-STBC approach for single-user point-to-point SCBT [18], we apply the TR-STBC approach to SCBT-DS-CDMA. Our ST encoder operates in the Time-Domain (TD) at the chip block level rather than at the symbol block level and takes the two multi-user chip blocks {x nt [n]}2nt =1 to output the following 2B × 2 matrix of ST coded multi-user chip blocks :     (0) ∗ ¯ [2n] x ¯ 1 [2n + 1] x [n] −PB · x2 [n] x = 1 ,  1 (0) ∗ ¯ 2 [2n] x ¯ 2 [2n + 1] x x2 [n] PB · x1 [n]

(2)

(j)

where PJ is a J × J permutation matrix implementing a time-reversal followed by a cyclic shift over j ¯ 1 [n] and x ¯ 2 [n] are forwarded positions [18]. At each time interval n, the ST coded multi-user chip blocks x to the first respectively the second transmit antenna. From (2), we can easily verify that the transmitted multi-user chip block at time instant 2n + 1 from one antenna is the time-reversed conjugate of the transmitted multi-user chip block at time instant 2n from the other antenna (with a possible sign change). (0)

For frequency-flat fading channels, symbol blocking is not necessary, so B = 1 and P B = 1, and the ST November 4, 2002

DRAFT

6

encoder of (2) reduces to the well known Alamouti scheme [14]. However, for frequency-selective fading (0)

channels, the permutation matrix PB is necessary to facilitate the exploitation of both multi-antenna as well as multi-path diversity [18]. At each transmit antenna nt ∈ {1, 2}, the K × B zero padding transmit matrix Tzp := [IB , 0B×µ ]T , ¯ nt [n] : unt [n] = Tzp · x ¯ nt [n], the purpose of which with K = B + µ, pads a zero postfix of length µ to x will be clarified later. The resulting transmitted multi-user chip block sequence u nt [n] is parallel-to-serial converted into the corresponding scalar sequence [unt [nK], . . ., unt [(n + 1)K − 1]]T := unt [n]. Finally, the ST coded multi-user chip sequence unt [n] is pulse shaped (not shown in Fig. 1) to the corresponding P∞ continuous-time signal unt (t) := n=−∞ unt [n]ψ(t − nTc ), where Tc is the chip period and ψ(t) the transmit filter.

B. Channel model The NT transmit antennas at the base station together with the NR receive antennas at the mobile station create a MIMO channel with NT NR entries. The transmitted waveform unt (t) propagates to each receive antenna nr ∈ {1, . . ., NR } through a multi-path channel gnr ,nt (t) and is filtered by the receive ¯ ¯ that is matched to the transmit filter ψ(t). Let hnr ,nt [l] := (ψ ? gnr ,nt ? ψ)(t)| filter ψ(t) t=lTc be the chipsampled equivalent discrete-time FIR channel from the nt -th transmit antenna to the nr -th receive antenna. The FIR channel hnr ,nt [l] models the frequency-selective multi-path propagation including the effect of transmit and receive filters. The chip-sampled received signal at the n r -th receive antenna vnr [n] is the superposition of a channel-distorded version of the signals from the N T transmit antennas, and can be written as : vnr [n] =

NT X

nt =1

hnr ,nt [n] ? unt [n] + wnr [n] =

nr ,nt NT LX X

hnr ,nt [l]unt [n − l] + wnr [n],

(3)

nt =1 l=0

where Lnr ,nt is the order of the FIR channel from the nt -th transmit antenna to the nr -th receive antenna, and wnr [n] denotes the additive gaussian noise at the nr -th receive antenna, which we assume to be white (AWGN) with variance σw2 . Furthermore, we define L as the upperbound on the channel orders : L ≥ max {Lnr ,nt }. L can be well approximated by L ≈ bτmax /Tc c + 1, where τmax is the maximum nt ,nr

delay spread within the given propagation environment [27].

November 4, 2002

DRAFT

7

C. Receiver design The block diagrams in Fig. 2 and 3 describe the reception scheme for the mobile station of interest R (which we assume to be the m-th one), which transforms the different received sequences {v nr [n]}N nr =1

into an estimate on the desired user’s data symbol sequence sˆm [i]. Referring back to Fig. 2 and assuming perfect synchronization, at each receive antenna nr ∈ {1, . . . , NR }, the received sequence vnr [n] is serialto-parallel converted into its corresponding block sequence v nr [n] := [vnr [nK], . . ., vnr [(n + 1)K − 1]]T . From the scalar input/output relationship in (3), we can derive the corresponding block input/output relationship : vnr [n] =

NT X

(Hnr ,nt [0] · unt [n] + Hnr ,nt [1] · unt [n − 1]) + wnr [n],

(4)

nt =1

where wnr [n] := [wnr [nK], . . ., wnr [(n + 1)K − 1]]T is the noise block sequence, Hnr ,nt [0] is a K × K lower triangular Toeplitz matrix with entries [Hnr ,nt [0]]p,q = hnr ,nt [p − q] and Hnr ,nt [1] is a K × K upper triangular Toeplitz matrix with entries [Hnr ,nt [1]]p,q = hnr ,nt [K + p − q] (see e.g. [23] for a detailed derivation of the single-user SISO case). The time dispersive nature of the multi-path channel gives rise to so-called Inter Block Interference (IBI) between successive blocks, which is modeled by the second ¯ nr [n] := term in (4). The K × K receive matrix R := IK completely preserves each block of vnr [n] : y R · vnr [n] = vnr [n]. Unlike classical OFDM systems, that eliminate the IBI by discarding the cyclic prefix at the receiver, here the IBI is entirely dealt with at the transmitter. Indeed, by choosing the length of the zero postfix to be at least the maximum channel order µ ≥ L, we obtain R · H nr ,nt [1] · Tzp = 0, so the IBI is completely removed. Moreover, the zero postfix of the previous block u nt [n − 1] acts as a cyclic prefix of the current block unt [n], so that the Toeplitz matrix Hnr ,nt [0] can be safely replaced by the ˙ nr ,nt := Hnr ,nt [0]+Hnr ,nt [1] through the identity Hnr ,nt [0]·Tzp = H ˙ nr ,nt ·Tzp . K ×K circulant matrix H In this way, we obtain a simplified block input/output relationship in the TD : ¯ nr [n] = y

NT X

˙ nr ,nt · Tzp · x ¯ nt [n] + ¯znr [n], H

(5)

nt =1

where ¯znr [n] := R · wnr [n] = wnr [n] is the corresponding noise block sequence. Moreover, circulant matrices possess two nice properties that enable simple ST decoding on the one hand and simple per-tone Frequency-Domain (FD) equalization on the other hand : (k)

Property 1: Pre- and post-permuting a circulant matrix with PK yields its transpose [18] : (k) ˙ nr ,nt · P(k) = H ˙T . PK · H nr ,nt K November 4, 2002

DRAFT

8

Property 2: Circulant matrices can be diagonalized by FFT operations [28] : ˙ nr ,nt = FH · H ˜ nr ,nt · FK , H K ˜ nr ,nt ), h ˜ nr ,nt := [Hnr ,nt (ej0 ), Hnr ,nt (ej K ), . . . , Hnr ,nt (ej K (K−1) )] the FD channel ˜ nr ,nt := diag(h with H P response evaluated on the FFT grid, Hnr ,nt (z) := Ll=0 hnr ,nt [l]z −l the z-transform of hnr ,nt [l], and FK 2π



the K × K FFT matrix.

C.1 Space-Time decoding Following a similar approach as in [18] for point-to-point SCBT, ST decoding is performed in the TD at the chip block level rather than at the symbol block level. From the ST code design in (2), we can easily verify that : (B)

(8)

(B)

(9)

¯ 2 [2n]∗ , ¯ 1 [2n + 1] = −PK · Tzp · x Tzp · x ¯ 2 [2n + 1] = PK · Tzp · x ¯ 1 [2n]∗ , Tzp · x (B)

(0)

which can be interpreted as Time-Reversal ST coding with permutation matrix P K (rather than PB ) applied to the zero-padded block sequences Tzp · x1 [n]∗ and Tzp · x2 [n]∗ (rather than to the block se¯ nr [2n] and quences x1 [n]∗ and x2 [n]∗ themselves). We can now consider two consecutive chip blocks y ¯ nr [2n] ¯ nr [2n + 1] both satisfying the TD block input/output relationship of (5), and define y nr ,1 [n] := y y (B)

¯ nr [2n + 1]∗ (see also Fig. 2). By exploiting the ST code structure of (8) and (9), and ynr ,2 [n] := PK · y and by relying on Property 1, we arrive at : ˙ nr ,1 · Tzp · x ˙ nr ,2 · Tzp · x ¯ nr [2n] = H ¯ 1 [2n] + H ¯ 2 [2n] + znr ,1 [n], ynr ,1 [n] := y

(10)

(B) ˙ H · Tzp · x ˙ H · Tzp · x ¯ 2 [2n] + H ¯ 1 [2n] + znr ,2 [n], ¯ nr [2n + 1]∗ = −H ynr ,2 [n] := PK · y nr ,1 nr ,2

(11)

(B)

where znr ,1 [n] := z¯nr [2n] and znr ,2 [n] := PK · ¯znr [2n + 1]∗ are the corresponding TD noise block sequences after ST decoding. Aiming at low-complexity FD processing, we transform y nr ,1 [n] and ynr ,2 [n] ˜ nr ,1 [n] := FK · ynr ,1 [n] and y ˜ nr ,2 [n] := FK · ynr ,2 [n] (see also Fig. 2). Relying into the FD by defining y on Property 2, this leads to the following FD block input/output relationship after ST decoding : ˜ nr ,1 · FK · Tzp · x ˜ nr ,2 · FK · Tzp · x ˜ nr ,1 [n] := FK · ynr ,1 [n] = H ¯ 1 [2n] + H ¯ 2 [2n] + ˜znr ,1 [n], y

(12)

˜ ∗ · FK · Tzp · x ˜ ∗ · FK · Tzp · x ˜ nr ,2 [n] := FK · ynr ,2 [n] = −H ¯ 2 [2n] + H ¯ 1 [2n] + ˜znr ,2 [n], y nr ,1 nr ,2

(13)

November 4, 2002

DRAFT

9

where ˜znr ,1 [n] := FK · znr ,1 [n] and z˜nr ,2 [n] := FK · znr ,2 [n] are the corresponding FD noise block sequences. Combining (12) and (13) into a single block matrix form, we obtain :         ˜ ˜ ˜ [n] ˜z [n] y F · Tzp · x1 [n] H Hnr ,2  nr ,1  +  nr ,1  =  nr ,1 · K , ˜∗ ˜∗ ˜ nr ,2 [n] ˜ F · T · x [n] y z [n] H − H K zp 2 nr ,2 nr ,2 nr ,1 {z } | {z } | {z } | {z } | ˜ nr H

˜ nr [n] y

˜ [n] x

(14)

˜ znr [n]

˜ nr in (14), we can ¯ 1 [2n] = x1 [n] and x ¯ 2 [2n] = x2 [n] follow from (2). From the structure of H where x deduce that our transceiver preserves the orthogonality amongst transmit streams for each tone separately regardless of the underlying multi-path channel. This allows for deterministic ML transmit stream separation through linear processing, assuming perfect CSI. A similar property was also encountered in the classical Alamouti scheme but only for point-to-point frequency-flat fading MIMO channels [14]. h iT ˜ 1 [n]T , . . ., y ˜ NR [n]T , finally leads to the ˜ [n] := y Stacking the contributions of the NR RX antennas y following per-RX-antenna FD data model :

˜ ·x ˜ [n] = H ˜ [n] + ˜z[n], y

(15)

h iT h iT T T T T ˜ ˜ ˜ where H := H1 , . . ., HNR is the per-RX-antenna channel matrix and z˜[n] := ˜z1 [n] , . . ., ˜zNR [n]

the corresponding noise block sequence. C.2 Frequency-domain equalization

Although one could argue that a maximum diversity transmission scheme, like the one we propose here, is only useful when an ML receiver is applied, we will show in the following that it is also useful when a suboptimal receiver is applied. Unlike the approach followed by [29], that applies ML linear transmit stream separation followed by ML Viterbi equalization, we combine these operations into one single linear equalization step, at the expense of some loss in ML optimality. As will be shown in Section IV, this suboptimal approach enables the design of low-complexity per-tone equalizer estimation algorithms, that circumvent the need for explicit channel estimation. ´ [n] and the per-tone output block y ´ [n] as : Defining the per-tone input block x

November 4, 2002

´ [n] := PT · x ˜ [n] = [´ ´ K [n]]T , x x1 [n], . . . , x

(16)

´ [n] := PR · y ˜ [n] = [´ ´ K [n]]T , y y1 [n], . . . , y

(17)

DRAFT

10

where PT permutes a per-TX-antenna ordering into a per-tone ordering and where P R permutes a perRX-antenna ordering into a per-tone ordering, we obtain the following per-tone data model : ´ ·x ´ [n] = H ´ [n] + ´z[n], y

(18)

´ is a block diagonal where ´z[n] := PR · z˜[n] is the per-tone noise block. The per-tone channel matrix H matrix, given by :

n o ´ := PR · H ˜ · PT = diag H ´ 1, . . . , H ´K . H T

(19)

´ k , with Looking at (14), (15), (16) and (17), it is important to note that each of the 2N R × 2 submatrices H k ∈ {1, . . . , K}, retains the orthogonality amongst transmit streams on a particular tone k, similarly to the Alamouti scheme for frequency-flat fading channels [14]. Armed with the model of (18), we can now discuss different equalization options that recover the ´ [n]. Since ML Viterbi equalization, as used in [29], requires exponential multi-user chip block sequence x complexity in the channel order L, we will only focus on low-complexity linear alternatives. A first possiblity is to apply a Zero-Forcing (ZF) chip equalizer [30] : ´ ZF = (H ´ H · H) ´ −1 · H ´ H, G

(20)

that completely eliminates the Inter Chip Interference (ICI) at the expense of excessive noise enhancement. A second possiblity is to apply a Minimum Mean Square Error (MMSE) chip equalizer [30], that explicitly takes into account the noise and balances ICI eliminination with noise enhancement. The MMSE chip equalizer is expressed as : ´ M M SE = (H ´H ·H ´ + σ 2 · R−1 )−1 · H ´ H, G w x ´

(21)

´ H [n]} is the input covariance matrix, and σw2 the AWGN variance. From (20) where Rx´ := E{´ x[n] · x ´ M M SE reduces to G ´ ZF at high SNR. and (21), it is also clear that G ´ in (19) is It is important to note that the block diagonal per-tone structure of the channel matrix H ´ ZF as well as the MMSE equalizer matrix G ´ M M SE . So, also reflected in both the ZF equalizer matrix G ´ := diag{G ´ 1, . . . , G ´ K }, both (20) and (21) actually capture K parallel and independent equalizers : G ´ k is a size 2×2NR matrix that recovers the two parallel streams on the k-th tone only, with where G k ∈ {1, . . . , K} (see also Fig. 2).

November 4, 2002

DRAFT

11

C.3 User-specific detection ´ ´ [(i + 1)N − 1]], we obtain the ´ [n] into Y[i] Stacking N consecutive chip blocks y := [´ y[iN ], . . ., y symbol block level equivalent of (18) : ´ =H ´ · X[i] ´ + Z[i], ´ Y[i]

(22)

´ and Z[i] ´ are similarly defined as Y[i]. ´ where X[i] From (14) and (16), we also have that : ´ = PT · FK · Tzp · X[i], X[i]

(23)

where X[i] := [x[iN ], . . ., x[(i + 1)N − 1]] stacks N consecutive chip blocks x[n], F K := diag {FK , FK } is the compound FFT matrix and Tzp := diag {Tzp , Tzp } the compound zero padding transmit matrix. From (1), it is also clear that : T

X[i] = Sd [i] · Cd [i] + sp [i] · cp [i]T ,

(24)

  where the multi-user data symbol matrix Sd [i] := s1 [i], . . ., sM [i] stacks the total data symbol blocks

of the different active users at the i-th symbol block instant. The active multi-user code matrix C d [i] :=  1  c [i], . . ., cM [i] stacks the composite code vectors of the different active users at the i-th symbol block

instant, where cm [i] := [cm [iN ], . . ., cm [(i + 1)N − 1]]T is the m-th user’s composite code vector used to spread its total data symbol block sm [i]. The total pilot symbol block sp [i] and the pilot composite code

vector cp [i] := c0 [i] are similarly defined as sm [i] and cm [i] respectively. For future discussions, we find   ¯ d [i] := c¯1 [i], . . ., c¯M¯ [i] that stacks the it convenient to also define the inactive multi-user code matrix C

¯ := N − M − 1 unused composite code vectors c¯m¯ [i] := cM +m¯ [i], with m ¯ }. It is important M ¯ ∈ {1, . . ., M ¯ d [i] together form an orthonormal basis of the N -dimensional code space. to note that cp [i], Cd [i] and C ´ whether ZF or MMSE, may subsequently be Starting from (22), the obtained chip equalizer matrix G, used to extract the desired user’s total data symbol block : H ´ · Y[i] ´ ·cm [i]∗ , ˆsm [i] = TzpT · FK · PTT · G | {z }

|

{z

b X[i]

b ˜ X[i]

(25)

}

b ˜ is transformed to the TD by the compound where the estimate of the FD multi-user chip block matrix X[i] H and has its zero postfix removed by the transpose of the compound ZP transmit matrix IFFT matrix FK

November 4, 2002

DRAFT

12

b is finally descrambled and deTzp . The resulting estimate of the TD multi-user chip block matrix X[i]

spread with the desired user’s composite code vector cm [i] to obtain an estimate of the desired user’s total data symbol block ˆsm [i]. These operations are alternatively represented by Fig. 2 (see right-hand side) and Fig. 3. In the above procedure, we have first applied equalization at the chip block level followed by despreading with the desired user’s composite code vector. However, it might be instructive to reverse the order of these operations, so first performing the despreading followed by equalization at the symbol block level. From (23) and (24), we can easily derive that : ´ =S ´ d [i] · Cd [i]T + ´sp [i] · cp [i]T , X[i]

(26)

´ d [i] := PT · FK · Tzp · Sd [i] is the permuted version of the multi-user data symbol matrix Sd [i] after where S ZP and FFT processing, and likewise ´sp [i] := PT · FK · Tzp · sp [i] for the total pilot symbol block sp [i]. So, despreading (22) with the desired user’s composite code vector c m [i] leads to the following single-user data model : ´ · cm [i]∗ = H ´ · ´sm [i] + Z[i] ´ · cm [i]∗ , Y[i]

(27)

because of the orthonormality between the user and the pilot composite code vectors at each symbol instant. From (27), we can also conclude that our transceiver preserves the orthogonality amongst users regardless of the underlying multi-path channels and succesfully converts (through despreading) a multiuser chip block equalization problem into a single-user symbol block equalization problem. Similarly to (20) and (21), we can then obtain ZF and MMSE symbol block equalizers that act on a per-tone basis in the FD. Finally, it is important to note that a classical DS-CDMA transmission scheme does not possess this nice property. Indeed, in classical DS-CDMA, the orthogonality amongst users is lost due to multipath propagation, which requires chip-level equalization before the actual despreading can take place. III. E XTENSION

TO MULTIPLE TRANSMIT ANTENNAS

In Section II, we have limited our discussion to the case of NT = 2 transmit antennas and any number of receive antennas NR . In this Section, we will extend our transceiver design to the general case of NT > 2 transmit antennas. In general, the number of independent transmit streams NS can be different from the number of transmit antennas NT ; in Section II we implicitly assumed that NS = NT = 2. Here, the m-th user’s data symbol NS  m sequence is first demultiplexed into NS parallel lower rate sequences sm ns [i] := s [iNS + ns − 1] ns =1 . November 4, 2002

DRAFT

13

The block spreading and code division multiplexing operations that lead to (1) remain the same if we replace nt and NT by ns and NS , respectively. The ST encoder takes the resulting multi-user chip blocks S {xns [n]}N ns =1 to output the following NT B×ND matrix of ST coded multi-user chip blocks :   ¯ [nND ] . . . x ¯ 1 [(n + 1)ND − 1] x   1  .. .  . ¯ . . X[n] = Ω {x1 [n], . . . , xNS [n]} =  . , . .   ¯ NT [nND ] . . . x ¯ NT [(n + 1)ND − 1] x

(28)

where ND is the time span (measured in number of chip block instants), and R := N S /ND the rate of the underlying ST code. In [29], R = 1/2 ST code designs for single-carrier block transmissions have been derived from the Generalized Complex Orthogonal Designs (GCOD) of [15]. The design rules can be summarized as follows : d1. Construct a GCOD Gc of size ND ×NT in the variables x1 , . . . , xNS , with ND = 2NS [15], d2. Replace x1 , . . . , xNS in GcT by x1 [n], . . . , xNS [n], (0)

(0)

d3. Replace x∗1 , . . . , x∗NS in GcT by PB · x1 [n]∗ , . . . , PB · xNS [n]∗ . ¯ nt [n] is forwarded to the nt -th antenna and At each time instant n, the ST coded multi-user chip block x transmitted over the FIR channel hnr ,nt [n] after zero padding and parallel-to-serial conversion. ¯ nr [n] := R·vnr [n] (see Subsection II-C), After serial-to-parallel conversion and preserving each block y ¯ nr [(n+1)ND −1]. ¯ nr [nND ], . . . , y we collect at each receive antenna ND = 2NS consecutive chip blocks y ¯ nr [nND + In order to exploit the structure of the ST code design in (28), we further define y nr ,nd [n] := y (B)

¯ nr [nND + nd − 1]∗ for the last NS nd − 1] for the first NS blocks nd = 1, . . . , NS and ynr ,nd [n] := PK · y ˜ nr ,nd [n] := blocks nd = NS + 1, . . . , ND . Pursuing FD processing, we employ the K × K FFT matrix : y FK · ynr ,nd [n]. Following similar steps as in Paragraph II-C.1, we obtain the same block matrix form as h h iT iT ˜ nr ,1 [n]T , . . ., y ˜ nr ,ND [n]T and x ˜ nr [n] := y ˜ 1 [n]T , . . ., x ˜ NS [n]T . It is ˜ [n] := x in (14) if we redefine y ˜ nr is now a size ND K×NS K matrix but still reflects the important to remark that the channel matrix H orthogonality of the underlying ST block code. Stacking the contributions of the N R RX antennas, finally ˜ is now a size NR ND K×NS K channel leads to the same per-RX-antenna FD data model of (15) where H matrix. Similarly, with the same transmit and receive permutation matrices, P T and PR , we also obtain the same per-tone data model of (18). Since we basically start from the same data model, the ZF and MMSE equalizers that we have developed in Paragraph II-C.2 as well as the equalizer estimation methods of Section IV are directly appli´ := cable here as well. So, both (20) and (21) still capture K parallel and independent equalizers : G November 4, 2002

DRAFT

14

´ 1, . . . , G ´ K }, where G ´ k is now a size NS ×NR ND matrix that recovers the NS parallel streams on diag{G the k-th tone only. IV. P RACTICAL

EQUALIZER ESTIMATORS

Up till now, we have assumed perfect CSI at the receiver to calculate either ZF (see (20)) or MMSE (see (21)) type of equalizers. However, in practice the receiver needs to acquire and/or update CSI before the actual equalization and detection can take place. In the following, we derive two equalizer estimation methods, that directly determine the equalizer coefficients without having to estimate the channel first. Least Squares (LS) methods for burst processing as well as Recursive Least Squares (RLS) methods for adaptive processing are discussed. ´ to have full column rank and the input matrix Starting from (22) and assuming the channel matrix H ´ to have full row rank, the matrix G, ´ for which G ´ · Y[i] ´ − X[i] ´ = 0 in the absence of noise, is the X[i] ZF equalizer matrix given by (20). In the presence of noise, we have to solve the corresponding Least Squares (LS) minimization problem, which we denote for convenience as : ´ LS = arg min G ´ G

I−1

2 X

´ ´ ´

,

G · Y[i] − X[i]

(29)

i=0

where we consider I consecutive symbol block instants i ∈ {0, . . ., I − 1}, corresponding to a burst length of NT BI data symbols per user. Two distinct ways exist to solve this minimization problem. The first ´ to be fixed over the entire burst length (under the assumption that way constrains the equalizer matrix G ´ remains constant as well) and leads to an LS type of burst processing algorithm. the channel matrix H ´ from one symbol block instant to the next and The second way updates the chip equalizer matrix G[i] leads to a Recursive Least Squares (RLS) type of adaptive processing algorithm. Note that a numerically stable Square Root Information (SRI) type of RLS algorithm can be easily derived from its corresponding LS algorithm. Such algorithm consists of a triangular QR Decomposition (QRD) updating step and a triangular backsubstitution step [31]. Substituting (26) into (29), leads to : ´ LS = arg min G ´ G

I−1

2 X

´ ´ ´ d [i] · Cd [i]T − ´sp [i] · cp [i]T

G · Y[i] − S

,

(30)

i=0

´ and the per-tone multi-user total data which is an LS problem in both the per-tone equalizer matrix G ´ d [i]. Taking (30) as a starting point, we will design in the following two equalizer estisymbol matrix S mation methods, that differ in the amount of a-priori information they exploit to determine the equalizer November 4, 2002

DRAFT

15

coefficients. The first method, coined CDMP-trained, only exploits the presence of a Code Division Multiplexed Pilot (CDMP). The second method, coined semi-blind, additionally exploits knowledge of the unused composite code vectors in any practical CDMA system. A. CDMP-trained equalizer estimator The CDMP-trained equalizer estimator directly determines the equalizer coefficients from the per-tone I−1 ´ output matrices {Y[i]} based on the knowledge of the pilot composite code vectors {c p [i]}I−1 and the i=0

i=0

p

per-tone total pilot symbol blocks {´s

I−1 [i]}i=0 .

By despreading (30) with the pilot composite code vector

cp [i] and by relying on the orthonormality of the user and the pilot composite code vectors at each symbol instant, we obtain : ´ LS = arg min G ´ G

I−1

2 X

´ ´ p ∗ p

G · Y[i] · c [i] − ´s [i] ,

(31)

i=0

´ · Y[i] ´ is first despread with which can be interpreted as follows. The equalized per-tone output matrix G the pilot composite code vector cp [i]. The resulting equalized per-tone output matrix after despreading ´ · Y[i] ´ · cp [i]∗ should then be as close as possible in a Least Squares sense to the per-tone total pilot G symbol block ´sp [i]. In practice however, the order of chip block equalization and despreading is reversed. ´ is first despread with the pilot composite code vector cp [i]. A Hence, the per-tone output matrix Y[i] similar CDMP-trained method for traditional DS-CDMA was studied in [4, 5]. However, here the block ´ · cp [i]∗ deterministically removes the multi-user interference caused by the despreading operation Y[i] active user signals and converts a chip block equalization problem into an equivalent symbol block equalization problem (refer back to (27)). B. Semi-blind equalizer estimator Compared to the CDMP-trained equalizer estimator, the semi-blind equalizer estimator additionally I−1

exploits knowledge of the active multi-user code matrices {Cd [i]}i=0 or equivalently the inactive multi¯ d [i]}I−1 . By despreading (30) with the active multi-user code matrix Cd [i] and by user code matrices {C i=0

´ to be known and fixed, we obtain an LS estimate of the per-tone assuming the per-tone equalizer matrix G ´ · Y[i] ´ · Cd [i]∗ . Substituting S ´ d [i] into the original LS problem ´ d [i] = G multi-user data symbol matrix S LS

LS

´ only : of (30) leads to a modified LS problem in the per-tone equalizer matrix G ´ LS = arg min G ´ G

November 4, 2002

I−1

2 X

´ ´ T

G · Y[i] · Pd [i] − ´sp [i] · cp [i] ,

(32)

i=0

DRAFT

16

  ∗ T where Pd [i] := IN − Cd [i] · Cd [i] is the projection matrix corresponding to Cd [i]. The interpretation ´ · Y[i] ´ is first projected on the orthogonal of (32) goes as follows. The equalized per-tone output matrix G complement of the subspace spanned by the active multi-user code matrix C d [i], employing the orthogonal ´ · Y[i] ´ · projection matrix Pd [i]. The resulting equalized per-tone output matrix after projecting, i.e. G Pd [i], should then be as close as possible in a Least Squares sense to the per-tone total pilot chip matrix ´sp [i] · cp [i]T . With some algebraic manipulations, it is easy to prove that (32) can be rewritten as : ´ LS G

I−1 

2 

2 X

´ ´

´ ´ d,p p ∗ p = arg min

G · Y[i] · c [i] − ´s [i] + α G · Y[i] · P [i] , ´ G

(33)

i=0

  ∗ T where Pd,p [i] := IN − Cd [i] · Cd [i] − cp [i]∗ · cp [i]T is the orthogonal projection matrix correspond-

ing to both Cd [i] and cp [i], and α = 1 is a weighting factor. This shows that the original LS problem

of (32) naturally decouples into two different parts : a training-based part and a fully-blind part. On the one hand, the training-based part (described by the first term of (33)) corresponds to the CDMP-trained equalizer estimator of (31). On the other hand, the fully-blind part (described by the second term of (33)) ´ · Y[i] ´ on the orthogonal complement of the subspace projects the equalized per-tone output matrix G spanned by both the active multi-user code matrix Cd [i] and the pilot composite code vector cp [i]. Fur´ · Y[i] ´ · Pd,p [i] should be thermore, the energy in the equalized per-tone output matrix after projecting G as small as possible in a Least Squares sense, which actually corresponds to a Minimum Output Energy (MOE) criterion. This explains why the equalizer estimation method discussed above can be classified as a semi-blind method. The weighting factor α controls the importance of the fully-blind part relative to the training-based part. Giving equal importance (α = 1) to both the training-based and the fully-blind part, as initially suggested by (33), does not necessarily lead to the best performance. Similar observations were made for semi-blind channel estimation algorithms in [32] and [33]. For α = 0, the semi-blind equalizer estimator reduces to the CDMP-trained equalizer estimator of (31). For α→∞, the semi-blind equalizer estimator converges to the purely blind equalizer estimator. Clearly, an optimal α opt should exist that leads to the best performance. However, a detailed analysis of α opt is out of the scope of this paper. As will be demonstrated in Section V, like in [33], α = 0.1 gives close to optimal performance. Let us now turn our attention again to the orthogonal projection matrix P d,p [i] that performs a projection on the orthogonal complement of the subspace spanned by the active multi-user code matrix C d [i] as well as the pilot composite code vector cp [i]. Due to the orthonormality of the composite code vectors at each November 4, 2002

DRAFT

17

symbol instant, this is equivalent to directly projecting on the subspace spanned by the inactive multi-user ¯ d [i]. Therefore, we are allowed to substitute Pd,p [i] = C ¯ d [i]∗ · C ¯ d [i]T into (33), leading to : code matrix C ´ LS = arg min G ´ G

I−1 

2

2  X

´ ´

´ ´ p ∗ p d ∗ ¯

G · Y[i] · c [i] − ´s [i] + α G · Y[i] · C [i] ,

(34)

i=0

¯ d [i]T has been left out because it does not alter the LS criterion. The where the right multiplication with C second term in (34) expresses that the energy in the equalized per-tone output matrix after despreading ¯ d [i] should be minimal. This suggests that the inactive comwith the inactive multi-user code matrix C posite code vectors can be interpreted as virtual pilot composite code vectors that continuously carry zero virtual pilot symbols. It is also worth noting that for a fully-loaded system, so when M = N − 1, all composite code vectors are in use and the fully-blind term of (34) becomes obsolete. This means that for a fully-loaded system the semi-blind approach reduces to the regular CDMP-trained approach. Similar semi-blind and purely blind methods for traditional DS-CDMA were studied in [5, 34–36]. ´ · cp [i]∗ and Y[i] ´ ·C ¯ d [i]∗ deterministically remove However, here the block despreading operations Y[i] the MUI caused by any other signals and succesfully convert a chip block equalization problem into an equivalent symbol block equalization problem (see also (27)). C. Complexity comparison With T := NR ND the number of equalizer taps per tone and per transmit stream (see the end of Section III), Table I evaluates the operation count of the two RLS adaptive equalizer estimators, in terms of multiply/accumulates. A multiply/accumulation is an operation of the type d = c+a∗b that involves one complex multiplication, one complex addition, three complex reads and one complex write. Hence, the number of corresponding data transfers is four times the number of multiply/accumulates. Also, Table I only accounts for the updating phase, where the new values of the equalizer coefficients are determined. The detection phase, where these equalizer coefficients are used to detect the desired user’s symbol block, is common to both adaptive equalizers. Both phases are executed continuously at a rate

Rs , NS B

where Rs is

the symbol rate. According to (31), the RLS CDMP-trained updating consists of three computational steps per time update, namely first a pilot despreading step, then a triangular QRD-updating step, and finally a triangular backsubstitution step. On the one hand, the pilot despreading step only requires O (KN T ) operations. On the other hand, both the triangular QRD-updating and the triangular backsubstitution step require O (KNS T 2 ) operations, leading to an overall complexity of O (KNS T 2 ). It is important to note however November 4, 2002

DRAFT

18

that these operations need to be executed at a rate which is BNS times smaller than the symbol rate Rs . According to (34), the virtual pilot despreading step and the triangular QRD-updating step of the RLS ¯ + 1) times more complex than their respective counterparts in the RLS pilotsemi-blind updating are (M  ¯ T2 . trained updating. This leads to an overall complexity of O KNS M V. S IMULATION

RESULTS

We consider the downlink of a ST-coded CDMA system with, unless otherwise stated, N T = 2 transmit antennas at the base station and NR = 2 receive antennas at the mobile station of interest. Each user’s QPSK modulated data symbols are spread by a real orthogonal Walsh-Hadamard spreading code of length N = 16 along with a complex random scrambling code. Furthermore, we assume that each channel is FIR with order L = 3 and has Rayleigh distributed channel taps of equal variance

1 . NT NR (L+1)

In this way,

the average received power is normalized with respect to the number of transmit as well as the number of receive antennas. In the following, we simulate the average Bit Error Rate (BER) versus the average received Signal-to-Noise Ratio (SNR) over 500 Monte Carlo channel realizations for three different test cases. A. Comparison with the optimal ML diversity bound We test the proposed transceiver, employing a cascade of TR-STBC and SCBT-DS-CDMA at the transmitter and applying MMSE FD equalization based on perfect CSI at the receiver, for three different MIMO system setups (NT , NR ) : the (1, 1) setup, the (2, 1) setup with TX diversity only and the (2, 2) setup with both TX and RX diversity. We choose the initial block length B = 12, the length of the zero postfix µ = 4, and correspondingly the transmitted block length K = B + µ = 16. The system is fully-loaded supporting M = 16 active users. No pilot is present. The performance of each setup is compared with its corresponding optimal ML performance bound for NT NR (L + 1)-fold diversity over Rayleigh fading channels [27]. Fig. 4 depicts the results for frequency-selective channels with channel order L = 1. Fixing the BER at 10−3 and focusing on the proposed transceiver, the (2, 1) setup outperforms the (1, 1) setup by 6 dB. The (2, 2) setup achieves on its turn a 3.5 dB gain compared to the (2, 1) setup. Comparing the simulated performance of the proposed transceiver with its corresponding ML diversity bound, it incurs a 4 dB loss for the (1, 1) setup, but only a 0.4 dB loss for the (2, 2) setup. So, the larger the number of TX and/or RX antennas, the better the proposed transceiver with linear receiver processing succeeds in extracting the full

November 4, 2002

DRAFT

19

diversity of order NT NR (L + 1). Fig. 5 shows the same results but now for frequency-selective channels with channel order L = 3. Again fixing the BER at 10−3 and focusing on the proposed transceiver, the (2, 1) setup outperforms the (1, 1) setup by 4 dB, whereas the (2, 2) setup achieves on its turn a 2 dB gain compared to the (2, 1) setup. So, compared to Fig. 4, the corresponding gains are now smaller because of the inherently larger underlying multi-path diversity. B. Comparison with other ST-coded CDMA transceivers In the following, we compare four different ST-coded CDMA transceivers : T1. The first transceiver applies the ST-coded DS-CDMA transmission scheme that was proposed for the UMTS and the IS-2000 WCDMA standards, also known as Space-Time Spreading (STS) [37]. At the receiver, a space-time RAKE is applied similar to the time-only RAKE discussed in [37], but instead of using a time-only Maximum Ratio Combiner (MRC) based on exact CSI, we use a space-time MRC based on exact CSI. The bandwidth efficiency of the first transceiver supporting M 1 users can be calculated as : 1 =

M1 , N

(35)

where N is the length of the Walsh-Hadamard spreading codes. T2. The second transceiver employs a cascade of the classical downlink DS-CDMA transmission scheme and TR-STBC [38]. At the receiver, a time-domain ST chip equalizer is applied similar to the ones discussed in [38], but instead of using a chip equalizer that is trained with the pilot, we use an ideal MMSE chip equalizer based on perfect CSI. The bandwidth efficiency of the second transceiver supporting M 2 users can be determined as : 2 =

M2 B , BN + µ

(36)

where B is the initial block length and µ the length of the zero postfix. T3. The third transceiver is the one proposed in Section II that combines the SCBT-DS-CDMA transmission scheme with TR-STBC. At the receiver, a frequency-domain MMSE equalizer based on perfect CSI is used. The bandwidth efficiency of our transceiver supporting M 3 users can be calculated as : 3 =

M3 B . N (B + µ)

(37)

T4. The fourth transceiver combines the classical Multi-Carrier (MC) DS-CDMA transmission scheme with the STBC techniques of [15], but implemented on a per-tone basis [39, 40]. At the receiver, the ST November 4, 2002

DRAFT

20

block decoding as well as the MMSE equalization are performed in the frequency-domain based on exact CSI. The bandwidth efficiency of the fourth transceiver supporting M 4 users can be determined as : 4 =

M4 Q , N (Q + µ)

(38)

where Q is the number of tones and µ the length of the cyclic prefix. In order to make a fair comparison between the four transceivers, we should force their bandwidth efficiencies to be the same 1 = 2 = 3 = 4 . With (35), (36), (37) and (38), this leads to the following relationship between the number of users to be supported by the different transceivers : M2 =

BN + µ M1 BN

M3 =

B+µ M1 B

With B = 12, Q = B, N = 16 and µ = 4, we can derive that M2 =

M4 =

Q+µ M1 Q

49 M1 ≈M1 48

(39)

and M4 = M3 = 34 M1 .

It is important to remark that for this comparison no pilot is present. Fig. 6 compares the performance of the four transceivers for a small system load with M 1 = M2 = 3 and M3 = M4 = 4. Also shown in the figure is the optimal ML performance bound for N T NR (L + 1) = 16-fold diversity over Rayleigh fading channels [27]. T2 and T3 have similar performance with a small advantage for T3 over T2 at high SNR. They both come close to extracting the full diversity, so both multi-antenna as well as multi-path diversity. As opposed to T2 and T3, T1 does not succeed in extracting the full diversity : at a BER of 10−4 it incurs a 3.8 dB loss compared to T3. Likewise, T4 does not succeed in extracting the full diversity either since it only exploits multi-antenna diversity : at a BER of 10 −4 it incurs a 3.4 dB loss compared to T3. Fig. 7 depicts the same curves but now for a large system load with M 1 = M2 = 12 and M3 = M4 = 16. Since both T3 and T4 are MUI-free ST-coded CDMA transceivers, their performance remains unaffected by the MUI. So, even at large system load, T3 comes close to extracting the full diversity : e.g. at a BER of 10−4 it only incurs a 0.6 dB loss compared to the optimal ML diversity bound. Unlike T3, T4 only exploits multi-antenna diversity and incurs a 4 dB loss compared to the optimal ML diversity bound. As opposed to T3, T2 does not deterministically remove the MUI (T2 completely suppresses the MUI only at infinite SNR), and therefore fails in extracting the full diversity : e.g. at a BER of 10 −4 , T3 achieves a 2.4 dB gain compared to T2. We also observe that T1 now performs poorly compared to T2, T3 and T4 : e.g. at a BER of 3·10−3 , T3 achieves a 12.3 dB gain compared to T1. In contrast with T3 and T4 that deterministically remove the MUI, and with T2 that completely suppresses the MUI at infinite SNR, T1 does not completely suppress the MUI at high SNR. Hence, T1 suffers from a BER saturation level that increases with the system load M1 . November 4, 2002

DRAFT

21

C. Comparison of the different equalizer estimators In order to test the equalizer estimators discussed in Section IV, we employ the same system parameters as before (see the beginning of Section V) with M = 7 active users and one Code Division Multiplexed Pilot (CDMP), similar to the so-called Common PIlot CHannel (CPICH) [19] in forthcoming 3G systems. As we already indicated in Subsection IV-B, the performance of our semi-blind equalizer estimator is quite sensitive to the choice of the weighting factor α. Fixing the SNR to 10 dB, Fig. 8 and 9 show the influence of α when I = 20 and I = 5 symbol blocks are collected respectively to perform the LS equalizer estimators. For α→0, the semi-blind equalizer estimator converges to the CDMP-trained equalizer estimator. For α→∞, the semi-blind equalizer estimator converges to the purely blind equalizer estimator, that always performs worse than the CDMP-trained equalizer estimator. These simulation results also indicate that α = 0.1 gives close to optimal performance. Moreover, the optimum is more pronounced when only a small number of symbol blocks are collected. Hence, we employ α = 0.1 for the following test cases. Fig. 10 shows the BER versus SNR results when I = 20 symbol blocks are collected to perform the LS equalizer estimators. At a BER of 10−3 , the performance of the semi-blind equalizer estimator comes within 0.75 dB of the ideal MMSE equalizer with perfect CSI. Moreover, the semi-blind equalizer estimator performs 0.4 dB better than the CDMP-trained equalizer estimator. Fig. 11 further corroborates the fast convergence of the semi-blind equalizer estimator, depicting the performance when only I = 5 symbol blocks are considered. E.g. at a BER of 10 −3 , the semi-blind equalizer estimator now outperforms the CDMP-trained equalizer estimator by 7 dB. The faster convergence stems from the fact that the semi-blind equalizer estimator additionally exploits knowledge of the unused spreading codes, whereas the CDMP-trained equalizer estimator only exploits the presence of a CDMP. VI. C ONCLUSION To enable significant performance improvements compared to 3G cellular systems, we have designed a novel ST block-coded CDMA transceiver, that can yield gains of up to 12 dB in full load situations. To this end, we have combined two specific DS-CDMA and STBC techniques, namely Single-Carrier Block Transmission (SCBT) DS-CDMA on the one hand and Time-Reversal (TR) Space-Time Block Coding (STBC) on the other hand. The resulting transmission scheme allows for deterministic ML user separation through low-complexity code-matched filtering as well as deterministic ML transmit stream separation through linear processing. Moreover, it can achieve maximum diversity gains of N T NR (L + 1) November 4, 2002

DRAFT

22

for every user in the system, irrespective of the system load. In addition, a low-complexity linear receiver based on frequency-domain equalization approaches the optimal ML performance (within 0.6 dB for a (2, 2) system) and comes close to extracting the full diversity in reduced as well as full load settings. In this perspective, we have also proposed two (Recursive) Least Squares based methods for direct equalizer estimation, coined CDMP-trained and semi-blind respectively. Both methods exploit the presence of a Code Division Multiplexed Pilot (CDMP) and benefit from the ML user separability through codematched filtering. When collecting a large number of symbol blocks, the semi-blind method only performs slightly better than the regular CDMP-trained method and comes within 0.75 dB of the ideal MMSE equalizer. However, by additionally exploiting knowledge of the unused spreading codes in a practical CDMA system, the semi-blind method outperforms the CDMP-trained method when only a small number of symbol blocks are collected; gains of up to 7 dB for I = 5 symbol blocks have been established.

November 4, 2002

DRAFT

23

R EFERENCES [1]

A. Klein, “Data detection algorithms specially designed for the downlink of CDMA mobile radio systems,” in VTC, May 1997, vol. 1, pp. 203–207.

[2]

I. Ghauri and D.T.M. Slock, “Linear receivers for the DS-CDMA downlink exploiting orthogonality of spreading sequences,” in Asilomar Conference on Signals, Systems and Computers, November 1998, vol. 1, pp. 650–654.

[3]

T.P. Krauss, W.J. Hillery, and M.D. Zoltowski, “Downlink specific linear equalization for frequency selective CDMA cellular systems,”

[4]

C.D. Frank, E. Visotsky, and U. Madhow, “Adaptive interference suppression for the downlink of a Direct Sequence CDMA system

Kluwer Journal of VLSI Signal Processing, vol. 30, no. 3, pp. 143–161, March 2002. with long spreading sequences,” Kluwer Journal of VLSI Signal Processing, vol. 30, no. 3, pp. 273–291, March 2002. [5]

F. Petr´e, G. Leus, L. Deneire, M. Engels, and M. Moonen, “Space-time chip equalizer receivers for WCDMA downlink with code-

[6]

S. Zhou, G.B. Giannakis, and C. Le Martret, “Chip-Interleaved Block-Spread Code Division Multiple Access,” IEEE Transactions on

multiplexed pilot and soft handover,” in GLOBECOM, November 2001, vol. 1, pp. 280–284. Communications, vol. 50, no. 2, pp. 235–248, February 2002. [7]

G. Leus, P. Xia, S. Zhou, and G.B. Giannakis, “Chip-Interleaved Block-Spread CDMA or DS-CDMA for cellular downlink?,” in

[8]

G.J. Foschini and M.J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless

ICASSP, May 2002, vol. 3, pp. 2305–2308. Personal Communications, vol. 6, no. 3, pp. 311–335, March 1998. [9]

G.G. Raleigh and J.M. Cioffi, “Spatio-temporal coding for wireless communications,” IEEE Transactions on Communications, vol. 46, no. 3, pp. 357–366, March 1998.

[10] D. Gesbert, H. Bolcskei, D. Gore, and A. Paulraj, “MIMO wireless channels : capacity and performance prediction,” in GLOBECOM, December 2000, vol. 2, pp. 1083 –1088. [11] G.J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multiple antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, September 1996. [12] A. Paulraj and T. Kailath, “Increasing capacity in wireless broadcast systems using distributed transmission/directional reception (DTDR),” U.S. Patent 5345599, Stanford University, September 1994. [13] V. Tarokh, N. Seshadri, and A.R. Calderbank, “Space-time codes for high data rate wireless communication : Performance criterion and code construction,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, March 1998. [14] S.M. Alamouti,

“A simple transmit diversity technique for wireless communications,”

IEEE Journal on Selected Areas in

Communications, vol. 16, no. 8, pp. 1451–1458, October 1998. [15] V. Tarokh, H. Jafarkhani, and A.R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456–1467, July 1999. [16] E. Lindskog and A. Paulraj, “A transmit diversity scheme for channels with intersymbol interference,” in ICC, June 2000, vol. 1, pp. 307–311. [17] N. Al-Dhahir, “Single-carrier frequency-domain equalization for space-time block-coded transmissions over frequency-selective fading channels,” IEEE Communications Letters, vol. 5, no. 7, pp. 304 –306, July 2001. [18] S. Zhou and G.B. Giannakis, “Space-time coding with maximum diversity gains over frequency-selective fading channels,” IEEE Signal Processing Letters, vol. 8, no. 10, pp. 269–272, October 2001. [19] H. Holma and A. Toskala, WCDMA for UMTS, Wiley, 2001. [20] F. Petr´e, G. Leus, L. Deneire, M. Moonen, and M. Engels, “Per-tone chip equalization for space-time block coded single-carrier block transmission DS-CDMA downlink,” in EUSIPCO, September 2002, vol. 2, pp. 425–428, invited. [21] Z. Liu and G.B. Giannakis, “Space-time block-coded multiple access through frequency-selective fading channels,” IEEE Transactions on Communications, vol. 49, no. 6, pp. 1033–1044, June 2001.

November 4, 2002

DRAFT

24

[22] A. Stamoulis, Z. Liu, and G.B. Giannakis, “Space-time block-coded OFDMA with linear precoding for multirate services,” IEEE Transactions on Signal Processing, vol. 50, no. 1, pp. 119 –129, January 2002. [23] Z. Wang and G.B. Giannakis, “Wireless multicarrier communications : Where Fourier meets Shannon,” IEEE Signal Processing Magazine, vol. 17, pp. 29–48, May 2000. [24] Luc Deneire, Bert Gyselinckx, and Marc Engels, “Training sequence vs. cyclic prefix. a new look on single carrier communications,” IEEE Communication Letters, vol. 5, no. 7, pp. 292–294, July 2001. [25] G. Leus and M. Moonen, “MUI-free receiver for a synchronous DS-CDMA system based on block spreading in the presence of frequency-selective fading,” IEEE Transactions on Signal Processing, vol. 48, no. 11, pp. 3175 –3188, November 2000. [26] Z. Liu, G.B. Giannakis, B. Muquet, and S. Zhou,

“Space-time coding for broadband wireless communications,”

Wireless

Communications and Mobile Computing, vol. 1, no. 1, pp. 35–53, January-March 2001. [27] J.G. Proakis, Digital Communications, McGraw-Hill, third edition, 1995. [28] G.H. Golub and C.F. Van Loan, Matrix Computations, John Hopkins University Press, third edition, 1996. [29] S. Zhou and G.B. Giannakis, “Single-carrier space-time block coded transmissions over frequency-selective fading channels,” IEEE Transactions on Information Theory, 2002, accepted. [30] A. Klein, G.K. Kaleh, and P.W. Baier, “Zero forcing and minimum mean-square-error equalization for multiuser detection in codedivision multiple-access channels,” IEEE Transactions on Vehicular Technology, vol. 14, no. 9, pp. 1784–1795, December 1996. [31] J.G. Proakis, C.M. Rader, F. Ling, C.L. Nikias, M. Moonen, and I. Proudler, Algorithms for Statistical Signal Processing, Prentice Hall, 2002. [32] E. de Carvalho and D.T.M. Slock, “Deterministic quadratic semi-blind FIR multichannel estimation algorithms and performance,” in ICASSP, June 2000, vol. 5, pp. 2553 –2556. [33] G. Leus and M. Moonen, “Semi-blind channel estimation for block transmissions with non-zero padding,” in Asilomar Conference on Signals, Systems and Computers, November 2001, vol. 1, pp. 762 –766. [34] K. Li and H. Liu, “A new blind receiver for downlink DS-CDMA communications,” IEEE Communications Letters, vol. 3, no. 7, pp. 193–195, July 1999. [35] S. Mudulodu and A. Paulraj, “A blind multiuser receiver for the CDMA downlink,” in ICASSP, May 2000, vol. 5, pp. 2933–2936. [36] D.T.M. Slock and I. Ghauri, “Blind maximum SINR receiver for the DS-CDMA downlink,” in ICASSP, June 2000, vol. 5, pp. 2485 –2488. [37] B. Hochwald, T.L. Marzetta, and C.B. Papadias, “A transmitter diversity scheme for wideband CDMA systems based on space-time spreading,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 1, pp. 48–60, January 2001. [38] G. Leus, F. Petr´e, and M. Moonen, “Space-time chip equalization for space-time coded downlink CDMA,” in ICC, April 2002, vol. 1, pp. 568 –572. [39] F. Petr´e, G. Leus, L. Deneire, M. Moonen, and M. Engels, “Per-tone pilot-trained chip equalizers for space-time coded MC-DS-CDMA downlink,” in ICASSP, May 2002, vol. 3, pp. 2349–2352. [40] Y. Li, J.C. Chuang, and N.R. Sollenberger, “Transmitter diversity for OFDM systems and its impact on high-rate data wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 7, pp. 1233–1243, July 1999.

November 4, 2002

DRAFT

25

MGN OQP RLS

[\ ]Q^ _a` S/P 13254

cm [n] other users

x

Nx

EGFIH JLK

pilot

S/P

TGUVXW YLZ

b cdXe fag S/P 63758

cm [n] other users

x

Nx

i'h jk lm ?3@5A

. . . %'&( )* + 93:5;

... + ,. /0 +

Block ST Encoder

  Tzp 

-o n pq rs B3C5D

TX 1  ! P/S TX 2  

Tzp

  "#$

¦…¤ ¥ §a¨ ©ª üvýxþ

IFFT



Tzp

­« ¬ ®°¯ ±² ÿ

IFFT



Tzp

P/S

pilot Fig. 1. Base station transmission scheme

RX 1

|~}L €‚

S/P

R

¶v·x¸

Block ST Decoder

ßvàxá

FFT FFT

… ‘a’ “” s$tpuwvyx `^ ] _ a bdc

L,K M&N