ADAPTIVE TRELLIS-CODED MODULATION WITH IMPERFECT CHANNEL STATE INFORMATION AT THE RECEIVER AND TRANSMITTER

Duc V. Duong and Geir E. Øien
Norwegian University of Science and Technology, NTNU
Department of Electronics and Telecommunications
N-7491 Trondheim, Norway
email: {duong,oien}@iet.ntnu.no

ABSTRACT

In this paper we analyze and optimize an adaptive coded modulation (ACM) scheme based on pilot-symbol-assisted trellis-coded modulation (PSAM). The pilot symbol period and the power allocation between pilot and data symbols are optimized. In the analysis we consider the effect of both estimation and prediction errors on the performance. The analysis is presented for a flat Rayleigh fading channel, and numerical results are given for the Jakes fading spectrum. Our results show that the average spectral efficiency (ASE) for this scheme is about 0.7 bits/s/Hz higher than for uncoded M-QAM on the same channel. This corresponds to approximately 3 dB in average channel signal-to-noise ratio (CSNR). The optimum power allocation and the optimum pilot spacing are more or less the same for both cases.

I. INTRODUCTION

Adaptive coded modulation is a promising technique to combat fading and improve spectral efficiency on wireless channels [1, 2, 3]. To adapt to the varying conditions of the channel, known symbols (pilot symbols) are typically transmitted along with the information symbols at regular intervals. The pilots are extracted at the receiver and used to estimate and predict the channel. The predicted channel state information (CSI) is fed back to the transmitter. Based on this, the transmitter can dynamically adapt the rate and the power of the code to the mode most suitable for the channel, and improve the ASE while keeping the instantaneous bit error rate (BER) below the target BER ($\mathrm{BER}_0$). The capacity-achieving adaptive scheme may be approximated by using discrete rate updating [2].

The scheme presented in this paper is a generalization of a recent paper on adaptive, uncoded modulation by Cai and Giannakis [4]. We extend their concept to the case where channel coding is included. Specifically, we consider pilot-symbol-assisted trellis-coded modulation (PSAM). As in [4], the pilot period and the power allocation between pilot and data symbols are optimized, and both estimation and prediction errors are taken into account in the optimization. As in [2], trellis codes designed for AWGN channels are used as component codes.

The reader should be aware that some notation used in this paper has other meanings than in the reference paper, and that we omit the time indices wherever possible, for notational simplicity. $D(\mathbf{z})$ represents the diagonal matrix with the vector $\mathbf{z}$ on its diagonal. $\mathbf{I}_K$ is the $K \times K$ identity matrix. Superscripts $T$, $*$ and $H$ stand for transpose, complex conjugate, and Hermitian transpose, respectively. $\lfloor z \rfloor$ denotes the largest integer less than or equal to $z$.

The remainder of this paper is organized as follows. Section II describes the system under consideration. BER analysis is treated in Section III, and the adaptive PSAM optimization follows in Section IV. Numerical results are discussed in Section V, and the concluding remarks can be found in Section VI.

II. SYSTEM OVERVIEW

We study the system illustrated in Fig. 1. The adaptive encoder/modulator and decoder/demodulator use a set of M-QAM constellations of sizes $\{M_n\}_{n=1}^{N}$ with corresponding spectral efficiencies (SEs) $\{R_n\}_{n=1}^{N}$, where $R_1 < R_2 < \cdots < R_N$. If the predicted CSNR falls into the interval $\hat\gamma \in [\hat\gamma_n, \hat\gamma_{n+1}\rangle$, the $n$th constellation (i.e., SE $R_n$) is used to maintain $\mathrm{BER}(e_n|\hat\gamma) \le \mathrm{BER}_0$. No transmission is allowed when $\hat\gamma < \hat\gamma_1$, since none of the codes can guarantee $\mathrm{BER}(e_n|\hat\gamma) \le \mathrm{BER}_0$ in this interval. We define $\hat\gamma_{N+1} = \infty$, and the code corresponding to the highest possible rate is always utilized.

A pilot symbol is inserted at the beginning of each frame and is followed by $L-1$ information symbols. In the receiver, coherent detection is performed based on the estimates from the estimator. While the predictor is a causal filter, the estimator is constructed to use information from both past and future symbols. Therefore, we must allow our system an additional decoding delay. Both the estimator and the predictor are linear, and optimal in the maximum a posteriori (MAP) sense [5]. The predictor uses $K_p$ pilot symbols from earlier pilot time instants to predict $\tau$ seconds ahead. Note that $\tau$ is the feedback channel delay, consisting of the actual transmission delay and the processing time needed by the transmitter to activate the chosen code for transmission.

Similar to [4], we denote the received, noisy and faded pilot symbols in complex baseband as

$$ y_{pl}(k) = \sqrt{\mathcal{E}_{pl}}\, h(k;0)\, s(k;0) + \eta(k;0) \tag{1} $$

and the received data symbols as

$$ y_d(k;l) = \sqrt{\mathcal{E}_d}\, h(k;l)\, s(k;l) + \eta(k;l), \quad l \in [1, L-1], \tag{2} $$

where the index $k$ counts the frame, and $l$ counts the symbol within that frame. $\mathcal{E}_d$ and $\mathcal{E}_{pl}$ are the power per data and per pilot symbol, respectively, to be optimized later in Section IV-A; $s(k;0)$ is the pilot symbol, and $\{s(k;l)\}_{l=1}^{L-1}$ are the data symbols in the $k$th frame. Further calculations assume that $E[|s(k;l)|^2] = |s(k;0)|^2 = 1$; $\eta(\cdot)$ is AWGN with zero mean and variance $N_0/2$ per dimension, with the dimensions being uncorrelated. The channel fading envelope $h(k;l)$ is a stationary complex Gaussian random process (RP) with zero mean and variance $\sigma_h^2 = 1$.

The adaptive scheme presented here optimizes the pilot symbol spacing and, at the same time, finds the optimal power allocation between pilot and information symbols to maximize the ASE while keeping the BER below $\mathrm{BER}_0$. Thus, we need to analyze the BER performance in the presence of both estimation and prediction errors.

Figure 1: The adaptive PSAM system model. (Block diagram with the blocks: adaptive encoder/modulator, power control, pilot insertion, fading channel, buffer, channel estimator, adaptive decoder/demodulator, channel predictor, constellation selection, power selection, and zero-error feedback channel.)
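The constellation-selection rule described in this section can be sketched in a few lines of Python. This is only an illustration: the function name `select_mode` and the threshold values are hypothetical and do not come from the paper, where the thresholds are obtained by solving $\mathrm{BER}(e_n|\hat\gamma) = \mathrm{BER}_0$ (Section IV).

```python
from bisect import bisect_right


def select_mode(gamma_hat, thresholds):
    """Return the 1-based constellation index n for a predicted CSNR.

    thresholds is the increasing list [gamma_1, ..., gamma_N] (linear scale);
    gamma_{N+1} is implicitly infinity. A return value of 0 means that no
    transmission takes place (gamma_hat lies below the first threshold).
    """
    # bisect_right counts the thresholds that are <= gamma_hat, which is the
    # index n satisfying gamma_n <= gamma_hat < gamma_{n+1}.
    return bisect_right(thresholds, gamma_hat)


# Hypothetical thresholds (linear scale), for illustration only.
example_thresholds = [5.0, 12.0, 30.0, 70.0]
```

A predicted CSNR exactly equal to a threshold selects the higher mode, in line with the half-open intervals $[\hat\gamma_n, \hat\gamma_{n+1}\rangle$.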

III. BER ANALYSIS

A. BER in the Presence of Channel Estimation Errors

Let $h_e(k;l)$ be the linear estimate of $h(k;l)$. The estimation error is given by $\epsilon_e(k;l) = h(k;l) - h_e(k;l)$. The mean square error (MSE) is defined as $\sigma_{\epsilon_e}^2(l) = E[|\epsilon_e(k;l)|^2]$. Since the channel is a Gaussian RP with zero mean, $h_e(k;l)$ and $\epsilon_e(k;l)$ are zero-mean Gaussian random variables. The purpose here is to find an expression for the BER after having estimated the channel and found the MSE $\sigma_{\epsilon_e}^2(l)$.

Let $K_e$ be the order of the estimator. That is, the estimator uses $K_e$ pilot symbols, $y_{pl}(k - \lfloor K_e/2 \rfloor), \ldots, y_{pl}(k + \lfloor (K_e-1)/2 \rfloor)$, to estimate one sample. If we define the pilot symbol vector $\mathbf{s} = [s(k - \lfloor K_e/2 \rfloor; 0), \ldots, s(k + \lfloor (K_e-1)/2 \rfloor; 0)]^T$, $\mathbf{h} = [h(k - \lfloor K_e/2 \rfloor; 0), \ldots, h(k + \lfloor (K_e-1)/2 \rfloor; 0)]^T$, $\mathbf{R} = E[\mathbf{h}\mathbf{h}^H]$, and $\mathbf{r}_e = E[\mathbf{h}\, h^*(k;l)]$, the linear channel estimator can be written as [4, 5]

$$ \mathbf{w}_e = \sqrt{\mathcal{E}_{pl}} \left( \mathcal{E}_{pl} D(\mathbf{s}) \mathbf{R} D^*(\mathbf{s}) + N_0 \mathbf{I}_{K_e} \right)^{-1} D(\mathbf{s})\, \mathbf{r}_e. \tag{3} $$

Since the received samples are complex Gaussian, the optimal channel estimate in the MAP sense is a linear combination of the observations [5, pp. 741–742], and the estimated channel and the corresponding channel MMSE are given by [4, 5, 6]

$$ h_e(k;l) = \mathbf{w}_e^H \mathbf{y}(k) \tag{4} $$

and

$$ \sigma_{\epsilon_e}^2(l) = 1 - \sqrt{\mathcal{E}_{pl}}\, \mathbf{r}_e^H D^*(\mathbf{s})\, \mathbf{w}_e, \tag{5} $$

respectively, where $\mathbf{y}(k) = [y_{pl}(k - \lfloor K_e/2 \rfloor), \ldots, y_{pl}(k + \lfloor (K_e-1)/2 \rfloor)]^T$. By assuming that the amplitude of the pilot symbols is constant for a given channel model, and defining $\mathbf{u}_k$ as the $k$th eigenvector of $\mathbf{R}$ with corresponding eigenvalue $\lambda_k$, $\sigma_{\epsilon_e}^2(l)$ in (5) can be expressed as

$$ \sigma_{\epsilon_e}^2(l) = 1 - \sum_{k=1}^{K_e} \frac{|\mathbf{u}_k^H \mathbf{r}_e|^2\, (1-\alpha) L \bar\gamma}{(1-\alpha) L \bar\gamma \lambda_k + 1}. \tag{6} $$

Here $\bar\gamma$ is the average CSNR, equal to $E[|h(k;l)|^2]\,\mathcal{E}/N_0 = \mathcal{E}/N_0$, where $\mathcal{E}$ is the total average power the system is allowed to use (for both data and pilot symbols). Equation (6) shows clearly that $\sigma_{\epsilon_e}^2(l)$ depends on, among other things, $\alpha$ and $L$. The former determines the power allocation between pilot and data symbols (equal when $\alpha = 1 - 1/L$). We will later optimize the system with respect to both $\alpha$ and $L$.

Given an estimate of the channel, $h_e(k;l)$, we perform symbol-by-symbol maximum likelihood detection by calculating $z(k;l) = y_d(k;l)/(\sqrt{\mathcal{E}_d}\, h_e(k;l))$. The CSNR at the time of detection is given by [4]

$$ \gamma(k;l) = \frac{\mathcal{E}_d |h_e(k;l)|^2}{N_0 + g \mathcal{E}_d \sigma_{\epsilon_e}^2(l)}, \tag{7} $$

where $g = 1$ for 4-QAM, and $g = 1.3$ for M-QAM with $M > 4$. Tight approximations for trellis-coded modulation (TCM) BER performance on AWGN channels at high CSNR can be found in [2, 7]:

$$ \mathrm{BER}(e_n|\gamma) = \begin{cases} a_n \exp\left(-\dfrac{b_n \gamma}{M_n}\right) & \text{when } \gamma \ge \gamma_{n,T}, \\ \dfrac{1}{2} & \text{when } \gamma < \gamma_{n,T}, \end{cases} \tag{8} $$

where $M_n$ is the number of points in the symbol constellation used by the trellis code, $a_n$ and $b_n$ are constellation-dependent constants which can be found in [7, Tab. I], and $\gamma_{n,T} = M_n \ln(2a_n)/b_n$ is the CSNR threshold at which the BER expression switches from 1/2 to the exponential approximation. By replacing $\gamma$ in (8) with (7), we obtain the BER conditioned on the channel estimate. With this expression at hand, we derive the BER conditioned on the predicted channel in the next subsection.
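As a minimal sketch, the effective CSNR in (7) and the piecewise BER approximation in (8) can be coded as below. The function names are hypothetical, and the constants `A_N`, `B_N`, `M_N` are placeholders chosen for illustration; they are not the values tabulated in [7, Tab. I].

```python
import math


def effective_csnr(E_d, h_e_abs, N0, g, sigma2_e):
    """CSNR at detection time, eq. (7): estimation error acts as extra noise."""
    return E_d * h_e_abs ** 2 / (N0 + g * E_d * sigma2_e)


def csnr_threshold(a_n, b_n, M_n):
    """gamma_{n,T} = M_n ln(2 a_n) / b_n, where the two branches of (8) meet."""
    return M_n * math.log(2.0 * a_n) / b_n


def tcm_ber(gamma, a_n, b_n, M_n):
    """Piecewise BER approximation of eq. (8)."""
    if gamma >= csnr_threshold(a_n, b_n, M_n):
        return a_n * math.exp(-b_n * gamma / M_n)
    return 0.5


# Placeholder constants for illustration, NOT the values from [7, Tab. I].
A_N, B_N, M_N = 100.0, 10.0, 16
```

At $\gamma = \gamma_{n,T}$ the exponential branch evaluates to 1/2, so the approximation is continuous across the threshold.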

B. BER in the Presence of Both Channel Estimation and Prediction Errors

The linear channel predictor predicts a sample of the set $\{h(k;l)\}_{l=1}^{L-1}$ by using $K_p$ past pilot symbols. Let $h_p(k;l)$ denote the predicted channel. The prediction error is $\epsilon_p(k;l) = h(k;l) - h_p(k;l)$, and the MSE of the predicted channel is $\sigma_{\epsilon_p}^2(l) = E[|\epsilon_p(k;l)|^2]$. The predicted channel $h_p(k;l)$ is also Gaussian distributed, with zero mean and variance $\sigma_{h_p}^2(l) = \sigma_h^2 - \sigma_{\epsilon_p}^2(l) = 1 - \sigma_{\epsilon_p}^2(l)$. Suppose now that the feedback delay is $\tau = D L T_s$, where $D$ is a positive integer and $T_s$ is the channel symbol duration. By defining the pilot symbol vector $\mathbf{s} = [s(k-D;0), \ldots, s(k-D-K_p+1;0)]^T$, $\mathbf{h} = [h(k-D;0), \ldots, h(k-D-K_p+1;0)]^T$, $\mathbf{R} = E[\mathbf{h}\mathbf{h}^H]$, and $\mathbf{r}_p(l) = E[\mathbf{h}\, h^*(k;l)]$, the MMSE of the predicted channel is written as

$$ \sigma_{\epsilon_p}^2(l) = 1 - \sqrt{\mathcal{E}_{pl}}\, \mathbf{r}_p^H D^*(\mathbf{s})\, \mathbf{w}_p, \tag{9} $$

where

$$ \mathbf{w}_p = \sqrt{\mathcal{E}_{pl}} \left( \mathcal{E}_{pl} D(\mathbf{s}) \mathbf{R} D^*(\mathbf{s}) + N_0 \mathbf{I}_{K_p} \right)^{-1} D(\mathbf{s})\, \mathbf{r}_p \tag{10} $$

is the causal predictor. Similar to the estimation case, $\sigma_{\epsilon_p}^2(l)$ in (9) can be written as

$$ \sigma_{\epsilon_p}^2(l) = 1 - \sum_{k=1}^{K_p} \frac{|\mathbf{u}_k^H \mathbf{r}_p|^2\, (1-\alpha) L \bar\gamma}{(1-\alpha) L \bar\gamma \lambda_k + 1}. \tag{11} $$

The estimated channel can be written as a sum of the predicted channel and the errors:

$$ h_e(k;l) = h_p(k;l) + \epsilon_p(k;l) - \epsilon_e(k;l). \tag{12} $$

Note that $h_p(k;l)$ and $\epsilon_e(k;l)$ are correlated, governed by $\rho = E[\epsilon_e(k;l)\, h_p^*(k;l)]/\sigma_{h_p}^2(l)$ [4], while $h_p(k;l)$ and $\epsilon_p(k;l)$ are uncorrelated due to the orthogonality principle [8]. As a result, when $h_p(k;l)$ is given, $h_e(k;l)$ is Gaussian distributed with mean $E[h_e(k;l)|h_p(k;l)] = (1-\rho)h_p(k;l)$ and variance $\tilde\sigma_{h_e}^2 = E[|\epsilon_p(k;l) - \epsilon_e(k;l) + \rho h_p(k;l)|^2]$. In practice $\rho$ takes on small values [4]; thus, we set it to zero in our further calculations and simulations. We assume that $\epsilon_e(k;l)$ and $\epsilon_p(k;l)$ are uncorrelated, and thus $\tilde\sigma_{h_e}^2 = \sigma_{\epsilon_e}^2(l) + \sigma_{\epsilon_p}^2(l)$. Hence, once the predicted channel is given, the amplitude of the estimated channel follows a Rice distribution with Rician factor $K = |(1-\rho)h_p(k;l)|^2/\tilde\sigma_{h_e}^2 = |h_p(k;l)|^2/\tilde\sigma_{h_e}^2$. The BER conditioned on $h_p$ is then given by

$$ \mathrm{BER}(e_n \mid h_p) = \int_0^\infty \mathrm{BER}(e_n \mid |h_e|)\; p(|h_e| \mid h_p)\; d|h_e|, \tag{13} $$

where $p(|h_e| \mid h_p)$ denotes the Rice pdf and $\mathrm{BER}(e_n \mid |h_e|) = \mathrm{BER}(e_n \mid h_e)$ in (8) (with $\gamma$ replaced by (7)). Combining (7) and (8) and using the result in (13) yields

$$ \mathrm{BER}(e_n \mid h_p) = T_1 - T_{21} + T_{22}, \tag{14} $$

where

$$ T_1 = \int_0^{\infty} a_n \exp\left(-\frac{b_n \mathcal{E}_d |h_e|^2}{M_n (N_0 + g \mathcal{E}_d \sigma_{\epsilon_e}^2(l))}\right) p(|h_e| \mid h_p)\; d|h_e|, \tag{15} $$

$$ T_{21} = \int_0^{\psi_{n,T}} a_n \exp\left(-\frac{b_n \mathcal{E}_d |h_e|^2}{M_n (N_0 + g \mathcal{E}_d \sigma_{\epsilon_e}^2(l))}\right) p(|h_e| \mid h_p)\; d|h_e|, \tag{16} $$

$$ T_{22} = \frac{1}{2} \int_0^{\psi_{n,T}} p(|h_e| \mid h_p)\; d|h_e|. \tag{17} $$

The integration limit $\psi_{n,T}$ in (16) and (17) reflects the limit $\gamma_{n,T}$ in (8) and is found by solving (8) (with $\gamma$ replaced by (7)) with respect to $|h_e|$ at the point where the two cases are equal. Thus, $\psi_{n,T} = |h_e| = \sqrt{M_n (\mathcal{E}/\bar\gamma + g \mathcal{E}_d \sigma_{\epsilon_e}^2(l)) \ln(2a_n) / (b_n \mathcal{E}_d)}$. Closed-form solutions for these integrals are achievable but too long to reproduce here. An expression for the BER given the predicted CSNR, $\mathrm{BER}(e_n|\hat\gamma)$, is found by defining the instantaneous predicted CSNR as $\hat\gamma = \bar{\mathcal{E}}_d |h_p|^2/N_0$ and using it in the solutions of the integrals (15)–(17). The result is shown in (18), where $Q(\cdot,\cdot)$ is the normalized incomplete gamma function [9, Eq. (11.3)], $A_n = b_n/(M_n(\mathcal{E}/\bar\gamma + g \mathcal{E}_d \sigma_{\epsilon_e}^2(l)))$ and $d_n = 1/(A_n \mathcal{E}_d \tilde\sigma_{h_e}^2 + 1)$.

Now, the average predicted CSNR is given by $\bar{\hat\gamma} = \bar{\mathcal{E}}_d \sigma_{h_p}^2(l)/N_0 = \bar{\mathcal{E}}_d (1 - \sigma_{\epsilon_p}^2(l))/N_0$. Since the predicted channel is complex Gaussian with zero mean, the predicted CSNR is exponentially distributed with pdf

$$ p(\hat\gamma) = \frac{1}{\bar{\hat\gamma}} \exp\left(-\frac{\hat\gamma}{\bar{\hat\gamma}}\right). \tag{19} $$

This pdf will be used when we derive the overall BER in the next section.
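As a brief worked check of the steps above (a sketch written out here for convenience, not reproduced from the paper), the decomposition in (14) follows from splitting the integral in (13) at the amplitude $\psi_{n,T}$ where the two branches of (8) meet. The shorthand $A_n$ is the one defined in this subsection, with $N_0 = \mathcal{E}/\bar\gamma$.

```latex
% Split (13) at psi_{n,T}: exponential branch above, constant 1/2 below.
\begin{align*}
\mathrm{BER}(e_n \mid h_p)
  &= \int_{\psi_{n,T}}^{\infty} a_n e^{-A_n \mathcal{E}_d |h_e|^2}\,
       p(|h_e| \mid h_p)\, d|h_e|
   + \frac{1}{2} \int_{0}^{\psi_{n,T}} p(|h_e| \mid h_p)\, d|h_e| \\
  &= T_1 - T_{21} + T_{22}.
\end{align*}
% The limit psi_{n,T} makes the two branches of (8) equal.
\begin{equation*}
a_n e^{-A_n \mathcal{E}_d \psi_{n,T}^2} = \frac{1}{2}
\quad\Longrightarrow\quad
\psi_{n,T} = \sqrt{\frac{\ln(2a_n)}{A_n \mathcal{E}_d}}
           = \sqrt{\frac{M_n \left(N_0 + g \mathcal{E}_d \sigma_{\epsilon_e}^2(l)\right) \ln(2a_n)}{b_n \mathcal{E}_d}} .
\end{equation*}
```

The first integral equals $T_1 - T_{21}$ because $T_1$ runs over $[0, \infty)$ and $T_{21}$ removes the part below $\psi_{n,T}$.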

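IV. ADAPTIVE PSAM

A. ASE Analysis

As indicated earlier, $\mathcal{E}$ is the total average power for both pilot and data symbols. Thus, the average power per data symbol is $\bar{\mathcal{E}}_d = \alpha L \mathcal{E}/(L-1)$, and the power per pilot symbol is $\mathcal{E}_{pl} = (1-\alpha) L \mathcal{E}$. The goal of this section is to design an adaptive system based on the CSI. The system adapts the code, the pilot spacing $L$, and the power allocation ratio $\alpha$ in such a way that the ASE is maximized while keeping $\mathrm{BER}(e_n|\hat\gamma) \le \mathrm{BER}_0$. The approach accounts for both estimation and prediction errors, as can be seen from the expression for $\mathrm{BER}(e_n|\hat\gamma)$ in (18):

$$
\begin{aligned}
\mathrm{BER}(e_n|\hat\gamma) ={}& a_n d_n \exp\left(-\frac{\hat\gamma\, d_n A_n \mathcal{E} \mathcal{E}_d}{\bar\gamma \bar{\mathcal{E}}_d}\right) \\
&- a_n d_n \exp\left(-\frac{\hat\gamma\, d_n \mathcal{E} (A_n \mathcal{E}_d + 1/\tilde\sigma_{h_e}^2)}{\bar\gamma \bar{\mathcal{E}}_d}\right) \sum_{m=0}^{\infty} \frac{1}{m!} \left(\frac{\hat\gamma\, d_n \mathcal{E}}{\bar\gamma \bar{\mathcal{E}}_d \tilde\sigma_{h_e}^2}\right)^{m} \left[1 - Q\left(1+m,\ \frac{(\psi_{n,T})^2}{d_n \tilde\sigma_{h_e}^2}\right)\right] \\
&+ \frac{1}{2} \exp\left(-\frac{\hat\gamma\, \mathcal{E}}{\bar\gamma \bar{\mathcal{E}}_d \tilde\sigma_{h_e}^2}\right) \sum_{m=0}^{\infty} \frac{1}{m!} \left(\frac{\hat\gamma\, \mathcal{E}}{\bar\gamma \bar{\mathcal{E}}_d \tilde\sigma_{h_e}^2}\right)^{m} \left[1 - Q\left(1+m,\ \frac{(\psi_{n,T})^2}{\tilde\sigma_{h_e}^2}\right)\right].
\end{aligned}
\tag{18}
$$

The prediction error variance is largest when we predict the last symbol in a frame (i.e., $l = L-1$), while the estimation error variance is almost the same for all $l$ when the estimator order is $K_e \ge 20$ [4]. Thus, we use $\sigma_{\epsilon_e}^2(L-1)$ and the conservative choice of $\sigma_{\epsilon_p}^2(L-1)$ when deriving the switching thresholds as well as in the optimization process. The optimal switching thresholds, $\{\hat\gamma_n\}_{n=1}^{N}$, are found by solving $\mathrm{BER}(e_n|\hat\gamma) = \mathrm{BER}_0$ with respect to $\hat\gamma_n$. If $\hat\gamma \in [\hat\gamma_n, \hat\gamma_{n+1}\rangle$, the $n$th constellation is used. No transmission takes place when $\hat\gamma < \hat\gamma_1$. We assume that $\hat\gamma_0 = 0$. The thresholds must be found by numerical search, since $\mathrm{BER}(e_n|\hat\gamma) = \mathrm{BER}_0$ has no analytical solution in the coded case (in contrast to the uncoded case of [4]). Since there is no transmission when $\hat\gamma < \hat\gamma_1$, the actual transmission data power is [1, 4]

$$ \mathcal{E}_d = \frac{\bar{\mathcal{E}}_d}{\int_{\hat\gamma_1}^{\infty} p(\hat\gamma)\, d\hat\gamma} = \bar{\mathcal{E}}_d \exp(\hat\gamma_1/\bar{\hat\gamma}). \tag{20} $$

For a $2G$-dimensional trellis code (one $2G$-dimensional symbol is transmitted as a sequence of $G$ consecutive complex coded symbols), the SE for the $n$th constellation is $R_n = (1 - 1/L)(\log_2(M_n) - 1/G)$, where the factor $1 - 1/L$ accounts for the fact that every $L$th channel symbol is a pilot, on which no information data are transmitted. Let $P_n = P(\hat\gamma_n < \hat\gamma < \hat\gamma_{n+1})$, i.e., the probability that $\hat\gamma \in [\hat\gamma_n, \hat\gamma_{n+1}\rangle$. With $R_n$ as defined above, the ASE is given by

$$ \mathrm{ASE} = \sum_{n=1}^{N} R_n P_n = \frac{L-1}{L} \left\{ \exp\left(-\frac{\hat\gamma_1}{\bar{\hat\gamma}}\right) \left(\log_2(M_1) - \frac{1}{G}\right) + \sum_{n=2}^{N} \exp\left(-\frac{\hat\gamma_n}{\bar{\hat\gamma}}\right) \log_2\left(\frac{M_n}{M_{n-1}}\right) \right\}. \tag{21} $$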
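The power split and the two forms of the ASE in (21) can be checked numerically with the sketch below. All function names are hypothetical, and the thresholds used in any call are illustrative inputs rather than thresholds obtained from (18).

```python
import math


def power_split(alpha, L, E):
    """Return (E_pl, E_d_bar): pilot power and average data-symbol power."""
    E_pl = (1.0 - alpha) * L * E
    E_d_bar = alpha * L * E / (L - 1)
    return E_pl, E_d_bar


def mode_probabilities(thresholds, mean_csnr):
    """P_n for each mode under the exponential pdf (19); gamma_{N+1} = inf."""
    tails = [math.exp(-g / mean_csnr) for g in thresholds] + [0.0]
    return [tails[n] - tails[n + 1] for n in range(len(thresholds))]


def ase_sum(thresholds, sizes, L, G, mean_csnr):
    """ASE as sum of R_n * P_n with R_n = (1 - 1/L)(log2 M_n - 1/G)."""
    probs = mode_probabilities(thresholds, mean_csnr)
    rates = [(1 - 1 / L) * (math.log2(M) - 1 / G) for M in sizes]
    return sum(r * p for r, p in zip(rates, probs))


def ase_telescoped(thresholds, sizes, L, G, mean_csnr):
    """Right-hand side of (21), obtained by telescoping the sum over modes."""
    total = math.exp(-thresholds[0] / mean_csnr) * (math.log2(sizes[0]) - 1 / G)
    for n in range(1, len(sizes)):
        total += (math.exp(-thresholds[n] / mean_csnr)
                  * math.log2(sizes[n] / sizes[n - 1]))
    return (L - 1) / L * total
```

One pilot plus $L-1$ data symbols per frame gives a frame power of $\mathcal{E}_{pl} + (L-1)\bar{\mathcal{E}}_d = L\mathcal{E}$, so the average power per channel symbol is $\mathcal{E}$ for any $\alpha$; both powers equal $\mathcal{E}$ when $\alpha = 1 - 1/L$.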

Because of Nyquist signaling, $L$ cannot exceed $L_{\max} = \lfloor 1/(2 f_d T_s) \rfloor$ [6, 7], where $f_d$ is the Doppler frequency and $T_s$ is the channel symbol duration. For $L \in [2, L_{\max}]$ we have the following optimization problem:

$$ \max_{\alpha}\ \mathrm{ASE}(\alpha) \quad \text{subject to} \quad 0 < \alpha < 1. \tag{22} $$

At this point, the optimization process does not include $\mathcal{E}_d$ in the constraint as in [4]. From (20) we see that $\mathcal{E}_d$ depends on the first threshold (hence on the first constellation). Thus, for the first constellation, given a value of $\alpha$, $\mathcal{E}_d$ is found by letting it vary within the whole range of $\hat\gamma$. Then, solving $\mathrm{BER}(e_n|\hat\gamma) = \mathrm{BER}_0$ for that $\mathcal{E}_d$ value yields $\hat\gamma_1$ (which is optimal if $\alpha$ is optimal). Once $\hat\gamma_1$ is known, $\mathcal{E}_d$ is explicitly given by (20), and it is used for finding the other thresholds. In this way, the maximization in (22) is done simply by picking the $\alpha \in \langle 0, 1 \rangle$ that maximizes the ASE. After solving (22) for each $L$, the maximum ASE is found by searching over all possible $L$. We can therefore find the optimum values of $L$ and $\alpha$.
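The two-level search described above (maximize over $\alpha$ for each $L$, then search over $L$) could be organized as in the following sketch. It assumes a user-supplied callable `ase_fn(alpha, L)`; the paper's actual objective requires the thresholds from (18) and is not implemented here. The `toy_ase` function is a made-up stand-in used only to exercise the loop.

```python
def optimize_alpha_and_L(ase_fn, L_max, alpha_grid):
    """Exhaustive search over L in [2, L_max] and alpha in the open interval (0, 1).

    Returns a tuple (best_ase, best_alpha, best_L).
    """
    best = (float("-inf"), None, None)
    for L in range(2, L_max + 1):
        for alpha in alpha_grid:
            if not 0.0 < alpha < 1.0:
                continue  # constraint of (22)
            value = ase_fn(alpha, L)
            if value > best[0]:
                best = (value, alpha, L)
    return best


def toy_ase(alpha, L):
    # Toy stand-in objective, NOT the ASE of the paper.
    # It has a single maximum at alpha = 0.8, L = 12.
    return -(alpha - 0.8) ** 2 - 0.001 * (L - 12) ** 2


# Uniform grid over (0, 1); the resolution is an arbitrary choice.
alpha_grid = [i / 100 for i in range(1, 100)]
```

A grid search is only one possible choice here; any one-dimensional maximizer over $\alpha$ could replace the inner loop.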

B. Average BER Performance Analysis

In the previous subsection, the optimal switching thresholds, $\{\hat\gamma_n\}_{n=1}^{N}$, were found to meet the $\mathrm{BER}_0$ constraint. Those thresholds are used here to find the overall average BER. First we need the average BER for the $n$th constellation, which can be found by

$$ \mathrm{BER}(e_n) = \int_{\hat\gamma_n}^{\hat\gamma_{n+1}} \mathrm{BER}(e_n|\hat\gamma)\, p(\hat\gamma)\, d\hat\gamma = V_1 - V_{21} + V_{22}, \tag{23} $$

where $V_1$, $V_{21}$ and $V_{22}$ correspond to the three terms of (18) integrated over the interval. The overall average BER (averaged over all codes) is given as the ratio between the average number of bits in error and the total number of bits transmitted [3, 7]:

$$ \overline{\mathrm{BER}} = \frac{\sum_{n=1}^{N} R_n\, \mathrm{BER}(e_n)}{\sum_{n=1}^{N} R_n P_n}. \tag{24} $$

A closed-form solution for (23) is also achievable, but it is likewise too long to reproduce here.

V. RESULTS AND DISCUSSION

To illustrate the scheme presented so far, we consider an ACM system which uses a set of 8 QAM signal constellations of sizes $\{M_n\} = \{4, 8, 16, 32, 64, 128, 256, 512\}$ to encode and decode eight 4-dimensional trellis codes (as in [2, 7]). The carrier frequency is 2 GHz and the channel symbol duration is 5 µs. This corresponds to a channel bandwidth of 200 kHz when using Nyquist signaling. We let the feedback delay be $\tau = 1$ ms, or equivalently $\tau = 0.2/f_d$ (which is moderately large, see for instance [3]), and the mobile speed be $v = 30$ m/s. As a result, the Doppler frequency is $f_d = 200$ Hz. We set $\mathrm{BER}_0 = 10^{-5}$, and choose the order of the estimator and predictor to be $K_e = 20$ and $K_p = 250$, respectively. The choice of $K_p = 250$ leads to a suboptimal but satisfactory predictor [10]. The parameters are chosen to be the same as in [4], so that the results can be compared directly.

Figure 2: Optimum power allocation. (Plot of $\alpha$ versus average CSNR; curves: optimal power, equal power.)
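The numbers quoted for this setup are mutually consistent, as the short check below illustrates. It assumes the rounded value $3 \times 10^8$ m/s for the speed of light (an assumption, not stated in the paper) and uses exact rational arithmetic for the symbol duration to avoid floating-point rounding in the floor operation. The variable names are hypothetical.

```python
from fractions import Fraction
from math import floor

SPEED_OF_LIGHT = 3e8                # m/s, rounded value (assumption)
carrier_hz = 2e9                    # 2 GHz carrier
speed_m_s = 30.0                    # mobile speed
symbol_time = Fraction(5, 10**6)    # 5 microseconds, kept exact
tau = Fraction(1, 1000)             # 1 ms feedback delay

# Maximum Doppler shift f_d = v * f_c / c.
doppler_hz = speed_m_s * carrier_hz / SPEED_OF_LIGHT

# Nyquist signaling: bandwidth equals the symbol rate 1 / T_s.
bandwidth_hz = 1 / symbol_time

# Normalized feedback delay tau * f_d.
tau_times_fd = tau * int(doppler_hz)

# Upper bound on the pilot spacing, L_max = floor(1 / (2 f_d T_s)).
L_max = floor(1 / (2 * int(doppler_hz) * symbol_time))
```

With these values the bound on the pilot spacing evaluates to $L_{\max} = 500$.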

Figure 3: Optimum pilot spacing. (Plot of $L$ versus average CSNR; curves: optimally allocated, equally allocated.)

Fig. 2 shows the optimum power ratio between pilot and data symbols, while Fig. 3 shows the optimum pilot period. We see that more power is allocated to pilot symbols than to data symbols. This can partially be explained by the fact that the data are already protected by the code, so less data power is needed and power can be more freely distributed to the pilots. When the CSNR is about 30 dB and higher, the spacing between two pilots increases steeply (cf. Fig. 3). To compensate for the large pilot spacing, more power is put on the pilot symbols to maintain $\mathrm{BER}_0$. That explains why the curve for the optimum $\alpha$ bends down at high CSNRs.

We observe that our optimum ASE is approximately 0.7 bits/s/Hz higher than for uncoded M-QAM for the same system (cf. Fig. 4 and [4, Fig. 5]). This corresponds to a gain of about 3 dB in average CSNR. This gain is due to the time diversity provided by coding [5, pp. 680–681].

Figure 4: Average spectral efficiency. (Plot of average spectral efficiency [bits/s/Hz] versus average CSNR; curves: optimal power with optimal $L$; optimal power with $L = 10$; equal power with optimal $L$.)

Average BER is plotted in Fig. 5, and it is always lower than $\mathrm{BER}_0$. This is as expected, since the thresholds for switching between different codes are calculated to keep $\mathrm{BER}(e_n|\hat\gamma) \le \mathrm{BER}_0$; the average must therefore be lower. The BER curve for optimum power allocation and optimum $L$ is also (unnecessarily) lower than the one obtained when only the optimum $L$ is considered. On the other hand, the throughput in terms of ASE is still 0.5 bits/s/Hz higher when the power distribution is optimum.

Figure 5: Overall average BER. (Plot of average BER versus average CSNR; curves: optimal power with optimal $L$; equal power with optimal $L$.)

VI. CONCLUSION

In this paper we have investigated an ACM system where both the pilot symbol spacing and the power allocation between pilot and data symbols are optimized. Both channel estimation errors in the receiver and channel prediction errors at the transmitter have been taken into account. The ASE performance of our system, which is based on TCM, is approximately 0.7 bits/s/Hz higher than for uncoded M-QAM. The performance is also higher than when pilot symbols are transmitted at a fixed rate, or when both pilot and data symbols have equal power. The BER performance curves lie below the target BER for the whole CSNR range. A very interesting extension is to include receive antenna diversity in this system; this is a topic for further research.

REFERENCES

[1] A. J. Goldsmith and S.-G. Chua, “Variable-rate variable-power MQAM for fading channels,” IEEE Transactions on Communications, vol. 45, no. 10, pp. 1218–1230, Oct. 1997.

[2] K. J. Hole, H. Holm, and G. E. Øien, “Adaptive multidimensional coded modulation over flat fading channels,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 7, pp. 1153–1158, July 2000.

[3] M.-S. Alouini and A. J. Goldsmith, “Adaptive modulation over Nakagami fading channels,” Kluwer J. Wireless Communications, vol. 13, pp. 119–143, May 2000.

[4] X. Cai and G. B. Giannakis, “Adaptive PSAM accounting for channel estimation and prediction errors,” to appear in IEEE Transactions on Wireless Communications, 2004.

[5] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation and Signal Processing. John Wiley & Sons, 1998.

[6] J. K. Cavers, “An analysis of pilot symbol assisted modulation for Rayleigh fading channels,” IEEE Transactions on Vehicular Technology, vol. 40, no. 4, pp. 686–693, Nov. 1991.

[7] G. E. Øien, H. Holm, and K. J. Hole, “Impact of channel prediction on adaptive coded modulation performance in Rayleigh fading,” IEEE Transactions on Vehicular Technology, vol. 53, no. 3, pp. 758–769, May 2004.

[8] C. W. Therrien, Discrete Random Signals and Statistical Signal Processing. Signal Processing Series. Englewood Cliffs, NJ: Prentice Hall, 1992.

[9] N. M. Temme, Special Functions: An Introduction to the Classical Functions of Mathematical Physics. New York: John Wiley & Sons Inc., 1996.

[10] G. E. Øien, R. K. Hansen, D. V. Duong, H. Holm, and K. J. Hole, “Bit error rate analysis of adaptive coded modulation with mismatched and complexity-limited channel prediction,” in Proc. IEEE Nordic Signal Processing Symposium (NORSIG-2002), Norway, Oct. 2002.

I NTRODUCTION

Adaptive coded modulation is a promising technique to combat fading and improve spectral efficiency on wireless channels [1, 2, 3]. In order to adapt to the varying conditions of the channel, known symbols (pilot symbols) are typically transmitted along with the information symbols, at regular intervals. The pilots are extracted at the receiver side and are used to estimate and predict the channel. The predicted channel state information (CSI) is fed back to the transmitter. Based on this, the transmitter can dynamically adapt the rate and the power for the code to the mode which is most suitable for the channel, and improve ASE while keeping the instantaneous bit error rate (BER) below the target BER (BER0 ). The capacity-achieving adaptive scheme may be approximated by using discrete rate updating [2]. The scheme presented in this paper is a generalization of a recent paper on adaptive, uncoded modulation by Cai and Giannakis [4]. We extend their concept by looking into the case when channel coding is included. Specifically, we consider pilot-symbol-assisted trellis-coded modulation (PSAM). As in [4], the pilot period and the power allocation between pilot and data symbols are optimized, and both estimation and prediction errors in the optimization process are considered. As in [2], trellis codes designed for AWGN channels are used as component codes. The reader should be aware that some notation used in this paper has other meanings than in the reference paper, and that we omit the time indices wherever it is possible, for notational simplicity. D(z) represents the diagonal matrix with vector z on its diagonal. IK is the K × K identity matrix. Superscripts T , ∗ and H stand for transpose, complex conjugate, and Hermitian transpose, respectively. bzc denotes the biggest integer less than z. The remainder of this paper is organized as follows. Section II describes the system we are considering. BER ana-

lysis is treated in Section III and the adaptive PSAM optimization follows in Section IV. The concluding remarks can be found in Section VI. II.

S YSTEM OVERVIEW

We study the system illustrated in Fig. 1. The adaptive encoder/modulator and decoder/demodulator use a set of MQAM constellations of sizes {Mn }N n=1 with corresponding spectral efficiencies (SE)s {Rn }N n=1 . Here, R1 < R2 < · · · < RN . If the predicted CSNR falls into the interval γˆ ∈ [ˆ γn , γˆn+1 i, the nth constellation (i.e., SE Rn ) will be used to maintain BER(en |ˆ γ ) ≤ BER0 . No transmission is allowed when γˆ < γˆ1 since none of the codes can guarantee BER(en |ˆ γ ) ≤ BER0 in this interval. We define γˆN +1 = ∞ and the code corresponding to the highest possible rate is always utilized. A pilot symbol is inserted at the beginning of each frame and is followed by L − 1 information symbols. In the receiver, coherent detection is performed based on the estimates from the estimator. While the predictor is a causal filter, the estimator is constructed to utilize information both from past and future symbols. Therefore, we must allow our system to have an additional decoding delay. Both the estimator and the predictor is linear, and optimal in the maximum a posteriori (MAP) sense [5]. The predictor uses Kp pilot symbols from earlier pilot time instants to predict τ seconds ahead. Note that τ is the feedback channel delay, consisting of the actual transmission delay and processing time needed by the transmitter to activate the chosen code for transmission. Similar to [4], we denote the received, noisy and faded pilot symbols in complex baseband as p ypl (k) = Epl h(k; 0)s(k; 0) + η(k; 0) (1) and the received data symbols as p yd (k; l) = Ed h(k; l)s(k; l)+η(k; l), l ∈ [1, L−1], (2) where the index k counts the frame, and l counts the symbol within that frame. Ed and Epl is the power per data and per pilot symbol, respectively, to be optimized later in SecL−1 tion IV-A; s(k; 0) is the pilot symbol, and {s(k; l)}l=1 are the data symbols in the kth frame. 
Further calculation is done by assuming that E[|s(k; l)|2 ] = |s(k; 0)|2 = 1; η(·) is AWGN with zero mean, and variance N0 /2 per dimension, with the dimensions being uncorrelated. The channel fading envelope h(k; l) is a stationary complex Gaussian random process (RP) with zero mean, and variance σh2 = 1. The adaptive scheme presented here will optimize the pilot symbol spacing and at the same time find the optimal power allocation between pilot and information symbols to maximize the ASE while keeping BER below BER0 . Thus,

Adaptive encoder/ modulator

Power control

Pilot insertion

Zero−error Feedback channel

Fading channel

Adaptive decoder/ demodulator

Buffer

Power selection

Channel predictor

Constellation selection

Channel estimator

Figure 1: The adaptive PSAM system model

we need to analyze BER performance in the presence of both estimation and prediction errors. III.

γ(k; l) =

BER A NALYSIS

A. BER in the Presence of Channel Estimation Errors Let he (k; l) be the linear estimate of h(k; l). The estimation error is given by e (k; l) = h(k; l) − he (k; l). The mean square error (MSE) is defined as σ2e (l) = E[|e (k; l)|2 ]. Since the channel is a Gaussian RP with zero mean, he (k; l) and e (k; l) are zero mean Gaussian random variables. The purpose here is to find an expression for the BER after having estimated the channel and found the MSE σ2e (l). Let Ke be the order of the estimator. That is, this estimator uses Ke pilot symbols, ypl (k − bKe /2c), · · · , ypl (k + b(Ke − 1)/2c), to estimate one sample. If we define the pilot symbol vector s = [s(k−bKe /2c; 0), · · · , s(k+b(Ke − 1)/2c; 0)]T , h = [h(k − bKe /2c; 0), · · · , h(k + b(Ke − 1)/2c; 0)]T , R = E[hhH ], and re = E[hh∗ (k; l)], the linear channel estimator can be written as [4, 5] p −1 we = Epl (Epl D(s)RD∗ (s) + N0 IKe ) D(s)re . (3) Since the received samples are complex Gaussian, the optimally estimated channel in the MAP sense is a linear combination of the observations [5, p.741-742], and the estimated channel and the corresponding channel MMSE are given by [4, 5, 6] he (k; l) = weH y(k) and σ2e (l) = 1 −

(4)

p ∗ Epl rH e D (s)we

(5)

respectively, where y(k) = [ypl (k − bKe /2c), · · · , ypl (k + b(Ke − 1)/2c)]T . By assuming that the amplitude of the pilot symbols is constant for a given channel model, and defining uk as the kth eigenvector of R, with the corresponding eigenvalue λk , σ2e (l) in (5) can be expressed as σ2e (l) = 1 −

time of detection is given by [4]

Ke X |uH re |2 (1 − α)L¯ γ k

k=1

(1 − α)L¯ γ λk + 1

.

(6)

γ¯ in the above equation is the average CSNR and is equal to E[|h(k; l)|2 ]E/N0 = E/N0 , where E is the total average power the system is allowed to use (for both data and pilot symbols). It shows clearly that σ2e (l) depends on, among other things, α and L. The former determines the power allocation between pilot and data symbols (equal when α = 1 − 1/L). We will later optimize the system with respect to both α and L. Given an estimate of the channel, he (k; l), we do a symbol-by-symbol maximum √ likelihood detection by calculating z(k; l) = yd (k; l)/( Ed he (k; l)). The CSNR at the

Ed |he (k; l)|2 , N0 + gEd σ2e (l)

(7)

where g = 1 for 4-QAM, and g = 1.3 for M-QAM with M > 4. Tight approximations for trellis-coded modulation (TCM) BER performance on AWGN channels at high CSNR can be found in [2, 7]: ( bn an exp − M γ when γ ≥ γn,T n BER(en |γ) = (8) 1/2 when γ < γn,T , where Mn is the number of points in the symbol constellation used by the trellis code, an and bn are constellation dependent constants which can be found in [7, Tab. I], and γn,T = Mn ln(2an )/bn is the CNSR threshold where the BER expression goes from 1/2 to the exponential approximation. By replacing γ in (8) with (7), we have the conditional BER when we have the estimate of the channel. With this expression at hand, we will derive the BER when we have predicted the channel in the next subsection. B.

BER in the Presence of Both Channel Estimation and Prediction Errors

The linear channel predictor predicts a sample of the set {h(k; l)}L−1 l=1 by using Kp pilot symbols in the past. Let hp (k; l) denotes the predicted channel. The prediction error is p (k; l) = h(k; l) − hp (k; l), and the MSE of the predicted channel is σ2p (l) = E[|p (k; l)|2 ]. The predicted channel hp (k; l) is also Gaussian distributed with zero mean and variance σh2 p (l) = σh2 − σ2p (l) = 1 − σ2p (l). Suppose now that the feedback delay τ = DLTs , where D is a positive integer and Ts is the channel symbol duration. By defining the pilot symbol vector s = [s(k − D; 0), · · · , s(k − D − Kp + 1; 0)]T , h = [h(k − D; 0), · · · , h(k − D − Kp + 1; 0)]T , R = E[hhH ], and rp (l) = E[hh∗ (k; l)], the MMSE of the predicted channel is written as p ∗ σ2p (l) = 1 − Epl rH (9) p D (s)wp , where wp =

p −1 Epl Epl D(s)RD∗ (s) + N0 IKp D(s)rp (10)

is the causal predictor. Similar to the estimation case, σ2p (l) in (9) can be written as Kp 2 X |uH γ 2 k rp | (1 − α)L¯ σp (l) = 1 − . (11) (1 − α)L¯ γ λk + 1 k=1

The estimated channel can be written as a sum of the predicted channel and the errors as he (k; l) = hp (k; l) + p (k; l) − e (k; l).

(12)

Note that $h_p(k;l)$ and $\epsilon_e(k;l)$ are correlated, with correlation coefficient $\rho = E[\epsilon_e(k;l)h_p^*(k;l)]/\sigma^2_{h_p}(l)$ [4], while $h_p(k;l)$ and $\epsilon_p(k;l)$ are uncorrelated due to the orthogonality principle [8]. As a result, when $h_p(k;l)$ is given, $h_e(k;l)$ is Gaussian distributed with mean $E[h_e(k;l) \mid h_p(k;l)] = (1-\rho)h_p(k;l)$ and variance $\tilde{\sigma}^2_{h_e} = E[|\epsilon_p(k;l) - \epsilon_e(k;l) + \rho h_p(k;l)|^2]$. In practice $\rho$ takes on small values [4], so we set it to zero in our further calculations and simulations. We also assume that $\epsilon_e(k;l)$ and $\epsilon_p(k;l)$ are uncorrelated, so that $\tilde{\sigma}^2_{h_e} = \sigma^2_{\epsilon_e}(l) + \sigma^2_{\epsilon_p}(l)$. Hence, once the predicted channel is given, the amplitude of the estimated channel follows a Rice distribution with Rician factor $K = |(1-\rho)h_p(k;l)|^2/\tilde{\sigma}^2_{h_e} = |h_p(k;l)|^2/\tilde{\sigma}^2_{h_e}$. The BER conditioned on $h_p$ is then given by

$$\mathrm{BER}(e_n \mid h_p) = \int_0^\infty \mathrm{BER}(e_n \mid |h_e|)\,p(|h_e| \mid h_p)\,d|h_e|, \qquad (13)$$

where $p(|h_e| \mid h_p)$ denotes the Rice pdf and $\mathrm{BER}(e_n \mid |h_e|) = \mathrm{BER}(e_n \mid h_e)$ in (8) (with $\gamma$ replaced by (7)). Combining (7) and (8) and using the result in (13) yields

$$\mathrm{BER}(e_n \mid h_p) = T_1 - T_{21} + T_{22}, \qquad (14)$$

where

$$T_1 = \int_0^\infty a_n \exp\!\left(-\frac{b_n E_d |h_e|^2}{M_n\left(N_0 + gE_d\sigma^2_{\epsilon_e}(l)\right)}\right) p(|h_e| \mid h_p)\,d|h_e|, \qquad (15)$$

$$T_{21} = \int_0^{\psi_{n,T}} a_n \exp\!\left(-\frac{b_n E_d |h_e|^2}{M_n\left(N_0 + gE_d\sigma^2_{\epsilon_e}(l)\right)}\right) p(|h_e| \mid h_p)\,d|h_e|, \qquad (16)$$

$$T_{22} = \frac{1}{2}\int_0^{\psi_{n,T}} p(|h_e| \mid h_p)\,d|h_e|. \qquad (17)$$
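The conditional average in (13) can be checked by simulation. The following is a minimal Monte Carlo sketch, not the closed-form evaluation used in the paper: it draws $h_e = h_p + w$ with $w$ complex Gaussian of variance $\tilde{\sigma}^2_{h_e}$ (the $\rho = 0$ case), applies the two-branch BER of (8), and compares the unclipped case with the standard moment-generating-function result for $T_1$. The shorthand `s` for $b_n E_d / (M_n(N_0 + gE_d\sigma^2_{\epsilon_e}(l)))$, the function names, and all numerical values are hypothetical.

```python
import math
import random

def conditional_ber_mc(hp_abs, var_tilde, a_n, s, psi, trials=50_000, seed=1):
    """Monte Carlo estimate of BER(e_n | h_p) in (13).

    |h_e| is Rician given h_p: h_e = h_p + w, w ~ CN(0, var_tilde).
    Below the amplitude psi the BER is set to 1/2, as in (8).
    """
    rng = random.Random(seed)
    sd = math.sqrt(var_tilde / 2.0)  # standard deviation per real dimension
    total = 0.0
    for _ in range(trials):
        re = hp_abs + rng.gauss(0.0, sd)
        im = rng.gauss(0.0, sd)
        mag2 = re * re + im * im
        total += a_n * math.exp(-s * mag2) if mag2 >= psi * psi else 0.5
    return total / trials

def t1_closed_form(hp_abs, var_tilde, a_n, s):
    """T1 in (15) via the MGF of a noncentral chi-square variable."""
    d_n = 1.0 / (s * var_tilde + 1.0)
    return a_n * d_n * math.exp(-d_n * s * hp_abs ** 2)

# Illustrative values only.
unclipped = conditional_ber_mc(1.0, 0.1, 1.0, 2.0, psi=0.0)
psi_nt = math.sqrt(math.log(2.0 * 1.0) / 2.0)  # a_n * exp(-s * psi^2) = 1/2
clipped = conditional_ber_mc(1.0, 0.1, 1.0, 2.0, psi=psi_nt)
```

With `psi=0.0` the terms $T_{21}$ and $T_{22}$ vanish, so the estimate should agree with `t1_closed_form` up to sampling noise; with the clipping amplitude in place the estimate can only decrease, since the exponential branch exceeds 1/2 below $\psi_{n,T}$.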

The integration limit $\psi_{n,T}$ in (16) and (17) reflects the limit $\gamma_{n,T}$ in (8). It is found by solving (8) (with $\gamma$ replaced by (7)) with respect to $|h_e|$ at the point where the two cases are equal. Thus, $\psi_{n,T} = |h_e| = \sqrt{M_n\left(E/\bar{\gamma} + gE_d\sigma^2_{\epsilon_e}(l)\right)\ln(2a_n)/(b_nE_d)}$. Closed-form solutions for these integrals are achievable but too long-winded to reproduce here. An expression for the BER given the predicted CSNR, $\mathrm{BER}(e_n \mid \hat{\gamma})$, is found by defining the instantaneous predicted CSNR as $\hat{\gamma} = \bar{E}_d|h_p|^2/N_0$ and using it in the solutions of the integrals (15)–(17). The result is shown in (18), where $Q(\cdot,\cdot)$ is the normalized incomplete gamma function [9, Eq. (11.3)], $A_n = b_n/\left(M_n(E/\bar{\gamma} + gE_d\sigma^2_{\epsilon_e}(l))\right)$, and $d_n = 1/(A_nE_d\tilde{\sigma}^2_{h_e} + 1)$.

The average predicted CSNR is given by $\bar{\hat{\gamma}} = \bar{E}_d\sigma^2_{h_p}(l)/N_0 = \bar{E}_d(1 - \sigma^2_{\epsilon_p}(l))/N_0$. Since the predicted channel is complex Gaussian with zero mean, the predicted CSNR is exponentially distributed, with pdf

$$p(\hat{\gamma}) = \frac{1}{\bar{\hat{\gamma}}}\exp\!\left(-\frac{\hat{\gamma}}{\bar{\hat{\gamma}}}\right). \qquad (19)$$

This pdf will be used when we derive the overall BER in the next section.

IV. ADAPTIVE PSAM

A. ASE Analysis

As indicated earlier, $E$ is the total average power for both pilot and data symbols. Thus, the average power per data symbol is $\bar{E}_d = \alpha LE/(L-1)$, and per pilot symbol it is $E_{pl} = (1-\alpha)LE$. The goal of this section is to design an adaptive system based on the CSI. The system adapts the code, the pilot spacing $L$, and the power allocation ratio $\alpha$ in such a way that the ASE is maximized while keeping $\mathrm{BER}(e_n \mid \hat{\gamma}) \le \mathrm{BER}_0$. The approach accounts for both estimation and prediction errors, as can be seen from the expression for $\mathrm{BER}(e_n \mid \hat{\gamma})$ in (18). The prediction error variance is largest when we predict the last symbol in a frame (i.e., $l = L-1$), while the estimation error variance is almost the same for all $l$ when the estimator order $K_e \ge 20$ [4]. Thus, we use $\sigma^2_{\epsilon_e}(L-1)$ and the conservative choice $\sigma^2_{\epsilon_p}(L-1)$ when deriving the switching thresholds as well as in the optimization process.

The optimal switching thresholds, $\{\hat{\gamma}_n\}_{n=1}^N$, are found by solving $\mathrm{BER}(e_n \mid \hat{\gamma}) = \mathrm{BER}_0$ with respect to $\hat{\gamma}_n$. If $\hat{\gamma} \in [\hat{\gamma}_n, \hat{\gamma}_{n+1})$, the $n$th constellation is used. No transmission takes place when $\hat{\gamma} < \hat{\gamma}_1$. We assume that $\hat{\gamma}_0 = 0$. The thresholds have to be found by numerical search, since $\mathrm{BER}(e_n \mid \hat{\gamma}) = \mathrm{BER}_0$ has no analytical solution in the coded case (in contrast to the uncoded case of [4]). Since there is no transmission when $\hat{\gamma} < \hat{\gamma}_1$, the actual transmitted data power is [1, 4]

$$E_d = \frac{\bar{E}_d}{\int_{\hat{\gamma}_1}^\infty p(\hat{\gamma})\,d\hat{\gamma}} = \bar{E}_d\exp\!\left(\hat{\gamma}_1/\bar{\hat{\gamma}}\right). \qquad (20)$$
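The numerical threshold search and the power scaling in (20) can be sketched as follows. This is only an illustration under stated assumptions: the search is a plain bisection that relies on the BER being monotonically decreasing in the predicted CSNR, and the BER model used in the example is the error-free exponential form $a_n\exp(-b_n\gamma/M_n)$ rather than the full expression (18). The values of `a_n`, `b_n` and the average CSNR are hypothetical placeholders, not the fitted code parameters of [2].

```python
import math

def find_threshold(ber_fn, ber_target, lo=0.0, hi=1e6, iters=200):
    """Bisection for the smallest CSNR with ber_fn(csnr) <= ber_target.

    Assumes ber_fn is monotonically decreasing on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ber_fn(mid) > ber_target:
            lo = mid
        else:
            hi = mid
    return hi

def data_power(avg_data_power, gamma_1, gamma_bar_hat):
    """Transmitted data power from (20): the average power is concentrated
    on the fraction exp(-gamma_1 / gamma_bar_hat) of time with transmission."""
    return avg_data_power * math.exp(gamma_1 / gamma_bar_hat)

# Hypothetical code parameters and target, for illustration only.
a_n, b_n, M_n, ber_0 = 900.0, 10.0, 4, 1e-5
gamma_1 = find_threshold(lambda g: a_n * math.exp(-b_n * g / M_n), ber_0)
e_d = data_power(1.0, gamma_1, gamma_bar_hat=100.0)
```

For this simplified model the threshold has the closed form $M_n\ln(a_n/\mathrm{BER}_0)/b_n$, which gives a convenient check on the search; with (18) in place of the lambda, the same routine would be called once per constellation.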

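$$
\begin{aligned}
\mathrm{BER}(e_n \mid \hat{\gamma}) ={}& a_nd_n\exp\!\left(-\frac{\hat{\gamma}\,d_nA_nEE_d}{\bar{\gamma}\bar{E}_d}\right) \\
&- a_nd_n\exp\!\left(-\frac{\hat{\gamma}\,d_nE\,(A_nE_d + 1/\tilde{\sigma}^2_{h_e})}{\bar{\gamma}\bar{E}_d}\right)\sum_{m=0}^{\infty}\frac{1}{m!}\left(\frac{\hat{\gamma}\,d_nE}{\bar{\gamma}\bar{E}_d\tilde{\sigma}^2_{h_e}}\right)^{m}\left[1 - Q\!\left(1+m,\ \frac{\psi_{n,T}^2}{d_n\tilde{\sigma}^2_{h_e}}\right)\right] \\
&+ \frac{1}{2}\exp\!\left(-\frac{\hat{\gamma}E}{\bar{\gamma}\bar{E}_d\tilde{\sigma}^2_{h_e}}\right)\sum_{m=0}^{\infty}\frac{1}{m!}\left(\frac{\hat{\gamma}E}{\bar{\gamma}\bar{E}_d\tilde{\sigma}^2_{h_e}}\right)^{m}\left[1 - Q\!\left(1+m,\ \frac{\psi_{n,T}^2}{\tilde{\sigma}^2_{h_e}}\right)\right] \qquad (18)
\end{aligned}
$$

For a $2G$-dimensional¹ trellis code, the SE for the $n$th constellation is $R_n = (1 - 1/L)(\log_2(M_n) - 1/G)$, where the factor $1 - 1/L$ accounts for the fact that every $L$th channel symbol is a pilot, in which no information data are transmitted. Let $P_n = P(\hat{\gamma}_n < \hat{\gamma} < \hat{\gamma}_{n+1})$, i.e., the probability that $\hat{\gamma} \in [\hat{\gamma}_n, \hat{\gamma}_{n+1})$. The ASE is given by

$$
\begin{aligned}
\mathrm{ASE} &= \sum_{n=1}^{N} R_nP_n \\
&= \frac{L-1}{L}\left\{\exp\!\left(-\frac{\hat{\gamma}_1}{\bar{\hat{\gamma}}}\right)\left(\log_2(M_1) - 1/G\right) + \sum_{n=2}^{N}\exp\!\left(-\frac{\hat{\gamma}_n}{\bar{\hat{\gamma}}}\right)\log_2\!\left(\frac{M_n}{M_{n-1}}\right)\right\}. \qquad (21)
\end{aligned}
$$

¹ One $2G$-dimensional symbol is transmitted as a sequence of $G$ consecutive complex coded symbols.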
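The two forms of the ASE in (21) can be compared numerically. The sketch below computes the mode probabilities $P_n$ from the exponential pdf (19), assuming that the highest mode is used for all $\hat{\gamma} \ge \hat{\gamma}_N$ (i.e., $\hat{\gamma}_{N+1} = \infty$, which (21) implies), and evaluates both the direct sum $\sum_n R_nP_n$ and the telescoped expression. The threshold values and the average predicted CSNR are hypothetical; the constellation sizes and $G = 2$ correspond to the 4-dimensional codes considered in this paper.

```python
import math

def mode_probabilities(thresholds, gamma_bar_hat):
    """P_n = exp(-g_n / gbar) - exp(-g_{n+1} / gbar), with g_{N+1} = infinity."""
    tails = [math.exp(-t / gamma_bar_hat) for t in thresholds] + [0.0]
    return [tails[i] - tails[i + 1] for i in range(len(thresholds))]

def ase_direct(thresholds, sizes, L, G, gamma_bar_hat):
    """ASE as the sum of R_n * P_n, with R_n = (1 - 1/L)(log2 M_n - 1/G)."""
    probs = mode_probabilities(thresholds, gamma_bar_hat)
    return sum((1.0 - 1.0 / L) * (math.log2(m) - 1.0 / G) * p
               for m, p in zip(sizes, probs))

def ase_telescoped(thresholds, sizes, L, G, gamma_bar_hat):
    """ASE in the telescoped form on the right-hand side of (21)."""
    first = math.exp(-thresholds[0] / gamma_bar_hat) * (math.log2(sizes[0]) - 1.0 / G)
    rest = sum(math.exp(-thresholds[n] / gamma_bar_hat) * math.log2(sizes[n] / sizes[n - 1])
               for n in range(1, len(sizes)))
    return (L - 1) / L * (first + rest)

# Hypothetical thresholds (linear scale) for the eight constellations.
sizes = [4, 8, 16, 32, 64, 128, 256, 512]
thresholds = [8.0, 20.0, 45.0, 100.0, 210.0, 440.0, 900.0, 1800.0]
ase_a = ase_direct(thresholds, sizes, L=10, G=2, gamma_bar_hat=100.0)
ase_b = ase_telescoped(thresholds, sizes, L=10, G=2, gamma_bar_hat=100.0)
```

Both functions should return the same value up to rounding; the telescoped form needs only one exponential per threshold, which is convenient inside the optimization over $\alpha$ and $L$.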

Because of Nyquist signaling, $L$ cannot exceed $L_{\max} = \lfloor 1/(2f_dT_s) \rfloor$ [6, 7], where $f_d$ is the Doppler frequency and $T_s$ is the channel symbol duration. For each $L \in [2, L_{\max}]$ we have the following optimization problem:

$$\max_{\alpha}\ \mathrm{ASE}(\alpha) \quad \text{subject to} \quad 0 < \alpha < 1. \qquad (22)$$

At this point, the optimization process does not include $E_d$ in the constraint as in [4]. From (20) we see that $E_d$ depends on the first threshold (and hence on the first constellation). Thus, for the first constellation and a given value of $\alpha$, $E_d$ is found by letting it vary over the whole range of $\hat{\gamma}$. Solving $\mathrm{BER}(e_n \mid \hat{\gamma}) = \mathrm{BER}_0$ for that $E_d$ value then yields $\hat{\gamma}_1$ (which is optimal if $\alpha$ is optimal). Once $\hat{\gamma}_1$ is known, $E_d$ is given explicitly by (20), and it is used for finding the other thresholds. In this way, the maximization in (22) is done simply by picking the $\alpha \in (0, 1)$ which maximizes the ASE. After solving (22) for each $L$, the maximum ASE is found by searching over all possible $L$. We can therefore find the optimum values of $L$ and $\alpha$.
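The two-level search described above (maximize over $\alpha$ for each $L$, then pick the best $L$) can be sketched as an exhaustive grid search. This is a minimal sketch under assumptions: the grid resolution for $\alpha$ is arbitrary, and `toy_ase` is a made-up smooth objective standing in for the real ASE evaluation, which would require solving for the thresholds via (18) and (20) at every grid point.

```python
def optimise_pilot_parameters(ase_fn, l_max, alpha_steps=99):
    """Grid search for (22): best alpha in (0, 1) for each L in [2, l_max],
    followed by the best L overall. Returns (ase, alpha, L)."""
    best_value, best_alpha, best_l = float("-inf"), None, None
    for spacing in range(2, l_max + 1):
        for i in range(1, alpha_steps + 1):
            alpha = i / (alpha_steps + 1)  # open interval: never 0 or 1
            value = ase_fn(alpha, spacing)
            if value > best_value:
                best_value, best_alpha, best_l = value, alpha, spacing
    return best_value, best_alpha, best_l

def toy_ase(alpha, spacing):
    """Hypothetical stand-in objective with its peak at alpha = 0.7, L = 12."""
    return 5.0 - (alpha - 0.7) ** 2 - ((spacing - 12) / 100.0) ** 2

best_ase, best_alpha, best_spacing = optimise_pilot_parameters(toy_ase, l_max=40)
```

Because the objective is passed in as a callable, the same loop could wrap a full ASE routine; the cost is then dominated by the threshold search performed inside each evaluation.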

Figure 2: Optimum power allocation. (Plot of $\alpha$ versus average CSNR; curves: optimal power, equal power.)

B. Average BER Performance Analysis

In the previous subsection, the optimal switching thresholds, $\{\hat{\gamma}_n\}_{n=1}^N$, were found to meet the $\mathrm{BER}_0$ constraint. Those thresholds are used here to find the overall average BER. First we need the average BER for the $n$th constellation, which can be found by

$$\mathrm{BER}(e_n) = \int_{\hat{\gamma}_n}^{\hat{\gamma}_{n+1}} \mathrm{BER}(e_n \mid \hat{\gamma})\,p(\hat{\gamma})\,d\hat{\gamma} = V_1 - V_{21} + V_{22}. \qquad (23)$$

The overall average BER (averaged over all codes) is given as the ratio between the average number of bits in error and the total number of bits transmitted [3, 7]:

$$\mathrm{BER} = \frac{\sum_{n=1}^{N} R_n\,\mathrm{BER}(e_n)}{\sum_{n=1}^{N} R_nP_n}. \qquad (24)$$

A closed-form solution for (23) is also achievable, but it is likewise too long-winded to reproduce here.

V. RESULTS AND DISCUSSION

To illustrate the scheme presented so far we consider an ACM system which utilizes a set of 8 QAM signal constellations of sizes $\{M_n\} = \{4, 8, 16, 32, 64, 128, 256, 512\}$ to encode and decode eight 4-dimensional trellis codes (as in [2, 7]). The carrier frequency is 2 GHz and the channel symbol duration is 5 µs. This corresponds to a channel bandwidth of 200 kHz if Nyquist signaling is used. We let the feedback delay be $\tau = 1$ ms, i.e., $\tau = 0.2/f_d$ (which is moderately large, see for instance [3]), and the mobile speed be $v = 30$ m/s. As a result, the Doppler frequency is $f_d = 200$ Hz. We set $\mathrm{BER}_0 = 10^{-5}$, and choose the order of the estimator and predictor to be $K_e = 20$ and $K_p = 250$, respectively. The choice of $K_p = 250$ leads to a suboptimal but satisfactory predictor [10]. Note that the parameters are chosen to be the same as in [4], so that our results can be compared directly.
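The derived quantities quoted above follow from the stated parameters by simple arithmetic, as the sketch below shows. It uses exact rational arithmetic to avoid rounding at the floor operation, takes the speed of light as $3 \times 10^8$ m/s (an assumption, chosen because it reproduces $f_d = 200$ Hz), and interprets $\lfloor\cdot\rfloor$ as the usual floor. The function and variable names are hypothetical.

```python
import math
from fractions import Fraction

SPEED_OF_LIGHT = Fraction(300_000_000)  # m/s, rounded value (assumption)

def doppler_hz(speed_m_s, carrier_hz):
    """Maximum Doppler frequency f_d = v * f_c / c."""
    return Fraction(speed_m_s) * Fraction(carrier_hz) / SPEED_OF_LIGHT

def max_pilot_spacing(fd_hz, ts_s):
    """L_max = floor(1 / (2 f_d T_s)), the Nyquist limit on the pilot period."""
    return math.floor(1 / (2 * fd_hz * ts_s))

ts = Fraction(5, 1_000_000)          # channel symbol duration, 5 microseconds
fd = doppler_hz(30, 2_000_000_000)   # v = 30 m/s, carrier 2 GHz
tau = Fraction(1, 1000)              # feedback delay, 1 ms

bandwidth_hz = 1 / ts                # Nyquist signaling bandwidth
normalized_delay = fd * tau          # feedback delay in units of 1/f_d
delay_in_symbols = tau / ts          # equals D * L in tau = D L T_s
l_max = max_pilot_spacing(fd, ts)
```

Under these values the feedback delay spans 200 channel symbols, so any pilot spacing $L$ with $\tau = DLT_s$ must divide 200, and the Nyquist bound evaluates to $L_{\max} = 500$.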

Figure 3: Optimum pilot spacing. (Plot of $L$ versus average CSNR; curves: optimally allocated, equally allocated.)

Fig. 2 shows the optimum power ratio between pilot and data symbols, while Fig. 3 shows the optimum pilot period. We see that more power is allocated to pilot symbols than to data symbols. This can partially be explained by the fact that the data are already protected by coding, so less power is needed for them. Thus, the power can be more freely distributed to the pilots. When the CSNR is about 30 dB and higher, the spacing between two pilots increases steeply (cf. Fig. 3). To compensate for the large pilot spacing, more power is put on the pilot symbols to maintain the $\mathrm{BER}_0$. That explains why the curve for optimum $\alpha$ bends down for high CSNRs.

We observe that our optimum ASE is approximately 0.7 bits/s/Hz higher than for uncoded M-QAM for the same system (cf. Fig. 4 and [4, Fig. 5]). This corresponds to a gain of about 3 dB in average CSNR. This gain is due to the time diversity provided by coding [5, pp. 680–681].

Figure 4: Average spectral efficiency. (Plot of average spectral efficiency [bits/s/Hz] versus average CSNR; curves: optimal power and optimal L; optimal power and L = 10; equal power and optimal L.)

Average BER is plotted in Fig. 5, and it is always lower than $\mathrm{BER}_0$. This is as expected, since the thresholds for switching between different codes are calculated to keep $\mathrm{BER}(e_n \mid \hat{\gamma}) \le \mathrm{BER}_0$; the average must therefore also be lower. The BER curve for optimum power allocation and optimum $L$ is also (unnecessarily) lower than the one obtained when only optimum $L$ is considered. On the other hand, the throughput in the form of ASE is still 0.5 bits/s/Hz higher when the power distribution is optimum.

Figure 5: Overall average BER. (Plot of average BER versus average CSNR; curves: optimal power and optimal L; equal power and optimal L.)

VI. CONCLUSION

In this paper we have investigated an ACM system where both the pilot symbol spacing and the power allocation between pilot and data symbols are optimized. Both channel estimation errors at the receiver and channel prediction errors at the transmitter have been taken into account. The ASE performance of our system, which is based on TCM, is approximately 0.7 bits/s/Hz higher than for uncoded M-QAM. The performance is also higher than when pilot symbols are transmitted at a fixed rate, or when both pilot and data symbols have equal power. The BER performance curves lie below the target BER for the whole CSNR range. A very interesting extension is to include receive antenna diversity in this system; this is a topic for further research.

REFERENCES

[1] A. J. Goldsmith and S.-G. Chua, “Variable-rate variable-power MQAM for fading channels,” IEEE Transactions on Communications, vol. 45, no. 10, pp. 1218–1230, Oct. 1997.

[2] K. J. Hole, H. Holm, and G. E. Øien, “Adaptive multidimensional coded modulation over flat fading channels,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 7, pp. 1153–1158, July 2000.

[3] M.-S. Alouini and A. J. Goldsmith, “Adaptive modulation over Nakagami fading channels,” Kluwer J. Wireless Communications, vol. 13, pp. 119–143, May 2000.

[4] X. Cai and G. B. Giannakis, “Adaptive PSAM accounting for channel estimation and prediction errors,” to appear in IEEE Transactions on Wireless Communications, 2004.

[5] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation and Signal Processing. John Wiley & Sons, 1998.

[6] J. K. Cavers, “An analysis of pilot symbol assisted modulation for Rayleigh fading channels,” IEEE Transactions on Vehicular Technology, vol. 40, no. 4, pp. 686–693, Nov. 1991.

[7] G. E. Øien, H. Holm, and K. J. Hole, “Impact of channel prediction on adaptive coded modulation performance in Rayleigh fading,” IEEE Transactions on Vehicular Technology, vol. 53, no. 3, pp. 758–769, May 2004.

[8] C. W. Therrien, Discrete Random Signals and Statistical Signal Processing. Signal Processing Series. Englewood Cliffs, NJ: Prentice Hall, 1992.

[9] N. M. Temme, Special Functions: An Introduction to the Classical Functions of Mathematical Physics. New York: John Wiley & Sons Inc., 1996.

[10] G. E. Øien, R. K. Hansen, D. V. Duong, H. Holm, and K. J. Hole, “Bit error rate analysis of adaptive coded modulation with mismatched and complexity-limited channel prediction,” in Proc. IEEE Nordic Signal Processing Symposium (NORSIG-2002), Norway, Oct. 2002.