Achievable Rates for Shaped Bit-Metric Decoding


Georg Böcherer, Member, IEEE

arXiv:1410.8075v4 [cs.IT] 28 Sep 2015

Abstract—A new achievable rate for bit-metric decoding (BMD) is derived using random coding arguments. The rate expression can be evaluated for any input distribution, and in particular the bit-levels of binary input labels can be stochastically dependent. Probabilistic shaping with dependent bit-levels (shaped BMD), shaping of independent bit-levels (bit-shaped BMD) and uniformly distributed independent bit-levels (uniform BMD) are evaluated on the additive white Gaussian noise (AWGN) channel with bipolar amplitude shift keying (ASK). For 32-ASK at a rate of 3.8 bits/channel use, the gap to 32-ASK capacity is 0.008 dB for shaped BMD, 0.46 dB for bit-shaped BMD, and 1.42 dB for uniform BMD. These numerical results illustrate that dependence between the bit-levels is beneficial on the AWGN channel. The relation to the generalized mutual information (GMI) is discussed.

Index Terms—bit-metric decoding, bit-interleaved coded modulation (BICM), achievable rate, amplitude shift keying (ASK), binary labeling

I. INTRODUCTION

Bit-interleaved coded modulation (BICM) combines high-order modulation with binary error-correcting codes [2], [3]. This makes BICM attractive for practical applications, and BICM is widely used in standards, e.g., in DVB-T2/S2/C2. At a BICM receiver, bit-metric decoding (BMD) is used [4, Sec. II]. For BMD, the channel input is labeled by bit strings of length m. The m bit-levels are treated independently at the decoder. Let B = (B_1, B_2, ..., B_m) denote a vector of m binary random variables B_i, i = 1, 2, ..., m, representing the bit-levels. Consider the channel P_{Y|B} with output Y and define

R_BMD := H(B) - \sum_{i=1}^{m} H(B_i|Y)    (1)

where H(·) and H(·|·) denote entropy and conditional entropy, respectively. Define

R_BMD^ind := \sum_{i=1}^{m} I(B_i; Y)    (2)

where I(·; ·) denotes the mutual information. If the bit-levels are independent, we have R_BMD = R_BMD^ind. Martinez et al. showed in [4] that (2) with independent and uniformly distributed bit-levels is achievable with BMD. We call this method uniform BMD. Guillén i Fàbregas and Martinez [5] generalized the result of [4] to non-uniformly distributed independent bit-levels. We call this method bit-shaped BMD. An important tool to assess the performance of decoding metrics is the generalized mutual information (GMI) [6, Sec. 2.4]. Interpretations of uniform BMD and bit-shaped BMD as a GMI are given in [4] and [5], respectively. In [7, Sec. 4.2.4], the GMI is evaluated for a bit-metric. It is observed that the GMI increases when the bits are dependent. We call this approach shaped GMI. Besides the GMI, other methods to evaluate decoding metrics exist [8], [9], which can also be applied to BICM [10]. Our main contribution is to show that R_BMD in (1) with arbitrarily distributed bit-levels is achievable with BMD. In particular, the bit-levels can be dependent, in which case R_BMD is not equal to R_BMD^ind. We call our method shaped BMD. For example, consider the additive white Gaussian noise (AWGN) channel with

A part of this work has been presented at ISIT 2014 in Honolulu [1]. The author is with the Institute for Communications Engineering, Technische Universität München, Munich D-80333, Germany (e-mail: [email protected]).


[Figure 1 plots achievable rate in bits/channel use (3.8 to 4.2) versus SNR in dB (22.5 to 26) for: (1/2) log2(1 + SNR), ASK capacity C, shaped BMD, shaped GMI [7, Sec. 4.2.4], bit-shaped BMD [5], uniform ASK, and uniform BMD [4].]

Fig. 1. Achievable rates for bipolar ASK with 32 equidistant signal points, see Sec. IV. At 3.8 bits/channel use, bit-shaped BMD is 0.46 dB less energy efficient than shaped BMD.

bipolar amplitude shift keying (ASK), see Sec. IV for details. We display information rate results for 32-ASK in Fig. 1. At a rate of 3.8 bits/channel use, the gap to ASK capacity is 0.008 dB for shaped BMD, 0.1 dB for shaped GMI, 0.46 dB for bit-shaped BMD, and 1.42 dB for uniform BMD. Dependence between the bit-levels is thus beneficial on the AWGN channel. The rate expression (1) is used in [11] to construct surrogate channels, which in turn are used to design low-density parity-check codes for shaped BMD.

This paper is organized as follows. We state our main result in Sec. II. We give an interpretation in terms of the GMI in Sec. III and we discuss the application to the AWGN channel in Sec. IV. Sec. V concludes the paper and the appendix provides technical results.

II. ACHIEVABLE RATE

Let P_{Y|B} be a discrete memoryless channel (DMC) with input B = (B_1, B_2, ..., B_m) and output Y taking values in Y. Let C be a codebook with codewords b^n = (b_1, b_2, ..., b_n) with b_i ∈ {0,1}^m. We denote the ith bit-level of a codeword by b_i^n = (b_{i1}, b_{i2}, ..., b_{in}). A bit-metric decoder uses a decision rule of the form

\argmax_{b^n \in C} \prod_{i=1}^{m} q_i(b_i^n, y^n)    (3)

where for each bit-level i, the value of the bit-metric q_i(b_i^n, y^n) depends on the distribution P_{BY} only via the marginal

P_{B_i Y}(b, y) = \sum_{a \in \{0,1\}^m : a_i = b} P_{Y|B}(y|a) P_B(a).    (4)
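To make the decision rule (3) concrete, here is a minimal Python sketch of a brute-force bit-metric decoder over a toy codebook. The per-level metrics are chosen as the posteriors P_{B_i|Y} derived from the marginal (4); the identity channel, the two-codeword codebook, and all function names are our illustrative assumptions, not part of the paper.

```python
from math import prod  # Python 3.8+

# Toy DMC on 2-bit labels: the output equals the input (identity channel).
labels = [(0, 0), (0, 1), (1, 0), (1, 1)]
P_B = {a: 0.25 for a in labels}
P_YgB = {(y, a): 1.0 if y == a else 0.0 for y in labels for a in labels}

def p_bi_given_y(i, b, y):
    """Per-level posterior P_{B_i|Y}(b|y), built from the marginal (4)."""
    num = sum(P_YgB[(y, a)] * P_B[a] for a in labels if a[i] == b)
    den = sum(P_YgB[(y, a)] * P_B[a] for a in labels)
    return num / den

def bmd_decode(codebook, y_seq, m=2):
    """Decision rule (3): maximize the product over bit-levels (and over
    time, since the channel is memoryless) of the per-level metrics."""
    def score(cw):
        return prod(p_bi_given_y(i, sym[i], y)
                    for sym, y in zip(cw, y_seq) for i in range(m))
    return max(codebook, key=score)

codebook = [((0, 0), (1, 1)), ((0, 1), (1, 0))]
print(bmd_decode(codebook, ((0, 1), (1, 0))))  # -> ((0, 1), (1, 0))
```

Note that the decoder never uses the joint posterior of the full label, only the per-level marginals; this structural restriction is exactly what the rate expression (1) accounts for.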

Theorem 1. Let P_B be a distribution on {0,1}^m and let P_{Y|B} be a DMC with finite output alphabet. If R < R_BMD, then reliable transmission at rate R can be achieved by a bit-metric decoder.


Proof: See Appendix A.

Remark 1. Theorem 1 generalizes to channels with discrete input and continuous output by following the procedure described in [12, Sec. 3.4.1], see also [12, Remark 3.8].

A. Dependent Bit-Levels Can Be Better

We develop a simple contrived example to show that dependent bit-levels can be better than independent bit-levels. Consider the identity channel with input label B_1 B_2 and transition probabilities

P_{Y|B_1 B_2}(ab|ab) = 1,  ∀ab ∈ {00, 01, 10, 11}.

Consider the input cost function f satisfying

f(00) = f(11) = ∞,  f(01) = f(10) = 0

and suppose we impose the average cost constraint E[f(B_1 B_2)] < ∞, where E[·] denotes expectation. For independent bit-levels B_1 and B_2, this constraint can be satisfied only by P_{B_1}(0) = P_{B_2}(1) = 1 or P_{B_1}(1) = P_{B_2}(0) = 1. In both cases, we have

R_BMD = H(B) - \sum_{i=1}^{2} H(B_i|Y) = 0.    (5)

We next choose P_{B_1 B_2}(01) = P_{B_1 B_2}(10) = 1/2, which makes the bit-levels dependent. The average input cost is zero and we have

R_BMD = H(B) - \sum_{i=1}^{2} H(B_i|Y) = 1.    (6)
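The values in (5)–(7) can be checked numerically. The following sketch (our own helper code, assuming distributions are stored as plain dicts) evaluates (1) for the identity channel above with the dependent input, and also for the erase-all channel of Sec. II-B:

```python
from math import log2

def H(p):
    """Entropy of a distribution given as {outcome: prob}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def rate_bmd(P_B, P_YgB, m):
    """R_BMD = H(B) - sum_i H(B_i|Y), as in (1)."""
    # joint distribution P_{B,Y}
    P_BY = {(a, y): P_B[a] * P_YgB[a][y]
            for a in P_B if P_B[a] > 0 for y in P_YgB[a]}
    P_Y = {}
    for (a, y), p in P_BY.items():
        P_Y[y] = P_Y.get(y, 0.0) + p
    # sum of conditional entropies H(B_i|Y) = H(B_i, Y) - H(Y)
    HBiY = 0.0
    for i in range(m):
        P_BiY = {}
        for (a, y), p in P_BY.items():
            P_BiY[(a[i], y)] = P_BiY.get((a[i], y), 0.0) + p
        HBiY += H(P_BiY) - H(P_Y)
    return H(P_B) - HBiY

P_dep = {(0, 1): 0.5, (1, 0): 0.5}

# Identity channel, dependent bit-levels: R_BMD = 1, cf. (6)
ident = {a: {a: 1.0} for a in P_dep}
print(rate_bmd(P_dep, ident, 2))   # -> 1.0

# Erase-all channel: R_BMD = 1 - 2 = -1, cf. (7)
erase = {a: {'e': 1.0} for a in P_dep}
print(rate_bmd(P_dep, erase, 2))   # -> -1.0
```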

We conclude that for the considered input-constrained channel, no positive rate is achievable with independent bit-levels, while any rate below one is achievable with dependent bit-levels.

B. BMD Rate Can Be Negative

Consider the erase-all channel with output alphabet {e} and transition probabilities

P_{Y|B_1 B_2}(e|ab) = 1,  ∀ab ∈ {00, 01, 10, 11}.

For the input distribution P_{B_1 B_2}(01) = P_{B_1 B_2}(10) = 1/2, we compute

R_BMD = H(B) - \sum_{i=1}^{2} H(B_i|Y) = 1 - 2 = -1.    (7)

Since there is no information rate smaller than 0, the statement of Theorem 1 is meaningless for this example. For independent bit-levels, R_BMD is equal to (2) and therefore non-negative for any bit distribution.

III. MISMATCHED DECODING PERSPECTIVE

In this section, we provide an interpretation of shaped BMD (1) as a GMI and we discuss previously proposed applications of the GMI to BMD.


A. Generalized Mutual Information

The maximum likelihood (ML) decoder for a code C is

f_ML(y^n) = \argmax_{b^n \in C} \prod_{i=1}^{n} P_{Y|B}(y_i|b_i).    (8)

A mismatched decoder uses a metric Q(b, y) instead of the channel likelihood P_{Y|B} and calculates the estimate

f_Q(y^n) = \argmax_{b^n \in C} \prod_{i=1}^{n} Q(b_i, y_i).    (9)

The GMI is [6, Sec. 2.4]

I_GMI(P_B, Q, s) := E[ \log_2 \frac{Q(B, Y)^s}{\sum_{b \in \supp P_B} P_B(b) Q(b, Y)^s} ]    (10)

where \supp P_B = {b ∈ {0,1}^m : P_B(b) > 0} is the support of P_B. A mismatched decoder with metric Q can achieve the rate [6]

\max_{s \geq 0} I_GMI(P_B, Q, s).    (11)
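A direct numerical evaluation of (10) and (11) for a small discrete channel can be helpful. The following is a sketch under our own conventions (distributions as dicts, grid search over s); the binary symmetric channel is a made-up test case. With the matched metric Q = P_{Y|B}, the GMI at s = 1 recovers the mutual information.

```python
from math import log2

def gmi(P_B, P_YgB, Q, s):
    """I_GMI(P_B, Q, s) as in (10), for a discrete channel.
    P_YgB[b][y] = P_{Y|B}(y|b); Q[(b, y)] is the decoding metric."""
    supp = [b for b in P_B if P_B[b] > 0]
    val = 0.0
    for b in supp:
        for y, pyb in P_YgB[b].items():
            if pyb == 0:
                continue
            denom = sum(P_B[bp] * Q[(bp, y)] ** s for bp in supp)
            val += P_B[b] * pyb * log2(Q[(b, y)] ** s / denom)
    return val

def gmi_max(P_B, P_YgB, Q, grid=None):
    """Achievable rate (11): maximize the GMI over s >= 0 on a grid."""
    grid = grid or [k / 100 for k in range(0, 301)]
    return max(gmi(P_B, P_YgB, Q, s) for s in grid)

# Matched metric on a binary symmetric channel with crossover 0.1:
# the GMI at s = 1 equals I(B;Y) = 1 - H_b(0.1) ≈ 0.531.
eps = 0.1
P_B = {0: 0.5, 1: 0.5}
P_YgB = {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}
Q = {(b, y): P_YgB[b][y] for b in (0, 1) for y in (0, 1)}
print(gmi(P_B, P_YgB, Q, 1.0))
```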

B. R_BMD as Generalized Mutual Information

Theorem 2. For the metric

Q_BMD(b, y) := \frac{\prod_{i=1}^{m} P_{B_i}(b_i)}{P_B(b)} \prod_{i=1}^{m} P_{Y|B_i}(y|b_i)    (12)

we have

I_GMI(P_B, Q_BMD, 1) \overset{(a)}{\geq} \sum_{i=1}^{m} I(B_i; Y) - D(P_B \| \prod_{i=1}^{m} P_{B_i})    (13)
= R_BMD    (14)

with equality in (a) if and only if P_B is strictly positive, i.e., if P_B(b) > 0 for all b ∈ {0,1}^m.

Proof: See Appendix B.

Remark 2. For independent bit-levels, the informational divergence in (13) is equal to zero and Theorem 2 recovers [5, Corollary 1]. If the bit-levels are independent and uniformly distributed, Theorem 2 recovers [4, Corollary 1].

The expression (13) can be interpreted as follows. The factor \prod_{i=1}^{m} P_{Y|B_i}(y|b_i) of Q_BMD accounts for the channel mismatch at the decoder, and results in the term \sum_{i=1}^{m} I(B_i; Y), which can be achieved on the parallel bit-channels P_{Y|B_i} by a large codebook that is generated iid according to \prod_{i=1}^{m} P_{B_i}. The factor \prod_{i=1}^{m} P_{B_i}(b_i) / P_B(b) of Q_BMD accounts for the codebook mismatch and results in D(P_B \| \prod_{i=1}^{m} P_{B_i}), which is the rate loss when, of all codewords in the large codebook, only those are transmitted that are typical with respect to P_B. A related work on codebook mismatch is [13]. In particular, (13) is reminiscent of [13, Eq. (2.9)].
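Theorem 2 is easy to probe numerically. The sketch below (our own test harness; the random channel, alphabet, and seed are arbitrary choices) draws a strictly positive P_B and a random DMC, evaluates I_GMI(P_B, Q_BMD, 1) from (10) with the metric (12), and compares it with R_BMD from (1); for strictly positive P_B the two agree.

```python
import itertools
import random
from math import log2

random.seed(1)
m, Y = 2, ['y0', 'y1', 'y2']
labels = list(itertools.product((0, 1), repeat=m))

# random strictly positive input distribution and random DMC
w = [random.random() + 0.1 for _ in labels]
P_B = {b: wi / sum(w) for b, wi in zip(labels, w)}
P_YgB = {}
for b in labels:
    v = [random.random() + 0.1 for _ in Y]
    P_YgB[b] = {y: vi / sum(v) for y, vi in zip(Y, v)}

def P_BiY(i, bit, y):   # marginal (4): P_{B_i Y}(bit, y)
    return sum(P_B[a] * P_YgB[a][y] for a in labels if a[i] == bit)

def P_Bi(i, bit):
    return sum(P_B[a] for a in labels if a[i] == bit)

def P_Y(y):
    return sum(P_B[a] * P_YgB[a][y] for a in labels)

# R_BMD via (1)
H_B = -sum(p * log2(p) for p in P_B.values())
H_BiY = sum(-P_BiY(i, bit, y) * log2(P_BiY(i, bit, y) / P_Y(y))
            for i in range(m) for bit in (0, 1) for y in Y)
R_BMD = H_B - H_BiY

def Q_BMD(b, y):
    """Metric (12): (prod_i P_{B_i}(b_i) / P_B(b)) * prod_i P_{Y|B_i}(y|b_i)."""
    prefactor, channel = 1.0, 1.0
    for i in range(m):
        pbi = P_Bi(i, b[i])
        prefactor *= pbi
        channel *= P_BiY(i, b[i], y) / pbi   # P_{Y|B_i}(y|b_i)
    return prefactor / P_B[b] * channel

# I_GMI(P_B, Q_BMD, 1) via (10)
I_GMI = sum(P_B[b] * P_YgB[b][y] *
            log2(Q_BMD(b, y) / sum(P_B[bp] * Q_BMD(bp, y) for bp in labels))
            for b in labels for y in Y)

print(abs(I_GMI - R_BMD) < 1e-9)  # -> True, as Theorem 2 predicts
```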


C. Shaped GMI

In [7, Sec. 4.2.4], dependent bit-levels are used together with the metric

\tilde{Q}(b, y) = \prod_{i=1}^{m} P_{Y|B_i}(y|b_i)    (15)

and the shaped GMI rate

R_sGMI = \max_{s \geq 0} I_GMI(P_B, \tilde{Q}, s)    (16)

is evaluated. Since the GMI is equal to 0 for s = 0, R_sGMI is non-negative and can, by Sec. II-B, be larger than R_BMD. In the next section, we will see that for bipolar ASK on the AWGN channel, R_BMD is larger than R_sGMI.

IV. 2^m-ASK MODULATION FOR THE AWGN CHANNEL

The signal constellation of bipolar ASK is given by

X_ASK = {±1, ±3, ..., ±(2^m - 1)}.    (17)

(17)

The points x ∈ XASK are labeled by a binary vector B = (B1 , . . . , Bm ). We use the Binary Reflected Gray Code (BRGC) [14]. The labeling influences the rate that is achievable by BMD, see, e.g., [15]. To control the transmit power, the channel input xB is scaled by a positive real number ∆. The input-output relation of the AWGN channel is Y = ∆ · xB + Z

(18)

where Z is zero mean Gaussian noise with variance one. The input is subject to an average power constraint P, i.e., ∆ and PB must satisfy E[(∆xB )2 ] ≤ P. The ASK capacity is C=

max

∆,PB : E[(∆xB )2 ]≤P

I(B; Y ).

(19)
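For concreteness, the constellation (17) together with a BRGC labeling can be generated as follows. This is a sketch with our own function names, and the orientation of the Gray code (which label maps to which point) is an arbitrary choice, since the paper does not fix it here.

```python
def brgc(m):
    """Binary Reflected Gray Code labels for 2^m points, as bit tuples."""
    if m == 1:
        return [(0,), (1,)]
    prev = brgc(m - 1)
    return [(0,) + g for g in prev] + [(1,) + g for g in reversed(prev)]

def ask_constellation(m):
    """Bipolar 2^m-ASK points (17) paired with their BRGC labels."""
    points = list(range(-(2 ** m - 1), 2 ** m, 2))  # ±1, ±3, ..., ±(2^m - 1)
    return dict(zip(brgc(m), points))

print(ask_constellation(2))  # -> {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
```

Adjacent constellation points differ in exactly one bit, which is the property that makes the BRGC favorable for bit-metric decoding.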

The optimal parameters ∆*, P_B* can be calculated using the Blahut–Arimoto algorithm [16], [17], and they can be closely approximated by maximizing over the family of Maxwell–Boltzmann distributions [18]. We evaluate R_BMD (shaped BMD) and R_sGMI (shaped GMI) at ∆*, P_B*. In Fig. 1, we plot for 32 signal points (m = 5) the ASK capacity C and the information rate curves of shaped BMD and shaped GMI together with the corresponding rate curves that result from uniform inputs. Since we normalized the noise power to one, the signal-to-noise ratio (SNR) in dB is given by

SNR = 10 \log_{10} \frac{E[(∆ x_B)^2]}{1}.    (20)

The gap between the 32-ASK capacity C and the shaped BMD rate R_BMD is negligibly small over the considered SNR range. At 3.8 bits/channel use, the gap between C and R_BMD is 0.008 dB and the gap of the shaped GMI is 0.1 dB. For comparison, we calculate the bit-shaped BMD rate. The optimization problem is

maximize_{P_B, ∆}   \sum_{i=1}^{m} I(B_i; Y)
subject to   P_B = \prod_{i=1}^{m} P_{B_i},   E[(∆ x_B)^2] ≤ P.    (21)

This is a non-convex optimization problem [19], [20], so we calculate a solution by exhaustive search over the bit distributions with a precision of ±0.005. The resulting rate curve is displayed in Fig. 1. We observe that bit-shaped BMD (independent bit-levels) is 0.46 dB less energy efficient than shaped BMD (dependent bit-levels) at 3.8 bits/channel use.
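To illustrate how such rate curves can be computed, the following sketch evaluates R_BMD for 4-ASK with a Maxwell–Boltzmann input on a discretized AWGN channel. The scaling ∆, the shaping parameter ν, and the grid resolution are illustrative choices of ours, not values from the paper; for Fig. 1 one would sweep ∆ and ν under the power constraint and use m = 5.

```python
from math import exp, log2, pi, sqrt

# Our illustrative setup: 4-ASK (m = 2) with BRGC labels, Maxwell-Boltzmann
# input P_B(x) ~ exp(-nu * x^2), unit-variance noise as in (18).
m, delta, nu = 2, 0.6, 0.05
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]          # BRGC
points = [-3, -1, 1, 3]
w = [exp(-nu * x * x) for x in points]
P_B = {b: wi / sum(w) for b, wi in zip(labels, w)}
x_of = dict(zip(labels, points))

def pdf_y_given_b(y, b):                            # Gaussian channel (18)
    d = y - delta * x_of[b]
    return exp(-d * d / 2) / sqrt(2 * pi)

ys = [-8 + 16 * k / 4000 for k in range(4001)]      # output grid
dy = ys[1] - ys[0]

# R_BMD = H(B) - sum_i H(B_i|Y), with the integral over y discretized
H_B = -sum(p * log2(p) for p in P_B.values())
H_BiY = 0.0
for i in range(m):
    for y in ys:
        py = sum(P_B[b] * pdf_y_given_b(y, b) for b in labels)
        for bit in (0, 1):
            pby = sum(P_B[b] * pdf_y_given_b(y, b)
                      for b in labels if b[i] == bit)
            if pby > 0:
                H_BiY += -pby * log2(pby / py) * dy
R_BMD = H_B - H_BiY
print(round(R_BMD, 3))  # one point of a rate curve, 0 < R_BMD < H(B)
```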


V. CONCLUSIONS

The achievable rate in (1) allows dependence between the bit-levels, while the achievable rate in (2) (see [4], [5]) requires independent bit-levels. We have shown that on the AWGN channel under bit-metric decoding, dependent bit-levels can achieve higher rates than independent bit-levels.

ACKNOWLEDGMENT

This work was supported by the German Ministry of Education and Research in the framework of an Alexander von Humboldt Professorship. The author is grateful to G. Kramer and F. Steiner for helpful comments on drafts.

APPENDIX A
PROOF OF THEOREM 1

We prove Theorem 1 by random coding arguments. We use letter typicality as defined in [21, Chap. 1]. T_ε^n(P_{BY}) is the set of sequences (b^n, y^n) that are jointly ε-typical with respect to P_{BY}. The set of conditionally typical sequences is defined as

T_ε^n(P_{BY}|y^n) := {b^n : (b^n, y^n) ∈ T_ε^n(P_{BY})}.    (22)

In the following, let ε_1 > ε > 0.

Code Construction: Choose 2^{nR} codewords B^n(w), w = 1, 2, ..., 2^{nR}, of length n by choosing the n · 2^{nR} symbols independently according to P_B. Denote the resulting set by C̃ and define the set of valid codewords by C := C̃ ∩ T_ε^n(P_B).

Encoding: Given message w ∈ {1, 2, ..., 2^{nR}}, transmit B^n(w).

Decoding: We define the bit metric

q_i(b_i^n, y^n) = { 1, if b_i^n ∈ T_{ε_1}^n(P_{B_i Y}|y^n); 0, otherwise }.    (23)

The corresponding decoding metric is

q(b^n, y^n) = \prod_{i=1}^{m} q_i(b_i^n, y^n).    (24)

Define the set B̂(y^n) := {b^n ∈ C : q(b^n, y^n) = 1}. The decoder output is

{ b^n, if B̂(y^n) = {b^n}; error, otherwise }.    (25)

Analysis: Suppose message w was encoded. The two error events are

E_1 := { B^n(w) ∉ B̂(Y^n) }    (26)
E_2 := { ∃ w̃ ≠ w : B^n(w̃) ∈ B̂(Y^n) }.    (27)

First error event: The event E_1 uses the distribution P_{BY}^n. We have

Pr(E_1) = 1 - Pr[q(B^n(w), Y^n) = 1]
= 1 - Pr[ \bigcap_{i=1}^{m} { B_i^n(w) ∈ T_{ε_1}^n(P_{B_i Y}|Y^n) } ]
\overset{(a)}{\leq} 1 - Pr[ (B^n(w), Y^n) ∈ T_{ε_1}^n(P_{BY}) ] \overset{(b)}{\to} 0 as n → ∞    (28)


where (a) follows because joint typicality implies marginal typicality [21, Sec. 1.5]. The limit (b) follows by [21, Theorem 1.1].

Second error event: Define the event A := {Y^n ∈ T_ε^n(P_Y)}. We have

Pr[E_2] \overset{(a)}{=} Pr[A] Pr[E_2|A] + Pr[A^c] Pr[E_2|A^c]    (29)
≤ Pr[E_2|A] + Pr[A^c]    (30)

where A^c is the complement of A and where we used the law of total probability in (a). The probability of A^c approaches 0 as n → ∞ by [21, Theorem 1.1]. It therefore suffices to bound Pr[E_2|A]. We have

Pr[E_2|A] = Pr[ \bigcup_{w̃ ≠ w} { B^n(w̃) ∈ B̂(Y^n) } | A ]    (31)

\overset{(a)}{\leq} \sum_{w̃ ≠ w} Pr[B^n(w̃) ∈ B̂(Y^n)|A]    (32)
\overset{(b)}{=} \sum_{w̃ ≠ w} Pr[B^n ∈ B̂(Y^n)|A]    (33)
≤ 2^{nR} Pr[B^n ∈ B̂(Y^n)|A]    (34)

where (a) follows by the union bound, and where B^n in (33) is independent of Y^n and distributed according to P_B^n. We have equality in (b) because B^n(w̃) and Y^n are independent for w̃ ≠ w. We now have

Pr[B^n ∈ B̂(Y^n)|A] \overset{(a)}{=} \sum_{y^n ∈ Y^n} Pr[Y^n = y^n|A] Pr[B^n ∈ B̂(y^n)|A, Y^n = y^n]    (35)
\overset{(b)}{=} \sum_{y^n ∈ Y^n} Pr[Y^n = y^n|A] Pr[B^n ∈ B̂(y^n)]    (36)
\overset{(c)}{=} \sum_{y^n ∈ T_ε^n(P_Y)} Pr[Y^n = y^n|A] Pr[B^n ∈ B̂(y^n)]    (37)

where (a) follows by the law of total probability, (b) follows because B^n and Y^n are independent, and (c) follows because we condition on A. We next bound Pr[B^n ∈ B̂(y^n)] for y^n ∈ T_ε^n(P_Y). By [21, Theorem 1.2], we have

|T_{ε_1}^n(P_{B_i Y}|y^n)| ≤ 2^{n H(B_i|Y)(1 + ε_1)}.    (38)

The size of B̂(y^n) is thus bounded as

|B̂(y^n)| ≤ 2^{n \sum_{i=1}^{m} H(B_i|Y)(1 + ε_1)}.    (39)

Furthermore, by [21, Theorem 1.1], we have

Pr[B^n = b^n] ≤ 2^{-n H(B)(1 - ε)},  if b^n ∈ T_ε^n(P_B).    (40)

We have

Pr[B^n ∈ B̂(y^n)] = \sum_{b^n ∈ B̂(y^n)} Pr[B^n = b^n]    (41)
\overset{(a)}{\leq} \sum_{b^n ∈ B̂(y^n)} 2^{-n H(B)(1 - ε)}    (42)
\overset{(b)}{\leq} 2^{n \sum_{i=1}^{m} H(B_i|Y)(1 + ε_1)} 2^{-n H(B)(1 - ε)}    (43)


where (a) follows by (40) and because B̂(y^n) ⊆ T_ε^n(P_B), and (b) follows by (39). Using (43) in (37) and (37) in (34), we have

Pr[E_2|A] ≤ 2^{nR} \sum_{y^n ∈ T_ε^n(P_Y)} Pr[Y^n = y^n|A] · 2^{n \sum_{i=1}^{m} H(B_i|Y)(1 + ε_1)} 2^{-n H(B)(1 - ε)}    (44)
= 2^{nR} 2^{n \sum_{i=1}^{m} H(B_i|Y)(1 + ε_1)} 2^{-n H(B)(1 - ε)} \sum_{y^n ∈ T_ε^n(P_Y)} Pr[Y^n = y^n|A]    (45)
≤ 2^{nR} 2^{n \sum_{i=1}^{m} H(B_i|Y)(1 + ε_1)} 2^{-n H(B)(1 - ε)}.    (46)

The term in (46) approaches zero as n → ∞ if

R + [ \sum_{i=1}^{m} H(B_i|Y) ](1 + ε_1) - H(B)(1 - ε) < 0.    (47)

Using \sum_{i=1}^{m} H(B_i|Y) ≤ m and H(B) ≤ m, we have that the term in (46) approaches zero as n → ∞ if

R < H(B) - \sum_{i=1}^{m} H(B_i|Y) - m(ε_1 + ε)    (48)

for any 0 < ε < ε_1. Theorem 1 is proved by choosing ε_1 small and n large.

APPENDIX B
PROOF OF THEOREM 2

We have

I_GMI(P_B, Q_BMD, 1) = E[ \log_2 \frac{\prod_{i=1}^{m} P_{B_i}(B_i)}{P_B(B)} ] + E[ \log_2 \prod_{i=1}^{m} P_{Y|B_i}(Y|B_i) ] - E[ \log_2 \sum_{b \in \supp P_B} P_B(b) \frac{\prod_{i=1}^{m} P_{B_i}(b_i)}{P_B(b)} \prod_{j=1}^{m} P_{Y|B_j}(Y|b_j) ]    (49)

where the first term is equal to -D(P_B \| \prod_{i=1}^{m} P_{B_i}), the second term is equal to -\sum_{i=1}^{m} H(Y|B_i), and we denote the third expectation by (⋆).

For the term (⋆), we have

(⋆) = E[ \log_2 \sum_{b \in \supp P_B} \prod_{i=1}^{m} P_{B_i}(b_i) P_{Y|B_i}(Y|b_i) ] \overset{(a)}{\leq} E[ \log_2 \sum_{b \in \{0,1\}^m} \prod_{i=1}^{m} P_{B_i}(b_i) P_{Y|B_i}(Y|b_i) ]    (50)
= E[ \log_2 \prod_{i=1}^{m} \sum_{b \in \{0,1\}} P_{B_i}(b) P_{Y|B_i}(Y|b) ]    (51)
= E[ \log_2 \prod_{i=1}^{m} P_Y(Y) ]    (52)
= -\sum_{i=1}^{m} H(Y)    (53)


with equality in (a) if and only if P_B is strictly positive. Using (53) in (49), we have

I_GMI(P_B, Q_BMD, 1) \overset{(a)}{\geq} \sum_{i=1}^{m} [H(Y) - H(Y|B_i)] - D(P_B \| \prod_{i=1}^{m} P_{B_i})    (54)
= \sum_{i=1}^{m} I(B_i; Y) - D(P_B \| \prod_{i=1}^{m} P_{B_i})    (55)

with equality in (a) if and only if P_B is strictly positive. We next prove (14). We have

\sum_{i=1}^{m} I(B_i; Y) - D(P_B \| \prod_{i=1}^{m} P_{B_i}) = \sum_{i=1}^{m} [H(B_i) - H(B_i|Y)] - [ \sum_{i=1}^{m} H(B_i) - H(B) ]    (56)
= H(B) - \sum_{i=1}^{m} H(B_i|Y).    (57)

REFERENCES

[1] G. Böcherer, "Probabilistic signal shaping for bit-metric decoding," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2014, pp. 431–435.
[2] E. Zehavi, "8-PSK trellis codes for a Rayleigh channel," IEEE Trans. Commun., vol. 40, no. 5, pp. 873–884, 1992.
[3] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation," IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946, 1998.
[4] A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. Willems, "Bit-interleaved coded modulation revisited: A mismatched decoding perspective," IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2756–2765, 2009.
[5] A. Guillén i Fàbregas and A. Martinez, "Bit-interleaved coded modulation with shaping," in Proc. IEEE Inf. Theory Workshop (ITW), 2010, pp. 1–5.
[6] G. Kaplan and S. Shamai (Shitz), "Information rates and error exponents of compound channels with application to antipodal signaling in a fading environment," AEÜ, vol. 47, no. 4, pp. 228–239, 1993.
[7] L. Peng, "Fundamentals of bit-interleaved coded modulation and reliable source transmission," Ph.D. dissertation, University of Cambridge, 2012.
[8] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), "On information rates for mismatched decoders," IEEE Trans. Inf. Theory, pp. 1953–1967, 1994.
[9] A. Ganti, A. Lapidoth, and E. Telatar, "Mismatched decoding revisited: General alphabets, channels with memory, and the wide-band limit," IEEE Trans. Inf. Theory, pp. 2315–2328, 2000.
[10] L. Peng, A. Guillén i Fàbregas, and A. Martinez, "Improved exponents and rates for bit-interleaved coded modulation," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2013, pp. 1989–1993.
[11] F. Steiner, G. Böcherer, and G. Liva, "Protograph-based LDPC code design for bit-metric decoding," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2015.
[12] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[13] S. Achtenberg and D. Raphaeli, "Theoretic shaping bounds for single letter constraints and mismatched decoding," arXiv, 2013. [Online]. Available: http://arxiv.org/abs/1308.5938v1
[14] F. Gray, "Pulse code communication," U.S. Patent 2 632 058, 1953.
[15] E. Agrell and A. Alvarado, "Optimal alphabets and binary labelings for BICM at low SNR," IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6650–6672, 2011.
[16] R. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inf. Theory, vol. 18, no. 4, pp. 460–473, 1972.
[17] S. Arimoto, "An algorithm for computing the capacity of arbitrary discrete memoryless channels," IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 14–20, 1972.
[18] F. R. Kschischang and S. Pasupathy, "Optimal nonuniform signaling for Gaussian channels," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 913–929, 1993.
[19] A. Alvarado, F. Brännström, and E. Agrell, "High SNR bounds for the BICM capacity," in Proc. IEEE Inf. Theory Workshop (ITW), 2011, pp. 360–364.
[20] G. Böcherer, F. Altenbach, A. Alvarado, S. Corroy, and R. Mathar, "An efficient algorithm to calculate BICM capacity," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2012.
[21] G. Kramer, "Topics in multi-user information theory," Foundations and Trends in Comm. and Inf. Theory, vol. 4, no. 4–5, pp. 265–444, 2007.
Available: http://arxiv.org/abs/1308.5938v1 F. Gray, “Pulse code communication,” U. S. Patent 2 632 058, 1953. E. Agrell and A. Alvarado, “Optimal alphabets and binary labelings for BICM at low SNR,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6650–6672, 2011. R. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE Trans. Inf. Theory, vol. 18, no. 4, pp. 460–473, 1972. S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,” IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 14–20, 1972. F. R. Kschischang and S. Pasupathy, “Optimal nonuniform signaling for Gaussian channels,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 913–929, 1993. A. Alvarado, F. Br¨annstr¨om, and E. Agrell, “High SNR bounds for the BICM capacity,” in Proc. IEEE Inf. Theory Workshop (ITW), 2011, pp. 360–364. G. B¨ocherer, F. Altenbach, A. Alvarado, S. Corroy, and R. Mathar, “An efficient algorithm to calculate BICM capacity,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2012. G. Kramer, “Topics in multi-user information theory,” Foundations and Trends in Comm. and Inf. Theory, vol. 4, no. 4–5, pp. 265–444, 2007.