Distortion Bounds for Vector Quantizers with Finite Codebook Size

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 5, JULY 1999


Ron Meir, Member, IEEE, and Vitaly Maiorov

Abstract—Upper and lower bounds are presented for the distortion of the optimal N-point vector quantizer applied to k-dimensional signals. Under certain smoothness conditions on the source distribution, the bounds are shown to hold for each and every value of N, the codebook size. These results extend bounds derived in the high-resolution limit, which assumes that the number of code vectors is arbitrarily large. Two approaches to the upper bound are presented. The first, a constructive approach, achieves the correct asymptotic rate of convergence as well as the correct dependence on the source density, although it leads to an inferior value for the constant. The second construction, based on a random coding argument, is shown to additionally achieve a value of the constant which is much closer to the best known result derived within the asymptotic theory. Lower bound results derived in the correspondence are again shown to possess the correct asymptotic form and yield a constant which is almost indistinguishable from the best value achieved in the asymptotic regime. Finally, application of the results to the problem of source coding yields upper bounds on the distortion-rate function for a wide class of processes.

Index Terms—Finite codebook, lower bounds, upper bounds, vector quantization.

I. INTRODUCTION

The problem of determining the exact behavior of the distortion of optimal vector quantizers has attracted much attention over the years, since the pioneering work of Zador [17], [18] and Gersho [6]. Many results have been derived, involving upper as well as lower bounds on the performance, applicable in the asymptotic regime where the codebook size is very large (see, for example, [2]–[4], [8], [14], and [16]). While many intriguing phenomena have been discovered in this limit, it has not been clear how well these results apply to practical situations where the size of the codebook is restricted a priori to some fixed finite number. Moreover, approaches possessing some desirable asymptotic behavior are not always optimal for finite codebooks. The main contribution of this work is the derivation of upper and lower bounds on the performance of vector quantizers which hold for each and every value of N, the size of the codebook. As will be shown, the bounds are tight to within a constant factor and yield the correct asymptotic behavior as the size of the codebook increases. We should note, however, that while some results exist in the literature concerning performance bounds for finite values of the codebook size [19], [9], these bounds relate the performance of scalar quantizers to that of optimal vector quantizers and are not directly concerned with the performance of the optimal vector quantizer per se, which is the target of this work.

Let IR denote the real line and let IR^k denote the k-dimensional Euclidean space. A k-dimensional N-point vector quantizer is a measurable mapping Q : IR^k \to C_N, where C_N = \{y_1, y_2, \ldots, y_N\} is a finite collection of vectors in IR^k, referred to as the codebook of Q. Thus Q induces a partition of IR^k with cells S_i = \{x : Q(x) = y_i\}, and the classification of input vectors into codewords is done according to the nearest neighbor rule

    Q(x) = y_i,   if \|x - y_i\|_\nu < \|x - y_j\|_\nu for all j \ne i

where the popular l_\nu norm is given by

    \|x - y\|_\nu = \left( \sum_{i=1}^{k} |x_i - y_i|^\nu \right)^{1/\nu}.

Manuscript received September 28, 1997; revised November 16, 1998. The work of V. Maiorov was supported in part by the Center for Absorption in Science, Ministry of Immigrant Absorption, State of Israel. R. Meir is with the Department of Electrical Engineering, Technion–Israel Institute of Technology, Haifa 32000, Israel. V. Maiorov is with the Department of Mathematics, Technion–Israel Institute of Technology, Haifa 32000, Israel. Communicated by K. Zeger, Associate Editor at Large. Publisher Item Identifier S 0018-9448(99)04376-X.
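The nearest neighbor rule above is straightforward to realize in code. The following is a minimal illustrative sketch (not taken from the correspondence; the codebook values are arbitrary choices for demonstration), using only the Python standard library.

```python
# Illustrative sketch of a k-dimensional N-point quantizer Q implementing
# the nearest neighbor rule: Q(x) is the codeword y_i minimizing ||x - y_i||_2.
# The codebook below is an arbitrary toy example, not from the paper.
import math

def nearest_neighbor_quantizer(codebook):
    """Return the quantizer Q induced by the given codebook C_N."""
    def Q(x):
        # the Voronoi cell S_i of y_i is exactly the set of x mapped to y_i here
        return min(codebook, key=lambda y: math.dist(x, y))
    return Q

# N = 4 codewords in k = 2 dimensions
Q = nearest_neighbor_quantizer([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)])
print(Q((0.1, 0.2)))  # -> (0.0, 0.0), the closest codeword
```

The cells S_i of the induced partition are precisely the Voronoi regions of the codewords under the chosen norm.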

Denoting by p(x) the probability density of the k-dimensional random vector X, which we assume throughout to exist, we define the distortion of the optimal N-point quantizer by

    D_{k,N}(r, \nu; p) = \inf_Q \int_{IR^k} \|x - Q(x)\|_\nu^r\, p(x)\, dx

where the infimum is taken over all k-dimensional N-point quantizers. We limit ourselves in the bulk of this work to the case \nu = 2, corresponding to the Euclidean norm. The case \nu \ne 2 will be discussed briefly at the end of Section III (see Corollary III.2). The main aim of this correspondence is to compute upper and lower bounds on the optimal distortion, which hold for each and every value of k and N. This goal should be contrasted with the large body of work concerned with the so-called high-resolution limit, N \to \infty. In the latter case, Bucklew and Wise (see [3, Theorem 2]), following the seminal work of Zador [17], have established the limiting behavior

    \lim_{N\to\infty} \left\{ D_{k,N}(r, 2; p)\, N^{r/k} \right\} = J_{k,r}\, \|p\|_{k/(k+r)}    (1)

which holds under certain regularity conditions, and where

    \|p\|_{k/(k+r)} = \left( \int_{IR^k} p(x)^{k/(k+r)}\, dx \right)^{(k+r)/k}.
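The norm \|p\|_{k/(k+r)} is easy to evaluate numerically for simple densities. The sketch below (our own illustration, not from the correspondence; the density and quadrature are toy choices) computes it by a midpoint Riemann sum for k = 1, r = 2.

```python
# Numerical illustration of ||p||_{k/(k+r)} = ( int p(x)^{k/(k+r)} dx )^{(k+r)/k}
# for k = 1, r = 2, via a midpoint Riemann sum on the support of p.
def p_norm(p, a, b, k, r, steps=100000):
    alpha = k / (k + r)
    h = (b - a) / steps
    s = sum(p(a + (j + 0.5) * h) ** alpha for j in range(steps)) * h
    return s ** (1.0 / alpha)

# Uniform density on [0, 1/2]: p(x) = 2 there, 0 elsewhere.
p = lambda x: 2.0 if 0.0 <= x <= 0.5 else 0.0
val = p_norm(p, 0.0, 0.5, k=1, r=2)
print(val)  # -> 0.25, i.e. |support|^{r/k} for a uniform density
```

For a uniform density the norm reduces to |support|^{r/k}, which makes the density dependence of (1) transparent: shrinking the support shrinks the asymptotic distortion accordingly.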

For a compact domain, Jk;r is determined by the behavior of the optimal distortion of a uniform signal, and has not been calculated exactly for general k and r; some tabulated values can be found in [7]. The following upper and lower bounds on Jk;r have been given by Bucklew [2] and Yamada et al. [16], respectively:

    \frac{k}{k+r}\, V_k^{-r/k} \le J_{k,r} \le \Gamma(1 + r/k) \left( \frac{k+1}{k} \right)^{(k+r)/k} V_k^{-r/k}    (2)

where V_k is the volume of the unit ball in k-dimensional Euclidean space, and is given by

    V_k = \frac{2\, \pi^{k/2}}{k\, \Gamma(k/2)}.    (3)

Note that a slightly better, albeit more complex, lower bound has been given in [5]. We observe that both bounds coincide in the limit k \to \infty, yielding the value

    \lim_{k\to\infty} k^{-r/2} J_{k,r} = (2\pi e)^{-r/2}.
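The volume formula (3) and the limiting value above can be checked numerically. The sketch below (an illustration of ours, not from the correspondence) works with log V_k to avoid overflow of the Gamma function for large k, and evaluates the scaled lower bound of (2).

```python
# Sanity check of (3) and of the limit k^{-r/2} J_{k,r} -> (2*pi*e)^{-r/2}:
# the scaled lower bound of (2) approaches the limiting value as k grows.
import math

def log_unit_ball_volume(k):
    # log of V_k = 2*pi^(k/2) / (k * Gamma(k/2)); lgamma avoids overflow
    return math.log(2.0) + (k / 2) * math.log(math.pi) - math.log(k) - math.lgamma(k / 2)

def scaled_lower_bound(k, r):
    # k^{-r/2} * (k/(k+r)) * V_k^{-r/k}, the left side of (2) scaled as in the limit
    return k ** (-r / 2) * (k / (k + r)) * math.exp(-(r / k) * log_unit_ball_volume(k))

r = 2
for k in (2, 10, 100, 1000):
    print(k, scaled_lower_bound(k, r))
print((2 * math.pi * math.e) ** (-r / 2))  # limiting value, approx 0.0585
```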

Note that similar upper and lower bounds can be shown to hold for the more general \|\cdot\|_\nu norm, with slightly modified constants. As pointed out by Bucklew [2], a key assumption in the derivations has been that the density p(x) may be assumed to be constant over small bounded sets. However, since in this work we are interested in results which hold for finite values of N, the assumption that the quantization cells are small does not hold, and, consequently,



replacing the density in each cell by a constant value is invalid. In order to circumvent this problem we introduce certain smoothness conditions on the density p(x), allowing us to control the variation of the density within each quantization region. As will be shown in Section II, these assumptions reduce to the requirement of the finiteness of certain integrals of the derivative of the density. With these assumptions we are able to derive upper and lower bounds on the distortion, which will be shown to possess the correct asymptotic behavior given in (1). These upper and lower bounds are given in Theorems III.1, IV.1, and V.1. A simple consequence of these results is that there exists a certain finite value N = N_0, which can be calculated if the density is known, such that for N > N_0 the distortion is bounded as follows:

    c_l\, \|p\|_{k/(k+r)}\, N^{-r/k} \le D_{k,N}(r, 2; p) \le c_u\, \|p\|_{k/(k+r)}\, N^{-r/k}.    (4)

The constants c_l and c_u, which will be explicitly evaluated in the sequel, depend on the Euclidean dimension k and the particular distortion function used, but not on the codebook size N or the density p. It is then clear that these bounds reduce to the optimal asymptotic results, up to a multiplicative constant. For arbitrary finite values of N the bounds take on a slightly more complex form, involving certain derivative-based semi-norms of the density, as can be seen in Theorems III.1, IV.1, and V.1. As mentioned above, these terms appear as a result of controlling the variation of the density over the quantization regions, which can no longer be assumed to be small.

The remainder of this correspondence is organized as follows. Section II is concerned with some basic definitions and notation which will be used throughout the correspondence. In Section III we present an upper bound on the distortion of a particular N-point vector quantizer, leading to a constructive proof. Section IV then proposes a nonconstructive approach, which is shown to achieve faster convergence rates. Section V provides a lower bound, which establishes the tightness of the upper bound. We then conclude in Section VI with a brief discussion.

Before moving to the main body of the text we clarify some issues concerning notation. First, all logarithms appearing in the correspondence are natural logarithms. We use the symbol |\cdot| as a generic symbol for the size of a set: in the case of a bounded region \Omega, we let |\Omega| stand for the Lebesgue measure of the region, while for a finite set A we let |A| represent the cardinality of the set. Finally, we comment that the results of this correspondence are established for compactly supported signals. This type of assumption is implicit in much of the work on high-resolution quantization (see, for example, [2]). More generally, one makes some kind of boundedness assumption regarding moments of the form E\|X\|^r, which permits one to neglect the so-called overload region. In this respect see, for example, [3] and [14].


II. PRELIMINARIES

We begin by defining a space of "smooth" density functions. As mentioned in Section I, this requirement is needed in order to control the variation of the density over a quantization cell, which, in contrast to the high-resolution limit case, cannot be assumed to be small. Let \Omega be a compact domain and let L_\lambda(\Omega) be the space of continuous functions for which \|f\|_{L_\lambda(\Omega)} = ( \int_\Omega |f|^\lambda\, dx )^{1/\lambda} is finite. Let f \in L_\lambda(\Omega) be differentiable, and define for any positive integer \lambda the semi-norm

    \|f\|^{(1)}_{L_\lambda(\Omega)} = \left( \int_\Omega \left( \sum_{i=1}^{k} \left| \frac{\partial f}{\partial x_i} \right| \right)^{\lambda} dx \right)^{1/\lambda}.    (5)

The Sobolev space W^1_\lambda(\Omega) is then given by

    W^1_\lambda(\Omega) = \left\{ f : \|f\|_{L_\lambda(\Omega)} + \|f\|^{(1)}_{L_\lambda(\Omega)} < \infty \right\}.    (6)

Unfortunately, the Sobolev semi-norm (5) cannot in general be calculated exactly. However, a simple upper bound is much more amenable to calculation. For \lambda \ge 1 use Hölder's inequality

    \left( \sum_{i=1}^{k} |a_i| \right)^{\lambda} \le k^{\lambda-1} \sum_{i=1}^{k} |a_i|^{\lambda}

to obtain the upper bound

    \|f\|^{(1)}_{L_\lambda(\Omega)} \le k^{1-1/\lambda} \left( \sum_{i=1}^{k} \int_\Omega \left| \frac{\partial f}{\partial x_i} \right|^{\lambda} dx \right)^{1/\lambda}

which holds for any f. As a specific example, consider the multivariate Gaussian probability density function, given by the product of univariate Gaussians, namely,

    f(x) = \prod_{i=1}^{k} (2\pi\sigma^2)^{-1/2} \exp(-x_i^2 / 2\sigma^2).

Substituting this function in the upper bound, and performing some simple integrations, we obtain

    \|f\|^{(1)}_{L_\lambda} \le k\, \sigma^{-1} \lambda^{-1/2} M_\lambda^{1/\lambda} \exp\left( -\frac{k}{2} \left[ (1 - 1/\lambda) \log(2\pi\sigma^2) + (1/\lambda) \log \lambda \right] \right)

where M_\lambda is the \lambda th moment of the standard univariate Gaussian, namely,

    M_\lambda = (2\pi)^{-1/2} \int_{-\infty}^{\infty} |x|^{\lambda} \exp(-x^2/2)\, dx.

From this result we infer that for 2\pi\sigma^2 > \lambda^{-1/(\lambda-1)} the Sobolev norm vanishes exponentially fast as k \to \infty. As can be observed, the convergence rate increases with larger values of \lambda. This fits with our intuition, whereby larger values of \lambda correspond to smoother functions, characterized by a small value of the Sobolev semi-norm. In the remainder of the correspondence we will use the convention \|f\|^{(1)}_{L_\lambda(\Omega)} = \|f\|^{(1)}_{L_\lambda}. When a domain other than \Omega is used, this will be explicitly indicated as in (5).

For this space of functions we need a result which relates the variation of the density p(x) in the domain to its Sobolev semi-norm defined above. The main motivation for this type of result is the following. As stressed in Section I, the high-resolution results are usually derived by assuming the quantization cells to be small, so that the density p(x) may be assumed to be constant over each such cell. In contrast, when the quantization cells are no longer small, as is the case for finite N, one needs to take into account the variation of the density within each cell. Lemma II.1 below quantifies this notion, in terms of the Sobolev norm (5). A proof of the lemma is given in the Appendix, using some tools from the theory of functions (see [11]).

Lemma II.1: Let the density function p(x) belong to the Sobolev space W^1_\lambda(\Delta), \lambda > k, defined over a bounded cubic region \Delta. Then

    \sup_{x \in \Delta} |p(x) - \hat p| \le c_\lambda\, |\Delta|^{1/k - 1/\lambda}\, \|p\|^{(1)}_{L_\lambda(\Delta)}    (7)

where \hat p = |\Delta|^{-1} \int_\Delta p(x)\, dx and

    c_\lambda \le \frac{2}{\log 2} \cdot \frac{1}{1 - k/\lambda}.

As may be observed, Lemma II.1 allows one to relate the variation of the density p(x) over a cell \Delta to its Sobolev norm, defined with respect to that cell.
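The Gaussian example above can be reproduced numerically. The sketch below (our own illustration, with toy parameter values; not from the correspondence) computes M_\lambda by quadrature and evaluates the semi-norm upper bound, showing the exponential decay in k when 2\pi\sigma^2 exceeds \lambda^{-1/(\lambda-1)}.

```python
# Numerical illustration of the Gaussian moment M_lambda and of the
# exponential decay in k of the semi-norm upper bound above, for a
# variance satisfying 2*pi*sigma^2 > lambda^(-1/(lambda-1)).
import math

def gaussian_abs_moment(lam, steps=200000, cutoff=10.0):
    # M_lam = (2*pi)^(-1/2) * int |x|^lam exp(-x^2/2) dx, midpoint rule;
    # the integrand is even, so integrate over [0, cutoff] and double.
    h = cutoff / steps
    s = sum(((j + 0.5) * h) ** lam * math.exp(-((j + 0.5) * h) ** 2 / 2)
            for j in range(steps))
    return 2.0 * s * h / math.sqrt(2 * math.pi)

def seminorm_bound(k, lam, sigma):
    M = gaussian_abs_moment(lam)
    expo = -(k / 2) * ((1 - 1 / lam) * math.log(2 * math.pi * sigma ** 2)
                       + (1 / lam) * math.log(lam))
    return k * sigma ** -1 * lam ** -0.5 * M ** (1 / lam) * math.exp(expo)

lam = 4
print(gaussian_abs_moment(lam))  # approx 3.0: the fourth absolute moment E|Z|^4
print(seminorm_bound(2, lam, 1.0), seminorm_bound(20, lam, 1.0))  # decays in k
```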

III. UPPER BOUND BY AXIS-PARALLEL PARTITIONING

In this section we present an upper bound on the distortion of a k-dimensional N-point vector quantizer, for sources obeying appropriate smoothness constraints. As is common practice in such proofs, we proceed by showing that the upper bound holds for a specific quantizer which will be constructed. Although the construction we present is clearly nonoptimal, it will be shown to achieve the optimal asymptotic rates (but not constants) when N \to \infty. Moreover, the method of proof in this section will be used to establish tighter upper bounds via an improved method discussed in Section IV. In order to simplify the proofs we present them first for the case of the l_2-norm. Corollary III.2 at the end of the section immediately yields the bound for general l_\nu-norms. In order to simplify the notation we will use the convention that the symbol \|x\| without any suffix refers to the l_2-norm of the vector x. Before introducing the main result of this section, however, we present a simple construction,^1 which yields a nontrivial upper bound on the distortion of signals supported over the bounded k-dimensional cube of side-length a. Moreover, the method of proof of Proposition III.1 below will serve as an introduction to the more elaborate proofs which follow.

^1 We are grateful to an anonymous reviewer for pointing out this simple construction to us.

Uniform Quantization:
1. Let n^k \le N < (n+1)^k for some integer n.
2. Construct a quantizer Q by uniform quantization of each coordinate, using n-level uniform quantizers with step-size a/n.

Proposition III.1: Let \Omega \subset IR^k be a k-dimensional cube with axis-parallel sides, and assume Q is an N-point vector quantizer obtained by the uniform quantization scheme above. Then the distortion of Q is upper-bounded as follows:

    D_{k,N}(r, 2; p) \le k^{r/2}\, |\Omega|^{r/k}\, N^{-r/k}.    (8)

Proof: Denote by \Delta_i each of the n^k cubic subcells formed by the uniform partition. For each cell \Delta_i we have \|x - Q(x)\| \le \frac12 d_i for x \in \Delta_i, where d_i is the length of the diagonal of the cell \Delta_i, which is related to the diagonal d of \Omega through d_i = d/n. Using these observations we easily obtain

    D_{k,N}(r, 2; p) \le \sum_{i=1}^{n^k} \int_{\Delta_i} \|x - Q(x)\|^r p(x)\, dx
     \le 2^{-r} \sum_{i=1}^{n^k} d_i^r \int_{\Delta_i} p(x)\, dx
     = 2^{-r} (d/n)^r = 2^{-r} k^{r/2} |\Omega|^{r/k} n^{-r}    (9)

where we have used d = \sqrt{k}\, |\Omega|^{1/k} in the last step. The result then follows from the observation that n^k \le N < (n+1)^k implies n^{-1} < (1 + 1/n) N^{-1/k} \le 2 N^{-1/k}.

Remark 1: As can be observed, the proof of Proposition III.1 is extremely simple, making no assumptions on the underlying distribution. Moreover, we observe that the correct asymptotic rate of convergence, namely O(N^{-r/k}), is achieved. However, this result disregards the details of the underlying distribution, and thus cannot achieve the optimal asymptotic dependence on the density, of the form \|p\|_{k/(k+r)}, appearing in the results of Zador [18] and Bucklew and Wise [3]. In this section we impose certain regularity constraints on the underlying distribution, which allow us to derive upper and lower bounds on the distortion, which possess the desired asymptotic behavior, and which hold for each value of N. Finally, we comment that the upper bound in Proposition III.1 can be easily reduced by a factor of r + 1. We defer this construction to the proof of Theorem III.1 below.

Theorem III.1 presented below is based on a more refined partitioning of the domain \Omega than the one presented by the uniform quantizer. In particular, the source support is divided into small cubes and further subdivided into subcubes, where the number of subcubes depends on a certain power of the density. A similar procedure was originally proposed by Zador [17], where the second-level partition was assumed to exist, while we provide an explicit construction. In contrast to the uniform quantizer, the new quantizer takes account of the density function of the signal in establishing the quantization regions. The construction is similar in spirit to the one of Bucklew and coworkers in [3] and [12], the latter reference being particularly pertinent to the present study. However, in these papers the derivation is asymptotic in the size of the codebook, which permits the replacement of the actual density p(x) in each cell \Delta_i by a constant value p_i, due to the smallness of the quantization regions in this limit. Since we are interested in results valid for finite N, we have to take into account the nonvanishing size of the quantization regions. As we show in the proof, this can be achieved through use of methods from the theory of functions, based on Lemma II.1. Moreover, the results of Theorem III.1 will be used in the lower bound proof derived in Section V.

Theorem III.1: Let \Omega be a bounded cubic region and assume the density p \in W^1_\lambda(\Omega), \lambda > k. Let N and k be positive integers and set n to be the largest integer for which n^k \le N/2. Then for any 0 < \varepsilon < 1 the optimal distortion is upper-bounded as follows:

    D_{k,N}(r, 2; p) \le c_1 \|p\|_{k/(k+r)} (1 + 1/n)^r N^{-r/k} + c_2 \left( \|p\|^{(1)}_{L_\lambda} \right)^{1/k} (1 + 1/n)^r N^{-r/k - 1/k^2}    (10)

where

    c_1 = (1-\varepsilon)^{-r/k}\, 2^{r/k}\, k^{r/2} / (r+1)   (r \ge 2)
    c_2 = \varepsilon^{-r/k}\, c_\lambda\, 2^{1+(r+1)/k}\, k^{r/2}\, |\Omega|^{r/k + (\beta - r/k)/k}    (11)

with \beta = (r+1)/k - 1/\lambda + 1 and

    c_\lambda \le \frac{2}{\log 2} \cdot \frac{1}{1 - k/\lambda}.

If r < 2, the factor 1/(r+1) in c_1 should be eliminated.

Remark 2: The explicit form of the bounds in Theorem III.1 may seem somewhat complicated. However, this specific form is essential if comparison is to be made to the high-resolution limit N \to \infty, in which case n \to \infty as well. Moreover, carefully balancing the first and second terms in (10), using a choice \varepsilon = o(1) with \varepsilon^{-1} = o(N^{1/(rk)}), we see that the second term indeed vanishes in the limit N \to \infty at a faster rate than the first one. The dependence of \varepsilon on N is crucial for attaining the smallest constants possible in this limit.

Remark 3: Recall that from Proposition III.1 we already have an upper bound which decays as N^{-r/k}. It is easy to see from Hölder's inequality that \|p\|_{k/(k+r)} \le |\Omega|^{r/k}, so that it would seem that the more refined result (10) is not much better than the simple result (8). First, we note that from (1) we know that the dependence on the density in (10) is in fact the correct asymptotic dependence, which cannot, of course, be derived from (8). Moreover, in situations where the density vanishes (or is very small) on a large portion of the domain, the norm \|p\|_{k/(k+r)} is in fact substantially smaller than |\Omega|^{r/k}, leading to further improvement with respect to the uniform quantization bound (see Remark 4 below).

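The uniform quantization scheme of Proposition III.1 is simple enough to check numerically. The sketch below (our own illustration, not from the correspondence) evaluates the exact per-cell distortion for a uniform source on the unit square (k = 2, r = 2) and confirms it lies below the bound (8).

```python
# Check of Proposition III.1 for k = 2, r = 2, uniform density on the unit
# square, with n-level uniform quantization of each coordinate. The exact
# distortion is 2 * s^2 / 12 with cell side s = 1/n, which must lie below
# the bound k^{r/2} |Omega|^{r/k} N^{-r/k} = 2 / n^2 (with N = n^k).
def uniform_quantizer_distortion(n, samples=1000):
    # integrate E||x - Q(x)||^2 over one cell by the midpoint rule;
    # by symmetry each of the two coordinates contributes the same amount
    s = 1.0 / n                       # cell side-length
    h = s / samples
    acc = 0.0
    for i in range(samples):
        u = (i + 0.5) * h - s / 2     # coordinate relative to the cell center
        acc += u * u * h / s
    return 2 * acc                    # sum over the two coordinates

for n in (2, 4, 8):
    D = uniform_quantizer_distortion(n)
    bound = 2.0 / n ** 2              # equation (8)
    print(n, D, bound, D <= bound)
```

The exact value s^2/6 is a factor 12 below the bound, consistent with the remark that (8) is loose by a factor related to r + 1 and the worst-case diagonal estimate.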

Proof of Theorem III.1: Consider a uniform partition of \Omega into m = n^k equi-volume cubic cells \{\Delta_i\}_{i=1}^m. Each cell \Delta_i is further split into m_i = n_i^k equi-volume cubic subcells \{\Delta_{ij}\}_{j=1}^{m_i}. A quantization point is then placed at the center of each subcell \Delta_{ij}, and the quantizer formed in this way is denoted by Q. The integers m and \{m_i\}_{i=1}^m will be specified in the course of the proof. Let

    p_i = \min_{x \in \Delta_i} p(x).    (12)

Since p(x) = |p(x) - p_i| + p_i for x \in \Delta_i, the distortion can then be upper-bounded as follows:

    D_{k,N}(r, 2; p) \le \sum_{i=1}^{m} \int_{\Delta_i} \|x - Q(x)\|^r p(x)\, dx
     = \sum_{i=1}^{m} \int_{\Delta_i} \|x - Q(x)\|^r p_i\, dx + \sum_{i=1}^{m} \int_{\Delta_i} \|x - Q(x)\|^r |p(x) - p_i|\, dx
     \equiv D_1 + D_2.    (13)

We deal now separately with D_1 and D_2. We then have

    D_1 = \sum_{i=1}^{m} \sum_{j=1}^{m_i} \int_{\Delta_{ij}} \|x - Q(x)\|^r p_i\, dx
     = \sum_{i=1}^{m} \sum_{j=1}^{m_i} \left[ |\Delta_{ij}|^{-1} \int_{\Delta_{ij}} \|x - Q(x)\|^r\, dx \right] \int_{\Delta_{ij}} p_i\, dx.

Note that the term inside the bracket above is just the distortion for a random vector distributed uniformly in the cell \Delta_{ij}. Assume that r \ge 2. In order to bound it we use Hölder's inequality

    \left( \sum_{i=1}^{k} z_i^2 \right)^{r/2} \le k^{r/2-1} \sum_{i=1}^{k} |z_i|^r,   r \ge 2

from which we obtain by simple integration that

    |\Delta_{ij}|^{-1} \int_{\Delta_{ij}} \|x - Q(x)\|^r\, dx \le 2^{-r} k^{r/2} (r+1)^{-1} |\Delta_{ij}|^{r/k}.    (14)

If r < 2, the factor (r+1)^{-1} is simply eliminated. Substituting this result and using the relationship |\Delta_{ij}|^{r/k} = |\Delta_i|^{r/k} / n_i^r, we find that

    D_1 \le \frac{2^{-r} k^{r/2}}{r+1} \sum_{i=1}^{m} \sum_{j=1}^{m_i} |\Delta_{ij}|^{r/k} \int_{\Delta_{ij}} p_i\, dx
     = \frac{2^{-r} k^{r/2}}{r+1} \sum_{i=1}^{m} \frac{|\Delta_i|^{r/k}}{n_i^r} \int_{\Delta_i} p_i\, dx.    (15)

In order to bound D_2 from above we first make use of Lemma II.1. Since p_i = p(x_0) for some x_0 \in \Delta_i, we immediately infer that for any x \in \Delta_i

    |p(x) - p_i| \le |p(x) - \hat p_i| + |\hat p_i - p_i| \le 2 c_\lambda |\Delta_i|^{1/k - 1/\lambda} \|p\|^{(1)}_{L_\lambda(\Delta_i)}

where \hat p_i = |\Delta_i|^{-1} \int_{\Delta_i} p(x)\, dx. Therefore,

    \int_{\Delta_i} |p(x) - p_i|\, dx \le 2 c_\lambda |\Delta_i|^{1/k - 1/\lambda + 1} \|p\|^{(1)}_{L_\lambda(\Delta_i)}.

Let d_i and d_{ij} be the diagonal lengths of the cubes \Delta_i and \Delta_{ij}, respectively. Proceeding analogously to (14) we have

    D_2 = \sum_{i=1}^{m} \sum_{j=1}^{m_i} \int_{\Delta_{ij}} \|x - Q(x)\|^r |p(x) - p_i|\, dx
     \le \sum_{i=1}^{m} \sum_{j=1}^{m_i} (d_{ij}/2)^r \int_{\Delta_{ij}} |p(x) - p_i|\, dx
     = 2^{-r} k^{r/2} \sum_{i=1}^{m} n_i^{-r} |\Delta_i|^{r/k} \int_{\Delta_i} |p(x) - p_i|\, dx
     \le c_\lambda 2^{1-r} k^{r/2} \sum_{i=1}^{m} n_i^{-r} |\Delta_i|^{(r+1)/k - 1/\lambda + 1} \|p\|^{(1)}_{L_\lambda(\Delta_i)}

where we have used the inequality \|x - Q(x)\| \le d_{ij}/2 for x \in \Delta_{ij}, the relationship n_i = |\Delta_i|^{1/k} / |\Delta_{ij}|^{1/k}, and Lemma II.1 in the final step. Since |\Delta_i| = |\Omega|/m for i = 1, 2, \ldots, m, we find

    D_2 \le c_\lambda 2^{1-r} k^{r/2}\, |\Omega|^{\beta} m^{-\beta} \sum_{i=1}^{m} n_i^{-r} \|p\|^{(1)}_{L_\lambda(\Delta_i)}    (16)

where we have set \beta = (r+1)/k - 1/\lambda + 1. In order to determine n_i we let 0 < \varepsilon < 1 and set

    N_i = \left\lceil (1-\varepsilon)\, m\, \frac{a_i}{\sum_{j=1}^{m} a_j} + \varepsilon\, m\, \frac{b_i}{\sum_{j=1}^{m} b_j} \right\rceil    (17)

where \lceil t \rceil denotes the smallest integer larger than t, and

    a_i = \int_{\Delta_i} p^{k/(k+r)}\, dx,   b_i = \left( \|p\|^{(1)}_{L_\lambda(\Delta_i)} \right)^{k/(k+r)}.    (18)

The motivation for this choice arises from an attempt to minimize the terms of the form \sum_{i=1}^{m} c_i n_i^{-r} appearing in (15) and (16), under the constraint that \sum_{i=1}^{m} m_i \le N. Keeping in mind that m_i = n_i^k, we then select n_i to obey

    n_i^k \le N_i < (n_i + 1)^k.
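The allocation rule (17) can be sketched directly in code. The inputs below are toy values standing in for the cell statistics a_i and b_i (which in the proof are integrals of powers of the density and of its Sobolev semi-norm); this is our own illustration, not from the correspondence.

```python
# Sketch of the allocation rule (17): cell i receives
#   N_i = ceil( (1-eps)*m*a_i/sum(a) + eps*m*b_i/sum(b) ).
# Since each ceiling adds at most 1, sum(N_i) <= 2*m, so choosing the
# number of cells m <= N/2 respects the total codebook budget N.
import math

def allocate(a, b, eps):
    m = len(a)
    A, B = sum(a), sum(b)
    return [math.ceil((1 - eps) * m * ai / A + eps * m * bi / B)
            for ai, bi in zip(a, b)]

a = [0.5, 0.3, 0.15, 0.05]   # toy stand-ins for a_i (density-driven shares)
b = [0.1, 0.2, 0.4, 0.3]     # toy stand-ins for b_i (smoothness-driven shares)
Ni = allocate(a, b, eps=0.1)
print(Ni, sum(Ni) <= 2 * len(a))  # the budget constraint used in the proof
```

High-density cells receive more points (and hence finer subcubes n_i), which is exactly how the construction attains the density dependence \|p\|_{k/(k+r)} of the asymptotic theory.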


Proceeding in the derivation, based on (15) and using \alpha = k/(k+r), we obtain

    \sum_{i=1}^{m} \frac{1}{n_i^r} \int_{\Delta_i} p_i\, dx
     \overset{a)}{\le} 2^r \sum_{i=1}^{m} N_i^{-r/k} \int_{\Delta_i} p_i\, dx
     \overset{b)}{\le} 2^r (1-\varepsilon)^{-r/k} m^{-r/k} \left( \sum_{j=1}^{m} a_j \right)^{r/k} \sum_{i=1}^{m} a_i^{-r/k} \int_{\Delta_i} p_i\, dx
     \overset{c),d)}{\le} 2^r (1-\varepsilon)^{-r/k} |\Omega|^{-r/k} \left( \sum_{j=1}^{m} a_j \right)^{(k+r)/k}
     \overset{e)}{=} 2^r (1-\varepsilon)^{-r/k} |\Omega|^{-r/k} \|p\|_{k/(k+r)}.    (19)

Step a) follows from the fact that the choice n_i^k \le N_i < (n_i+1)^k implies n_i^{-1} < 2 N_i^{-1/k}. Step b) then makes use of (17), while in c) and d) we have used the definition of \alpha and the fact that p_i = \min_{x \in \Delta_i} p(x), which together yield a_i^{-r/k} \int_{\Delta_i} p_i\, dx \le a_i (m/|\Omega|)^{r/k}. Step e) exploits the additivity property of the integral, \sum_j a_j = \int_\Omega p^{k/(k+r)}\, dx. Combining (15) and (19) we then have

    D_1 \le (1-\varepsilon)^{-r/k} (r+1)^{-1} k^{r/2}\, \|p\|_{k/(k+r)}\, m^{-r/k}.    (20)

Using a similar argument for D_2 and making use of (16) we obtain, using a sequence of inequalities similar to (19), replacing a_i by b_i and 1-\varepsilon by \varepsilon,

    D_2 \le 2 c_\lambda\, \varepsilon^{-r/k} k^{r/2}\, |\Omega|^{\beta}\, \|p\|^{(1)}_{L_\lambda}\, m^{-(r+1)/k}    (21)

where (18) and Hölder's inequality were used in the second step, and \beta = (r+1)/k - 1/\lambda + 1. Note that from (21) we see that D_2 is upper-bounded by a term proportional to |\Omega|^{\beta}, which depends exponentially on the dimension k, since |\Omega| = a^k, where a is the side-length of \Omega. In order to mitigate the effect of this factor we observe that D_2 can also be bounded in a slightly different form, using a sequence of inequalities similar to (9). Taking note of the fact that |p(x) - p_i| \le p(x) and using (9) we then have

    D_2 = \sum_{i=1}^{m} \int_{\Delta_i} \|x - Q(x)\|^r |p(x) - p_i|\, dx \le 2^{-r} k^{r/2}\, |\Omega|^{r/k}\, m^{-r/k}    (22)

where we have used the fact that n_i \ge 1. Combining (21) and (22) we see that D_2 may be upper-bounded as follows:

    D_2 \le \min\left( d_1\, \frac{|\Omega|^{r/k}}{m^{r/k}},\; d_2\, \frac{|\Omega|^{\beta}\, \|p\|^{(1)}_{L_\lambda}}{m^{(r+1)/k}} \right)
     \le \max(d_1, d_2)\, \frac{|\Omega|^{r/k}}{m^{r/k}}\, \min\left( 1,\; |\Omega|^{\beta - r/k}\, \|p\|^{(1)}_{L_\lambda}\, m^{-1/k} \right)    (23)

where d_1 = 2^{-r} k^{r/2} and d_2 = 2 c_\lambda \varepsilon^{-r/k} k^{r/2}. Making use of the simple inequality \min(a, b) \le a^{1-u} b^{u}, 0 < u < 1, and selecting u = 1/k, we then conclude that

    D_2 \le 2 c_\lambda k^{r/2} \varepsilon^{-r/k}\, |\Omega|^{r/k + (\beta - r/k)/k} \left( \|p\|^{(1)}_{L_\lambda} \right)^{1/k} m^{-r/k - 1/k^2}.    (24)

In order to obtain the final result we need to determine m. Using the requirement \sum_{i=1}^{m} N_i \le N, we obtain from (17) the condition m \le N/2. Setting

    m = n^k \le N/2 < (n+1)^k

we find that m^{-1} < (N/2)^{-1} (n+1)^k, from which we infer that m^{-1} < (1 + 1/n)^k (N/2)^{-1}. The theorem then follows upon substitution in (20) and (24), keeping in mind that 1/k^2 \le 1/k < 1.

It may be observed from Theorem III.1 that the second term in (10), depending on the Sobolev norm of the density, decays only marginally faster than the first term, due to the smallness of the term 1/k^2 for large values of k. However, for very smooth functions, the Sobolev norm \|p\|^{(1)}_{L_\lambda} may itself be very small, as in the example presented in Section II for the normal distribution with large variance. Note also that this term depends on the side-length |\Omega|^{1/k} rather than the volume itself. In view of future comparison of our results with the asymptotic bounds derived in [2], we note that for N \to \infty we have the following.

Corollary III.1: Let the conditions of Theorem III.1 hold. Then for r \ge 2

    \limsup_{N\to\infty} \left\{ D_{k,N}(r, 2; p)\, N^{r/k} \right\} \le \frac{2^{r/k}\, k^{r/2}}{r+1}\, \|p\|_{k/(k+r)}.

If r < 2, the factor 1/(r+1) should be eliminated.

Remark 4: In order to see the advantage of the refined bound (10) over the uniform bound (8), consider the case k = 1 and a density p(x) which vanishes outside a subinterval of length \delta |\Omega|. Substituting in (10), the first term is proportional to \delta^r N^{-r}, so that for N sufficiently large

    D_{1,N}(r, 2; p) \le 2 c_1 (\delta |\Omega|)^r N^{-r}.

Using the simple bound (8), we see that D_{1,N}(r, 2; p) \le |\Omega|^r N^{-r}, which for small \delta is much larger (by a factor of (1/\delta)^r) than the bound obtained from (10). This example can be extended to cases where p(x) is very small, not necessarily zero, on large parts of the domain, giving rise to a similar conclusion. Before concluding this section we note that one can immediately obtain an upper bound for the distortion in the case of general l_\nu norms.


Corollary III.2: Let the assumptions of Theorem III.1 hold and assume the distortion measure is given by \|x - Q(x)\|_\nu^r for \nu > 0. Then

    D_{k,N}(r, \nu; p) \le k^{(1/\nu - 1/2)_+ r}\, D_{k,N}(r, 2; p)

where (u)_+ equals u for u \ge 0 and zero otherwise.

Proof: The result follows by a simple application of Hölder's inequality, which yields

    \|x - y\|_\nu \le k^{(1/\nu - 1/2)_+} \|x - y\|_2.

IV. UPPER BOUND THROUGH EFFICIENT PARTITIONING

The upper bound calculation in Section III assumed a rectangular partition of space, through axis-parallel cubes. It is well known, of course, that the optimal partition does not in general consist of such regions, even if the underlying distribution is uniform, because this type of partition does not fill space efficiently. In this section we show how to use a more elaborate partitioning scheme in order to obtain a lower value for the upper bound derived in Section III. The main drawback of the method is that it is no longer constructive.

Before describing our main results we need to introduce some background and basic ideas from the classic work of Rogers on the covering of space [15]. We recall that a class of, not necessarily disjoint, sets \{\Delta_i\}_{i=1}^{\infty} in IR^k covers space if \cup_i \Delta_i = IR^k. Let \Delta \subset IR^k be a bounded axis-parallel cube and let K \subset IR^k be a bounded convex set. Furthermore, denote by (1+\eta)K the convex set obtained from K by rescaling each axis by a factor of (1+\eta). Let \Lambda denote the cubic lattice of all points whose coordinates are integral multiples of |\Delta|^{1/k}, the side-length of \Delta, and let b_j, j = 1, 2, \ldots, be an enumeration of the points of \Lambda. We use the notation K + a, a \in IR^k, to represent a translation of K by the vector a. Using these definitions, Rogers derived the following important result, which is presented here in a slightly revised form, applicable to our situation.

Lemma IV.1 (Rogers, [15, Theorem 3.2]): Let \Delta be a bounded axis-parallel k-dimensional cube and let K be a k-dimensional convex set such that diam(K) \le \frac12 |\Delta|^{1/k}. Then there exist two finite sets of points \{a_i\}_{i=1}^{L} and \{c_l\}_{l=1}^{M}, both belonging to \Delta, such that it is possible to form a covering of IR^k by the system of translates

    (1+\eta)K + a_i + b_j   (i = 1, 2, \ldots, L;\; j = 1, 2, \ldots)
    (1+\eta)K + c_l + b_j   (l = 1, 2, \ldots, M;\; j = 1, 2, \ldots).

Moreover, the integers L and M are given by

    L = \left\lceil \theta\, \frac{|\Delta|}{|K|} \right\rceil,   M \le \eta^{-k} e^{-\theta}\, \frac{|\Delta|}{|K|}    (26)

where 0 < \eta \le 1/k and \theta > 0 is arbitrary.

Lemma IV.1 essentially demonstrates that there exists a system of translates of the convex set (1+\eta)K such that when this system is periodically translated along the lattice points b_j, a cover of IR^k is formed. We refer to this set of L + M translates as the basic set. Finally, we observe that the proof of Lemma IV.1 is nonconstructive, as it is based on a random coding type of argument.

Now, for our specific application we are interested in the number of translates of a convex set K needed to cover a bounded cubic region \Delta. Note that from Lemma IV.1 we cannot conclude that L + M such translates suffice. For this purpose we introduce the following lemma, which we specialize to the case where K is a k-dimensional Euclidean ball.

Lemma IV.2: Let \Delta be a bounded cubic region with axis-parallel sides and K a k-dimensional ball such that diam(K) \le \frac12 |\Delta|^{1/k}. Then it is possible to cover \Delta by at most

    \left( 1 + 2 \left( \frac{|K|}{V_k |\Delta|} \right)^{1/k} \right)^k \vartheta_k\, \frac{|\Delta|}{|K|}

translates of the convex set (1+\eta)K, where

    \vartheta_k = 1 + k \log(k \log k).    (27)

Proof: Let \Delta' be a k-dimensional cube formed by increasing each side of \Delta from a to a + \delta, where \delta = 2 r(K) and r(K) is the radius of K. Consider the lattice \Lambda' formed by translating \Delta' periodically along the axis directions, with period equal to a + \delta. Let R be a cover of IR^k formed in accordance with Lemma IV.1 using \Delta'. From the periodicity of the covering and the choice of \delta we conclude from Lemma IV.1 that

    \left( \theta + \eta^{-k} e^{-\theta} \right) \frac{|\Delta'|}{|K|}

balls suffice to cover \Delta, where 0 < \eta \le 1/k and \theta > 0 is arbitrary. Using |K| = V_k r(K)^k we conclude that r(K) = (|K|/V_k)^{1/k}. Keeping in mind that |\Delta'| = (a + \delta)^k = |\Delta| \left( 1 + 2 (|K|/(V_k |\Delta|))^{1/k} \right)^k, and choosing \eta = 1/(k \log k) and \theta = k \log(k \log k), we obtain upon substitution that the number of balls needed to cover \Delta can be upper-bounded by

    \left( 1 + 2 \left( \frac{|K|}{V_k |\Delta|} \right)^{1/k} \right)^k \left( 1 + k \log(k \log k) \right) \frac{|\Delta|}{|K|}

which is the desired result.

Remark 5: Observe that from the condition 2 r(K) = diam(K) \le \frac12 |\Delta|^{1/k} in Lemma IV.2 we immediately conclude that 2(|K|/(V_k |\Delta|))^{1/k} \le 1/2, which upon substitution leads to the simpler, but coarser, bound of (3/2)^k \vartheta_k |\Delta|/|K|, which we will use in the sequel.

We now introduce a quantization procedure based on these results, followed by a proof of the upper bound achieved.

Covering-based quantization procedure:
1) Split the domain \Omega uniformly into m = n^k cubic regions \{\Delta_i\}.
2) To each cube \Delta_i assign N_i quantization points, where N_i is given in (17).
3) Construct a cover of each cube \Delta_i by balls, using the results of Lemma IV.2.
4) Place a quantization vector in each of the balls constructed in step 3).

The final step will be specified in the proof below. Note that the enumeration of the set of quantization points in step 4) above immediately yields a quantizer through the partition induced on \Omega by the Voronoi cells.

Theorem IV.1: Let \Omega be a bounded cubic region and assume the density p \in W^1_\lambda(\Omega), \lambda > k. Let N and k be positive integers and set n to be the largest integer for which n^k \le N/2. Then for any 0 < \varepsilon < 1, the optimal distortion is upper-bounded as follows:

    D_{k,N}(r, 2; p) \le c_1 \|p\|_{k/(k+r)} (1 + 1/n)^r N^{-r/k} + c_2 \left( \|p\|^{(1)}_{L_\lambda} \right)^{1/k} (1 + 1/n)^r N^{-r/k - 1/k^2}    (28)

where

    c_1 = (1-\varepsilon)^{-r/k} (3/2)^r \left( 1 + 1/(k \log k) \right)^r \vartheta_k^{r/k}\, V_k^{-r/k}
    c_2 = \varepsilon^{-r/k}\, 2^{1+(r+1)/k}\, c_\lambda\, (3/2)^r \left( 1 + 1/(k \log k) \right)^r \vartheta_k^{r/k}\, V_k^{-r/k}\, |\Omega|^{r/k + (\beta - r/k)/k}    (29)

with c_\lambda \le \frac{2}{\log 2} \cdot \frac{1}{1 - k/\lambda}.
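The covering count of Lemma IV.2, and the constant \vartheta_k entering (29), are easy to evaluate numerically. The sketch below (our own illustration with toy parameters, not from the correspondence) computes the bound for a small example and checks the diameter condition.

```python
# Numerical illustration of the covering bound in Lemma IV.2: the number of
# translates needed to cover a cube Delta by balls K of given radius, using
# the coarser form (3/2)^k * theta_k * |Delta|/|K| discussed in Remark 5.
import math

def unit_ball_volume(k):
    return 2.0 * math.pi ** (k / 2) / (k * math.gamma(k / 2))  # equation (3)

def covering_count_bound(k, cube_volume, ball_radius):
    K = unit_ball_volume(k) * ball_radius ** k          # |K| = V_k r^k
    assert 2 * ball_radius <= 0.5 * cube_volume ** (1 / k)  # diam(K) condition
    theta_k = 1 + k * math.log(k * math.log(k))         # equation (27)
    return (1.5 ** k) * theta_k * cube_volume / K

print(covering_count_bound(3, cube_volume=1.0, ball_radius=0.1))
```

The bound always exceeds the trivial volume ratio |\Delta|/|K|, with the overhead factor (3/2)^k \vartheta_k quantifying the unavoidable overlap cost of coverings.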


Proof: We follow the procedure outlined above. Initially, split the domain Ω into m = n^k equi-volume subcubes Δ_i, where n^k ≤ N/2 < (n+1)^k. Consider now a covering of each cube Δ_i by N_i equi-volume Euclidean balls {B_ij}_{j=1}^{N_i}, where N_i is given in (17). Formally we have

  ∪_{j=1}^{N_i} B_ij ⊇ Δ_i.

Now, from the cover {B_ij}_{j=1}^{N_i} we select N_i points serving as code vectors for points x ∈ Δ_i. If the center of the ball B_ij falls in the domain Δ_i it is selected as a code vector; if not, select an arbitrary point in B_ij ∩ Δ_i. For each such point x_j, j = 1, 2, ..., N_i, we then form a Voronoi cell, denoted by Δ_ij, composed of all the points in Δ_i closest to x_j. It is clear by construction that any point x within the Voronoi cell of x_j falls within the ball B_ij.

Following a similar line of thought to that pursued in Section III, we note that the distortion can be split as in (13) into a term involving a constant density and a term related to the density variation. We first consider the term D₁ as defined in (13). Since Δ_ij ⊆ B_ij, we have

  D₁ = Σ_{i=1}^m Σ_{j=1}^{N_i} ∫_{Δ_ij} ||x − Q(x)||^r p_i dx ≤ Σ_{i=1}^m ρ_i^r Σ_{j=1}^{N_i} ∫_{Δ_ij} p_i dx = Σ_{i=1}^m ρ_i^r ∫_{Δ_i} p_i dx   (30)

where ρ_i is the radius of each of the balls used to cover Δ_i. In order to proceed we need an upper bound on the radii ρ_i. Such a bound may be obtained from Lemma IV.2, by taking the convex body K to be a ball of volume V_k r_i^k. In accordance with Lemma IV.2 (and Remark 5), set N_i = (3/2)^k θ_k |Δ_i|/|K_i|, from which we infer that |K_i| = (3/2)^k θ_k |Δ_i|/N_i. Since |K_i| = V_k r_i^k we conclude that

  r_i^k = (3/2)^k θ_k |Δ_i| / (N_i V_k).

Since according to Lemma IV.2 the cover is obtained by balls of radius (1 + δ) r_i, we immediately conclude that

  ρ_i^k ≤ (3/2)^k (1 + 1/(k log k))^k θ_k V_k^{−1} |Δ_i| / N_i   (31)

where we set δ = 1/(k log k), and where θ_k is given in (27). Set

  d_k = (3/2)^r (θ_k / V_k)^{r/k} (1 + 1/(k log k))^r.   (32)

Using (30) and (31) we then obtain

  D₁ ≤ d_k Σ_{i=1}^m N_i^{−r/k} |Δ_i|^{r/k} ∫_{Δ_i} p_i dx.

Using an argument similar to (19) and (20) we conclude that

  D₁ ≤ (1 − ε)^{−r/k} d_k ‖p‖_ρ m^{−r/k}

where ρ = k/(k+r). A similar analysis applies to the term D₂ defined in (13), namely

  D₂ = Σ_{i=1}^m Σ_{j=1}^{N_i} ∫_{Δ_ij} ||x − Q(x)||^r |p(x) − p_i| dx ≤ Σ_{i=1}^m ρ_i^r ∫_{Δ_i} |p(x) − p_i| dx.

The analysis from this point is very similar to the one presented in the proof of Theorem III.1. In particular, a sequence of inequalities similar to (15) and (16) leads to

  D₂ ≤ 2^{r/k} d_k c_λ |Ω|^{1/k − 1/λ} ‖p‖_{L_λ(Ω)} m^{−r/k}.

Finally, proceeding as in the derivation of (24) we then obtain the bound

  D₂ ≤ 2^{r/k} d_k c_λ |Ω|^{1/k − 1/λ} ‖p‖_{L_λ(Ω)} m^{−r/k − (1/k − 1/λ)}.

Combining the bounds for D₁ and D₂, using the value of d_k from (32), and arguing as in the final part of the proof of Theorem III.1 (following (24)), yields the desired result.

As an immediate consequence of Theorem IV.1, we show that the asymptotic behavior is very similar to that of the upper bound obtained by Bucklew [2]. From [2] we have

  lim_{N→∞} {D_{k,N}(r, 2, p) N^{r/k}} ≤ Γ(1 + r/k) V_k^{−r/k} ‖p‖_{k/(k+r)}   (Bucklew).

In this limit we obtain from Theorem IV.1 that

  lim_{N→∞} {D_{k,N}(r, 2, p) N^{r/k}} ≤ (3/2)^r (1 + 1/(k log k))^r θ_k^{r/k} V_k^{−r/k} ‖p‖_{k/(k+r)}.

For large values of k, θ_k^{r/k} ≈ 1, and we find that our bound is larger by a factor of (3/2)^r than the one derived by Bucklew. An interesting open research problem at this point is whether this additional factor can be eliminated by a more careful analysis. As a final comment we note that one advantage of the approach taken in this correspondence is that upper bounds on the distortion rate function may be derived from it. It is well known that for stationary ergodic sources it is possible, using block encoding, to achieve performance almost as good as the distortion rate function (see [8]) as the block size k increases without bound. Since the distortion obtained by block quantization is clearly an upper bound on the distortion rate function, we may use our results in the appropriate limit to obtain a general upper bound on the distortion rate function. The interesting limit to consider in this case is

  N → ∞,  k → ∞,  (1/k) log N = R   (33)

where R is the rate. It is clear that such a limit cannot be obtained within the high-resolution framework, where the limit N → ∞ is taken for fixed k, which is not allowed to increase with N. In order to obtain this type of result we consider the limit (33) in the context of Theorem IV.1. Let

  h(p) = −lim_{k→∞} (1/k) ∫ p_k(x) log p_k(x) dx

be the differential entropy rate of the source, where a subscript has been added to p to identify its dimensionality. Then, Gersho [6] has shown that for stationary ergodic sources, assuming ‖p₁(x₁)‖_{1/(1+r₀)} < ∞ for some r₀ > r,

  lim_{k→∞} ‖p_k‖_{k/(k+r)} = e^{r h(p)}.

Assume further that ‖p‖_{L_λ(Ω)} ≤ e^{βk} for λ > k and some value of β (recall the discussion and calculation at the beginning of Section II). Now, since this term is raised to the power 1/k in (28), it is clearly


finite under this condition. For the case of squared distortion, r = 2, we then find

  D(R) ≤ lim_{k→∞, (1/k) log N = R} D_{k,N}(2, 2, p) ≤ (9/(8e)) e^{−2(R − h(p))} + (9 c_β a²/(8e)) e^{−2(R − β(p))}   (34)

where a = |Ω|^{1/k} is the side length of the cubic support domain, and the dependence of β on p has been explicitly noted.

Consider the special case of a uniform distribution as a simple application of the bound. In this case the second term on the right-hand side (r.h.s.) of (34) vanishes, since the Sobolev norm (5) is identically zero. Inverting D(R), in order to obtain the rate-distortion function R(D), we find in this case

  R(D) ≤ h(p) + (1/2) log(9/(8eD)) = (1/2) log(9a²/(8eD))   (35)

where we have used h = log a for the uniform distribution on an interval of size a. This should be compared to the well-known upper bound [1]

  R(D) ≤ (1/2) log(σ²/D)   (Shannon)   (36)

where σ² = Var(X). Since for a uniform distribution on an interval of size a one has σ² = a²/12, it can be seen that the bound (35) is larger than the standard one in (36) by an additive factor of 0.5 log(27/e) = 1.07. It would be interesting to see whether an improvement in the upper bound in Theorem IV.1 can lead to tighter bounds for the rate-distortion function, both in the case of the uniform distribution and for other source distributions.

V. LOWER BOUND

In this section we provide a lower bound for the distortion which holds for any N-point quantizer of a k-dimensional signal obeying the smoothness conditions imposed in Section III. Note that some form of smoothness constraint must be imposed on the source distribution, since otherwise the distortion can easily be shown to vanish. Consider, for example, an atomic distribution with a finite number of atoms. In this case, a simple placement of a quantization point at each atom yields zero distortion. A more elaborate discussion of singular measures is given in [3] in the high-resolution limit.

The basic idea of the lower bound to be presented is as follows. Consider an optimal N-point vector quantizer with code vectors X = {x₁, x₂, ..., x_N}. Let A be a finite set of points which includes X, i.e., X ⊆ A. Clearly, the distortion based on code vectors from A serves as a lower bound to the distortion based on X. The point then is to construct the set A so that it leads to a distortion function which can easily be bounded from below. As will be seen in the proof, this construction is obtained by a two-stage partitioning of the space, similar in spirit to that used in the proof of the upper bound in Section III.

Theorem V.1: Let Ω be a bounded cubic region and assume the density function p ∈ W¹_λ(Ω). Then the distortion of any N-point quantizer Q is lower-bounded as follows:

  D_{k,N}(r, 2, p) ≥ c₁ ‖p‖_ρ N^{−r/k} − c₂ ‖p‖_{L_λ(Ω)} N^{−r/k − (1/k − 1/λ)}   (37)

where

  c₁ = (k/(k+r)) 2^{−r/k} V_k^{−r/k},  c₂ = c̃₂ k^{r/2}   (38)

with c̃₂ = 2^{r/k} times the constant c₂ given in (11).

Remark 6: It should be noted that a situation may occur where the second term in (37) dominates the first, in which case the bound will be negative; clearly, a useless result. Note that this situation can occur only if the density is strongly oscillating, resulting in a large value for the Sobolev norm ‖p‖_{L_λ(Ω)}, or if the number of code vectors N is small. However, for sufficiently large (but finite!) values of N the second term becomes negligible, yielding a physically plausible lower bound.

Proof of Theorem V.1: For any set G ⊆ Ω define a distance function

  dist(x, G) = inf_{y ∈ G} ||x − y||.   (39)

Let q_G(x) = dist(x, G)^r, noting that the quantization error ||x − Q(x)||^r can be expressed as q_X(x), where X = {x₁, x₂, ..., x_N} is the codebook provided by the optimal quantizer Q. Let A be a finite set of points such that X ⊆ A, from which we infer that

  D_{k,N}(r, 2, p) = ∫_Ω ||x − Q(x)||^r p(x) dx = ∫_Ω q_X(x) p(x) dx ≥ ∫_Ω q_A(x) p(x) dx.   (40)
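The superset inequality in (40) is easy to confirm numerically: enlarging the codebook can only decrease the pointwise distance, so any A ⊇ X yields a lower bound on the distortion. A minimal sketch in one dimension (our own illustration, with an arbitrary stand-in codebook):

```python
import random

def dist_r(x, G, r=2):
    # q_G(x) = dist(x, G)^r for a finite point set G (k = 1 here).
    return min(abs(x - y) for y in G) ** r

random.seed(1)
X = [random.random() for _ in range(8)]        # stand-in for the codebook X
Y = [(j + 0.5) / 16 for j in range(16)]        # extra grid points, as in the set Y
A = X + Y                                      # A = X union Y, cf. (42)
samples = [random.random() for _ in range(5000)]
DX = sum(dist_r(x, X) for x in samples) / len(samples)
DA = sum(dist_r(x, A) for x in samples) / len(samples)
assert DA <= DX       # eq. (40): the enlarged codebook lower-bounds the distortion
print(DA, DX)
```

The point of the proof is that A can be chosen structured enough (a union of grids) that the right-hand side admits an explicit lower bound, while X itself is arbitrary.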

The proof then proceeds by constructing a set A for which a useful lower bound can be derived. Consider a finite collection of m = n^k ≤ N/2 equi-volume disjoint cubic cells {Δ_i}_{i=1}^m such that ∪_{i=1}^m Δ_i = Ω. Each cell Δ_i is further split into m_i = n_i^k equi-volume subcells {Δ_ij}_{j=1}^{m_i}, and a point is placed at the center of each such subcell, as in the proof of Theorem III.1. For each cell Δ_i, the set Y_i and the integer N_i, with n_i^k ≤ N_i < (n_i + 1)^k, are defined as

  Y_i = {y_{i,1}, y_{i,2}, ..., y_{i,m_i}},  N_i = m b_i / Σ_{j=1}^m b_j   (41)

where b_i = ‖p‖_{L_ρ(Δ_i)} with ρ = k/(k+r). The points in Y_i are located at the centers of each of the m_i subcubes described above. We then let Y = ∪_{i=1}^m Y_i and observe that this set is composed of at most 2m points. The set A is then defined as

  A = X ∪ Y   (42)

and consists of at most N + 2m elements. Now, the optimal distortion for the set of points A is achieved by partitioning the domain Ω into Voronoi cells surrounding each one of the elements of A. Let Ω_X and Ω_Y represent the subdomains of Ω obtained as the union of the Voronoi cells corresponding to the points of X and Y, respectively. Observe that Ω_X differs, of course, from the original partition based on the Voronoi cells of X alone. Clearly Ω = Ω_X ∪ Ω_Y, since the Voronoi cells of A cover the complete domain Ω. We then have

  ∫_Ω q_A(x) p(x) dx = ∫_{Ω_Y} q_A(x) p(x) dx + ∫_{Ω_X} q_A(x) p(x) dx =: D_Y + D_X.   (43)

Consider first the term D_Y, and let

  Δ'_i = Ω_Y ∩ Δ_i,  Δ'_ij = Ω_Y ∩ Δ_ij

correspond, respectively, to the restriction of Δ_i and Δ_ij to the region Ω_Y. Similarly to the proof of the upper bound in Theorem III.1, we let p_i = max_{x ∈ Δ_i} p(x). Since over the domain Ω_Y we can replace q_A by q_Y, we have

  D_Y = ∫_{Ω_Y} q_Y(x) p(x) dx = Σ_{i=1}^m ∫_{Δ'_i} q_Y(x) p(x) dx ≥ Σ_{i=1}^m p_i ∫_{Δ'_i} q_Y(x) dx − Σ_{i=1}^m ∫_{Δ'_i} q_Y(x) |p_i − p(x)| dx   (44)

where we have used the fact that p_i − p(x) ≤ |p_i − p(x)| for x ∈ Δ_i. Now, the first term in (44) can be bounded as follows. Observe that

  ∫_{Δ'_i} q_Y(x) dx = Σ_{j=1}^{m_i} ∫_{Δ'_ij} ||x − y_ij||^r dx

where the y_ij are defined in (41). Now, from [16, eq. (22)] we have a very useful result which helps to lower-bound the expression ∫_{Δ'_ij} ||x − z||^r dx by an integral over a ball of volume |Δ'_ij| centered at z. In the case of the l₂ norm considered here we obtain

  ∫_{Δ'_ij} ||x − y_ij||^r dx ≥ (k/(k+r)) V_k^{−r/k} |Δ'_ij|^{1 + r/k}.   (45)

We then have

  Σ_{i=1}^m p_i ∫_{Δ'_i} q_Y(x) dx = Σ_{i=1}^m p_i Σ_{j=1}^{m_i} ∫_{Δ'_ij} ||x − y_ij||^r dx
   a) ≥ (k/(k+r)) V_k^{−r/k} Σ_{i=1}^m p_i Σ_{j=1}^{m_i} |Δ'_ij|^{1+r/k}
   b) ≥ (k/(k+r)) V_k^{−r/k} Σ_{i=1}^m p_i |Δ'_i|^{1+r/k} m_i^{−r/k}
   c) ≥ (k/(k+r)) V_k^{−r/k} (2m)^{−r/k} ‖p‖_ρ   (46)

where step a) follows from (45), b) uses the constancy of p_i over Δ_i, and c) makes use of Hölder's inequality and the fact that the set Y consists of at most 2m points.

In order to complete the bound for D_Y we need to deal with the second term in (44). In view of the minus sign we need an upper bound on

  Σ_{i=1}^m ∫_{Δ'_i} q_Y(x) |p_i − p(x)| dx.

Since Δ'_i ⊆ Δ_i, we clearly obtain an upper bound by replacing Δ'_i by Δ_i. However, we have already derived an upper bound for this type of term in the proof of Theorem III.1, the only difference being the slightly different definition of N_i (cf. (17) and (41)). Thus we immediately conclude from (24) that

  Σ_{i=1}^m ∫_{Δ_i} q_Y(x) |p_i − p(x)| dx ≤ c̃₂ ‖p‖_{L_λ(Ω)} m^{−r/k − (1/k − 1/λ)}   (47)

where c̃₂ = 2^{r/k} c₂ with c₂ given in (11). The additional factor of 2^{r/k} arises from using (41) instead of (17). Combining (46) and (47) we conclude that

  D_Y ≥ (k/(k+r)) 2^{−r/k} V_k^{−r/k} [ m^{−r/k} ‖p‖_ρ − c̃₂ ‖p‖_{L_λ(Ω)} m^{−r/k − (1/k − 1/λ)} ].

We consider now the term D_X in (43). In this case, the integration region Ω_X consists of the union of Voronoi cells belonging to the quantization points X, considered as a subset of A. Let B_i be a ball of radius r̄ centered at each x_i ∈ X, i = 1, 2, ..., N. We select r̄ to be equal to the diagonal length of each of the cells Δ_ij defined in (41). Keeping in mind that each point x_i belongs to some cell Δ_ij, this choice guarantees that the Voronoi cell of every point x_i is contained in the ball B_i. In order to define the next level of partitioning, we construct a cube G_i of side length 2r̄, centered at each point x_i and containing the ball B_i. Note that by this construction the diagonal length of G_i is larger than that of the cube Δ_i by a factor of √k. From this point on, the procedure parallels that described above for the term D_Y. Specifically, each of the cells G_i is partitioned using a uniform grid of m_i points into cubic subcells G_ij, j = 1, 2, ..., m_i. However, due to the fact that the side length of the cell G_i is now larger than the side length of the cell Δ_i by a factor of √k, we get an additional factor of k^{r/2} in the bound for the term involving |p_i − p(x)| in (44). The term analogous to (46) is unchanged, as it only depends, through (45), on the volume of the cells G_i restricted to Ω_X. Since each of the cells G_i intersects Ω_X, we conclude by an argument analogous to that leading to the lower bound for D_Y that

  D_X ≥ (k/(k+r)) V_k^{−r/k} [ ‖p‖_ρ N^{−r/k} − c̃₂ k^{r/2} ‖p‖_{L_λ(Ω)} N^{−r/k − (1/k − 1/λ)} ]

where an additional factor of k^{r/2} multiplies the second term.

The proof is then completed by combining the results for D_Y and D_X. Focusing initially on the first terms in D_Y and D_X, we use the constraint m ≤ N/2 and the inequality

  (a^u + b^u)^{1/u} ≤ 2^{1/u − 1/v} (a^v + b^v)^{1/v}   (0 < u ≤ v)

with a = α_p, b = β_p, u = 1, and v = 1/ρ, where α_p and β_p denote the contributions of Ω_Y and Ω_X to ‖p‖_ρ, respectively. Application of the same inequality, together with the condition m ≤ N/2, yields the result for the second term.

Note that in the limit N → ∞ we obtain

  lim_{N→∞} {D_{k,N}(r, 2, p) N^{r/k}} ≥ (k/(k+r)) V_k^{−r/k} 2^{−r/k} ‖p‖_ρ.

This is the same result as that obtained by Yamada et al. [16], except that the constant is smaller by a factor of 2^{−r/k}, which becomes insignificant for large k.
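The geometric fact behind (45), namely that among sets of a given volume the Euclidean ball centered at z minimizes the integral of ||x − z||^r, can be checked in closed form for k = r = 2 (our own illustration): a square of the same area as a disk has a strictly larger second moment about its center, and the disk attains the lower bound of (45) with equality.

```python
import math

def ball_moment(radius, r=2):
    # Exact integral of ||x||^r over a disk of the given radius (k = 2):
    # integrate 2*pi*t * t^r dt from 0 to radius.
    return 2 * math.pi * radius ** (r + 2) / (r + 2)

def square_moment(side):
    # Exact integral of ||x||^2 over a centered square of the given side.
    return side ** 4 / 6

R = 1.0
area = math.pi * R ** 2
side = math.sqrt(area)                    # square with the same area as the disk
lower = 0.5 * (1 / math.pi) * area ** 2   # (k/(k+r)) V_k^{-r/k} |S|^{1+r/k}, k=r=2
print(ball_moment(R), square_moment(side), lower)
```

Here the disk's moment equals the bound of (45), while the equal-area square exceeds it, which is exactly why (45) can be applied to the irregular cells Δ'_ij.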

VI. CONCLUDING REMARKS We have presented in this work an approach to the derivation of upper and lower bounds on the distortion of the optimal k-dimensional N-point vector quantizer, applicable for every value of k and N. A major ingredient in the approach is the use of specific smoothness assumptions concerning the source distribution, without which finite bounds are not possible. One advantage of the approach is that the regime where the asymptotic behavior manifests itself can be characterized exactly in terms of the properties of the source distribution. Moreover, we have shown that up to a "small" constant, our bounds are of the same limiting form as those derived within the more standard asymptotic theory of high-resolution quantization. Clearly, there is much room left for improvement, especially in forming a more economical covering of the domain than the one proposed in Section IV, leading to more appropriate quantization regions. There also seems to be wide scope for finding more natural measures of the variation of the density over a quantization region,


which will hopefully mitigate the effect of the higher order terms appearing in Theorems III.1 and IV.1.

APPENDIX

We present below a proof of Lemma II.1, used to establish the upper bound in Section III. First, however, we present a lemma relating a continuously differentiable function f and its gradient through an integral operator. The statement and proof of the lemma, due to Sobolev, can be found in [11, Sec. 10.3].

Lemma A.1 (Sobolev): Let f(x) be a continuously differentiable function defined on a convex domain Δ ⊆ ℝ^k. Then the following identity holds:

  f(x) = (1/|Δ|) ∫_Δ f(y) dy − Σ_{i=1}^k ∫_Δ (B_i(x, y)/||x − y||₂^{k−1}) (∂f/∂y_i) dy   (48)

where B_i(x, y), i = 1, 2, ..., k, are given by

  B_i(x, y) = ((w(x, y)^k − ||x − y||₂^k)/(k|Δ|)) cos(l_{x,y}, e_i),  i = 1, 2, ..., k.

Here l_{x,y} stands for the vector y − x, w(x, y) is the l₂ distance between x and the boundary of Δ along the direction of l_{x,y}, and e_i is the standard unit vector in the i-th direction.

For convenience, we restate Lemma II.1 and provide a proof based on Lemma A.1. We remark that a proof of this statement can be found in [11] for the case of the L_q norm with finite q. Our proof generalizes this result to q = ∞ and leads to much smaller values for the constants, a fact which is crucial for our application.

Lemma 2.1: Let Δ ⊆ ℝ^k be a bounded cubic region and assume f ∈ W¹_λ(Δ), λ > k. Then

  sup_{x ∈ Δ} |f(x) − f̂| ≤ c_λ |Δ|^{1/k − 1/λ} ‖f‖_{L_λ(Δ)}

where f̂ = (1/|Δ|) ∫_Δ f(x) dx and c_λ ≤ (2 log 2)/(1 − k/λ).

Proof: Fix x and let i* be the direction for which |cos(l_{x,y}, e_i)| is maximal. Using Lemma A.1 and the fact that 0 ≤ w(x, y)^k − ||x − y||₂^k ≤ w(x, y)^k for y ∈ Δ, we have

  |f(x) − (1/|Δ|) ∫_Δ f(y) dy| ≤ (1/(k|Δ|)) Σ_{i=1}^k ∫_Δ (w(x, y)^k |cos(l_{x,y}, e_i)| / ||x − y||₂^{k−1}) |∂f/∂y_i| dy
   ≤ (1/|Δ|) ( ∫_Δ ( w(x, y)^k |cos(l_{x,y}, e_{i*})| / ||x − y||₂^{k−1} )^{λ'} dy )^{1/λ'} ‖f‖_{L_λ(Δ)}   (49)

where we have used Hölder's inequality in the last step with the definition 1/λ + 1/λ' = 1, keeping in mind that λ > k. Using a simple geometric argument based on triangle similarity, it is easy to see that for any x, y ∈ Δ

  w(x, y) ≤ ||x − y||₂ a / ||x − y||_∞   (50)

where a is the side length of Δ and ||z||_∞ = max_{1≤i≤k} |z_i|. Since |cos(l_{x,y}, e_{i*})| = ||x − y||_∞ / ||x − y||₂, this yields

  w(x, y)^k |cos(l_{x,y}, e_{i*})| / ||x − y||₂^{k−1} ≤ a^k / ||x − y||_∞^{k−1}.

In order to estimate the integral appearing in (49) we split the cubic integration region into rectangular rings Δ_s, where

  Δ_s = {y ∈ ℝ^k : a 2^{−s} ≤ ||x − y||_∞ < a 2^{−(s−1)}} ∩ Δ,  s = 1, 2, ....

One can easily see from the definition that |Δ_s| ≤ 2^{−(s−1)k} a^k. We then have

  ∫_Δ ( a^k / ||x − y||_∞^{k−1} )^{λ'} dy ≤ a^{kλ'} Σ_{s=1}^∞ |Δ_s| (a 2^{−s})^{−(k−1)λ'} ≤ 2^k a^{k+λ'} Σ_{s=1}^∞ 2^{−sκ} = 2^k a^{k+λ'} (2^{−κ}/(1 − 2^{−κ}))   (51)

where κ = k − (k−1)λ' = λ' k (1/k − 1/λ) > 0. Since λ' > 1, using the inequality e^x − 1 ≥ x we obtain

  2^{−κ}/(1 − 2^{−κ}) = 1/(2^κ − 1) ≤ 1/(κ log 2).

Recalling that a = |Δ|^{1/k} and 1/λ + 1/λ' = 1, and substituting in (49), we obtain

  sup_{x ∈ Δ} |f(x) − f̂| ≤ ((2 log 2)/(1 − k/λ)) |Δ|^{1/k − 1/λ} ‖f‖_{L_λ(Δ)}

establishing the desired result.

ACKNOWLEDGMENT The authors are grateful to Neri Merhav, Ram Zamir, Mikhail Solomjak, and Ran Bar-Sella for useful discussions and comments, and to Assaf Zeevi for very helpful comments on the manuscript. Special thanks to two anonymous reviewers for their insightful and constructive comments, which have greatly helped to improve the manuscript. Support from the Ollendorff Center of the Department of Electrical Engineering at the Technion is acknowledged.


REFERENCES

[1] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[2] J. A. Bucklew, "Upper bounds to the asymptotic performance of block quantizers," IEEE Trans. Inform. Theory, vol. IT-27, pp. 577–581, 1981.
[3] J. A. Bucklew and G. L. Wise, "Multidimensional asymptotic quantization theory with rth power distortion measures," IEEE Trans. Inform. Theory, vol. IT-28, pp. 239–247, 1982.
[4] J. H. Conway and N. J. A. Sloane, "Voronoi regions of lattices, second moments of polytopes, and quantization," IEEE Trans. Inform. Theory, vol. IT-28, pp. 211–226, 1982.
[5] J. H. Conway and N. J. A. Sloane, "A lower bound on the average error of vector quantizers," IEEE Trans. Inform. Theory, vol. IT-31, pp. 106–109, 1985.
[6] A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. Inform. Theory, vol. IT-25, pp. 373–380, 1979.
[7] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Norwell, MA: Kluwer (Academic), 1992.
[8] R. M. Gray, Source Coding Theory. Norwell, MA: Kluwer (Academic), 1990.
[9] M. Gutman, "On uniform quantization with various distortion measures," IEEE Trans. Inform. Theory, vol. IT-33, pp. 169–171, 1987.
[10] G. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, 2nd ed. Cambridge, U.K.: Cambridge Math. Library, 1952.
[11] L. V. Kantorovich and G. P. Akilov, Functional Analysis in Normed Spaces. New York: Pergamon, 1964.
[12] F. Kuhlmann and J. A. Bucklew, "Piecewise uniform vector quantizers," IEEE Trans. Inform. Theory, vol. 34, pp. 1259–1263, 1988.
[13] T. D. Lookabaugh and R. M. Gray, "High-resolution quantization and the vector quantizer advantage," IEEE Trans. Inform. Theory, vol. 35, pp. 1020–1033, 1989.
[14] S. Na and D. L. Neuhoff, "Bennett's integral for vector quantizers," IEEE Trans. Inform. Theory, vol. 41, pp. 886–900, 1995.
[15] C. A. Rogers, Packing and Covering. Cambridge, U.K.: Cambridge Univ. Press, 1964.
[16] Y. Yamada, S. Tazaki, and R. M. Gray, "Asymptotic performance of block quantizers with a difference distortion measure," IEEE Trans. Inform. Theory, vol. IT-26, pp. 6–14, 1980.
[17] P. Zador, "Development and evaluation of procedures for quantizing multivariate distributions," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1964.
[18] P. Zador, "Asymptotic quantization error of continuous signals and quantization dimension," IEEE Trans. Inform. Theory, vol. IT-28, pp. 139–149, 1982.
[19] J. Ziv, "On universal quantization," IEEE Trans. Inform. Theory, vol. IT-31, pp. 344–347, 1985.


Calculation of Shell Frequency Distributions Obtained with Shell-Mapping Schemes

Robert F. H. Fischer, Member, IEEE

Abstract—In order to calculate the transmit power in shell-mapping-based transmission schemes, the frequencies of the shells are required. In this correspondence, a simple but general method for the calculation of these frequencies is derived. The method has approximately the same complexity as the shell-mapping encoder. As an example, the method is shown in detail for the shell-mapping scheme specified for the international telephone-line V.34 modem standard. Moreover, a very simple approximation is given which is tight for large constellations.

Index Terms—Average transmit power, frequencies of signal points, shell mapping.

I. INTRODUCTION AND PRELIMINARIES

For efficient data transmission over strictly band-limited channels, a mapping scheme is required which can support a noninteger number of information bits per transmitted symbol. The straightforward approach is to base the mapping on a frame of N symbols, i.e., to assign, say, K input bits to a point in N-dimensional space. Using this approach a rate granularity of 1/N bit per symbol (bit/symbol) is obtained. In some situations it is further desirable to realize shaping gain, i.e., to reduce the average transmit power compared to signaling with uniformly distributed symbols. This is achieved by selecting the points within a region shaped more like an N-dimensional sphere than an N-cube [3]. Then, the transmitted symbols exhibit a (discrete) Gaussian-like distribution. Shell mapping, a very efficient mapping algorithm proposed, e.g., in [2], [4], [6]–[9], meets both of the above requirements. It was adopted in the international telephone-line modem standard ITU Recommendation V.34 [5]. We assume that the constituent signal constellation contains M · 2^q signal points. The signal set is partitioned into M groups ("shells"), indexed by s = 1, 2, ..., M, each containing 2^q points. To each shell a cost C(s) is assigned. Without loss of generality, we assume that the ordering of the shells is such that the costs increase monotonically, i.e., C(1) ≤ C(2) ≤ ··· ≤ C(M). Shell mapping operates on a frame of N symbols and selects one particular shell for each of the N positions. The points within the shells are selected memorylessly by q "uncoded bits." The task of the shell-mapping encoder is to map a binary K-tuple to an N-tuple of shell indices. For that purpose, shell mapping implicitly sorts the vectors of shell indices according to their total cost: N-tuples of shell indices with lower total cost are given a lower index than N-tuples with larger cost. In order to sort vectors with equal cost, different strategies are possible.
For example, if N is a power of two, then the cost of the first N/2 positions can serve as a criterion. By performing a recursion on dimensions, the encoding problem is split into two problems, each with half the number of dimensions. The iteration continues until only N scalar, trivial problems are left. If N is not a power of two, then mapping can be done symbol-by-symbol, taking the remaining dimensions into account as in [9, Algorithm 1].

Manuscript received April 29, 1998; revised December 3, 1998. The author is with Lehrstuhl für Nachrichtentechnik II, Universität Erlangen-Nürnberg, D-91058 Erlangen, Germany (e-mail: [email protected]). Communicated by K. Zeger, Associate Editor at Large. Publisher Item Identifier S 0018-9448(99)04369-2.
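The implied ordering of index vectors by total cost, with ties broken by the cost of the first half, can be illustrated by explicit sorting. The sketch below is our own illustration only (the parameters, including the input width K, are hypothetical; the actual V.34 encoder computes the index arithmetically and never sorts):

```python
from itertools import product

# Rank all N-tuples of shell indices by total cost, breaking ties by the
# cost of the first N/2 positions, as described above. Shells are encoded
# as 0..M-1 so that the linear cost of a tuple is simply its element sum.
M, N = 3, 4
def key(t):
    return (sum(t), sum(t[:N // 2]))

tuples = sorted(product(range(M), repeat=N), key=key)
K = 5                       # hypothetical input width: use the 2**K cheapest frames
used = tuples[:2 ** K]
# Every used frame is at most as expensive as every unused one.
assert max(sum(t) for t in used) <= min(sum(t) for t in tuples[2 ** K:])
print(used[0], used[-1])
```

Only the 2^K cheapest of the M^N index vectors are ever transmitted, which is the source of the nonuniform shell frequencies analyzed below.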



Since the input is a binary K-tuple, in each frame only the 2^K combinations of shells with the smallest total cost are used. If the K-tuples are equally likely, then these 2^K combinations are used equiprobably. As a consequence, shells with lower costs are used more often than shells with larger costs. In general, the shell frequencies will be different for each of the N positions. In this correspondence the shell frequencies are derived analytically.

Let H(s, i), s = 1, ..., M, i = 1, ..., N, denote the number of occurrences of shell s in position i within all 2^K combinations. Then, a signal point a_{s,l}, l = 1, ..., 2^q, in shell s is transmitted with probability

  Pr{a_{s,l}} = 2^{−q} · 2^{−K} · H(s, i)   (1)

in position i within the frame of N symbols. The transmit power is then proportional to the average energy σ_a² of the signal points, which is

  σ_a² := E{|a_{s,l}|²} = 2^{−q} · 2^{−K} Σ_{l=1}^{2^q} Σ_{s=1}^M |a_{s,l}|² · (1/N) Σ_{i=1}^N H(s, i).   (2)

The goal of this correspondence is the calculation of the histograms H(s, i). In Section II, we calculate partial histograms which give the number of occurrences of shells within all possible combinations of n-tuples that have some fixed total cost. Using this result, we compute H(s, i) for the shell-mapping scheme specified for the international voice-band modem standard ITU Recommendation V.34 [5] (Section III) as an example. Finally, we give an approximation which is tight in almost all cases. This is confirmed by numerical examples for V.34. Section IV offers some conclusions.

II. PARTIAL HISTOGRAMS

A. QAM Signaling

Shell mapping is based on generating functions that give the cost spectrum of shells, i.e., the number g_n(c) of n-tuples with a given total cost c. Because energy is proportional to area in two dimensions (and only in two dimensions) with QAM signaling, the simple linear cost function C(s) = s − 1, s = 1, ..., M, is a good approximation [2], [9]. The generating functions G_n(x) = Σ_i g_n(i) x^i are then

  G_n(x) = (1 + x + x² + ··· + x^{M−1})^n,  n = 1, ..., N.

If N is a power of two, then G_n(x) is needed only for n equal to a power of two. Starting with G₁(x) and using the iteration G_{2n}(x) = (G_n(x))², all generating functions can be calculated iteratively by simple convolution (cf. [2]).

With the notation for multinomial coefficients [1, p. 106]

  (n; k₁, k₂, ..., k_M) := n!/(k₁! k₂! ··· k_M!),  Σ_{i=1}^M k_i = n,  k_i ∈ {0, 1, 2, ...}   (3)

the coefficient g_n(c), which is the number of combinations of n shells with total cost c = Σ_i C(s_i), is simply

  g_n(c) = Σ (n; k₁, k₂, ..., k_M),  with Σ_{i=1}^M k_i = n and Σ_{i=1}^M (i − 1) k_i = c   (4)

where the summation runs over all vectors (k₁, k₂, ..., k_M) of nonnegative integers with the given constraints.

For a given M-tuple (k₁, k₂, ..., k_M) with Σ_{s=1}^M k_s = n, the coefficient k_s tells how often shell s occurs in the n-tuples of shell indices. Over all n-tuples of shell indices for which k_j elements are equal to j, 1 ≤ j ≤ M, shell s occurs

  k_s · (n; k₁, k₂, ..., k_M)

times. Since all permutations of the n elements of an index vector occur, shell s occurs equally often in each of the n positions, i.e.,

  (k_s/n) · (n; k₁, k₂, ..., k_M)

times. Thus for a fixed position, over all possible combinations of n shells with total cost c, shell s occurs

  S_n^c(s) := Σ (k_s/n) · (n; k₁, k₂, ..., k_M),  with Σ_{i=1}^M k_i = n and Σ_{i=1}^M (i − 1) k_i = c   (5)

times. S_n^c(s) can be viewed as a partial histogram of shells, giving the frequencies of shells in all n-tuples of shells with total cost c. Table I shows a sample of the possible combinations for n = 8. The sum S_n^c(s) in each cell of Table I is given by (5). Using (4), the sum over one row is

  Σ_{s=1}^M S_n^c(s) = g_n(c).   (6)
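The cost spectrum is a few lines of code via the convolution iteration just described. The following sketch (illustrative parameters M = 4, frame of n = 4 symbols) builds G₄(x) by the doubling rule G_{2n}(x) = (G_n(x))² and checks the coefficients against direct enumeration:

```python
from itertools import product

def poly_mul(a, b):
    # Convolution of coefficient lists, i.e., polynomial multiplication.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

M = 4
g1 = [1] * M                  # G_1(x) = 1 + x + ... + x^{M-1}
g2 = poly_mul(g1, g1)         # doubling iteration: G_{2n}(x) = (G_n(x))^2
g4 = poly_mul(g2, g2)

# Brute-force check: the coefficient of x^c in G_4(x) counts the 4-tuples
# of shells with total linear cost c (shells encoded as 0..M-1).
brute = [0] * (4 * (M - 1) + 1)
for t in product(range(M), repeat=4):
    brute[sum(t)] += 1
assert g4 == brute
print(g4)
```

For a frame of N symbols the full spectrum costs only log₂ N convolutions, which is why the encoder can afford to precompute it.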

i=1

From the definition (3) of the multinomial coefficient, the following identity is easily verified:

  (1/n) k_j · (n; k₁, ..., k_j, ..., k_l, ..., k_M) = (1/n) (k_l + 1) · (n; k₁, ..., k_j − 1, ..., k_l + 1, ..., k_M)   (7)

where the modified tuple (k'₁, ..., k'_M) = (k₁, ..., k_j − 1, ..., k_l + 1, ..., k_M) satisfies Σ_{i=1}^M k'_i = n and Σ_{i=1}^M (i − 1) k'_i = c − j + l. Hence, setting l = j + m, m ∈ ℤ, we have

  S_n^c(s) = S_n^{c+m}(s + m)   (8)

i.e., the terms on the diagonals of the above table are identical (the matrix [S_n^c(s)] is Toeplitz). In view of the above relationship, the definition of S_n^c(s) can be formally extended to all indices s ≤ c + 1. For s > c + 1 it is convenient to define S_n^c(s) = 0.

Next, let C_n be a given integer. There are

  z_n(C_n) := Σ_{c=0}^{C_n−1} g_n(c)   (9)

combinations of n shells with total cost less than C_n. Among these combinations, in each position the number of occurrences of shell s is (summing up the columns of Table I)

  H_n^{C_n}(s) := Σ_{c=0}^{C_n−1} S_n^c(s)   (10)

i.e., the histogram H_n^{C_n}(s) accounts, position by position, for the z_n(C_n) n-tuples of shell indices of total cost less than C_n. In order to calculate H_n^{C_n}(s) we use (8), which gives S_n^c(s) = S_n^{c−s+1}(1), together with S_n^{c'}(1) = g_{n−1}(c') and g_n(c) = Σ_{j=0}^{M−1} g_{n−1}(c − j). This yields

  H_n^{C_n}(s) = Σ_{c'=0}^{C_n−s} S_n^{c'}(1) = Σ_{m=0}^∞ g_n(C_n − s − mM),  1 ≤ s ≤ M.

In other words, in order to calculate H_n^{C_n}(s), the coefficients of G_n(x) have to be aliased modulo M. Since

  Σ_{s=1}^M H_n^{C_n}(s) = Σ_{s=1}^M Σ_{m=0}^∞ g_n(C_n − s − mM) = Σ_{c=0}^{C_n−1} g_n(c) = z_n(C_n)

each of the z_n(C_n) index vectors is indeed counted exactly once per position. With the definition

  g̃_n(c) := Σ_{m=0}^∞ g_n(c − mM)   (11)

this may be written compactly as H_n^{C_n}(s) = g̃_n(C_n − s).

TABLE I: Number of occurrences of shell s in all possible combinations of 8-tuples with total cost c. Trailing zeros k_i in the multinomial coefficients are omitted.

B. General Cost Functions

For general cost functions (e.g., for one-dimensional constellations) the above derivations do not apply. In this case it is more appropriate to calculate directly the number S_n^c(s) of occurrences of shell s in a given position over all possible n-tuples with total cost c. Again (6) holds, but now the matrix [S_n^c(s)] (cf. Table I) is no longer Toeplitz. But, following the above arguments, it is easy to see that for a general cost function C(s) the formula

  S_n^c(s) = S_n^{c − C(s) + C(s+m)}(s + m)   (12)

is still valid. From (6) and (12), the partial histograms S_n^c(s) can be calculated iteratively by the algorithm given below.

:

Let

:

Calculate

c

0

c

c

Sn (s)

=

n

Sn

0;

1

c

Sn (s)

(12)

can be calculated

C (1)

.

C (s)+C (1)

(1);

8

c c

0 0

C (s)

+ C (1)

0

C (s)

+ C (1)


C (1)

0;
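The identities (10)-(12) and the position-histogram relation S_n^c(s) = g_{n-1}(c - C(s)) can be checked numerically. The following sketch is illustrative only (it is not the paper's code): it assumes a small M = 4, the frame size N = 8, and the linear cost C(s) = s - 1 used in V.34.

```python
# Illustrative check of the partial-histogram relations for shell tuples.
# Assumptions (hypothetical, not from the paper): M = 4 shells, N = 8,
# linear cost C(s) = s - 1.
M, N = 4, 8
cost = {s: s - 1 for s in range(1, M + 1)}

def convolve(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# g[n][c]: number of n-tuples of shells with total cost c (generating function)
g = {1: [1] * M}
for n in range(2, N + 1):
    g[n] = convolve(g[n - 1], g[1])

def S(n, c, s):
    """Occurrences of shell s in a fixed position among n-tuples of total cost c."""
    cp = c - cost[s]
    return g[n - 1][cp] if 0 <= cp < len(g[n - 1]) else 0

def H_direct(n, C, s):
    """H_n^C(s) = sum over c < C of S_n^c(s)."""
    return sum(S(n, c, s) for c in range(C))

def H_alias(n, C, s):
    """Eq. (10): alias the coefficients g_n(c) modulo M."""
    total, m = 0, 0
    while C - s - m * M >= 0:
        idx = C - s - m * M
        total += g[n][idx] if idx < len(g[n]) else 0
        m += 1
    return total
```

For every C the two computations agree, and summing the histogram over s recovers z_n(C), the total number of n-tuples with cost less than C.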

III. FREQUENCIES OF SHELLS

The frequencies of the shells can be calculated using the above histograms. As an example we consider the shell-mapping algorithm used in ITU Recommendation V.34 [5], which has a frame size N = 8. The methods presented in this section apply to all kinds of shell-mapping schemes using all types of cost functions. The main idea in calculating the frequencies of shells is to run the shell-mapping encoder with the maximum input R_0 = 2^K - 1, which yields specific intermediate results and the final shell indices s_1 to s_N, with 1 <= s_i <= M. Then, for each step in the encoding procedure a partial histogram based on the quantities S_n^c(s) can be evaluated. Summing up these partial histograms gives the final histograms H(s, i).

A. V.34 Shell Mapper

Fig. 1 sketches the sorting of all 2^K 8-tuples of shells and the decomposition according to the V.34 shell-mapping encoder. The corresponding assignment of partial histograms to each step of the encoding procedure is sketched in Fig. 2. For the calculation of the frequencies of the shells, the following steps, identical to shell-mapping encoding in V.34, are performed.

Fig. 1. Explanation of the sorting and decomposition of all 2^K 8-tuples of shells (not to scale).

Fig. 2. Sorting of all 2^K 8-tuples of shells and corresponding partial histograms (not to scale). The sum of column i is H(s, i).

1) Initializing: The encoder input is set to R_8 = 2^K - 1, i.e., all K shell-mapping bits are set to one.

2) Calculate Total Cost C_8: The largest integer C_8 is determined for which z_8(C_8) <= R_8. C_8 is the total cost of the 8-tuple associated with index R_8, and z_8(C_8) is the number of 8-tuples of shells with total cost less than C_8. The residual index is R_4 = R_8 - z_8(C_8).
Partial Histogram: Here, for all positions the number of occurrences of shell s is given by H_8^{C_8}(s).

3) Calculate Costs C_{4,1}, C_{4,2} of First and Second Half: The largest integer C_{4,1} is determined such that^1

    R_4 - \sum_{c=0}^{C_{4,1}-1} g_4(c) g_4(C_8 - c)

is nonnegative. C_{4,1} is the total cost of the first half of the ring indices and C_{4,2} := C_8 - C_{4,1} is the total cost of the second half of the ring indices.
Partial Histogram: The term \sum_{c=0}^{C_{4,1}-1} g_4(c) g_4(C_8 - c) contributes differently to positions 1 to 4 and 5 to 8, respectively. In positions 1 to 4 shell s occurs \sum_{c=0}^{C_{4,1}-1} S_4^c(s) g_4(C_8 - c) times and in positions 5 to 8 shell s occurs \sum_{c=0}^{C_{4,1}-1} g_4(c) S_4^{C_8 - c}(s) times.

1 Here the empty sum \sum_{c=0}^{-1}(\cdot) is defined as 0.

4) Calculate Index R_{4,1}, R_{4,2} of First and Second Half: The integers R_{4,1} and R_{4,2} are determined such that

    R_4 - \sum_{c=0}^{C_{4,1}-1} g_4(c) g_4(C_8 - c) = R_{4,2} g_4(C_{4,1}) + R_{4,1},   0 <= R_{4,2},   0 <= R_{4,1} < g_4(C_{4,1}).

Partial Histogram: The term R_{4,2} g_4(C_{4,1}) contributes R_{4,2} S_4^{C_{4,1}}(s) to the number of occurrences in positions 1 to 4. From now on, in positions 5 to 8 all partial histograms will be multiplied by g_4(C_{4,1}).

5.1) Calculate Costs C_{2,1}, C_{2,2} of First and Second Quarter: The largest integer C_{2,1} is determined such that

    R_{2,a} = R_{4,1} - \sum_{c=0}^{C_{2,1}-1} g_2(c) g_2(C_{4,1} - c)

is nonnegative. C_{2,1} is the total cost of the first two ring indices and C_{2,2} := C_{4,1} - C_{2,1} is the total cost of the ring indices 3 and 4.
Partial Histogram: The term \sum_{c=0}^{C_{2,1}-1} g_2(c) g_2(C_{4,1} - c) contributes differently to positions 1, 2 and 3, 4, respectively. In positions 1 and 2 shell s occurs \sum_{c=0}^{C_{2,1}-1} S_2^c(s) g_2(C_{4,1} - c) times and in positions 3 and 4 shell s occurs \sum_{c=0}^{C_{2,1}-1} g_2(c) S_2^{C_{4,1} - c}(s) times.

5.2) Calculate Costs C_{2,3}, C_{2,4} of Third and Fourth Quarter: The largest integer C_{2,3} is determined such that

    R_{2,b} = R_{4,2} - \sum_{c=0}^{C_{2,3}-1} g_2(c) g_2(C_{4,2} - c)

is nonnegative. C_{2,3} is the total cost of the ring indices 5 and 6 and C_{2,4} := C_{4,2} - C_{2,3} is the total cost of the ring indices 7 and 8.
Partial Histogram: The term \sum_{c=0}^{C_{2,3}-1} g_2(c) g_2(C_{4,2} - c) contributes differently to positions 5, 6 and 7, 8, respectively. In positions 5 and 6 shell s occurs g_4(C_{4,1}) \sum_{c=0}^{C_{2,3}-1} S_2^c(s) g_2(C_{4,2} - c) times and in positions 7 and 8 shell s occurs g_4(C_{4,1}) \sum_{c=0}^{C_{2,3}-1} g_2(c) S_2^{C_{4,2} - c}(s) times.

6.1) Calculate Index R_{2,1}, R_{2,2} of First and Second Quarter: The integers R_{2,1} and R_{2,2} are determined such that

    R_{2,a} = R_{2,2} g_2(C_{2,1}) + R_{2,1},   0 <= R_{2,2},   0 <= R_{2,1} < g_2(C_{2,1}).

Partial Histogram: The term R_{2,2} g_2(C_{2,1}) contributes R_{2,2} S_2^{C_{2,1}}(s) to the number of occurrences in positions 1 and 2. From now on, in positions 3 and 4 all partial histograms will be multiplied by g_2(C_{2,1}).

6.2) Calculate Index R_{2,3}, R_{2,4} of Third and Fourth Quarter: The integers R_{2,3} and R_{2,4} are determined such that

    R_{2,b} = R_{2,4} g_2(C_{2,3}) + R_{2,3},   0 <= R_{2,4},   0 <= R_{2,3} < g_2(C_{2,3}).

Partial Histogram: The term R_{2,4} g_2(C_{2,3}) contributes g_4(C_{4,1}) R_{2,4} S_2^{C_{2,3}}(s) to the number of occurrences in positions 5 and 6. From now on, in positions 7 and 8 all partial histograms will be multiplied additionally by g_2(C_{2,3}).

Fig. 3. H(s, i) as a function of s and i. M = 9, K = 18. Dashed line: Approximation H_app(s).

Fig. 4. H_avg(s) as a function of s. M = 9, K = 12, 16, 20, 24. Dashed lines: Approximation H_app(s). Dash-dotted lines: Maxwell-Boltzmann distribution H_{M-B}(s).

7) Calculate Final Ring Indices s_1, ..., s_8: The final ring indices s_i and s_{i+1} are calculated from the index I = R_{2,(i+1)/2} and the cost T = C_{2,(i+1)/2}, i = 1, 3, 5, 7, according to

    T < M:    s_i = I + 1,           s_{i+1} = T - I + 1
    T >= M:   s_i = T - M + I + 2,   s_{i+1} = M - I.
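The divide-and-conquer index decomposition of steps 1)-7) can be sketched on a smaller frame. The toy mapper below uses a frame of N = 4 instead of 8, M = 3 shells, and the linear cost C(s) = s - 1 — all illustrative choices, not the V.34 parameters — and enumerates shell 4-tuples in order of nondecreasing total cost:

```python
# Toy shell mapper in the spirit of the V.34 steps above (illustrative only:
# frame N = 4, M = 3 shells, cost C(s) = s - 1).
M = 3

def conv(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

g1 = [1] * M         # g_1(c) = 1 for 0 <= c <= M - 1
g2 = conv(g1, g1)    # pairs of shells
g4 = conv(g2, g2)    # 4-tuples of shells

def g2_at(c):
    return g2[c] if 0 <= c < len(g2) else 0

def pair_from(I, T):
    """V.34-style rule: (index I, pair cost T) -> two shell indices."""
    if T < M:
        return I + 1, T - I + 1
    return T - M + I + 2, M - I

def encode(R):
    # total cost C of the 4-tuple with sorted index R
    C, z = 0, 0
    while z + g4[C] <= R:
        z += g4[C]
        C += 1
    R -= z
    # split C into the costs (C1, C - C1) of the two halves
    C1 = 0
    while True:
        block = g2_at(C1) * g2_at(C - C1)
        if block and R < block:
            break
        R -= block
        C1 += 1
    # split the remaining index into the two pair indices
    I2, I1 = divmod(R, g2_at(C1))
    return pair_from(I1, C1) + pair_from(I2, C - C1)
```

Running encode over all M^4 indices visits every 4-tuple exactly once, in order of nondecreasing cost — the sorting that Fig. 1 depicts for the real N = 8 mapper.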

TABLE II. AVERAGE ENERGY σ_a^2 OF THE SIGNAL POINTS IN V.34. σ_a^2: TRUE AVERAGE ENERGY; σ_{a,app}^2: APPROXIMATE ENERGY; σ_{a,M-B}^2: ENERGY APPLYING THE MAXWELL-BOLTZMANN DISTRIBUTION. MAPPING PARAMETERS K, M, AND q ACCORDING TO [5, TABLE 10, EXPANDED]

Partial Histogram: In positions 1 and 2 still R_{2,1} + 1 pairs of shells are lacking. Because of the specific sorting in shell mapping, and because the last tuple (belonging to the input 2^K - 1) is known to be [s_1, s_2], in position 1 shells s_1 - R_{2,1} through s_1 occur once. In position 2 shells s_2 through s_2 + R_{2,1} occur once. This completes the derivations for the first two positions. For positions 3 and 4, R_{2,2} g_2(C_{2,1}) + R_{2,1} + 1 pairs of shells are lacking. In position 3 shells s_3 - R_{2,2} through s_3 - 1 occur g_2(C_{2,1}) times and shell s_3 occurs R_{2,1} + 1 times. In position 4, shells s_4 + 1 through s_4 + R_{2,2} occur g_2(C_{2,1}) times and shell s_4 occurs R_{2,1} + 1 times. Similarly to positions 3 and 4, the remaining 2-tuples in positions 5, 6 and 7, 8 can be determined. R_{2,1} has to be replaced by R_{4,1} or R_{2,3} g_4(C_{4,1}) + R_{4,1}, R_{2,2} by R_{2,3} or R_{2,4}, C_{2,2} = C_{4,1} - C_{2,1} by C_{2,3} or C_{2,4} = C_8 - C_{4,1} - C_{2,3}, and g_2(C_{2,1}) by g_4(C_{4,1}) or g_4(C_{4,1}) g_2(C_{2,3}), respectively.

B. Approximation

In some applications, an approximation to the frequencies of shells is sufficient. In particular, the dependency on the position can often be ignored. Using the above derivations, an approximation can be calculated very easily. In shell mapping usually K >> 1 holds; e.g., in V.34, K can be as large as 31 [5]. As a consequence, the total number 2^K of combinations of shells is well approximated by z_N(C_8), where the integer C_8, C_8 >= 1, is chosen such that |z_N(C_8) - 2^K| is minimized (here, z_N(C_8) > 2^K is admitted). The remaining or surplus combinations are neglected. Hence, the frequencies are simply proportional to the histogram H_N^{C_8}(s) of all N-tuples with total cost less than C_8. From (1) and (10), the approximate frequency of signal points a_{s,l}, l = 1, ..., 2^q, in shell s, is thus simply given by

    H_app(s) := Pr{a_{s,l}} ≈ \frac{1}{2^q z_N(C_8)} \sum_{m=0}^{\infty} g_N(C_8 - s - mM)   (13)

independently of the position, where

    C_8 = argmin_{c=1,2,3,...} | z_N(c) - 2^K |.
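The aliased-histogram approximation can be sketched numerically. The parameter values below (M = 4, N = 8, K = 14) and the linear cost C(s) = s - 1 are illustrative assumptions, not the V.34 settings:

```python
# Sketch of approximation (13): shell probabilities from the aliased
# generating function (illustrative parameters, cost C(s) = s - 1).
M, N, K = 4, 8, 14

def conv(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

gN = [1] * M
for _ in range(N - 1):
    gN = conv(gN, [1] * M)

def z(c):
    """Number of N-tuples with total cost less than c."""
    return sum(gN[:c])

# C8 minimizing |z(c) - 2^K|, as in (13)
C8 = min(range(1, len(gN) + 2), key=lambda c: abs(z(c) - 2 ** K))

# unnormalized per-shell histogram via the modulo-M aliasing of g_N
H = [sum(gN[C8 - s - m * M]
         for m in range((C8 - s) // M + 1)
         if 0 <= C8 - s - m * M < len(gN))
     for s in range(1, M + 1)]
Happ = [h / z(C8) for h in H]   # per-shell probability (each splits over 2^q points)
```

Summing the aliased histogram over all shells recovers z_N(C_8), so the approximate shell probabilities are properly normalized.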

If z_N(C_8) = 2^K, then approximation (13) becomes exact. Another approximation can be given from a different point of view. For large M and K we expect the shell frequency distribution H(s, i) to approach the shell frequency distribution that minimizes the average cost (energy) for a given entropy (rate). This distribution, sometimes called a Maxwell-Boltzmann distribution [7], is

    H_{M-B}(s) := Pr{s} = f(\lambda) e^{-\lambda C(s)},   where   f(\lambda) = \Bigl[ \sum_{s} e^{-\lambda C(s)} \Bigr]^{-1}   (14)

normalizes the distribution. The parameter \lambda is chosen so that the entropy

    - \sum_{s=1}^{M} H_{M-B}(s) \log_2 (H_{M-B}(s))

of the distribution is equal to the desired rate.

C. Numerical Examples

In order to illustrate the results, examples valid for the V.34 shell-mapping scheme are given in Figs. 3 and 4 and Table II. In Fig. 3, H(s, i) is plotted for M = 9 and K = 18. For comparison, the approximation H_app(s) (dashed line) is also given. As can be seen, the histograms differ slightly over the position i within the mapping frame of size N = 8. Due to the specific sorting, for positions i = 1, ..., 4 shells with lower index occur slightly more often than for positions i = 5, ..., 8. The opposite is true for shells with larger index. The approximation H_app(s) is very close to


the average frequency distribution

    H_avg(s) := (1/N) \sum_{i=1}^{N} H(s, i).
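The Maxwell-Boltzmann reference distribution of (14) with its entropy matched to a target rate can be computed by a one-dimensional search over \lambda. The sketch below is illustrative: it assumes the linear cost C(s) = s - 1, uses M = 9 with target entropy 12/8 bit per shell (mirroring one of the V.34 examples), and picks the bisection bracket ad hoc:

```python
# Sketch of (14): Maxwell-Boltzmann shell distribution with entropy matched
# to a target rate by bisection over lambda (all parameter values illustrative).
import math

M = 9
target = 12 / 8                      # desired entropy in bits per shell (K/N)
cost = [s - 1 for s in range(1, M + 1)]

def mb(lam):
    w = [math.exp(-lam * c) for c in cost]
    Z = sum(w)                       # f(lambda)^{-1} in (14)
    return [x / Z for x in w]

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

lo, hi = 0.0, 50.0                   # entropy decreases monotonically in lambda
for _ in range(200):
    mid = (lo + hi) / 2
    if entropy(mb(mid)) > target:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2
```

At \lambda = 0 the distribution is uniform (maximum entropy \log_2 M); increasing \lambda concentrates probability on low-cost shells, so bisection converges to the unique \lambda with the desired entropy.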

The behavior of the distributions for different values of K is shown in Fig. 4. For M = 9 and K = 12, 16, 20, 24 the average frequency distribution H_avg(s) (solid lines) and the approximation H_app(s) (dashed lines) are compared to the Maxwell-Boltzmann distribution H_{M-B}(s) (dash-dotted lines). Here, the parameter \lambda is chosen so that the entropy is equal to K/8. Even for low K the approximation H_app(s) is very close to the true average frequency distribution H_avg(s). The approximation improves as K increases. Unfortunately, the Maxwell-Boltzmann distribution H_{M-B}(s) does not provide a good estimate of H_app(s). Shells with low index occur less often than expected from the optimal entropy-power tradeoff. Finally, in Table II the average energy σ_a^2 of the signal points in V.34 is summarized. For a symbol rate of 3200 Hz, the true average energy σ_a^2 (cf. (2)), the approximate energy σ_{a,app}^2 based on H_app(s), and the energy σ_{a,M-B}^2 derived from the Maxwell-Boltzmann distribution are given for all possible data rates and associated mapping parameters K, M, and q [5, Table 10, expanded]. The underlying signal constellation is specified in [5, Fig. 5]. Again, the exact calculation and the approximation are very close. Obviously, the energies derived from the Maxwell-Boltzmann distribution underestimate the actual energies, as they are lower bounds. The approximation (13) provides much better results.

IV. CONCLUSIONS

In this correspondence, a simple but general method for the calculation of the frequencies of the shells in shell-mapping schemes was derived. As an example, the method was shown in detail for the shell-mapping scheme specified for the international telephone-line modem standard ITU Recommendation V.34. The method starts with partial histograms that give the number of occurrences of shells within all possible combinations of n-tuples of shells with some fixed total cost. These histograms can be calculated easily using the generating functions that are needed in the encoder in any case. Then, the shell-mapping encoder is run with a specific input, namely, the maximum K-tuple. To each step of the encoding procedure a partial histogram can be assigned. Summing up these parts yields the final histograms. Thus the calculation has approximately the same complexity as the mapping encoder itself. With the knowledge of the frequencies of shells, the exact average transmit power can be calculated. Numerical examples are given for V.34.

ACKNOWLEDGMENT

The author wishes to acknowledge F. D. Neeser for valuable discussions and is indebted to the anonymous reviewers for their comments, which improved the correspondence.


[3] G. D. Forney and L.-F. Wei, "Multidimensional constellations—Part I: Introduction, figures of merit, and generalized cross constellations," IEEE J. Select. Areas Commun., vol. 7, pp. 877–892, 1989.
[4] P. Fortier, A. Ruiz, and J. M. Cioffi, "Multidimensional signal sets through the shell construction for parallel channels," IEEE Trans. Commun., vol. 40, pp. 500–512, Mar. 1992.
[5] International Telecommunication Union (ITU), Std. V.34, "A modem operating at data signalling rates of up to 28800 bit/s for use on the general switched telephone network and on leased point-to-point 2-wire telephone-type circuits," Sept. 1994.
[6] A. K. Khandani and P. Kabal, "Shaping multidimensional signal spaces—Part I: Optimum shaping, shell mapping; Part II: Shell-addressed constellations," IEEE Trans. Inform. Theory, vol. 39, pp. 1799–1819, Nov. 1993.
[7] F. R. Kschischang and S. Pasupathy, "Optimal nonuniform signaling for Gaussian channels," IEEE Trans. Inform. Theory, vol. 39, pp. 913–929, May 1993.
[8] G. R. Lang and F. M. Longstaff, "A Leech lattice modem," IEEE J. Select. Areas Commun., vol. 7, pp. 968–973, Aug. 1989.
[9] R. Laroia, N. Farvardin, and S. A. Tretter, "On optimal shaping of multidimensional constellations," IEEE Trans. Inform. Theory, vol. 40, pp. 1044–1056, July 1994.

A Universal Lattice Code Decoder for Fading Channels Emanuele Viterbo, Member, IEEE, and Joseph Boutros, Member, IEEE

Abstract—We present a maximum-likelihood decoding algorithm for an arbitrary lattice code when used over an independent fading channel with perfect channel state information at the receiver. The decoder is based on a bounded distance search among the lattice points falling inside a sphere centered at the received point. By judicious choice of the decoding radius we show that this decoder can be practically used to decode lattice codes of dimension up to 32 in a fading environment.

Index Terms—Maximum-likelihood decoding, modulation, lattices, wireless channel.

I. INTRODUCTION

Lattice codes are used in digital transmission as high-rate signal constellations. They are obtained by carving a finite number of points from an n-dimensional lattice in the Euclidean space R^n. For the basic notations in lattice theory the reader can refer to [1]. Maximum-likelihood (ML) decoding of a lattice code used over an additive white Gaussian noise (AWGN) channel is equivalent to finding the closest lattice point to the received point. Many very efficient algorithms are now available for ML decoding of some well-known root lattices [1]. Several Leech lattice decoders have been proposed with an ever-improving efficiency; a review of these decoders can be found in [2]. The above algorithms are strictly dependent on the special structure of the lattice being decoded (e.g., its being a binary lattice). Other algorithms for general nearest neighbor encoding in vector quantization are valid for any unstructured codebook. They do not take full advantage of the lattice structure which is useful for large

REFERENCES
[1] I. N. Bronstein and K. A. Semendjajew, Taschenbuch der Mathematik. Thun, Frankfurt/Main, Germany: Verlag Harri Deutsch, 1987, in German.
[2] M. V. Eyuboğlu, G. D. Forney, P. Dong, and G. Long, "Advanced modulation techniques for V.fast," Europ. Trans. Telecommun., vol. 4, pp. 243–256, May/June 1993.

Manuscript received April 1, 1996; revised January 2, 1999. E. Viterbo is with AT&T Shannon Laboratories, Florham Park, NJ 07932 USA, on leave from Politecnico di Torino, I-10129 Torino, Italy. J. Boutros is with Ecole Nationale Sup´erieure des T´el´ecommunications, 75634 Paris, France. Communicated by N. Seshadri, Associate Editor for Coding Techniques. Publisher Item Identifier S 0018-9448(99)04380-1.

0018–9448/99$10.00  1999 IEEE


bit-rate applications [3]. As we will see in the following, when dealing with lattice codes for the fading channel we are faced with the problem of decoding a totally arbitrary lattice given its generator matrix. Recent work on multidimensional modulation schemes for the fading channel shows how to construct lattice codes well adapted for such a channel [4]. These lattice codes are effective because they present a high modulation diversity L, i.e., any two code vectors always differ in at least L coordinates. In the case of independent fading channels, with perfect channel state information (CSI) given to the receiver, ML decoding requires the minimization of the following metric:

    m(x | r, α) = \sum_{i=1}^{n} | r_i - α_i x_i |^2   (1)

where r = α * x + n is the received vector. The noise vector n = (n_1, n_2, ..., n_n) has real, Gaussian distributed independent random variable components, with zero mean and N_0 variance. The random independent fading coefficients α = (α_1, α_2, ..., α_n) have unit second moment and * represents the component-wise product. x = (x_1, x_2, ..., x_n) is one of the transmitted lattice code points. The lattice points can be written as the set {x = uM}, where M is the lattice generator matrix corresponding to the basis {v_1, v_2, ..., v_n} and u = (u_1, ..., u_n) is the integer component vector to which the information bits are easily mapped. Signal demodulation is assumed to be coherent, so that the fading coefficients can be modeled, after phase elimination, as real random variables with a Rayleigh distribution. In practice, a component interleaver is needed to obtain the desired independence of the fading coefficients α_i. The algorithm proposed in this correspondence enables us to find the closest point of the lattice constellation in terms of metric (1) and practically solves the decoding problem at least for dimensions up to 32.

II. THE SPHERE-DECODER ALGORITHM

We consider first the Gaussian channel case so that we can assume α_i = 1, i = 1, ..., n. To the authors' knowledge the following algorithm was first presented in [6] and further analyzed in [7] and [8]. We report here a simple derivation of the algorithm, which can then be easily implemented using the flowchart of Fig. 2. The lattice decoding algorithm searches through the points of the lattice Λ which are found inside a sphere of given radius √C centered at the received point, as shown in Fig. 1. This guarantees that only the lattice points within the square distance C from the received point are considered in the metric minimization.

Fig. 1. Geometrical representation of the sphere decoding algorithm.

In the following, it is useful to think of the lattice Λ as the result of a linear transformation, defined by the matrix M : R^n → R^n, when applied to the cubic lattice Z^n. The problem to solve is the following:

    min_{x ∈ Λ} || r - x || = min_{w ∈ r - Λ} || w ||   (2)

that is, we search for the shortest vector w in the translated lattice r - Λ in the n-dimensional Euclidean space R^n. We write x = uM with u ∈ Z^n, r = ρM with ρ = (ρ_1, ..., ρ_n) ∈ R^n, and w = ξM with ξ = (ξ_1, ..., ξ_n) ∈ R^n. Note that ρ and ξ are real vectors. Then we have w = \sum_{i=1}^{n} ξ_i v_i, where ξ_i = ρ_i - u_i, i = 1, ..., n define the translated coordinate axes in the space of the integer component vectors u of the cubic lattice Z^n. The sphere of square radius C centered at the received point is transformed into an ellipsoid centered at the origin of the new coordinate system defined by ξ:

    || w ||^2 = Q(ξ) = ξ M M^T ξ^T = ξ G ξ^T = \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij} ξ_i ξ_j <= C.   (3)

Cholesky's factorization of the Gram matrix G = M M^T yields G = R^T R, where R is an upper triangular matrix. Then

    Q(ξ) = ξ R^T R ξ^T = || R ξ^T ||^2 = \sum_{i=1}^{n} \Bigl( r_{ii} ξ_i + \sum_{j=i+1}^{n} r_{ij} ξ_j \Bigr)^2 <= C.   (4)

Substituting q_{ii} = r_{ii}^2 for i = 1, ..., n and q_{ij} = r_{ij} / r_{ii} for i = 1, ..., n, j = i + 1, ..., n, we can write

    Q(ξ) = \sum_{i=1}^{n} q_{ii} \Bigl( ξ_i + \sum_{j=i+1}^{n} q_{ij} ξ_j \Bigr)^2 <= C.   (5)

Starting from ξ_n and working backwards, we find the equations of the border of the ellipsoid. The corresponding ranges for the integer components u_n and u_{n-1} are

    \lceil -\sqrt{C / q_{nn}} + ρ_n \rceil <= u_n <= \lfloor \sqrt{C / q_{nn}} + ρ_n \rfloor

    \lceil -\sqrt{(C - q_{nn} ξ_n^2) / q_{n-1,n-1}} + ρ_{n-1} + q_{n-1,n} ξ_n \rceil <= u_{n-1} <= \lfloor \sqrt{(C - q_{nn} ξ_n^2) / q_{n-1,n-1}} + ρ_{n-1} + q_{n-1,n} ξ_n \rfloor

where \lceil x \rceil is the smallest integer greater than x and \lfloor x \rfloor is the greatest integer smaller than x. For the ith integer component we have

    \Bigl\lceil -\sqrt{ (1/q_{ii}) \Bigl( C - \sum_{l=i+1}^{n} q_{ll} \bigl( ξ_l + \sum_{j=l+1}^{n} q_{lj} ξ_j \bigr)^2 \Bigr) } + ρ_i + \sum_{j=i+1}^{n} q_{ij} ξ_j \Bigr\rceil <= u_i
        <= \Bigl\lfloor \sqrt{ (1/q_{ii}) \Bigl( C - \sum_{l=i+1}^{n} q_{ll} \bigl( ξ_l + \sum_{j=l+1}^{n} q_{lj} ξ_j \bigr)^2 \Bigr) } + ρ_i + \sum_{j=i+1}^{n} q_{ij} ξ_j \Bigr\rfloor.   (6)

The search algorithm proceeds very much like a mixed-radix counter on the digits u_i, with the addition that the bounds change whenever there is a carry operation from one digit to the next. In practice, the bounds can be updated recursively by using the following equations:

    S_i = S_i(ξ_{i+1}, ..., ξ_n) = ρ_i + \sum_{l=i+1}^{n} q_{il} ξ_l

    T_{i-1} = T_{i-1}(ξ_i, ..., ξ_n) = C - \sum_{l=i}^{n} q_{ll} \Bigl( ξ_l + \sum_{j=l+1}^{n} q_{lj} ξ_j \Bigr)^2 = T_i - q_{ii} (S_i - u_i)^2.
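The enumeration above can be sketched compactly. The following implementation is illustrative (names, conventions, and the small test basis are assumptions, not the authors' code): the lattice basis is given as the rows of M, so x = uM, and the recursion uses the (S_i, T_i) bookkeeping just described.

```python
# Minimal sphere-decoder sketch for the Gaussian case (alpha_i = 1),
# following the bounds and the (S_i, T_i) recursion above. Illustrative only.
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination (small systems only)."""
    n = len(A)
    A = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def sphere_decode(M, r, C):
    """Return (u, d2): integer vector with ||r - u M||^2 = d2 <= C minimal,
    or (None, None) if no lattice point lies in the sphere."""
    n = len(M)
    G = [[sum(M[i][k] * M[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]        # Cholesky: G = R^T R, R upper triangular
    for i in range(n):
        for j in range(i, n):
            s = G[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = math.sqrt(s) if i == j else s / R[i][i]
    q = [[0.0] * n for _ in range(n)]        # q_ii = r_ii^2, q_ij = r_ij / r_ii
    for i in range(n):
        q[i][i] = R[i][i] ** 2
        for j in range(i + 1, n):
            q[i][j] = R[i][j] / R[i][i]
    rho = solve([[M[i][j] for i in range(n)] for j in range(n)], r)   # rho M = r
    xi = [0.0] * n
    u_cur = [0] * n
    best = [None, C]                         # minimum square distance starts at C

    def rec(i, T):
        S = rho[i] + sum(q[i][l] * xi[l] for l in range(i + 1, n))
        rad = math.sqrt(max(T, 0.0) / q[i][i])
        for u in range(math.ceil(S - rad), math.floor(S + rad) + 1):
            u_cur[i] = u
            xi[i] = rho[i] - u
            Tn = T - q[i][i] * (S - u) ** 2  # T_{i-1} = T_i - q_ii (S_i - u_i)^2
            if i == 0:
                d2 = C - Tn
                if d2 < best[1]:
                    best[0], best[1] = u_cur[:], d2
            else:
                rec(i - 1, Tn)

    rec(n - 1, C)
    return best[0], (best[1] if best[0] is not None else None)
```

On a small two-dimensional example the result agrees with an exhaustive search over a box of integer vectors.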


Fig. 2. Flowchart of the lattice decoding algorithm with fading. The function Q-Chol computes the qij terms of (5).

When a vector inside the sphere is found, its square distance from the center (the received point) is given by

    \hat{d}^2 = C - T_1 + q_{11} (S_1 - u_1)^2.

This value is compared to the minimum square distance d^2 (initially set equal to C) found so far in the search. If it is smaller, then we have a new candidate closest point, and the search continues like this until all the vectors inside the sphere are tested. The advantage of this method is that we never test vectors with a norm greater than the given radius. Every tested vector requires the computation of its norm, which entails n multiplications and n - 1 additions. The increase in the number of operations needed to update the bounds (6) is largely compensated for by the enormous reduction in the number of vectors tested, especially when the dimension increases. In order to be sure to always find a lattice point inside the sphere, we must select √C equal to the covering radius of the lattice. Otherwise, we do bounded distance decoding and the decoder can signal an erasure whenever no point is found inside the sphere. A judicious choice of C can greatly speed up the decoder. In practice, the choice of C can be adjusted according to the noise variance N_0 so that the probability of a decoding failure is negligible. If a decoding failure is detected, the operation can either be repeated with a greater radius or an erasure can be declared. The kernel of the universal decoder (the enumeration of lattice points inside a sphere of radius √C) requires the greatest number of operations. The complexity is obviously independent of the constellation size, i.e., the number of operations does not depend on the spectral efficiency of the signal constellation. The complexity analysis presented in [7] shows that if d^{-1} is a lower bound for the eigenvalues of the Gram matrix G, then the number of arithmetical operations is

    O\bigl( n^2 ( \sqrt{4dC} + 1 )^{n-1} \bigr).   (7)

For a fixed radius and a given lattice (which fixes d), the complexity of the decoding algorithm is polynomial. We would like to notice that this does not mean that the general lattice decoding problem is not NP-hard. In fact, it is possible to construct a sequence of lattices of increasing dimension with an increasing value of the exponent d. When we deal with a lattice constellation, we must consider the edge effects. During the search in the sphere we discard the points which do not belong to the lattice code; if no code vector is found we declare an erasure. The complexity of this additional test depends on the shape of the constellation. For cubic-shaped constellations it only entails checking that the vector components lie within a given range. For a spherically shaped signal set it is sufficient to compute the length of the code vector found in the search sphere in order to check if it is within the outermost shell of the constellation.

III. THE SPHERE DECODER WITH FADING

For ML decoding with perfect CSI at the receiver, the problem is to minimize metric (1). Let M be the generator matrix of the lattice Λ and let us consider the lattice Λ_c with generator matrix

    M_c = M diag(α_1, ..., α_n).

We can imagine this new lattice Λ_c in a space where each component has been compressed or enlarged by a factor α_i. A point of Λ_c can be written as x^(c) = (x_1^(c), ..., x_n^(c)) = (α_1 x_1, ..., α_n x_n). The metric to minimize is then

    m(x | r, α) = \sum_{i=1}^{n} | r_i - x_i^{(c)} |^2.

This means that we can simply apply the lattice decoding algorithm to the lattice Λ_c, when the received point is r. The decoded point x̂^(c) ∈ Λ_c has the same integer components (û_1, ..., û_n) as x̂ ∈ Λ. The additional complexity required by this decoding algorithm comes from the fact that for each received point we have a different


Fig. 3. Performance of the rotated lattice Z^{24,12}.

compressed lattice Λ_c. So we need to compute a new Cholesky factorization of the Gram matrix for each Λ_c, which requires O(n^3 / 3) operations. We also need M_c^{-1} = diag(1/α_1, ..., 1/α_n) M^{-1} to find the ρ_i's, but this only requires a vector-matrix multiplication since M^{-1} is precomputed. The complete flowchart of the algorithm is given in Fig. 2. The choice of C in this case is more critical. In fact, whenever we are in the presence of deep fades, many points fall inside the search sphere and the decoding can be very slow. This is also evident from the fact that the Gram matrix of Λ_c may have a very small eigenvalue, which gives a large value of d in (7). This problem may be partially overcome by adapting C according to the values of the fading coefficients α_i. Fig. 3 shows the performance of the rotated lattice constellation Z^{24,12} on the Rayleigh channel with a spectral efficiency of 2 bits/dimension. This lattice is a rotated version of the cubic lattice in dimension 24 with a diversity order equal to 12 given in [5]. For all rotated cubic lattices in [5], we can set the search radius √C = 1 and thus the enumeration complexity increases as O(n^6) if we do not take into account the fading. Fig. 3 compares the performance of the Z^{24,12} constellation to the 16-QAM on a Gaussian channel which has the same spectral efficiency. We observe that such a high modulation diversity can bring the bit-error rate within 2 dB of the Gaussian channel's curve. To show the effectiveness of rotated constellations with respect to other TCM schemes especially designed for the fading channel, we have also plotted the bit-error rate of the optimal 64-state TCM over an 8-PAM signal set. We recall that the asymptotic slope of the error curve reflects the diversity order of the coding scheme. Then, we observe that the diversity of the TCM scheme is much lower than that of the rotated constellation.
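The key point of the fading adaptation — that metric (1) over the original lattice equals the plain Euclidean distance to the compressed lattice with generator M diag(α) — can be checked directly on a small brute-force grid. The generator, fading values, and received point below are illustrative choices:

```python
# Sketch: faded ML metric (1) over lattice points u M equals the Euclidean
# metric to the compressed lattice Mc = M diag(alpha). Values illustrative.
M = [[2, 0], [1, 2]]
alpha = [0.6, 1.3]
r = [0.4, -1.1]
n = 2
Mc = [[M[i][j] * alpha[j] for j in range(n)] for i in range(n)]

def metric_faded(u):
    """sum_i |r_i - alpha_i x_i|^2 with x = u M."""
    x = [sum(u[i] * M[i][j] for i in range(n)) for j in range(n)]
    return sum((r[j] - alpha[j] * x[j]) ** 2 for j in range(n))

def metric_compressed(u):
    """||r - u Mc||^2."""
    xc = [sum(u[i] * Mc[i][j] for i in range(n)) for j in range(n)]
    return sum((r[j] - xc[j]) ** 2 for j in range(n))

cands = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
```

Both metrics agree (up to floating-point rounding) on every candidate, so minimizing either one yields the same decoded integer vector.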

IV. CONCLUSION

Decoding arbitrary signal constellations in a fading environment can be a very complex task. When the signal set has no structure it is only possible to perform an exhaustive search through all

the constellation points. Some signal constellations, which can be efficiently decoded when used over the Gaussian channel, become hard to decode when used over the fading channel since their structure is destroyed. Fortunately, for lattice constellations this is not the case, since the faded constellation still preserves a lattice structure and only a small additional complexity is required. The algorithm we presented was successfully run to simulate systems using lattice constellations of dimensions up to 32, which seem to be sufficient to approach the performance of the Gaussian channel when dealing with a Rayleigh fading one.

REFERENCES
[1] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, 2nd ed. New York: Springer-Verlag, 1993.
[2] G. D. Forney Jr., "A bounded distance decoding algorithm for the Leech lattice with generalizations," IEEE Trans. Inform. Theory, vol. 35, pp. 906–909, July 1989.
[3] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer, 1992.
[4] J. Boutros, E. Viterbo, C. Rastello, and J. C. Belfiore, "Good lattice constellations for both Rayleigh fading and Gaussian channels," IEEE Trans. Inform. Theory, vol. 42, pp. 502–518, Mar. 1996.
[5] J. Boutros and E. Viterbo, "Rotated multidimensional QAM constellations for Rayleigh fading channels," in Proc. 1996 IEEE Information Theory Workshop (Haifa, Israel, June 9–13, 1996), p. 23.
[6] M. Pohst, "On the computation of lattice vectors of minimal length, successive minima and reduced basis with applications," ACM SIGSAM Bull., vol. 15, pp. 37–44, 1981.
[7] U. Fincke and M. Pohst, "Improved methods for calculating vectors of short length in a lattice, including a complexity analysis," Math. Comput., vol. 44, pp. 463–471, Apr. 1985.
[8] E. Viterbo and E. Biglieri, "A universal lattice decoder," in 14th Colloq. GRETSI (Juan-les-Pins, France, Sept. 1993), pp. 611–614.


Evaluating the Performance of Convolutional Codes Over Block Fading Channels

Esa Malkamäki, Member, IEEE, and Harry Leib, Senior Member, IEEE

Abstract—This correspondence considers union upper bound techniques for error control codes with limited interleaving over block fading Rician channels. A modified bounding technique is presented that relies on limiting the conditional union bound before averaging over the fading process. This technique, although analytically not very attractive, provides tight and hence useful numerical results.

Index Terms—Block fading channel, error control, interleaving, union bound.

I. INTRODUCTION

Fading of the received signal is most significant in radio communications, and diversity techniques are frequently employed to overcome this problem. Error control coding with interleaving provides a powerful form of diversity. The performance of error control codes over fading channels is commonly evaluated using the union upper bounds and assuming ideal interleaving, see, e.g., [1]. The union upper bounds for specific convolutional codes over memoryless channels using the transfer function of the code were introduced in [2]. For the ideally interleaved Rayleigh fading channel, the union bound can be calculated by summing the exact pairwise error probabilities averaged over the fading process [1], yielding tight results for high signal-to-noise ratios. The block fading channel model, considered in this correspondence, is especially suitable for wireless communication systems with slowly moving terminals [3], [4]. In this model, the fading process is constant over a block of N channel symbols and it is statistically independent between the blocks. Furthermore, due to a delay constraint, only limited interleaving is possible in practice. Error control coding over block fading channels has been considered in [3] and [5] using random coding techniques, see also [6]. Uninterleaved convolutional codes with short channel blocks were considered in [7]. Multifrequency trellis coding was considered in [8]. Recently, some code search results of convolutional codes for block fading channels were presented in [9], where simulations were used for the performance evaluation, since the union bound approach was found to provide quite unfruitful results. In this correspondence, we consider a modification of the union bounding technique for error control codes with limited interleaving over block fading Rician channels that provides tighter, and hence useful, results.

II. SYSTEM MODEL

Rate kc/nc terminated convolutional codes will be considered in this correspondence.
Extension to other block codes is straightforward. The model of the convolutional encoded transmission system to be analyzed is shown in Fig. 1. A block of Bkc bits from the data source are first encoded by a rate Rc = kc =nc convolutional code with constraint length kc K = kc (m + 1), where m is the memory Manuscript received June 18, 1998; revised February 10, 1999. E. Malkam¨aki was with the Communications Laboratory, Institute of Radio Communications (IRC), Helsinki University of Technology, Espoo, Finland, on leave from the Laboratory of Radio Communications, Nokia Research Center (Helsinki), FIN-00045 Nokia Group, Finland. H. Leib is with the Department of Electrical Engineering, McGill University, Montreal, Que., Canada H3A 2A7. Communicated by M. L. Honig, Associate Editor for Communications. Publisher Item Identifier S 0018-9448(99)04175-9.

Fig. 1. System model.

order of the code. Before encoding, mkc tail bits (m = K 0 1 tail vectors) are added to each block of Bkc bits to terminate the code trellis into a known state. The nc (B + m) encoder output bits are denoted by xlj , where l = 1; 2; 1 1 1 ; nc indicates the code generator polynomial and j = 1; 2; 1 1 1 ; B + m: In the analysis, we assume antipodal modulation, i.e., xlj = 61: The output symbols xlj are interleaved over L subchannels (or bursts in a TDMA system), see Fig. 1. To simplify the analysis, we assume that the number of subchannels L = nc and that each output of the encoder is transmitted via a different subchannel. Notice that in many cases this can be achieved by considering instead of the original code an equivalent code with L = nc : For instance, a rate 1=2 code interleaved over L = 4 subchannels can be expressed as an equivalent rate 2=4 code, for further details see [6]. The channel is assumed to be frequency nonselective block fading Rician. Assuming coherent detection, the received signal samples can be written as

\[ y_{lj} = \sqrt{E_c}\,\alpha_l x_{lj} + n_{lj}, \qquad l = 1, \ldots, L;\; j = 1, \ldots, B+m \tag{1} \]

where l indicates the subchannel or, equivalently, the generator polynomial used, j the sample within a subchannel, E_c is the energy per transmitted bit, and the n_lj are zero-mean white Gaussian noise samples with variance N_0/2. The fading envelopes α_l of the L subchannels involved in each decoding process are assumed to be independent of each other, identically distributed, and constant over the subchannel. Here, the α_l are assumed to be Rician distributed with probability density function

\[ f(\alpha_l) = \frac{\alpha_l}{\sigma^2} \exp\!\left( -\frac{\alpha_l^2 + s^2}{2\sigma^2} \right) I_0\!\left( \frac{\alpha_l s}{\sigma^2} \right), \qquad \alpha_l \ge 0 \tag{2} \]

where s² is the power of the specular component, 2σ² is the power of the scattered component, I_0(·) is the zeroth-order modified Bessel function of the first kind, and E[α_l²] = s² + 2σ². The Rician factor is defined as κ = s²/(2σ²). The Viterbi algorithm (VA), used for the decoding of convolutional codes, employs the samples y_lj as well as ideal channel state information (CSI), α̂_l = α_l. The branch metrics are calculated as [10]

\[ \lambda_j^{(r)} = \sum_{l=1}^{n_c} \alpha_l\, x_{lj}^{(r)} y_{lj}. \tag{3} \]
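As a concrete illustration of the model in (1) and (2), the following Python sketch (our own illustration; all function and variable names are ours, not the correspondence's) draws a Rician envelope with specular power s² and scattered power 2σ², and forms the received samples for one subchannel block:

```python
import math
import random

def rician_envelope(s, sigma, rng):
    """One Rician envelope |s + sigma*(G1 + j*G2)| with G1, G2 ~ N(0, 1),
    so that E[alpha^2] = s^2 + 2*sigma^2, as stated after (2)."""
    g1 = rng.gauss(0.0, 1.0)
    g2 = rng.gauss(0.0, 1.0)
    return math.hypot(s + sigma * g1, sigma * g2)

def received_block(x_row, ec, alpha, n0, rng):
    """Received samples (1) on one subchannel: y_lj = sqrt(Ec)*alpha*x_lj + n_lj.
    The envelope alpha is held constant over the whole block (block fading)."""
    root_ec = math.sqrt(ec)
    noise_sd = math.sqrt(n0 / 2.0)  # n_lj has variance N0/2
    return [root_ec * alpha * x + rng.gauss(0.0, noise_sd) for x in x_row]
```

For a Rayleigh subchannel, set s = 0; the Rician factor κ = s²/(2σ²) is then zero.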

Notice that on each branch in the trellis the first bit comes from one subchannel, the second bit from another subchannel, and so on.

III. PERFORMANCE BOUNDS

In this section, the bit and block error probabilities of terminated convolutional codes over block fading channels are upper-bounded using two different approaches. The first approach employs a standard technique used for ideally interleaved fading channels and yields simpler and analytically more tractable results which, however, are very loose. The second approach, in which the conditional union upper bounds are limited before averaging, yields much tighter results but requires L-fold numerical integration.

0018–9448/99 $10.00 © 1999 IEEE
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 5, JULY 1999

A. Bit Error Probability

Assuming first that the fading envelopes α = (α_1, …, α_L) are fixed, the conditional union upper bound on the bit error probability, using the same technique as in [11], is given by

\[ P_b(\alpha) \le \frac{1}{k_c} \sum_{d = d_f}^{\infty} c(d)\, P_2(d \mid \alpha). \tag{4} \]

The coefficients

\[ c(d) = c(d_1, \ldots, d_L) = \sum_{i=1}^{\infty} i\, a(d_1, \ldots, d_L; i) = \sum_{i=1}^{\infty} i\, a(d; i) \]

are obtained from the generalized transfer function of the code, which defines the component distance properties of the code [8], [12]:

\[ T(D_1, \ldots, D_L; N) = \sum_{d = d_f}^{\infty} \sum_{i=1}^{\infty} a(d; i)\, N^i \prod_{l=1}^{L} D_l^{d_l}. \tag{5} \]

The coefficient a(d; i) is the number of error events with distance vector d = (d_1, …, d_L) and i bit errors; the exponents of D_l denote the component distances from the all-zero path, i.e., the distances on each subchannel. The lower summation limits d_f = (d_{1f}, …, d_{Lf}) are the free component distances associated with each subchannel. Notice that in general d_free ≠ d_{1f} + ⋯ + d_{Lf}. The conditional pairwise error probability P_2(d | α) is given by

\[ P_2(d \mid \alpha) = Q\!\left( \sqrt{ \frac{2E_c}{N_0} \sum_{l=1}^{L} d_l\, \alpha_l^2 } \right). \tag{6} \]

Using the standard approach, the average bit error probability after decoding in a frequency-nonselective Rician fading channel is obtained by averaging (4) over the fading vector, yielding

\[ P_b = \int P_b(\alpha) f(\alpha)\, d\alpha \le \frac{1}{k_c} \sum_{d = d_f}^{\infty} c(d) \int P_2(d \mid \alpha) f(\alpha)\, d\alpha = \frac{1}{k_c} \sum_{d = d_f}^{\infty} c(d)\, P_2(d) \tag{7} \]

where f(α) = ∏_{l=1}^{L} f(α_l) and f(α_l) is given by (2). The average pairwise error probability P_2(d) can be upper-bounded using the well-known exponential upper bound Q(x) ≤ (1/2) exp(−x²/2) as¹ (see, e.g., [14])

\[ P_2(d) \le \frac{1}{2} \prod_{l=1}^{L} \frac{1+\kappa}{1+\kappa+d_l E_c/N_0}\, \exp\!\left( -\frac{\kappa\, d_l E_c/N_0}{1+\kappa+d_l E_c/N_0} \right). \tag{8} \]

For the special case of Rayleigh fading (κ = 0), we can get a closed-form solution for the average pairwise error probability given that the nonzero component distances are all distinct or all the same [1, Chs. 7.4 and 7.5]:

\[ P_2(d) = \frac{1}{2} \sum_{l=1}^{L} \pi_l \left( 1 - \sqrt{ \frac{d_l E_c/N_0}{1 + d_l E_c/N_0} } \right), \qquad \text{for } d_1 \ne \cdots \ne d_L \text{ (all distinct)} \tag{9a} \]

\[ P_2(d) = \left[ \tfrac{1}{2}(1-\mu) \right]^L \sum_{l=0}^{L-1} \binom{L-1+l}{l} \left[ \tfrac{1}{2}(1+\mu) \right]^l, \quad \mu = \sqrt{ \frac{d_1 E_c/N_0}{1 + d_1 E_c/N_0} }, \qquad \text{for } d_1 = \cdots = d_L \tag{9b} \]

where π_l is defined as

\[ \pi_l = \prod_{\substack{i=1 \\ i \ne l}}^{L} \frac{d_l}{d_l - d_i}. \tag{10} \]

¹An exact solution, which, however, requires numerical integration, is possible if the new integral expression [13] for the Q-function is used.

The analytical expressions for the average pairwise error probabilities given in (8) and (9) are very useful since they provide insight into the asymptotic behavior of the error probability in a block fading channel, which can be used in code design [8], [9], [15]. However, when these expressions are inserted in (7), the resulting upper bound is very loose, as will be seen from the numerical results. This is because there is no dominant error event: even at high signal-to-noise ratios, many terms in (7) contribute significantly to the sum. A much tighter upper bound is obtained by limiting the conditional union upper bound on the bit error probability (4) before averaging over the fading vector, yielding²

\[ P_b \le \int \min\!\left\{ \frac{1}{2},\; \frac{1}{k_c} \sum_{d = d_f}^{\infty} c(d)\, P_2(d \mid \alpha) \right\} f(\alpha)\, d\alpha. \tag{11} \]
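The Chernoff-type bound (8) and the equal-distance Rayleigh form (9b) are straightforward to evaluate numerically. The sketch below is our own illustration (the Rician factor is passed as `kappa`; all names are ours), implementing both expressions as reconstructed above:

```python
import math

def p2_chernoff_rician(d, ec_n0, kappa):
    """Upper bound (8) on the average pairwise error probability for a
    distance vector d = (d_1, ..., d_L), per-bit SNR ec_n0 = Ec/N0, and
    Rician factor kappa (kappa = 0 gives Rayleigh fading)."""
    bound = 0.5
    for dl in d:
        g = dl * ec_n0
        bound *= (1.0 + kappa) / (1.0 + kappa + g)
        bound *= math.exp(-kappa * g / (1.0 + kappa + g))
    return bound

def p2_rayleigh_equal(d1, big_l, ec_n0):
    """Closed form (9b): Rayleigh fading, all L component distances equal to d1."""
    mu = math.sqrt(d1 * ec_n0 / (1.0 + d1 * ec_n0))
    p = 0.5 * (1.0 - mu)
    return p ** big_l * sum(math.comb(big_l - 1 + l, l) * (0.5 * (1.0 + mu)) ** l
                            for l in range(big_l))
```

For κ = 0 and equal distances, (8) reduces to (1/2)∏ 1/(1 + d_l E_c/N_0), which indeed upper-bounds the exact value from (9b).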

Due to the minimization, the order of integration and summation cannot be interchanged in (11), and the L-fold integration has to be carried out numerically.

B. Block Error Probability

Similarly, for the block error probability there are two possibilities: we can either average the upper bound on the error event probability, or we can limit it before averaging. The former represents the standard approach, which yields simpler, analytically more tractable results that are, however, loose; the latter approach, with limiting before averaging, yields tighter results but requires L-fold numerical integration. The block error probability P_block(α) for a block of B decoded bits and for a given fading vector can be upper-bounded as [17]

\[ P_{\mathrm{block}}(\alpha) \le 1 - \left(1 - P_E(\alpha)\right)^B \le B\, P_E(\alpha) \tag{12} \]

where P_E(α) is the error event probability, defined as (similarly to [11])

\[ P_E(\alpha) \le \sum_{d = d_f}^{\infty} a(d)\, P_2(d \mid \alpha) \tag{13} \]

where

\[ a(d) = \sum_{i=1}^{\infty} a(d; i) \]

is obtained from the generalized transfer function (5) and P_2(d | α) is given by (6). The average block error probability after decoding in the block fading channel is

\[ P_{\mathrm{block}} \le 1 - \int \left(1 - P_E(\alpha)\right)^B f(\alpha)\, d\alpha \le B \int P_E(\alpha) f(\alpha)\, d\alpha \le B \sum_{d = d_f}^{\infty} a(d)\, P_2(d) \tag{14} \]

²Strictly speaking, the bit error probability should be limited to 1, since in some pathological cases the bit error probability can be larger than 1/2. However, simulations show that in practical cases the bit error probability of a Viterbi decoder is limited to 1/2, even for catastrophic codes; see also [16].


where P_2(d) is given by (8) or (9). This upper bound is, however, very loose and as such useless. The tighter upper bound is obtained by limiting the upper bound on the error event probability (13) before averaging over the fading. The average block error probability then becomes

\[ P_{\mathrm{block}} \le 1 - \int \left[ 1 - \min\!\left\{ 1,\; \sum_{d = d_f}^{\infty} a(d)\, P_2(d \mid \alpha) \right\} \right]^B f(\alpha)\, d\alpha. \tag{15} \]
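The correspondence evaluates the L-fold integrals in (11) and (15) by adaptive quadrature; as a sketch of why limiting before averaging tightens the result, the following Monte Carlo version (our own simplification — the two-term `spectrum` below is a toy stand-in for a code's true coefficients c(d), not data from the paper) compares the plain average of the union bound with the limited one for Rayleigh fading:

```python
import math
import random

def q_func(x):
    # Gaussian tail probability Q(x)
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bounds_mc(spectrum, kc, ec_n0, big_l, n_samples=20000, seed=1):
    """spectrum: list of (c_d, d_vector) pairs. Returns Monte Carlo estimates of
    the traditional bound (7) and the modified bound (11) for Rayleigh fading
    (alpha_l^2 drawn as unit-mean exponentials)."""
    rng = random.Random(seed)
    trad = 0.0
    mod = 0.0
    for _ in range(n_samples):
        a2 = [rng.expovariate(1.0) for _ in range(big_l)]
        s = sum(c * q_func(math.sqrt(2.0 * ec_n0 * sum(dl * al for dl, al in zip(d, a2))))
                for c, d in spectrum) / kc
        trad += s              # average of the unrestricted union bound
        mod += min(0.5, s)     # limit to 1/2 before averaging, as in (11)
    return trad / n_samples, mod / n_samples
```

Since min{1/2, S(α)} ≤ S(α) for every fading realization, the modified estimate can never exceed the traditional one, and at low SNR it avoids the "explosion" of the plain union bound.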

The L-fold integration has to be evaluated numerically due to the minimization in (15). The union upper bound approaches presented in this correspondence for specific convolutional codes are also applicable to other block codes; in particular, the limiting before averaging is essential. For block codes, the distance spectrum is usually called the weight spectrum. For block fading channels, component weight spectra are needed, i.e., each weight is divided into L components indicating the distances between codewords on each subchannel.

IV. NUMERICAL RESULTS

In this section, we plot some examples of the upper bounds on the average bit error probability P_b as well as the average block error probability P_block in the block fading Rician channel as a function of E_b/N_0, where E_b = E_c/R_c. The numerical integrations in this work were carried out using the NAG Fortran library routines. The multidimensional integrals over hyper-rectangular regions were calculated using the routine D01EAF, which uses an adaptive subdivision strategy; for further details, see the NAG Fortran Library documentation [18]. Although the tighter bounds require L-fold numerical integration, the increased numerical complexity becomes less of a problem with the growing computer power available today; e.g., about 30 CPU seconds are required to compute a curve of the modified bound (threefold integral) in Fig. 3 on a Digital AlphaServer 8400 computer. A modified version of the FAST algorithm [19] was used to compute the component distance spectra, i.e., the coefficients a(d) and c(d).

Simulations to verify the tightness of the bounds were also carried out. Up to 10^6 coded blocks were transmitted. The channel state information (CSI) was assumed to be ideal, i.e., the channel fading envelope was known by the receiver performing coherent detection. Soft-decision decoding with the Viterbi algorithm was used.

The convergence of the traditional (7) and the modified (11) union bounds is shown in Fig. 2 for the average bit error probability of a rate 1/2 terminated convolutional code (B = 98) with constraint length K = m + 1 = 3 and generator polynomials (7, 5) in octal, transmitted over a block fading Rayleigh channel with L = 2. The summations in (7) and (11) are truncated such that only the error events having total distance d = d_1 + ⋯ + d_L ≤ d_max are taken into account. The transfer function upper bound, i.e., the untruncated union bound (d_max = ∞), is calculated using the upper bound³ Q(√(x + y)) ≤ Q(√x) exp(−y/2). The generalized transfer function for this code is given, e.g., in [12]. The free distance of this code is d_free = 5 and there is only one path with this distance, i.e., the lowest curve in Fig. 2 takes into account only one error event and is, therefore, the same for both approaches; it is also a lower bound. It is clearly seen that the modified union bound on the bit error probability (11) converges rather fast, whereas the traditional union bound (7), calculated as the sum of the average pairwise error probabilities (9), does not. The simulation results show that the modified bound on the bit error probability is rather tight (about 3 dB) and does not get tighter even for high values of E_b/N_0. This clearly indicates that

³Also here, the upper-bounding can be avoided [13] to get slightly tighter results.

Fig. 2. Union bounds on the bit error probability (7) and (11) for different values of d_max; a rate R_c = 1/2 terminated convolutional code with generators (7, 5) in octal, input block B = 98 bits, a block fading Rayleigh channel with L = 2 channel blocks per codeword, channel block N = 100 bits.

Fig. 3. Union upper bounds (d_max = 42) on the block error probability over a block fading Rician channel with L = 3 channel blocks per codeword; a rate R_c = 1/3 terminated convolutional code with generators (133, 145, 175) in octal, input block B = 94 bits, channel block N = 100 bits.

there is no dominant error event for block fading Rayleigh channels with limited interleaving, which is due to the fact that all the error events have the same diversity order [15]. This is different from the ideal interleaving case, where the Hamming distance of an error event defines the diversity order for that event and, therefore, the event with minimum distance dominates the results.

Fig. 3 depicts the union upper bounds on the average block error probability for a rate 1/3 terminated convolutional code (B = 94) interleaved over L = 3 channel blocks for four different Rician factors, κ = 0, 5, 10, and ∞ (the additive white Gaussian noise (AWGN) channel). The code polynomials in octal are (133, 145, 175). Here the union bounds are truncated at d_max = 42 (see footnote 4). The traditional bounds (7) get tighter for increasing values of κ and finally coincide with the modified bound for κ = ∞ (the AWGN channel), for which no averaging is needed. The modified bounds (15) for the block error probability are surprisingly tight for all values of κ, and they also get tighter for higher values of κ. Furthermore, they are clearly tighter than the bit error probability upper bounds. The convolutional codes used in these examples are optimized for the AWGN channel (d_free maximized) and are not necessarily optimum for the block fading channel; see, e.g., [15] and [9].

V. CONCLUSIONS

In this correspondence, union bounding techniques for error control codes with limited interleaving over block fading Rician channels were considered. The traditional union bounding technique, which sums averaged pairwise error probabilities, was shown to yield very loose results, especially for low Rician factors (a block fading Rayleigh channel). A modified union bounding technique was presented which limits the conditional union bound before averaging over the fading process and thus avoids the "explosion" of the union bound at low SNR. This modified bounding technique provides much tighter, and hence useful, numerical results, but requires L-fold numerical integration, where L is the number of diversity subchannels. Examples were shown for terminated convolutional codes, but the necessity of limiting before averaging in block fading channels extends to other block codes as well; this was also clearly shown for random coding techniques in [5].
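An aside on the computations in Section IV: the untruncated transfer-function bound used there rests on the inequality Q(√(x + y)) ≤ Q(√x) e^(−y/2) for x, y ≥ 0, which is easy to sanity-check numerically (our own illustration, not part of the correspondence):

```python
import math

def q_func(x):
    # Gaussian tail probability Q(x)
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_split_bound(x, y):
    """Return (exact, bound) for the factorization Q(sqrt(x+y)) <= Q(sqrt(x))*exp(-y/2)."""
    return q_func(math.sqrt(x + y)), q_func(math.sqrt(x)) * math.exp(-y / 2.0)
```

For x = 0 the inequality reduces to the Chernoff bound Q(√y) ≤ (1/2) e^(−y/2) already used to obtain (8).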

⁴Enough terms have to be taken into account in order to have an upper bound. Notice that without truncation the traditional union bound does not necessarily converge.

ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewers for their comments that helped to improve this correspondence.

REFERENCES

[1] J. G. Proakis, Digital Communications, 2nd ed. New York: McGraw-Hill, 1989.
[2] A. J. Viterbi, "Convolutional codes and their performance in communication systems," IEEE Trans. Commun. Technol., vol. COM-19, pp. 751–772, Oct. 1971.
[3] G. Kaplan and S. Shamai (Shitz), "Error probabilities for the block-fading Gaussian channel," Arch. Elek. Übertragung, vol. 49, pp. 192–205, July 1995.
[4] L. Ozarow, S. Shamai, and A. D. Wyner, "Information theoretic considerations for cellular mobile radio," IEEE Trans. Veh. Technol., vol. 43, pp. 359–378, May 1994.
[5] E. Malkamäki and H. Leib, "Coded diversity on block fading channels," IEEE Trans. Inform. Theory, vol. 45, pp. 771–781, Mar. 1999.
[6] E. Malkamäki, "Performance of error control over block fading channels with ARQ applications," Ph.D. dissertation (Tech. Rep. T43), Helsinki Univ. Technol., Commun. Lab., Espoo, Finland, 1998.
[7] G. Kaplan, S. Shamai (Shitz), and Y. Kofman, "On the design and selection of convolutional codes for an uninterleaved, bursty Rician channel," IEEE Trans. Commun., vol. 43, pp. 2914–2921, Dec. 1995.
[8] Y. S. Leung, S. G. Wilson, and J. W. Ketchum, "Multi-frequency trellis coding with low delay for fading channels," IEEE Trans. Commun., vol. 41, pp. 1450–1459, Oct. 1993.
[9] R. Knopp and P. A. Humblet, "Maximizing diversity on block fading channels," in Proc. IEEE ICC'97, Montreal, Que., Canada, 1997, pp. 647–651.
[10] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. Commun., vol. 36, pp. 389–400, Apr. 1988.
[11] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. Tokyo, Japan: McGraw-Hill, 1979.
[12] D. N. Rowitch and L. B. Milstein, "Convolutional coding for direct sequence multicarrier CDMA," in Proc. IEEE MILCOM'95, San Diego, CA, 1995, pp. 55–59.
[13] M. K. Simon and D. Divsalar, "Some new twists to problems involving the Gaussian probability integral," IEEE Trans. Commun., vol. 46, pp. 200–210, Feb. 1998.
[14] E. Biglieri, D. Divsalar, P. J. McLane, and M. K. Simon, Introduction to Trellis-Coded Modulation with Applications. New York: Macmillan, 1991.
[15] E. Malkamäki and H. Leib, "Rate 1/n convolutional codes with interleaving depth of n over a block fading Rician channel," in Proc. IEEE VTC'97, Phoenix, AZ, 1997, pp. 2002–2006.
[16] J. W. Modestino and S. Y. Mui, "Convolutional code performance in the Rician fading channel," IEEE Trans. Commun., vol. COM-24, pp. 592–606, June 1976.
[17] S. Kallel and C. Leung, "Efficient ARQ schemes with multiple copy decoding," IEEE Trans. Commun., vol. 40, pp. 642–650, Mar. 1992.
[18] NAG Fortran Library Manual, Mark 17, The Numerical Algorithms Group Ltd., Oxford, U.K., 1995.
[19] M. Cedervall and R. Johannesson, "A fast algorithm for computing distance spectrum of convolutional codes," IEEE Trans. Inform. Theory, vol. 35, pp. 1146–1159, Nov. 1989.

On the Weight Distribution of Terminated Convolutional Codes

Marc P. C. Fossorier, Member, IEEE, Shu Lin, Fellow, IEEE, and Daniel J. Costello, Jr., Fellow, IEEE

Abstract—In this correspondence, the low-weight terms of the weight distribution of the block code obtained by terminating a convolutional code after x information blocks are expressed as a function of x. It is shown that this function is linear in x for codes with noncatastrophic encoders, but quadratic in x for codes with catastrophic encoders. These results are useful to explain the poor performance of convolutional codes with a catastrophic encoder at low-to-medium signal-to-noise ratios. Index Terms—Block codes, convolutional codes, soft-decision decoding, Viterbi decoding, weight distribution.

I. INTRODUCTION

Any binary rate k/n convolutional code with constraint length L (or memory order L − 1) whose associated trellis diagram terminates with a zero tail after encoding x blocks of k information bits into x blocks of n transmitted symbols generates an (n(x + L − 1), kx) binary block code. In this correspondence, for each such (n(x + L − 1), kx) block code, we express the low-weight terms of the corresponding weight distribution as a function of x. It is shown that this

Manuscript received November 1, 1996; revised February 15, 1999. This work was supported by the National Science Foundation under Grants NCR-94-15374 and NCR-95-22939. M. P. C. Fossorier and S. Lin are with the Department of Electrical Engineering, University of Hawaii, Honolulu, HI 96822 USA. D. J. Costello, Jr. is with the Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556 USA. Communicated by N. Seshadri, Associate Editor for Coding. Publisher Item Identifier S 0018-9448(99)04382-5.


function is linear in x for codes with noncatastrophic encoders, but quadratic in x for codes with catastrophic encoders. This fact helps to explain why convolutional codes generated by a catastrophic encoder perform poorly. These results can be viewed as a simpler method than the one derived in [1] to obtain the low-weight terms of the weight distribution of a zero-tail terminated convolutional code. These results are also useful in deriving a good approximation of the union upper bound on the bit error probability for maximum-likelihood decoding (MLD) of convolutional codes with a noncatastrophic encoder by applying standard bounding techniques for block codes. Finally, since the weight distribution of a block code is independent of the mapping between information bits and codewords, these results are valid for terminated convolutional codes with encoder realizations in feedforward as well as in feedback forms.

II. WEIGHT DISTRIBUTION OF THE (n(x + L − 1), kx) BLOCK CODE

A. Noncatastrophic Encoders

Let W_i(x) denote the number of codewords of weight i in the (n(x + L − 1), kx) block code obtained by terminating with a zero tail a rate k/n convolutional code of constraint length L after x information blocks. We define: 1) N_i as the number of paths of weight i in the trellis diagram that diverge from the all-0 path at the origin x = 0 and remerge only once (note that N_i = 0 for all i < d_H, the minimum free Hamming distance of the convolutional code), and 2) l_{i,max} as the maximum length (in information blocks) of the N_i paths of weight i previously defined. (We exclude catastrophic encoders here since l_{i,max} is unbounded for some i in this case.) Based on these definitions, we derive the following theorem.

Theorem 2.1: For i ∈ {d_H, …, 2d_H − 1} and x ≥ l_{i,max}, W_i(x) is of the form

\[ W_i(x) = N_i\, x - a_i \tag{1} \]

where a_i ≥ 0.

Proof: For x ≥ l_{i,max}, N_i can equivalently be defined as the number of paths of weight i in the trellis diagram that remerge with the all-0 path at section x + L − 1. Hence after adding one dimension (i.e., one information block) to the (n(x − 1 + L − 1), k(x − 1)) code we obtain the (n(x + L − 1), kx) code with, for i ∈ {d_H, …, 2d_H − 1},

\[ W_i(x) = W_i(x-1) + N_i. \tag{2} \]

Applying a chain argument to (2), we obtain

\[ W_i(x) = W_i(l_{i,\max} - 1) + (x - l_{i,\max} + 1)\, N_i. \tag{3} \]

However, for the first l_{i,max} − 1 steps in the recursion, i.e., for x < l_{i,max}, fewer than N_i paths are added, since some of the paths corresponding to N_i have not terminated yet. This completes the proof after defining a_i as the total number of codewords discarded in the first l_{i,max} − 1 steps.

Theorem 2.1 corresponds to the steady-state part of the weight distributions associated with the family of (n(x + L − 1), kx) block codes. By evaluating (1) and (2) for x = l_{i,max}, we obtain

\[ W_i(l_{i,\max}) = W_i(l_{i,\max} - 1) + N_i = N_i\, l_{i,\max} - a_i \tag{4} \]

so that

\[ a_i = (l_{i,\max} - 1)\, N_i - W_i(l_{i,\max} - 1). \tag{5} \]

Based on (5), we observe that (1) is also satisfied for x = l_{i,max} − 1. For x < l_{i,max} − 1, N_i in (1) must be replaced by a smaller value which in general differs for each value of x. For i ≥ 2d_H, W_i(x) is no longer a linear function of x, since not only paths of weight i contribute to W_i(x) in this case. Theorem 2.2 gives an example of this fact.

Theorem 2.2: For x ≥ max{2 l_{d_H,max} + L − 1, l_{2d_H,max}}, W_{2d_H}(x) is of the form

\[ W_{2d_H}(x) = \frac{(N_{d_H})^2}{2}\, x^2 + b_{2d_H}\, x - a_{2d_H} \tag{6} \]

where a_{2d_H} ≥ 0 and b_{2d_H} ≥ 0.

Proof: For x ≥ max{2 l_{d_H,max} + L − 1, l_{2d_H,max}}, (2) becomes

\[ W_{2d_H}(x) = W_{2d_H}(x-1) + N_{2d_H} + P_{d_H}(x) \tag{7} \]

where P_{d_H}(x) represents the number of pairs of disjoint paths, of weight d_H each, with the first path diverging from the all-0 path at x = 0. Then, for x ≥ max{2 l_{d_H,max} + L − 1, l_{2d_H,max}},

\[ P_{d_H}(x) = P_{d_H}(x-1) + (N_{d_H})^2 \tag{8} \]

since there are (N_{d_H})² such possible pairs. By following the same approach as in the proof of Theorem 2.1, we obtain, for a'_{d_H} ≥ 0,

\[ P_{d_H}(x) = (N_{d_H})^2\, x - a'_{d_H}. \tag{9} \]

Regrouping (7) and (9), it follows that

\[ W_{2d_H}(x) = W_{2d_H}(x-1) + N_{2d_H} + (N_{d_H})^2\, x - a'_{d_H}. \tag{10} \]

A chain argument completes the proof after defining b_{2d_H} as

\[ b_{2d_H} = \frac{(N_{d_H})^2}{2} + N_{2d_H} - a'_{d_H} \tag{11} \]

and a_{2d_H} as the total number of paths discarded in the initial steps.

Also, by algebraic manipulations of (7)–(10), we can show that (6) remains valid for x = max{2 l_{d_H,max} + L − 1, l_{2d_H,max}} − 1. For i > 2d_H, W_i(x) can be derived by following a similar approach. For example, the rate 1/3 convolutional code with L = 3 and generators in octal form (5, 7, 7) (see [2, Table 11.1]) generates the family of (3(x + 2), x) block codes with d_H = 8 and weight distribution starting with

W_8(x) = 2x − 1
W_10(x) = 5x − 14
W_12(x) = 13x − 65
W_14(x) = 34x − 244
W_16(x) = 2x² + 75x − 807

for x ≥ 13, since l_{16,max} = 14.

In a recent paper [1], it was shown that the entire weight distribution of the (2(x + L − 1), x) block code obtained by truncating a rate 1/2 convolutional code of constraint length L after x information blocks can be obtained by multiplying by itself (x + L − 1) times a 2^{L−1} × 2^{L−1} square matrix. The elements of this matrix correspond to the Hamming weights of the transitions in the state diagram of the rate 1/2 convolutional code. However, in many cases, only the first terms of the weight distribution of a code are required. Consequently, the results of this correspondence can be used in conjunction with the results of [1] to efficiently determine the terms of interest in the weight distribution of a terminated convolutional code. For example, given a noncatastrophic encoder, we know from Theorem 2.1 that for i ∈ {d_H, …, 2d_H − 1} and x ≥ l_{i,max}, W_i(x) is of the form

\[ W_i(x) = Ax + B. \tag{12} \]

Then the results of [1] can be used to evaluate the coefficients A and B for two small values of x ≥ l_{i,max}. More precisely, let us consider the example in [1] with L = 5, d_H = 7, and l_{7,max} = 4. Based on [1, Table 1], we have for x + 4 = 12 and x + 4 = 15

\[ W_7(8) = 8A + B = 13, \qquad W_7(11) = 11A + B = 19 \tag{13} \]


so that A = 2 and B = −3. (Note that since l_{7,max} = 4, even smaller values of x can be chosen to solve for A and B.) Then we can verify in [1, Table 1] that for x + 4 = 18, W_7(14) = 2·14 − 3 = 25. However, for x = 996, we can directly obtain W_7(996) = 1989 instead of raising a 16 × 16 matrix to the 1000th power. The same method can be used to determine any W_i(x) with i ∈ {d_H, …, 2d_H − 1}. Also, based on Theorem 2.2, three equations with

\[ x \ge \max\{2\, l_{d_H,\max} + L - 1,\; l_{2d_H,\max}\} \]

are sufficient to determine W_{2d_H}(x) in the form Ax² + Bx + C. Indeed, the results of Theorems 2.1 and 2.2 can be generalized to express any W_i(x) in polynomial form. To this end, the results of [1] become useful to evaluate the coefficients of the corresponding polynomials and to verify that large enough values of x are chosen.

B. Catastrophic Encoders

In this section, we show that W_{d_H}(x) is proportional to x² if the encoder is catastrophic. We consider a rate 1/2 convolutional code with a catastrophic encoder and assume that, in addition to the weight-zero self-loop around the all-zero state, the state diagram contains a weight-zero self-loop around the all-one state. Although derived for this particular case, the result that W_{d_H}(x) is proportional to x² holds for any convolutional code with a catastrophic encoder, since for such codes, in addition to the all-zero state, at least one other state touches a weight-zero loop, as in the case considered in the following proof. (If the length of the weight-zero loop is l_0 > 1, a change of variable y = x/l_0 is needed.) Also, extension to multiple weight-zero loops follows the same lines; however, the corresponding general expressions become significantly more complex.

We define T_i(x) as the number of distinct paths of length x (in information blocks) such that the xth branch is the weight-zero self-loop around the all-one state and the codeword has weight i in the associated (n(x + L − 1), kx) block code. (We assume here that the free distance d_H is the weight of the lowest weight path that diverges from and remerges with the all-zero path.) Based on this definition, we obtain, for x ≥ l_{d_H,max},

\[ W_{d_H}(x) = W_{d_H}(x-1) + T_{d_H}(x) + N_{d_H} \tag{14} \]

where N_{d_H} is the number of paths of weight d_H diverging from the all-zero path at x = 0 that do not contain the weight-zero self-loop around the all-one state, and l_{d_H,max} is defined with respect to these paths only. Also,

\[ T_{d_H}(x) = T_{d_H}(x-1) + N'_{d_H} \tag{15} \]

where N'_{d_H} ≤ N_{d_H} is the number of paths of weight d_H that diverge from the all-zero path at x = 0, that do not contain the weight-zero self-loop around the all-one state, and that pass through the all-one state, and where we assume N'_{d_H} > 0. As in the proof of Theorem 2.1, we obtain

\[ T_{d_H}(x) = N'_{d_H}\, x - a'_{d_H} \tag{16} \]

where a'_{d_H} ≥ 0. It follows that

\[ W_{d_H}(x) = W_{d_H}(x-1) + N'_{d_H}\, x - a'_{d_H} + N_{d_H}. \tag{17} \]

By applying the chain rule, we finally obtain

\[ W_{d_H}(x) = N'_{d_H}\, \frac{x(x+1)}{2} + (N_{d_H} - a'_{d_H})\, x - b_{d_H} \tag{18} \]

where b_{d_H} ≥ 0. For example, the trivial rate 1/2 convolutional encoder with one memory element whose input and output are summed to provide the two same output bits is catastrophic [3]. This code defines a family of (2(x + 1), x) block codes with d_H = 4 and W_4(x) = x(x + 1)/2. As in Section II-A, the results of [1] can be used to evaluate the coefficient values of W_{d_H}(x) for catastrophic encoders.

III. APPROXIMATION OF THE UNION BOUND ON THE BIT ERROR PROBABILITY OF CONVOLUTIONAL CODES

For binary phase-shift keying (BPSK) transmission over an additive white Gaussian noise (AWGN) channel, the block error probability associated with MLD of an (n(x + L − 1), kx) block code obtained by terminating a rate-k/n convolutional code with row distance d_H can be bounded using the union bound by

\[ P_s \le \sum_{i=d_H}^{N} W_i(x)\, \tilde{Q}\!\left( \sqrt{i E_s} \right) \tag{19} \]

with

\[ \tilde{Q}(x) = (\pi N_0)^{-1/2} \int_x^{\infty} e^{-n^2/N_0}\, dn \]

where N = n(x + L − 1) is the block length and E_s is the energy per transmitted symbol.
Also, if encoding is done in reduced echelon form, a good approximation of the union bound on the bit error probability P_b is obtained by scaling each term in the sum of (19) by i/(n(x + L − 1)) [4]. Consequently, a good approximation of the union bound on P_b for medium to high signal-to-noise ratio (SNR) values can be derived from Theorem 2.1 and (19) for zero-tail terminated convolutional codes with a noncatastrophic encoder by considering only the values i ∈ {d_H, …, 2d_H − 1} [5]. For x large enough, the approximate bound on P_b becomes independent of x. On the other hand, the approximate bound on P_b increases linearly with x for terminated convolutional codes with a catastrophic encoder. This suggests that for these codes, the best error performance is achieved by terminating the trellis after encoding x_N information blocks, where x_N depends on the operating SNR. Although catastrophic encoders should be avoided when convolutional codes are decoded with the Viterbi algorithm, they can be used if the trellis terminates, since only information sequences of finite length are considered. In particular, for a given rate and constraint length, a catastrophic encoder may generate a convolutional code with a larger row distance d_row than the free distance d_free of the best code generated by a noncatastrophic encoder¹ [6]. However, based on the previous results, this gain in d_row for the corresponding terminated codes with minimum Hamming distance d_row is achieved at the expense of significantly larger low-weight coefficients in the weight distribution. As a result, despite a smaller row distance, convolutional codes with noncatastrophic encoders should give lower values of P_b than the equivalent codes with catastrophic encoders at low-to-medium SNR values.

¹In general d_free ≤ d_row, and d_free = d_row if the encoder is noncatastrophic.

REFERENCES

[1] J. K. Wolf and A. J. Viterbi, "On the weight distribution of linear block codes formed from convolutional codes," IEEE Trans. Commun., vol. 44, pp. 1049–1051, Sept. 1996.
[2] S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[3] J. L. Massey and M. K. Sain, "Inverses of linear sequential circuits," IEEE Trans. Comput., vol. C-17, pp. 330–337, Apr. 1968.
[4] M. P. C. Fossorier, S. Lin, and D. Rhee, "Bit error probability for maximum likelihood decoding of linear block codes," IEEE Trans. Inform. Theory, to be published.
[5] M. P. C. Fossorier, S. Lin, and D. J. Costello, "Weight distribution and error performance of terminated convolutional codes," in Proc. Conf. Information Sciences and Systems, Baltimore, MD, Mar. 1997, pp. 767–768.
[6] D. J. Costello, Jr., "Free distance bounds for convolutional codes," IEEE Trans. Inform. Theory, vol. IT-20, pp. 356–365, May 1974.
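Finally, the quadratic law W_4(x) = x(x + 1)/2 for the trivial catastrophic encoder of Section II-B (both output bits per branch equal to u_t + u_{t−1}) can be confirmed by brute force over all information sequences (our own sketch, not from the correspondence):

```python
from itertools import product

def w4_count(x):
    """Count weight-4 codewords of the (2(x+1), x) terminated code whose two
    identical output bits per branch are u_t XOR u_{t-1} (u_0 = u_{x+1} = 0)."""
    count = 0
    for u in product((0, 1), repeat=x):
        seq = (0,) + u + (0,)  # encoder starts and ends in the zero state
        transitions = sum(seq[t] ^ seq[t - 1] for t in range(1, x + 2))
        if 2 * transitions == 4:  # each transition contributes 2 to the codeword weight
            count += 1
    return count
```

Weight-4 codewords correspond to information sequences containing a single run of ones, of which there are exactly x(x + 1)/2 — consistent with the quadratic growth predicted by (18).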