
On the Elementwise Convergence of Continuous Functions of Hermitian Banded Toeplitz Matrices

Pedro M. Crespo, Senior Member, IEEE, and Jesús Gutiérrez-Gutiérrez

Abstract: Toeplitz matrices and functions of Toeplitz matrices (such as the inverse of a Toeplitz matrix, powers of a Toeplitz matrix or the exponential of a Toeplitz matrix) arise in many different theoretical and applied fields. They can be found in the mathematical modelling of problems where some kind of shift invariance occurs in terms of space or time. R. M. Gray's excellent tutorial monograph on Toeplitz and circulant matrices has been, and remains, the best elementary introduction to the Szegö distribution theory on the asymptotic behavior of continuous functions of Toeplitz matrices. His asymptotic results, widely used in engineering due to the simplicity of their mathematical proofs, do not concern individual entries of these matrices; rather, they describe an "average" behavior. However, there are important applications where the asymptotic expressions of interest are directly related to the convergence of a single entry of a continuous function of a Toeplitz matrix. Using similar mathematical tools, and to gain insight into the solutions of this sort of problem, the present paper derives new theoretical results regarding the convergence of these entries.

Index Terms: Circulant matrices, covariance matrices, elementwise convergence, functions of matrices, MMSE, stationary stochastic time series, Szegö's theorem, Toeplitz matrices.

I. INTRODUCTION

A square Toeplitz matrix is an $n \times n$ matrix $T_n = (t_{i,j}^{(n)})$ where $t_{i,j}^{(n)} = t_{i-j}$ are complex numbers. Toeplitz matrices arise in many theoretical and applied fields. In particular, there are many signal processing applications where the Toeplitz matrix $T_n$ is Hermitian. Generally speaking, what is really relevant in many of the associated problems is a continuous function of the Toeplitz matrix,

$$g(T_n) := U_n\,\mathrm{diag}(g(\lambda_1(T_n)), \ldots, g(\lambda_n(T_n)))\,U_n^{-1},$$

rather than $T_n$ by itself, and the characterization of its asymptotic behavior when $n$ grows to infinity. The above expression is defined from an eigenvalue decomposition of $T_n$,

$$T_n = U_n\,\mathrm{diag}(\lambda_1(T_n), \ldots, \lambda_n(T_n))\,U_n^{-1}.$$

The continuous function $g$ and the matrix $T_n$ will depend on the particular application$^1$.

The classic book of Grenander and Szegö [1] is a masterpiece on the asymptotic behavior of continuous functions of Toeplitz matrices. Unfortunately, the level of mathematical sophistication required to understand this book is often beyond that provided by a typical engineering background. In [2] and [3], Gray simplified the Szegö theory by assuming more restrictive conditions. By doing so, he succeeded in conveying the main ideas of the Szegö theory to a wider audience. In the present paper we will also consider Gray's assumptions in [3], namely, Hermitian banded Toeplitz matrices.

It should be mentioned that the asymptotic results derived in [2] and [3] do not concern individual entries of continuous functions of Toeplitz matrices; rather, they describe an "average" behavior, the most famous result being the Szegö theorem regarding the asymptotic behavior of the arithmetic average of the entries belonging to the main diagonal of $g(T_n)$ when $n$ grows to infinity. This type of result has been widely used in information theory and signal processing (see e.g. [2]-[7]). However, there are engineering problems (see Section II) whose relevant parameters are given by a single entry of a certain matrix $g(T_n)$. In order to gain some insight into the asymptotic behavior of their solutions, we prove, with tools similar to the ones used by Gray, new theoretical results concerning the convergence of the entries of a continuous function of a Toeplitz matrix. The main result is given in Theorem 5, stating that given a sequence $\{T_n\}_{n=1}^{\infty}$ of Hermitian banded Toeplitz matrices, the limit of an arbitrary entry $g(T_n)_{i,j}$ can be expressed as the inner product in the $L_2[0, 2\pi]$ space$^2$ between a function that depends only on $g$ and another that depends only on the particular $(i, j)$-th position considered.

The paper is organized as follows. The next section states some applications where the theory presented in the following sections plays an important role. Section III reviews some preliminary definitions and results in matrix analysis. Section IV is the main part of the paper and proves new results on elementwise convergence of continuous functions of Toeplitz matrices. Finally, Section V applies these new results, through a numerical example, to the asymptotic characterization of the MMSE expression for a decision feedback equalizer and a linear predictor.

This work was partially supported by the Spanish Ministry of Education and Science, by the European Regional Development Fund and by the European Social Fund through the MIMESIS project (no. TEC2004-06451C05-04/TCM) and the Torres-Quevedo program. Both authors are with CEIT and Tecnun (University of Navarra), Manuel de Lardizábal 15, 20018, San Sebastián, Spain. Tel.: +34 943212800; fax: +34 943213076. Email addresses: [email protected] (P. M. Crespo), [email protected] (J. Gutiérrez-Gutiérrez)

1 Notice that when $g(x) = x^q$, with $q$ an integer, $g(T_n)$ reduces to $T_n^q$.

II. APPLICATIONS

The novel results presented in this paper are related to the convergence of the entries of a continuous function of a Toeplitz matrix $g(T_n)$. In the sequel, we provide some examples where certain parameters of interest can be expressed as a specific entry of $g(T_n)$, with $T_n$ and $g$ depending on the particular application. Consequently, our results have direct application to the asymptotic characterization of these parameters when the order $n$ grows without bound.

A. Linear estimation

The first example is a standard linear estimation problem where the observed stationary stochastic time series is given by $y_k = s_k + n_k$. The process $s_k$ is a moving average (MA) process obtained at the output of a FIR filter with impulse response $h = (h_0, h_1, \ldots, h_m)$ when its input, $x_k$, is a sequence of independent identically distributed (i.i.d.) random variables with zero mean and variance 1. The observation noise, $n_k$, is a white Gaussian noise, independent of the input process $x_k$, with zero mean and variance $\sigma^2_{\mathrm{noise}}$. Figure 1 shows$^3$ the $d$-step linear estimator of order $n - 1$, where the weights $\alpha_j$ are chosen to minimize $E[|y_{k-d+1} - \hat{y}_{k-d+1}|^2]$. It can be shown [8] that the MMSE expression is given by the reciprocal of the $(d, d)$-th entry of the matrix $g(T_n)$, that is,

$$\mathrm{MMSE}_n(d) = \frac{1}{g(T_n)_{d,d}} \qquad (1)$$

In the above expression $T_n$ is the $n \times n$ Hermitian banded Toeplitz matrix given by

$$T_n = H_n^* H_n \qquad (2)$$

where $*$ denotes conjugate transpose (i.e., $H_n^* = \overline{H}_n^{\top}$) and $H_n$ is the $(n + m) \times n$ Toeplitz convolution matrix with first and last columns given by $(h_0, h_1, \ldots, h_m, 0, \ldots, 0)^{\top}$ and $(0, \ldots, 0, h_0, h_1, \ldots, h_m)^{\top}$, respectively. On the other hand, the function $g$ is

$$g(x) = \frac{1}{\sigma^2_{\mathrm{noise}} + x} \qquad (3)$$

[Fig. 1. $d$-step linear estimator of order $n - 1$.]

2 The Hilbert space of square Lebesgue integrable functions on $[0, 2\pi]$.

3 $d = 1$ and $d = n$ correspond to the cases of a forward and backward linear predictor, respectively, where from the symmetry of the problem, both cases yield the same MMSE.

B. Decision feedback equalization

We consider the communication system shown in Figure 2, with discrete channel impulse response $h = (h_0, h_1, \ldots, h_m)$ and additive white Gaussian noise $n_k$ of zero mean and variance $\sigma^2_{\mathrm{noise}}$. The input symbols $x_k$ are discrete i.i.d. random variables independent from the noise samples, taking values $\pm 1$ with $E[x_k] = 0$ and $E[x_k^2] = 1$. The equalizer block comprises an $n$ tapped delay line feedforward filter (FFF), and an infinite tapped delay line feedback filter (FBF). A delay of $d$ units has been placed at the input of the FBF for didactic purposes. Observe that for $d = 1$ the equalizer is a standard decision feedback equalizer (DFE), whereas when $d = n/2 \to \infty$ the equalizer reduces to a linear equalizer (LE). The FFF and FBF tap weights are chosen to minimize $E[|x_{k-n+d} - \hat{x}_{k-n+d}|^2]$. The resulting MMSE when $n \to \infty$ is given by

$$\lim_{n\to\infty} \mathrm{MMSE}_n(d) = \lim_{n\to\infty} g(T_n)_{d,d} \qquad (4)$$

where again $T_n = H_n^* H_n$ but in this case

$$g(x) = \frac{\sigma^2_{\mathrm{noise}}}{\sigma^2_{\mathrm{noise}} + x} \qquad (5)$$

[Fig. 2. Communication system: discrete time channel and equalizer.]

C. Information-theoretic application

Consider the discrete-time Gaussian channel of the previous example. Let the input of the channel be $x = (x_1, x_2, \ldots, x_{n+m})$, a zero mean Gaussian random vector. Denote by $y = (y_1, y_2, \ldots, y_n)$ the corresponding output vector. It can be shown that the entries of the correlation matrix, $R_x^{(n)} = (E[x_i x_j])_{i,j=1}^{n}$, that maximize the mutual information between the input and output of the channel, $I(x; y)$, subject to an average input power constraint

$$\sum_{k=1}^{n} E[|x_k|^2] = \mathrm{tr}(R_x^{(n)}) \le n P_x$$

can be written as

$$R_x^{(n)} = g_\theta(T_n)$$

where tr denotes trace, $T_n$ is again defined by expression (2), and $g_\theta$ is given by

$$g_\theta(x) = \max\left(\theta - \frac{\sigma^2_{\mathrm{noise}}}{x},\, 0\right)$$

The parameter $\theta$ is such that

$$P_x = \frac{1}{n}\,\mathrm{tr}(g_\theta(T_n)) \qquad (6)$$

It is well known that as $n$ grows to infinity the value of the normalized maximized mutual information will converge to the capacity of this channel [9]. The question that interests us here, however, is the behavior of the entries of the corresponding correlation matrix $g_{\theta_0}(T_n)$ as $n \to \infty$, rather than the capacity itself. As in the previous examples, Theorem 5 in Section IV will answer this question. The constant $\theta_0$ is obtained by applying the Szegö theorem for Toeplitz matrices [1] to expression (6) when $n \to \infty$. That is,

$$P_x = \frac{1}{2\pi}\int_0^{2\pi} \max\left(\theta_0 - \frac{\sigma^2_{\mathrm{noise}}}{f(\omega)},\, 0\right) d\omega$$

where $f(\omega) = |H(\omega)|^2$, with $H(\omega)$ being the transfer function of the channel. Furthermore, according to Theorem 7, the core of this correlation matrix will be Toeplitz with values along its $r$-th diagonal$^4$

$$\frac{1}{2\pi}\int_0^{2\pi} g_{\theta_0}(f(\omega))\, e^{r\omega i}\, d\omega; \qquad r \in \mathbb{Z}$$

showing the known fact that for large $n$, the entries located along the middle part of an input sequence (vector) belong to a stationary random process with power spectral density $g_{\theta_0}(f(\omega))$ [10]. However, this is not the case for those entries located either at the beginning or at the end of the input sequence.

4 $r = 0$, $r > 0$ and $r < 0$ denote the main, the upper and the lower diagonals, respectively.

III. PRELIMINARY DEFINITIONS AND RESULTS

A. Functions of a matrix

We begin by introducing the concept of polynomial function of a matrix.

Definition 1: If $p(x) = \alpha_q x^q + \ldots + \alpha_1 x + \alpha_0$ is a given polynomial with coefficients in $\mathbb{C}$, then for every $n \times n$ matrix $A$ we define

$$p(A) := \alpha_q A^q + \ldots + \alpha_1 A + \alpha_0 I_n$$

where $I_n$ denotes the $n \times n$ identity matrix.

We now introduce the important concept of function of a matrix; [11] and [12] are complete references on this topic.

Definition 2: We consider an $n \times n$ diagonalizable matrix $A = U\,\mathrm{diag}(\mu_1, \ldots, \mu_n)\,U^{-1}$. If $g$ is a complex function on the set $\{\mu_1, \ldots, \mu_n\}$ of the eigenvalues of $A$, then we define the $n \times n$ matrix

$$g(A) := U\,\mathrm{diag}(g(\mu_1), \ldots, g(\mu_n))\,U^{-1}.$$

Observe that the last definition agrees with Definition 1. Furthermore, if $g$ is a complex function on the set $\{\mu_1, \ldots, \mu_n\}$ of the eigenvalues of an $n \times n$ diagonalizable matrix $A$, there exists a polynomial $p$ of degree at most $n - 1$ such that $g(A) = p(A)$. This is a direct consequence of the fact that the interpolation problem of determining a polynomial $p(x)$ of degree at most $n - 1$ satisfying $p(\mu_k) = g(\mu_k)$, $1 \le k \le n$, has a solution (see e.g. [13]). Therefore, Definition 2 is independent of the chosen eigenvalue decomposition of the matrix $A$.

From now on, if $A$ is an $n \times n$ Hermitian matrix then we will denote by $\lambda_k(A)$, $1 \le k \le n$, the eigenvalues of $A$ counted with their multiplicities and numbered in nonincreasing order.
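As a small illustration of Definitions 1 and 2, the following sketch evaluates a function of a Hermitian matrix through an eigenvalue decomposition and compares it with a direct evaluation. It is only a sketch under stated assumptions: the helper names (`matrix_function`, `matrix_polynomial`, `convolution_matrix`), the impulse response `h`, the noise variance `sigma2` and the order `n` are hypothetical choices made for this example and are not taken from the paper. The matrix is built as in expression (2), and the function is the one in expression (3).

```python
import numpy as np

def matrix_function(A, g):
    """g(A) = U diag(g(mu_1), ..., g(mu_n)) U^{-1} for a Hermitian A (Definition 2)."""
    mu, U = np.linalg.eigh(A)            # A = U diag(mu) U^*, with U unitary
    return (U * g(mu)) @ U.conj().T      # column k of U is scaled by g(mu_k)

def matrix_polynomial(A, coeffs):
    """p(A) = coeffs[0] I + coeffs[1] A + ... (Definition 1), via Horner's rule."""
    n = A.shape[0]
    P = np.zeros((n, n), dtype=complex)
    for c in reversed(coeffs):
        P = P @ A + c * np.eye(n)
    return P

def convolution_matrix(h, n):
    """(n + m) x n Toeplitz convolution matrix; column c holds h shifted down by c rows."""
    m = len(h) - 1
    H = np.zeros((n + m, n), dtype=complex)
    for c in range(n):
        H[c:c + m + 1, c] = h
    return H

# Hypothetical data, chosen only for illustration.
h = np.array([1.0, 0.5, -0.25])
sigma2 = 0.5
n = 6

H = convolution_matrix(h, n)
T = H.conj().T @ H                       # T_n = H_n^* H_n, expression (2)

# Definition 2 agrees with Definition 1 for a polynomial p(x) = 1 - 2x + 0.5 x^2.
coeffs = [1.0, -2.0, 0.5]
p_via_eig = matrix_function(T, lambda x: np.polyval(coeffs[::-1], x))
p_direct = matrix_polynomial(T, coeffs)

# For g(x) = 1 / (sigma2 + x), g(T) is the inverse of sigma2 * I + T.
g_T = matrix_function(T, lambda x: 1.0 / (sigma2 + x))
g_T_direct = np.linalg.inv(sigma2 * np.eye(n) + T)

# Expression (1): MMSE_n(d) is the reciprocal of the (d, d)-th entry of g(T_n).
mmse = 1.0 / g_T.diagonal().real
print(mmse)
```

The eigenvalue route is used here because it works for any function defined on the spectrum, whereas the direct inverse is available only for this particular $g$.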

B. Asymptotically equivalent sequences of matrices

The following well known matrix norms (see e.g. [13]) will be used:

Definition 3: The Frobenius norm $\|A\|_F$ and the spectral norm $\|A\|_2$ of an $n \times n$ matrix $A = (a_{i,j})_{i,j=1}^{n}$ are defined, respectively, as

$$\|A\|_F := \sqrt{\mathrm{tr}(A^* A)} = \left(\sum_{i=1}^{n}\sum_{j=1}^{n} |a_{i,j}|^2\right)^{1/2}$$

and

$$\|A\|_2 := \max_{x \ne 0}\left(\frac{x^* A^* A x}{x^* x}\right)^{1/2} = \left(\max_{1 \le k \le n} \lambda_k(A^* A)\right)^{1/2}$$

where $x$ is a column vector.

The next definition is due to Gray (see [2] or [3]).

Definition 4: Let $A_n$ and $B_n$ be $n \times n$ matrices for all $n \ge 1$. We say that the sequences $\{A_n\}$ and $\{B_n\}$ are asymptotically equivalent, and write $A_n \sim B_n$, if

$$\exists M \ge 0;\quad \|A_n\|_2, \|B_n\|_2 \le M \quad \forall n \ge 1 \qquad (7)$$

and

$$\lim_{n\to\infty} \frac{\|A_n - B_n\|_F}{\sqrt{n}} = 0$$

Notice that from (7), if $\{A_n\}$ and $\{B_n\}$ are asymptotically equivalent sequences of Hermitian matrices then there exists a closed interval containing the spectra of these sequences, for instance, $[-M, M]$.

The next theorem, concerning asymptotically equivalent sequences of Hermitian matrices, is due to Trench (see [2] or [14]).

Theorem 1: Suppose $\{A_n\}$ and $\{B_n\}$ are two asymptotically equivalent sequences of Hermitian matrices. Let $[a, b]$ be a closed interval containing the spectra of these sequences. Then

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} |g(\lambda_k(A_n)) - g(\lambda_k(B_n))| = 0 \qquad \forall g \in C[a, b]$$

where $C[a, b]$ denotes the set of all continuous complex functions on $[a, b]$.

Corollary 1: Under the hypotheses of Theorem 1, if $q \in [1, \infty)$ then

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} |g(\lambda_k(A_n)) - g(\lambda_k(B_n))|^q = 0 \qquad \forall g \in C[a, b]$$

Proof: To shorten notation we write $u_{k,n}$ instead of $|g(\lambda_k(A_n)) - g(\lambda_k(B_n))|$. Observe that

$$0 \le u_{k,n} \le K, \qquad 1 \le k \le n, \quad n \ge 1$$

with $K = 2\max_{a \le x \le b} |g(x)|$. Let $x_n$ be the cardinality of the set $\{k \in \{1, \ldots, n\};\ u_{k,n} > 1\}$. If $q \ge 1$ (not necessarily an integer), then

$$0 \le \sum_{k=1}^{n} u_{k,n}^q \le \sum_{k=1}^{n} u_{k,n} + x_n K^q \le (1 + K^q)\sum_{k=1}^{n} u_{k,n}$$

Consequently, from Theorem 1 we conclude that

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} u_{k,n}^q = 0$$

C. Circulant matrices

We now introduce a special type of Toeplitz matrices.

Definition 5: An $n \times n$ circulant matrix is one having the form

$$C = \begin{pmatrix} c_0 & c_1 & c_2 & \cdots & c_{n-1} \\ c_{n-1} & c_0 & c_1 & \ddots & \vdots \\ c_{n-2} & c_{n-1} & c_0 & \ddots & c_2 \\ \vdots & \ddots & \ddots & \ddots & c_1 \\ c_1 & \cdots & c_{n-2} & c_{n-1} & c_0 \end{pmatrix}$$

Equivalently, an $n \times n$ matrix $C = (c_{i,j})_{i,j=1}^{n}$ is circulant if and only if

$$i_1 - j_1 \equiv i_2 - j_2 \pmod{n} \implies c_{i_1,j_1} = c_{i_2,j_2}$$

If $C = (c_{i,j})_{i,j=1}^{n}$ is an $n \times n$ circulant matrix, it is customary to write $C = \mathrm{circ}(c_{1,1}, \ldots, c_{n,1})$.

D. Banded Toeplitz matrices

Suppose that $f$ is a trigonometric polynomial of degree $m$, i.e.,

$$f(\omega) = \sum_{k=-m}^{m} t_k e^{k\omega i}; \qquad t_k \in \mathbb{C}, \quad -m \le k \le m$$

where $i$ denotes the imaginary unit. Therefore, if we consider $t_k = 0$, $|k| > m$, then

$$t_k = \frac{1}{2\pi}\int_0^{2\pi} f(\omega) e^{-k\omega i}\, d\omega \qquad \forall k \in \mathbb{Z}$$

that is, $f$ is the Fourier spectrum of the sequence $\{t_k\}_{k=-\infty}^{\infty}$. The banded $n \times n$ Toeplitz matrix $(t_{i-j})_{i,j=1}^{n}$ generated by $f$ will be denoted by $T_n(f)$, that is,

$$T_n(f) = \begin{pmatrix} t_0 & t_{-1} & \cdots & t_{-m} & 0 & \cdots & 0 \\ t_1 & t_0 & t_{-1} & & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ t_m & & \ddots & \ddots & \ddots & & t_{-m} \\ 0 & \ddots & & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & & t_1 & t_0 & t_{-1} \\ 0 & \cdots & 0 & t_m & \cdots & t_1 & t_0 \end{pmatrix}$$

We can also associate a sequence of circulant matrices $\{C_n(f)\}_{n=2m+1}^{\infty}$ to the function $f$ in the following manner:

$$C_n(f) := \mathrm{circ}(t_0, t_1, \ldots, t_m, 0, \ldots, 0, t_{-m}, \ldots, t_{-1})$$

From [3], another equivalent definition of the circulant matrix $C_n(f)$ is

$$C_n(f) = V_n\,\mathrm{diag}\left(f(0), f(2\pi/n), \ldots, f(2\pi(n-1)/n)\right) V_n^* \qquad (8)$$

where $V_n = (v_{i,j}^{(n)})_{i,j=1}^{n}$ is the $n \times n$ Fourier unitary matrix given by

$$v_{i,j}^{(n)} = \frac{1}{\sqrt{n}}\, e^{-\frac{2\pi(i-1)(j-1)}{n} i}$$
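The constructions of this subsection can be checked numerically. The sketch below builds $T_n(f)$ and $C_n(f)$ from the coefficients $t_k$, checks the diagonalization in expression (8), and evaluates the ratio $\|T_n(f) - C_n(f)\|_F / \sqrt{n}$ from Definition 4 for a few orders. The function names and the particular coefficients `t` are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def symbol(t, omega):
    """f(omega) = sum_k t_k e^{k omega i}; t maps k in {-m, ..., m} to t_k."""
    return sum(tk * np.exp(1j * k * omega) for k, tk in t.items())

def toeplitz_from_symbol(t, n):
    """T_n(f) = (t_{i-j})_{i,j=1}^n."""
    T = np.zeros((n, n), dtype=complex)
    for k, tk in t.items():
        T += tk * np.eye(n, k=-k)        # entries with i - j = k sit on diagonal offset -k
    return T

def circulant_from_symbol(t, n):
    """C_n(f) = circ(t_0, ..., t_m, 0, ..., 0, t_{-m}, ..., t_{-1}); needs n >= 2m + 1."""
    first_col = np.zeros(n, dtype=complex)
    for k, tk in t.items():
        first_col[k % n] = tk
    i, j = np.indices((n, n))
    return first_col[(i - j) % n]        # entry (i, j) depends only on (i - j) mod n

def fourier_matrix(n):
    """Unitary Fourier matrix with entries e^{-2 pi (i-1)(j-1) i / n} / sqrt(n)."""
    i, j = np.indices((n, n))            # zero-based indices play the role of i-1, j-1
    return np.exp(-2j * np.pi * i * j / n) / np.sqrt(n)

def equivalence_ratio(t, n):
    """||T_n(f) - C_n(f)||_F / sqrt(n), the quantity that must vanish in Definition 4."""
    diff = toeplitz_from_symbol(t, n) - circulant_from_symbol(t, n)
    return np.linalg.norm(diff, "fro") / np.sqrt(n)

# Hypothetical coefficients of a real trigonometric polynomial of degree m = 1.
t = {-1: 0.5 - 0.2j, 0: 2.0, 1: 0.5 + 0.2j}

n = 8
V = fourier_matrix(n)
f_samples = symbol(t, 2 * np.pi * np.arange(n) / n)
C_from_eq8 = V @ np.diag(f_samples) @ V.conj().T
print(np.allclose(C_from_eq8, circulant_from_symbol(t, n)))

for order in (10, 50, 400):
    print(order, equivalence_ratio(t, order))
```

For a banded symbol the two matrices differ only in the corner entries, so the Frobenius norm of the difference stays fixed while $\sqrt{n}$ grows.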

We are now in a position to pose as a theorem an interesting example of asymptotically equivalent sequences of matrices [3].

Theorem 2: If $f$ is a trigonometric polynomial then

$$T_n(f) \sim C_n(f)$$

A basic formula by Widom (see e.g. [15, p. 40]) yields the next result, which characterizes the structure of a polynomial function of a banded Toeplitz matrix.

Theorem 3: If $f$ is a trigonometric polynomial of degree $m$ and $p(x)$ is a polynomial of degree $q > 0$, then

$$p(T_n(f)) = T_n(p \circ f) + E_n \qquad \forall n \ge 2m(q - 1)$$

where $E_n$ is an $n \times n$ matrix having the form

$$E_n = \begin{pmatrix} E^{(1)} & 0_{m(q-1) \times (n-2m(q-1))} & 0_{m(q-1) \times m(q-1)} \\ 0_{(n-2m(q-1)) \times m(q-1)} & 0_{(n-2m(q-1)) \times (n-2m(q-1))} & 0_{(n-2m(q-1)) \times m(q-1)} \\ 0_{m(q-1) \times m(q-1)} & 0_{m(q-1) \times (n-2m(q-1))} & E^{(2)} \end{pmatrix}$$

where the $m(q - 1) \times m(q - 1)$ matrices $E^{(1)}$ and $E^{(2)}$ are independent of $n \ge 2m(q - 1)$, and $0_{c \times d}$ denotes the $c \times d$ zero matrix.

Thus, for large $n$, a polynomial function of a banded Toeplitz matrix is a banded matrix with larger bandwidth, and except for the upper left and lower right corners, is a Toeplitz matrix. Furthermore, the size and the entries of these corners do not depend on the order.

E. Persymmetric matrices

Toeplitz matrices belong to a larger class of matrices called persymmetric matrices. We finish this section with the definition of this kind of matrices [16].

Definition 6: We say that an $n \times n$ matrix $A = (a_{i,j})_{i,j=1}^{n}$ is persymmetric if it is symmetric about its northeast-southwest diagonal, i.e., if $a_{i,j} = a_{n-j+1,n-i+1}$ $\forall i, j = 1, \ldots, n$. This is equivalent to requiring $A = J_n A^{\top} J_n$, where $J_n = (\delta_{i+j,n+1})_{i,j=1}^{n}$ and $\delta_{i,j}$ is the Kronecker delta.

Obviously, sums, scalar multiplications and powers of persymmetric matrices are also persymmetric matrices. Thus, polynomial functions of persymmetric matrices, and therefore, functions of persymmetric diagonalizable matrices, are also persymmetric.

IV. MAIN RESULTS

In this section we prove Theorem 5, the main result of this paper. It renders new results on the elementwise convergence of continuous functions of Hermitian banded Toeplitz matrices.

In what follows, we will consider an arbitrary real trigonometric polynomial $f$ of degree $m$. That is, $f$ is the Fourier spectrum of some sequence $\{t_k\}_{k=-\infty}^{\infty}$ such that $t_k = \overline{t_{-k}}$ $\forall k$ and $t_k = 0$, $|k| > m$. Consequently, $T_n(f)$ and $C_n(f)$ will be Hermitian matrices. In [2] it was shown that the interval $[\min f, \max f] := [\min_{\omega \in \mathbb{R}} f(\omega), \max_{\omega \in \mathbb{R}} f(\omega)]$ contains the spectra of the sequence $\{T_n(f)\}$. Observe that from expression (8), the eigenvalues of $C_n(f)$ also lie inside this interval, since they are given by $f(2\pi(k - 1)/n)$, $1 \le k \le n$.

Given any $g \in C[\min f, \max f]$, the next theorem guarantees the convergence of the individual entries of the matrix $g(T_n(f))$.

Theorem 4: Let $f$ be a real trigonometric polynomial. For all $i, j \in \mathbb{N}$ and $g \in C[\min f, \max f]$, the sequence $\{g(T_n(f))_{i,j}\}$ converges when $n \to \infty$.

Proof: Let $U_n\,\mathrm{diag}(\lambda_1(T_n(f)), \ldots, \lambda_n(T_n(f)))\,U_n^*$ be an eigenvalue decomposition of $T_n(f)$. By the Stone-Weierstrass theorem, given $\epsilon > 0$ there exists a polynomial $p$ such that $|g(x) - p(x)| < \epsilon$, $\forall x \in [\min f, \max f]$. Therefore, using the Cauchy-Schwarz inequality and the fact that $U_n = (u_{r,s}^{(n)})_{r,s=1}^{n}$ is a unitary matrix, yields

$$|g(T_n(f))_{i,j} - p(T_n(f))_{i,j}| = \left|\sum_{k=1}^{n} (g - p)(\lambda_k(T_n(f)))\, u_{i,k}^{(n)}\, \overline{u_{j,k}^{(n)}}\right| \le \epsilon \sum_{k=1}^{n} |u_{i,k}^{(n)}|\,|u_{j,k}^{(n)}| \le \epsilon$$

for every $n \ge \max(i, j)$. Consequently, using the triangle inequality and applying Theorem 3, we have

$$\begin{aligned} |g(T_{n_1}(f))_{i,j} - g(T_{n_2}(f))_{i,j}| &\le |g(T_{n_1}(f))_{i,j} - p(T_{n_1}(f))_{i,j}| \\ &\quad + |p(T_{n_1}(f))_{i,j} - p(T_{n_2}(f))_{i,j}| \\ &\quad + |p(T_{n_2}(f))_{i,j} - g(T_{n_2}(f))_{i,j}| \\ &\le \epsilon + 0 + \epsilon = 2\epsilon \end{aligned}$$

$\forall n_1, n_2 \ge \max(2m(q - 1), i + m(q - 1), j + m(q - 1))$, with $m$ and $q$ being the degrees of $f$ and $p$, respectively. Thus $\{g(T_n(f))_{i,j}\}$ is a Cauchy sequence and therefore it is convergent.

Notice that by Theorem 3, if $g$ is a polynomial then the sequence considered in the previous theorem will reach its limit after a finite number of terms.

In the remainder of this section we assume that for all $n$ there exists an eigenvalue decomposition of $T_n(f)$, $T_n(f) = U_n\,\mathrm{diag}(\lambda_1(T_n(f)), \ldots, \lambda_n(T_n(f)))\,U_n^*$, such that the eigenvector matrices $U_n = (u_{i,j}^{(n)})_{i,j=1}^{n}$ satisfy

$$\sigma(n)\,|u_{i,j}^{(\sigma(n))}|^2 \le K, \qquad \forall n \quad \forall i, j = 1, \ldots, \sigma(n) \qquad (9)$$

for some $K \in \mathbb{R}$ and for some strictly increasing function $\sigma : \mathbb{N} \to \mathbb{N}$. For instance, in the trivial case where the $T_n(f)$ are diagonal, by taking $U_n = V_n$, with $V_n$ being the $n \times n$ Fourier matrix, the above bound holds for $K = 1$ and $\sigma(n) = n$. Notice, however, that not all the eigenvalue decompositions of these Hermitian diagonal Toeplitz matrices satisfy the above condition. For example, by choosing $U_n = I_n$, (9) does not hold.

A non-trivial example where condition (9) is satisfied occurs for Hermitian tridiagonal Toeplitz matrices$^5$, i.e., $f(\omega) = t_{-1} e^{-\omega i} + t_0 + t_1 e^{\omega i}$ with $t_0 \in \mathbb{R}$ and $t_1 = \overline{t_{-1}}$. In this case, it is known [17] that $W_n\,\mathrm{diag}(\alpha_{1,n}, \ldots, \alpha_{n,n})\,W_n^*$ is an eigenvalue decomposition of $T_n(f)$, where the eigenvector matrix $W_n = (w_{i,j}^{(n)})_{i,j=1}^{n}$ is defined by

$$w_{i,j}^{(n)} = (-1)^{n+1}\, \frac{\sqrt{2}}{\sqrt{n+1}} \left(\frac{t_{-1}}{|t_{-1}|}\right)^{n-i} \sin\frac{ij\pi}{n+1} \qquad \forall i, j = 1, \ldots, n$$

and its eigenvalues are given by

$$\alpha_{j,n} = t_0 + 2|t_{-1}| \cos\frac{j\pi}{n+1} \qquad \forall j = 1, \ldots, n$$

Therefore, (9) holds with $K = 2$ and $\sigma(n) = n$.

It should be mentioned that we were not able to show (9) for more general sequences $\{T_n(f)\}$. However, based on computer simulations we believe that condition (9) holds for all sequences $\{T_n(f)\}$ of Hermitian banded Toeplitz matrices. Since it is only a conjecture, the cited condition will need to be stated in each formal result in which it is required.

We will denote by $L_2[0, 2\pi]$ the Hilbert space of all complex valued functions defined on the closed interval $[0, 2\pi]$ which are measurable and square integrable with respect to the Lebesgue measure, and with inner product defined by

$$\langle h_1, h_2 \rangle := \frac{1}{2\pi}\int_0^{2\pi} h_1(\omega)\,\overline{h_2(\omega)}\, d\omega \qquad \forall h_1, h_2 \in L_2[0, 2\pi]$$

5 Similarly, for Hermitian banded Toeplitz matrices with at most three nonzero diagonals, i.e., $f(\omega) = t_{-m} e^{-m\omega i} + t_0 + t_m e^{m\omega i}$ with $t_0 \in \mathbb{R}$, $m \ge 1$ and $t_m = \overline{t_{-m}}$, we can consider $K = 2m$ and $\sigma(n) = mn$.
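The tridiagonal example can be verified numerically. The following sketch builds $W_n$ and $\alpha_{j,n}$ from the closed-form expressions above, checks that $W_n\,\mathrm{diag}(\alpha_{1,n}, \ldots, \alpha_{n,n})\,W_n^*$ reproduces $T_n(f)$, and evaluates $n \max_{i,j} |w_{i,j}^{(n)}|^2$, the left side of condition (9) with $\sigma(n) = n$. The function names and the values of `t0` and `t_minus1` are hypothetical choices for this illustration.

```python
import numpy as np

def tridiagonal_eigensystem(t0, t_minus1, n):
    """W_n and alpha_{j,n} for f(w) = t_{-1} e^{-wi} + t0 + t_1 e^{wi}, t_1 = conj(t_{-1})."""
    i = np.arange(1, n + 1)[:, None]     # row index i = 1, ..., n
    j = np.arange(1, n + 1)[None, :]     # column index j = 1, ..., n
    phase = (t_minus1 / abs(t_minus1)) ** (n - i)
    W = ((-1) ** (n + 1) * np.sqrt(2.0 / (n + 1)) * phase
         * np.sin(i * j * np.pi / (n + 1)))
    alpha = t0 + 2 * abs(t_minus1) * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
    return W, alpha

def tridiagonal_toeplitz(t0, t_minus1, n):
    """T_n(f) = (t_{i-j}): t_{-1} on the superdiagonal, conj(t_{-1}) on the subdiagonal."""
    return (t0 * np.eye(n) + t_minus1 * np.eye(n, k=1)
            + np.conj(t_minus1) * np.eye(n, k=-1))

# Hypothetical coefficients (t0 real, t_{-1} complex).
t0, t_minus1 = 1.5, 0.3 - 0.4j

for n in (5, 50, 200):
    W, alpha = tridiagonal_eigensystem(t0, t_minus1, n)
    T = tridiagonal_toeplitz(t0, t_minus1, n)
    residual = np.abs(W @ np.diag(alpha) @ W.conj().T - T).max()
    bound = n * np.abs(W).max() ** 2     # left side of (9) with sigma(n) = n
    print(n, residual, bound)
```

Since every entry of $W_n$ has modulus at most $\sqrt{2/(n+1)}$, the printed bound cannot exceed $2n/(n+1)$, in agreement with $K = 2$.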

5

The norm of this space will be denoted by k · k, that is, khk := hh, hi1/2 , ∀h ∈ L2 [0, 2π]. We are now ready to introduce the next theorem. It states that the limit of the sequence considered in the previous theorem can be expressed as an inner product of two functions in L2 [0, 2π]: g ◦ f and another function that does not depend on the chosen function g ∈ C[min f, max f ]. Theorem 5: Let f be a real trigonometric polynomial, satisfying condition (9). Given i, j ∈ N, there exists a unique function of minimum norm Si,j (f ) ∈ L2 [0, 2π], such that lim g(Tn (f ))i,j =

n→∞

1 2π

Z

g(f (ω))S i,j (f )(ω)dω

K0 X

K0 →∞

k=1

µk φk ◦ f

K0 X

Si,j (f ) = lim

K0 →∞

with 1 2π

k=1

Z

µk φk ◦ f

lim MMSEn (d) = lim g(Tn (f ))d,d

n→∞

lim lim g(Tn (f ))d,d = lim

d→∞ n→∞

n→∞

1 tr [g(Tn (f ))] n

The next theorem, formally establishes this relation in a more general context. Theorem 7: Let f be a real trigonometric polynomial satisfying condition (9). For fixed r ∈ Z it is verified that 1 2π

Z

2π

g(f (ω))erωi dω

(12)

0

for all g ∈ C[min f, max f ]. Proof: By Theorem 5, the above statement is equivalent to writing lim hg ◦ f, Si,i+r (f )i = hg ◦ f, e−rωi i

i→∞

∀g ∈ C[min f, max f ]

By the Stone-Weierstrass theorem, given an arbitrary g ∈ C[min f, max f ] and an ǫ > 0 there exists a polynomial p of degree q such that kg ◦ f − p ◦ f k < ǫ. Then by the triangle inequality we have

≤

|hg ◦ f, Si,i+r (f )i − hg ◦ f, e−rωi i|

|hg ◦ f, Si,i+r (f )i − hp ◦ f, Si,i+r (f )i| +|hp ◦ f, Si,i+r (f )i − hp ◦ f, e−rωi i|

(13)

+|hp ◦ f, e−rωi i − hg ◦ f, e−rωi i|

φk (f (ω))Si,j (f )(ω)dω 0

The theorem now follows by using expression (10). In next section, we will use Theorem 6 to obtain an approximation of the Si,j (f ) associated to the MMSE of the chosen examples. Before finishing this section we would like to relate the present results with the Szegö theorem for Toeplitz matrices [1], a theorem widely used in engineering applications. Using our notation this theorem for Hermitian banded Toeplitz matrices7 states that Z

n→∞

when d → ∞. In other words, for the particular g in expression (5) the following equality must hold

2π

1 1 tr [g(Tn (f ))] = n 2π

1 tr [g(Tn (f ))] n

where the sequence Tn (f ) is given by expression (2). On the other hand, according to expression (4) the above MMSELE can also be obtained as

i→∞ n→∞

µk = lim φk (Tn (f ))i,j n→∞ Proof: Let H the subspace defined in the proof of Theorem 5. By the Stone-Weierstrass theorem and the fact that uniform convergence implies convergence in L2 [0, 2π], the functions of the form p ◦ f , where p is a polynomial, are dense in H. Therefore, H is the closure of the subspace generated by the functions Lk ◦ f , with k ≥ 1, and by definition {φk ◦ f }∞ k=1 is an orthonormal base of the Hilbert space H. Since Si,j (f ) ∈ H

We first prove that the second term on the right side of the above inequality, equals zero whenever i ≥ i0 with i0 = max(m(q − 1) + 1, m(q − 1) + 1 − r) and m the degree of f . From Theorem 5 we have lim p(Tn (f ))i,i+r = hp ◦ f, Si,i+r (f )i

n→∞

On the other hand, by Theorem 3 if min(i, i + r) ≥ m(q − 1) + 1 then

2π

g(f (ω))dω

(11)

0

for all g ∈ C[min f, max f ]. 6 Notice

n→∞

lim lim g(Tn (f ))i,i+r =

with

n→∞

MMSELE = lim

(10)

0

Si,j (f ) = lim

lim

where the function g is given by expression (5) and f (ω) = |H(ω)|2 , with H(ω) the transfer function of the channel. Consequently, using expression (11),

2π

for all g ∈ C[min f, max f ]. Theorem 5 is an existence and uniqueness theorem. The proof of this theorem can be found in the appendix. We will call the infinite matrix S = (Si,j (f ))∞ i,j=1 the asymptotic invariant functional (AIF) matrix associated to the sequence of Toeplitz matrices {Tn (f )}. Observe that since the Tn (f ) are Hermitian matrices, S is also Hermitian. The following theorem gives a method to obtain an approximation of the entries of the AIF matrix. Theorem 6: Let f be a real trigonometric polynomial, satisfying condition (9), and {φk ◦f }∞ k=1 the orthonormal sequence of functions in L2 [0, 2π] obtained by applying the Gram-Schmidt procedure to 2(x−a) 6 the sequence {Lk ◦ f }∞ k=1 , where Lk (x) = Pk ( b−a − 1) and Pk (x) the Legendre polynomial of degree k − 1. Then, the function Si,j (f ), i, j ∈ N in Theorem 5 can be obtained as the following limit in L2 [0, 2π]

µk = hSi,j (f ), φk ◦ f i =

The motivation to look for these relations arises from the fact that for the channel described in Section II-B, the MMSE of a linear equalizer when the number of taps grows without limit can be written as [18] Z 2π 1 MMSELE = g(f (ω))dω 2π 0

p(Tn (f ))i,i+r

= =

{xk−1 }∞ k=1

that the sequence of powers can be used instead of {Lk (x)}∞ k=1 . 7 In the classical Szeg¨ o theorem f is a real bounded measurable function, not necessarily a real trigonometric polynomial.

Tn (p ◦ f )i,i+r Z 2π 1 p(f (ω))erωidω = hp ◦ f, e−rωi i 2π 0

for every n ≥ max(2m(q − 1), i + m(q − 1), i + r + m(q − 1)). Consequently, |hp ◦ f, Si,i+r (f )i − hp ◦ f, e−rωi i| = 0 with i ≥ i0 . For every i ≥ i0 , the remaining two terms on the right side of the

6

inequality (13) can be bounded by using the Schwarz inequality as −rωi

|hg ◦ f, Si,i+r (f )i − hg ◦ f, e

i|

≤

|hg ◦ f, Si,i+r (f )i − hp ◦ f, Si,i+r (f )i|

≤

(kSi,i+r (f )k + ke−rωi k) kg ◦ f − p ◦ f k

Sbi,j (f ) =

+|hp ◦ f, e−rωi i − hg ◦ f, e−rωi i|

≤

µk = lim φk (Tn (f ))i,j n→∞

lim lim g(Tn (f ))i,i = lim

i→∞ n→∞

n→∞

1 tr [g(Tn (f ))] n

for all g ∈ C[min f, max f ]. Finally, from the fact that the g(Tn (f )) are persymmetric matrices, all the results in this section also hold when their (i, j)-th entries are replaced by their (n − j + 1, n − i + 1)-th entries. Summarizing, for large n, a continuous function of a Hermitian banded Toeplitz matrix g(Tn (f )) is almost Toeplitz with diagonal entries given by (12) except for the upper left and lower right corners where their entries can be expressed as (10). This fact agrees with the results given recently in [19]. V. N UMERICAL E XAMPLE We now give a numerical example where the previous theory can be applied. We evaluate the asymptotic MMSE behavior of the linear estimator and equalizer considered in Section II, when the number of taps grows without bound. The impulse responses used for the MA filter (linear estimation application) or for the ISI channel (DFE equalization application) are the same and it is given by h = (h0 , . . . , h5 ) = (1, −0.23, 0.61, P −0.48, 0.9, 0.8)/K1 . The normalization constant K1 is such that 5k=0 h2k = 1. Consequently, the sequences of Toeplitz matrices Tn = Hn⊤ Hn arising in both cases are also the same, and they can P be written as Tn = Tn (f ), with f (ω) = |H(ω)|2 where H(ω) = 5k=0 hk ekωi is the transfer function of the MA filter or the channel, respectively. 4

4

4

3

3

3

2

2

2

1

0 0

0 0

4

6

Sb1,1 (f ) Fig. 3.

1 2π

Z

2π

g(f (ω))Sbi,i (f )(ω)dω

0

2 instead of expression (14), as the value of σnoise varies from 1 to −3 10 . The actual asymptotic MMSE values in expression (14) have been computed by their corresponding expressions (1) and (4) for n sufficiently large (n ≥ 300). Observe that from Theorem 7, an approximation for i = 150 may also be found by directly computing the expression Z 2π 1 g(f (ω))dω (15) 2π 0

which corresponds to the MMSE of the double-sided infinite-length linear estimator or the MMSE of the infinite linear equalizer [18]. A system theory interpretation of function S1,1 (f ) for the DFE equalization application is as follows. Let us consider g(f (ω)) =

2

4

6

0 0

2

4

6

Sb150,150 (f )

Approximation of Si,i (f ) for i = 1, i = 100 and i = 150.

From expressions (1) and (4) and Theorem 5, we have that Z

Observe that by Theorem 3 and the fact that φk is a real polynomial of degree k − 1, the sequence in the above equality will reach its limit µk after a finite number of terms. Furthermore, the µk are real, since the φk are real polynomials and the Tn are real matrices. Consequently, the functions Sbi,j (f ) and Si,j (f ) are also real. Figure 3 shows such approximation, Sbi,j (f ), for entries (i, i), i = 1, 100 and 150, of the main diagonal of the AIF matrix when K0 is set to 50. The function S150,150 (f ) is practically equal to the constant function 1, which by Theorem 7 means that entry (150, 150) has already reached the unique value taken by the entries (i, i) of the main diagonal of the AIF matrix for large i. Notice that entry (1, 1) is either related to the MMSE of an ideal linear predictor or the MMSE of an ideal DFE (i.e., assuming an infinite number of taps in one direction). On the other hand, entry (150, 150) is either related to the MMSE of a double-sided infinite-length linear estimator or the MMSE of an ideal linear equalizer (i.e., assuming an infinite number of taps in both directions). For i = 1, and 150, and the corresponding Sbi,i (f ) displayed in Figure 3, Table I shows the negligible relative error committed by using the approximation

1

Sb100,100 (f )

1 2π

µk φk ◦ f

where

(K + 1)ǫ

1

K0 X k=1

where we have used the fact that kSi,i+r (f )k ≤ K and kg ◦ f − p ◦ f k < ǫ. This finishes the proof. Notice that if we set r = 0 in Theorem 7, applying the Szegö theorem yields

2

matrix associated to the sequence of Toeplitz matrices {Tn (f )} can be approximated in L2 [0, 2π] as

2π

g(f (ω))S i,i (f )(ω)dω

(14)

0

with g defined in expression (3) or (5), gives the asymptotic value of the reciprocal of the MMSE of the i-step linear estimator in Figure 1, or the MMSE of an i − 1 delayed feedback filter of the DFE in Figure 2, respectively. Notice that the value of integral (14) will 2 depend on the variance of the noise σnoise , since g is a function of this parameter. A numerical approximation for Si,i (f ) can now be obtained from Theorem 6. This theorem states that the (i, j)-th entry of the AIF

2 σnoise + |H(ω)|2

2 σnoise

as a power spectrum density (PSD) of a stationary random process e(t), associated to the channel H(ω) with additive noise variance 2 σnoise . Then by expression (15), the MMSE of a LE can be thought to be the power at the output of a unit gain filter E(w), i.e., |E(ω)| = 1, when its input is e(t). Similarly, the MMSE of the DFE would again be the power at the output of the filter E(ω), but in this case with a transfer function gain |E(ω)|2 = S1,1 (f )(ω), which depends solely on the channel transfer function f (w) = |H(w)|2 (i.e., does 2 not depend on σnoise ). From this point of view, it is clear that the MMSE performance advantage of a DFE with respect to a LE is obtained by properly reducing the PSD of the input process e(t) by means of the filter |E(ω)|2 = S1,1 (f )(ω). Finally, as a by-product of this study and based on the fact that the asymptotic MMSE of a DFE can be alternatively computed by the well known expression [20]

$$\mathrm{MMSE}_{\mathrm{DFE}} = \exp\left(\frac{1}{2\pi}\int_0^{2\pi} \ln g(f(\omega))\,d\omega\right)$$

the following equality holds for any $\alpha \in [0,\infty)$:

$$\exp\left(\frac{1}{2\pi}\int_0^{2\pi} \ln g(f(\omega))\,d\omega\right) = \frac{1}{2\pi}\int_0^{2\pi} g(f(\omega))\,S_{1,1}(f)(\omega)\,d\omega$$

with $g(x) = \frac{\alpha}{\alpha+x}$.

TABLE I
RELATIVE ERRORS

Linear estimator: $g(x) = \frac{1}{\sigma^2_{noise}+x}$. Equalizer: $g(x) = \frac{\sigma^2_{noise}}{\sigma^2_{noise}+x}$.

                    Linear estimator                              Equalizer
sigma^2_noise |dB   Linear predictor   Double-sided estimator     DFE              LE
                    i = 1              i = 150                    i = 1            i = 150
   0                2.71 × 10^-15      2.50 × 10^-16              2.09 × 10^-15    1.97 × 10^-15
 -10                4.51 × 10^-14      1.00 × 10^-16              4.31 × 10^-14    8.35 × 10^-16
 -20                4.65 × 10^-10      4.75 × 10^-12              4.65 × 10^-10    8.04 × 10^-11
 -30                1.62 × 10^-9       2.94 × 10^-11              1.62 × 10^-9     4.65 × 10^-9

VI. CONCLUSION

In this paper we have studied the convergence of an arbitrary $(i,j)$-th entry of a continuous function of a Hermitian banded Toeplitz matrix, $g(T_n)$, when the order $n$ grows without bound. The main result of the paper states that these limits can be expressed as an inner product, in the $L_2[0,2\pi]$ space, between a function that depends only on $g$ and another that depends only on the particular $(i,j)$-th position considered. The theory developed here is potentially of use in problems whose solutions are given by a particular entry of a certain matrix $g_\alpha(T_n)$, with $g_\alpha$ being a continuous function that depends on a physical parameter $\alpha$ (e.g., in our applications $\alpha$ is the variance of an additive noise). Therefore, the obtained results provide a frequency domain insight into the influence of $\alpha$ on the asymptotic behavior of their solutions.

APPENDIX
PROOF OF THEOREM 5

Let $\{\lambda_k(C_n(f))\}_{k=1}^{n}$ be the eigenvalues of $C_n(f)$ numbered in nonincreasing order. From (8), there always exists a permutation $\pi_n$ such that the $\lambda_{\pi_n(k)}(C_n(f))$ are sorted into the natural order defined by the trigonometric polynomial $f$, that is, $\lambda_{\pi_n(k)}(C_n(f)) = f(2\pi(k-1)/n)$, $k = 1, \ldots, n$. Based on this permutation, we define the following step functions in $L_2[0,2\pi]$:

$$g_n(\omega) = \sum_{k=1}^{n} g\left[\lambda_{\pi_n(k)}(T_n(f))\right] I_\Delta\left(\omega - \frac{2\pi(k-1)}{n}\right)$$

$$\hat{g}_n(\omega) = \sum_{k=1}^{n} g\left[\lambda_{\pi_n(k)}(C_n(f))\right] I_\Delta\left(\omega - \frac{2\pi(k-1)}{n}\right)$$

$$S_n^{(i,j)}(\omega) = \sum_{k=1}^{n} n\,u^{(n)}_{i,\pi_n(k)}\,u^{(n)}_{j,\pi_n(k)}\, I_\Delta\left(\omega - \frac{2\pi(k-1)}{n}\right)$$

where $I_\Delta$ is the indicator function of the set $\Delta = (0, \frac{2\pi}{n}]$, that is,

$$I_\Delta(\omega) = \begin{cases} 1 & \forall \omega \in (0, \frac{2\pi}{n}] \\ 0 & \text{otherwise} \end{cases}$$

Observe that with the above definitions, we have

$$g(T_n(f))_{i,j} = \sum_{k=1}^{n} g[\lambda_k(T_n(f))]\,u^{(n)}_{i,k}\,u^{(n)}_{j,k} = \langle g_n, S_n^{(i,j)} \rangle$$

For the sake of clarity, we split the proof into three steps:

Step 1. We will prove that the sequence $\{g_n\}$ converges to $g \circ f$ in $L_2[0,2\pi]$ for every $g \in C[\min f, \max f]$. Since

$$\|g_n - g \circ f\| \le \|g_n - \hat{g}_n\| + \|\hat{g}_n - g \circ f\|$$

it suffices to show that the two terms on the right-hand side of this inequality approach zero as $n \to \infty$. We begin by showing that $\lim_{n\to\infty} \|g_n - \hat{g}_n\| = 0$.

$$|g_n(\omega) - \hat{g}_n(\omega)|^2 = \left|\sum_{k=1}^{n} \left(g[\lambda_{\pi_n(k)}(T_n(f))] - g[\lambda_{\pi_n(k)}(C_n(f))]\right) I_\Delta\left(\omega - \frac{2\pi(k-1)}{n}\right)\right|^2$$

$$= \sum_{k=1}^{n} \left|g[\lambda_{\pi_n(k)}(T_n(f))] - g[\lambda_{\pi_n(k)}(C_n(f))]\right|^2 I_\Delta\left(\omega - \frac{2\pi(k-1)}{n}\right)$$

Therefore:

$$\|g_n - \hat{g}_n\|^2 = \frac{1}{2\pi}\int_0^{2\pi} |g_n(\omega) - \hat{g}_n(\omega)|^2\,d\omega$$

$$= \frac{1}{n}\sum_{k=1}^{n} \left|g[\lambda_{\pi_n(k)}(T_n(f))] - g[\lambda_{\pi_n(k)}(C_n(f))]\right|^2$$

$$= \frac{1}{n}\sum_{k=1}^{n} \left|g[\lambda_k(T_n(f))] - g[\lambda_k(C_n(f))]\right|^2$$

According to Theorem 2 and Corollary 1, the last sum tends to zero as $n$ goes to infinity. On the other hand, from the definition of the sequence of step functions $\{\hat{g}_n\}$, the continuity of $f$ and $g$, and the fact that $\lambda_{\pi_n(k)}(C_n(f)) = f(2\pi(k-1)/n)$, $k = 1, \ldots, n$, it is easy to check that $\hat{g}_n$ converges uniformly to $g \circ f$ in $[0,2\pi]$. This in turn implies that $\hat{g}_n \to g \circ f$ in $L_2[0,2\pi]$, as was to be proved.

Step 2. Let $H$ be the closure of the set $\{g \circ f : g \in C[\min f, \max f]\}$ in $L_2[0,2\pi]$. In this step we will prove that the sequence $\{S_n^{(i,j)}\}$ contains a subsequence $\{S_{\tau(n)}^{(i,j)}\}$ such that there exists a unique function $S_{i,j}(f) \in H$ satisfying

$$\lim_{n\to\infty} \langle h, S_{\tau(n)}^{(i,j)} \rangle = \langle h, S_{i,j}(f) \rangle \qquad \forall h \in H$$

Furthermore, $\|S_{i,j}(f)\| \le K$ where $K$ is the constant considered in (9).

Notice that according to (9), the subsequence of step functions $\{S_{\sigma(n)}^{(i,j)}\}$ is bounded by $K$ in $L_2[0,2\pi]$, i.e., $\|S_{\sigma(n)}^{(i,j)}\| \le K$. Let $V_{\sigma(n)}^{(i,j)} \in H$ be the orthogonal projection of $S_{\sigma(n)}^{(i,j)} \in L_2[0,2\pi]$ into the closed subspace $H$. By the definition of $V_{\sigma(n)}^{(i,j)}$, we have

$$\langle h, V_{\sigma(n)}^{(i,j)} \rangle = \langle h, S_{\sigma(n)}^{(i,j)} \rangle, \qquad \forall h \in H.$$

In particular, we obtain $\|V_{\sigma(n)}^{(i,j)}\|^2 = \langle V_{\sigma(n)}^{(i,j)}, S_{\sigma(n)}^{(i,j)} \rangle$, and using the Cauchy-Schwarz inequality we deduce that $\|V_{\sigma(n)}^{(i,j)}\| \le K$. Since $H$ is a Hilbert space, the bounded sequence $\{V_{\sigma(n)}^{(i,j)}\}$ contains a subsequence $\{V_{\sigma_2(\sigma(n))}^{(i,j)}\}$ that converges weakly to a unique $S_{i,j}(f) \in H$ (see e.g. [21]). Consequently, by taking $\tau = \sigma_2 \circ \sigma$ we conclude that

$$\lim_{n\to\infty} \langle h, S_{\tau(n)}^{(i,j)} \rangle = \lim_{n\to\infty} \langle h, V_{\tau(n)}^{(i,j)} \rangle = \langle h, S_{i,j}(f) \rangle \qquad \forall h \in H \qquad (16)$$

On the other hand, from the Cauchy-Schwarz inequality it follows that

$$|\langle S_{i,j}(f), S_{\tau(n)}^{(i,j)} \rangle| \le \|S_{i,j}(f)\|\,\|S_{\tau(n)}^{(i,j)}\| \le K\|S_{i,j}(f)\|$$

Applying (16) we deduce that $\|S_{i,j}(f)\|^2 = |\langle S_{i,j}(f), S_{i,j}(f) \rangle| \le K\|S_{i,j}(f)\|$ and therefore $\|S_{i,j}(f)\| \le K$, as was to be proved.

Step 3. We are now ready to show the assertion of the theorem. By Theorem 4, the sequence $\{g(T_n(f))_{i,j}\} = \{\langle g_n, S_n^{(i,j)} \rangle\}$ is convergent. Consequently, its limit must be equal to the limit of any of its subsequences, in particular $\lim_{n\to\infty} \langle g_{\tau(n)}, S_{\tau(n)}^{(i,j)} \rangle$. On the other hand, by the assertion proved in the second step, it follows that

$$\lim_{n\to\infty} \langle g \circ f, S_{\tau(n)}^{(i,j)} \rangle = \langle g \circ f, S_{i,j}(f) \rangle = \frac{1}{2\pi}\int_0^{2\pi} g(f(\omega))\,\overline{S_{i,j}(f)(\omega)}\,d\omega$$

for all $g \in C[\min f, \max f]$. Consequently, to finish the proof it is sufficient to show that

$$\lim_{n\to\infty} \langle g_{\tau(n)}, S_{\tau(n)}^{(i,j)} \rangle = \lim_{n\to\infty} \langle g \circ f, S_{\tau(n)}^{(i,j)} \rangle$$

We can write $\langle g_{\tau(n)}, S_{\tau(n)}^{(i,j)} \rangle$ as

$$\langle g_{\tau(n)}, S_{\tau(n)}^{(i,j)} \rangle = \langle g \circ f, S_{\tau(n)}^{(i,j)} \rangle + \langle g_{\tau(n)} - g \circ f, S_{\tau(n)}^{(i,j)} \rangle$$

and the Cauchy-Schwarz inequality yields

$$0 \le |\langle g_{\tau(n)} - g \circ f, S_{\tau(n)}^{(i,j)} \rangle| \le \|g_{\tau(n)} - g \circ f\|\,\|S_{\tau(n)}^{(i,j)}\| \le K\|g_{\tau(n)} - g \circ f\|$$

From the assertion proved in the first step, it follows that $\|g_{\tau(n)} - g \circ f\| \to 0$. This completes the existence proof of the theorem.

To show the uniqueness, observe that the $S_{i,j}(f)$ in the above proof is the unique function in $H$ satisfying (10). This is a consequence of the uniqueness of the weak limit considered in the second step. Therefore, $S_{i,j}(f)$ is the orthogonal projection into the closed subspace $H$ of any other function in $L_2[0,2\pi]$ satisfying (10). Consequently, $S_{i,j}(f)$ is the unique function in $L_2[0,2\pi]$ with minimum norm for which expression (10) holds.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their helpful comments and criticism.

REFERENCES

[1] U. Grenander and G. Szegö, Toeplitz Forms and Their Applications. Berkeley and Los Angeles: University of California Press, 1958.
[2] R. M. Gray, “Toeplitz and circulant matrices: A review,” Foundations and Trends in Communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006, [Original version: Tech. Report 6502-1, Stanford Electronic Laboratory, June 1971].
[3] ——, “On the asymptotic eigenvalue distribution of Toeplitz matrices,” IEEE Trans. Inform. Theory, vol. 18, no. 6, pp. 725–730, Nov. 1972.
[4] J. Pearl, “On coding and filtering stationary signals by discrete Fourier transforms,” IEEE Trans. Inform. Theory, vol. 19, pp. 229–232, Mar. 1973.
[5] ——, “Basis-restricted transformations and performance measures for spectral representation,” IEEE Trans. Inform. Theory, vol. 17, pp. 751–752, Nov. 1971.
[6] P. J. Sherman, “Circulant approximations of the inverses of Toeplitz matrices and related quantities with applications to stationary random processes,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 33, pp. 1630–1632, Dec. 1985.
[7] P. M. Crespo and J. Gutiérrez-Gutiérrez, “On the application of asymptotically equivalent sequences of matrices to the derivation of the continuous time Gaussian channel capacity,” AEÜ-International Journal of Electronics and Communications, vol. 60, no. 8, pp. 573–581, 2006.
[8] S. Haykin, Adaptive Filter Theory. Prentice-Hall, 1986.
[9] B. S. Tsybakov, “Capacity of a discrete-time Gaussian channel with a filter,” Probl. Inform. Transm., vol. 6, pp. 253–256, July-Sept. 1970.
[10] W. Hirt and J. L. Massey, “Capacity of the discrete-time Gaussian channel with intersymbol interference,” IEEE Trans. Inform. Theory, vol. 34, pp. 380–388, May 1988.
[11] F. R. Gantmacher, The Theory of Matrices. New York: Chelsea Publishing, 1977, vol. 1.
[12] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. New York: Cambridge University Press, 1994.
[13] ——, Matrix Analysis. New York: Cambridge University Press, 1990.
[14] W. F. Trench, “Absolute equal distribution of the spectra of Hermitian matrices,” Linear Algebra Appl., vol. 366, pp. 417–431, 2003.
[15] A. Böttcher and B. Silbermann, Introduction to Large Truncated Toeplitz Matrices. New York: Springer, 1999.
[16] G. H. Golub and C. F. Van Loan, Matrix Computations. The Johns Hopkins University Press, 1987.
[17] W. F. Trench, “On the eigenvalue problem for Toeplitz band matrices,” Linear Algebra Appl., vol. 64, pp. 199–214, 1985.
[18] S. U. H. Qureshi, “Adaptive equalization,” Proceedings of the IEEE, vol. 73, no. 9, pp. 1349–1387, 1985.
[19] A. Böttcher, J. Gutiérrez-Gutiérrez, and P. M. Crespo, “Mass concentration in quasicommutators of Toeplitz matrices,” J. Comput. Appl. Math., to appear.
[20] J. Salz, “On mean-square decision feedback equalization and timing phase,” IEEE Trans. Commun., vol. 25, no. 12, pp. 1471–1476, Dec. 1977.
[21] E. Zeidler, Applied Functional Analysis: Main Principles and Their Applications, ser. Applied Mathematical Sciences. Springer-Verlag, 1995, vol. 109.
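As a numerical companion to the identity stated at the end of Section V, the following sketch compares the two sides for one example. It relies on the main result of the paper to replace the integral against $S_{1,1}(f)$ by the limit of the entry $[g(T_n(f))]_{1,1}$, and it also compares a mid-diagonal entry with the Szegö average $\frac{1}{2\pi}\int_0^{2\pi} g(f(\omega))\,d\omega$. Everything concrete here is an assumption made for illustration and does not come from the paper: the two-tap channel $H(\omega) = 1 + 0.5e^{-i\omega}$ (so that $f(\omega) = 1.25 + \cos\omega$), the value $\alpha = 0.1$, the matrix order, and the helper names `banded_toeplitz` and `function_entry`.

```python
import numpy as np


def banded_toeplitz(coeffs, n):
    """n x n Hermitian banded Toeplitz matrix with entries T[j, l] = t_{j-l},
    where coeffs = [t_0, t_1, ..., t_m] and t_{-k} = conj(t_k)."""
    T = np.zeros((n, n), dtype=complex)
    for k, t in enumerate(coeffs):
        if k == 0:
            T += t * np.eye(n)
        else:
            T += t * np.eye(n, k=-k) + np.conj(t) * np.eye(n, k=k)
    return T


def function_entry(T, g, i, j):
    """Entry (i, j) (1-based) of g(T) for Hermitian T, from the spectral
    decomposition: sum_k g(lambda_k) * u_{i,k} * conj(u_{j,k})."""
    lam, U = np.linalg.eigh(T)
    return np.sum(g(lam) * U[i - 1, :] * np.conj(U[j - 1, :]))


# Illustrative (hypothetical) setup: f(w) = |1 + 0.5 e^{-iw}|^2 = 1.25 + cos(w)
alpha = 0.1                 # plays the role of the noise variance
coeffs = [1.25, 0.5]        # Fourier coefficients t_0, t_1 of f


def f(w):
    return 1.25 + np.cos(w)


def g(x):
    return alpha / (alpha + x)


# Uniform grid on [0, 2*pi): the mean over the grid approximates
# (1/(2*pi)) * integral over one period of a smooth periodic function.
w = 2.0 * np.pi * np.arange(4096) / 4096
lhs = np.exp(np.mean(np.log(g(f(w)))))   # exp of the averaged log
szego_avg = np.mean(g(f(w)))             # Szego average of g(f)

# Corner entry for increasing n
for n in (5, 10, 20, 40):
    Tn = banded_toeplitz(coeffs, n)
    print(n, function_entry(Tn, g, 1, 1).real)

n = 201
Tn = banded_toeplitz(coeffs, n)
entry_11 = function_entry(Tn, g, 1, 1).real        # one-sided case (i = 1)
entry_mid = function_entry(Tn, g, 101, 101).real   # mid-diagonal entry

print("exp-log-integral:", lhs, " entry (1,1):", entry_11)
print("Szego average   :", szego_avg, " entry (101,101):", entry_mid)
```

For this tridiagonal example the corner entry settles after a few tens of rows, which is why a moderate order such as $n = 201$ is enough for the comparison; a channel with spectral nulls or a much smaller $\alpha$ would likely need larger $n$.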

Pedro M. Crespo (S’80-M’84-SM’91) was born in Barcelona, Spain. In 1978, he received the engineering degree in Telecommunications from Universidad Politécnica de Barcelona. He received the M.Sc. in Applied Mathematics and Ph.D. in Electrical Engineering from the University of Southern California (USC), in 1983 and 1984, respectively. From September 1984 to April 1991, he was a member of the technical staff in the Signal Processing Research group at Bell Communications Research, New Jersey, USA, where he worked in the areas of digital communication and signal processing. He actively contributed to the definition and development of the first prototypes of xDSL (Digital Subscriber Lines transceivers). From May 1991 to August 1999 he was a district manager at Telefónica Investigación y Desarrollo, Madrid, Spain. From 1999 to 2002 he was the technical director of the Spanish telecommunication operator Jazztel. At present he is the head of the Electronics and Communications Department at the R&D center CEIT, San Sebastián, Spain. He is also a full professor at the Engineering School of the University of Navarra (Tecnun). Dr. Crespo is a recipient of the Bell Communication Research Award of excellence. He holds seven patents in the areas of digital subscriber lines and wireless communications. His research interests include the general areas of digital communications, signal processing and information theory.


Jesús Gutiérrez-Gutiérrez was born in Granada, Spain. He received his degree in Mathematics from the University of Granada, Spain, in 1999 and his Ph.D. degree in Electronics and Communications from the University of Navarra, Spain, in 2004. He is currently with the Electronics and Communications Department at the R&D center CEIT, San Sebastián, Spain. He is also an associate professor at the Engineering School of the University of Navarra (Tecnun). His current research interests include matrix analysis, random matrix theory, spectral theory and orthogonal polynomials applied to problems in communications.