
Efficient implementation of the Volterra filter

M.J. Reed and M.O.J. Hawksford

Abstract: An efficient implementation of the Volterra filter is presented which uses a frequency domain representation to reduce the number of computations. The multidimensional convolution of the Volterra filter is transformed to the frequency domain giving a transformed input matrix which is sparse and obtained directly from a one-dimensional Fourier transform. In addition to the sparse nature of the transformed input matrix, symmetries in both the Volterra filter and the frequency domain representation are exploited to increase the efficiency of the algorithm. The computational saving is demonstrated by comparing it with the direct implementation of the time domain representation and another technique which uses a frequency domain representation but does not utilise symmetry.

1 Introduction

The Volterra series represents a nonlinear system as a set of multidimensional convolutions [1] and is used to model a range of nonlinear systems such as the auditory system [2], mechanical systems [3, 4] and electromechanical loudspeakers [5]. Possible applications of such a Volterra series model include system prediction due to arbitrary excitation and nonlinear correction [1]. For either of these applications it is required to implement the Volterra model of the nonlinear system and this is performed using a signal processing algorithm termed a Volterra filter. However, the Volterra filter requires the implementation of computationally intensive multidimensional convolutions that make it unsuitable for applications requiring real-time performance.

This paper describes a new technique for the implementation of the Volterra filter that improves implementation efficiency compared with existing techniques. One target application for this work is real-time implementation of nonlinear correction for loudspeaker transducers [6]. However, the technique can be generally applied to improve the efficiency of any Volterra filter implementation.

Sandberg has shown that a good approximation to a wide range of nonlinear systems is given by a Volterra filter, which is a doubly finite and discrete implementation of the Volterra series that is expressed as [7]

$$y(n) = \sum_{r=1}^{N} \sum_{i_1=0}^{M-1} \cdots \sum_{i_r=0}^{M-1} h_r(i_1, i_2, \ldots, i_r)\, u(n-i_1)\, u(n-i_2) \cdots u(n-i_r) \tag{1}$$

where $u$ is an arbitrary input signal to the model, $y$ the output, $h_r$ is the $r$th Volterra kernel, $N$ the maximum order of nonlinearity and $M$ is the memory of the filter. The only general condition placed upon the input signal $u$ is that the model is usually only valid over a finite input amplitude [7]; for example, in many systems the model is only valid at amplitudes below the occurrence of clipping.

The implementation of eqn. 1 in the direct form for models of real systems is computationally burdensome as the number of points in the kernel spaces is usually large [8]. It is possible to improve the efficiency of the Volterra filter by approximating part of the kernel spaces using a mirror filter [9] constructed from linear filters combined with multipliers. However, the model is then only valid for a limited set of signals that excite the kernel spaces for which the approximation is valid. Consequently, for accurate system prediction or correction with unconstrained input signals the whole kernel space, rather than partial approximations to it, is required [6]. This paper presents an efficient frequency domain implementation of the Volterra filter incorporating the whole kernel space. Frequency domain techniques already exist [10, 11] but, unlike the method presented, they do not make use of symmetries in the filter to improve efficiency and do not extend the result to a general order of nonlinearity.

© IEE, 2000
IEE Proceedings online no. 20000183
DOI: 10.1049/ip-vis:20000183
Paper first received 9th July and in revised form 25th November 1999
The authors are with the Department of Electronic Systems Engineering, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
IEE Proc.-Vis. Image Signal Process., Vol. 147, No. 2, April 2000

2 Expressing the Volterra filter in vector notation

Define a column vector $\mathbf{h}_r$ which represents all permutations of points in the kernel $h_r$ from eqn. 1

$$\mathbf{h}_r = \begin{bmatrix} h_r(0, 0, \ldots, 0) \\ h_r(0, 0, \ldots, 1) \\ \vdots \\ h_r(M-1, M-1, \ldots, M-2) \\ h_r(M-1, M-1, \ldots, M-1) \end{bmatrix} \tag{2}$$

The format of eqn. 2 represents the multidimensional points of a Volterra kernel (a multidimensional impulse response function) as a single vector by representing the points of the kernel in natural order. For example, a second-order kernel

$$\begin{bmatrix} h_2(0,0) & h_2(0,1) & h_2(0,2) \\ h_2(1,0) & h_2(1,1) & h_2(1,2) \\ h_2(2,0) & h_2(2,1) & h_2(2,2) \end{bmatrix}$$

is represented as

$$\mathbf{h}_2 = [h_2(0,0),\ h_2(0,1),\ h_2(0,2),\ h_2(1,0),\ h_2(1,1),\ h_2(1,2),\ h_2(2,0),\ h_2(2,1),\ h_2(2,2)]^T$$

Define a row vector $\mathbf{u}_n$ representing the input $u$

$$\mathbf{u}_n = [u(n),\ u(n-1),\ u(n-2),\ \ldots,\ u(n-M+1)]$$

All the $r$th-order products of the input from $u(n)$ to $u(n-M+1)$ may be expressed as the $r$th-order Kronecker product (denoted as the superscript $(r)$) of $\mathbf{u}_n$ with itself

$$\mathbf{u}_n^{(r)} = \underbrace{\mathbf{u}_n \otimes \mathbf{u}_n \otimes \cdots \otimes \mathbf{u}_n}_{r\ \text{times}}$$

where $\otimes$ is the Kronecker product between two vectors or matrices, which is defined as [12, 13]

$$A \otimes B = \begin{bmatrix} a_{0,0}B & a_{0,1}B & \cdots & a_{0,n-1}B \\ a_{1,0}B & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ a_{m-1,0}B & a_{m-1,1}B & \cdots & a_{m-1,n-1}B \end{bmatrix}$$

where $A$ is of dimension $m \times n$ and $a_{i,j}$ is an element from $A$. Thus, eqn. 1 can be represented in vector notation as

$$y(n) = \sum_{r=1}^{N} \mathbf{u}_n^{(r)} \mathbf{h}_r \tag{3}$$

Define a vector $\mathbf{y} = [y(n),\ y(n-1),\ y(n-2),\ \ldots,\ y(n-(M-1))]^T$ and a corresponding partitioned input matrix

$$U^{\{r\}} = \begin{bmatrix} \mathbf{u}_n^{(r)} \\ \mathbf{u}_{n-1}^{(r)} \\ \vdots \\ \mathbf{u}_{n-(M-1)}^{(r)} \end{bmatrix}$$

where the symbol $\{r\}$ signifies performing the $r$th-order Kronecker product of each row of $U$ with itself. The resulting matrix $U^{\{r\}}$ is of dimension $M \times M^r$. The output of a Volterra filter over $M$ samples can now be given by

$$\mathbf{y} = \sum_{r=1}^{N} U^{\{r\}} \mathbf{h}_r \tag{4}$$

Thus, for a block of $M$ samples, each order $r$ of the direct form of the Volterra filter requires $rM^{r+1}$ multiplications.

3 Implementation of the Volterra filter using Fourier transform relationships

We will now show how Fourier transform relationships often applied to linear convolution can be extended to the multidimensional convolution of the Volterra filter, giving rise to a sparse matrix operation.

Define a circulant input signal matrix $C$ of the form

$$C = \begin{bmatrix} u(n) & u(n-(M-1)) & u(n-(M-2)) & \cdots & u(n-1) \\ u(n-1) & u(n) & u(n-(M-1)) & \cdots & u(n-2) \\ \vdots & & & \ddots & \vdots \\ u(n-(M-1)) & u(n-(M-2)) & u(n-(M-3)) & \cdots & u(n) \end{bmatrix}$$

Define the Fourier transform as the matrix $F$ of dimension $M \times M$ with the element $f_{ik}$ of the $i$th row and $k$th column given by

$$f_{ik} = \frac{1}{\sqrt{M}}\, \omega_M^{ik}, \qquad i, k = 0, 1, \ldots, M-1$$

where $\omega_M = e^{-2\pi j/M}$. Define the inverse Fourier transform as $F^H$, the conjugate transpose of $F$. To make use of the orthonormality of the $M \times M$ transform, $F^H F = I$, with the larger dimensional Volterra filter we make use of the corollary from the following theorem. Note that while the authors do not claim that the following theorem and corollary are novel, they are not aware of related published work.

Theorem 1: $A^{(r)} B^{(r)} = (AB)^{(r)}$

Proof:

$$A^{(r)} B^{(r)} = \left(A^{(r-1)} \otimes A\right)\left(B^{(r-1)} \otimes B\right) \tag{5}$$

By application of the mixed product rule [12]

$$\left(A^{(s)} \otimes A\right)\left(B^{(s)} \otimes B\right) = A^{(s)} B^{(s)} \otimes AB \tag{6}$$

By repeated application of eqn. 6 on the right side of eqn. 5, and eqn. 5 on the first term on the right side of eqn. 6, for $s = r-1, \ldots, 2$ the following is obtained

$$A^{(r)} B^{(r)} = AB \otimes \cdots \otimes AB = (AB)^{(r)}$$

From theorem 1 the following corollary is obtained:

Corollary 1: If $A$ is orthonormal then $A^{(r)}$ is also orthonormal.

Corollary 1 follows from theorem 1: if $A$ is orthonormal then $AA^H = I_M$, where $I_M$ is the $M \times M$ identity matrix, and by applying theorem 1

$$A^{(r)} \left(A^H\right)^{(r)} = \left(AA^H\right)^{(r)} = I_M^{(r)}$$

As $I_M^{(r)} = I_{M^r}$ is the $M^r \times M^r$ identity matrix, $A^{(r)}$ is orthonormal. As the defined Fourier transform relationship is orthonormal then, from corollary 1, eqn. 4 can be expressed as

$$\mathbf{y} = \sum_{r=1}^{N} F^H F\, C^{\{r\}} \left(F^H\right)^{(r)} F^{(r)} \mathbf{h}_r \tag{7}$$

where the input matrix $U$ is replaced by the circulant matrix $C$.
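The vector notation and the Kronecker product results used to reach eqn. 7 can be checked numerically. The following is a minimal sketch, assuming NumPy and small random kernels; the helper names `kron_power`, `volterra_direct` and `volterra_kron` are illustrative and do not come from the paper. It compares the direct sum of eqn. 1 with the vector form of eqn. 3 for one output sample, and tests theorem 1 and corollary 1 on small matrices.

```python
import numpy as np

def kron_power(a, r):
    """r-fold Kronecker product of a with itself (the superscript (r))."""
    out = a
    for _ in range(r - 1):
        out = np.kron(out, a)
    return out

def volterra_direct(u_n, kernels):
    """Eqn. 1 for one output sample; u_n = [u(n), u(n-1), ..., u(n-M+1)]."""
    y = 0.0
    for h in kernels:                      # h has r axes, each of length M
        for idx in np.ndindex(h.shape):    # idx = (i_1, ..., i_r)
            y += h[idx] * np.prod(u_n[list(idx)])
    return y

def volterra_kron(u_n, kernels):
    """Eqn. 3: y(n) = sum over r of u_n^(r) h_r, with h_r in natural order."""
    return sum(kron_power(u_n, h.ndim) @ h.reshape(-1) for h in kernels)

rng = np.random.default_rng(0)
M = 4
u_n = rng.standard_normal(M)
kernels = [rng.standard_normal((M,) * r) for r in (1, 2, 3)]

y_direct = volterra_direct(u_n, kernels)
y_kron = volterra_kron(u_n, kernels)

# Theorem 1 on random 3 x 3 matrices with r = 3.
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
theorem_ok = np.allclose(kron_power(A, 3) @ kron_power(B, 3),
                         kron_power(A @ B, 3))

# Corollary 1 for the orthonormal DFT matrix F with r = 3.
F = np.fft.fft(np.eye(M)) / np.sqrt(M)
F3 = kron_power(F, 3)
is_unitary = np.allclose(F3 @ F3.conj().T, np.eye(M ** 3))

print(np.isclose(y_direct, y_kron), theorem_ok, is_unitary)
```

Reshaping each kernel array in row-major order gives the natural ordering of eqn. 2, which is why the Kronecker power of the input vector lines up with it element by element.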

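To refer to the elements in the matrices or vectors which arise from the Kronecker product we define the following notation. The $i$th row in a matrix $B$ is defined as the row vector $\mathbf{b}_i$, the $k$th column as a column vector $\mathbf{b}^k$ and the $i$th element in the $k$th column as $b_{ik}$. If the row vector $\mathbf{a} = \mathbf{b} \otimes \mathbf{b}$, where $\mathbf{b}$ is of dimension $M$, then define the vector $\mathbf{a}$ to be partitioned as

$$\mathbf{a} = [\, b_0 b_0,\ b_0 b_1,\ \ldots,\ b_0 b_{M-1} \mid b_1 b_0,\ b_1 b_1,\ \ldots,\ b_1 b_{M-1} \mid \ldots \,]$$

so that the $k_2$th element in the $k_1$th partition is given by $b_{k_1} b_{k_2}$. If $\mathbf{a} = \mathbf{b} \otimes \mathbf{b} \otimes \cdots \otimes \mathbf{b} = \mathbf{b}^{(r)}$ then the vector $\mathbf{a}$ is recursively partitioned $r$ times with each partition being partitioned $M$ times. Now the $k$th element in $\mathbf{a}$ is labelled by defining the ordered set $K = \langle k_1 \ldots k_r \rangle = \langle k : 0 \le k < M \rangle$ such that an element in $\mathbf{a}$ is given as

$$a_K = a_{\langle k_1 \ldots k_r \rangle} = b_{k_1} b_{k_2} \cdots b_{k_r} \tag{8}$$

This notation is very useful for the Volterra filter as, for example, the point in the $r$th Volterra kernel $h_r(k_1 \ldots k_r)$ is given by the element $h_K$ in the vector $\mathbf{h}_r$ defined in eqn. 2. Define the Fourier transform of the input signal as the column vector $\mathbf{Y}$

$$\mathbf{Y} = F \mathbf{c}^0 = \begin{bmatrix} y_0 \\ \vdots \\ y_{M-1} \end{bmatrix} \tag{9}$$

Consider the operation $F C^{\{r\}} (F^H)^{(r)}$ given in eqn. 7. Let

$$A_r = F\, C^{\{r\}} \left(F^H\right)^{(r)} \tag{10}$$

and

$$B = C^{\{r\}} \left(F^H\right)^{(r)} \tag{11}$$

The first row of $B$ is given by

$$\mathbf{b}_0 = \mathbf{c}_0^{(r)} \left(F^H\right)^{(r)} \tag{12}$$

and by applying theorem 1

$$\mathbf{b}_0 = \left(\mathbf{c}_0 F^H\right)^{(r)} \tag{13}$$

As $\mathbf{c}_0^T$ is the time reversed form of the input signal $\mathbf{c}^0$, which we define as always real, then by the Fourier transform theorem for reversed time signals [14]

$$\mathbf{Y} = F \mathbf{c}^0 = \left(\mathbf{c}_0 F^H\right)^T \tag{14}$$

and together with eqn. 13 the following is obtained

$$\mathbf{b}_0 = \left(\mathbf{Y}^T\right)^{(r)} \tag{15}$$

The Fourier transform theorem for time shifted signals states that for a signal $x(n)$ with a Fourier transform $X(e^{j\omega})$ [14]

$$x(n-i) \overset{\mathcal{F}}{\longleftrightarrow} e^{j\omega i} X(e^{j\omega}) \tag{16}$$

Thus, as the $i$th row of $C$ is a time shifted version of $\mathbf{c}_0$ by $i$ places forward, eqn. 16 applied to eqn. 14 gives

$$\mathbf{c}_i F^H = \left[\, y_0 \bar{\omega}_M^{0i},\ y_1 \bar{\omega}_M^{1i},\ y_2 \bar{\omega}_M^{2i},\ \ldots,\ y_{M-1} \bar{\omega}_M^{(M-1)i} \,\right] \tag{17}$$

where $\bar{\omega}$ is the conjugate of $\omega$. Substituting eqn. 17 into eqn. 13 gives

$$\mathbf{b}_i = \left[\, y_0 \bar{\omega}_M^{0i},\ y_1 \bar{\omega}_M^{1i},\ y_2 \bar{\omega}_M^{2i},\ \ldots,\ y_{M-1} \bar{\omega}_M^{(M-1)i} \,\right]^{(r)} \tag{18}$$

Hence, applying the notation of eqn. 8 to eqn. 18, an element $b_{iK}$ of $B$ is given by

$$b_{iK} = y_{k_1} \bar{\omega}_M^{k_1 i}\, y_{k_2} \bar{\omega}_M^{k_2 i} \cdots y_{k_r} \bar{\omega}_M^{k_r i} \tag{19}$$

$$b_{iK} = y_{k_1} y_{k_2} \cdots y_{k_r}\, \bar{\omega}_M^{k_{sum} i} \tag{20}$$

where $k_{sum} = (k_1 + k_2 + \cdots + k_r)_M$ using modulus $M$ addition. Thus, the $K$th column of $B$ is given by

$$\mathbf{b}^K = y_{k_1} \cdots y_{k_r} \begin{bmatrix} \bar{\omega}_M^{k_{sum} \cdot 0} \\ \bar{\omega}_M^{k_{sum} \cdot 1} \\ \vdots \\ \bar{\omega}_M^{k_{sum}(M-1)} \end{bmatrix} = \sqrt{M}\, y_{k_1} \cdots y_{k_r}\, \mathbf{f}^{H, k_{sum}} \tag{21}$$

where $\mathbf{f}^{H, k_{sum}}$ is a column from $F^H$. From the definitions given in eqns. 10 and 11

$$A_r = F B \tag{22}$$

so that, using the set notation for the column number, the element $a_{iK}$ in $A_r$ is given by

$$a_{iK} = \mathbf{f}_i \mathbf{b}^K \tag{23}$$

Substituting eqn. 21 into the above gives

$$a_{iK} = \sqrt{M}\, y_{k_1} \cdots y_{k_r}\, \mathbf{f}_i\, \mathbf{f}^{H, k_{sum}}$$

As $F$ is orthonormal

$$\mathbf{f}_i\, \mathbf{f}^{H, j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \tag{24}$$

where $\mathbf{f}^{H, j}$ is the $j$th column of $F^H$. Therefore the elements of the matrix $A_r$ are given by

$$a_{iK} = \begin{cases} \sqrt{M}\, y_{k_1} y_{k_2} \cdots y_{k_r} & \text{if } k_{sum} = i \\ 0 & \text{if } k_{sum} \ne i \end{cases} \tag{25}$$

The equation for the Volterra filter is now expressed as

$$\mathbf{y} = \sum_{r=1}^{N} F^H A_r F^{(r)} \mathbf{h}_r \tag{26}$$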
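Eqns. 25 and 26 can be checked numerically for a single second-order stage. The sketch below is an illustration under stated assumptions, not the authors' implementation: NumPy's orthonormal FFT (`norm="ortho"`) stands in for $F$, the two-dimensional FFT of the kernel stands in for $F^{(2)}\mathbf{h}_2$, and the function names are hypothetical. The time domain reference applies the circulant matrix $C$ row by row, while the frequency domain version only touches the non-zero entries of $A_2$, which lie in row $k_{sum}$.

```python
import numpy as np

def circulant(x):
    """Circulant matrix with first column x, so C[i, m] = x[(i - m) mod M]."""
    M = len(x)
    idx = (np.arange(M)[:, None] - np.arange(M)[None, :]) % M
    return x[idx]

def second_order_time(x, h2):
    """Reference: y[i] = (c_i kron c_i) h_2 for each row c_i of C."""
    C = circulant(x)
    return np.array([np.kron(c, c) @ h2.reshape(-1) for c in C])

def second_order_freq(x, h2):
    """Second-order term of eqn. 26 using the sparse structure of eqn. 25."""
    M = len(x)
    Y = np.fft.fft(x, norm="ortho")        # Y = F c^0 (eqn. 9)
    H2 = np.fft.fft2(h2, norm="ortho")     # F^(2) h_2, kept as an M x M array
    k1, k2 = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
    ksum = (k1 + k2) % M                   # row holding the non-zero element
    terms = np.sqrt(M) * Y[k1] * Y[k2] * H2
    z = np.zeros(M, dtype=complex)
    np.add.at(z, ksum, terms)              # z = A_2 F^(2) h_2
    return np.fft.ifft(z, norm="ortho")    # F^H z

rng = np.random.default_rng(1)
M = 8
x = rng.standard_normal(M)
h2 = rng.standard_normal((M, M))

y_time = second_order_time(x, h2)
y_freq = second_order_freq(x, h2)
print(np.allclose(y_freq.real, y_time), np.allclose(y_freq.imag, 0.0))
```

Only one FFT of the input and one inverse FFT are computed per block; the kernel transform would normally be computed once in advance.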

In eqn. 26, $F^{(r)} \mathbf{h}_r$ is the $r$-dimensional Fourier transform of $\mathbf{h}_r$. The matrix $A_r$ is of dimension $M \times M^r$ and is partitioned, by columns, into $M^{r-1}$ parts. From eqn. 25 the matrix $A_r$ is sparse as each column is non-zero in only one element. Additionally, note that each element in $A_r$ is formed from $r - 1$ multiplications from points in the one-dimensional Fourier transform of the input signal. Thus, for each order $r$ in the implementation of eqn. 26 there are two one-dimensional Fourier transforms of order $M$ required and $rM^r$ complex multiplications. This can be compared to $rM^{r+1}$ multiplications required for the direct form of the Volterra series given in eqn. 4 so that, assuming the Fourier transforms require comparably little computation, the implementation of eqn. 26 will require less computation than the direct form of eqn. 4.

4 Reduction due to symmetries

There are two symmetries in eqn. 26 which can be used to reduce the complexity of the implementation, as illustrated in Fig. 1 for an example of the matrix $A_2$ for $M = 4$. To simplify the diagram the product $\sqrt{4}\, y_i y_k$ is replaced by the corresponding subscript numbers.

The first symmetry used is that of each Volterra kernel, which can always be represented as a symmetrical space [15]. This symmetry can be expressed using the set notation for a point in the Volterra kernel vector $\mathbf{h}_r$ or a corresponding column in $A_r$ represented by the set $K = \langle k_1, \ldots, k_r \rangle$. Any points in $\mathbf{h}_r$ or columns in $A_r$ which are represented by a set which contains the same elements, in any order, are points in symmetry. An example of the symmetry in a Volterra kernel is that it is always possible to make $h_2(i, j) = h_2(j, i)$ without loss of generality. The symmetrical points in a kernel can be summed and the repetitions removed [16]; the corresponding columns in $A_r$ can then also be removed. This changes the dimension of $\mathbf{h}_r$ from $M^r$ to

$$\binom{M + r - 1}{r}$$

which gives a significant reduction for $r > 1$.

A second reduction in the number of points can be achieved due to the fact that the Fourier transform of a real sequence is conjugate-symmetric [14]. This implies that (for even $M$) only $M/2 + 1$ of the rows of $A_r$ are required to construct the output $\mathbf{y}$, giving approximately a saving of one half for large $M$.

The implementation of the $r$th-order stage of the efficient Volterra filter in eqn. 26 is carried out using two 'real-valued' one-dimensional Fourier transforms and an additional

$$\frac{r}{2} \binom{M + r - 1}{r}$$

complex multiplications. If the memory of the filter $M$ is a power of two then the aforementioned 'real-valued' one-dimensional Fourier transforms can each be performed using a fast Fourier transform (FFT) in $2M \log_2 M$ real multiplications [17]. Thus, the number of operations (NOP) for a block of $M$ samples using eqn. 26, with $M$ a power of two and taking into account symmetries, is

$$\mathrm{NOP} = \sum_{r=1}^{N} \left[ 4M \log_2 M + 2r \binom{M + r - 1}{r} \right] \tag{27}$$

The above assumes that a complex multiplication is performed using four real multiplications and a multiply accumulate is carried out in one operation.

Fig. 1 Example of matrix $A_2$ for $M = 2$ showing only the non-zero points

5 Implementation using overlap and save techniques

The efficient implementation of the Volterra filter can be extended from a circulant input matrix to a generalised input signal by the overlap-and-save technique [14, 18]. First the memory of the filter, originally $M$, is doubled, maintaining all the original points and setting any additional points for time samples greater than $M - 1$ to zero. The input matrix size must also be doubled by including the previous $M$ samples, and these $2M$ samples are the elements of the column $\mathbf{c}^0$ in eqn. 7. By doubling the memory of the operation of eqn. 7 the effect of wrapping round the input signal for the last $M$ samples is removed. Thus, although the first $M$ samples of the block are not a valid output for the continuous time signal, the last $M$ samples are. By overlapping blocks by $M$ samples and 'saving' only the last $M$ samples the output signal is generated. The process is shown diagrammatically in Fig. 2.

For linear systems the overlap-and-save technique has an alternative method termed 'overlap and add' which uses the superposition property of linear systems. As the superposition property does not apply to nonlinear systems, the overlap-and-add technique is not applicable to the technique in this paper. The implementation of the overlap-and-save technique increases the NOP for a block of size $M$ ($M$ a power of two) to

$$\mathrm{NOP} = \sum_{r=1}^{N} \left[ 8M \log_2 2M + 2r \binom{2M + r - 1}{r} \right] \tag{28}$$

Fig. 2 Block structure of the overlap and save technique
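The block structure of the overlap-and-save technique can be illustrated for a second-order section. The following is a minimal sketch, assuming NumPy; the function names are hypothetical. To keep the example short, each $2M$-sample block is evaluated with wrap-around indexing in the time domain, which is the circular operation that the frequency domain form computes; it is not itself the FFT-based implementation. The last $M$ outputs of each block are kept and compared with a plain evaluation of the second-order term of eqn. 1.

```python
import numpy as np

def volterra2_linear(u, h2):
    """Reference: second-order term of eqn. 1 with u(n) = 0 for n < 0."""
    M = h2.shape[0]
    up = np.concatenate([np.zeros(M - 1), u])
    y = np.empty(len(u))
    for n in range(len(u)):
        w = up[n:n + M][::-1]              # [u(n), u(n-1), ..., u(n-M+1)]
        y[n] = w @ h2 @ w
    return y

def volterra2_circular(block, h2):
    """Second-order section on one block with wrap-around input indexing."""
    L, M = len(block), h2.shape[0]
    y = np.empty(L)
    for n in range(L):
        w = block[(n - np.arange(M)) % L]  # wraps for the first M - 1 outputs
        y[n] = w @ h2 @ w
    return y

def overlap_save(u, h2):
    """Process u in blocks of 2M samples overlapped by M, saving the last M."""
    M = h2.shape[0]
    assert len(u) % M == 0, "sketch assumes a whole number of blocks"
    prev = np.zeros(M)                     # previous M input samples
    out = []
    for start in range(0, len(u), M):
        cur = u[start:start + M]
        block = np.concatenate([prev, cur])
        out.append(volterra2_circular(block, h2)[M:])
        prev = cur
    return np.concatenate(out)

rng = np.random.default_rng(2)
M = 4
h2 = rng.standard_normal((M, M))
u = rng.standard_normal(5 * M)

y_ref = volterra2_linear(u, h2)
y_blocks = overlap_save(u, h2)
print(np.allclose(y_blocks, y_ref))
```

Because the kernel only spans $M$ samples, the wrap-around affects at most the first $M - 1$ outputs of each block, so discarding the first $M$ leaves only valid samples.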

6 Comparison of direct and frequency domain methods

The frequency domain technique developed in this paper will now be compared to the direct form of the Volterra filter given in eqn. 1 and another frequency domain technique developed by Im and Powers [11]. As all three methods are mathematically equivalent the comparison is only made on the basis of computational complexity. To compare the direct form and the frequency domain methods a second- and third-order filter section will be considered separately; the linear case, which is a first-order Volterra system, has already been widely reported and investigated [14]. The direct form of an $r$th-order Volterra filter which makes use of the kernel symmetry requires

$$rM \binom{M + r - 1}{r}$$

operations for an input sample block of $M$ samples. For the frequency domain techniques it is assumed that the multidimensional Fourier transform of the kernel spaces required for eqn. 26 has been performed, as this can be computed in advance of the filtering operation. Additionally, for the frequency domain technique, $M$ is set to the next largest power of two, thus enabling the use of FFT techniques using block processing.

The number of operations for the direct form and the frequency domain technique per input sample are given in Table 1 for a second- and third-order Volterra filter. The Table also gives the number of operations for a method by Im and Powers [11] which uses the frequency domain but has not utilised symmetries. Note that the use of symmetries gives a significant reduction in the number of operations: for the third-order case with $M = 16$ it gives a factor of ten or more improvement over the other frequency domain method.

Table 1: Number of operations per output sample for M-length second- and third-order sections

| Filter memory M | Direct | Frequency domain, no use of symmetry [11] | Frequency domain using symmetry |
|---|---|---|---|
| Second-order section | | | |
| 2 | 6 | 68 | 32 |
| 4 | 20 | 131 | 47 |
| 8 | 72 | 258 | 78 |
| 16 | 272 | 513 | 141 |
| 32 | 1056 | 1024 | 268 |
| 64 | 4160 | 2048 | 524 |
| 128 | 16512 | 4096 | 1036 |
| 256 | 65792 | 8192 | 2060 |
| Third-order section | | | |
| 2 | 12 | 388 | 106 |
| 4 | 60 | 1539 | 249 |
| 8 | 360 | 6146 | 728 |
| 16 | 2448 | 24577 | 2455 |
| 32 | 17952 | 98304 | 8982 |
| 64 | 137280 | 393216 | 34326 |
| 128 | 1.07 × 10^6 | 1.57 × 10^6 | 134166 |
| 256 | 8.48 × 10^6 | 6.29 × 10^6 | 530454 |

To test the efficiency of the technique the direct form and the frequency domain form using symmetry reductions were implemented on a workstation in C running under a UNIX environment. The processor time for each method was estimated using the time() function and the result was scaled to give the equivalent number of operations per sample. The results for a second- and third-order filter section are shown in Figs. 3a and 3b, respectively, thus demonstrating that the theoretical computational requirement accurately predicts the operation count of the practical implementation. Figs. 3a and 3b show the effect of imposing a block structure for the frequency domain techniques which utilise FFTs. The consequence of the block structure is that the NOP for the frequency domain techniques only changes as $M$ equals a power of two whereas the direct form increases with each increase in $M$.

Fig. 3 Number of operations per sample for the direct and frequency domain forms of second- and third-order sections
Curves: direct calculated; frequency calculated; direct measured; frequency measured
a Second-order sections
b Third-order sections
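The Direct column of Table 1 follows from the operation count quoted for the symmetric direct form: dividing $rM\binom{M+r-1}{r}$ by the block length $M$ gives $r\binom{M+r-1}{r}$ operations per sample. A short sketch, using only the Python standard library and a hypothetical function name, reproduces the first four rows of each section:

```python
from math import comb

def direct_ops_per_sample(M, r):
    """Operations per output sample for the symmetric direct form,
    i.e. r * M * C(M + r - 1, r) per block divided by the block length M."""
    return r * comb(M + r - 1, r)

for r in (2, 3):
    print(r, [direct_ops_per_sample(M, r) for M in (2, 4, 8, 16)])
# prints:
# 2 [6, 20, 72, 272]
# 3 [12, 60, 360, 2448]
```

The frequency domain columns are not reproduced here, as they depend on the FFT operation counts assumed by the authors.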

7 Conclusions

An efficient algorithm for performing the Volterra filter using a frequency domain representation is presented. The method is developed from utilising the Kronecker product of matrices to manipulate the multidimensional Volterra filter equation. By transforming the input matrix of the Volterra filter to the frequency domain a sparse matrix is obtained. The transformed input matrix can be obtained solely from the one-dimensional Fourier transform of the input signal so that multidimensional transforms of the input are not required. The kernels of the Volterra filter and the Fourier transform of a real-valued time signal exhibit symmetries which can be exploited to further reduce the computational complexity of the technique. To demonstrate the efficiency of the technique it is compared with the direct time domain form and another frequency domain technique which does not exploit the symmetry properties. The comparison shows that using the symmetry properties gives approximately fourfold improvement for the second-order filter and tenfold for the third-order filter over existing frequency domain techniques for filter memories of 16 samples or above. Applications for this efficient filter include simulation of a nonlinear system and real-time nonlinear correction.

8 References

1 SCHETZEN, M.: 'The Volterra and Wiener theories of nonlinear systems' (Wiley, New York, 1980)
2 SHI, Y., and HECOX, K.E.: 'Nonlinear system identification by m-pulse sequences: Application to brainstem auditory evoked responses', IEEE Trans. Biomed. Eng., 1991, 38, (9), pp. 834–845
3 PARKER, G.A., and MOORE, E.L.: 'Practical nonlinear system identification using a modified Volterra series approach', Automatica, 1982, 18, (1), pp. 85–91
4 WINTERSTEIN, S.R., UDE, T.C., and MARTHINSEN, T.: 'Volterra model of ocean structures – extreme and fatigue reliability', J. Eng. Mech., ASCE, 1994, 120, (6), pp. 1369–1385
5 KAIZER, A.J.M.: 'Modeling of the nonlinear response of an electrodynamic loudspeaker by a Volterra series expansion', J. Audio Eng. Soc., 1987, 35, (6), pp. 421–432
6 REED, M.J., and HAWKSFORD, M.O.J.: 'Nonlinear error correction of horn transducers using a Volterra filter', J. Audio Eng. Soc., 1997, 45, (5), pp. 414, (abstract only), presented at the 102nd Convention of the Audio Eng. Soc., Munich, March 1997, preprint 4468
7 SANDBERG, I.W.: 'Uniform approximation with doubly finite Volterra series', IEEE Trans. Signal Process., 1992, 40, (6), pp. 1438–1441
8 REED, M.J., and HAWKSFORD, M.O.: 'Practical modelling of nonlinear audio systems using the Volterra series', J. Audio Eng. Soc., 1996, 44, (7/8), pp. 649–650 (abstract only), presented at the 100th Convention of the Audio Eng. Soc., Copenhagen, May 1996, preprint 4264
9 KLIPPEL, W.: 'The mirror filter – a new basis for reducing nonlinear distortion and equalizing response in woofer systems', J. Audio Eng. Soc., 1992, 40, (9), pp. 675–691
10 MORHAC, M.: 'A fast algorithm of nonlinear Volterra filtering', IEEE Trans. Signal Process., 1991, 39, (10), pp. 2353–2356
11 IM, S., and POWERS, E.J.: 'A fast method of discrete third-order Volterra filtering', IEEE Trans. Signal Process., 1996, 44, (9), pp. 2195–2208
12 BREWER, J.W.: 'Kronecker products and matrix calculus in system theory', IEEE Trans. Circuits Syst., 1978, CAS-25, (9), pp. 772–781
13 STEEB, W.-H.: 'Kronecker product of matrices and applications' (BI, Wissenschaftsverlag, 1991)
14 OPPENHEIM, A.V., and SCHAFER, R.W.: 'Discrete-time signal processing' (Prentice Hall, 1989)
15 EYKHOFF, P.: 'System identification' (Wiley, 1974)
16 REED, M.J., and HAWKSFORD, M.O.J.: 'Identification of discrete Volterra series using maximum length sequences', IEE Proc., Circuits Devices Syst., 1996, 143, (5), pp. 241–248
17 SORENSEN, H.V., JONES, D.L., HEIDEMAN, M.T., and BURRUS, C.S.: 'Real-valued fast Fourier transform algorithms', IEEE Trans. Acoust. Speech Signal Process., 1987, ASSP-35, (6), pp. 849–863
18 IFEACHOR, E.C., and JERVIS, B.W.: 'Digital signal processing, a practical approach' (Addison-Wesley, 1993)