ADJOINT LMS: AN EFFICIENT ALTERNATIVE TO THE FILTERED-X LMS AND MULTIPLE ERROR LMS ALGORITHMS

Eric A. Wan
Oregon Graduate Institute of Science & Technology
Department of Electrical Engineering and Applied Physics
P.O. Box 91000, Portland, OR 97291

This work was supported in part by NSF under grant ECS9410823.

1. INTRODUCTION

The Filtered-x LMS algorithm is currently the most popular method for adapting a filter when there exists a transfer function in the error path. Such instances arise, for example, in active control of sound and vibration. For multiple-input-multiple-output systems, the Multiple Error LMS algorithm is a generalization of Filtered-x LMS. The derivations of both algorithms rely on several assumptions, including linearity of the adaptive filter and the error channel. Furthermore, in the Multiple Error LMS algorithm the desirable order-N computational complexity of LMS is lost, resulting in a prohibitive cost in certain DSP implementations.

In this paper, we describe a new algorithm, termed adjoint LMS, which provides a simple alternative to the previously mentioned algorithms. In adjoint LMS, the error (rather than the input) is filtered through an adjoint filter of the error channel. Performance regarding convergence and misadjustment is equivalent. However, linearity is not assumed in the derivation of the algorithm. Furthermore, the equations for single-input-single-output (SISO) and multiple-input-multiple-output (MIMO) systems are identical, and both remain order N.

2. ALGORITHM SPECIFICATIONS

An adaptive filter is specified as

    y(k) = \sum_{n=0}^{M_1-1} w_n(k) x(k-n) = w^T(k) x(k)    (1)

where k is the time index, y is the filter output, x the filter input, and w_n the filter coefficients. The vectors w(k) = [w_0(k), w_1(k), ..., w_{M_1-1}(k)]^T and x(k) = [x(k), x(k-1), ..., x(k-M_1+1)]^T provide for compact notation. It is also often convenient to write the filter operation as y(k) = W(q^{-1}, k) x(k), where W(q^{-1}, k) = \sum_{n=0}^{M_1-1} w_n(k) q^{-n}, with q^{-n} representing a time-delay operator (i.e., q^{-n} x(k) = x(k-n)).
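As a concrete illustration of the notation in Equation 1 (an added sketch, not part of the original paper), the filter output is a dot product against a tapped delay line, and the delay operator q^{-1} simply shifts that line:

    import numpy as np

    def fir_output(w, x_state):
        # y(k) = w^T(k) x(k), with x_state = [x(k), x(k-1), ..., x(k-M1+1)]
        return np.dot(w, x_state)

    def delay_line_push(x_state, x_new):
        # Effect of the delay operator: shift in x(k), drop the oldest sample
        return np.concatenate(([x_new], x_state[:-1]))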

Figure 1: (a) Filtered-x LMS, (b) Adjoint LMS.

The standard filtered-x LMS is illustrated in Figure 1a, where there exists a physical channel represented by C(q^{-1}, k) between the output of the filter and the available desired response. The output error is defined as

    e(k) = d(k) - C(q^{-1}, k) y(k)    (2)

and the filtered-x LMS algorithm is expressed as

    w(k+1) = w(k) + \mu e(k) \tilde{x}(k)    (3)
    \tilde{x}(k) = \hat{C}(q^{-1}, k) x(k)    (4)

where \tilde{x} corresponds to the inputs filtered through a model \hat{C} of the error channel (\mu controls the learning rate). This algorithm can be derived from the standard LMS algorithm, assuming linearity, by simply commuting the order of the filter and the channel. Thus the original input x becomes filtered by the channel (channel model) before entering the filter, and the error appears directly at the output of the adaptive filter. Properties of this algorithm are discussed in [6].

We now present an alternative algorithm called adjoint LMS. The equations are

    w(k+1) = w(k) + \mu \tilde{e}(k - M_2) x(k - M_2)    (5)
    \tilde{e}(k) = \hat{C}(q^{+1}, k) e(k)    (6)

These equations differ from Equations 3 and 4 in that the error, rather than the input, is now filtered by the channel model, as illustrated in Figure 1b (M_2 is the order of the FIR channel model). Furthermore, the filtering is through the adjoint channel model (q^{-1} is replaced with q^{+1}). Graphically, an adjoint system is found for any filter realization by reversing the flow direction and swapping branching points with summing junctions and unit delays with unit advances. This is illustrated in Figure 2 for an FIR tapped delay line. However, the method applies to all filter realizations, including IIR and lattice structures. The consequence of the noncausal adjoint filter is that a delay (equal to the channel model delay) must be incorporated into the weight update in Equation 5 to implement an on-line adaptation.
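To make the contrast between Equations 3-4 and Equations 5-6 concrete, the following is a minimal NumPy sketch of a single SISO update step for each algorithm (added here for illustration; the buffering conventions and variable names are ours, not the paper's):

    import numpy as np

    def filtered_x_lms_step(w, x_buf, xf_prev, e_k, c_hat, mu):
        # Equations 3-4: filter the *input* through the channel model C^(q^-1).
        # x_buf   : [x(k), x(k-1), ...], at least len(c_hat) samples
        # xf_prev : [x~(k-1), ..., x~(k-M1+1)], previously filtered inputs
        x_tilde_k = np.dot(c_hat, x_buf[:len(c_hat)])          # x~(k), Eq. 4
        xf_reg = np.concatenate(([x_tilde_k], xf_prev))[:len(w)]
        w_new = w + mu * e_k * xf_reg                          # Eq. 3
        return w_new, xf_reg

    def adjoint_lms_step(w, x_buf, e_buf, c_hat, mu):
        # Equations 5-6: filter the *error* through the adjoint model C^(q^+1).
        # e_buf : [e(k), e(k-1), ..., e(k-M2)], so e~(k-M2) is computable now
        # x_buf : [x(k), x(k-1), ...], at least M1 + M2 samples
        M2 = len(c_hat) - 1
        e_tilde_delayed = np.dot(c_hat, e_buf[M2::-1])         # e~(k-M2), Eq. 6
        x_reg_delayed = x_buf[M2:M2 + len(w)]                  # x(k-M2), ..., x(k-M2-M1+1)
        return w + mu * e_tilde_delayed * x_reg_delayed        # Eq. 5, delayed update

Both steps cost the same number of multiplications in the SISO case; the only difference is whether the channel model acts on the input history or on the error history.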

Figure 2: FIR model of the channel and the corresponding adjoint model. The adjoint system is found by reversing the flow direction, swapping summing junctions with branching points, and replacing delays with advances.

Adjoint LMS is clearly a simple modification of filtered-x LMS. For SISO systems the computational complexity of adjoint LMS and filtered-x LMS is identical. The real advantage comes when dealing with MIMO systems. In this case the adaptive filters are represented by an L x P matrix of transfer functions W(q^{-1}, k) and the channel by a P x Q transfer function matrix C(q^{-1}, k). Filtered-x LMS does not generalize directly, since matrices do not commute and it makes no sense to filter the input X by C, since the dimensions may not even match. The Multiple Error LMS algorithm, proposed by Elliott et al. [1, 2], solves this by effectively applying filtered-x LMS to all possible SISO paths in the MIMO system, and can be written as

    w_{lp}(k+1) = w_{lp}(k) + \mu \tilde{X}^T_{lp}(k) e(k)    (7)

for 1 \le l \le L and 1 \le p \le P, where there is now a filtered matrix of inputs for each filter w_{lp}, formed as

    \tilde{X}^T_{lp}(k) = [ \tilde{x}_{lp1}(k)  \tilde{x}_{lp2}(k)  ...  \tilde{x}_{lpQ}(k) ]    (8)

with each row of \tilde{X}_{lp}(k) found by filtering the input through the corresponding secondary-path model:

    \tilde{x}_{lpq}(k) = \hat{C}_{pq}(q^{-1}, k) x_l(k).    (9)

The implementation of Multiple Error LMS results in a total of L x P x Q filter operations. In the case of adjoint LMS, however, we encounter no such problem. The equations generalize directly:

    w_{lp}(k+1) = w_{lp}(k) + \mu \tilde{e}_p(k - M_2) x_l(k - M_2),    \tilde{e}(k) = \hat{C}(q^{+1}, k) e(k).    (10)

Here we note that the output error e has dimension Q (the number of channel outputs), whereas the error \tilde{e} after filtering through the adjoint MIMO channel model has dimension P (the number of primary filter outputs), as desired. The clear advantage of this form is that operations remain order N, where N is the total number of filter parameters (compare the weight-update matrix operation in Equation 7 to the vector operation in Equation 10). Table 1 gives a comparison of multiplications for some specific parameter values.

Table 1: Comparison of computational complexity. Reference inputs L = 8, adaptive filter outputs P = 4, adaptive filter taps M_1 = 6, channel outputs Q = 8, channel model taps M_2 = 6.

    Adjoint LMS (multiplications)
      \tilde{e}(k):      P * Q * M_2           =   192
      weight updates:    L * P * M_1 * 2       =   384
      total                                        576

    Multiple Error LMS (multiplications)
      filtered inputs:   L * P * Q * M_2       = 1,536
      weight updates:    L * P * M_1 * (Q + 1) = 1,728
      total                                      3,264
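A hedged sketch of the MIMO update in Equation 10 follows (the array layout is an assumption made for this illustration): the Q-dimensional error history is collapsed through the adjoint channel model into a P-dimensional filtered error, and each filter w_lp is then updated with a scalar-times-vector operation, keeping the total cost at order N:

    import numpy as np

    def mimo_adjoint_lms_step(W, X_buf, E_buf, C_hat, mu):
        # W     : (L, P, M1)      adaptive filters w_lp
        # X_buf : (L, M1 + M2)    X_buf[l, j] = x_l(k - j)
        # E_buf : (Q, M2 + 1)     E_buf[q, j] = e_q(k - j)
        # C_hat : (P, Q, M2 + 1)  FIR coefficients of the channel model C^_pq(q^-1)
        L, P, M1 = W.shape
        M2 = C_hat.shape[-1] - 1
        # e~_p(k - M2) = sum_q sum_m C_hat[p, q, m] * e_q(k - M2 + m)  (adjoint filtering)
        e_tilde = np.einsum('pqm,qm->p', C_hat, E_buf[:, ::-1])
        # Delayed regressors x_l(k - M2), ..., x_l(k - M2 - M1 + 1)
        X_reg = X_buf[:, M2:M2 + M1]
        # w_lp <- w_lp + mu * e~_p(k - M2) * x_l(k - M2): an outer-product update, order N
        return W + mu * np.einsum('p,lm->lpm', e_tilde, X_reg)

For the parameter values of Table 1 this costs on the order of P*Q*M_2 multiplications for the filtered error plus roughly L*P*M_1 for the weight update, in contrast with the L*P*Q separate input filterings required by Multiple Error LMS.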

Figure 3: A comparison of instantaneous squared error learning curves e^2(k) for Filtered-x LMS (top panel) and Adjoint LMS (bottom panel). Channel: C(q^{-1}) = q^{-2}(1 - 2q^{-2}) (a delayed bandpass). Channel model: \hat{C}(q^{-1}) = q^{-2}(1 - 2.5q^{-2}). Input: x(k) = white noise, Normal(0,1). Adaptive filter: w, FIR with 6 taps. Desired response: d(k) = W^*(q^{-1}) C(q^{-1}) x(k) + n(k), where W^* = (1 + .5q^{-1})/(1 + .75q^{-1}) is the optimal primary filter and n(k) is white noise, Normal(0,0.2). Learning rate: \mu = 0.01.
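For readers who want to reproduce a curve like Figure 3, the following is a self-contained sketch of the adjoint LMS experiment using the parameters listed in the caption (our own reconstruction; the buffering, random seed, and the reading of "Normal(0,0.2)" as a variance are assumptions, not the author's code):

    import numpy as np
    from scipy.signal import lfilter

    rng = np.random.default_rng(0)
    K, M1, mu = 500, 6, 0.01
    c = np.array([0.0, 0.0, 1.0, 0.0, -2.0])      # C(q^-1)  = q^-2 (1 - 2 q^-2)
    c_hat = np.array([0.0, 0.0, 1.0, 0.0, -2.5])  # C^(q^-1) = q^-2 (1 - 2.5 q^-2)
    M2 = len(c_hat) - 1

    x = rng.normal(0.0, 1.0, K)                   # input, Normal(0, 1)
    noise = rng.normal(0.0, np.sqrt(0.2), K)      # n(k); 0.2 treated as a variance
    # d(k) = W*(q^-1) C(q^-1) x(k) + n(k), with W* = (1 + .5 q^-1)/(1 + .75 q^-1)
    d = lfilter([1.0, 0.5], [1.0, 0.75], lfilter(c, [1.0], x)) + noise

    w = np.zeros(M1)
    x_buf = np.zeros(M1 + M2)                     # [x(k), x(k-1), ...]
    y_buf = np.zeros(M2 + 1)                      # filter outputs for the channel
    e_buf = np.zeros(M2 + 1)                      # [e(k), e(k-1), ..., e(k-M2)]
    err = np.zeros(K)
    for k in range(K):
        x_buf = np.r_[x[k], x_buf[:-1]]
        y_k = np.dot(w, x_buf[:M1])                       # y(k) = w^T(k) x(k), Eq. 1
        y_buf = np.r_[y_k, y_buf[:-1]]
        e_k = d[k] - np.dot(c, y_buf)                     # e(k) = d(k) - C(q^-1) y(k), Eq. 2
        e_buf = np.r_[e_k, e_buf[:-1]]
        err[k] = e_k
        e_tilde_delayed = np.dot(c_hat, e_buf[::-1])      # e~(k - M2), Eq. 6
        w = w + mu * e_tilde_delayed * x_buf[M2:M2 + M1]  # Eq. 5, delayed update

    # err**2 gives the instantaneous squared-error learning curve of Figure 3.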

3. INTERPRETATIONS

In gradient descent algorithms, weights are moved in the direction of the negative gradient -\partial J / \partial w, where J is a cost function. For filtering, J is typically a sum of a sequence of squared errors e^2(k), and the error gradient can be expanded as a sum of instantaneous error gradients:

    \frac{\partial J}{\partial w} = \sum_{k=1}^{K} \frac{\partial e^2(k)}{\partial w}.

Standard LMS, filtered-x LMS, and Multiple Error LMS are all based on iteratively updating the weights in the direction of the instantaneous negative gradient -\partial e^2(k) / \partial w.

While adjoint LMS is still a stochastic gradient descent algorithm, it is not based on the instantaneous gradient. Instead, consider the chain-rule expansion

    \frac{\partial J}{\partial w} = \sum_{k=1}^{K} \frac{\partial J}{\partial y(k)} \frac{\partial y(k)}{\partial w} = \sum_{k=1}^{K} \frac{\partial e^2(k)}{\partial w}.

Individual terms in the two sums are not equivalent; only the total sum over all time is. Adjoint LMS stochastically updates the filter weights based on this new expansion, which leads to the more computationally efficient form. \partial J / \partial y(k) is interpreted as the change in the total error over all time due to a change in the filter output at time k. This gradient term is precisely the filtered error term \tilde{e}(k) in the equations for adjoint LMS (\frac{\partial J}{\partial y(k)} \frac{\partial y(k)}{\partial w} = \tilde{e}(k) x(k) in Equation 5). The derivation of this is detailed in [4, 5] in the original context of neural networks. In fact, an additional advantage of the algorithm is that it can be generalized to the case where both the primary filter and the channel are modeled with nonlinear filters. An approach based on filtered-x LMS, on the other hand, does not apply, since in general nonlinear systems do not commute, even in the SISO case.
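To see why the filtered error plays the role of this gradient, here is a short sketch in the spirit of the cited derivations [4, 5] (our paraphrase, assuming a fixed SISO FIR channel approximated by its model \hat{C}(q^{-1}) = \sum_{m=0}^{M_2} \hat{c}_m q^{-m}). With J = \sum_k e^2(k) and e(k) \approx d(k) - \sum_{m=0}^{M_2} \hat{c}_m y(k-m), the output y(k) affects only the errors e(k), ..., e(k+M_2), so

    \frac{\partial J}{\partial y(k)} = \sum_{m=0}^{M_2} \frac{\partial e^2(k+m)}{\partial y(k)}
                                     = -2 \sum_{m=0}^{M_2} \hat{c}_m \, e(k+m)
                                     = -2\, \hat{C}(q^{+1})\, e(k).

Up to the factor of 2 absorbed into \mu, this is the filtered error \tilde{e}(k) of Equation 6, and since \partial y(k) / \partial w = x(k), the update direction is \tilde{e}(k) x(k) as in Equation 5. The dependence on the future errors e(k+m) is exactly what makes the adjoint filter noncausal and motivates the M_2-sample delay in the on-line update.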

Figure 4: A comparison of misadjustment (excess MSE / minimum MSE) versus the learning parameter \mu.

4. SIMULATIONS AND CONCLUSIONS

A SISO simulation illustrating the similar performance of adjoint LMS and filtered-x LMS is shown in Figure 3. The learning curves are remarkably alike even though the individual stochastic weight updates are not identical. Our conclusion is that adjoint LMS has a rate of convergence and misadjustment equivalent to filtered-x LMS, with a substantial computational savings over Multiple Error LMS. The only trade-off, in certain cases, is a slightly tighter stability restriction on the learning parameter, as might be expected due to the delayed weight update [3]. This causes a slight increase in misadjustment for large learning parameters, as illustrated in Figure 4. Fortunately, this occurs beyond the desirable operating range for the algorithms. A second simulation, a MIMO noise cancellation experiment, is detailed in Figure 5 and Figure 6. Again, the similar performance of adjoint LMS and Multiple Error LMS is observed. Note that in this case, adjoint LMS exhibits the greater range of stability versus learning rate. While a full analysis is yet to be completed, experiments also indicate equivalent performance with regard to eigenvalue spread of the inputs and errors in channel modeling.

Figure 5: A comparison of instantaneous squared error learning curves e^2(k) for a MIMO system (top panel: Multiple Error LMS, bottom panel: Adjoint LMS) with L = 2 inputs, P = 2 filter outputs, and Q = 3 secondary channel outputs. Inputs: x_1(k) = cos(k/5) and x_2(k) = cos(k/8). Primary channel P(q^{-1}) from the inputs to the secondary channel output locations: P_{1c}(q^{-1}) = (.9q^{-1})^{c-1}(1 - .9^c q^{-1}), P_{2c}(q^{-1}) = (.8q^{-1})^{c-1}(1 - .9^c q^{-1}), for c = 1, 2, 3. Desired output (disturbance) at the channel outputs: d(k) = P(q^{-1}) x(k) + n(k), where n_c(k) is white noise, Normal(0,0.01). Secondary channel model: \hat{C}_{pc}(q^{-1}) = q^{-p-c+1}, p = 1, 2, c = 1, 2, 3. Adaptive filters w_{lp}: all FIR of order 6. Learning rate: \mu = 0.01. Such a system represents a simple propagation of harmonics which arrive at different sensors with increasing distance and attenuation. The secondary channel model corresponds to pure delays associated with the different locations of the filter outputs.

Figure 6: A comparison of MSE = E[e_1^2(k) + e_2^2(k) + e_3^2(k)] versus the learning parameter \mu. Note that in this example, Adjoint LMS exhibits a greater range of stability versus the learning parameter.

5. REFERENCES

[1] S. Elliott, I. Stothers, and P. Nelson. A Multiple Error LMS Algorithm and Its Application to the Active Control of Sound and Vibration. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-35, No. 10, October 1987.

[2] S. Elliott and P. Nelson. Active Noise Control. IEEE Signal Processing Magazine, October 1993.

[3] G. Long, F. Ling, and J. Proakis. The LMS algorithm with delayed coefficient adaptation. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 9, pages 1397-1405, September 1989.

[4] E. Wan. Finite Impulse Response Neural Networks with Applications in Time Series Prediction. Ph.D. dissertation, Stanford University, 1993.

[5] E. Wan and F. Beaufays. Diagrammatic Derivation of Gradient Algorithms for Neural Networks. Neural Computation, Vol. 8, No. 2, 1995.

[6] B. Widrow and S. Stearns. Adaptive Signal Processing. Prentice-Hall, 1985.