Expected-Likelihood versus Maximum-Likelihood Estimation for Adaptive Detection with an Unconditional (Stochastic) Gaussian Interference Model

Y.I. Abramovich, ISRD, DSTO, PO Box 1500, Edinburgh SA 5111, Australia
N.K. Spencer, CSSIP, SPRI Building, Mawson Lakes SA 5095, Australia
A.Y. Gorokhov, QT & V, Qualcomm Inc., 5775 Morehouse Drive, San Diego CA 92121-1714, USA

Abstract—We demonstrate that for small sample support, the maximum-likelihood criterion typically leads to solutions whose likelihood is too high to be generated by the true model parameters. We introduce a new class of “expected-likelihood” estimates that yield a likelihood that is “most probable” for the true parameters. When these new estimates are used by adaptive detection algorithms, they give significantly better detection performance compared with the maximum-likelihood ones.

1 Introduction

In most practical applications, adaptive detection (and/or detection-estimation) systems must operate with very small training sample sizes. Nevertheless, the traditional system design is based on the optimal decision rule derived for the known set of interference parameters, whereby they are replaced by their maximum-likelihood (ML) estimates computed from the training (secondary) data. Here we consider an M-sensor array and the M-variate (unconditional) stochastic Gaussian model of the interference [1], whose p.d.f. is therefore completely specified by the M-variate covariance matrix R_0. The set of (identifiable) parameters θ that specify R_0 is a priori unknown, and the parameter estimates are computed from the N i.i.d. training samples X_N = [x_1, …, x_N], x_j ∈ C^M, with N ≥ M. Since any multiple of the likelihood function by an arbitrary function of X_N is also a likelihood function (LF), we consider the likelihood ratio (LR)

  LR(R; X_N) = det(R^{-1} R̂) exp{M − tr(R^{-1} R̂)},   R̂ = (1/N) Σ_{j=1}^{N} x_j x_j^H,   (1)

as a LF. While we have LR(R̂; X_N) = 1 for the sample covariance matrix R̂ (the unconstrained ML estimate), for the true (exact) covariance matrix R_0 the p.d.f. of LR(R_0; X_N) is specified only by M and N, since

  Ĉ ≡ R_0^{-1/2} R̂ R_0^{-1/2},   N Ĉ ∼ CW(N, M, I_M),   (2)

does not depend on R_0 (it is scenario-free). Specifically, its Q-th moment is [2]

  E{LR^Q(R_0; X_N)} = [e^Q N^N / (N+Q)^{N+Q}]^M Π_{j=1}^{M} Γ(N−j+1+Q) / Γ(N−j+1).   (3)

Most importantly, for N ∼ M this p.d.f. is concentrated around extremely low values (LR(R_0; X_N) ≪ 1, see Fig. 2). Therefore the ML estimate R̂ is extremely far from the true covariance matrix R_0, even in terms of the likelihood ratio/function metric, which means that other types of estimates should be considered. Since the class of estimates R̂(θ), such that

  LR[R̂(θ); X_N] ≥ LR(R_0; X_N),   (4)

is, in terms of proximity (“accuracy”) to the true set of parameters, statistically equivalent to the ML estimate, two important proposals can now be made.

Proposal 1  For ML estimation with finite sample support N, searching for the global ML solution is not required providing that a solution R̂(θ̂) is found that satisfies

  LR[R̂(θ̂); X_N] ≥ LR_0^{(1−δ)}(M, N),   (5)

where LR_0^{(1−δ)}(M, N) is the (1−δ)-quantile of the scenario-free p.d.f. of LR(R_0; X_N), since with probability (1−δ) such a search cannot result in a better accuracy.

Proposal 2  Instead of ML estimates, we can employ “expected likelihood” (EL) estimates θ̂_EL:

  θ̂_EL = arg min_θ ‖LR[R̂(θ); X_N] − LR_0(M, N)‖,   (6)

where ‖·‖ is any appropriate norm and LR_0(M, N) is produced by the scenario-free p.d.f. f_0(LR) of LR(R_0; X_N). Specifically, for the continuous parameters in θ (such as a loading factor) we adopt the median value:

  LR[R̂(θ̂_EL); X_N] = LR_0^{med}(M, N),   ∫_0^{LR_0^{med}} f_0(LR) dLR = 1/2;   (7)

in other words, we try to select the set of estimates that generate the LR value that is “most probable” for the exact covariance matrix. Obviously, slightly different “proximity conditions” can be suggested; however the main idea is clear: find the set of estimates that, in terms of the LR metric, is “as close as possible” to the (unknown) set of true parameters. Since LR[R̂(θ̂_EL); X_N] approaches LR(R_0; X_N) as N → ∞, the sequence of EL estimates “may be as good as maximum likelihood from a practical point of view” [3] (p. 77), and the asymptotic properties of the EL estimates are the same as for the ML ones. In the preasymptotic domain (N ∼ M), the EL estimates may have better detection performance than the traditional ML ones within the adaptive matched filter (AMF) methodology.
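Because the p.d.f. in (2) is scenario-free, the quantiles and moments needed in (5)–(7) can be precalculated once for given M and N. The following minimal Python sketch is our own illustration, not taken from the paper; it assumes the per-sample LR normalization of (1), checks the Monte-Carlo mean of LR(R_0; X_N) against the closed-form moment (3), and reports the empirical median used by the EL criterion (7).

```python
import numpy as np
from scipy.special import gammaln

def lr_true_samples(M, N, n_trials, rng):
    """Monte-Carlo samples of LR(R_0; X_N); scenario-free, so R_0 = I is used."""
    lr = np.empty(n_trials)
    for t in range(n_trials):
        # N i.i.d. CN(0, I_M) snapshots -> sample covariance C = (1/N) Z Z^H
        Z = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
        C = Z @ Z.conj().T / N
        _, logdet = np.linalg.slogdet(C)
        lr[t] = np.exp(logdet + M - np.trace(C).real)
    return lr

def lr_moment(M, N, Q):
    """Closed-form Q-th moment of LR(R_0; X_N), following (3)."""
    j = np.arange(1, M + 1)
    log_m = M * (Q + N * np.log(N) - (N + Q) * np.log(N + Q))
    log_m += np.sum(gammaln(N - j + 1 + Q) - gammaln(N - j + 1))
    return np.exp(log_m)

rng = np.random.default_rng(0)
M, N = 12, 24                                      # the paper's example scenario
lr = lr_true_samples(M, N, 20000, rng)
print("Monte-Carlo mean of LR :", lr.mean())
print("closed-form moment, Q=1:", lr_moment(M, N, 1))
print("empirical median of LR :", np.median(lr))   # target value for the EL criterion (7)
```

For M = 12 and N = 24 both the mean and the median come out near 0.03, consistent with the very low LR values visible in Fig. 2.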

2 EL vs ML AMF Detectors

In order to clearly demonstrate the difference in detection performance of the new EL principle, let us consider a nonrestricted admissible set, which means that the conventional ML estimate belongs to this set. The well-known “nested” property of the ML ensures that for such models as the one with an adjustable signal subspace dimension, as in the “fast maximum likelihood” method [4], and the white-noise-present (i.e. diagonally loaded) covariance matrix model, the result is always the maximum possible subspace dimension or “zero loading”, respectively, so that the ML covariance matrix for any of these models is the same sample matrix R̂. When introducing finite interference subspace methods [4] or loaded sample matrix inversion (SMI) algorithms [5, 6], it is typical to consider additional restrictions, such as restricting the white-noise mismatch losses of the adaptive filter. While it is well known that, say, properly loaded SMI generally has better SNR performance than plain SMI that uses the unloaded ML estimate R̂ [6], the connection between loading and the (maximum) likelihood principle has not been well understood.

Let us first consider a fluctuating target masked by interference that is homogeneous in both the training (secondary) and primary data. In this case, the AMF detector operates on the statistics

  η_AMF(θ) = |s^H R̂^{-1}(θ) y|^2 / [s^H R̂^{-1}(θ) s],   (8)

where s is the target signal wavefront (“steering vector”), y is the primary M-variate sample vector that may or may not contain a target, and R̂(θ) is a parametric set of covariance matrix estimates such that

  θ̂_ML = arg max_θ LR[R̂(θ); X_N],   (9)

which leads to the well-known ML-AMF method. The proposed EL approach is to search for some θ̂_EL such that

  EL-AMF:   LR[R̂(θ̂_EL); X_N] = LR_0^{med}(M, N),   (10)

where LR_0^{med}(M, N) is given by (6) and (7). In this case,

  LR(R_0; X_N) = det(Ĉ) exp{M − tr Ĉ},   (11)

where Ĉ, introduced in (2), is described by a scenario-free (complex Wishart) p.d.f. [7]. The required p.d.f. of LR(R_0; X_N) could be found by direct Monte-Carlo simulations using (11), or analytically by applying an inverse Mellin transform to the moment function in (3).

When the training data contains interference whose power may be different to that of the primary sample (nonhomogeneous training), the adaptive ACE detector [8] operates on the statistics

  η_ACE(θ) = |s^H R̂^{-1}(θ) y|^2 / { [s^H R̂^{-1}(θ) s] [y^H R̂^{-1}(θ) y] },   (12)

where, once again, θ̂ = θ̂_ML from (9) for the ML-ACE detector, whereas for the EL-ACE solution the parameter θ is determined from the scale-invariant (“sphericity”) LR

  LR_s(R; X_N) = det(R^{-1} R̂) / [M^{-1} tr(R^{-1} R̂)]^M

via

  EL-ACE:   LR_s[R̂(θ̂_EL); X_N] = LR_s^{med}(M, N).   (13)

According to (6), the value LR_s^{med}(M, N) is here specified by the scenario-free p.d.f. of LR_s(R_0; X_N) that was derived in [9]; this p.d.f. is given in closed form by (14)–(16) in terms of Meijer’s G-function [10].
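For concreteness, here is a small Python sketch of the two test statistics (8) and (12) as they would be evaluated for a given covariance matrix estimate; the function names are ours and the threshold handling is left to the caller.

```python
import numpy as np

def amf_statistic(R_hat, s, y):
    """AMF statistic (8): |s^H R^-1 y|^2 / (s^H R^-1 s)."""
    Ri_s = np.linalg.solve(R_hat, s)
    return np.abs(Ri_s.conj() @ y) ** 2 / np.real(Ri_s.conj() @ s)

def ace_statistic(R_hat, s, y):
    """ACE statistic (12): the AMF statistic additionally normalised by y^H R^-1 y."""
    Ri_y = np.linalg.solve(R_hat, y)
    return amf_statistic(R_hat, s, y) / np.real(Ri_y.conj() @ y)
```

In both cases a target is declared when the statistic exceeds a threshold set for the required false-alarm probability; the only difference between the ML-based and EL-based detectors is which parameter θ is plugged into R̂(θ).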

3 Detection Performance Comparison

To analyze the detection performance of our suggested technique, we first have to specify a scenario and the parametric family of covariance matrix estimates R̂(θ). Consider an M = 12 sensor uniform linear antenna array, m = 6 independent Gaussian interference sources, each with 30 dB signal-to-white-noise ratio (SWNR), and observations comprising N = 24 training samples. The interference DOAs were chosen so that the eigenspectrum of the interference covariance matrix

  R_0 = σ_0^2 I_M + Σ_{j=1}^{m} σ_j^2 s(θ_j) s^H(θ_j),   (18)

where σ_0^2 is the white-noise power and σ_j^2, s(θ_j) are the interference powers and wavefronts, is similar to the eigenspectrum of the terrain-scattered space-time covariance matrix in a side-looking airborne radar with three antenna sensors and four repetition periods [11].

Let us first analyze the detection performance for our example scenario with unknown target power and homogeneous interference conditions. For the clairvoyant case R̂(θ) = R_0, the ROC has the well-known analytic expression

  P_FA = exp(−h),   P_D = exp[−h / (1 + q_0)],   (19)

where h is the detection threshold and q_0 is the output SNR.

Two target DOAs θ_t have been selected to represent two extreme cases, namely

  ρ(θ_t) ≡ [s^H(θ_t) R_0^{-1} s(θ_t)] / [σ_0^{-2} s^H(θ_t) s(θ_t)] ≃ 1   and   ρ(θ_t) ≪ 1.   (20)

In the first case (“fast target” in STAP terminology), total interference mitigation is not accompanied by a significant degradation in target SWNR, whereas in the second case (“slow target”), such interference “nulling” leads to a dramatic signal power reduction. Note that for the clairvoyant detector (R̂(θ) = R_0), as well as for the standard AMF detectors, this distinction does not affect the receiver operating characteristic (ROC) if the output SNRs are identical:

  q_0 = σ_t^2 s^H(θ_t) R_0^{-1} s(θ_t),   (21)

where σ_t^2 is the target power.
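The covariance model (18) and the output SNR (21) are easy to assemble numerically. The sketch below is illustrative only: the half-wavelength ULA, the interference DOAs and the 30 dB per-source power are placeholders standing in for the (unlisted) values of the paper's scenario.

```python
import numpy as np

def ula_steering(M, theta_deg):
    """Half-wavelength ULA steering vector for a DOA measured from broadside."""
    return np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

def interference_covariance(M, doas_deg, inr_db, white_noise=1.0):
    """Covariance model (18): white noise plus m rank-one interference terms."""
    R0 = white_noise * np.eye(M, dtype=complex)
    for theta, inr in zip(doas_deg, inr_db):
        s_j = ula_steering(M, theta)
        R0 += white_noise * 10.0 ** (inr / 10.0) * np.outer(s_j, s_j.conj())
    return R0

M = 12
R0 = interference_covariance(M, doas_deg=[-40, -25, -10, 5, 20, 35], inr_db=[30] * 6)
s_t = ula_steering(M, 12.0)                    # hypothetical target DOA
sigma_t2 = 1.0                                 # target power
q0 = sigma_t2 * np.real(s_t.conj() @ np.linalg.solve(R0, s_t))   # output SNR (21)
print("output SNR:", 10 * np.log10(q0), "dB")
```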

Whether this property survives for the EL-AMF has yet to be explored. For this purpose, we selected the two well-known families of covariance matrix estimates, 



  R̂(β) = R̂ + β I_M,   β ≥ 0,   (22)

which is the diagonally loaded sample matrix family [5, 6], and

  R̂(k) = σ̂_0^2(k) I_M + Σ_{j=1}^{k} [λ_j − σ̂_0^2(k)] u_j u_j^H,   k = 0, 1, …, M − 1,   (23)

  σ̂_0^2(k) = (M − k)^{-1} Σ_{j=k+1}^{M} λ_j,   (24)

which is the finite interference subspace approximation, or “fast maximum likelihood” (FML), estimate family [4]; here λ_1 ≥ … ≥ λ_M and u_1, …, u_M denote the eigenvalues and eigenvectors of the sample matrix R̂. Unlike the traditional loaded sample matrix inversion (LSMI) and FML techniques, whereby the loading factor β in (22) or the interference subspace dimension k in (23) are based on some additional considerations, for our proposed EL-AMF technique these parameters are to be optimized against the above criteria for every given training sample X_N.

Calculated over 10^7 Monte-Carlo trials, Fig. 2 presents the scenario-free p.d.f. for the “general” test (11). Together with the very similar p.d.f. for the “sphericity” test (14), they give us the median LR values used in (10) and (13) respectively. For our case of M = 12, N = 24, these p.d.f.'s show that the probability of the exact covariance matrix generating a LR above 0.1 is only about 10^{-4}! Clearly, the ML estimate, whose LR is equal to one, is completely inappropriate in terms of the LR metric. This was actually the most important factor in our development of the EL estimate.
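In practice the EL optimization reduces to a one-dimensional search: a root search in the loading factor β for the family (22), and a discrete search over the rank k for the family (23)–(24). The sketch below is our own reading of that procedure (lr_median is the precalculated scenario-free median, e.g. from the Monte-Carlo sketch in Section 1; function names are illustrative):

```python
import numpy as np
from scipy.optimize import brentq

def lr_of_estimate(R_est, R_hat, M):
    """Per-sample LR (1) of a candidate estimate R_est against the sample matrix R_hat."""
    A = np.linalg.solve(R_est, R_hat)
    _, logdet = np.linalg.slogdet(A)
    return np.exp(logdet + M - np.trace(A).real)

def el_loading(R_hat, lr_median):
    """EL over the loaded family (22): solve LR[R_hat + beta*I; X_N] = lr_median for beta."""
    M = R_hat.shape[0]
    f = lambda b: lr_of_estimate(R_hat + b * np.eye(M), R_hat, M) - lr_median
    b_hi = 1.0
    while f(b_hi) > 0:              # LR falls monotonically from 1 towards 0 as loading grows
        b_hi *= 10.0
    return brentq(f, 1e-9, b_hi)

def el_subspace(R_hat, lr_median):
    """EL over the FML family (23)-(24): pick the rank whose LR is closest to lr_median."""
    M = R_hat.shape[0]
    lam, U = np.linalg.eigh(R_hat)
    lam, U = lam[::-1], U[:, ::-1]                    # eigenvalues in decreasing order
    best = None
    for k in range(M):
        sigma0 = lam[k:].mean()                       # (24): average of the discarded eigenvalues
        d = np.concatenate([lam[:k], np.full(M - k, sigma0)])
        R_k = (U * d) @ U.conj().T                    # (23): rank-k-plus-white-noise estimate
        err = abs(lr_of_estimate(R_k, R_hat, M) - lr_median)
        if best is None or err < best[0]:
            best = (err, k, R_k)
    return best[1], best[2]
```

The loading search relies on the fact that the LR of the loaded estimate decreases monotonically from one towards zero as β grows, so the EL solution of (10) is unique; for the discrete FML family only the closest achievable LR can be selected, as discussed in the conclusions.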


Our analysis starts from a comparison of the traditional ML-AMF and clairvoyant methods with our EL-AMF method. We use the clairvoyant ROC function

  P_D = P_FA^{1/(1+q_0)},   (25)

which follows from (19), to validate our Monte-Carlo simulations, with false-alarm rates set at 10^{-4}, and 10^4 trials for each signal scenario. The signal powers for the two (extreme) target scenarios in (20) were chosen so that the output SNRs (21) are the same (0 – 35 dB). Fig. 3 illustrates the ROC for the target with the high ratio in (20) (“fast target”); figures calculated for the low-ratio (“slow”) target are practically the same. We first note the ideal coincidence between the theoretically calculated ROCs for the clairvoyant case (curves labeled “theoretical”) and those obtained by Monte-Carlo simulations (“ideal”). This high accuracy validates the other Monte-Carlo results.

The most important observation involves the comparison of the performance of the standard AMF (ML-AMF) technique with our EL-AMF method (separately optimizing the loading factor and the interference subspace dimension). We see that EL-AMF gives a significant improvement; for example, standard AMF suffers a 5 dB loss in SNR compared with the clairvoyant case, while the new EL-AMF has only 1 – 1.5 dB losses. As expected, over the entire range of false-alarm and detection probabilities analyzed, only a negligible difference is found between the optimum-loading and optimum-subspace methods, again proving the common nature of these two techniques [4]. Also, we demonstrated that the ROCs are exclusively specified by the output SNR, irrespective of the target and interference signal correlation in (20).

Let us now turn our attention to the EL-AMF technique for the nonhomogeneous scenario, where the ACE statistics (12) are used; in this case the parameters are selected by the EL-ACE criterion (13). Fig. 4 presents the ROC for the “fast” target; the ROC for the “slow” target is practically indistinguishable. For this model, the original approximately 3 dB loss factor for ML-AMF with respect to the clairvoyant case (Fig. 4) is reduced to less than 1 dB by the optimum choice of loading factor or interference subspace dimension within the EL-AMF technique. Fig. 1 shows a sample p.d.f. for the EL-optimum loading factor. While it may be argued that data-independent diagonal loading may result in similar performance, the most important distinction between EL-AMF and the standard LSMI algorithm is the direct data-dependent choice of the loading factor, based on the EL criterion rather than on the considerations typically used to justify the LSMI technique [12].
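The validation step can be reproduced in a few lines: simulate the clairvoyant detector for a Rayleigh-fluctuating target and compare the empirical detection probability with the analytic ROC (25). This is an illustrative sketch (threshold taken from the exponential false-alarm law in (19)), not the authors' simulation code.

```python
import numpy as np

def clairvoyant_roc(p_fa, q0):
    """Analytic clairvoyant ROC (25): P_D = P_FA^(1/(1+q0))."""
    return p_fa ** (1.0 / (1.0 + q0))

def clairvoyant_pd_mc(R0, s, q0_db, p_fa, n_trials, rng):
    """Monte-Carlo detection probability of the clairvoyant AMF (R_hat = R0)."""
    M = R0.shape[0]
    Ri_s = np.linalg.solve(R0, s)
    denom = np.real(s.conj() @ Ri_s)
    h = -np.log(p_fa)                           # threshold from P_FA = exp(-h), cf. (19)
    sigma_t2 = 10.0 ** (q0_db / 10.0) / denom   # target power giving the output SNR (21)
    L = np.linalg.cholesky(R0)
    n_det = 0
    for _ in range(n_trials):
        noise = L @ (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
        a = np.sqrt(sigma_t2 / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
        eta = np.abs(Ri_s.conj() @ (a * s + noise)) ** 2 / denom   # statistic (8) with R_hat = R0
        n_det += eta > h
    return n_det / n_trials

# e.g., with R0 and s_t constructed as in the earlier ULA sketch:
# rng = np.random.default_rng(1)
# print(clairvoyant_pd_mc(R0, s_t, 20.0, 1e-4, 200000, rng), clairvoyant_roc(1e-4, 100.0))
```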

















4 Summary and Conclusions

We have introduced an approach called “expected likelihood” (EL), whereby we seek the estimate that statistically generates the same LR as the exact covariance matrix. This is feasible in practice since the p.d.f. of the LR generated by the exact covariance matrix does not depend on the matrix itself, but only on the parameters M and N, and so it can be precalculated.

We have used two well-known families of covariance matrix estimates: the diagonally loaded sample matrix (i.e. the loaded unconstrained ML solution) and the finite-subspace interference approximation of the ML solution. For these estimates respectively, the traditional ML criterion drives the loading factor to zero, and the interference subspace dimension to its maximum. For the EL-AMF (ACE) technique, we seek the loaded solution that generates the median LR value produced by the exact covariance matrix. For finite-rank approximations that have only a finite number of solutions, we simply find the one that is closest to the upper LR bound, if no solution within the bounds is available.

The Monte-Carlo simulation results demonstrate that our EL criterion for the proper families (diagonally loaded, finite interference rank) gives a reasonable improvement in detection performance compared with the ML criterion, which for small sample support produces solutions far away from the exact ones. We emphasize that the introduced families include the standard (unconstrained) ML covariance matrix estimate, while the major distinction stems from the attempt to get a LR statistically close to that of the exact covariance matrix, rather than just the (ultimate) maximum LR value. This is an important distinction from some optimum search over a restricted set of covariance matrices, such as the class of Toeplitz covariance matrices, for example. Any reliable a priori structural information on the covariance matrix should always lead to a detection improvement; however, we chose the most generic families specifically to underline the difference in criteria (EL versus ML), rather than any possible difference in covariance matrix description.

References

[1] B. Ottersten, M. Viberg, P. Stoica, and A. Nehorai, “Exact and large sample maximum likelihood techniques for parameter estimation and detection in array processing,” in Radar Array Processing, S. Haykin, J. Litva, and T. Shepherd, Eds. Berlin: Springer-Verlag, 1993, pp. 99–151 (chapter 4, Springer Series in Information Sciences, vol. 25).
[2] Y. Abramovich, N. Spencer, and A. Gorokhov, “Two-set adaptive GLRT and AMF detection: General approach and Gaussian model study,” IEEE Trans. Aero. Elect. Sys., submitted 31 Jan 2005.
[3] B. Porat, Digital Processing of Random Signals. New Jersey: Prentice-Hall, 1994, 5th edition.
[4] K. Gerlach, “Outlier resistant adaptive matched filtering,” IEEE Trans. Aero. Elect. Sys., vol. 38, no. 3, pp. 885–901, July 2002.
[5] Y. Abramovich, “A controlled method for adaptive optimization of filters using the criterion of maximum SNR,” Radio Eng. Electron. Phys., vol. 26, no. 3, pp. 87–95, 1981.
[6] O. Cheremisin, “Efficiency of adaptive algorithms with regularised sample covariance matrix,” Radio Eng. Electron. Phys., vol. 27, no. 10, pp. 69–77, 1982.
[7] M. Siotani, T. Hayakawa, and Y. Fujikoshi, Modern Multivariate Statistical Analysis. Ohio: Amer. Sci. Press, 1985.
[8] L. McWhorter, L. Scharf, and L. Griffiths, “Adaptive coherence estimation for radar signal processing,” in Proc. ASILOMAR-96, vol. 1, Pacific Grove, CA, USA, 1996, pp. 536–540.
[9] Y. Abramovich, N. Spencer, and A. Gorokhov, “Bounds on maximum likelihood ratio — Part I: Application to antenna array detection-estimation with perfect wavefront coherence,” IEEE Trans. Sig. Proc., vol. 52, no. 6, pp. 1524–1536, June 2004.
[10] I. Gradshteyn and I. Ryzhik, Tables of Integrals, Series, and Products. NY: Academic Press, 2000.
[11] R. Klemm, Space-Time Adaptive Processing: Principles and Applications. UK: IEE, 1998.
[12] O. Cheremisin, “Loading factor selection in the regularised algorithm for adaptive filter optimisation,” Radiotekhnika i Elektronika, vol. 30, no. 12, 1985 (English translation in Soviet Journal of Communication Technology and Electronics).

Figure 1: Optimum loading histogram for the EL-AMF method in the nonhomogeneous interference case (ULA, M = 12, m = 6, N = 24, 30 dB SNR, 30 dB output SNR, 10^4 trials, low-ratio target; horizontal axis: optimum loading, vertical axis: sample probability).

Figure 2: P.d.f. of the LR for the general non-sphericity test (ULA, M = 12, N = 24, 10^7 trials); the median value satisfies P(LR < 0.0257) = 0.5.

Figure 3: EL-AMF ROC for the high-ratio target with a false-alarm probability of 10^{-4} (matched, homogeneous interference; ULA, M = 12, m = 6, N = 24, 30 dB SNR, 10^4 trials; curves: EL-AMF (loading), ML-AMF, ideal, EL-AMF (subspace), theoretical).

Figure 4: Nonhomogeneous EL-AMF ROC for the high-ratio target with a false-alarm probability of 10^{-4} (ULA, M = 12, m = 6, N = 24, 30 dB SNR, 10^4 trials; curves: EL-AMF (loading), ML-AMF, ideal, EL-AMF (subspace), theoretical).