A Support Vector Data Description Approach to Target Detection in Hyperspectral Imagery

Wesam A. Sakla(a), Adel A. Sakla(b), Andrew Chan(a)

(a) Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840
(b) Department of Electrical and Computer Engineering, University of South Alabama, Mobile, AL 36688-0002

ABSTRACT

Spectral variability remains a challenging problem for target detection and classification in hyperspectral imagery (HSI). In this paper, we have applied the nonlinear support vector data description (SVDD) to perform full-pixel target detection. Using a pure target signature, we have developed a novel pattern recognition (PR) algorithm to train an SVDD to characterize the target class. We have inserted target signatures into an urban hyperspectral (HS) scene with varying levels of spectral variability to explore the performance of the proposed SVDD target detector in different scenarios. The proposed approach makes no assumptions regarding the underlying distribution of the scene data, as do traditional statistical detectors such as the matched filter (MF). Detection results in the form of confusion matrices and receiver operating characteristic (ROC) curves demonstrate that the proposed SVDD-based algorithm is highly accurate and yields higher true positive rates (TPR) and lower false positive rates (FPR) than the MF.

Keywords: automatic target recognition, support vector data description, hyperspectral imagery, target detection.

1. INTRODUCTION

Automatic target recognition (ATR) has made significant strides with the advent of hyperspectral imaging (HSI) sensors. ATR systems should be able to detect, classify, recognize, and/or identify targets in an environment where the background is cluttered and targets are at long distances and may be partially occluded, degraded by weather, or camouflaged [1]. HSI sensors provide ample spectral information to uniquely identify materials by their reflectance spectra, where a material's reflectance spectrum contains its reflectance values as a function of wavelength. Although it is theoretically possible for two completely different materials to exhibit the same spectral signature, targets in ATR applications are typically man-made objects with spectra that differ considerably from the spectra of natural background materials [2].

In HSI target detection applications, targets are sparse and typically occupy less than 1% of the total pixels in a hyperspectral scene, rendering traditional spatial processing techniques impractical. Consequently, most HSI detection algorithms exploit the spectral information of the scene, an approach known as nonliteral exploitation in the HSI literature [3].

One of the main challenges in HSI processing is spectral variability, which refers to the phenomenon that spectra observed from samples of the same material are never identical. In other words, spectra of the same material are not fixed, due to inherent variations in the material itself; further spectral variability is introduced by external factors such as atmospheric conditions, sensor noise, and illumination variations [2, 3, 4]. While many detection algorithms have been developed over the years, spectral variability poses challenges for all of them. Although statistical detectors are mathematically tractable and can work well in some situations, they are optimal only under the assumption of multivariate normality of the data.
The quadratic Neyman-Pearson detector requires the covariance matrix of the target class, which is not available if one is given only a single spectral signature obtained from a library [2, 3]. The MF and adaptive matched filter (AMF) algorithms assume that the target and background covariance matrices are identical. In real-life scenarios, the multivariate normality assumption is often violated because a hyperspectral image may contain multiple types of terrain, causing detection performance to suffer [5].

Kernel methods have become increasingly popular in a variety of PR applications. The recently developed SVDD has its roots in statistical learning theory and is an emerging non-parametric approach for describing a set of data [6, 7].

Automatic Target Recognition XIX, edited by Firooz A. Sadjadi, Abhijit Mahalanobis, Proc. of SPIE Vol. 7335, 73350C · © 2009 SPIE · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.818642

The SVDD is connected with support vector machines (SVMs) and is capable of providing accurate descriptions of a dataset via the use of kernels [8]. The SVDD differs from the SVM in that it considers only samples belonging to the class of interest in order to provide a tight boundary around the data. It has been successfully applied to facial expression analysis [9], gene expression data clustering [10], image retrieval [11], remote sensing image classification [12], and HSI anomaly detection [13, 14].

In this paper, we use the SVDD to perform target detection in hyperspectral imagery. Given a target signature that we wish to identify in HSI scenery, we have developed a novel PR algorithm for training an SVDD for the target class. Experiments on urban HSI scenery confirm that the proposed SVDD-based method yields more accurate detection results than the matched filter (MF) detector. Section II provides the formulation of the SVDD. Section III outlines the proposed algorithm for training the SVDD target class. Section IV presents the experiments and results, and conclusions and future work are discussed in section V.

2. SUPPORT VECTOR DATA DESCRIPTION

The support vector data description models a class of data by fitting a hypersphere with center a and radius R around all or most of the samples. Assume that we are given a set of training samples {x_i, i = 1…N}. The SVDD aims to minimize the volume of the hypersphere by minimizing R². The task then becomes minimization of the following error function [7]:

F(R, a, ξ_i) = R² + C Σ_i ξ_i    (1)

with the added constraints that most of the training samples x_i lie within the hypersphere. These constraints are postulated as follows:

‖x_i − a‖² ≤ R² + ξ_i,  i = 1…N    (2)

The C parameter in (1) controls the tradeoff between the volume of the hypersphere and the number of target objects rejected [15]. The solution of (1) is obtained by solving the Lagrangian dual problem:

max_{a_i} { Σ_i a_i (x_i ⋅ x_i) − Σ_{i,j} a_i a_j (x_i ⋅ x_j) }    (3)

subject to 0 ≤ a_i ≤ C. After solving (3), only a subset of the training samples satisfies the equality in (2). These are the x_i with corresponding nonzero a_i and are called the support vectors, since they are the only samples needed to define the hypersphere boundary around the data. To determine whether a new object y lies in the SVDD, the distance from the center of the sphere to y must be less than R. Hence, y is deemed to belong to the class when the following inequality is satisfied [15]:

(y ⋅ y) − 2 Σ_i a_i (y ⋅ x_i) + Σ_{i,j} a_i a_j (x_i ⋅ x_j) ≤ R²    (4)

In many cases, fitting a hypersphere around the data in the original feature space does not provide a tight boundary. The nonlinear version of the SVDD implicitly maps the data from the input space to a higher-dimensional Hilbert feature space through a mapping function Φ(x). As a result, the problem becomes fitting a hypersphere around the data in the higher-dimensional feature space, which translates to a tighter, more accurate description of the boundary in the original feature space [7]. In the nonlinear SVDD, the inner products (x_i ⋅ x_j) found in (3) are replaced by a kernel function K(x_i, x_j) satisfying Mercer's theorem [8]. Accordingly, equation (4) becomes the following:

K(y, y) − 2 Σ_i a_i K(y, x_i) + Σ_{i,j} a_i a_j K(x_i, x_j) ≤ R²    (5)

Several different choices of kernel function exist. We use the well-known Gaussian radial basis function (RBF) kernel, which has only one free parameter to be tuned and has been shown to yield tighter boundaries than other kernel choices [7, 12, 15]. The RBF kernel is given by the following:

K(x, y) = exp(−‖x − y‖² / σ²)    (6)

In (6), σ is a free parameter that is adjusted to control the tightness of the boundary and is typically optimized through cross-validation [12, 13]. Using the fact that K(y, y) = 1 for the RBF kernel in (6), we can define a bias term that incorporates all constant terms in equation (5). The bias term is given by the following:

b = 1 + Σ_{i,j} a_i a_j K(x_i, x_j) − R²    (7)

After incorporating the bias term of (7) into equation (5) and some algebraic manipulation, we have the following SVDD decision function:

SVDD(y) = sgn( Σ_i a_i K(y, x_i) − b/2 )    (8)

Thus, an input signature y is predicted to be a target if its output is positive and predicted to be background if its output is negative.
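The training and decision steps above can be sketched in Python. This is an illustrative numpy/scipy implementation, not the authors' code; note that the standard SVDD dual of Tax and Duin [7] additionally carries the equality constraint Σ_i a_i = 1, which we include so the problem is well posed.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix, K(x, y) = exp(-||x - y||^2 / sigma^2) as in (6)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def train_svdd(X, C=0.1, sigma=1.0):
    """Solve the SVDD dual (3). Since K(x, x) = 1 for the RBF kernel, the linear
    term is constant, and maximizing (3) reduces to minimizing a^T K a subject
    to 0 <= a_i <= C and sum_i a_i = 1 (the standard equality constraint)."""
    n = len(X)
    K = rbf_kernel(X, X, sigma)
    res = minimize(
        fun=lambda a: a @ K @ a,
        x0=np.full(n, 1.0 / n),
        jac=lambda a: 2.0 * K @ a,
        bounds=[(0.0, C)] * n,
        constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0},
        method="SLSQP",
    )
    a = res.x
    # Boundary support vectors (0 < a_i < C) lie exactly on the hypersphere,
    # so (5) holds with equality there; average them for a stable R^2.
    cand = np.where((a > 1e-6) & (a < C - 1e-6))[0]
    if len(cand):
        R2 = float(np.mean(1.0 - 2.0 * (K @ a)[cand] + a @ K @ a))
    else:  # fallback: use the support vector with the largest weight
        s = int(np.argmax(a))
        R2 = 1.0 - 2.0 * (K @ a)[s] + a @ K @ a
    b = 1.0 + a @ K @ a - R2          # bias term (7)
    return a, b

def svdd_decision(y, X, a, b, sigma=1.0):
    """Decision function (8): +1 inside the description (target), -1 outside."""
    k = rbf_kernel(y[None, :], X, sigma)[0]
    return int(np.sign(a @ k - b / 2.0))
```

Training on a tight cluster of points should then accept a point near the cluster and reject a distant one.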

3. SVDD TARGET CLASS TRAINING ALGORITHM

3.1 Target Class Training Sample Generation

In target detection scenarios, we do not have access to a collection of samples characterizing the target class; we are typically given a pure target signature that is obtained from a spectral library. In this work, we investigate the creation of N training samples pertaining to the target class by drawing them from a normal (Gaussian) distribution as follows:

x ~ N_K[t, σ²I]    (9)

In (9), x is a generated training sample signature, t is the pure target signature, and σ² is the per-band noise variance. Hence, the generated target class training samples are drawn from a K-dimensional Gaussian distribution with mean μ = t and covariance matrix Γ = σ²I.

The number of training samples N and the variance σ² used in the generation of the training samples are free parameters that influence the trained SVDD. Large values of N are not desirable since the SVDD training time increases quadratically with N [13]. In empirical trials of varying N, we have found that 200 training samples provide reasonable training times, and we have fixed N at this value for the experiments in this paper. With respect to the per-band variance σ², small values may prove insufficient for describing the spectral variability of the target class; in contrast, large values will provide a loose description of the target class and will allow signatures from the background class to be described by the target SVDD, thus generating false alarms. In this work, we have used different values of σ² to explore the impact on the detection process, as outlined in section IV.

3.2 SVDD Parameter Selection

As shown in section II, use of the SVDD with the RBF kernel requires selection of the free parameters C and σ. In empirical trials varying C over the values 0.001, 0.01, and 0.1, we found that all values yielded identical results; in this work, we have used C = 0.01. Proper tuning of C is not critical in practical applications of the SVDD [15].


The kernel parameter σ has to be carefully tuned for successful operation of the SVDD. If σ is chosen too small, a large number of support vectors will be chosen, overfitting the training samples. In contrast, if σ is chosen too large, a relatively small number of support vectors will be chosen, underfitting the training samples and allowing too loose a description of the target class [7, 13]. From a set of candidate σ values, our goal is to select the best value for a given target and scene. In the proposed method, we construct a set of σ values between 0 and 2 in increments of 0.01, yielding 200 total candidate values. Values above 2 were found to yield an insufficiently small number of support vectors, leading to a poor description of the target class, and are therefore not considered. For each value, an SVDD is trained and applied to an independent validation set consisting of 200 target signatures and 1000 background signatures. The 200 target signatures are generated according to the model in equation (9); the 1000 background signatures are pixels randomly selected from the scene, which is reasonable since targets occur with such low probability. The σ value that yields the highest F-measure on the validation set is chosen as the optimal value. The F-measure is a measure of a detection test's accuracy that considers both the precision and recall of the test [16] and is defined as follows:

F = 2 ⋅ (precision ⋅ recall) / (precision + recall)    (10)

with the precision and recall given by the following:

precision = true positives / (true positives + false positives)    (11)

recall = true positives / (true positives + false negatives)    (12)
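The selection loop described above can be sketched as a grid search over the candidate σ values; here `validate` is a hypothetical callback standing in for training an SVDD at a given σ and counting its hits and misses on the validation set.

```python
import numpy as np

def f_measure(tp, fp, fn):
    """F-measure from confusion counts, per equations (10)-(12)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def select_sigma(candidates, validate):
    """Return the candidate sigma with the highest validation F-measure.
    `validate` maps a sigma to (tp, fp, fn) counts on the validation set."""
    scores = [f_measure(*validate(s)) for s in candidates]
    return candidates[int(np.argmax(scores))]

# 200 candidate values in (0, 2], as in the text (sigma = 0 itself is
# excluded since the RBF kernel is undefined there).
candidates = [0.01 * k for k in range(1, 201)]
```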

4. EXPERIMENTS & RESULTS

4.1 Data

A data cube of urban scenery has been acquired using a CASI sensor [17] that produces 36 bands ranging from 433 nm to 965 nm with a spectral resolution of 15 nm. The spatial dimensions of the scene are 200 x 200 pixels. In the scene, 200 pixels have been randomly selected and replaced with corrupted versions of the spectral signature of a tank, which is used as the target signature. The signatures have been corrupted by drawing them from a multivariate Gaussian distribution of the following form:

y_c = t + n ~ N_K[t, σ²I]    (13)

where y_c is a corrupted target signature, t is the pure target signature, n is additive stochastic noise, and σ² is the per-band noise variance. By inserting corrupted signatures generated in this fashion, we have introduced target spectral variability into the scenes. The variance σ² used in our experiments has been varied to achieve signal-to-noise ratios (SNR) of 10 dB, 15 dB, and 20 dB. The SNR is defined here as the RMS of the pure target signature divided by the standard deviation of the noise. Thus, from the original scene, three data cubes have been generated, each containing corrupted targets with a specific SNR. Hence, each data cube contains a different level of target spectral variability, ranging from light (20 dB) to moderate (15 dB) to heavy (10 dB).
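Given the SNR definition above (RMS of the pure signature over the noise standard deviation), the per-band σ for a desired SNR in dB follows directly, assuming the usual amplitude-ratio conversion SNR_dB = 20·log10(SNR); the function name and example signature here are ours.

```python
import numpy as np

def noise_sigma_for_snr(t, snr_db):
    """Per-band noise standard deviation yielding the requested SNR, where
    SNR = RMS(t) / sigma and SNR_dB = 20 * log10(SNR) (amplitude ratio)."""
    rms = np.sqrt(np.mean(np.asarray(t, dtype=float) ** 2))
    return rms / (10.0 ** (snr_db / 20.0))

# Example: for a signature with RMS = 1, a 20 dB SNR needs sigma = 0.1
sigma_20db = noise_sigma_for_snr(np.ones(36), 20.0)
```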

4.2 Procedure

The steps involved in applying our proposed method are as follows:

1. Create five SVDD training sets, each with a σ² yielding an SNR of 8 dB, 10 dB, 15 dB, 20 dB, or 25 dB.
2. For each training set, generate the independent validation set using the respective value of σ² to create the target samples.
3. Optimize the kernel parameter σ using the proposed method.
4. Train an SVDD using the optimized σ value and apply it to the scene.

4.3 Results

We have applied the algorithm presented in section III to the urban HSI data. Because target detection scenarios are essentially binary decision problems, we can measure the results of our SVDD detection scheme via a confusion matrix, or contingency table, as shown in Table 1 [16]. Recall that the confusion matrix has four categories: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

Table 1. Confusion Matrix

                       Actual Background   Actual Target
Predicted Background   TN                  FN
Predicted Target       FP                  TP

We have generated confusion matrices for both the validation data and the data cube. Table 2 through Table 6 show the detection results on the urban scene containing the lightest target spectral variability (20 dB). Each table corresponds to a particular σ² used to generate the target training set. In each cell of the confusion matrix, the left number corresponds to the validation data set, and the right number corresponds to the scene. Recall that the validation set contains 1200 total samples (1000 background signatures and 200 target signatures), while the HSI scene contains a total of 40000 pixels (39800 background signatures and 200 target signatures).
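The confusion-matrix cells of Table 1 and the derived rates can be computed from label vectors as follows; this is a generic sketch rather than the authors' evaluation code, with labels +1 (target) and -1 (background) matching the SVDD decision function (8).

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Counts for the four cells of Table 1, with labels +1 (target)
    and -1 (background)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """TPR = TP / (TP + FN) and FPR = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)
```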

Table 2. Detection Results With SNR = 8 dB

                       Actual Background   Actual Target
Predicted Background   962/38641           25/0
Predicted Target       38/1159             175/200

Table 3. Detection Results With SNR = 10 dB

                       Actual Background   Actual Target
Predicted Background   986/39420           1/0
Predicted Target       14/380              199/200

Table 4. Detection Results With SNR = 15 dB

                       Actual Background   Actual Target
Predicted Background   975/38896           0/0
Predicted Target       25/904              200/200

Table 5. Detection Results With SNR = 20 dB

                       Actual Background   Actual Target
Predicted Background   909/36985           0/0
Predicted Target       91/2815             200/200

Table 6. Detection Results With SNR = 25 dB

                       Actual Background   Actual Target
Predicted Background   884/35616           0/0
Predicted Target       116/4184            200/200

The tables above show that the proposed method correctly detects all 200 targets in the scene, yielding a perfect TPR there. However, the FPRs are influenced by the particular value of σ² used to generate the training set. Table 7 shows the FPRs on the validation data and the entire scene for each of the σ² values in Table 2 through Table 6.

Table 7. FPRs (%) For Scene With Lightest Spectral Variability

Training Set SNR   Validation Data   Entire Scene
8 dB               3.80              2.91
10 dB              1.40              0.96
15 dB              2.50              2.27
20 dB              9.10              7.07
25 dB              11.60             10.51

Notice that the FPR is lower on the entire scene than on the validation data in all detection scenarios. This shows that the proposed algorithm used to find the optimal σ generalizes quite well to the entire data set. We also see that the lowest FPR is achieved using the SNR = 10 dB training set.

Figure 1. MF ROC Curve for Scene with 20 dB Target Spectral Variability.

For the sake of comparison, we applied the MF detector to the scene and generated the empirical ROC curve. As shown in Figure 1, the MF performs well on this scene and is able to detect most targets with a low FPR. However, notice that the MF detects all 200 targets only at an FPR of ~3.25%, while the proposed SVDD technique detects all targets at an FPR just under 1% using the SNR = 10 dB training set, as shown in Table 7.
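For reference, the MF baseline scores each pixel by correlating it with the target signature after whitening by the background statistics; sweeping a threshold over the scores traces out the ROC curve. The sketch below uses the standard matched filter form [2, 3] with background mean and covariance estimated from the scene, and is not necessarily the authors' exact implementation.

```python
import numpy as np

def matched_filter_scores(cube, t):
    """Matched filter: w = Gamma^-1 (t - mu), score(y) = w^T (y - mu), where mu
    and Gamma are the background mean and covariance estimated from the scene.
    `cube` is (num_pixels, num_bands); the score scale is irrelevant for the
    ROC, so no normalization is applied. A small diagonal load keeps the
    covariance invertible."""
    mu = cube.mean(axis=0)
    Gamma = np.cov(cube, rowvar=False)
    w = np.linalg.solve(Gamma + 1e-6 * np.eye(len(mu)), t - mu)
    return (cube - mu) @ w
```

On synthetic data with a Gaussian background, pixels carrying the target signature should receive clearly higher scores than background pixels.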


Let us now examine the detection performance of the proposed technique on the other two scenes containing more target spectral variability. Table 8 and Table 9 are the confusion matrices for the HSI scenes containing target spectral variability with SNR = 15 dB and SNR = 10 dB, respectively. For brevity, we provide only the results using the best-performing target training set.

Table 8. SVDD Detection Results For Scene With SNR = 15 dB

                       Actual Background   Actual Target
Predicted Background   992/39521           4/0
Predicted Target       8/279               196/200

Table 9. SVDD Detection Results For Scene With SNR = 10 dB

                       Actual Background   Actual Target
Predicted Background   992/39521           4/4
Predicted Target       8/279               196/196

In both scenes, the SVDD target detector achieves an FPR of 0.70%. While all targets were detected in the scene with moderate spectral variability, four targets were missed in the scene containing heavy spectral variability, yielding a TPR of 98%. We also applied the MF detector to these scenes and generated the corresponding ROC curves shown in Figure 2. The steeper curve corresponds to the 15 dB scene.

Figure 2. MF ROC Curves for Scenes with 15 dB and 10 dB Target Spectral Variability.

For the scene with 15 dB target spectral variability, note that at an FPR of 10% the TPR is ~85%. Hence, at this threshold the MF detector detects 170 of the 200 targets while incurring ~3980 false positives. To match the 0.70% FPR achieved by the SVDD detector, the MF yields a TPR of only ~75%, detecting 150 of the 200 targets. The improvement of the SVDD over the MF is even more apparent in the scene with the heaviest target spectral variability: at an FPR of ~10%, the MF yields a TPR of ~69%, detecting 138 of the 200 targets, and at an FPR of 0.70%, a TPR of ~64%, detecting 127 of the 200 targets.


5. CONCLUSION

We have applied the SVDD to full-pixel target detection in HSI. We have introduced a novel PR algorithm for training an SVDD for the target class and selecting the optimal value of the RBF kernel parameter σ. The proposed algorithm generalizes well to unseen data, using only the pure target signature and 1000 randomly selected scene signatures. Experiments on urban HSI scenery illustrate that the SVDD-based detector yields a higher TPR and lower FPR than the MF under varying degrees of target spectral variability. Future work will address the problem of automatically selecting proper values of N and σ² for generating the target training class. We also intend to investigate a more efficient selection of the kernel parameter σ than a linear search over all candidate values. Finally, because the selection of features that maximize separability is crucial in PR systems, and given their success in a variety of PR applications, we will investigate the potential of discrete wavelet transform (DWT) coefficients as features in the context of SVDD-based HSI target detection.

REFERENCES

[1] S. M. Yamany, A. A. Farag, and S.-Y. Hsu, "A fuzzy hyperspectral classifier for automatic target recognition (ATR) systems," Pattern Recognition Letters, vol. 20, pp. 1431-1438, 1999.
[2] D. Manolakis and G. Shaw, "Detection algorithms for hyperspectral imaging applications," IEEE Signal Processing Magazine, vol. 19, pp. 29-43, 2002.
[3] D. Manolakis, D. Marden, and G. Shaw, "Hyperspectral image processing for automatic target detection applications," Lincoln Laboratory Journal, vol. 14, pp. 79-114, 2003.
[4] G. Shaw and H. Burke, "Spectral imaging for remote sensing," Lincoln Laboratory Journal, vol. 14, pp. 3-28, 2003.
[5] N. Henze and T. Wagner, "A new approach to the BHEP tests for multivariate normality," J. Multivar. Anal., vol. 62, pp. 1-23, 1997.
[6] V. N. Vapnik, Statistical Learning Theory, New York: Wiley, 1998.
[7] D. M. J. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, pp. 45-66, 2004.
[8] B. Scholkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Cambridge: MIT Press, 2002.
[9] Z. Zeng, Y. Fu, G. I. Roisman et al., "One-class classification for spontaneous facial expression analysis," International Conference on Automatic Face and Gesture Recognition, pp. 281-286, 2006.
[10] R. Ji, D. Liu, M. Wu et al., "The application of SVDD in gene expression data clustering," International Conference on Bioinformatics and Biomedical Engineering, pp. 371-374, 2008.
[11] C. Lai, D. M. J. Tax, R. P. W. Duin, E. Pekalska, and P. Paclik, "A study on combining image representations for image classification and retrieval," Int. J. Pattern Recognit. Artif. Intell., vol. 18, pp. 867-890, 2004.
[12] J. Muñoz-Marí, L. Bruzzone, and G. Camps-Valls, "A support vector domain description approach to supervised classification of remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, pp. 2683-2692, 2007.
[13] A. Banerjee, P. Burlina, and C. Diehl, "A support vector method for anomaly detection in hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 44, pp. 2282-2291, 2006.
[14] A. Banerjee, P. Burlina, and R. Meth, "Fast hyperspectral anomaly detection via SVDD," IEEE International Conference on Image Processing, vol. 4, pp. 101-104, 2007.
[15] D. M. J. Tax and R. P. W. Duin, "Support vector domain description," Pattern Recognition Letters, vol. 20, pp. 1191-1199, 1999.
[16] J. Davis and M. Goadrich, "The relationship between precision-recall and ROC curves," International Conference on Machine Learning, pp. 233-240, 2006.
[17] ITRES Research, http://www.itres.com, accessed 2007.
