Invited Paper

Classifying Hyperspectral Remote Sensing Imagery With Independent Component Analysis

Qian Du*1, Ivica Kopriva2, Harold Szu3

1 Department of Electrical and Computer Engineering, Mississippi State University, Mississippi 39762
2 Department of Electrical and Computer Engineering, The George Washington University, Washington DC 20052
3 Office of Naval Research, Arlington, Virginia 22217

ABSTRACT

In this paper, we investigate the application of independent component analysis (ICA) to remotely sensed hyperspectral image classification. We focus on the performance of the Joint Approximate Diagonalization of Eigenmatrices (JADE) algorithm, although the proposed method is applicable to other popular ICA algorithms. The major advantage of using ICA is its capability of classifying objects with unknown spectral signatures in an unknown image scene, i.e., unsupervised classification. However, ICA is computationally expensive, which limits its application to high-dimensional data analysis. In order to make it applicable to hyperspectral image classification, a data preprocessing procedure is employed to reduce the data dimensionality. Noise-adjusted principal component analysis (NAPCA) is used for this purpose, which can reorganize the original data information in terms of signal-to-noise ratio, a more appropriate criterion than variance when dealing with images. The preliminary results demonstrate that the selected major components from NAPCA can better represent the object information in the original data than those from ordinary principal component analysis (PCA). As a result, better classification using ICA is expected.

Keywords: Independent component analysis, principal component analysis, noise-adjusted principal component analysis, unsupervised classification, hyperspectral imagery, remote sensing.

1. INTRODUCTION

In many real applications of remote sensing image classification, it may be impossible or very difficult to obtain prior information about class signatures, so unsupervised methods have to be applied. The spatial resolution of remote sensing imagery is relatively coarse. Since the area covered by a single pixel is quite large, its reflectance is generally considered to be a linear mixture of all the materials resident in the area covered by that pixel. So we have to deal with mixed pixels instead of pure pixels as in conventional digital image processing. Linear spectral unmixing analysis is a widely used approach in remote sensing image processing to handle mixed pixels, which uncovers the endmember distribution in an image scene [1-4]. Let L be the number of spectral bands and r a column pixel vector of dimension L in a hyperspectral image. An element r_i of r is the reflectance collected at the i-th band. Let M denote a signature matrix. Each column vector in M represents an endmember signature, i.e., M = [m_1, m_2, ..., m_p], where p is the total number of endmembers resident in the image scene. Let α be the unknown abundance column vector of size p × 1 associated with M, which is to be estimated. The i-th item α_i in α represents the abundance fraction of m_i in pixel r. According to the linear mixture model,

r = Mα + n    (1)

where n is the noise term. Our task in unsupervised mixed pixel classification is to estimate both M and α. When M is known, the estimation of α can be accomplished by least squares approaches. But when M is also unknown, i.e., in unsupervised analysis, the task is much more challenging.
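As a concrete illustration of the supervised case just mentioned, the following minimal sketch estimates α by unconstrained least squares when M is known. All sizes and values are synthetic placeholders for illustration, not data from this paper, and the sum-to-one and non-negativity constraints often imposed in practice are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
L, p = 169, 5                              # bands, endmembers (HYDICE-like sizes)
M = rng.random((L, p))                     # hypothetical endmember signatures
alpha_true = rng.dirichlet(np.ones(p))     # abundances summing to one
r = M @ alpha_true + 0.01 * rng.standard_normal(L)   # mixed pixel r = M a + n

# Unconstrained least-squares estimate of the abundance vector
alpha_hat, *_ = np.linalg.lstsq(M, r, rcond=None)
print(np.round(alpha_hat, 3))
```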

Independent component analysis (ICA) is a powerful tool for unsupervised classification, which has been successfully applied to blind source separation [5-10]. Its basic idea is to decompose a set of multivariate signals into a basis of statistically independent sources with minimal loss of information content, so as to achieve detection and classification. The standard linear ICA data model with additive noise is

u = As + v    (2)

where u is an N-dimensional data vector, A is an unknown mixing matrix, and s is an unknown source signal vector. Three assumptions are made on the unknown source signals s: 1) each source signal is an independent identically distributed (i.i.d.) stationary random process; 2) the source signals are statistically independent at any time; and 3) at most one of the source signals has a Gaussian distribution. The mixing matrix A, although unknown, is also assumed to be non-singular. Then the solution to the blind source separation problem is obtained up to scale and permutation indeterminacy, i.e., Q = WA = PΛ, where W is the unmixing matrix, P is a generalized permutation matrix, and Λ is a diagonal matrix. These requirements ensure the existence and uniqueness of the solution to the blind source separation problem (except for ordering, sign, and scaling). Compared with many conventional techniques, which use only up to second-order statistics, ICA exploits higher-order statistics, which makes it a more powerful method for extracting irregular features in the data [11]. Several researchers have applied ICA to remote sensing image classification [12-17].

In general, when we intend to apply the ICA approach to classify optical multispectral/hyperspectral images, the linear mixture model in Eq. (1) needs to be reinterpreted so as to fit the ICA model in Eq. (2). Specifically, the pixel vector r corresponds to u in Eq. (2), the noise term n corresponds to v in Eq. (2), the endmember matrix M in Eq. (1) corresponds to the unknown mixing matrix A in Eq. (2), and the abundance fractions in α in Eq. (1) correspond to the source signals in s in Eq. (2). Moreover, the abundance fractions are considered as unknown random quantities specified by the random signal sources in the ICA model (2), rather than unknown deterministic quantities as assumed in the linear mixture model (1). With these interpretations and the abovementioned assumptions, we will use model (2) in place of model (1) hereafter. The advantages offered by using model (2) in remote sensing image classification are: 1) no prior knowledge of the endmembers in the mixing process is required; 2) the spectral variability of the endmembers can be accommodated by the unknown mixing matrix A, since the source signals are considered as scalar and random quantities; and 3) higher-order statistics can be exploited for better feature extraction and pattern classification.

The strategy of ICA algorithms is to find a linear transform W (i.e., the unmixing matrix of size L × L)

z = Wu = WAs + Wv = Qs + Wv    (3)

such that the components of the vector z are as statistically independent as possible. Based on the assumption that the source signals s are mutually statistically independent and non-Gaussian (except one that is allowed to be Gaussian), the vector z will represent the vector of source signals s up to permutation, scale, and sign factors. Several different types of ICA algorithms have been developed recently [11]. Here, we select the popular Joint Approximate Diagonalization of Eigenmatrices (JADE) algorithm [6], which is based on fourth-order statistics (cumulants). The higher-order statistical dependence among data samples is measured by the higher-order cross-cumulants; smaller values of the cross-cumulants indicate less dependent samples. In JADE, fourth-order statistical independence is achieved through minimization of the squares of the fourth-order cross-cumulants between the components z_i of z in (3). The fourth-order cross-cumulants can be computed as

C_4(z_i, z_j, z_k, z_l) = E[z_i z_j z_k z_l] − E[z_i z_j]E[z_k z_l] − E[z_i z_k]E[z_j z_l] − E[z_i z_l]E[z_j z_k]    (4)

for 1 ≤ i, j, k, l ≤ L, where E[·] denotes the mathematical expectation operator.
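As a sketch of Eq. (4), the function below computes a sample estimate of the fourth-order cross-cumulant, assuming the components are arranged as rows of a zero-mean array (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def cross_cumulant4(z, i, j, k, l):
    """Sample fourth-order cross-cumulant of Eq. (4).

    z : array of shape (L, N) -- L zero-mean components, N samples.
    """
    E = lambda a: np.mean(a)
    zi, zj, zk, zl = z[i], z[j], z[k], z[l]
    return (E(zi * zj * zk * zl)
            - E(zi * zj) * E(zk * zl)
            - E(zi * zk) * E(zj * zl)
            - E(zi * zl) * E(zj * zk))
```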

The optimal unmixing matrix W* is the one that satisfies

W* = arg min_W Σ_{i,j,k,l} off(W^T C_4(z_i, z_j, z_k, z_l) W)    (5)

where off(·) is a measure of the off-diagonality of a matrix. An intuitive way of solving the optimization problem in Eq. (5) is to jointly diagonalize the eigenmatrices of C_4(z_i, z_j, z_k, z_l) in Eq. (4) via Givens rotation sweeps. In order for the first- and second-order statistics not to affect the results, the data u in Eqs. (2) and (3) should be pre-whitened (zero mean and identity covariance matrix). For mathematical tractability, the mixing matrix A and the unmixing matrix W in the ICA model are square matrices of size L × L.
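The two ingredients just mentioned, pre-whitening and the off-diagonality measure, can be sketched as follows. This is a minimal illustration assuming band vectors are stored as rows of an (L × N) array with a well-conditioned covariance matrix; helper names are my own.

```python
import numpy as np

def prewhiten(u):
    """Center the data and whiten so the sample covariance becomes (near) identity.

    u : array of shape (L, N). Returns (whitened data, whitening matrix).
    """
    u0 = u - u.mean(axis=1, keepdims=True)
    K = np.cov(u0)                              # L x L sample covariance
    w, V = np.linalg.eigh(K)                    # eigendecomposition of K
    B = V @ np.diag(1.0 / np.sqrt(w)) @ V.T     # symmetric K^(-1/2)
    return B @ u0, B

def off(M):
    """Sum of squared off-diagonal entries: one common off-diagonality measure."""
    return np.sum(M**2) - np.sum(np.diag(M)**2)
```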


The major drawback of ICA is its high computational complexity. For instance, the computational complexity of the JADE algorithm is on the order of L^4. When it is applied to hyperspectral imagery, where L can be as high as 200, an ICA algorithm's execution is extremely time-consuming and may even be prohibitive. Therefore, in order to make an ICA algorithm applicable to hyperspectral image classification, a data preprocessing procedure has to be employed to reduce the data dimensionality of a hyperspectral image. The common approach is to use ordinary principal component analysis (PCA) for dimension reduction, which reorganizes the original data information in terms of variance. Here, we propose to use noise-adjusted principal component analysis (NAPCA) for this purpose. NAPCA reorganizes the original data information in terms of signal-to-noise ratio, which is a more reasonable criterion for handling image data [18]. It will be demonstrated that the selected major components from NAPCA can better represent the original object information than those from PCA. Therefore, the subsequent ICA can provide better object classification.
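The overall flow, dimension reduction followed by ICA on the retained components, can be sketched as below. The paper uses JADE, which has no standard scikit-learn implementation; FastICA is substituted here only because it is widely available, and plain PCA stands in for the reduction step (NAPCA is developed in Section 2). The array shapes and the choice q = 30 are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

def reduce_then_ica(X, q=30):
    """X: pixels as rows, shape (N, L). Returns an (N, q) array of ICs."""
    Xr = PCA(n_components=q).fit_transform(X)      # keep the first q components
    ica = FastICA(n_components=q, whiten="unit-variance")
    return ica.fit_transform(Xr)                   # independent components

# Hypothetical usage on a 64 x 64 x 169 cube reshaped to (4096, 169):
# ics = reduce_then_ica(cube.reshape(-1, 169), q=30)
```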

2. PCA AND NAPCA

Principal component analysis (PCA) is a widely used technique for data compression. Its basic idea is to project the original high-dimensional data into a new space, where the data are reorganized based on the variance criterion. Let K_{L×L} be the data covariance matrix of dimensionality L. Assume V is the eigenvector matrix of K_{L×L} such that

V^T K_{L×L} V = Λ    (6)

where Λ is the eigenvalue matrix of K_{L×L}. Then the principal components can be calculated by

z_PCA = V^T (z − m)    (7)

where m is the data mean vector. The transformation in Eq. (7) is called PCA.
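A minimal numpy sketch of Eqs. (6) and (7), assuming the image is arranged as an (L × N) array of band vectors (function and variable names are illustrative):

```python
import numpy as np

def pca_transform(Z):
    """Eqs. (6)-(7): project band vectors onto the eigenvectors of the
    data covariance matrix. Z has shape (L, N): L bands, N pixels."""
    m = Z.mean(axis=1, keepdims=True)
    K = np.cov(Z)                        # K_{LxL}
    lam, V = np.linalg.eigh(K)           # V^T K V = diag(lam)
    order = np.argsort(lam)[::-1]        # sort by variance, largest first
    V = V[:, order]
    return V.T @ (Z - m)                 # z_PCA = V^T (z - m)
```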

We know that variance is not a good criterion for image processing because it is not directly related to image quality. Instead, signal-to-noise ratio (SNR) is a more reasonable criterion by which to reorganize the data. Green et al. proposed a noise-adjusted analysis, which is referred to as noise-adjusted principal component analysis (NAPCA) in this paper [18]. Its basic idea is to apply noise whitening to the original data and then perform PCA on the noise-whitened data. Since the noise variance is unity in the noise-whitened data, the resulting principal components are ordered by signal-to-noise ratio. Let K_n be the noise covariance matrix, and let F be the noise-whitening matrix such that

F^T K_n F = I    (8)

where I is an identity matrix. Transforming K_{L×L} by F gives

F^T K_{L×L} F = Σ_n_adj    (9)

where Σ_n_adj is the data covariance matrix after noise whitening. Next, find a matrix G such that

G^T Σ_n_adj G = I    (10)

Then the principal components using NAPCA can be computed by

z_NAPCA = G^T F^T (z − m)    (11)
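A sketch of Eqs. (8) through (11), given a noise covariance estimate K_n. Eq. (10) is implemented literally, so G fully whitens the noise-adjusted covariance, with the eigenvalues sorted so the components stay in descending SNR order; the (L, N) data layout and function names are illustrative assumptions.

```python
import numpy as np

def napca_transform(Z, Kn):
    """Eqs. (8)-(11): noise-whiten with F, then transform the adjusted data.
    Z: (L, N) band vectors; Kn: (L, L) noise covariance estimate."""
    m = Z.mean(axis=1, keepdims=True)
    # F such that F^T Kn F = I (Eq. 8), from the eigendecomposition of Kn
    wn, Vn = np.linalg.eigh(Kn)
    F = Vn @ np.diag(1.0 / np.sqrt(wn))
    K = np.cov(Z)
    Sigma_adj = F.T @ K @ F                      # Eq. (9)
    lam, E = np.linalg.eigh(Sigma_adj)
    order = np.argsort(lam)[::-1]                # descending SNR order
    lam, E = lam[order], E[:, order]
    G = E @ np.diag(1.0 / np.sqrt(lam))          # G^T Sigma_adj G = I (Eq. 10)
    return G.T @ F.T @ (Z - m)                   # z_NAPCA (Eq. 11)
```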

The major task in performing NAPCA is to obtain an accurate noise covariance matrix K_n, which in general is difficult. Some simple and relatively effective estimation techniques have been proposed in [19]. The following method is used here for its simplicity and effectiveness.

K_{L×L} can be decomposed as

K_{L×L} = D_K E_K D_K    (12)

where D_K is a diagonal matrix, D_K = diag{σ_1, σ_2, ..., σ_L}, with {σ_l^2}_{l=1}^{L} being the diagonal elements of K_{L×L}, i.e., the variance of each band, and

E_K = [ 1       ρ_12    ρ_13    ...     ρ_1L
        ρ_21    1       ρ_23    ...     ρ_2L
        ρ_31    ρ_32    1       ...     ...
        ...     ...     ...     ...     ρ_(L−1)L
        ρ_L1    ρ_L2    ...     ρ_L(L−1)   1 ]    (13)

with ρ_mn being the correlation coefficient at the (m,n)-th entry of K_{L×L}.

Similarly, in analogy with the decomposition of K_{L×L}, its inverse K^{−1}_{L×L} can also be decomposed as

K^{−1}_{L×L} = D_{K^{−1}} E_{K^{−1}} D_{K^{−1}}    (14)

where D_{K^{−1}} is a diagonal matrix given by D_{K^{−1}} = diag{ς_1, ς_2, ..., ς_L}, with {ς_l^2}_{l=1}^{L} being the diagonal elements of K^{−1}_{L×L}, and

E_{K^{−1}} = [ 1       ξ_12    ξ_13    ...     ξ_1L
               ξ_21    1       ξ_23    ...     ξ_2L
               ξ_31    ξ_32    1       ...     ...
               ...     ...     ...     ...     ξ_(L−1)L
               ξ_L1    ξ_L2    ...     ξ_L(L−1)   1 ]    (15)

with ξ_mn being the correlation coefficient at the (m,n)-th entry of K^{−1}_{L×L}. It turns out that the ς_l in Eq. (15) can be related to the σ_l by the following formula:

ς_l = σ_l^{−1} (1 − r_{L−l}^2)^{−1/2} = 1 / (σ_l (1 − r_{L−l}^2)^{1/2})    (16)

where r_{L−l} is the multiple correlation coefficient of the l-th band B_l on the other L−1 bands {B_k}_{k=1,k≠l}^{L}, obtained using multiple regression theory. So ς_l^2 is the reciprocal of a good noise variance estimate for band B_l. It should be noted that D_{K^{−1}} in Eq. (14) is not the inverse of D_K in Eq. (12), nor is E_{K^{−1}} in Eq. (14) the inverse of E_K. The major advantage of using ς_l over σ_l is that, as shown in Eq. (16), ς_l removes the correlation of band B_l on the other bands, while σ_l does not.

Therefore, the noise covariance matrix K_n can be estimated by K_n = diag{1/ς_1^2, 1/ς_2^2, ..., 1/ς_L^2}, which is a diagonal matrix.
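This estimate translates directly into code: since {ς_l^2} are the diagonal elements of K^{−1}_{L×L}, the noise variances are simply their reciprocals. A minimal sketch, assuming the data covariance matrix is invertible and band vectors are stored as rows of an (L × N) array:

```python
import numpy as np

def estimate_noise_cov(Z):
    """Eqs. (12)-(16): diagonal noise covariance estimate from the inverse
    data covariance. Z: (L, N) band vectors. The l-th noise variance is
    1/zeta_l^2, where zeta_l^2 is the l-th diagonal entry of K^{-1}."""
    K = np.cov(Z)
    zeta_sq = np.diag(np.linalg.inv(K))     # diagonal of K^{-1}
    return np.diag(1.0 / zeta_sq)           # Kn = diag(1/zeta_1^2, ..., 1/zeta_L^2)

# Hypothetical usage with the NAPCA sketch above:
# Kn = estimate_noise_cov(Z); Zn = napca_transform(Z, Kn)
```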

3. EXPERIMENT

The HYDICE image scene of size 64 × 64 shown in Figure 1(a) was collected in Maryland in 1995 from a flight altitude of 10,000 ft with approximately 1.5 m spatial resolution. The atmospheric water bands with low signal-to-noise ratio were removed, reducing the data dimensionality from 210 to 169. The image scene has 64 lines with 64 pixels per line, so the total number of pixel vectors is N = 64 × 64 = 4096. This scene includes 15 panels arranged in a 5 × 3 matrix. Each element in this matrix is denoted by p_ij, with the row indexed by i = 1, ..., 5 and the column indexed by j = a, b, c. The three panels in the same row, p_ia, p_ib, p_ic, were made from the same material and have sizes 3 m × 3 m, 2 m × 2 m, and 1 m × 1 m, respectively, so they can be considered as one class, P_i. The ground truth map in Figure 1(b) shows the precise locations of the panel center pixels. One pixel from each panel in the leftmost column was selected as the target signature. These five panel classes have very close spectral signatures. In addition to the panels, there are two natural objects: a grass field as the background and a tree line to the left.

Figure 2 shows the first 40 principal components from PCA. We can see that they are not ordered in terms of image quality. For instance, PC21 contained noise only while PC35 contained information about P1, yet PC21 had a higher rank than PC35. Figure 3 shows the first 40 principal components from NAPCA. All of the first 35 PCs contained some information, and the PCs from the 36th onward were noisy. More importantly, the major distinguishing panel information exists within the first 16 PCs. Compared with the first 16 PCs of PCA, this information is more distinct for panel classification.

The JADE classification results using PCA for dimension reduction are shown in Figure 4, where the first 20, 30, and 40 PCs were used. Only the independent components (ICs) related to the panels are shown. Obviously, using more PCs could improve the classification. However, even when the first 40 PCs were used, P2 and P3, and P4 and P5, could not be well separated from each other because their spectral signatures are close. Figure 5 shows the JADE classification using the PCs from NAPCA, where with only 30 PCs all five panel classes could be correctly classified.

4. CONCLUSION

Independent component analysis (ICA) is a popular tool for unsupervised classification, but its very high computational complexity impedes its application to high-dimensional data analysis. The common approach is to use principal component analysis (PCA) to reduce the data dimensionality before applying ICA classification. When dealing with image data, we find that it may be more appropriate to use noise-adjusted principal component analysis (NAPCA) for this preprocessing step. The principal components from NAPCA are ranked in terms of image quality. As a result, important object information can be well compacted into the major components. When an ICA algorithm operates on these major components, better classification is expected. The major difficulty in NAPCA is the estimation of the noise covariance matrix. The discussed estimation method based on interband correlation is very simple and appears to be effective in our HYDICE experiment.

REFERENCES

[1] R. B. Singer and T. B. McCord, "Mars: large scale mixing of bright and dark surface materials and implications for analysis of spectral reflectance," Proc. 10th Lunar Planet. Sci. Conf., pp. 1835-1848, 1979.
[2] J. B. Adams and M. O. Smith, "Spectral mixture modeling: a new analysis of rock and soil types at the Viking Lander 1 site," J. Geophysical Res., vol. 91, no. B8, pp. 8098-8112, 1986.
[3] J. J. Settle and N. A. Drake, "Linear mixing and estimation of ground cover proportions," International Journal of Remote Sensing, vol. 14, pp. 1159-1177, 1993.
[4] J. B. Adams, M. O. Smith, and A. R. Gillespie, "Image spectroscopy: interpretation based on spectral mixture analysis," in Remote Geochemical Analysis: Elemental and Mineralogical Composition, edited by C. M. Pieters and P. A. Englert, pp. 145-166, Cambridge University Press, 1993.
[5] C. Jutten and J. Herault, "Blind separation of sources, Part I: an adaptive algorithm based on neuromimetic architecture," Signal Processing, vol. 24, no. 1, pp. 1-10, 1991.
[6] J. F. Cardoso and A. Souloumiac, "Blind beamforming for non Gaussian signals," IEE Proceedings-F, vol. 140, no. 6, pp. 362-370, 1993.
[7] P. Comon, "Independent component analysis, a new concept?" Signal Processing, vol. 36, pp. 287-314, 1994.
[8] J. F. Cardoso and B. Laheld, "Equivariant adaptive source separation," IEEE Transactions on Signal Processing, vol. 44, no. 12, pp. 3017-3030, 1996.
[9] A. Cichocki and R. Unbehauen, "Robust neural networks with on-line learning for blind identification and blind separation of sources," IEEE Transactions on Circuits and Systems, vol. 43, no. 11, pp. 894-906, 1996.
[10] H. Szu, "Independent component analysis (ICA): an enabling technology for intelligent information/image technology (IIT)," IEEE Circuits and Devices Magazine, vol. 10, pp. 14-37, 1999.
[11] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
[12] H. Szu and J. Buss, "ICA neural net to refine remote sensing with multiple labels," Proceedings of SPIE, vol. 4056, pp. 32-49, 2000.
[13] T. Yoshida and S. Omatu, "Pattern recognition with neural networks," Proceedings of International Geoscience and Remote Sensing Symposium, pp. 699-701, 2000.
[14] T. Tu, "Unsupervised signature extraction and separation in hyperspectral images: a noise-adjusted fast independent component analysis approach," Optical Engineering, vol. 39, no. 4, pp. 897-906, 2000.
[15] C.-I Chang, S.-S. Chiang, J. A. Smith, and I. W. Ginsberg, "Linear spectral random mixture analysis for hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 2, pp. 375-392, 2002.
[16] C. H. Chen and X. Zhang, "Independent component analysis for remote sensing study," Proceedings of SPIE, vol. 3871, pp. 150-158, 1999.
[17] S. Fiori, "Overview of independent component analysis technique with an application to synthetic aperture radar (SAR) imagery processing," Neural Networks, vol. 16, pp. 453-467, 2003.
[18] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, "A transformation for ordering multispectral data in terms of image quality with implications for noise removal," IEEE Transactions on Geoscience and Remote Sensing, vol. 26, pp. 65-74, 1988.
[19] R. E. Roger and J. F. Arnold, "Reliably estimating the noise in AVIRIS hyperspectral images," International Journal of Remote Sensing, vol. 17, no. 10, pp. 1951-1962, 1996.


Figure 1. (a) A HYDICE panel scene that contains 15 panels; (b) Ground truth map of spatial locations of the 15 panels.


Figure 2: The first 40 principal components from PCA.

Figure 3: The first 40 principal components from NAPCA.

Figure 4: Classification results using PCA for data dimension reduction. (a) ICs 8, 10, 17, 18, and 19 from the first 20 PCs; (b) ICs 17, 25, 26, 27, and 28 from the first 30 PCs; (c) ICs 16, 32, 33, 35, and 36 from the first 40 PCs.

Figure 5: Classification results using NAPCA for data dimension reduction. (a) ICs 2, 7, 12, 14, and 17 from the first 20 PCs; (b) ICs 6, 15, 18, 20, and 22 from the first 30 PCs; (c) ICs 9, 16, 20, 25, and 29 from the first 40 PCs.
