
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 45, NO. 12, DECEMBER 2007

Fusion of Support Vector Machines for Classification of Multisensor Data Björn Waske, Student Member, IEEE, and Jón Atli Benediktsson, Fellow, IEEE

Abstract—The classification of multisensor data sets, consisting of multitemporal synthetic aperture radar data and optical imagery, is addressed. The concept is based on the decision fusion of different outputs. Each data source is treated separately and classified by a support vector machine (SVM). Instead of fusing the final classification outputs (i.e., land cover classes), the original outputs of each SVM discriminant function are used in the subsequent fusion process. This fusion is performed by another SVM, which is trained on the a priori outputs. In addition, two voting schemes are applied to create the final classification results. The results are compared with well-known parametric and nonparametric classifier methods, i.e., decision trees, the maximum-likelihood classifier, and classifier ensembles. The proposed SVM-based fusion approach outperforms all other approaches and significantly improves the results of a single SVM, which is trained on the whole multisensor data set.

Index Terms—Data fusion, multisensor imagery, multispectral data, support vector machines (SVM), synthetic aperture radar (SAR) data.

I. INTRODUCTION

LAND COVER classification is one of the most widely used applications in the field of remote sensing. Detailed knowledge of land cover is an important input variable for several environmental monitoring systems, e.g., in the fields of subsidy control, urban sprawl, or land degradation. Many of these applications include multisource remote sensing data. By combining different data sources, e.g., synthetic aperture radar (SAR) and optical imagery, the overall classification accuracy is increased compared to that of a single-source classifier [1]–[4]. Regarding upcoming missions with increased revisit times and better spatial resolutions, e.g., TerraSAR-X, Radarsat-2, and RapidEye, such multisensor data processing approaches become even more attractive. Hence, appropriate data fusion and classification of multisource data sets is an important ongoing research topic in the field of remote sensing.

Manuscript received December 10, 2006; revised March 19, 2007. This work was supported in part by the German Aerospace Center (DLR) and the Federal Ministry of Economics and Technology (BMWi) under the project Enviland (FKZ 50EE0404), as well as an ESA Cat-1 proposal (C1P 3115) and the European OASIS program (OASIS 58—CE 6324), and in part by the German Research Foundation (DFG) under the Research Training Group 722 (Information Techniques for Precision Crop Protection) at the University of Bonn. B. Waske is with the Center for Remote Sensing of Land Surfaces, University of Bonn, 53129 Bonn, Germany (e-mail: [email protected]). J. A. Benediktsson is with the Department of Electrical and Computer Engineering, University of Iceland, 107 Reykjavik, Iceland (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2007.898446

Commonly used statistical pattern recognition methods are often not appropriate for the classification of multisource data, because in most cases the data cannot be modeled by a convenient multivariate statistical model [5], [6]. In addition, the individual data sources may not be equally reliable. One source can be more applicable for describing a specific land cover class, while another source may be more adequate for another class. Thus, it is appropriate to weight the different sources during the classification process. Conventional statistical techniques, however, do not allow such weighting. Therefore, other methods are more appropriate in this context, and several nonparametric approaches have been introduced, e.g., artificial neural networks (ANN) [5], [7], self-learning decision trees (DT) [8], [9], and support vector machines (SVM) [10], [11]. These methods are not constrained to prior assumptions on the distribution of the input data, as is the case for the maximum-likelihood classifier (MLC), and they enable the weighting of the different sources. Hence, they are well suited for complex data sets. ANNs have been used successfully for the classification of multisource data sets and have been shown in experiments to increase the classification accuracy compared to conventional statistical classifiers. In [5], a backpropagation ANN was used for the classification of multisource data sets containing multispectral data and topographical information. In other studies, ANNs were used for the classification of multisensor imagery consisting of multispectral and SAR data [7], [12]. Self-learning DTs have been applied successfully to diverse remote sensing imagery [8], [9], [13], [14]. Similar to neural networks, they are not constrained to any assumptions like parametric distributions of the input data and can handle differently scaled data types.
In contrast to ANNs, the training time of DTs is relatively low, and their training and classification is rather simple [8], [9]. The performance of a DT can be increased by DT-based multiple classifiers. By training the base classifier on modified input data, a set of independent classifiers is created. Afterward, the different outputs are combined to create the final result. In other studies, approaches have been used which are based on multiple classifier systems, consisting of different classifier algorithms [1], [15], [16]. The concept of a parallel use of neural and statistical techniques was introduced for classifying a multisensor data set, consisting of optical and SAR imagery [1]. The different data sources were classified separately by both the statistical classifier and the neural network. Afterward, the outputs were combined by decision fusion. Decision fusion can be defined as the concept of combining information from different data sources, after each individual data set has been

0196-2892/$25.00 © 2007 IEEE


classified previously. In several studies, decision fusion has been based on consensus theory, which uses single probability functions to summarize estimates from various data sources [17], based on consensus rules. Common consensus rules are the linear opinion pool and the logarithmic opinion pool, which use the weighted sum and the weighted product, respectively, of the source-specific a posteriori probabilities. Thus, these methods involve the problem of selecting weights. The weights generally reflect the goodness of the input data, and several different selection schemes have been proposed [18]. In contrast to these costly techniques, voting concepts like majority voting and complete agreement have been proposed. They are computationally less demanding than methods that use complicated weighting schemes. In [1], a statistical classifier and a neural network were trained in parallel on the same data set, and the results were combined by several voting schemes. When the two outputs disagreed, a training sample was rejected and classified again by a second neural network. In doing so, the accuracy was significantly increased. SVMs are a more recent nonparametric classifier approach, which is well known in the fields of machine learning and pattern recognition. SVMs have been used successfully in several remote sensing studies [10], [11], [19]–[22], in which they performed more accurately than other classifiers or at least equally well. SVMs aim to discriminate two classes by fitting an optimal separating hyperplane to the training data within a multidimensional feature space, using only the closest training samples [23]. Thus, the approach only considers samples close to the class boundary and works well with small training sets, even when high-dimensional data sets are classified [20], [22].
Similar to neural networks and DTs, they are not constrained to assumptions concerning the distribution of the input data and can handle differently scaled data. Although SVMs have given promising accuracies, only a few studies are known that use SVMs for classifying multisource data. Song et al. [24] used SVMs for classifying geospatial data, consisting of multispectral images and topographical information, among other data types. In other studies, the SVM approach was modified for the classification of different data sources. Halldorsson et al. [25] extended a common kernel function for classifying a multisource data set containing Landsat multispectral-scanner data and topographical information. In [26], composite kernels were introduced for combining the spectral and spatial information of a hyperspectral image. Fauvel et al. [27] combined SVMs to fuse spectral and spatial information (i.e., extended morphological profiles) of a hyperspectral data set. Two SVMs were trained separately, one on the spectral image and the other on the extended morphological profiles. Afterward, the outputs were fused by using different voting schemes, e.g., the absolute maximum and majority voting. The above studies have shown that the classification accuracy of SVMs can be increased by using extended (or composite) kernel functions or separate SVMs when classifying diverse data sets. Therefore, it could be appropriate to handle the different sources separately when classifying multisensor imagery consisting of multitemporal SAR and multispectral data. In order to accomplish that, a separate SVM trained for classifying


individual data sources in multisensor data sets could be more efficient. In this paper, the concept of SVMs is applied to a multisensor data set containing multitemporal SAR data and optical images. The individual data sources, i.e., the SAR data and the optical images, undergo separate preliminary classifications using SVMs. The outputs of the SVM classifiers are combined by well-known voting concepts, i.e., majority voting and the absolute maximum. In addition, the outputs are fused by another SVM. The results are compared to the classification outputs of various algorithms, such as the maximum-likelihood and DT classifiers. This paper is organized as follows. The concept of SVMs is introduced in Section II. The data set and preprocessing are described briefly in Section III. The proposed classification and fusion methods are discussed in Section IV. The results are presented in Section V. Final conclusions are given in Section VI.

II. SUPPORT VECTOR MACHINES

SVMs discriminate two classes by fitting an optimal linear separating hyperplane (OSH) to the training samples of the two classes in a multidimensional feature space. The optimization problem being solved is based on structural risk minimization and aims to maximize the margins between the OSH and the closest training samples—the so-called support vectors [23]. For cases that are not linearly separable, the input data are mapped into a high-dimensional space in which the new distribution of the samples enables the fitting of a linear hyperplane. A detailed description of the general concept of SVMs is given by Burges [28] and by Schölkopf and Smola [29]. An overview in the context of remote sensing is given by Huang et al. [10]. A brief summary of SVM theory is given below. For a binary classification problem in a d-dimensional feature space ℝ^d, let xi ∈ ℝ^d, i = 1, 2, . . . , L, be a training data set of L samples with corresponding class labels yi ∈ {1, −1}.
The hyperplane f(x) is defined by the normal vector w ∈ ℝ^d and the bias b ∈ ℝ, where |b|/‖w‖ is the distance between the hyperplane and the origin, and ‖w‖ is the Euclidean norm of w:

f(x) = w · x + b.    (1)
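As a brief illustration of (1), the following sketch (synthetic two-class data, with scikit-learn as an assumed toolkit; this is not the authors' code) fits a linear SVM and reads off the normal vector w and bias b of the fitted hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian classes, labeled -1 and +1.
X = np.vstack([rng.normal(-2.0, 0.6, (40, 2)), rng.normal(2.0, 0.6, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

clf = SVC(kernel="linear", C=1.0).fit(X, y)   # C is the regularization parameter
w, b = clf.coef_[0], clf.intercept_[0]        # normal vector w and bias b of (1)

f = X @ w + b                                 # f(x) = w . x + b
# The sign of f(x) gives the predicted class; clf.support_vectors_ holds the
# training samples closest to the hyperplane.
assert np.allclose(f, clf.decision_function(X))
assert (np.sign(f).astype(int) == clf.predict(X)).all()
```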

The support vectors lie on two hyperplanes w · x + b = ±1, which are parallel to the OSH. The margin maximization leads to the following optimization problem:

min ( ‖w‖²/2 + C Σ_{i=1}^{L} ξi )    (2)

where the slack variables ξi and the regularization parameter C are introduced to deal with misclassified samples in the nonseparable case. The constant C acts as a penalty for samples that are located on the wrong side of the hyperplane. Effectively, it controls the shape of the classification boundary, and a large value of C might cause overfitting to the training data. Using so-called kernel methods, the above linear SVM

approach is extended to nonlinearly separable cases. Based on a nonlinear mapping Φ of the data into a higher dimensional Hilbert feature space, an OSH can be fit to a more complex class distribution that is not separable in the original feature space. The input sample x is described by Φ(x) in the new high-dimensional space. The computationally extensive mapping into the high-dimensional space is avoided by using a positive definite kernel k, which meets Mercer's conditions [23]:

(Φ(xi) · Φ(xj)) = k(xi, xj).    (3)
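A small numerical check of (3), using an explicit polynomial feature map as a toy stand-in (the RBF kernel used later has no finite-dimensional map, so a degree-2 polynomial kernel is substituted here purely for illustration):

```python
import numpy as np

# For k(x, z) = (x . z)^2 in 2-D, the explicit map is
# Phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so k(xi, xj) = <Phi(xi), Phi(xj)>.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

k = (xi @ xj) ** 2            # kernel evaluated in the original 2-D space
explicit = phi(xi) @ phi(xj)  # inner product in the mapped 3-D feature space
assert np.isclose(k, explicit)  # both equal 1.0 up to rounding
```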

Consequently, the final hyperplane decision function can be defined as

f(x) = Σ_{i=1}^{L} αi yi k(xi, x) + b    (4)

where αi are Lagrange multipliers. Hence, explicit knowledge of Φ is not required, and only the kernel function is needed. A widely used kernel in remote sensing applications [10], [19], [20] is the Gaussian radial basis function [23]

k(xi, xj) = exp(−γ ‖xi − xj‖²).    (5)

The training process involves the estimation of the kernel parameter γ and the regularization parameter C. In the literature, several approaches for automatic model selection have been introduced [30]–[32] that are based on a leave-one-out procedure. As previously mentioned, SVMs were originally developed for binary classification problems, which normally do not exist in the context of remote sensing applications. Several approaches have been introduced to solve multiclass problems. In general, the n-class problem is split into several binary problems, and the individual binary classifiers are combined in a classifier ensemble. Two main approaches exist: the one-against-one (OAO) strategy and the one-against-all (OAA) strategy. Let Ω = {ωi}, i = 1, . . . , n, be the set of n possible class labels (i.e., land cover classes). The OAO strategy trains n(n − 1)/2 individual binary SVMs, one for each possible pairwise classification problem ωi versus ωj (ωi ≠ ωj). The sign of the distance to the hyperplane is used for the OAO voting scheme. For the final decision, a score function Si is computed for each class ωi, which sums all positive (i.e., sgn = +1) and negative (i.e., sgn = −1) votes for that class. The final class for sample x is predicted by a simple majority vote:

Si(x) = Σ_{j=1, j≠i}^{n} sgn(fij(x)).    (6)
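The OAO voting scheme of (6) can be sketched as follows, with scikit-learn assumed and the iris data standing in for land cover classes; one binary RBF-kernel SVM is trained per class pair, and the sign of its decision value casts the vote:

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)                       # n = 3 classes -> 3 pairwise SVMs
votes = np.zeros((len(X), len(classes)))

# One binary SVM per pair (wi, wj); for a binary SVC, a positive decision
# value favors the larger class label (classes_[1]), a negative one the smaller.
for i, j in combinations(range(len(classes)), 2):
    mask = (y == classes[i]) | (y == classes[j])
    f = SVC(kernel="rbf", C=10.0).fit(X[mask], y[mask]).decision_function(X)
    votes[:, i] += f < 0                     # vote for the first class wi
    votes[:, j] += f > 0                     # vote for the second class wj

pred = classes[votes.argmax(axis=1)]         # majority vote, as in Eq. (6)
```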

In the case of the OAA approach, a set of n binary classifiers is trained to separate each class ωi from the remaining classes Ω − {ωi}. Instead of using the simple sign of the decision function, the maximum decision value (i.e., the distance to the hyperplane) determines the final class label. In more sophisticated approaches, SVMs are directly defined as a multiclass problem [33], [34]. The simultaneous

Fig. 1. Multisource data set 2005. Multitemporal ERS-2 composite (April 21/ May 26/June 30) and multispectral false-color Landsat-5 TM composite (right).

separation of more than two classes leads to a more complex optimization problem [34]. As a consequence, the method can be less stable and less effective than classical two-class optimizations. As an alternative multiclass strategy, a hierarchical tree-based SVM approach was introduced by Melgani and Bruzzone [20]. Compared with the original OAO, the approach was shown to be computationally advantageous, but it achieved a slightly lower classification accuracy. With regard to the various agricultural land cover classes considered in the experiments, it can be assumed that the classes are not all equally easy to separate. Furthermore, for a multisource data set consisting of multitemporal and multisensor imagery, it can be assumed that the differentiation between classes requires the estimation of a more complex discriminant function. Although the OAO strategy requires a larger number of classifiers, the whole classification problem is divided into many much simpler problems. In contrast, the main disadvantage of the OAA strategy is that the discrimination between one class and all the others can involve the estimation of a more complex discriminant function. In addition, for an adequate fusion of the multisource data, an "ideal" class-specific differentiation (i.e., the OAO strategy) seems worthwhile. Hence, the classical OAO approach, which has been used successfully in several remote sensing studies [21], [22], seems appropriate here.

III. DATA SET AND PREPROCESSING

Our study site is located near Bonn in the German state of North Rhine-Westphalia. The area is almost flat, is dominantly used for agriculture, and is characterized by typical spatial patterns caused by differences in the phenology of the planted crops. The size of the agricultural parcels varies between 3 and 5 ha, with cereals and sugar beets being the main crops. The experiments were conducted on two multisensor data sets from 2005 and 2006 (Figs. 1 and 2).
Each contains a set of multitemporal SAR images, consisting of Envisat ASAR alternating polarization and ERS-2 precision images (see Tables I and II). Hence, the multitemporal SAR data set comprised information from varying phenological stages and different data types. In addition, a Landsat-5 Thematic Mapper (TM) image from May 28, 2005 and a SPOT 5 scene from June 26, 2006 were available (see Table III). For the generation of training and

TABLE I MULTITEMPORAL SAR DATA SET FROM 2005 (DATA SET 1)

Fig. 2. Multisource data set 2006. Multitemporal ERS-2 composite (May 11/ June 15/July 20) and multispectral false-color SPOT 5 composite (right).

TABLE II MULTITEMPORAL SAR DATA SET FROM 2006 (DATA SET 2)

validation data sets, extensive ground truth campaigns were conducted in the summers of 2005 and 2006. Regarding the dominant agricultural land use and the typical crop phenology within this region, it is assumed that no critical changes occurred during the period of image acquisition. After sensor calibration, orthorectification and atmospheric correction of the multispectral images were performed based on a digital elevation model. For improved comparison, the SPOT 5 scene was resampled to the same spatial resolution as the Landsat-5 TM image. The SAR imagery was calibrated to backscatter intensity following the procedures in [35]. In addition, an enhanced Frost filter was applied to reduce the speckle noise. Finally, the SAR images (12.5 m) were resampled to the pixel size of the Landsat TM scene (30 m) and orthorectified with a spatial accuracy of approximately one pixel, using a digital elevation model, orbit parameters, and the corrected Landsat scene as a reference image.

TABLE III AVAILABLE MULTISPECTRAL IMAGERY FROM 2005 TO 2006

IV. METHODS

All SVMs were trained on three different data sets per year: the multitemporal SAR data, the multispectral image, and the multisensor data set. To simplify the parameter selection and for a more applicable fusion of the different single-source SVMs, the data sets were normalized before the classifier training. Following the OAO strategy, a set of binary classifiers was trained, each with optimal parameters for γ and C, as this seemed appropriate for the aim of this paper. For the parameter selection, the Looms (leave-one-out model selection) approach by Lee and Lin [32] was used. It selects the best values for γ and C, based on a leave-one-out cross validation, within a user-defined range of possible parameters (C = 0.01–10, γ = 1–10 000). In doing so, an output f(x) (i.e., the distance to the hyperplane) of the final discriminant function (4) was generated for each binary classification problem. These outputs were then used in the decision fusion process to predict the final class membership of each sample. Three different fusion strategies were investigated: a simple majority vote, an absolute maximum rule, and an additional SVM. As in classical majority voting (6), all positive and negative votes for a specific class are summed. In contrast to a single-source approach, the outputs of both SVM classifiers are considered during this multisource majority voting. In the one-against-all strategy, the distance to the hyperplane is generally used for the decision process, and the final class is the one with the highest absolute distance to the hyperplane. The second fusion approach is based on this concept. For fusing the individual SVM outputs of the SAR imagery (fSAR(x)) and the optical imagery (fopt(x)), the absolute maximum rule was extended to the one-against-one strategy, similar to what was done by Fauvel et al. [27].
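The parameter selection step described above can be sketched as follows. The paper uses the Looms leave-one-out procedure of Lee and Lin [32]; as a stand-in, this sketch substitutes a plain cross-validated grid search over a comparable parameter range, on synthetic data, with scikit-learn as an assumed toolkit:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for one normalized single-source training set.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           random_state=0)
X = StandardScaler().fit_transform(X)        # the sources were normalized first

# Coarse grid over ranges comparable to C = 0.01-10, gamma = 1-10000.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0],
              "gamma": [1.0, 10.0, 100.0, 1000.0, 10000.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
C_best, gamma_best = search.best_params_["C"], search.best_params_["gamma"]
```

In practice, such a search would be run once per binary OAO problem, so that each pairwise classifier gets its own (C, γ) pair.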
For each binary two-source classifier, separating the classes ωi and ωj (ωi ≠ ωj), the distance of the first SVM fSAR(x) was compared to the distance of the second SVM fopt(x). The absolute maximum determines the decision for the two-source classification problem:

Fmax(x) = AbsMax [fSAR(x), fopt(x)] .    (7)

TABLE IV LAND COVER CLASSES

Fig. 3. Schematic diagram of SVM-based decision fusion. SVMOAO indicates an SVM trained on the SAR data, the optical images, and the data set f(x)all, respectively; f(x)all contains the single-source outputs f(x)SAR and f(x)opt.
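A minimal sketch of the absolute maximum rule (7), assuming the per-source pairwise decision values are already computed; the values below are made up purely for illustration (rows are samples, columns are pairwise binary problems):

```python
import numpy as np

def abs_max_fusion(f_sar, f_opt):
    """For each pairwise problem, keep the decision value with the larger
    absolute distance to the hyperplane, as in Eq. (7)."""
    take_sar = np.abs(f_sar) >= np.abs(f_opt)
    return np.where(take_sar, f_sar, f_opt)

f_sar = np.array([[0.3, -1.2], [2.0, 0.1]])
f_opt = np.array([[-0.9, 0.4], [-0.5, -0.8]])
fused = abs_max_fusion(f_sar, f_opt)   # -> [[-0.9, -1.2], [2.0, -0.8]]
```

The signs of the fused values would then feed the subsequent majority vote over the pairwise problems.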

Afterward, a simple majority vote was performed to determine the final class membership. In the proposed fusion approach, an additional SVM was applied. This additional SVM was trained on the data set f(x)all, containing all outputs (i.e., f(x)SAR and f(x)opt) of the two single-source SVMs (see Fig. 3). In addition to the SVM, three different classification approaches were applied to the data sets: the MLC, a DT, and a DT-based classifier ensemble. The MLC, which is derived from the Bayes rule when classes have equal priors, is one of the most common supervised classification techniques in the field of remote sensing. It assumes that the probability density function of each class is multivariate, and often a Gaussian distribution is assumed. A pixel is finally assigned to the class for which it has the highest likelihood. A detailed introduction to the MLC is given, for example, by Landgrebe [36] and by Richards and Jia [37]. DT classifiers successively partition the training data into an increasing number of smaller, more homogeneous subsets by producing efficient rules estimated from the training data. The main element of the classifier is the split rule, which is used at each node of the tree. A well-known rule is the gain ratio criterion, which is implemented in the C4.5 algorithm that was applied here [38], [39]. The criterion is based on measuring the reduction in the entropy of the data created by each split. DT-based classifier ensembles have been used in several studies to increase classification accuracies compared to those of single classifiers [9], [40], [41]. By resampling the input data (i.e., the training data and/or the input features) between individual trainings of the so-called base classifier, a set of independent classifiers is created. Afterward, the outputs are combined to obtain a final result. Boosting is a well-known technique for the creation of classifier ensembles.

In boosting, the distribution of the training samples is adaptively changed during the training process. In the first training phase, all samples are equally weighted. Iteratively, the weights of the samples are then changed, and misclassified samples are assigned a higher weight than those classified correctly. The next classifier in the ensemble is trained on the reweighted samples. In doing so, boosting can reduce both the variance and the bias of the classification. A widely used boosting approach is AdaBoost.M1 [42]. In contrast to the fused SVMs, which perform the data fusion at the decision level, the other classifiers perform the fusion at the data level (maximum likelihood) or at the feature level (DT). A comparison of the results of these algorithms seems worthwhile because of this distinction and also because of the numerous applications that are based on these well-known algorithms. Detailed ground truth information was used for generating training and validation sample sets. A sample set can be generated in different ways, e.g., by simple random sampling, systematic sampling, or stratified random sampling. Using simple random sampling, each sample has an equal chance of being selected, whereas the systematic technique selects samples at equal intervals over the study area. Stratified random sampling combines a priori knowledge about the study area (i.e., land cover information) with the simple random sampling approach [43]. Using field surveys as a priori knowledge, stratified random sampling guarantees that all classes are included in the sample set. In this paper, three sample sets for eight classes were generated with an equalized random sampling (see Table IV). In doing so, each class has the same sample size, containing 50, 150, or 300 samples per class (from now on referred to as training set #50, training set #150, and training set #300).
Using systematic random sampling, independent validation sets were generated, each containing 4000 samples, i.e., 500 for each class. In the discussion below, the training sets are labeled with the number of samples per class. Accuracy assessment was performed using overall accuracies and confusion matrices in order to derive the producer's and user's accuracies [37]. The kappa coefficient of agreement was used for comparing the different maps [43]. The accuracy assessment was conducted five times, each time using a different validation set, and the results were finally averaged.
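Taken together, the proposed fusion can be sketched end to end: single-source SVMs produce pairwise decision values, which are stacked into f(x)all and fed to a second SVM. The data, features, and scikit-learn toolkit below are stand-ins, not the authors' implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-ins for the SAR features and the (correlated) optical features.
X_sar, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                               n_classes=4, random_state=1)
rng = np.random.default_rng(1)
X_opt = X_sar[:, :4] + rng.normal(0, 1.0, (300, 4))

# One OAO SVM per source; "ovo" exposes the pairwise decision values f_ij(x).
svm_sar = SVC(kernel="rbf", decision_function_shape="ovo").fit(X_sar, y)
svm_opt = SVC(kernel="rbf", decision_function_shape="ovo").fit(X_opt, y)

# f(x)_all: concatenated pairwise decision values from both sources
# (4 classes -> 6 pairwise values per source -> 12 features in total).
f_all = np.hstack([svm_sar.decision_function(X_sar),
                   svm_opt.decision_function(X_opt)])
fused = SVC(kernel="rbf").fit(f_all, y)      # the additional, fusing SVM
acc = fused.score(f_all, y)
```

Note that, as in the paper, the fusing SVM is trained on the a priori outputs of the single-source classifiers; at prediction time, a new pixel is pushed through both source SVMs and its stacked decision values through the fusing SVM.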


TABLE V DATA SET 1, 2005. OVERALL ACCURACIES IN PERCENTAGE

TABLE VI DATA SET 2, 2006. OVERALL ACCURACIES IN PERCENTAGE

V. EXPERIMENTAL RESULTS

The experiments in this paper were conducted on the two multisensor data sets consisting of SAR and multispectral imagery (see Section III), using the four different classifier algorithms (MLC, DT, boosted DT, and SVM). For both years, all methods were applied three times: on the multitemporal SAR data, on the multispectral image, and on the multisensor data set. As expected, the experimental results clearly show the positive effect of increasing the number of training samples, as well as of using multisensor data sets. Irrespective of the classifier algorithm, the accuracy was significantly improved by the multisensor data set (see Tables V and VI). Comparing the multisource-based results of the SVMs, the total accuracy increased by up to 10% compared to the classification results achieved on the optical images (see, e.g., Table V, training set #300) and by up to 17% compared to the accuracy achieved with the SAR data (see, e.g., Table VI, training set #300). Even when only a "weaker" classifier was used (in this case, a DT for the whole data), the accuracy slightly increased when classifying a multisensor data set. Comparing the different algorithms, it can be seen that the single DT performed worst in terms of accuracies. On the other hand, a boosted DT generally outperformed the MLC or performed at least equally well in terms of accuracies. The SVMs achieved the highest overall accuracies of all algorithms when classifying the multitemporal SAR data, independent of the sample set size. When classifying the multispectral data, the SVMs performed slightly better than or comparably to the other approaches. In contrast, the performance of the boosted DT was comparable to the SVMs, or even outperformed them in terms of accuracies, in the classification of the multisensor imagery. On one hand, a classifier ensemble could thus seem more adequate for classifying multisensor imagery (see Table V, training set #300).
On the other hand, it can be expected that an adequate multisensor SVM approach achieves higher accuracies, because

TABLE VII MULTISOURCE DATA SET 1, 2005. OVERALL ACCURACY [PERCENT] AND KAPPA COEFFICIENT

TABLE VIII MULTISOURCE DATA SET 2, 2006. OVERALL ACCURACY [PERCENT] AND KAPPA COEFFICIENT

the single-source SVMs generally perform better than, or at least as well as, the other classifiers in terms of accuracies. The performance of the multisource SVM can be further increased when the SVM training is performed separately for the two data sources and an adequate fusion technique is used (see Tables VII and VIII). The proposed approach, which combines the separate outputs of a priori trained single-source SVMs by another SVM ("Fused SVM"), outperforms all other classification methods in terms of accuracies. Compared to the conventional single SVMs that were trained on the whole multisource data set (same results as in Tables V and VI), the overall accuracies are significantly increased. The improvement varies between 2.3% and 3.6% for data set 1 (see Table VII) and between 1.2% and 1.5% for the data set of 2006 (see Table VIII). Compared to the boosted DT, the differences between the overall accuracies are less dominant in some cases (see, e.g., Table VII, training set #300), but the Fused SVM still performs better or comparably. In contrast, fusion by a simple majority vote seems inefficient, and the overall accuracy can drop below the accuracy achieved by a single multisensor SVM or a boosted DT. The absolute maximum strategy is more efficient and outperformed the single SVM in most cases. Particularly on data set 2 (2006), however, the boosted DT outperformed this strategy both in terms of overall accuracy and the kappa coefficient.

TABLE IX DATA SET 1, 2005. CLASS-SPECIFIC ACCURACIES [PERCENT] FOR SAMPLE SET #300. BOLD NUMBERS INDICATE THE BEST RESULTS

TABLE X DATA SET 2, 2006. CLASS-SPECIFIC ACCURACIES [PERCENT] USING SAMPLE SET #300. BOLD NUMBERS INDICATE THE BEST RESULTS

With regard to the class-specific accuracies, the proposed approach outperforms the other classification techniques in the majority of cases (see Tables IX and X). The classification accuracies for the land cover classes "arable crops" and "orchard" are lower compared to the accuracies achieved for the other classes. A reason for this could be the variability within the class "arable crops," which consists of a variety of different crop types. The surface beneath the orchards is often covered by grassland; hence, the class can appear as a mixture between grassland and forest. In contrast, classes like "cereals" and "root crops" are less variable and consist dominantly of winter wheat and sugar beets, respectively. The accuracies for both "arable crops" and "orchard" are increased by the proposed approach. For data set 1, the classification accuracy of "arable crops" increased by about 10% compared to the single SVM and by 0.8% compared to the DT approach (see Table IX). The improvement for the "orchard" class was less dominant, but the accuracies increased by 4.5% and 1%, respectively (see Table X). The accuracies for "canola," "root crops," and "urban" are also improved. On the other hand, the classification accuracies of the land cover types "cereals," "forest," and "grassland" were decreased by the SVM-based fusion approach.

VI. CONCLUSION
In this paper, the problem of classifying multisensor data sets, containing multitemporal SAR data and optical imagery, was addressed, and an approach based on the fusion of SVMs was proposed. The method relies on the SVM-based decision fusion of SVMs that were individually trained on the different data sources. In addition to the SVM fusion, two other fusion schemes were considered: a majority vote and the absolute maximum. Generally, SVMs have the potential to be more accurate than conventional multivariate classifiers in the classification of complex data sets, since a convenient multivariate model is generally not known for such data. In the presented experiments, the separate training and subsequent fusion by an SVM outperformed all other parametric and nonparametric classification techniques, including the single SVM. Even though the presented technique is computationally more costly than a single SVM, the concept can easily be integrated into existing SVM approaches. Based on the results, the proposed fusion approach, as well as the absolute maximum strategy, can be considered attractive for classifying multisensor data sets. A simple majority vote, however, does not seem appropriate for fusing the outputs of separate SVMs. In the experiments, the accuracies of critical classes were significantly increased. In contrast, there was a reduction in accuracy for a few classes, but those classes were still classified with relatively high accuracies. Although the two different data sets were normalized, one may argue that the outputs of the two SVMs are not directly comparable for an adequate fusion process, because they were calculated in different feature spaces. However, this does not seem to be a limiting factor for the proposed technique, because SVMs can handle differently scaled data. Furthermore, the classification accuracy is increased by combining the separate outputs with another SVM.
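The three fusion schemes can be sketched in a few lines. The decision values below are purely illustrative (two hypothetical sources, three classes), and `majority_vote` and `absolute_maximum` are one plausible reading of the voting rules discussed in this paper; the SVM-based fusion itself would train a second SVM on the stacked decision values:

```python
import numpy as np

# Hypothetical per-source SVM outputs: one decision value (signed distance
# to the hyperplane) per sample and class, one array per data source.
f_sar = np.array([[ 1.2, -0.4, -0.8],
                  [-0.3,  0.9, -0.6],
                  [-0.9, -0.2,  0.4]])
f_opt = np.array([[ 0.6, -0.1, -0.5],
                  [-0.7, -0.2,  1.1],
                  [-0.4,  0.8, -0.3]])

def majority_vote(*outputs):
    """Each source votes for its highest-scoring class; ties resolve
    toward the smaller class label (a simplification)."""
    votes = np.stack([o.argmax(axis=1) for o in outputs], axis=1)
    return np.array([np.bincount(row).argmax() for row in votes])

def absolute_maximum(*outputs):
    """For each sample and class, keep the signed decision value with the
    largest magnitude across sources, then pick the best class."""
    stacked = np.stack(outputs)            # (n_sources, n_samples, n_classes)
    idx = np.abs(stacked).argmax(axis=0)   # most confident source per value
    fused = np.take_along_axis(stacked, idx[None], axis=0)[0]
    return fused.argmax(axis=1)

# For the SVM-based fusion, the per-source decision values are simply
# stacked and used as input features for a second (fusion) SVM.
meta_features = np.hstack([f_sar, f_opt])  # shape (n_samples, 2 * n_classes)
```

With these toy values, the two voting rules can disagree on individual samples, which mirrors why the rules perform differently in practice.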
In contrast, concepts such as the absolute maximum strategy could be affected if the outputs of the a priori SVMs are not directly comparable. This could be a reason for the better performance of the fused SVM in terms of accuracy compared to the absolute maximum strategy. The main reason for the success of the proposed approach could be the different nature of the data types used, i.e., the multispectral imagery and the SAR data. The multitemporal SAR data and the multispectral images provide different information and may not be equally reliable. Using a single SVM for the whole heterogeneous data set requires the definition of one single kernel function. Therefore, it seems more adequate to define the kernel functions for each data source separately and to fuse the derived outputs (i.e., the distance of each sample to the hyperplane). This statement is confirmed by the experimental results. Furthermore, it has been shown that SVMs outperform other classifiers in terms of accuracy when classifying single-source data sets. Hence, it can be expected that an extension of the proposed methodology would further increase the classification accuracy. In this context, it would be possible either to rely on a further differentiation of the data (e.g., between the different wavelengths and polarizations of the SAR imagery) during the SVM training process or to use different kernel methods for each source.

Generally, the results show that the use of multisensor imagery is worthwhile and that the classification accuracy is significantly increased by such data sets, irrespective of the applied classifier. This is particularly important with respect to missions like ALOS, Radarsat-2, and TerraSAR-X.
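The kernel-definition argument can be made concrete with a small NumPy sketch. The data and the gamma values below are purely illustrative, and the weighted-sum combination follows the composite-kernel idea of [26] rather than the exact decision fusion used in this paper:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Pairwise squared Euclidean distances, mapped through the RBF kernel."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X_sar = rng.normal(size=(5, 4))   # hypothetical multitemporal SAR features
X_opt = rng.normal(size=(5, 3))   # hypothetical multispectral features

# A single SVM on the stacked features forces one kernel (one gamma)
# onto both heterogeneous sources...
X_all = np.hstack([X_sar, X_opt])
K_single = rbf_kernel(X_all, X_all, gamma=0.1)

# ...whereas source-specific kernels can use source-specific parameters
# and be combined afterwards (weighted-sum composite kernel, cf. [26]).
K_composite = 0.5 * rbf_kernel(X_sar, X_sar, gamma=0.05) \
            + 0.5 * rbf_kernel(X_opt, X_opt, gamma=0.3)
```

The composite matrix remains a valid kernel (symmetric, unit diagonal here because the weights sum to one), so it can be passed to any kernel-based classifier.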

ACKNOWLEDGMENT

B. Waske visited the Department of Electrical and Computer Engineering, University of Iceland, funded by the ENVILAND research project. He would like to thank G. Menz, M. Braun, and V. Heinzel from the University of Bonn. The software for the SVM classification was developed by A. Janz and S. Schiefer, Humboldt-Universität, Berlin, Germany. The authors would like to thank the European Space Agency for providing Envisat ASAR and ERS-2 data through a Cat-1 proposal (C1.3115). The SPOT image was provided through the European OASIS program (OASIS 58 - CE 6324). The data were acquired within the ENVILAND research project (FKZ 50EE0404), funded by the German Aerospace Center (DLR) and the Federal Ministry of Economics and Technology (BMWi). The authors would also like to thank the anonymous reviewers, whose comments helped to significantly improve this paper.

REFERENCES

[1] J. A. Benediktsson and I. Kanellopoulos, “Classification of multisource and hyperspectral data based on decision fusion,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1367–1377, May 1999.
[2] X. Blaes, L. Vanhalle, and P. Defourny, “Efficiency of crop identification based on optical and SAR image time series,” Remote Sens. Environ., vol. 96, no. 3/4, pp. 352–365, Jun. 2005.
[3] G. Chust, D. Ducrot, and J. L. Pretus, “Land cover discrimination potential of radar multitemporal series and optical multispectral images in a Mediterranean cultural landscape,” Int. J. Remote Sens., vol. 25, no. 17, pp. 3513–3528, 2004.
[4] D. B. Michelson, B. M. Liljeberg, and P. Pilesjo, “Comparison of algorithms for classifying Swedish landcover using Landsat TM and ERS-1 SAR data,” Remote Sens. Environ., vol. 71, no. 1, pp. 1–15, Jan. 2000.
[5] J. A. Benediktsson, P. H. Swain, and O. K. Ersoy, “Neural network approaches versus statistical methods in classification of multisource remote sensing data,” IEEE Trans. Geosci. Remote Sens., vol. 28, no. 4, pp. 540–552, Jul. 1990.
[6] T. Lee, J. A. Richards, and P. H. Swain, “Probabilistic and evidential approaches for multisource data analysis,” IEEE Trans. Geosci. Remote Sens., vol. GRS-25, no. 3, pp. 283–293, May 1987.
[7] S. B. Serpico and F. Roli, “Classification of multisensor remote-sensing images by structured neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 33, no. 3, pp. 562–578, May 1995.


[8] M. A. Friedl and C. E. Brodley, “Decision tree classification of land cover from remotely sensed data,” Remote Sens. Environ., vol. 61, no. 3, pp. 399–409, Sep. 1997.
[9] M. Pal and P. M. Mather, “An assessment of the effectiveness of decision tree methods for land cover classifications,” Remote Sens. Environ., vol. 86, no. 4, pp. 554–565, Aug. 2003.
[10] C. Huang, L. S. Davis, and J. R. Townshend, “An assessment of support vector machines for land cover classification,” Int. J. Remote Sens., vol. 23, no. 4, pp. 725–749, Feb. 2002.
[11] G. M. Foody and A. Mathur, “The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM,” Remote Sens. Environ., vol. 103, no. 2, pp. 179–189, Jul. 2006.
[12] T. Kavzoglu and P. M. Mather, “Pruning artificial neural networks: An example using land cover classification of multi-sensor images,” Int. J. Remote Sens., vol. 20, no. 14, pp. 2787–2803, Sep. 1999.
[13] M. A. Friedl, C. E. Brodley, and A. H. Strahler, “Maximizing land cover classification accuracies produced by decision trees at continental to global scales,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 2, pp. 969–977, Mar. 1999.
[14] M. Simard, S. S. Saatchi, and G. de Grandi, “The use of decision tree and multiscale texture for classification of JERS-1 SAR data over tropical forest,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 5, pp. 2310–2321, Sep. 2000.
[15] B. M. Steele, “Combining multiple classifiers: An application using spatial and remotely sensed information for land cover type mapping,” Remote Sens. Environ., vol. 74, no. 3, pp. 545–556, Dec. 2000.
[16] M. Fauvel, J. Chanussot, and J. A. Benediktsson, “Decision fusion for the classification of urban remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 10, pp. 2828–2838, Oct. 2006.
[17] J. A. Benediktsson and P. H. Swain, “Consensus theoretic classification methods,” IEEE Trans. Syst., Man, Cybern., vol. 22, no. 4, pp. 688–704, Jul./Aug. 1992.
[18] J. A. Benediktsson, J. R. Sveinsson, and P. H. Swain, “Hybrid consensus theoretic classification,” IEEE Trans. Geosci. Remote Sens., vol. 35, no. 4, pp. 833–843, Jul. 1997.
[19] G. M. Foody and A. Mathur, “A relative evaluation of multiclass image classification by support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1335–1343, Jun. 2004.
[20] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[21] M. Pal and P. M. Mather, “Support vector machines for classification in remote sensing,” Int. J. Remote Sens., vol. 26, no. 5, pp. 1007–1011, Mar. 2005.
[22] M. Pal and P. M. Mather, “Some issues in the classification of DAIS hyperspectral data,” Int. J. Remote Sens., vol. 27, no. 14, pp. 2895–2916, Jul. 2006.
[23] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[24] X. Song, G. Fan, and M. Rao, “Automatic CRP mapping using nonparametric machine learning approaches,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 4, pp. 888–897, Apr. 2005.
[25] G. H. Halldorsson, J. A. Benediktsson, and J. R. Sveinsson, “Support vector machines in multisource classification,” in Proc. IGARSS, 2003, vol. 3, pp. 2054–2056.
[26] G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, J. Vila-Frances, and J. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[27] M. Fauvel, J. Chanussot, and J. A. Benediktsson, “A combined support vector machines classification based on decision fusion,” in Proc. IGARSS, 2006, pp. 2494–2497.
[28] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Min. Knowl. Discov., vol. 2, no. 2, pp. 121–167, Jun. 1998.
[29] B. Schölkopf and A. Smola, Learning With Kernels. Cambridge, MA: MIT Press, 2002.
[30] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing multiple parameters for support vector machines,” Mach. Learn., vol. 46, no. 1–3, pp. 131–159, Jan. 2002.
[31] K.-M. Chung, W.-C. Kao, C.-L. Sun, L.-L. Wang, and C.-J. Lin, “Radius margin bounds for support vector machines with the RBF kernel,” Neural Comput., vol. 15, no. 11, pp. 2643–2681, Nov. 2003.
[32] J. H. Lee and C. J. Lin, “Automatic model selection for support vector machines,” Tech. Rep., Dept. Comput. Sci. and Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan, R.O.C., 2000. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/looms/




[33] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multi-class support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.
[34] D. J. Sebald and J. A. Bucklew, “Support vector machines and the multiple hypothesis test problem,” IEEE Trans. Signal Process., vol. 49, no. 11, pp. 2865–2872, Nov. 2001.
[35] H. Laur, P. Bally, P. Meadows, J. Sanchez, B. Schaettler, E. Lopinto, and D. Esteban, “Derivation of the backscattering coefficient σ0 in ESA ERS SAR PRI products,” ESA, Noordwijk, The Netherlands, ESA Document ES-TN-RE-PM-HL09, no. 2, Rev. 5d, 2002.
[36] D. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. Hoboken, NJ: Wiley, 2003.
[37] J. A. Richards and X. Jia, Remote Sensing Digital Image Analysis: An Introduction. New York: Springer-Verlag, 2003.
[38] J. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.
[39] J. R. Quinlan, C4.5: Programs for Machine Learning. Los Altos, CA: Morgan Kaufmann, 1993.
[40] G. J. Briem, J. A. Benediktsson, and J. R. Sveinsson, “Multiple classifiers applied to multisource remote sensing data,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 10, pp. 2291–2299, Oct. 2002.
[41] B. Waske, S. Schiefer, and M. Braun, “Random feature selection for a decision tree classification of multi-temporal SAR data,” in Proc. IGARSS, 2006, pp. 168–171.
[42] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proc. 13th Int. Conf. Mach. Learn., 1996, pp. 148–156.
[43] R. G. Congalton and K. Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices. Boca Raton, FL: Lewis Publishers, 1999.

Björn Waske (S’06) received the degree in environmental sciences, with a major in remote sensing, from the University of Trier, Trier, Germany, in 2002. Since 2004, he has been working toward the Ph.D. degree at the Center for Remote Sensing of Land Surfaces (ZFL), University of Bonn, Bonn, Germany. Until mid-2004, he was a Research Assistant in the Department of Geosciences, University of Munich, Germany, doing research on the use of remote sensing data for flood forecast modeling. His current research activities concentrate on data fusion of multispectral and synthetic aperture radar data for land cover classifications.

Jón Atli Benediktsson (S’84–M’90–SM’99–F’04) received the Cand.Sci. degree in electrical engineering from the University of Iceland, Reykjavik, Iceland, in 1984, and the M.S.E.E. and Ph.D. degrees from Purdue University, West Lafayette, IN, in 1987 and 1990, respectively. He is currently Director of Academic Development and Innovation, and Professor of electrical and computer engineering at the University of Iceland. He has held visiting positions at the Department of Information and Communication Technology, University of Trento, Trento, Italy (2002–present), the School of Computing and Information Systems, Kingston University, Kingston upon Thames, U.K. (1999–2004), the Joint Research Centre of the European Commission, Ispra, Italy (1998), Denmark’s Technical University, Lyngby (1998), and the School of Electrical and Computer Engineering, Purdue University (1995). He was a Fellow at the Australian Defence Force Academy, Canberra, A.C.T., Australia, in August 1997. From 1999 to 2004, he was Chairman of the energy company Metan, Ltd. His research interests are in remote sensing, pattern recognition, neural networks, image processing, and signal processing, and he has published extensively in those fields.
Dr. Benediktsson is Editor of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (TGARS) and Associate Editor of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS. He was Associate Editor of TGARS from 1999 to 2002. He coedited (with Prof. D. A. Landgrebe) a special issue of TGARS on Data Fusion (May 1999) and (with P. Gamba and G. G. Wilkinson) a special issue on Urban Remote Sensing from Satellite (October 2003). From 1996 to 1999, he was Chairman of the Technical Committee on Data Fusion of the IEEE Geoscience and Remote Sensing Society (GRSS). He was elected to the Administrative Committee of the GRSS for the term 2000 to 2002, and in 2002, he was appointed Vice President of Technical Activities of GRSS. He was the founding Chairman of the IEEE Iceland Section and served as its Chairman from 2000 to 2003. He is currently Chairman of the University of Iceland’s Quality Assurance Committee (2006–present). He was Chairman of the University of Iceland’s Science and Research Committee (1999–2005), a member of Iceland’s Science and Technology Council (2003–2006), and a member of the Nordic Research Policy Council (2004). He was a member of a NATO Advisory Panel of the Physical and Engineering Science and Technology Subprogramme (2002–2003). He received the Stevan J. Kristof Award from Purdue University in 1991 as an outstanding graduate student in remote sensing. In 1997, he was the recipient of the Icelandic Research Council’s Outstanding Young Researcher Award. In 2000, he was granted the IEEE Third Millennium Medal. In 2004, he was a corecipient of the University of Iceland’s Technology Innovation Award. In 2006, he received the yearly research award from the Engineering Research Institute of the University of Iceland, and in 2007, he received the Outstanding Service Award from the IEEE GRSS. He is a member of Societas Scientiarum Islandica and Tau Beta Pi.
