This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING


Performance of Support Vector Machines and Artificial Neural Network for Mapping Endangered Tree Species Using WorldView-2 Data in Dukuduku Forest, South Africa

Galal Omer, Onisimo Mutanga, Elfatih M. Abdel-Rahman, and Elhadi Adam

Abstract—Endangered tree species (ETS) play a significant role in ecosystem functioning and services, land use dynamics, and other socio-economic aspects, including ecological, economic, livelihood, security, and well-being benefits. The development of techniques for mapping and monitoring ETS is thus critical for understanding the functioning of ecosystems. The advent of advanced imaging systems and supervised learning algorithms has provided an opportunity to map ETS over fragmenting areas. Recently, vegetation maps have been produced using advanced imaging systems such as WorldView-2 (WV-2) and robust classification algorithms such as support vector machines (SVM) and artificial neural networks (ANN). However, delineation of ETS in a fragmenting ecosystem using high-resolution imagery has largely remained elusive due to the complexity of the species structure and their distribution. Therefore, the aim of the present study was to examine the utility of WV-2 data for mapping ETS in the fragmenting Dukuduku indigenous forest of South Africa using SVM and ANN classification algorithms. Specifically, the study tested the contribution of the additional WV-2 bands to mapping six ETS. The WV-2 image was spectrally subset into the four standard bands (SB) and the four additional bands (AB). The full WV-2 image (eight bands: 8B), together with the SB and AB subsets, was classified using SVM and ANN. The results showed the robustness of the two machine learning algorithms, with an overall accuracy (OA) of 77.00% for SVM and 75.00% for ANN using 8B. The SB produced an OA of 65.00% for SVM and 64.00% for ANN. The AB produced almost the same OA of 70.00% for both SVM and ANN. There were significant differences between the performances of the two algorithms, as demonstrated by the results of McNemar’s test (Z score ≥ 1.96). This study concludes that SVM and ANN classification algorithms with WV-2 8B have the potential to map ETS in the Dukuduku indigenous forest. The study offers relatively accurate information that forest managers need to make informed decisions regarding management and conservation protocols for ETS.

Manuscript received April 02, 2015; revised July 21, 2015; accepted July 21, 2015. G. Omer and E. M. Abdel-Rahman are with the School of Agricultural, Earth, and Environmental Sciences, Pietermaritzburg Campus, University of KwaZulu-Natal, Pietermaritzburg 3209, South Africa, and also with the University of Khartoum, Khartoum North 13314, Sudan (e-mail: [email protected]). O. Mutanga is with the School of Agricultural, Earth, and Environmental Sciences, University of KwaZulu-Natal, Pietermaritzburg 3209, South Africa. E. Adam is with the School of Geography, Archaeology, and Environmental Studies, University of the Witwatersrand-Johannesburg, Johannesburg 2050, South Africa. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSTARS.2015.2461136

Index Terms—Artificial neural network (ANN), Dukuduku, endangered tree species (ETS), indigenous forest, support vector machines (SVM).

I. INTRODUCTION

INDIGENOUS forests span different parts of Africa and are relatively more prevalent in the southern and eastern parts of the continent [1]. In South Africa, indigenous forests consist of many small, fragmenting, and largely scattered patches and cover approximately 0.1% of the country’s land surface [2]–[4]. Apart from their ecological, economic, livelihood security, and well-being benefits, indigenous forests in the country provide medicinal products to rural communities and contribute to the concept of “ecosystem services” [5]–[7]. One such indigenous forest is the Dukuduku forest, located in KwaZulu-Natal Province. Dukuduku forest provides varied products and usable materials for human needs, including construction and fence poles, raw material for craft work, livestock browse, and medicine for poor rural communities [3], [6]–[9]. Different tree species in the forest play a vital role in meeting these needs. Local communities in Zululand, South Africa, use some of these tree species to treat ailments such as fever, stomachache, dysentery, snake and scorpion bites, malaria, inflammations, and backache, and to facilitate childbirth [6], [9]–[12]. Consequently, some tree species in the Dukuduku indigenous forest have become endangered and threatened because of the rapid rate of harvesting and removal [2], [13]. These activities have resulted in over-exploitation of natural resources and caused severe forest fragmentation and serious threats to the conservation of tree species diversity in the indigenous Dukuduku forest ecosystem [3], [14]–[16]. Endangered tree species (ETS) need specific management and conservation protocols in order to play significant roles in ecosystem functioning, land use dynamics, and other socioeconomic aspects in the value chain [8], [17], [18].
This requires intensive fieldwork to geolocate, identify, and characterize ETS, as well as to estimate their coverage and distribution [18], [19]. In this context, more precise information from forest surveys is needed for mapping and monitoring ETS to develop sustainable forest management practices. However, traditional field survey protocols are costly, time-consuming, and often lack the necessary geospatial accuracy. Remotely sensed data have been regarded as a valuable source of information over the past decades for classifying and monitoring forest species and vegetation communities [20]–[23]. However, mapping of tree species (e.g., ETS) still faces complex challenges in relation to ambiguous classes. Multiple objects within a pixel can lead to spectral confusion and poor distinction among different cover types [24]–[26]. In particular, these challenges hinder the classification of tree species when multispectral data are captured in fragmenting forests [3], [27]. This is due to the fewer and broader spectral bands collected by some multispectral sensors such as Landsat and SPOT-5, which can lead to spectral overlap between tree species. Recently, the advent of new generation satellites with high spectral and spatial resolution, such as Sentinel-2, WorldView-3 (WV-3), WorldView-2 (WV-2), RapidEye, and Pleiades, has brought unique opportunities for mapping trees at species level. Among these satellites, WV-2 and WV-3 offer key spectral bands such as yellow, coastal blue, and red edge that help in depicting tree characteristics. The utility of WV-2 imagery, for instance, has been demonstrated in various studies that include, among others, predicting and mapping forest structural parameters [28], mapping of tree species [29], monitoring plantation forest [30], mapping increaser and decreaser grass species in degraded rangelands [31], and the detection of invasive alien plants [32]. These studies discussed the utility of the eight available spectral bands of WV-2 imagery and concluded that WV-2 data considerably improved the classification and prediction accuracies of features of interest compared to other multispectral data. These studies, however, did not assess the performance of WV-2 spectral subsets in mapping tree species using advanced and robust classification algorithms.

1939-1404 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Various advanced classification algorithms such as classification trees, random forest (RF), artificial neural networks (ANN), and support vector machines (SVM) have been used to extract tree species information from multisensor and multispectral remote sensing images [33]–[36]. Among these classification methods, attention has been accorded to RF, SVM, and ANN due to their superior image handling capability [33], [37]. RF, SVM, and ANN offer a precise way to map vegetation cover and tree species from remote sensing images without depending on distributional assumptions [35], [37]–[40]. RF is widely used for mining and classifying hyperspectral data for plant species identification and classification [41]–[43], while SVM and ANN classification algorithms have mainly been explored in forest species classification using multispectral imagery [38], [39], [44]. In addition, SVM are well-known machine learning algorithms that perform accurately in reducing the complexity of the ill-posed problems associated with image classification [37], [44], [45]. They are frequently used to separate linearly or potentially nonlinearly separable class samples by means of a variety of kernel approaches [46]. Kernel options include the polynomial and radial basis function (RBF) kernels, which have yielded accurate results for vegetation classification [47]. The RBF has several advantages: it works in an infinite-dimensional feature space and, in contrast to other well-performing kernels, has a single kernel parameter [48]–[50].

On the other hand, ANN are machine learning systems comprising interconnected networks of simple processing elements. They are characterized by robust pattern recognition power, allowing them to represent complex multivariate data forms [34]. ANN have many advantages over statistical methods, including easy adaptation to different kinds of data and input structures, the ability to generalize across multiple images [51]–[54], and the ability to classify data with limited training samples compared with traditional classifiers [54]. Previous studies demonstrated that SVM and ANN perform better than traditional classification methods such as maximum likelihood, minimum distance to the mean, and decision tree [37], [39], [44], [55], [56]. These conventional classification methods depend on assumptions that may limit their utility for many datasets and for mapping areas with limited training samples [37], [55], [57]. However, there are some challenges with the use of ANN. These include the likelihood of converging to a local extremum, a lack of rigorous design procedures with a theoretical foundation, and the difficulty of controlling the training process [58]. Furthermore, ANN are often referred to as a black-box technique that could encounter over-fitting problems on the test dataset [59], [60]. Describing exactly how input data are translated into output classes can also be challenging due to the combined use of multiple nonlinear activation functions at different layers [57]. To the best of our knowledge, little information exists on how SVM and ANN classification algorithms perform in delineating ETS in a fragmenting ecosystem using very high-resolution WV-2 imagery. Therefore, the objective of the present study was to examine the utility of WV-2 satellite data for mapping ETS in the fragmenting Dukuduku indigenous forest of South Africa using SVM and ANN classification algorithms. Specifically, the study tested the contribution of the additional WV-2 bands to mapping six ETS.

II. METHODOLOGY

A. Study Area

The study area encompasses the inland coastal forest of South Africa known as the Dukuduku indigenous forest. The area falls between the towns of Mtubatuba and St. Lucia in northern KwaZulu-Natal Province (32°17′23″E and 28°52′25″S) and forms part of Mtubatuba Local Municipality (Fig. 1) [3], [61]. The Dukuduku indigenous forest is considered to be among the last remaining indigenous lowland forests along the South African coastline [3], [62]. The forest covers about 6500 ha of indigenous tree species. The majority of the land cover classes surrounding the forest consist of sugarcane and private commercial forest plantations [3], [62]. As pointed out by Ndlovu [61], 29% of this indigenous forest has been converted into other land uses, which include subsistence agriculture and human settlements. In addition, increasing human activity in the area has led to an increase in landscape fragmentation. Therefore, the Dukuduku area faces severe threats from the destruction of indigenous vegetation, commercial forest plantations, and agricultural farmlands. Some tree species are also cut by the local community for purposes such as crafts and medicines. The area is dominated by various natural indigenous vegetation species comprising different age groups and other forms of land cover classes. Among these species are many rare tree species, e.g., Syzygium cordatum (SyC), Cussonia zuluensis (CZ), Ficus natalensis (FN), Canthium inerme (CI), Strychnos madagascariensis (SM), Strychnos spinosa (SS), Apodytes dimidiata (AD), Ozoroa engleri (OE), Barringtonia racemosa (BR), Albizia adianthifolia (AA), Ekebergia capensis (EC), Harpephyllum caffrum (HC), Hymenocardia ulmoides (HU), Sclerocarya birrea (ScB), and Trichilia dregeana (TD). These tree species are harvested rapidly for a number of human purposes, such as woodcarving and traditional medicine [8], [9]. Forest managers in the Dukuduku indigenous forest observed that six of these species, namely AA, EC, HC, HU, ScB, and TD, are under severe threat and endangerment [63]. Therefore, the current study focuses on mapping these six tree species among other forest species (OFS) and land use/cover (LULC) classes in the study area.

Fig. 1. Location of the Dukuduku indigenous forest in KwaZulu-Natal Province of South Africa.

B. Field Data Collection

Intensive field work was conducted within one week of the WV-2 image acquisition to identify ETS associated with the indigenous forest in the study area. We employed a purposive sampling protocol to geolocate the six ETS using a handheld Leica GS20 global positioning system (GPS) with submeter accuracy. The network of roads and open paths was used to assist in selecting the ETS by walking in various directions in the study area. Fig. 2 shows the morphological and spectral characteristics of the six tree species. The target species were identified with the aid of expert knowledge. In total, 827 sample points were collected from the six ETS and LULC classes in the study area. The sample points for each class were 101 (AA), 71 (EC), 62 (HC), 70 (HU), 80 (ScB), 68 (TD), 75 (sugarcane), 75 (grassland), 75 (plantation forest), and 150 (OFS, which include coastal and dune forest species). These points were then used as ground-truth data to classify the different WV-2 spectral subsets based on the pixel spectral signatures of the classes. For the tree classes, one pixel was used per individual tree crown.

C. Image Acquisition and Preprocessing

In this study, a cloud-free WV-2 satellite image of the study area was acquired on December 1, 2013. WV-2 is the first very high-resolution satellite sensor that acquires data in eight spectral bands in the 0.4−1.04 µm spectral range, with a spatial resolution of 2 m and a swath width of 16.4 km. The eight spectral bands of WV-2 consist of four standard bands (SB), situated in the blue (0.450−0.510 µm), green (0.510−0.580 µm), red (0.630−0.690 µm), and NIR-1 (0.770−0.895 µm), and four additional bands (AB), which are coastal blue (0.400−0.450 µm), yellow (0.585−0.625 µm), red edge (0.705−0.745 µm), and NIR-2 (0.860−1.040 µm). The WV-2 image was atmospherically corrected using the quick atmospheric correction (QUAC) procedure in the Environment for Visualizing Images (ENVI) 4.7 software, which runs on the interactive data language (IDL) [64]. QUAC determines atmospheric compensation parameters directly from the information contained within the image (pixel spectra), thus allowing for the retrieval of accurate reflectance spectra [65]. No geometric correction was applied to the WV-2 image, as it was delivered already corrected by DigitalGlobe. The image was referenced to the universal transverse Mercator (UTM zone 36 South) projection and the WGS-84 geodetic datum. The WV-2 image was spectrally subset to separate the four SB and the four AB. The two subsets, together with the full eight bands (8B), were then compared for classifying ETS using the SVM and ANN supervised learning classification algorithms.
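The spectral subsetting described above can be illustrated with a minimal sketch. It assumes that the bands are stored in the usual order of an 8-band WV-2 multispectral product (coastal, blue, green, yellow, red, red edge, NIR-1, NIR-2); the band names, the helper `subset_pixel`, and the reflectance values are hypothetical and are not taken from the study, which performed this step in ENVI.

```python
# Assumed band order of an 8-band WV-2 multispectral product.
WV2_BANDS = ["coastal", "blue", "green", "yellow",
             "red", "red_edge", "nir1", "nir2"]

STANDARD = ["blue", "green", "red", "nir1"]             # SB subset
ADDITIONAL = ["coastal", "yellow", "red_edge", "nir2"]  # AB subset


def subset_pixel(pixel, names):
    """Return the values of an 8-band pixel spectrum for the named bands."""
    return [pixel[WV2_BANDS.index(name)] for name in names]


# Hypothetical reflectance spectrum of one pixel (made-up values).
pixel = [0.03, 0.04, 0.07, 0.06, 0.05, 0.20, 0.38, 0.40]

sb = subset_pixel(pixel, STANDARD)    # four standard bands
ab = subset_pixel(pixel, ADDITIONAL)  # four additional bands
```

Each classifier is then run three times, once on the full spectrum (8B) and once on each four-band subset, so that any gain from the additional bands can be isolated.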

III. STATISTICAL ANALYSIS

A sufficient number of training samples is a prerequisite for a successful classification [53]. The sample points should also be distributed consistently across the spatial extent of the classes to provide a representative description of the overall population [66]. In the current study, the SVM and ANN classification algorithms were trained on a randomly selected 70% (n = 583) of the samples, and the final accuracy was assessed using the remaining 30% (n = 244) holdout sample. The ground-truth training samples were overlaid on the WV-2 spectral subsets, and classification signatures were then created for the six selected ETS and the LULC classes in the area. After the signatures were assessed and adjusted, the SVM and ANN supervised classification methods were applied.
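The holdout split can be sketched as follows, using the per-class sample counts reported in the field data collection subsection. The random seed and the helper name `holdout_split` are assumptions made for illustration; the text does not state how the random selection was implemented or whether it was stratified by class.

```python
import random

# Sample points per class, as reported for the field survey.
SAMPLES_PER_CLASS = {
    "AA": 101, "EC": 71, "HC": 62, "HU": 70, "ScB": 80, "TD": 68,
    "sugarcane": 75, "grassland": 75, "plantation": 75, "OFS": 150,
}

# One label per sample point (827 in total).
labels = [name for name, n in SAMPLES_PER_CLASS.items() for _ in range(n)]


def holdout_split(n_samples, n_train, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = holdout_split(len(labels), n_train=583)
```

With 827 points, 583 training samples correspond to about 70.5% of the data, which leaves the 244 test samples used for the accuracy assessment.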


Fig. 2. Field photographs and the corresponding average spectral reflectance curves of the six tree species extracted from WV-2 image pixels (n = 44 for each spectrum) located at the center of tree crowns.


A. SVM Classification Algorithm

The SVM classification algorithm is a binary learning technique that analyzes data and recognizes patterns [49]. However, numerous pattern recognition applications involve multiple classes. Hence, multiclass SVM problems are solved by combining multiple binary classifiers [67]. Simple binary SVM classifiers are extended to run as a multiclass classifier using procedures such as one against one (OAO) and one against all (OAA). For each class, OAA trains one binary SVM to separate members of that class from all others. Conversely, OAO trains a binary SVM for each pair of classes to separate members of one class from members of the other, and the final class is then assigned by a voting mechanism [50], [68]. The algorithm requires no assumption about the data distribution and uses principles that guard against over-fitting when applied to test or new data samples [49], [69]–[71]. In addition, SVM attempts to maximize the margin, i.e., the distance between the optimal separating hyperplane and the nearest data points of each class [72]. In a two-class problem, the algorithm sets two supporting hyperplanes at the class boundaries and maximizes the margin between them. Sample points lying on the supporting hyperplanes are called support vectors, and the optimal hyperplane lies in the middle of the margin. The algorithm aims to determine a linear discriminant function with maximum margin to discriminate each class. However, many classes are not linearly separable; hence, SVM projects vectors into a high-dimensional feature space by means of the kernel trick and fits the optimum hyperplane that discriminates classes using an optimization function [73], [74]. Polynomial and RBF kernels are the most commonly used functions for classifying remotely sensed data [37], [75]. A number of authors have found that the RBF kernel outperforms the polynomial kernel in the classification of remotely sensed data [73]–[75].
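The OAO voting scheme can be sketched in a few lines. This is an illustration only: `toy_decide` is a hypothetical stand-in for a trained binary SVM (it picks the class with the nearer scalar prototype), and the prototype values are invented. Only the pairing and voting logic reflects the procedure described above.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["AA", "EC", "HC", "HU", "ScB", "TD",
           "sugarcane", "grassland", "plantation", "OFS"]


def oao_pairs(classes):
    """One binary classifier is needed for every unordered pair of classes."""
    return list(combinations(classes, 2))


def oao_predict(x, classes, binary_decide):
    """Let every pairwise classifier vote and return the most voted class."""
    votes = Counter(binary_decide(a, b, x) for a, b in oao_pairs(classes))
    return votes.most_common(1)[0][0]


# Hypothetical stand-in for trained binary SVMs: each class has a scalar
# prototype and the pairwise decision picks the nearer one.
PROTOTYPES = {name: float(i) for i, name in enumerate(CLASSES)}


def toy_decide(a, b, x):
    return a if abs(x - PROTOTYPES[a]) <= abs(x - PROTOTYPES[b]) else b


n_classifiers = len(oao_pairs(CLASSES))           # k(k - 1)/2 pairs
predicted = oao_predict(2.2, CLASSES, toy_decide)
```

For the ten classes mapped here, OAO requires 45 binary classifiers, whereas OAA would require 10; each OAO classifier, however, is trained only on the samples of its two classes.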
Furthermore, the RBF kernel is computationally fast, easy to implement, and requires optimizing only two parameters. These are the cost parameter (C), which controls the penalty for misclassified data points in the training dataset, and gamma (γ), which is the kernel width parameter of the RBF [73]. In the current study, the three WV-2 spectral subsets and the RBF kernel were used to find an optimal hyperplane that can differentiate among the six ETS in the Dukuduku forest. The C and γ parameters were optimized to avoid over-fitting [73]. The two parameters were tuned using a 10-fold cross validation method [48], [73], [74]. The training dataset was divided into 10 subsets of equal size. SVM models were then trained on nine subsets and tested on the remaining one, and the process was repeated 10 times until every subset had served as the test sample. The parameter pair that minimized the classification error was then taken as the best value for the final classification. The OAO procedure was used to implement the multiclass SVM model, as suggested by Hsu and Lin [76], who stated that this scheme is more symmetric than OAA with regard to class sizes. The e1071 library in the R statistical package (version 2.15.2) [77] was employed for optimizing the SVM parameters. The optimal SVM parameters were then input into the ENVI software to implement the SVM classification algorithm and map the ETS and other classes on the WV-2 image.

TABLE I
PARAMETERS FOR THE BEST TRAINED ANN USED FOR MAPPING ETS

MLP, multilayer perceptron.

B. ANN Classification Algorithm: Multilayer Perceptron

In machine learning and related fields, an ANN is a nonparametric classification technique that does not depend on an assumption of data normality [34], [37], [78]. An ANN is a mathematical model that attempts to simulate the structure and functional aspects of biological neural networks. It comprises an interconnected group of artificial neurons and processes information using a connectionist approach to computation [79]. ANN were originally designed as pattern recognition and data analysis tools that mimic the neural storage and analytical operations of the brain. Like SVM, the ANN approach has a distinct advantage over other classification methods: it is nonparametric and, therefore, requires little or no prior knowledge of the distribution of the input data [80]. Moreover, an ANN can fit an arbitrary decision boundary to separate the data points and can therefore produce high classification accuracy [52], [81]. Various ANN models have been used in remote sensing studies, such as RBF, back propagation (BP), and multilayer perceptron (MLP) [34], [37], [58], [82]–[84]. The MLP is a commonly used ANN structure that comprises an input layer, an output layer, and one or more hidden layers of nonlinearly activating nodes [34], [78], [85]. Each node in a layer connects with a certain synaptic weight to all the nodes in the next layer [85]. Learning occurs through changes in the connection weights after each item of data is processed. An MLP is a feed-forward ANN model that maps input data onto a set of appropriate outputs. It is a modification of the standard linear perceptron that uses three or more layers of neurons (nodes) with nonlinear activation functions [39], [57], [78].
ANN have been widely used in classifying land cover and tree species that are not linearly separable in the original spectral space [37], [58], [79], [86]. The MLP model trained with the standard BP algorithm is one of the best-known ANN structures, and the standard BP algorithm for supervised learning in Tanagra software 1.4 [87] was used here. In the present study, the ANN was used as a supervised nonlinear classification algorithm to map ETS and other classes using WV-2 data. Numerous variations of internal network structure, input data, and learning algorithms were tested to define the optimal classifier features. Table I shows a list of the parameters used to train all ANN models. The input layer consisted of eight nodes for 8B and four nodes for both SB and AB, while two hidden layers were found optimum for all models (Table I). The structure of the hidden layers was also tested to assess the necessary number of hidden layers and the number of nodes required per layer. This was tested by manually changing the number of nodes. The train-test parameters were excluded to obtain the overall test error rate and then viewed to get the confusion matrix for classifying ETS and other classes using Tanagra software [87], [88]. The ANN was trained with a BP training algorithm and one hidden layer [89]. The BP algorithm is a supervised method that uses the gradient descent technique, which adjusts weights to minimize the classification error and optimizes the ANN parameters [84], [90]. In the present study, the number of hidden layers was set to 2 and the number of training iterations was set to a default value of 1000 [87]. The optimum number of nodes was established after manually changing the number of nodes (Table I). The optimal parameters were then input into the ENVI software to map the ETS and LULC on the WV-2 image [64].

Fig. 3. Classification maps obtained using all eight WV-2 bands (8B). (a) SVM algorithm. (b) ANN algorithm.

Fig. 4. Classification maps obtained using four standard WV-2 bands (SB). (a) SVM algorithm. (b) ANN algorithm.
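The feed-forward structure of the MLP described in this subsection can be sketched as follows. Several choices are assumptions made only for illustration: six nodes per hidden layer (the node counts of Table I are not reproduced in the text), sigmoid hidden activations, a softmax output over the ten classes, random untrained weights, and a made-up input spectrum. The sketch shows a forward pass only and omits BP training.

```python
import math
import random


def dense(inputs, weights, biases):
    """Weighted sum of the inputs for every node in a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    shifted = [math.exp(v - max(values)) for v in values]
    total = sum(shifted)
    return [s / total for s in shifted]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """Propagate a spectrum through the hidden layers to class scores."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = sigmoid(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(hidden, weights, biases))


rng = random.Random(0)
# 8 input bands (8B), two hidden layers (6 nodes each is hypothetical),
# and 10 output classes (six ETS plus four other classes).
sizes = [8, 6, 6, 10]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

spectrum = [0.03, 0.04, 0.07, 0.06, 0.05, 0.20, 0.38, 0.40]  # made-up values
class_scores = mlp_forward(spectrum, layers)
```

For the SB and AB subsets, the first entry of `sizes` would be 4 instead of 8, matching the four-node input layer described above.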

C. Accuracy Assessment

The accuracy of the SVM and ANN classification algorithms was evaluated using the 30% (n = 244) holdout sample of the dataset. A confusion matrix was constructed to compare the true class with the predicted class assigned by SVM and ANN and to calculate the overall accuracy (OA), producer’s accuracy (PA), and user’s accuracy (UA) [39], [91]. OA is the overall probability that test sample points on the image have been classified correctly. PA, expressed as a percentage (%), denotes the likelihood that a reference sample of a certain class is correctly classified, whereas UA refers to the probability that a sample labeled as a specific class by the classifier actually belongs to that class. It has become customary in remote sensing studies to report the kappa index of agreement for accuracy assessment purposes, since kappa also compares two maps that show a set of categories. However, recent studies have shown some limitations of kappa, since it gives information that is redundant or misleading for practical decision making [92]. The authors of [92] recommend a simpler and more suitable approach that, instead of kappa variants, focuses on two components of disagreement between maps in terms of the quantity [quantity disagreement (QD)] and the spatial allocation [allocation disagreement (AD)] of the categories. These two parameters were calculated from the cross-tabulation matrices to assess the reliability of each classification algorithm. QD is the amount of difference between the test data and the predicted map due to a less than perfect match in the proportions of the categories, while AD is the amount of difference due to a less than optimal match in the spatial allocation of the categories. Given the accuracy metrics achieved by each algorithm, a statistical test can be used to determine whether the classification results of the SVM and ANN algorithms differ significantly. Therefore, we performed McNemar’s test to examine whether there were any significant differences between the confusion matrices of the two classification algorithms (SVM and ANN). McNemar’s test is a nonparametric test based on a standardized normal test statistic calculated from the error matrices of the two algorithms, as given in (1) [93], [94]:

$$Z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}} \tag{1}$$

where $f_{12}$ denotes the number of samples that are misclassified in the first confusion matrix but correctly classified in the second, and $f_{21}$ denotes the number of samples that are misclassified in the second confusion matrix but correctly classified in the first. A difference in accuracy between the confusion matrices of the two algorithms for the different WV-2 spectral subsets is statistically significant (p ≤ 0.05) if the Z-value is more than 1.96 [93], [94].

Fig. 5. Classification maps obtained using four additional WV-2 bands (AB). (a) SVM algorithm. (b) ANN algorithm.

IV. RESULTS

A. Classification Results

Results of the grid search and 10-fold cross validation method indicated optimal values of γ and C for SVM of 1 and 10 for 8B, 1 and 1000 for SB, and 1 and 100 for AB, respectively. With these optimal values, the SVM algorithm yielded minimum overall classification errors of 38.40%, 39.40%, and 36.90% for 8B, SB, and AB, respectively. Figs. 3–5 show the spatial distribution of the six ETS and other classes in the study area when WV-2 8B, SB, and AB were classified using SVM and ANN. The main visual difference between the maps in Figs. 3–5 is that a relatively homogeneous map was produced when the 8B was used as compared with the other WV-2 spectral subsets. The maps also show that the study area was mainly surrounded by grassland, sugarcane, and plantation forest, while the forest and grassland in the northeastern part of the study area were relatively patchy. Table II shows the area under each ETS and other class obtained from the different WV-2 spectral subsets using the SVM and ANN classification algorithms. When the WV-2 8B subset and SVM classifier were used, the plantation class obtained the largest area (4063.5 ha), while grassland occupied most of the study area (2472.1 ha) when the SB subset and SVM classifier were deployed. With regard to ETS, the results show that AA achieved the largest area when the WV-2 AB subset was classified with SVM (1316.3 ha) and ANN (1315.1 ha), while HC achieved the smallest area (478.1 ha) when the WV-2 SB subset and SVM were employed (Table II).

TABLE II
AREA UNDER EACH ETS AND LULC CLASS IN THE STUDY AREA OBTAINED USING WV-2 DATA AND THE SVM AND ANN CLASSIFICATION ALGORITHMS

8B, WV-2 eight bands subset; SB, WV-2 SB subset; and AB, WV-2 AB subset.
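The grid search with 10-fold cross validation that produced the γ and C values reported at the start of this subsection can be sketched generically. The candidate grids, the helper names, and in particular `toy_fold_error` are hypothetical: a real run would train an RBF SVM on nine folds and return its error on the held-out fold (the study used the e1071 library for this), whereas the toy function below only returns arbitrary numbers so that the search logic can be executed.

```python
from itertools import product


def kfold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(fold_error, folds, c_values, gamma_values):
    """Return (mean error, C, gamma) for the pair with the lowest CV error."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        errors = [fold_error(c, gamma, test_fold) for test_fold in folds]
        mean_error = sum(errors) / len(errors)
        if best is None or mean_error < best[0]:
            best = (mean_error, c, gamma)
    return best


def toy_fold_error(c, gamma, test_fold):
    """Placeholder only: stands in for training on nine folds and
    measuring the error on test_fold. The numbers are arbitrary."""
    return 0.3 + (0.0 if c == 100 else 0.05) + (0.0 if gamma == 0.1 else 0.1)


folds = kfold_indices(583, k=10)  # 583 training samples
best_error, best_c, best_gamma = grid_search(
    toy_fold_error, folds,
    c_values=[1, 10, 100, 1000],   # hypothetical candidate grid
    gamma_values=[0.01, 0.1, 1],   # hypothetical candidate grid
)
```

Because 583 is not divisible by 10, the folds can only be of near-equal size (three folds of 59 samples and seven of 58 in this sketch).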

B. Accuracy Assessment

The ETS (AA, EC, HC, HU, ScB, and TD) and LULC classes were mapped with the highest OA for 8B, followed by AB and SB, when SVM was used (Table III). For ANN, the highest OA was also achieved using 8B, followed by AB and SB (Table IV). The SVM classifier obtained the highest QD value for AB, followed by 8B and SB, which obtained the same QD value (Table III). The ANN classifier likewise obtained the highest QD value for AB, followed by 8B and SB, which achieved the same QD value, but these values were lower than those obtained using SVM (Table IV). Additionally, the tables show relatively high AD values for SB, followed by AB and then 8B, which obtained the lowest values for both the SVM and ANN algorithms (Tables III and IV). Figs. 3–5 show the classification maps (LULC and ETS) of the study area. The maps show a nearly similar spatial distribution of ETS in the study area. The individual UA and PA for each ETS achieved by the two classification algorithms are shown in Figs. 6 and 7. When the WV-2 8B subset was classified using SVM, the UA for some ETS (AA, HU, HC, and ScB) was relatively high and ranged between 73.17% and 85.71%, while for TD and EC, the accuracy was less than 55% [Fig. 6(a)]. Additionally, the use of the WV-2 8B subset and ANN resulted in relatively high UA for HC (73.68%) and ScB (81.81%), whereas the other four ETS (AA, HU, EC, and TD) achieved a UA of less than 65% [Fig. 6(b)].
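The accuracy measures reported in this subsection, together with the McNemar statistic of (1), can be computed from a confusion matrix as in the following sketch. The three-class matrix and the McNemar counts are invented for illustration and are not results of the study. QD and AD follow the standard quantity and allocation disagreement formulas, applied here directly to sample counts, which assumes that the test sample proportions represent the map.

```python
import math


def accuracy_metrics(cm):
    """cm[i][j]: test samples mapped as class i whose reference class is j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    mapped = [sum(cm[i]) for i in range(n)]                          # row totals
    reference = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column totals
    diag = [cm[i][i] for i in range(n)]

    oa = sum(diag) / total
    ua = [diag[i] / mapped[i] for i in range(n)]     # user's accuracy
    pa = [diag[i] / reference[i] for i in range(n)]  # producer's accuracy
    # Quantity disagreement: mismatch in class proportions.
    qd = sum(abs(mapped[i] - reference[i]) for i in range(n)) / (2 * total)
    # Allocation disagreement: mismatch in where the classes are placed.
    ad = sum(2 * min(mapped[i] - diag[i], reference[i] - diag[i])
             for i in range(n)) / (2 * total)
    return oa, ua, pa, qd, ad


def mcnemar_z(f12, f21):
    """Z statistic of (1) from the two discordant counts."""
    return (f12 - f21) / math.sqrt(f12 + f21)


# Hypothetical 3-class confusion matrix (made-up counts).
cm = [[20, 3, 1],
      [2, 15, 4],
      [0, 2, 13]]
oa, ua, pa, qd, ad = accuracy_metrics(cm)

z = mcnemar_z(30, 10)  # made-up discordant counts
```

A useful check on any implementation is that QD and AD sum to the total disagreement, 1 − OA, which holds for the example matrix.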

Regarding the PA, the two classification algorithms (SVM and ANN) mapped AA with an accuracy ranging between 80.00% and 100.00% (Fig. 7). Overall, all ETS were classified with fairly high PA, except for HU (Figs. 6 and 7). Additionally, all LULC classes obtained individual accuracies of more than 50.00% when SVM, ANN, and all WV-2 subsets were employed (Figs. 6 and 7), except the grassland class (45.45%) when SVM and WV-2 SB were used [Fig. 7(a)]. According to McNemar’s test, there were significant differences (Z ≥ 1.96) at the 95% confidence level between the confusion matrices of the SVM and ANN classification algorithms using the WV-2 8B, SB, and AB spectral subsets (Table V). Regardless of the WV-2 subset used, SVM significantly outperformed ANN in mapping the ETS and LULC classes in the study area.

V. DISCUSSION AND CONCLUSION

This study examined the performance of SVM and ANN algorithms and the potential of new generation multispectral WV-2 data to map the spatial extent of ETS in the Dukuduku indigenous forest of South Africa. LULC in the study area was also classified to map the target species within different LULC patterns and to produce a wall-to-wall thematic map. Tree species maps obtained from commonly used medium-spatial-resolution multispectral satellites (30–100 m) have often been too inaccurate for operational application [27], [53]. Conversely, the use of imagery from very high spatial resolution sensors (