Testing artificial neural network (ANN) for spatial interpolation

Veronika Nevtipilova1, Justyna Pastwa1, Mukesh Singh Boori1,2, Vit Vozenilek1
[email protected], [email protected], [email protected]
1 Dept. of Geo-informatics, Palacky University Olomouc, Czech Republic, Europe
2 Dept. of Environmental Science, JECRC University Jaipur, Rajasthan, India

Abstract: The aim of this research is to test the Artificial Neural Network (ANN) module in the GRASS GIS software for spatial interpolation and to compare it with the common interpolation techniques IDW and ordinary kriging. The module was also compared with the neural network packages nnet and neuralnet in the R Project. The evaluation of the methods was based mainly on RMSE. A multilayer perceptron with the back-propagation training algorithm was used to perform the interpolation in both GRASS GIS and the R Project. All tests were done on artificial data created in the R Project, simulating three surfaces with different characteristics. To find the best configuration of the multilayer perceptron, many different settings were tested; the number of neurons in the hidden layers was the main tested parameter. In GRASS GIS, two slightly different approaches to interpolation with a multilayer perceptron were used. The first network was trained on a raster data file, which is the original use of the ANN module. Then a vector data file was prepared with a new Java script (not part of the ANN module) and a second network was trained on this file. After the multilayer perceptrons were trained, the interpolation was carried out in the same way with both networks. The results indicate that the multilayer perceptron in the ANN module can be used for spatial interpolation. However, the resulting RMSE was higher than the RMSE of the IDW and ordinary kriging methods, and the time demands of the ANN module were also higher. Compared with the neural network packages in the R Project, it is preferable to use the R packages: training of the multilayer perceptron was faster there and the results were the same or slightly better.

Keywords: ANN, IDW, Kriging, Interpolation, GRASS GIS, nnet, neuralnet

1. Introduction

Spatial interpolation is a frequently used method of working with spatial data. There are currently many interpolation methods, each with its own applications. The accuracy of these methods is limited, and spatial interpolation therefore keeps looking for new techniques. One of these techniques is the use of neural networks. The principle of neural networks has been known for a long time; the first artificial neuron was described in 1943 (Volná, 2008). However, their use in the field of geo-informatics has started to develop only in recent years. The available literature shows that neural networks used for spatial interpolation achieve very good results, comparable with other interpolation methods and in some cases even better (Snell, 2000; Bhaskaran, 2010; Chowdhury, 2010). The use of neural networks for spatial interpolation is not yet widespread among regular GIS users, since most available GIS software does not itself implement a neural network. GRASS GIS is one of the few packages for which a module for working with neural networks exists, namely with the multilayer perceptron model. This work tests this module and compares it with other interpolation methods.
The aim of this research is to use neural networks for an ordinary interpolation task and to determine whether the quality of the resulting interpolation is comparable with conventional methods. The paper first states the objectives of the research. The second part summarizes the methods used and the work progress. The third part briefly describes the theoretical basis of the interpolation methods used, that is, neural networks, IDW and kriging; neural networks receive more attention because they are the main topic of this work. This part also deals with the implementation of neural networks in the two software packages used and briefly reviews examples from the literature on the use of neural networks for spatial interpolation. The fourth part describes the data creation, the selection of the best neural network, and the commands and settings used for the interpolation itself in GRASS GIS and the R Project, together with the subsequent processing of the data. The last part summarizes the results: it presents and evaluates the outcomes of the interpolation methods used, compares their quality, and also compares the neural network method in GRASS GIS with that in the R Project.

2. Methodology and data processing

Spatial interpolation is a process in which the known values of a certain phenomenon are used to estimate its values at places where it was not measured. It applies only to continuous phenomena; it makes no sense to interpolate discontinuous phenomena. The essence of most spatial interpolation methods is summarized by Tobler's first law of geography: "Everything is related to everything else, but near things are more related than distant things" (Longley et al., 2005).

Layout of neural networks: First, several neural networks with different numbers of neurons in the hidden layers were created and trained on training datasets with the back-propagation algorithm. From this test the root mean square error (RMSE) was calculated and intervals of neuron counts in which the networks gave the best results were determined. Networks with neuron counts in these intervals were then trained again on data representing terrain of varying ruggedness. For each proposed network the RMSE was calculated on each dataset, and the network with the smallest average RMSE on the test data was chosen.

Custom interpolation: On a dataset that was not part of the network selection, interpolation was performed with IDW, kriging and the neural network configurations that came out of the previous test as the best, both in GRASS GIS and in the R Project. Interpolation was carried out once for each surface ruggedness. In GRASS GIS twelve rasters were created (three with a neural network trained on raster data, six on vector data and three with IDW); in the R Project twelve rasters were created (six with the neural networks from the two packages, six with IDW and kriging).

Evaluation of the interpolation results: For each resulting interpolated surface the root mean square error (RMSE) was calculated by the following formula:

\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{z}_i - z_i\right)^2}    (1)

where n is the number of points for which the RMSE is calculated, \hat{z}_i is the value at point i on the interpolated surface, and z_i is the original value at this point. To compare the methods visually, the resulting rasters were also subtracted from each other to obtain the differences between them; the rasters created by IDW and kriging were subtracted from the rasters created by the neural networks.
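As an illustration, formula (1) translates directly into a few lines of R; this is a minimal sketch, not the authors' evaluation script:

    # RMSE between interpolated and original values (equation 1)
    rmse <- function(predicted, observed) {
      sqrt(mean((predicted - observed)^2))
    }

    # e.g. rmse(c(1.1, 2.0, 2.9), c(1, 2, 3)) gives about 0.082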

Neural Network: The biological neuron and its simplified behaviour serve as the basic unit of artificial neural networks, which were created as a simplified mathematical model simulating the operation of the human brain (Rumelhart et al., 1994). The formal neuron, based on the biological neuron, forms the basis of the mathematical model of a neural network. It consists of an input vector x = (x_1, ..., x_n) of real inputs, which replaces the dendrites. Each input is weighted by a real synaptic weight, which can be positive or negative; the synaptic weights form the vector w = (w_1, ..., w_n). A further input x_0 = 1 with weight w_0 represents the threshold (Volná, 2008). A simple model of the neuron is shown in Figure 1.

Figure 1: Formal neuron (source: http://www.root.cz/clanky/biologicke-algoritmy-4-neuronove-site/)

The weighted sum of all inputs

\xi = \sum_{i=1}^{n} w_i x_i    (2)

denotes the internal potential of the neuron. If the internal potential \xi reaches the threshold w_0, the neuron produces an output (its state). This output value is modified by an activation (transfer) function f, so that y = f(\xi). The simplest type of activation function is the sharp nonlinearity in the following form (Volná, 2008); this simple model of the neuron is known as the perceptron (Voženílek, 2011):

y = f(\xi) = \begin{cases} 1 & \text{if } \xi \ge 0 \\ 0 & \text{if } \xi < 0 \end{cases}    (3)

A neural network is created by interconnecting neurons, in such a way that the output of one neuron is the input of other neurons (Volná, 2008). Neurons in the network are organized into layers (Figure 2). Each network contains an input and an output layer and any number of hidden layers (Voženílek, 2011). The number of neurons in each layer and their interconnection make up the network architecture. The activation function is in most cases the same for all neurons in the network (Volná, 2008). An important feature of neural networks is learning, i.e. changing the weights between neurons: the weights in the network are strengthened or weakened depending on whether the network gives right or wrong answers. There are various types of learning, such as supervised and unsupervised (Voženílek, 2011); Volná (2008) mentions reinforcement learning as a further type. A multilayer perceptron is a multilayer neural network in which each neuron is modeled as a perceptron. The activation functions of neurons in a multilayer perceptron are differentiable continuous functions; the most used is the sigmoid function (Tarassenko, 1998):

f(\xi) = \frac{1}{1 + e^{-\xi}}    (4)
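To make equations (2) to (4) concrete, the following minimal R sketch computes the output of a single formal neuron; the weights and inputs are illustrative, not taken from the paper:

    # activation functions: sigmoid (eq. 4) and sharp nonlinearity (eq. 3)
    sigmoid <- function(xi) 1 / (1 + exp(-xi))
    step    <- function(xi) as.numeric(xi >= 0)

    # forward pass of one neuron: internal potential (eq. 2) plus bias w0,
    # passed through an activation function f
    neuron <- function(x, w, w0, f = sigmoid) {
      xi <- sum(w * x) + w0
      f(xi)
    }

    # neuron(c(0.5, -1), w = c(0.8, 0.2), w0 = 0.1)        # sigmoid output
    # neuron(c(0.5, -1), w = c(0.8, 0.2), w0 = 0.1, step)  # perceptron output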

Figure 2: Example of a neural network with an input layer, a hidden (internal) layer and an output layer (freely adapted from Volná, 2008)

In multilayer networks the neurons are fully interconnected: each neuron of one layer is connected to all neurons of the following layer. According to Volná (2008) the network in Figure 2 is a three-layer network (input, internal and output layer); Tarassenko (1998) understands such a network as two-layer, because a layer there denotes the weighted connections rather than the neurons, and the input neurons form no layer of the network, only a vector of values. The most common type of network is the feed-forward network. Propagation in it proceeds as follows: the input vector is multiplied by the weights and transferred to the neurons in the internal (hidden) layer; these neurons process the input values with their activation functions; the resulting vector is multiplied by the weights and used as input to the next (in this case the output) layer. The output of this layer is also the output of the whole network.

Back-propagation algorithm: The most commonly used learning algorithm for multilayer neural networks is back-propagation, a form of supervised learning. According to Volná (2008) it takes place in three phases. The first is the forward propagation of the signal. The second is the back propagation of the error: for each neuron k in the output layer a partial error δ_k is calculated, expressing how much the output misses the expected value, and this error is propagated back to the neurons in the previous layer according to the following formulas:


\delta_k = (t_k - y_k)\, f'(\xi_k)    (5)

where t_k is the expected output of neuron k, y_k is the calculated output and \xi_k is the internal potential of neuron k. Each neuron j in the inner layer is then assigned the sum of the errors \delta coming back from the output layer:

\delta^{\mathrm{in}}_j = \sum_k \delta_k\, w_{jk}    (6)

With this sum and the derivative of the activation function at the internal potential \xi_j, the partial error \delta_j of neuron j is calculated:

\delta_j = \delta^{\mathrm{in}}_j\, f'(\xi_j)    (7)

The calculated \delta_j is used to adjust the weights between the input layer and the internal layer of the network: \Delta w_{ij} = \alpha\,\delta_j\,x_i, where \alpha is the learning coefficient and x_i is the input value of the network. With \delta_k the weights between the internal and the output layer are adjusted: \Delta w_{jk} = \alpha\,\delta_k\,y_j, where y_j is the output value of neuron j. The third phase is the update of the connection weights: denoting the new weight w^{new} and the old one w^{old}, then w_{ij}^{new} = w_{ij}^{old} + \Delta w_{ij} and w_{jk}^{new} = w_{jk}^{old} + \Delta w_{jk}.
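The three phases can be condensed into a short R sketch for a network with one hidden layer and sigmoid activations; the weight matrices and data here are illustrative, and bias terms are omitted for brevity:

    sigmoid  <- function(xi) 1 / (1 + exp(-xi))
    dsigmoid <- function(y) y * (1 - y)    # f'(xi) expressed via the output y

    # one training step for input vector x and expected output `target`;
    # W1: hidden-by-input weights, W2: output-by-hidden weights
    backprop_step <- function(x, target, W1, W2, alpha = 0.1) {
      z <- sigmoid(W1 %*% x)                 # phase 1: forward propagation
      y <- sigmoid(W2 %*% z)

      delta_k  <- (target - y) * dsigmoid(y) # phase 2: output errors (eq. 5)
      delta_in <- t(W2) %*% delta_k          # sums for the hidden layer (eq. 6)
      delta_j  <- delta_in * dsigmoid(z)     # hidden-layer errors (eq. 7)

      W2 <- W2 + alpha * delta_k %*% t(z)    # phase 3: weight updates
      W1 <- W1 + alpha * delta_j %*% t(x)
      list(W1 = W1, W2 = W2)
    }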

3. Implementation in GIS

GRASS GIS: In GRASS GIS, work with neural networks is provided by the five scripts of the ANN module. The scripts are written in the Python programming language and use the FANN (Fast Artificial Neural Network) library. They are designed to work with raster data and make it possible to use a multilayer perceptron with the back-propagation training algorithm. The author of the module is Pawel Netzel. The script ann.data.rast.py takes the rasters that define the required inputs and outputs of the network, samples a chosen number of randomly distributed control points from them, and creates a file with the extension .data, which is then used for network training. Network parameters are set with the script ann.create.py: the numbers of input and output neurons, the numbers of neurons in the hidden layers, the density of connections between neurons and the learning coefficient are entered here. The network is trained with the script ann.learn.py, which uses the back-propagation algorithm; the desired error and the maximum number of iterations can be specified. The training time depends on the number of hidden layers and neurons in them and on the number of control points in the .data file from which the network learns. The actual computation with a trained network is done by the script ann.run.rast.py: the user supplies the coordinate rasters and the name of the trained network. The computation speed is affected by the resolution of the raster for which the calculation is performed.

R Project: In the R Project, network training and computation were done with the packages nnet and neuralnet. The nnet package (Venables and Ripley, 2002) provides training of feed-forward networks with one hidden layer by a back-propagation learning algorithm.

Its basic settable parameters are the input and the desired output, the number of neurons in the hidden layer, the range for the random initialization of weights and the maximum number of iterations. Training of the network is relatively fast and the results are comparable with the other methods. The neuralnet package (Fritsch et al., 2012) resembles nnet in many respects, but offers more settable parameters: several hidden layers with any number of neurons in each of them can be set, and there is a choice of several learning algorithms and several types of activation function. Training is slower than with nnet, but the trained networks usually show a smaller testing error than networks from the nnet package. Computation with trained networks is very similar in both packages: the trained network and the input data for which the results are to be computed are supplied, and the calculation takes a very short time in both cases.
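As a rough sketch of how the two packages are called (not the authors' exact script), assuming data frames train and test with point coordinates x, y and the value z, and using the hidden-layer sizes that later came out best in the tests of section 4:

    library(nnet)
    library(neuralnet)

    # nnet: one hidden layer (size), linear output for regression,
    # rang controls random weight initialization, maxit the iteration limit
    m1    <- nnet(z ~ x + y, data = train, size = 28, linout = TRUE,
                  rang = 0.5, maxit = 2000)
    pred1 <- predict(m1, newdata = test)

    # neuralnet: several hidden layers, e.g. hidden = c(24, 15, 10, 5);
    # logistic (sigmoid) activation by default, linear output for regression
    m2    <- neuralnet(z ~ x + y, data = train, hidden = c(24, 15, 10, 5),
                       linear.output = TRUE)
    pred2 <- compute(m2, test[, c("x", "y")])$net.result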

Inverse distance weighting (IDW): IDW, the method of inverse distances, is one of the most popular interpolation methods. It is very simple to implement, understand and use. IDW is an exact interpolation method: the calculation preserves the measured known values (Longley et al., 2005). The method estimates the unknown value as a weighted average of the surrounding known values, with closer values having more weight than those farther away. The value at an unknown point is calculated with the following formula:

\hat{z}(x_0) = \frac{\sum_{i=1}^{n} w_i z_i}{\sum_{i=1}^{n} w_i}    (8)

where \hat{z}(x_0) is the unknown value, z_i are the known values and w_i their weights (Longley et al., 2005).

Figure 3: IDW (source: Longley et al., 2005)

There are several ways to determine the weights of the points with known values. Most often the weight is calculated as the reciprocal of the distance d_i from the point with the unknown value, raised to a power parameter p (Horák, 2011):

w_i = \frac{1}{d_i^{\,p}}    (9)

The parameter p is usually set in the range 1 to 3 (Neteler and Mitasova, 2008). The points with known values that enter the calculation can be selected in several ways: most often the number of nearest points entering the calculation is specified; another option is a threshold distance, beyond which points have no effect on the calculation (Longley et al., 2005). IDW has characteristic undesirable effects that result from the weighted averaging. The calculation can produce new values only within the range of the existing values, so if the interpolated terrain contains peaks and valleys between the data points, peaks can appear as depressions and vice versa (Longley et al., 2005). Another undesirable phenomenon is the formation of concentric contour lines around the points with initial values, the so-called bull's eyes (Horák, 2011). IDW is readily available in most GIS software, such as ArcGIS, QGIS, GRASS GIS, IDRISI and R.
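Equations (8) and (9) amount to only a few lines of code; the following R sketch (independent of any package, with illustrative inputs) estimates the value at one unknown point:

    # IDW estimate at (x0, y0) from known points (xi, yi) with values zi
    idw_point <- function(x0, y0, xi, yi, zi, p = 2) {
      d <- sqrt((xi - x0)^2 + (yi - y0)^2)    # distances to known points
      if (any(d == 0)) return(zi[d == 0][1])  # exact method: keep known values
      w <- 1 / d^p                            # weights, equation (9)
      sum(w * zi) / sum(w)                    # weighted average, equation (8)
    }

    # idw_point(0.5, 0.5, xi = c(0, 1, 1), yi = c(0, 0, 1), zi = c(1, 2, 3))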

Ordinary kriging: The basic idea of kriging is to find general characteristics of the measured values and to apply these properties when calculating the unknown values. The most important of these characteristics is smoothness: the theory says that values at closer points are more similar than values at more distant points. The difference between the values at two points x_i and x_j is measured by the semivariance, calculated as

\gamma = \tfrac{1}{2}\left(z(x_i) - z(x_j)\right)^2    (10)

Figure 4: Example of a semivariogram (freely adapted from Longley et al., 2005)

With increasing distance this difference is likely to increase, up to a certain distance beyond which it no longer changes (Longley et al., 2005). Figure 4 is an example of a semivariogram, which illustrates how the differences between pairs of points change with distance. The sill indicates the value of half the squared difference between a pair of points, i.e. the semivariance, at which the curve levels off; the range is the distance at which the sill is reached and beyond which the value remains unchanged. The semivariance is never 0, not even at zero distance; the semivariogram therefore contains the nugget parameter, which indicates the difference between the values at two points at the same site, or at a very small distance. In Figure 4, crosses indicate the differences between selected pairs of points and circles are the averages of these values at a certain distance (Longley et al., 2005). Ordinary kriging is the standard and most frequently used version of kriging. The value at point x_0 is calculated with the following formula:

\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i\, z(x_i)    (11)

where n is the number of points used in the calculation, \lambda_i is the weight calculated for the point x_i, and z(x_i) is the known value at point x_i. In short, \lambda_0 is the vector of weights and z is the vector of known values; the weights are calculated from a system of linear equations (Hengl, 2009). Like IDW, kriging is a well-known interpolation method available in popular GIS software.

4. Testing interpolation methods

Creating data: For the purpose of testing neural networks for interpolation, three datasets simulating different surface topographies were created in the R Project. The most rugged surface was labeled articulation 1, the medium one articulation 2 and the least rugged one articulation 3. The datasets were randomly generated with the function grf (Gaussian random fields) from the geoR package, which creates points and randomly assigns them values; the values are influenced by the other parameters of the function:

    grf(pocetBodu, grid = "reg", cov.pars = c(sill, range), nug = nugget,
        cov.model = covModel, aniso.pars = c(anisotropyDirection, anisotropyRatio),
        xlims = xlims, ylims = ylims)

The parameter grid = "reg" indicates that the points are generated in a regular grid, and cov.model indicates the type of variogram, in this case a spherical variogram. The values of xlims and ylims were set to the range 0-1. For each dataset 1024 points were generated in a regular grid. These points have three attributes: the coordinates x and y in the range 0-1 (controlled by xlims and ylims) and the value z, which represents height. The parameter values affecting the ruggedness that were used when creating the datasets are listed in Table 1.

Table 1: Parameter values for the three surfaces

    surface          sill   range  nugget   anisotropy ratio
    articulation 1   0.12   0.3    0.00001  0.8
    articulation 2   0.08   0.5    0.00001  0.8
    articulation 3   0.01   1.2    0.00001  0.3

The range of z values varied with the surface ruggedness; data with lower ruggedness have a smaller range of values:

∙ articulation 1: -0.7396643 to 1.090838
∙ articulation 2: -0.534922 to 0.8930179228
∙ articulation 3: -0.2012766 to 0.1411988275

Figure 5 shows the distribution of points in the individual datasets; circles with a larger diameter indicate higher values of z. Each dataset was then split with the function sample into training and test data. The training data comprised 724 randomly selected points and were used for training the neural networks and for the interpolation itself; the test data contained the remaining 300 points and were used to calculate the RMSE.
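A minimal sketch of this data creation for articulation 1 (the anisotropy direction is not stated in Table 1 and is therefore omitted here; the variable names and the seed are illustrative, not the authors' script):

    library(geoR)

    set.seed(1)
    sim <- grf(1024, grid = "reg",           # 1024 points in a regular grid
               cov.model = "spherical",
               cov.pars = c(0.12, 0.3),      # sill and range from Table 1
               nugget = 0.00001,
               xlims = c(0, 1), ylims = c(0, 1))

    pts   <- data.frame(x = sim$coords[, 1], y = sim$coords[, 2], z = sim$data)
    idx   <- sample(nrow(pts), 724)          # 724 training points
    train <- pts[idx, ]
    test  <- pts[-idx, ]                     # remaining 300 points for RMSE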

Figure 5: Distribution of points for articulation 1, 2 and 3

Selecting the nnet package: A script was created that successively generates datasets for all surface topographies. Ten datasets were created and each was divided into training and test data. Neural networks were then trained on the training data of all datasets and the RMSE was calculated on the test data. This procedure was repeated for each number of neurons in the hidden layer in the range 15 to 30. For each setting the average RMSE over all datasets was calculated for each articulation, and finally the average over the three articulations. The results are listed in Table 2; the network with 28 neurons in the hidden layer came out best.

Table 2: Average RMSE for the tested network settings (nnet)

    number of neurons  articulation 1  articulation 2  articulation 3  average
    28                 0.148758        0.096397        0.017947        0.087701
    24                 0.151135        0.096007        0.018192        0.088444
    25                 0.149176        0.099806        0.018237        0.089073
    26                 0.154605        0.095057        0.017955        0.089206
    22                 0.152875        0.097339        0.018894        0.089703
    30                 0.157484        0.095408        0.018022        0.090304
    19                 0.154560        0.097975        0.018826        0.090454
    20                 0.157060        0.097911        0.018621        0.091197
    18                 0.156477        0.098647        0.019657        0.091594
    29                 0.163756        0.095861        0.018037        0.092551
    21                 0.161621        0.097511        0.019381        0.092838
    16                 0.161058        0.099874        0.019209        0.093380
    23                 0.167908        0.095541        0.019640        0.094363
    27                 0.158991        0.106556        0.018620        0.094722
    17                 0.166058        0.100206        0.019149        0.095138
    15                 0.164467        0.103390        0.019708        0.095855
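The selection procedure corresponds to a simple loop in R; datasets stands for the list of ten train/test splits and rmse for the error function of equation (1), both names being illustrative:

    library(nnet)

    sizes    <- 15:30
    avg_rmse <- sapply(sizes, function(s) {
      errs <- sapply(datasets, function(d) {
        m <- nnet(z ~ x + y, data = d$train, size = s,
                  linout = TRUE, maxit = 2000)
        rmse(predict(m, newdata = d$test), d$test$z)
      })
      mean(errs)    # average RMSE of this setting over all datasets
    })
    sizes[which.min(avg_rmse)]    # 28 in the test reported in Table 2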

Selecting the neuralnet package: The procedure was similar to that for the nnet package. The difference was that neuralnet allows networks with more hidden layers to be trained. Networks with one to four hidden layers were tested in turn. The number of neurons in the first hidden layer again varied in the interval 15 to 30; the numbers of neurons in the other hidden layers were fixed and chosen from 5, 10, 15, 20 and 25. For each tested setting the average RMSE over all datasets was again calculated. The best network had four hidden layers with 24, 15, 10 and 5 neurons. The results of the network testing are listed in Table 3.

Table 3: Average RMSE for the tested network settings (neuralnet)

    number of neurons  articulation 1  articulation 2  articulation 3  average
    24                 0.149159        0.098706        0.029940        0.092602
    16                 0.153596        0.098004        0.029210        0.093603
    15                 0.148941        0.100402        0.032237        0.093860
    22                 0.150838        0.097616        0.033587        0.094014
    20                 0.149615        0.098070        0.035774        0.094486
    30                 0.150110        0.098537        0.035594        0.094747
    17                 0.152036        0.100431        0.031792        0.094753
    23                 0.152735        0.099836        0.031717        0.094763
    27                 0.155015        0.096519        0.033971        0.095169
    29                 0.150820        0.098389        0.036693        0.095301
    26                 0.152207        0.099028        0.034765        0.095333
    28                 0.151465        0.098248        0.038276        0.095996
    21                 0.153177        0.099907        0.036171        0.096418
    19                 0.150610        0.102501        0.036541        0.096551
    18                 0.152551        0.101085        0.038194        0.097277
    25                 0.156908        0.101286        0.033683        0.097292

Selecting the network in GRASS GIS: The choice was based on the previous tests in the R Project; another source was the poster by the author of the ANN module (Netzel, 2011). First, the network that came out best for the neuralnet package was tested; the ANN module also allows networks with multiple hidden layers to be trained, and a network with four hidden layers had also been used for interpolation on the poster. However, the training time was very long, and although the error on the training data initially declined, after a number of iterations it started to rise again and the resulting RMSE was too large. Networks with fewer hidden layers were also tested, but the resulting RMSE was again excessively high, with values several orders of magnitude higher than expected. A network with three hidden layers turned out best. For this network, the numbers of neurons in all hidden layers were first tested in the interval 15 to 30. This proved insufficient, so the numbers of neurons were gradually increased while the RMSE kept falling, until it started to rise again. This procedure finally yielded a network with three hidden layers and 32, 38 and 27 neurons. For the purpose of testing the other options of the ANN module, a second network with three hidden layers and 20, 25 and 17 neurons was obtained in the same way.

IDW: Several packages implement this method in the R Project; in this work the gstat package was used, with the function idw. idw_result
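The text breaks off at the idw_result assignment. For orientation, a hedged sketch of how idw (and, for comparison, ordinary kriging with krige) is typically called in gstat on such point data; the sp objects and the variogram starting values are assumptions, not the authors' exact script:

    library(sp)
    library(gstat)

    coordinates(train) <- ~ x + y   # promote the data frames to SpatialPoints
    coordinates(test)  <- ~ x + y

    # IDW; idp is the power parameter p of equation (9)
    idw_result <- idw(z ~ 1, train, newdata = test, idp = 2)

    # ordinary kriging: fit a spherical variogram, then krige
    vg  <- variogram(z ~ 1, train)
    fit <- fit.variogram(vg, vgm(psill = 0.12, "Sph", range = 0.3,
                                 nugget = 0.00001))
    ok_result <- krige(z ~ 1, train, newdata = test, model = fit)

    # predicted values: idw_result$var1.pred and ok_result$var1.pred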