
ARTIFICIAL NEURAL NETWORKS
GIRISH KUMAR JHA
Indian Agricultural Research Institute
PUSA, New Delhi-110 012
[email protected]

1. Introduction
Artificial Neural Networks (ANNs) are non-linear mapping structures based on the function of the human brain. They are powerful tools for modelling, especially when the underlying data relationship is unknown. ANNs can identify and learn correlated patterns between input data sets and the corresponding target values. After training, an ANN can be used to predict the outcome of new independent input data. ANNs imitate the learning process of the human brain and can process problems involving non-linear and complex data even if the data are imprecise and noisy. They are therefore well suited to modelling agricultural data, which are known to be complex and often non-linear. ANNs have great capacity for predictive modelling: all the characteristics describing an unknown situation can be presented to a trained ANN, which then provides a prediction for the agricultural system under study.

An ANN is a computational structure inspired by observed processes in natural networks of biological neurons in the brain. It consists of simple computational units, called neurons, which are highly interconnected. ANNs have become the focus of much attention, largely because of their wide range of applicability and the ease with which they can treat complicated problems. ANNs are parallel computational models comprised of densely interconnected adaptive processing units. These networks are fine-grained parallel implementations of nonlinear static or dynamic systems. A very important feature of these networks is their adaptive nature, where "learning by example" replaces "programming" in solving problems. This feature makes such computational models very appealing in application domains where one has little or incomplete understanding of the problem to be solved but where training data are readily available.

ANNs are now being increasingly recognized in the areas of classification and prediction, where regression models and other related statistical techniques have traditionally been employed. The most widely used learning algorithm in an ANN is the backpropagation algorithm. There are various types of ANNs, such as the multilayer perceptron, radial basis function and Kohonen networks. These networks are "neural" in the sense that they may have been inspired by neuroscience, not necessarily because they are faithful models of biological neural or cognitive phenomena. In fact, the majority of these networks are more closely related to traditional mathematical and/or statistical models, such as non-parametric pattern classifiers, clustering algorithms, nonlinear filters and statistical regression models, than they are to neurobiological models. ANNs have been used for a wide variety of applications where statistical methods are traditionally employed. Problems that were normally solved through classical statistical methods, such as discriminant analysis, logistic regression, Bayes analysis, multiple regression and ARIMA time-series models, are now being tackled by ANNs. It is, therefore, time to recognize the ANN as a powerful tool for data analysis.


2. Review
McCulloch and Pitts (1943) proposed a model of "computing elements", called McCulloch-Pitts neurons, which performs a weighted sum of its inputs followed by a threshold logic operation. Combinations of these computing elements were used to realize several logical computations. The main drawback of this model of computation is that the weights are fixed, so the model cannot learn from examples, which is the main characteristic of the ANN technique that later evolved. Hebb (1949) proposed a learning scheme for adjusting a connection weight based on pre- and post-synaptic values of the variables. Hebb's law became a fundamental learning rule in the neural network literature. Rosenblatt (1958) proposed the perceptron model, whose weights are adjustable by the perceptron learning law. Widrow and Hoff (1960) proposed the ADALINE (Adaptive Linear Element) model for computing elements and the LMS (Least Mean Square) learning algorithm for adjusting the weights of an ADALINE model. Hopfield (1982) gave an energy analysis of feedback neural networks; the analysis showed the existence of stable equilibrium states in a feedback network, provided the network has symmetric weights. Rumelhart et al. (1986) showed that it is possible to adjust the weights of a multilayer feedforward neural network in a systematic way to learn the implicit mapping in a set of input-output pattern pairs. The learning law is called the generalized delta rule or error backpropagation.

An excellent overview of various aspects of ANNs is provided by Cheng and Titterington (1994) and Warner and Misra (1996). Kaastra and Boyd (1996) developed a neural network model for forecasting financial and economic time series. Dewolf and Francl (1997, 2000) demonstrated the applicability of neural network technology to plant disease forecasting. Zhang et al. (1998) provided a general summary of the work on ANN forecasting, giving guidelines for neural network modelling, the general paradigm of ANNs used for forecasting, modelling issues of ANNs in forecasting, and the relative performance of ANNs compared with traditional statistical methods. Sanzogni and Kerr (2001) developed models for predicting milk production from farm inputs using a standard feedforward ANN. Gaudart et al. (2004) compared the performance of the MLP with that of linear regression for epidemiological data with regard to quality of prediction and robustness to deviations from the underlying assumptions of normality, homoscedasticity and independence of errors. More general books on neural networks and related topics contain separate chapters/sections on neural networks; to cite a few, Hassoun (1995), Patterson (1996), Schalkoff (1997), Yegnanarayana (1999) and Anderson (2003).

Software on neural networks is also available. Commercial software: Statistica Neural Network, TNs2Server, DataEngine, Know Man Basic Suite, Partek, Saxon, ECANSE (Environment for Computer Aided Neural Software Engineering), Neuroshell, Neurogen, Matlab Neural Network Toolbox, Tarjan and FCM (Fuzzy Control Manager). Freeware: NetII, Spider Nets Neural Network Library, NeuDC, Binary Hopfield Net with free Java source, Neural Shell, PlaNet, Valentino Computational Neuroscience Workbench, Neural Simulation Language (NSL) and Brain neural network Simulator.

In India, a few studies have been conducted using ANN models. Kumar et al. (2002) studied the utility of artificial neural networks for estimating daily grass reference crop evapotranspiration and compared the performance of ANNs with the conventional method used to estimate evapotranspiration. Pal et al. (2002) developed an MLP-based forecasting model for maximum and minimum ground-level temperatures at the Dum Dum station, Kolkata, on the basis of daily data on several variables, such as mean sea level pressure, vapour pressure, relative humidity, rainfall and radiation, for the period 1989-95. Madhav (2003) forecasted the wheat productivity of Junagadh (Gujarat) with an ANN model based on weather parameters.

3. Development of an ANN Model
The development of an ANN model is discussed here briefly. ANNs are constructed with layers of units, and thus are termed multilayer ANNs. A layer of units in such an ANN is composed of units that perform similar tasks. The first layer of a multilayer ANN consists of input units, known as independent variables in the statistical literature. The last layer contains the output units, known in statistical nomenclature as dependent or response variables. All other units in the model are called hidden units and constitute the hidden layers. There are two functions governing the behaviour of a unit in a particular layer, which normally are the same for all units within the whole ANN, i.e.

• the input function, and
• the output/activation function.

The input into a node is a weighted sum of the outputs from the nodes connected to it. The input function is normally given by equation (1):

net_i = Σ_j w_ij x_j + µ_i                (1)

where net_i describes the result of the net inputs x_j (weighted by the weights w_ij) impacting unit i. Here w_ij is the weight connecting neuron j to neuron i, x_j is the output from unit j and µ_i is a threshold for neuron i. The threshold term is the baseline input to a node in the absence of any other inputs. If a weight w_ij is negative, it is termed inhibitory because it decreases the net input; otherwise it is called excitatory. Each unit takes its net input and applies an activation function to it. For example, the output of the jth unit, also called the activation value of the unit, is g(Σ_i w_ji x_i), where g(·) is the activation function and x_i is the output of the ith unit connected to unit j. A number of nonlinear functions have been used in the literature as activation functions. The threshold function is useful in situations where the inputs and outputs are binary encoded, but the most common choice is a sigmoid function, such as

g(netinput) = [1 + e^(-netinput)]^(-1)

or

g(netinput) = tanh(netinput).

The activation function exhibits a great variety and has the biggest impact on the behaviour and performance of the ANN. The main task of the activation function is to map the outlying values of the obtained neural input back onto a bounded interval such as [0, 1] or [-1, 1]. The sigmoid function has some advantages: it is differentiable, which is needed when finding the steepest-descent gradient for the backpropagation method, and it maps a wide domain of values into the interval [0, 1].
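As a minimal illustration of equation (1) and the sigmoid and tanh activations above, the following Python sketch (NumPy assumed; the weights, inputs and threshold are made-up values, not taken from any fitted network) computes the net input to a unit and applies the activation functions to it.

    import numpy as np

    def net_input(w, x, mu):
        # net_i = sum_j w_ij * x_j + mu_i, as in equation (1)
        return np.dot(w, x) + mu

    def sigmoid(net):
        # g(net) = [1 + exp(-net)]^(-1), squashes any real value into (0, 1)
        return 1.0 / (1.0 + np.exp(-net))

    w = np.array([0.4, -0.7, 0.2])   # the negative weight acts as an inhibitory connection
    x = np.array([1.0, 0.5, 2.0])
    mu = 0.1
    net = net_input(w, x, mu)
    print(sigmoid(net), np.tanh(net))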

The various steps in developing a neural network forecasting model are as follows.

3.1 Variable Selection
The input variables important for modelling/forecasting the variable(s) under study are selected by suitable variable selection procedures.

3.2 Formation of Training, Testing and Validation Sets
The data set is divided into three distinct sets called the training, testing and validation sets. The training set is the largest set and is used by the neural network to learn the patterns present in the data. The testing set is used to evaluate the generalization ability of a supposedly trained network. A final check on the performance of the trained network is made using the validation set. One way of forming these sets is sketched below.
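A minimal illustration in Python (NumPy assumed; the split function and the 70/20/10 proportions are illustrative choices, to which Section 6 returns):

    import numpy as np

    def split_data(X, y, train_frac=0.7, test_frac=0.2, seed=0):
        # Shuffle the observations and cut them into training, testing and validation sets
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))
        n_train = int(train_frac * len(X))
        n_test = int(test_frac * len(X))
        train_idx = idx[:n_train]
        test_idx = idx[n_train:n_train + n_test]
        valid_idx = idx[n_train + n_test:]   # the remaining observations form the validation set
        return ((X[train_idx], y[train_idx]),
                (X[test_idx], y[test_idx]),
                (X[valid_idx], y[valid_idx]))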

3.3 Neural Network Architecture
The neural network architecture defines the structure of the network, including the number of hidden layers, the number of hidden nodes and the number of output nodes.

(i) Number of hidden layers: The hidden layer(s) provide the network with its ability to generalize. In theory, a neural network with one hidden layer and a sufficient number of hidden neurons is capable of approximating any continuous function. In practice, neural networks with one, and occasionally two, hidden layers are widely used and have been found to perform very well.

(ii) Number of hidden nodes: There is no magic formula for selecting the optimum number of hidden neurons. However, some rules of thumb are available for calculating the number of hidden neurons. A rough approximation can be obtained by the geometric pyramid rule proposed by Masters (1993): for a three-layer network with n input and m output neurons, the hidden layer would have sqrt(n*m) neurons (a small sketch follows this list).

(iii) Number of output nodes: Neural networks with multiple outputs, especially if these outputs are widely spaced, produce inferior results compared with a network with a single output.

(iv) Activation function: Activation functions are mathematical formulae that determine the output of a processing node. Each unit takes its net input and applies an activation function to it. Nonlinear functions, such as the logistic and tanh functions, are used as activation functions. The purpose of the transfer function is to prevent outputs from reaching very large values, which can 'paralyze' the network and thereby inhibit training. Transfer functions such as the sigmoid are commonly used because they are nonlinear and continuously differentiable, properties that are desirable for network learning.
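For instance, the geometric pyramid rule mentioned in item (ii) can be sketched as follows (the input and output counts are hypothetical):

    import math

    def pyramid_hidden_neurons(n_inputs, n_outputs):
        # Masters' (1993) geometric pyramid rule: roughly sqrt(n * m) hidden neurons
        return round(math.sqrt(n_inputs * n_outputs))

    print(pyramid_hidden_neurons(8, 1))   # e.g. 8 inputs and 1 output suggest about 3 hidden neurons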

3.4 Model Building
The multilayer feedforward neural network, or multilayer perceptron (MLP), is very popular and is used more than any other neural network type for a wide variety of tasks. A multilayer feedforward neural network trained by the backpropagation algorithm is based on a supervised procedure, i.e., the network constructs a model based on examples of data with known outputs. It has to build the model up solely from the examples presented, which together are assumed to implicitly contain the information necessary to establish the relation.


An MLP is a powerful system, often capable of modelling complex relationships between variables. It allows prediction of an output object for a given input object. The architecture of an MLP is a layered feedforward neural network in which the non-linear elements (neurons) are arranged in successive layers, and the information flows unidirectionally from the input layer to the output layer through the hidden layer(s). The characteristics of a multilayer perceptron are as follows:

(i) It has any number of inputs.

(ii) It has one or more hidden layers with any number of nodes. The internal layers are called "hidden" because they only receive internal input (input from other processing units) and produce internal output (output to other processing units); consequently, they are hidden from the outside world.

(iii) It uses linear combination functions in the hidden and output layers.

(iv) It generally uses sigmoid activation functions in the hidden layers.

(v) It has any number of outputs with any activation function.

(vi) It has connections between the input layer and the first hidden layer, between the hidden layers, and between the last hidden layer and the output layer.

An MLP with just one hidden layer can learn to approximate virtually any function to any degree of accuracy. For this reason MLPs are known as universal approximators and can be used when we have little prior knowledge of the relationship between the inputs and the targets. One hidden layer is always sufficient provided we have enough data. A schematic representation of a neural network is given in Fig. 1 and a mathematical representation is given in Fig. 2.

[Figure: input units connected through the network to an output unit]
Fig. 1: Schematic representation of a neural network

[Figure]
Fig. 2: Mathematical representation of a neural network
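To make the layered computation depicted in Figs. 1 and 2 concrete, the following sketch propagates an input vector through one hidden layer and an output layer with sigmoid activations (the layer sizes and random weights are purely illustrative, not trained values):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def mlp_forward(x, W_hidden, b_hidden, W_output, b_output):
        # Hidden layer: weighted sum of the inputs followed by the sigmoid activation
        hidden = sigmoid(W_hidden @ x + b_hidden)
        # Output layer: weighted sum of the hidden activities followed by the sigmoid activation
        return sigmoid(W_output @ hidden + b_output)

    rng = np.random.default_rng(1)
    x = rng.normal(size=4)                             # 4 input units
    W_h, b_h = rng.normal(size=(3, 4)), np.zeros(3)    # 3 hidden units
    W_o, b_o = rng.normal(size=(1, 3)), np.zeros(1)    # 1 output unit
    print(mlp_forward(x, W_h, b_h, W_o, b_o))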


Each interconnection in an ANN has a strength that is expressed by a number referred to as a weight. Learning is accomplished by adjusting the weights of the interconnections according to some learning algorithm. Learning methods in neural networks can be broadly classified into three basic types: (i) supervised learning, (ii) unsupervised learning and (iii) reinforced learning. In an MLP, supervised learning is used to adjust the weights. A graphical representation of this learning cycle is given in Fig. 3.

[Figure: an input vector is presented to the ANN, the resulting output vector is compared with the target vector, and the differences are used to adjust the weights]
Fig. 3: A learning cycle in the ANN model

4. Architecture of Neural Networks
There are several types of ANN architecture. The two most widely used are discussed below.

Feedforward networks: Feedforward ANNs allow signals to travel one way only, from input to output. There is no feedback (loops), i.e. the output of any layer does not affect that same layer. They are extensively used in pattern recognition.

Feedback/recurrent networks: Feedback networks can have signals travelling in both directions by introducing loops into the network. Feedback networks are dynamic; their 'state' changes continuously until they reach an equilibrium point. They remain at the equilibrium point until the input changes and a new equilibrium needs to be found.

5. Supervised and Unsupervised Learning
The learning law describes the weight vector for the ith processing unit at time instant (t+1) in terms of the weight vector at time instant (t) as follows:

w_i(t+1) = w_i(t) + ∆w_i(t),

where ∆w_i(t) is the change in the weight vector. The network adapts by changing each weight by an amount proportional to the difference between the desired output and the actual output. As an equation:

∆W_i = η (D − Y) I_i


where η is the learning rate, D is the desired output, Y is the actual output and I_i is the ith input. This is called the perceptron learning rule. The weights in an ANN, similar to the coefficients in a regression model, are adjusted to solve the problem presented to the ANN. Learning or training is the term used to describe the process of finding the values of these weights. The two types of learning with an ANN are supervised and unsupervised learning. Supervised learning incorporates an external teacher, so that each output unit is told what its desired response to the input signals ought to be. During the learning process, global information may be required. An important issue concerning supervised learning is the problem of error convergence, i.e. the minimization of the error between the desired and computed unit values. The aim is to determine a set of weights which minimizes the error. Unsupervised learning uses no external teacher and is based only upon local information. We say that a neural network learns off-line if the learning phase and the operation phase are distinct, and on-line if it learns and operates at the same time. Usually, supervised learning is performed off-line, whereas unsupervised learning is performed on-line.
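A minimal sketch of the perceptron learning rule ∆W_i = η(D − Y)I_i described above (the learning rate, inputs and desired output are illustrative assumptions):

    import numpy as np

    def perceptron_update(w, inputs, desired, eta=0.1):
        # Y is taken here as a simple threshold output of the unit
        actual = 1.0 if np.dot(w, inputs) > 0 else 0.0
        # Delta W_i = eta * (D - Y) * I_i, applied to every weight at once
        return w + eta * (desired - actual) * inputs

    w = np.zeros(3)
    w = perceptron_update(w, np.array([1.0, 0.5, -0.2]), desired=1.0)
    print(w)   # the weights move towards producing the desired output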

6. Further Concepts on Hidden Layers
There are really two decisions that must be made regarding the hidden layers. The first is how many hidden layers to have in the neural network; the second is how many neurons will be in each of these layers. We first examine how to determine the number of hidden layers. The effect of the number of hidden layers is summarized in Table 6.1.

Table 6.1: Determining the number of hidden layers

Number of hidden layers: Result
None: Only capable of representing linearly separable functions or decisions.
1: Can approximate any function that contains a continuous mapping from one finite space to another.
2: Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.

Deciding the number of hidden layers is only a small part of the problem. You must also determine how many neurons will be in each of these hidden layers; this is covered next. There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers. Some of them are summarized as follows:

• The number of hidden neurons should be in the range between the size of the input layer and the size of the output layer.



• The number of hidden neurons should be 2/3 of the size of the input layer, plus the size of the output layer.


• The number of hidden neurons should be less than twice the size of the input layer.

These three rules are only starting points; ultimately, the selection of the architecture of the neural network comes down to trial and error. One approach is to split the data set into about 70%, 20% and 10%, using the first set for training and the second for validation, and to end learning if the reported value of the error function on the validation part increases again after a longer phase of reduction while learning on the first part. One way such a stopping rule might look is sketched below.
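A minimal early-stopping sketch (the validation errors would come from the network's training loop; the patience of five epochs is an arbitrary choice):

    def stop_training(validation_errors, patience=5):
        # Stop once the validation error has failed to improve for `patience` consecutive epochs
        if len(validation_errors) <= patience:
            return False
        best_before = min(validation_errors[:-patience])
        return min(validation_errors[-patience:]) >= best_before

    print(stop_training([0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60]))   # True: the error rose again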

7. Backpropagation Algorithm
The MLP network is trained using one of the supervised learning algorithms, of which the best known example is backpropagation, which uses the data to adjust the network's weights and thresholds so as to minimize the error in its predictions on the training set. We denote by W_ij the weight of the connection from unit u_i to unit u_j. It is then convenient to represent the pattern of connectivity in the network by a weight matrix W whose elements are the weights W_ij. The pattern of connectivity characterizes the architecture of the network. A unit in the output layer determines its activity by following a two-step procedure.

• First, it computes the total weighted input X_j, using the formula:

X_j = Σ_i y_i W_ij

where y_i is the activity level of the ith unit in the previous layer and W_ij is the weight of the connection between the ith and the jth unit.

• Next, the unit calculates its activity y_j using some function of the total weighted input. Typically we use the sigmoid function:

y_j = [1 + e^(-X_j)]^(-1)

Once the activities of all the output units have been determined, the network computes the error E, which is defined by the expression:

E = (1/2) Σ_j (y_j − d_j)^2

where y_j is the activity level of the jth unit in the top layer and d_j is the desired output of the jth unit.
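For illustration, the error expression can be evaluated directly (the activity and target values below are hypothetical):

    import numpy as np

    y = np.array([0.8, 0.3])   # actual activities of the output units
    d = np.array([1.0, 0.0])   # desired outputs
    E = 0.5 * np.sum((y - d) ** 2)
    print(E)   # 0.5 * (0.04 + 0.09) = 0.065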

The backpropagation algorithm consists of four steps:

(i) Compute how fast the error changes as the activity of an output unit is changed. This error derivative (EA) is the difference between the actual and the desired activity:

EA_j = ∂E/∂y_j = y_j − d_j

(ii) Compute how fast the error changes as the total input received by an output unit is changed. This quantity (EI) is the answer from step (i) multiplied by the rate at which the output of a unit changes as its total input is changed:

EI_j = ∂E/∂X_j = (∂E/∂y_j)(dy_j/dX_j) = EA_j y_j (1 − y_j)

(iii) Compute how fast the error changes as a weight on the connection into an output unit is changed. This quantity (EW) is the answer from step (ii) multiplied by the activity level of the unit from which the connection emanates:

EW_ij = ∂E/∂W_ij = (∂E/∂X_j)(∂X_j/∂W_ij) = EI_j y_i

(iv) Compute how fast the error changes as the activity of a unit in the previous layer is changed. This crucial step allows backpropagation to be applied to multilayer networks. When the activity of a unit in the previous layer changes, it affects the activities of all the output units to which it is connected, so to compute the overall effect on the error we add together all these separate effects on the output units. Each effect is simple to calculate: it is the answer from step (ii) for that output unit multiplied by the weight on the connection to it:

EA_i = ∂E/∂y_i = Σ_j (∂E/∂X_j)(∂X_j/∂y_i) = Σ_j EI_j W_ij

By using steps (ii) and (iv), we can convert the EAs of one layer of units into EAs for the previous layer. This procedure can be repeated to get the EAs for as many previous layers as desired. Once we know the EA of a unit, we can use steps (ii) and (iii) to compute the EWs on its incoming connections; these error derivatives are then used to adjust the weights, for example by gradient descent. A sketch that follows the four steps for a network with one hidden layer is given below.
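Pulling the four steps together, here is a minimal sketch of one backpropagation pass for a network with a single hidden layer and sigmoid units throughout (the layer sizes, data and learning rate are illustrative assumptions, and the weight update shown is plain gradient descent):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, d, W1, W2, eta=0.5):
        # Forward pass
        y1 = sigmoid(W1 @ x)        # hidden activities
        y2 = sigmoid(W2 @ y1)       # output activities
        # Step (i): EA_j = y_j - d_j for the output units
        EA2 = y2 - d
        # Step (ii): EI_j = EA_j * y_j * (1 - y_j)
        EI2 = EA2 * y2 * (1.0 - y2)
        # Step (iii): EW_ij = EI_j * y_i, the error derivative for each output-layer weight
        EW2 = np.outer(EI2, y1)
        # Step (iv): EA_i = sum_j EI_j * W_ij, propagated back to the hidden layer,
        # after which steps (ii) and (iii) are repeated for the hidden-layer weights
        EA1 = W2.T @ EI2
        EI1 = EA1 * y1 * (1.0 - y1)
        EW1 = np.outer(EI1, x)
        # Gradient-descent update of all weights
        return W1 - eta * EW1, W2 - eta * EW2

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
    W1, W2 = backprop_step(np.array([0.5, -1.0]), np.array([1.0]), W1, W2)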

References and Suggested Reading
Anderson, J. A. (2003). An Introduction to Neural Networks. Prentice Hall.
Cheng, B. and Titterington, D. M. (1994). Neural networks: A review from a statistical perspective. Statistical Science, 9, 2-54.
Dewolf, E. D. and Francl, L. J. (1997). Neural networks that distinguish infection periods of wheat tan spot in an outdoor environment. Phytopathology, 87(1), 83-87.
Dewolf, E. D. and Francl, L. J. (2000). Neural network classification of tan spot and Stagonospora blotch infection periods in a wheat field environment. Phytopathology, 90(2), 108-113.
Gaudart, J., Giusiano, B. and Huiart, L. (2004). Comparison of the performance of multilayer perceptron and linear regression for epidemiological data. Computational Statistics & Data Analysis, 44, 547-570.
Hassoun, M. H. (1995). Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press.
Hebb, D. O. (1949). The Organization of Behaviour: A Neuropsychological Theory. Wiley, New York.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences (USA), 79, 2554-2558.
Kaastra, I. and Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10(3), 215-236.
Kumar, M., Raghuwanshi, N. S., Singh, R., Wallender, W. W. and Pruitt, W. O. (2002). Estimating evapotranspiration using artificial neural network. Journal of Irrigation and Drainage Engineering, 128(4), 224-233.
Madhav, Kandala Venu (2003). Study of statistical modeling techniques in agriculture. Ph.D. thesis, IARI, New Delhi.
Masters, T. (1993). Practical Neural Network Recipes in C++. Academic Press, New York.
McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Pal, S., Das, J., Sengupta, P. and Banerjee, S. K. (2002). Short term prediction of atmospheric temperature using neural networks. Mausam, 53, 471-480.
Patterson, D. (1996). Artificial Neural Networks. Singapore: Prentice Hall.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (D. E. Rumelhart, J. L. McClelland and the PDP Research Group, eds.), Cambridge, MA: MIT Press, 318-362.
Sanzogni, L. and Kerr, D. (2001). Milk production estimates using feed forward artificial neural networks. Computers and Electronics in Agriculture, 32, 21-30.
Schalkoff, R. J. (1997). Artificial Neural Networks. McGraw-Hill.
Warner, B. and Misra, M. (1996). Understanding neural networks as statistical tools. American Statistician, 50, 284-293.
Widrow, B. and Hoff, M. E. (1960). Adaptive switching circuits. IRE WESCON Convention Record, 4, 96-104.
Yegnanarayana, B. (1999). Artificial Neural Networks. Prentice Hall.
Zhang, G., Patuwo, B. E. and Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14, 35-62.
