Grammatical Swarm for Artificial Neural Network Training

2014 International Conference on Circuit, Power and Computing Technologies [ICCPCT]

Tapas Si

Department of CSE, Bankura Unnayani Institute of Engg., Bankura, W.B, India [email protected]

Arunava De

Department of IT, Dr. B.C Roy Engineering College, Durgapur, West Bengal, India [email protected]

Anup Kumar Bhattacharjee

Department of ECE, National Institute of Technology, Durgapur, West Bengal, India [email protected]

Abstract—This paper presents a proof of concept for Artificial Neural Network training using Grammatical Swarm. Grammatical Swarm is a variant of Grammatical Evolution. The synaptic weight coefficients of a multilayer feed-forward neural network are evolved using Grammatical Swarm. The synaptic weight coefficients are derived from a predefined Backus-Naur Form grammar for real value generation in a specified range. The proposed method is applied to solve the XOR problem and is compared with multilayer feed-forward neural network training using the Particle Swarm Optimizer, Comprehensive Learning Particle Swarm Optimizer, Differential Evolution and Trigonometric Differential Evolution. The experimental results show that Grammatical Swarm is able to train the Artificial Neural Network.

Index Terms—Artificial neural network, Grammatical evolution, Grammatical swarm, Particle swarm optimizer, Comprehensive learning particle swarm optimizer, Differential evolution, Trigonometric differential evolution, XOR problem

I. INTRODUCTION

Grammatical Swarm (GS) [1] is a variant of Grammatical Evolution (GE) [2]. Grammatical Evolution is a form of Genetic Programming [3]. A variable-length linear genome is used in Grammatical Evolution. A Backus-Naur Form (BNF) grammar is used in GE to generate computer programs in any arbitrary language from the genome. In the past few years, Grammatical Evolution has been successfully applied in different areas [4] such as horse gait optimization [5], diagnosing corporate stability [6], image processing, and classification tasks [7], [8]. I. Tsoulos et al. [9] applied Grammatical Evolution to evolving the neural network architecture and training the network simultaneously. Evolving neural networks [10] using Evolutionary Algorithms [11] has been an important research area over the last several years. The Artificial Neural Network (ANN) is a useful tool in machine learning. The Back-propagation (BP) algorithm [10] is used to train the ANN in supervised learning. The BP algorithm is a gradient-descent optimization technique that searches for the synaptic weight coefficients of the ANN so as to minimize the learning error on the error surface. But the BP algorithm has several drawbacks [10], [11]. The error function of an ANN is a multi-modal function with several local minima, and the BP algorithm easily gets stuck in a local minimum. Secondly, it has a slow convergence speed. Therefore, evolutionary algorithms such as the Evolution Strategy (ES) [12], Genetic Algorithm (GA) [13], [14], Particle Swarm Optimization (PSO) [16]–[18], Differential Evolution (DE) [19]–[23], Cuckoo Search (CS) [24], Artificial Bee Colony (ABC) [25] and Bacteria Foraging Optimization Algorithm (BFOA) [26] are used to train the ANN as alternatives to the BP algorithm.

A. Organization of the Article

The remainder of this paper is organized as follows. In Section II, related work is described. The Grammatical Swarm algorithm is described in Section III. The devised method is discussed in Section IV; this section gives a description of the ANN, the BNF grammar for real value generation, and the XOR problem. The experimental setup is given in Section V. Results and discussion are given in Section VI, and finally a conclusion with future work is given in Section VII.

II. RELATED WORKS

The basic motivation of this study is to use Grammatical Swarm for Artificial Neural Network training. Evolving neural networks is an important research area and many research contributions have been made in it. Xin Yao described the different methodologies for evolving neural networks in his well-known survey [11]. There are three ways to evolve a neural network: (1) evolving the neural network architecture, (2) evolving the synaptic weight coefficients, and (3) evolving the learning rule for training. As this paper is concerned with evolving the synaptic weights of an ANN, a literature review on evolving synaptic weights of an ANN using different types of Evolutionary Algorithms as well as other meta-heuristic algorithms is given next. In 2003, Fan and Lampinen [19] introduced the Trigonometric Differential Evolution (TDE) algorithm and applied it to train an ANN for the XOR problem as well as for aerodynamic five-hole probe calibration.


Y. Gao et al. [20] introduced a modified DE algorithm and trained a neural network for the exclusive-OR (XOR) classification problem. In [21], A. Slowik et al. applied the DE algorithm to train an ANN and applied it to the classification of the parity-p problem. Adam Slowik [22] applied an adaptive DE algorithm with multiple trial vectors to ANN learning for classifying the parity-p problem. T. Si et al. [23] applied a variant of the DE algorithm (DEGL) to ANN training for classification tasks on real-world data sets. The Particle Swarm Optimization algorithm is a population-based global optimization algorithm and it is also used for ANN training. Y. S. Lee [16] applied PSO with a bounded Vmax function to neural network learning. B. Junyou used PSO-trained neural networks for stock price forecasting in [17]. T. Si [18] used the Grammatical Differential Evolution Based Adaptable PSO (GDE-APSO) and the Comprehensive Learning Particle Swarm Optimizer (CLPSO, a variant of the Particle Swarm Optimizer) for ANN training and applied them to the XOR problem. E. Valian [24] used an improved Cuckoo Search for feed-forward neural network training. P. Y. Kumbhar et al. [25] used the Artificial Bee Colony (ABC) algorithm for ANN synthesis. Y. Zhang et al. devised a Bacteria Foraging Optimization based ANN for short-term load forecasting in [26]. In the next section, Grammatical Swarm is described in detail.

III. GRAMMATICAL SWARM

Grammatical Swarm was developed by O'Neill et al. [1] in 2006. Grammatical Swarm is a variant of Grammatical Evolution [2]. In Grammatical Evolution, a Backus-Naur Form grammar, which is a meta-syntax for Context-Free Grammars, is used to map from genotype to phenotype in order to generate computer programs in any arbitrary language. A variable-length Genetic Algorithm (GA) is used as the search engine in the genotype-to-phenotype mapping process. In Grammatical Swarm, on the other hand, the Particle Swarm Optimizer (PSO) [28] is used as the search engine in the genotype-to-phenotype mapping process, and a fixed number of codons is used for the genotype.

A. Particle Swarm Optimizer

PSO is a population-based global optimization algorithm of stochastic nature. Each individual in PSO is termed a 'particle', and each solution in the search space corresponds to a particle's position. Each particle has a velocity in each dimension, and particles update their positions by adding a velocity obtained from a velocity-update rule. Each particle stores its personal best position, known as 'pbest', in its memory, and the best of all personal best positions found so far is the global best position, termed 'gbest'. Let the position of particle i be X_i and its velocity be V_i. The position X_i is represented as <X_i1, X_i2, X_i3, ..., X_iD>, where D is the dimension of the problem to be optimized by PSO. The personal best and global best positions are denoted by X_i^{pbest} and X^{gbest} respectively. The velocity is updated by the following equation:

V_i(t + 1) = W × V_i(t) + C_1 R_1 (X_i^{pbest}(t) − X_i(t)) + C_2 R_2 (X^{gbest}(t) − X_i(t))   (1)

and the position is updated by the following equation:

X_i(t + 1) = X_i(t) + V_i(t + 1)   (2)

In Eq. (1), W is the inertia weight in the range (0, 1), and C_1 and C_2 are the personal (cognitive) and social acceleration coefficients respectively. R_1 and R_2 are two uniformly distributed random numbers in (0, 1) used for diversification. In Grammatical Swarm, the authors used an inertia weight decreasing linearly with time in the range (W_min, W_max) = (0.4, 0.9), according to

W = W_max − (W_max − W_min) × (t / t_max)   (3)
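The velocity, position and inertia-weight updates of Eqs. (1)-(3) can be illustrated with a short sketch. This is a minimal illustration under assumed array shapes (one row per particle), not the authors' implementation.

```python
import numpy as np

def pso_step(X, V, pbest, gbest, t, t_max,
             c1=1.49445, c2=1.49445, w_max=0.9, w_min=0.4):
    """One PSO iteration. X, V, pbest: (num_particles, D); gbest: (D,)."""
    # Eq. (3): inertia weight decreasing linearly from w_max to w_min
    w = w_max - (w_max - w_min) * (t / t_max)
    r1 = np.random.rand(*X.shape)   # R1 ~ U(0, 1)
    r2 = np.random.rand(*X.shape)   # R2 ~ U(0, 1)
    # Eq. (1): cognitive (pbest) and social (gbest) attraction terms
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    # Eq. (2): move each particle by its updated velocity
    X = X + V
    return X, V
```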

B. Backus-Naur Form Grammar

Fig. 1. An example of a particle's position that represents a genotype.

The Backus-Naur Form (BNF) grammar is used in GE for genotype-to-phenotype mapping. BNF is a meta-syntax used to express a Context-Free Grammar (CFG) by specifying production rules in a simple, human- and machine-understandable manner. An example of a BNF grammar (non-terminals are written in angle brackets) is given below:

1. <expr> ::= (<expr><op><expr>) (0)
            | <var>              (1)
2. <op>   ::= +                  (0)
            | -                  (1)
            | *                  (2)
            | /                  (3)
3. <var>  ::= x1                 (0)
            | x2                 (1)
            | x3                 (2)
            | x4                 (3)

A mapping process is used to map an integer codon value to a rule number during the derivation of an expression from the BNF grammar, in the following way:

rule = (codon integer value) MOD (number of rules for the current non-terminal)

If the current non-terminal in the derivation process is <expr>, then the rule number is generated as rule number = (180 mod 2) = 0, and <expr> will be replaced by (<expr><op><expr>). A complete example derivation is given below:


<expr> := (<expr><op><expr>)     (180 mod 2) = 0
       := (<var><op><expr>)      (155 mod 2) = 1
       := (x1<op><expr>)         (150 mod 3) = 0
       := (x1/<expr>)            (183 mod 4) = 3
       := (x1/<var>)             (241 mod 2) = 1
       := (x1/x4)                (203 mod 4) = 3
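The codon-to-rule mapping and the derivation above can be sketched as follows. The dictionary encoding of the grammar and the example codon values are illustrative assumptions; with wrapping, the genome is reused from the beginning when the codons run out.

```python
# Example expression grammar of Section III-B as a dictionary:
# non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "<expr>": [["(", "<expr>", "<op>", "<expr>", ")"], ["<var>"]],
    "<op>":   [["+"], ["-"], ["*"], ["/"]],
    "<var>":  [["x1"], ["x2"], ["x3"], ["x4"]],
}

def map_genotype(codons, grammar, start="<expr>"):
    """Derive a phenotype string from integer codons (leftmost derivation)."""
    symbols, out, idx = [start], [], 0
    while symbols:
        sym = symbols.pop(0)
        if sym not in grammar:                          # terminal symbol
            out.append(sym)
            continue
        rules = grammar[sym]
        # rule = (codon integer value) MOD (number of rules for the non-terminal)
        rule = codons[idx % len(codons)] % len(rules)   # wrap when codons run out
        idx += 1
        symbols = rules[rule] + symbols
    return "".join(out)

# Hypothetical genome: map_genotype([14, 7, 3], GRAMMAR) returns "(x4*x4)".
```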


In the next section, the proposed method is described.

IV. GS FOR ANN TRAINING

A. Artificial Neural Network

Fig. 2. Feed-forward neural network.

An Artificial Neural Network is an information-processing model inspired by the human brain. Figure 2 depicts a feed-forward multi-layer perceptron (MLP). The n attributes in the data set are used as inputs to the NN. This work uses a feed-forward multi-layer perceptron that has three layers, known as the input, hidden and output layers respectively. Each processing node, except the input-layer nodes, calculates a weighted sum of the nodes in the preceding layer to which it is connected. This weighted sum passes through a transfer function to derive the node's output, which is fed to the nodes in the next layer. The input to node j is thus obtained as

net_j = \sum_{i=1}^{M} W_{ij} O_i + bias_j   (4)

and its output as

O_j = F_a(net_j)   (5)

where W_{ij} is the synaptic weight of the connection linking node i to node j, net_j is the input value for node j, O_j is the output of node j, and F_a is the activation function (AF). Here, the sigmoid function [10] is used as the activation function and is defined as

F_a(net_j) = 1 / (1 + e^{−net_j})   (6)

In this work, the ANN is trained using GS to search for the synaptic weight coefficients of the feed-forward neural network, i.e., to minimize the mean square error on the error surface. The Mean Square Error (MSE) is calculated by Equation (7) and is used as the fitness function for the GS algorithm:

MSE = (1 / (N·M)) \sum_{i=1}^{N} \sum_{j=1}^{M} (O^d_{ij} − O^p_{ij})^2   (7)

where i indexes a training pattern and j an output node. O^p_{ij} denotes the predicted output of node j when training pattern i is applied to the network, O^d_{ij} is the corresponding desired output, N is the number of training samples and M is the number of outputs. If a feed-forward multi-layer perceptron has n input nodes in the input layer, m output nodes in the output layer and k hidden nodes in the hidden layer, then the total number of weight coefficients, including bias terms, is D = (n·k + k + k·m + m) = (n + m + 1)k + m. Each individual in GS represents a neural network and is trained with the complete training set. The devised method is termed GSNN.
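For the [2-3-1] network used later for the XOR problem, Eqs. (4)-(7) can be sketched as below. The flat-weight layout (input-to-hidden weights, hidden biases, hidden-to-output weights, output bias) is an assumed convention, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    # Eq. (6): logistic activation function
    return 1.0 / (1.0 + np.exp(-x))

def forward(weights, x, n_in=2, n_hid=3, n_out=1):
    """Forward pass of an [n_in - n_hid - n_out] MLP with a flat weight vector."""
    w = np.asarray(weights, dtype=float)
    i = 0
    W1 = w[i:i + n_in * n_hid].reshape(n_hid, n_in);   i += n_in * n_hid
    b1 = w[i:i + n_hid];                               i += n_hid
    W2 = w[i:i + n_hid * n_out].reshape(n_out, n_hid); i += n_hid * n_out
    b2 = w[i:i + n_out]
    h = sigmoid(W1 @ np.asarray(x) + b1)   # Eqs. (4)-(5), hidden layer
    return sigmoid(W2 @ h + b2)            # Eqs. (4)-(5), output layer

def mse(weights, patterns, targets):
    # Eq. (7): mean square error used as the fitness function
    errors = [(forward(weights, x) - t) ** 2 for x, t in zip(patterns, targets)]
    return float(np.mean(errors))
```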

B. Searching the Network's Weights using GS

The weights of an ANN are real values, and when the weights are evolved by Evolutionary Algorithms or other meta-heuristic algorithms they are searched in a continuous search space within a specified range. In this work, GS is used to evolve real-valued weights in a specified range. Therefore, a BNF grammar has to be defined for generating real values in a specified range. A. Nasuf et al. [27] presented a method for multi-objective optimization using GE in which a problem-specific BNF grammar was defined for generating real values in a specified range. This paper adopts the same methodology for generating real-valued weights, and it is described next.

1) BNF Grammar for Real Value Generation: One advantage of Grammatical Evolution, as well as Grammatical Swarm, is that a domain-knowledge-based, problem-specific BNF grammar can be defined to solve a problem. In this work, the weights are initialized in the range [−10, +10]. The following production rules are defined to evolve a real value in this range:

1. <val>   ::= <digit>.<digit><digit><digit><digit>  (0)
             | -<digit>.<digit><digit><digit><digit> (1)
2. <digit> ::= 0 (0) | 1 (1) | 2 (2) | 3 (3) | 4 (4) | 5 (5) | 6 (6) | 7 (7) | 8 (8) | 9 (9)

The above grammar can generate a real value in the range [−10, 10] with up to four digits after the decimal point. The following grammar can be used to generate real values in the range [−5, +5]:

1. <val>   ::= <first>.<digit><digit><digit><digit> | -<first>.<digit><digit><digit><digit>
2. <first> ::= 0 | 1 | 2 | 3 | 4
3. <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
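One plausible way to decode a group of six codons into a single weight under the [−10, +10] grammar above is sketched below (one codon selects the sign rule, the remaining five select digits). The function name and the exact decoding convention are assumptions consistent with the mapping rule of Section III, not the authors' code.

```python
def codons_to_weight(codons):
    """Decode six integer codons into a signed value of the form [-]d.dddd."""
    assert len(codons) == 6
    sign = -1.0 if codons[0] % 2 else 1.0      # rule 1: positive or negative form
    digits = [c % 10 for c in codons[1:]]      # rule 2: each codon picks a digit 0-9
    magnitude = int("".join(str(d) for d in digits)) / 10_000.0
    return sign * magnitude

# Hypothetical codons: codons_to_weight([7, 203, 18, 45, 9, 132]) gives -3.8592.
```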


2) Solving the XOR Problem: The XOR problem is a widely used benchmark problem and test case for ANN training [18], [19], and it is a non-linearly separable problem. In this work, the GS algorithm is used to train the FNN to solve the XOR problem. The FNN is trained with four input-output pairs: ([1 1]^T, [0]), ([1 0]^T, [1]), ([0 1]^T, [1]), ([0 0]^T, [0]). The synaptic weights are initialized in the range [−10, +10]. A feed-forward neural network with the fully connected architecture [2−3−1] is used in this work. Therefore, the total number of weights, including bias terms, is 13, and a set of 13 real values represents a neural network. According to the specified grammar, the derivation of a real value needs six codons. Therefore, a group of six codons is used to generate one real value, and the genome length is 13 × 6 = 78 (seventy-eight). No wrapping process (see footnote 1) is required for the derivation of real values because the production rules are not recursive in nature.

Fig. 3. An example of generated real-valued weights from groups of six codons in a genome.

In the same way as GSNN, the feed-forward neural network is also trained using PSO, the Comprehensive Learning Particle Swarm Optimizer (CLPSO) [29], DE/best/1/bin [18] and TDE [19]. In PSO-NN [17], CLPSO-NN [18], DE-NN [18] and TDE-NN (see footnote 2), the weight coefficients are randomly initialized in the interval [−10, 10] with a uniform distribution. A comparative study of the performance of GSNN, PSONN, CLPSO-NN, DE-NN and TDE-NN is made in Section VI.

V. EXPERIMENTAL SETUP

A. Parameter Settings

The parameters of GS are set as follows: C1 = C2 = 1.49445, Wmax = 0.9, Wmin = 0.4. Particle positions are initialized in the range [0, 255], and each position is represented by a vector of length 100. Vmax = 0.5 × 255. A population of size 30 running for 1000 iterations is used. The same parameters, excluding the range and dimension of the particle's position, are used for PSO and CLPSO. In the DE and TDE algorithms, the parameters are set as follows: scale factor = 0.8, cross-over rate = 0.8, and mutation probability in TDE = 0.05. A population of size 500 running for 60 iterations is used. For each method, 50 separate runs are carried out.

TABLE I
MEAN AND STANDARD DEVIATION OF MSE ERROR AND CPU TIME

Method      MSE Error Mean   MSE Error Std. Dev.   CPU time Mean (Sec.)   CPU time Std. Dev. (Sec.)
GSNN        0.00336          0.00291               741.5238               11.7789
PSONN       0.000126         0.000136              83.8851                0.6815
CLPSO-NN    0.00011          3.66e−5               83.0068                0.4221
DE-NN       3.92e−5          6.44e−6               91.2692                0.4664
TDE-NN      0.000103         6.1e−5                89.2391                4.1956

TABLE II
RESULTS OF T-TEST

Method              h-value   p-value     Better method (than GSNN)   Significance
GSNN vs. PSONN      1         2.62e−10    PSONN                        'Extremely High'
GSNN vs. CLPSO-NN   1         3.01e−10    CLPSO-NN                     'Extremely High'
GSNN vs. DE-NN      1         1.43e−10    DE-NN                        'Extremely High'
GSNN vs. TDE-NN     1         2.46e−10    TDE-NN                       'Extremely High'

B. PC Configuration

1) System: Windows 7
2) CPU: AMD FX-8150 Eight-Core 3.6 GHz
3) RAM: 16 GB
4) Software: Matlab 2010b

VI. RESULTS & DISCUSSION

GSNN, PSONN, CLPSO-NN, DE-NN and TDE-NN are applied to solve the XOR problem and a comparative study is made. The mean and standard deviation of the MSE error and CPU time over 50 separate runs are tabulated in Table I. A statistical t-test [30] with a 95% confidence level and 98 degrees of freedom has been carried out between GSNN and each of the other methods under the null hypothesis that the means are equal. The results of the t-test are tabulated in Table II. In Table II, h-value = 1 indicates rejection of the null hypothesis. The convergence graph of GSNN is plotted in Figure 4. From these experimental results, it is observed that PSONN, CLPSO-NN, DE-NN and TDE-NN statistically outperform GSNN. GS also takes a much higher CPU time for ANN training. Another observation is that PSO outperforms GS in ANN training; therefore, it may be concluded that the combination of the BNF grammar and its derivation process with PSO does not enable GS to generate better solutions than PSO alone. This is another finding of this work. In this work, only one problem is considered in order to give a proof of concept for ANN training using Grammatical Swarm. A more detailed investigation, considering other benchmark problems as well as real-world problems, is needed and may be done in another work. Another variant of GE, Grammatical Differential Evolution (GDE) [31], can also be applied following the approach presented in this paper.
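The pairwise comparison reported in Table II can be reproduced in outline with a two-sample t-test over the 50 final MSE values of each method (50 + 50 − 2 = 98 degrees of freedom). The sketch below uses SciPy with placeholder arrays rather than the authors' recorded results.

```python
import numpy as np
from scipy import stats

# Placeholder arrays standing in for the 50 final MSE values per method.
mse_gsnn = np.random.rand(50) * 1e-2
mse_psonn = np.random.rand(50) * 1e-4

# Two-sample t-test under the null hypothesis of equal means.
t_stat, p_value = stats.ttest_ind(mse_gsnn, mse_psonn, equal_var=True)
h = int(p_value < 0.05)   # h = 1 rejects the null hypothesis at the 95% level
print(h, p_value)
```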

Footnote 1: When the derivation of a string runs out of codons, the derivation process restarts from the beginning of the genome; this process is called wrapping. Footnote 2: Feed-forward neural network training using TDE is termed TDE-NN in this paper.


Fig. 4. Convergence graph of GSNN.

VII. CONCLUSIONS

This paper presents a proof of concept for Artificial Neural Network training using Grammatical Swarm. Grammatical Swarm is used to evolve the synaptic weights of a feed-forward neural network. The devised method is applied to solve the XOR problem. The experimental results show that Grammatical Swarm is able to train the Artificial Neural Network. Future work is directed towards the application of Grammatical Swarm to training different types of neural networks as well as to real-world problem solving.

REFERENCES

[1] M. O'Neill and A. Brabazon, Grammatical Swarm: The Generation of Programs by Social Programming, Natural Computing 5(4), pp. 443–462 (2006).
[2] M. O'Neill and C. Ryan, Grammatical Evolution, IEEE Trans. Evolutionary Computation 5(4), pp. 349–358 (2001).
[3] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, 1992.
[4] M. O'Neill and A. Brabazon, Recent Adventures in Grammatical Evolution, In: Proc. Computer Methods and Systems CMS'05, Oprogramowanie Naukowo-Techniczne Tadeusiewicz, Krakow, Poland, vol. 1, pp. 245–253 (2005).
[5] J. E. Murphy, M. O'Neill and H. Carr, Exploring Grammatical Evolution for Horse Gait Optimization, In: L. Vanneschi et al. (Eds.): EuroGP 2009, LNCS 5481, pp. 183–194 (2009).
[6] A. Brabazon and M. O'Neill, Diagnosing Corporate Stability Using Grammatical Evolution, Int. J. Appl. Math. Comput. Sci., Vol. 14, No. 3, pp. 363–374 (2004).
[7] A. Brabazon and M. O'Neill, Credit Classification using Grammatical Evolution, Informatica (30), pp. 325–335 (2006).
[8] R. Kala, A. Shukla and R. Tiwari, A Novel Approach to Classificatory Problem Using Grammatical Evolution Based Hybrid Algorithm, International Journal of Computer Applications, Vol. 1, No. 28, pp. 61–68 (2010).
[9] I. Tsoulos, D. Gavrilis and E. Glavas, Neural Network Construction and Training using Grammatical Evolution, Neurocomputing (72), pp. 269–277 (2008).
[10] S. Haykin, Neural Networks - A Comprehensive Foundation, PHI, Second Edition (1994).
[11] X. Yao, Evolving Artificial Neural Networks, Proceedings of the IEEE, Vol. 87, No. 9, pp. 1423–1447 (1999).
[12] M. Mandischer, A Comparison of Evolution Strategies and Backpropagation for Neural Network Training, Neurocomputing (42), pp. 87–117 (2002).
[13] F. H. F. Leung, H. K. Lam, S. H. Ling and P. K. S. Tam, Tuning of the Structure and Parameters of a Neural Network using an Improved Genetic Algorithm, IEEE Transactions on Neural Networks, Vol. 14, No. 1, pp. 79–88 (2003).

[14] D.-Y. Tsai, Classification of Heart Diseases in Ultrasonic Images using Neural Networks Trained by Genetic Algorithm, In: Proceedings of the International Conference on Signal Processing, pp. 1213–1216 (1998).
[15] Y. Gao and J. Liu, A Modified Differential Evolution Algorithm and Its Application in the Training of BP Neural Network, In: Proceedings of the 2008 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Xi'an, China, pp. 1373–1377 (2008).
[16] Y. S. Lee, S. M. Shamsuddin and H. N. Hamed, Bounded PSO Vmax Function in Neural Network Learning, In: Eighth International Conference on Intelligent Systems Design and Applications, IEEE, pp. 474–479 (2008).
[17] B. Junyou, Stock Price Forecasting using PSO-trained Neural Networks, In: IEEE Congress on Evolutionary Computation, pp. 2879–2885 (2007).
[18] T. Si, Grammatical Differential Evolution Adaptable Particle Swarm Optimizer for Artificial Neural Network Training, International Journal of Electronics Communication and Computer Engineering (IJECCE), Vol. 4, Issue 1, ISSN (Online): 2249-071X, pp. 239–243 (2013).
[19] H.-Y. Fan and J. Lampinen, A Trigonometric Mutation Operation to Differential Evolution, Journal of Global Optimization, 27, pp. 105–129 (2003).
[20] Y. Gao and J. Liu, A Modified Differential Evolution Algorithm and Its Application in the Training of BP Neural Network, In: Proceedings of the 2008 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Xi'an, China, pp. 1373–1377 (2008).
[21] A. Slowik and M. Bialko, Training of Artificial Neural Networks Using Differential Evolution Algorithm, In: Conference on Human System Interactions, pp. 60–65 (2008).
[22] A. Slowik, Application of an Adaptive Differential Evolution With Multiple Trial Vectors to Artificial Neural Network Training, IEEE Transactions on Industrial Electronics, Vol. 58, No. 8, pp. 3160–3167 (2011).
[23] T. Si, S. Hazra and N. D. Jana, Artificial Neural Network Training using Differential Evolutionary Algorithm, In: Suresh Chandra Satapathy, P. S. Avadhani, Ajith Abraham (Eds.): INDIA 2012, pp. 769–778 (2012).
[24] E. Valian, S. Mahanna and S. Tavakoli, Improved Cuckoo Search for Feed Forward Neural Network, International Journal of Artificial Intelligence & Applications, Vol. 2, No. 3, pp. 36–43 (2011).
[25] P. Y. Kumbhar and S. Krishnan, Use of Artificial Bee Colony (ABC) Algorithm in Artificial Neural Network Synthesis, International Journal of Advanced Engineering Sciences and Technologies, Vol. 11, Issue 1, pp. 162–171 (2011).
[26] Y. Zhang, L. Wu and S. Wang, Bacteria Foraging Optimization Based Neural Network for Short-term Load Forecasting, Journal of Computational Information Systems, Vol. 6, No. 7, pp. 2099–2105 (2010).
[27] A. Nasuf, A. Bhaskar and A. J. Keane, Multi-Objective Optimization Using Grammatical Evolution, Anniversary Scientific Conference "40 Years Department of Industrial Automation", University of Chemical Technology and Metallurgy, Sofia, 18 March 2011, pp. 137–140.
[28] J. Kennedy and R. C. Eberhart, Particle Swarm Optimization, In: IEEE International Conference on Neural Networks, Piscataway, NJ, pp. 1942–1948 (1995).
[29] J. J. Liang, A. K. Qin, P. N. Suganthan and S. Baskar, Comprehensive Learning Particle Swarm Optimizer for Global Optimization of Multimodal Functions, IEEE Transactions on Evolutionary Computation, Vol. 10, No. 3, pp. 281–295 (2006).
[30] N. G. Das, Statistical Methods (Combined Volume), Tata McGraw Hill Education Private Limited (2008).
[31] M. O'Neill and A. Brabazon, Grammatical Differential Evolution, In: International Conference on Artificial Intelligence (ICAI'06), CSREA Press, Las Vegas, Nevada, pp. 231–236 (2006).
