2010 International Conference on Intelligent Systems, Modelling and Simulation

Harmony Search Based Supervised Training of Artificial Neural Networks

Ali Kattan, Rosni Abdullah, Rosalina Abdul Salam
School of Computer Science, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
[email protected], [email protected], [email protected]

Abstract— This paper presents a novel technique for the supervised training of feed-forward artificial neural networks (ANN) using the Harmony Search (HS) algorithm. HS is a stochastic meta-heuristic inspired by the improvisation process of musicians and, unlike Backpropagation, is not trajectory-driven. By modifying an existing improved version of HS and adopting a suitable ANN data representation, we propose a training technique in which two of the HS probabilistic parameters are determined dynamically from the best-to-worst (BtW) harmony ratio in the current harmony memory rather than from the improvisation count. This is better suited to ANN training, since parameter adaptation and termination then depend on the quality of the attained solution. We empirically tested and verified our technique by training an ANN on a benchmarking problem. In terms of overall training time and recognition accuracy, the results show that our method is superior to both the original improved HS and standard Backpropagation.

Keywords: neural network; harmony search; meta-heuristic; optimization

I. INTRODUCTION

The process of training an artificial neural network (ANN) is generally concerned with adjusting the individual weights between the individual neurons. A training dataset is presented to the network's input, and the resulting outputs are compared with the correct (target) outputs. With the ANN architecture fixed, training is an iterative process that continues until an output close to the desired one is achieved by adjusting the network weights accordingly. Back-propagation (BP) learning has become the single most popular method for training feed-forward ANNs in many domains [1]. However, BP suffers from two shortcomings: it requires a differentiable neuron transfer function, and it has a high probability of converging to a local minimum instead of the global one. The algorithm is a gradient-descent technique, analogous to an error-minimizing process. Neural networks generate complex error surfaces with multiple local minima, and BP tends to become trapped in a solution that is locally but not globally optimal [2].

Many stochastic global optimization (SGO) methods have been adopted for the training of ANNs. Most of these methods draw their inspiration from biological processes, for example Genetic Algorithms (GA) [3], Ant Colony Optimization (ACO) [4], Improved Bacterial Chemo-taxis Optimization (IBCO) [5], and Particle Swarm Optimization (PSO) [6]. These training techniques overcome the aforementioned inefficiencies of BP. They have explorative search features that differ from those of standard gradient-descent BP in that they are population-driven rather than trajectory-driven, and they are expected to avoid local minima more frequently by promoting exploration of the search space.

A relatively young meta-heuristic SGO method is Harmony Search (HS) [7]. This method draws its inspiration not from biological or physical processes but from the improvisation process of musicians. It was reported, without details, that HS has been used to train a feed-forward ANN on the classical XOR problem, where HS performed better than the standard gradient-descent BP method [8]. Many other evolutionary-based training techniques have also been reported to be superior to BP [9], [3], [4], [5]. However, most of these reported improvements were based on the classical XOR problem. This problem, in addition to involving a very small-scale ANN and a dataset of only four patterns, is a special case: it has been proven that XOR has no local minima [10]. In this work we formally address the supervised training of feed-forward ANNs with the HS algorithm, using a well-known benchmarking problem that is considerably larger than the classical XOR. For this purpose, the adaptation and modification of the HS algorithm are discussed in detail.

II. THE HARMONY SEARCH ALGORITHM

HS is a meta-heuristic SGO method similar in concept to other SGO methods in that it combines rules and randomness to imitate the process that inspired it. The method can handle discrete and continuous variables with similar ease [11]. HS has been reported to be a competitive alternative to other SGO methods [12] and has been used in many applications, mostly in engineering and industry [13], [14]. However, the last couple of years have witnessed a noticeable increase in the application of HS to IT-related fields such as robotics [12], web page clustering [15], and classification [16], to name a few.

Figure 1. Music improvisation process for a harmony in a band of 7


The HS concept is based on the improvisation process of musicians in a band. Improvisation occurs when each musician tests and plays a note on his instrument such that the resultant tones are judged by an aesthetic quality measure to be in harmony with the rest of the band. Each instrument has a permissible range of notes that can be played, representing the pitch value range of that musical instrument. To improvise a new harmony, each musician either plays a totally new random note from the permissible range, plays an existing note from memory, or plays a note from memory that is slightly modified. Only good improvised harmonies are kept and remembered by the musicians until better ones are found and replace the worst ones. Each note played by a musician represents one component of the solution vector of all musicians' notes, as shown in Fig. 1. The perfect solution vector is found when each component value is optimal with respect to some objective function evaluated for this solution vector [17].

Figure 2. The modeling of HM with N decision variables

The Harmony Memory (HM) is a matrix of the best solution vectors attained so far. The memory size (HMS) is set prior to running the algorithm. The number of components in each harmony vector, N, is analogous to the pitches, i.e. notes, played by N musical instruments; it represents the total number of decision variables. Each pitch value is drawn from a pre-specified range of values, whose lower and upper limits are specified by two vectors xL and xU, both of length N. Each harmony vector is also associated with a harmony quality value (fitness) based on an objective function f(x). The modeling of HM is shown in Fig. 2. To improvise a new harmony vector, each decision variable, i.e. pitch, is considered separately. HS uses two parameters to reflect the probabilistic playing choices: the Harmony Memory Considering Rate (HMCR) and the Pitch Adjustment Rate (PAR). The former is the probability of playing a pitch from memory rather than a totally new random one; PAR is the probability that a pitch played from memory is adjusted. The adjustment value for each decision variable is drawn from the respective component of the bandwidth vector B, also of length N, and the adjustment must keep the resultant pitch value within the permissible range specified by xL and xU. The HS algorithm requires the initialization of a number of parameters: HMS, HMCR, PAR, B, and the maximum number of improvisations (MAXIMP), after which the algorithm terminates. We will refer to the HS algorithm proposed by Lee and Geem [11] as the "classical" HS. Its pseudo code is shown in Fig. 3.
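To make the improvisation step concrete, the following is a minimal sketch in Java (the language used for the implementation in Section VI). The class and variable names are ours, and the sketch only mirrors the classical pseudo code of Fig. 3; it is not the authors' code.

    import java.util.Random;

    final class ClassicalHsSketch {
        // One improvisation of a new harmony vector with N decision variables.
        // hm is the HMS x N harmony memory; xL and xU are the per-variable bounds,
        // b the per-variable bandwidths; hmcr and par are the two probabilities.
        static double[] improvise(double[][] hm, double[] xL, double[] xU,
                                  double[] b, double hmcr, double par, Random rnd) {
            int n = xL.length;
            double[] x = new double[n];
            for (int i = 0; i < n; i++) {
                if (rnd.nextDouble() < hmcr) {
                    // Memory consideration: take pitch i from a randomly chosen stored harmony.
                    x[i] = hm[rnd.nextInt(hm.length)][i];
                    if (rnd.nextDouble() < par) {
                        // Pitch adjustment: shift the remembered pitch by at most +/- b[i].
                        x[i] += (2.0 * rnd.nextDouble() - 1.0) * b[i];
                    }
                } else {
                    // Random playing: draw a completely new pitch from [xL[i], xU[i]].
                    x[i] = xL[i] + rnd.nextDouble() * (xU[i] - xL[i]);
                }
                // Keep the pitch within its permissible range.
                x[i] = Math.max(xL[i], Math.min(xU[i], x[i]));
            }
            return x;
        }
    }

The improvised vector then replaces the worst harmony in HM if its objective value f(x) is better, and the loop repeats until MAXIMP improvisations have been made.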

Figure 3. The classical HS algorithm

III. THE IMPROVED HARMONY SEARCH ALGORITHM

Mahdavi et al. [18] proposed an improved version of the HS algorithm that achieves better fine-tuning of the final solution than the original algorithm. Our proposed ANN training method is based on this improved version of HS.


Figure 4. Mahdavi’s improved HS to set the PAR and B values dynamically

In this improved version, the parameter values of PAR and B are not set statically before the algorithm starts. Instead, they change dynamically as the improvisation count, i.e. the iteration number, increases, as detailed in Fig. 4.
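Figure 4 itself is not reproduced here. As a hedged sketch of our reading of the improved HS of Mahdavi et al. [18], PAR is increased linearly and the bandwidth decreased exponentially with the improvisation count gn out of MAXIMP; the method names below are ours.

    final class ImprovedHsSchedules {
        // PAR grows linearly from PARmin to PARmax over the MAXIMP improvisations.
        static double parAt(long gn, long maxImp, double parMin, double parMax) {
            return parMin + (parMax - parMin) * gn / (double) maxImp;
        }

        // The bandwidth shrinks exponentially from Bmax towards Bmin.
        static double bandwidthAt(long gn, long maxImp, double bMin, double bMax) {
            double c = Math.log(bMin / bMax) / (double) maxImp;  // c < 0 since Bmin < Bmax
            return bMax * Math.exp(c * gn);
        }
    }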

Our proposed method builds on the aforementioned improved HS algorithm of Mahdavi et al. [18]. Below we discuss the adaptation and modification of this algorithm to make it suitable as a supervised ANN training method.

IV. HS AND ANN TRAINING

A. Data Structure

A harmony vector in HM represents the HS decision variables. Since the concept of a harmony vector is similar to that of a population member in GA, we have adopted the vector representation from Genetic Adaptive Neural Network Training (GANNT), originally introduced by Dorsey et al. [20] and still used in some relatively recent works [21], [22]. Fig. 5 illustrates this representation for a small-scale sample ANN. Each vector represents a complete set of ANN weights, including biases; each neuron's weights are listed in sequence, assuming a fixed ANN architecture. The objective function, i.e. the fitness measure, is the sum of squared errors (SSE) to be minimized [23]. The squared difference between the target output and the actual output determines the amount of error; this is represented by (t-z)^2 for each pattern and each output unit, as shown in Fig. 5. Calculating the SSE involves ANN forward-pass calculations to compare the resultant outputs with the target outputs. Since ANN weight values usually lie within the same range, we simplify the HS model by using fixed ranges for all decision variables instead of the vectors xL, xU and B. Thus we have the scalar range [xL, xU] and the scalar value B, the latter specifying the range of permissible weight changes [-B, B].
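As an illustration of this representation, the sketch below scores one harmony vector (a flat list of all weights and biases of a fixed input-hidden-output network) by its SSE over the training patterns. The class name, the exact weight ordering, and the use of the bipolar sigmoid of Section VI are our assumptions; Fig. 5 defines the actual layout.

    final class WeightVectorFitness {
        // SSE of one harmony vector w over all patterns. Assumed layout: per hidden
        // unit its input weights then bias, followed per output unit by its hidden
        // weights then bias. For the 9-8-2 network of Section VI this gives
        // 8*(9+1) + 2*(8+1) = 98 components.
        static double sse(double[] w, double[][] inputs, double[][] targets,
                          int nIn, int nHid, int nOut) {
            double total = 0.0;
            for (int p = 0; p < inputs.length; p++) {
                int k = 0;
                double[] hidden = new double[nHid];
                for (int h = 0; h < nHid; h++) {
                    double s = 0.0;
                    for (int i = 0; i < nIn; i++) s += w[k++] * inputs[p][i];
                    s += w[k++];                               // hidden-unit bias
                    hidden[h] = transfer(s);
                }
                for (int o = 0; o < nOut; o++) {
                    double s = 0.0;
                    for (int h = 0; h < nHid; h++) s += w[k++] * hidden[h];
                    s += w[k++];                               // output-unit bias
                    double diff = targets[p][o] - transfer(s); // (t - z)
                    total += diff * diff;                      // (t - z)^2 summed over patterns and outputs
                }
            }
            return total;
        }

        // Bipolar sigmoid in (-1, 1).
        static double transfer(double x) {
            return 2.0 / (1.0 + Math.exp(-x)) - 1.0;
        }
    }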

When ANN training is initiated, the iterative process of presenting the training patterns of the dataset to the network's input continues until some termination condition is satisfied. This usually happens based on some measure indicating that the currently achieved solution is presumably good enough to stop training. For instance, one of the common termination criteria in BP is the difference between the SSE obtained in the current iteration and that obtained in the previous iteration [19]: if this difference is smaller than some small value, training terminates. The total number of HS improvisations, on the other hand, is bound by MAXIMP, which is set prior to starting the algorithm. Using such a termination condition for ANN training does not consider the quality of the best solution attained so far, and selecting a value for MAXIMP is a subjective issue. Using the BP termination criterion discussed above would not help either, since a given HS improvisation does not guarantee that the best harmony is replaced; the SSE difference would then be zero in such iterations, causing the training process to terminate prematurely.


Figure 5. Harmony vector representation of ANN weights

B. Best-to-Worst Ratio

We introduce the best-to-worst (BtW) ratio, a value between zero and one representing the ratio of the current best harmony fitness to the current worst harmony fitness in HM. Values close to one indicate that the average fitness of the harmonies in the current HM is close to the current best. From another perspective, the BtW ratio indicates the size of the region of the search space currently being investigated by the algorithm: values close to zero indicate a large search area, while values close to one indicate a smaller one. The BtW ratio is used for three purposes: the dynamic setting of the PAR value, the dynamic setting of the B value, and determining the termination condition.
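A minimal sketch of the ratio, assuming (as in our setup) that fitness is the SSE to be minimised, so the best harmony has the smallest SSE and BtW lies in (0, 1]:

    final class BtwRatio {
        // Ratio of the best (smallest) to the worst (largest) SSE currently in HM.
        static double bestToWorst(double[] hmFitness) {
            double best = Double.POSITIVE_INFINITY;
            double worst = Double.NEGATIVE_INFINITY;
            for (double f : hmFitness) {
                if (f < best) best = f;
                if (f > worst) worst = f;
            }
            return best / worst;   // approaches 1.0 as the harmonies in HM converge
        }
    }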

V. PROPOSED METHOD

The proposed method is shown in Fig. 6, which is analogous to Fig. 4 presented earlier. Both the dynamic PAR and B values become functions of BtW rather than of the number of improvisations. PAR ranges over [PARmin, PARmax], and m gives the slope of the line past the value of BtWthr. BtWthr is a threshold value that controls where the dynamic change of PAR and B starts; setting it to 1.0 makes the algorithm behave just like the classical HS, with PAR fixed at PARmin and B fixed at Bmax. B ranges over [Bmin, Bmax], and CB is a constant in the range [-10, -5] (based on empirical results) controlling the steepness of the change, as shown in Fig. 6. BtWscaled is the value of BtW past the BtWthr point, scaled to lie in the range [0, 1].
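Since Fig. 6 is not reproduced here, the following is only one plausible reading of the rules it describes: below BtWthr the classical values PARmin and Bmax are used, and past it PAR rises linearly (with slope m) while B shrinks exponentially (steepness CB) in the scaled ratio. All names and the exact functional forms are assumptions on our part.

    final class ProposedSchedules {
        // BtW past the threshold, rescaled to [0, 1]; zero below the threshold.
        static double btwScaled(double btw, double btwThr) {
            return btw <= btwThr ? 0.0 : (btw - btwThr) / (1.0 - btwThr);
        }

        // PAR rises linearly from PARmin towards PARmax as BtW grows past BtWthr.
        static double parOf(double btw, double btwThr, double parMin, double parMax) {
            double m = parMax - parMin;          // assumed slope of the line past BtWthr
            return parMin + m * btwScaled(btw, btwThr);
        }

        // B shrinks exponentially from Bmax towards Bmin (CB is negative, e.g. -5.0).
        static double bOf(double btw, double btwThr, double bMin, double bMax, double cB) {
            double bValue = bMax * Math.exp(cB * btwScaled(btw, btwThr));
            return Math.max(bMin, bValue);
        }
    }

With BtWthr = 1.0 this degenerates to the classical behaviour (PAR fixed at PARmin, B fixed at Bmax), matching the description above.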

Figure 6. Proposed Method


The termination condition is based on a BtWtermination value that is usually set close to unity: training terminates once BtW >= BtWtermination. MAXIMP can be added as an extra, optional termination criterion to limit the total number of training iterations if desired; it is OR-ed with the BtW-based criterion.
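Expressed as a check (names assumed), the OR-ed stopping rule is simply:

    final class StoppingRule {
        // Stop when the memory has converged (BtW >= BtWtermination) or, optionally,
        // when the improvisation budget MAXIMP has been used up.
        static boolean shouldStop(double btw, double btwTermination,
                                  long improvisations, long maxImp) {
            return btw >= btwTermination || improvisations >= maxImp;
        }
    }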

VI. EXPERIMENTAL RESULTS AND DISCUSSION

The CANCER dataset from the PROBEN1 benchmarking suite (available from ftp://ftp.ira.uka.de/pub/neuron/proben1.tar.gz) was used for experimentation. The dataset was split into two groups: 80% for training and the remainder for out-of-sample testing, with each group containing equal percentages of each class type. A summary of the dataset is given in Table (1). A 3-layer feed-forward ANN with a 9-8-2 (input-hidden-output) architecture was designed, giving a total of 98 weights including biases. All neuron units use the bipolar sigmoid transfer function, and the output is determined in a winner-take-all fashion [23]. For comparison, we trained the network with the standard BP method; Table (2) lists the results of the best of five training sessions. Weights were initialized using the Nguyen-Widrow method [19], and training was set to terminate once the SSE difference fell below 2.1E-2. The implementation was carried out in Java 5 on an AMD Athlon64 X2 3.0 GHz computer with 2 GB of RAM. The same computer was used to run two versions of the HS algorithm to train the same ANN independently: the adapted Mahdavi version and our proposed method. The harmony vector length is 98, and the initial weight values (including biases) were set in the range [-250, 250] to guarantee a large search space. The initial weight values were saved in a file and shared by the two versions. The remaining parameters and the training results are listed in Table (3).
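For the recognition percentages, the winner-take-all rule picks the output unit with the largest activation; a small sketch (the unit-to-class mapping is an assumption on our part):

    final class WinnerTakeAll {
        // Index of the most active output unit; for the 9-8-2 CANCER network one unit
        // is assumed to stand for Benign and the other for Malignant.
        static int predictedClass(double[] outputs) {
            int winner = 0;
            for (int i = 1; i < outputs.length; i++) {
                if (outputs[i] > outputs[winner]) winner = i;
            }
            return winner;
        }
    }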

Our proposed method achieved faster convergence than both standard Backpropagation and the adapted improved-HS version. Although the SSE achieved by Backpropagation is lower, the out-of-sample test clearly shows that both HS versions outperform Backpropagation; this is a well-known issue of the latter, known as "overtraining" [23]. Mahdavi's improved HS [18] varies the PAR and B values as a function of the iteration number alone, whereas our proposed method may increase or decrease these values based on the current best-to-worst ratio of the harmony memory. This not only allows better fine-tuning but also provides a good measure for signalling training termination.

VII. CONCLUSION

The HS algorithm can be used to train feed-forward ANNs with results superior to those of trajectory-driven methods such as Backpropagation. By adapting an improved version of HS, we have proposed a technique that is better suited to ANN training. Empirical results show that faster convergence is achieved by modifying PAR and B based on BtW rather than on the current iteration number, and the proposed technique also gave better out-of-sample recognition results. BtW further serves as a good measure for signalling ANN training termination, in contrast with the classical and improved versions of HS, where the maximum number of improvisations is the only termination criterion. Future work should investigate the effect of HMS and of the total number of decision variables, i.e. the total number of ANN weights, on the performance of the proposed technique; such a study is currently being undertaken by the authors using a larger ANN and a larger dataset.

TABLE (1): PROBEN1 CANCER DATASET

  Total patterns:      699
  Input units:         9
  Output units:        2
  Class distribution:  65.52% Benign, 34.48% Malignant
  Training patterns:   559
  Testing patterns:    140

TABLE (2): ANN FOR CANCER WITH BP TRAINING RESULTS

  ANN architecture:                   9-8-2
  Total weights:                      98
  Overall training time (hh:mm:ss):   01:50:37
  Achieved SSE:                       25.40
  Testing-set recognition (overall and per class):  95.71% (98.91% Benign, 89.58% Malignant)


TABLE (3): HS TRAINING RESULTS FOR CANCER 9-8-2

  Common parameters (both versions): HMS = 10, HMCR = 0.97, Bounds = [-250, 250],
  MAXIMP = 2500, PARmin = 0.1, PARmax = 0.99, Bmin = 2.5, Bmax = 25

                                 Adapted Mahdavi        Proposed Method
  Specific parameters            -                      CB = -5.0, BtWthreshold = 0.6,
                                                        BtWtermination = 0.98
  Termination criterion          MAXIMP                 BtW >= BtWtermination OR MAXIMP
  Total iterations               2500                   1112
  Improvisations accepted        213                    179
  Training time (hh:mm:ss)       01:34:00               00:52:59
  Achieved SSE                   120.0                  108.0
  Out-of-sample recognition %    98.91%                 99.29%
                                 (98.91% Benign,        (100.00% Benign,
                                 97.92% Malignant)      97.92% Malignant)

REFERENCES

[1] A. T. Chronopoulos and J. Sarangapani, "A distributed discrete-time neural network architecture for pattern allocation and control," in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'02), Florida, USA, 2002, pp. 204-211.
[2] J. N. D. Gupta and R. S. Sexton, "Comparing backpropagation with a genetic algorithm for neural network training," Omega, The International Journal of Management Science, vol. 27, pp. 679-684, 1999.
[3] D. Kim, H. Kim, and D. Chung, "A Modified Genetic Algorithm for Fast Training Neural Networks," in Advances in Neural Networks - ISNN 2005, vol. 3496/2005. Springer Berlin / Heidelberg, 2005, pp. 660-665.
[4] G. Wei, "Study on Evolutionary Neural Network Based on Ant Colony Optimization," in International Conference on Computational Intelligence and Security Workshops, Harbin, Heilongjiang, China, 2007, pp. 3-6.
[5] Y. Zhang and L. Wu, "Weights Optimization of Neural Networks via Improved BCO Approach," Progress In Electromagnetics Research, vol. 83, pp. 185-198, 2008.
[6] J. Yu, S. Wang, and L. Xi, "Evolving artificial neural networks using an improved PSO and DPSO," Neurocomputing, vol. 71, pp. 1054-1060, 2008.
[7] Z. W. Geem, J. H. Kim, and G. V. Loganathan, "A New Heuristic Optimization Algorithm: Harmony Search," Simulation, vol. 72, pp. 60-68, 2001.
[8] Z. W. Geem, C.-L. Tseng, J. Kim, and C. Bae, "Trenchless Water Pipe Condition Assessment Using Artificial Neural Network," in Pipelines 2007, Boston, Massachusetts, 2007, pp. 1-9.
[9] E. Alba and J. F. Chicano, "Training Neural Networks with GA Hybrid Algorithms," in Genetic and Evolutionary Computation - GECCO 2004, vol. 3102/2004. Springer Berlin / Heidelberg, 2004, pp. 852-863.
[10] L. G. C. Hamey, "XOR Has No Local Minima: A Case Study in Neural Network Error Surface Analysis," Neural Networks, vol. 11, pp. 669-681, 1998.
[11] K. S. Lee and Z. W. Geem, "A New Meta-heuristic Algorithm for Continuous Engineering Optimization: Harmony Search Theory and Practice," Computer Methods in Applied Mechanics and Engineering, vol. 194, pp. 3902-3933, 2005.
[12] P. Tangpattanakul and P. Artrit, "Minimum-time trajectory of robot manipulator using Harmony Search algorithm," in 6th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON 2009), vol. 01, Pattaya, Thailand: IEEE, 2009, pp. 354-357.
[13] Z. W. Geem, "Harmony Search Applications in Industry," in Soft Computing Applications in Industry, vol. 226/2008. Springer Berlin / Heidelberg, 2008, pp. 117-134.
[14] J.-H. Lee and Y.-S. Yoon, "Modified Harmony Search Algorithm and Neural Networks for Concrete Mix Proportion Design," Journal of Computing in Civil Engineering, vol. 23, pp. 57-61, 2009.
[15] R. Forsati, M. Mahdavi, M. Kangavari, and B. Safarkhani, "Web page clustering using Harmony Search optimization," in Canadian Conference on Electrical and Computer Engineering (CCECE 2008), Ontario, Canada: IEEE Canada, 2008, pp. 001601-001604.
[16] H. Moeinzadeh, E. Asgarian, M. Zanjani, A. Rezaee, and M. Seidi, "Combination of Harmony Search and Linear Discriminate Analysis to Improve Classification," in Third Asia International Conference on Modelling & Simulation (AMS '09), Bandung/Bali, Indonesia, 2009, pp. 131-135.
[17] Z. W. Geem, C.-L. Tseng, and Y. Park, "Harmony Search for Generalized Orienteering Problem: Best Touring in China," in Advances in Natural Computation, vol. 3612/2005. Springer Berlin / Heidelberg, 2005, pp. 741-750.
[18] M. Mahdavi, M. Fesanghary, and E. Damangir, "An Improved Harmony Search Algorithm for Solving Optimization Problems," Applied Mathematics and Computation, vol. 188, pp. 1567-1579, 2007.
[19] L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. New Jersey: Prentice Hall, 1994.
[20] R. E. Dorsey, J. D. Johnson, and W. J. Mayer, "A Genetic Algorithm for the Training of Feedforward Neural Networks," Advances in Artificial Intelligence in Economics, Finance, and Management, vol. 1, pp. 93-111, 1994.
[21] K. E. Fish, J. D. Johnson, R. E. Dorsey, and J. G. Blodgett, "Using an Artificial Neural Network Trained with a Genetic Algorithm to Model Brand Share," Journal of Business Research, vol. 57, pp. 79-85, January 2004.
[22] R. S. Sexton, R. E. Dorsey, and N. A. Sikander, "Simultaneous Optimization of Neural Network Function and Architecture Algorithm," Decision Support Systems, vol. 30, pp. 11-22, December 2004.
[23] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, Massachusetts: MIT Press, 1995.
