Hybrid Training of Feed-Forward Neural Networks with Particle Swarm Optimization

Marcio Carvalho, Teresa B. Ludermir
Center of Informatics, Federal University of Pernambuco, P.O. Box 7851
Cidade Universitária, Recife - PE, Brazil, 50732-970
{mrc2,tbl}@cin.ufpe.br

Abstract. Training neural networks is a complex task of great importance in supervised learning problems. Particle Swarm Optimization (PSO) is a stochastic global search method that originated from an attempt to graphically simulate the social behavior of a flock of birds searching for resources. In this work we analyze the use of the PSO algorithm and two variants with a local search operator for neural network training, and we investigate the influence of the GL5 stopping criterion on generalization control for swarm optimizers. To evaluate these algorithms we apply them to benchmark classification problems from the medical field. The results show that the hybrid GCPSO with a local search operator obtained the best results among the particle swarm optimizers in two of the three tested problems.

1 Introduction

Artificial Neural Networks (ANNs) exhibit remarkable properties, such as adaptability, the capability of learning by examples, and the ability to generalize [4]. When applied to pattern classification problems, ANNs trained through supervised learning techniques are considered a general method for constructing mappings between two data sets: the example vectors and the corresponding classes. Once this mapping is constructed, the ANN can classify unseen data into one of the classes seen during training. One of the most widely used ANN models is the well-known Multi-Layer Perceptron (MLP) [20].

The training of MLPs for pattern classification problems consists of two tasks: the first is the selection of an appropriate architecture for the problem, and the second is the adjustment of the connection weights of the network. The Backpropagation (generalized delta rule) algorithm [3] is frequently used for the latter; it is a gradient descent method that originally showed good performance on some non-linearly separable problems, but it converges very slowly and can get stuck in local minima, as other gradient-based local methods do [12][2]. In this work we focus only on the second task, the optimization of the connection weights of MLPs through the use of hybrid PSO methods.
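To make the weight-optimization view concrete, a global optimizer can treat all connection weights of a fixed-architecture MLP as a single real-valued vector and use the network error on the training set as the objective function. The sketch below illustrates this encoding for a one-hidden-layer MLP; the layer sizes, the sigmoid units, the mean squared error objective, and the helper names (`unpack`, `forward`, `fitness`) are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

# Illustrative fixed architecture (assumed sizes, not from the paper).
N_IN, N_HID, N_OUT = 9, 6, 2
N_WEIGHTS = (N_IN + 1) * N_HID + (N_HID + 1) * N_OUT  # +1 for bias weights

def unpack(vector):
    """Split a flat parameter vector into the two MLP weight matrices."""
    w1 = vector[:(N_IN + 1) * N_HID].reshape(N_IN + 1, N_HID)
    w2 = vector[(N_IN + 1) * N_HID:].reshape(N_HID + 1, N_OUT)
    return w1, w2

def forward(vector, x):
    """Forward pass of a one-hidden-layer MLP with sigmoid units."""
    w1, w2 = unpack(vector)
    x = np.hstack([x, np.ones((x.shape[0], 1))])   # append bias input
    h = 1.0 / (1.0 + np.exp(-x @ w1))              # hidden layer
    h = np.hstack([h, np.ones((h.shape[0], 1))])   # append bias input
    return 1.0 / (1.0 + np.exp(-h @ w2))           # output layer

def fitness(vector, x_train, y_train):
    """Objective minimized by the global optimizer: training-set MSE."""
    return np.mean((forward(vector, x_train) - y_train) ** 2)
```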
Global search techniques, with the ability to broaden the search space in an attempt to avoid local minima, have been used for the connection weight adjustment or architecture optimization of MLPs; examples include evolutionary algorithms (EA) [5], simulated annealing (SA) [21], tabu search (TS) [8], ant colony optimization (ACO) [14] and particle swarm optimization (PSO) [11]. For example: in [4], a genetic algorithm [9] is hybridized with local search gradient methods for MLP training (weight adjustment); in [1], ant colony optimization is used for the same purpose; in [18], tabu search is used for training fixed-topology neural networks; in [19], simulated annealing and genetic algorithms were compared for the training of neural networks with fixed topology, with the GA performing better; in [16], simulated annealing and the backpropagation variant Rprop [15] are combined for MLP training with weight decay; in [22], simulated annealing and tabu search are hybridized to simultaneously optimize the weights and the number of active connections of MLP neural networks, aiming at classifiers with good classification and generalization performance; in [6], particle swarm optimization and some variants are applied to MLP training without generalization control.

The motivation of this work is to apply the PSO algorithm, its guaranteed convergence variation (GCPSO) and the cooperative PSO (CPSO-Sk) to the weight optimization of MLPs. Additionally, we hybridize the first two techniques with the local gradient search algorithm Rprop, combine the cooperative form of the PSO with the guaranteed convergence variation, and employ the GL5 [13] stopping criterion in all the tested algorithms in order to obtain networks with better generalization power. To evaluate all of these algorithms we used benchmark classification problems from the medical field (cancer, diabetes and heart) obtained from the Proben1 repository [13].

The remainder of the article is organized as follows. Section 2 presents the standard PSO and two variations: the Guaranteed Convergence PSO (GCPSO) and the Cooperative PSO. The experimental setup of this work is described in Section 3. Section 4 presents and analyzes the results obtained from the experiments, and finally, Section 5 summarizes our conclusions and future work.
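The GL5 stopping criterion cited above follows the generalization-loss measure defined in Proben1 [13]: training stops as soon as the validation error exceeds the lowest validation error observed so far by more than 5%. A minimal sketch of this check, with variable names of our own choosing:

```python
def generalization_loss(val_error, best_val_error):
    """Generalization loss GL(t) = 100 * (E_va(t) / E_opt(t) - 1), as in Proben1."""
    return 100.0 * (val_error / best_val_error - 1.0)

def gl5_should_stop(val_error, best_val_error, threshold=5.0):
    """GL5 criterion: stop when the generalization loss exceeds 5%."""
    return generalization_loss(val_error, best_val_error) > threshold

# Usage inside a training loop (sketch):
#   best_val_error = min(best_val_error, val_error)
#   if gl5_should_stop(val_error, best_val_error):
#       stop and keep the weights with the lowest validation error
```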

2 Particle Swarm Optimization

The PSO optimization technique was introduced by Kennedy and Eberhart in [11] as a stochastic search through an n-dimensional problem space aiming at the minimization (or maximization) of the objective function of the problem. PSO grew out of an attempt to graphically simulate the choreography of a flock of birds flying toward resources. Later, in search of theoretical foundations, studies were conducted on the way individuals in groups interact, exchanging information and revising personal concepts to improve their adaptation to the environment [10].

In PSO, a swarm of candidate solutions (particles) is maintained. Let s be the swarm size, n the dimension of the optimization problem and t the current instant; each particle 1 ≤ i ≤ s has a position xi(t) ∈ ℝⁿ and a velocity vi(t) ∈ ℝⁿ.
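In the common inertia-weight formulation of PSO, each particle's velocity is pulled toward its own best position and the best position found by the swarm, and the particle then moves along the new velocity. The sketch below shows one iteration of this update for a minimization problem; the function name `pso_step` and the parameter values (inertia weight w and acceleration coefficients c1, c2) are typical choices assumed for illustration, not the settings used in this paper's experiments.

```python
import numpy as np

def pso_step(x, v, pbest, pbest_f, gbest, objective,
             w=0.73, c1=1.49, c2=1.49):
    """One iteration of inertia-weight PSO (minimization).

    x, v, pbest : arrays of shape (s, n) with positions, velocities and
                  personal best positions; pbest_f holds the personal best
                  objective values; gbest is the best position found so far.
    """
    s, n = x.shape
    r1, r2 = np.random.rand(s, n), np.random.rand(s, n)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # velocity update
    x = x + v                                                  # position update
    f = np.apply_along_axis(objective, 1, x)                   # evaluate swarm
    improved = f < pbest_f                                      # update personal bests
    pbest[improved], pbest_f[improved] = x[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]                           # update global best
    return x, v, pbest, pbest_f, gbest
```

In the hybrid training setting discussed in Section 1, `objective` would be a closure over the network error, e.g. `objective = lambda w_vec: fitness(w_vec, x_train, y_train)` using the illustrative fitness sketch given there.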