Artificial Neural Network Synthesis by means of Artificial Bee Colony (ABC) Algorithm

Beatriz A. Garro
Center for Computing Research, National Polytechnic Institute, CIC-IPN, Mexico City, Mexico. Email: [email protected]

Humberto Sossa
Center for Computing Research, National Polytechnic Institute, CIC-IPN, Mexico City, Mexico. Email: [email protected]

Roberto A. Vázquez
Intelligent Systems Group, Faculty of Engineering, La Salle University, Mexico City, Mexico. Email: [email protected]

Abstract—The artificial bee colony (ABC) algorithm has been used in several optimization problems, including the optimization of the synaptic weights of an Artificial Neural Network (ANN). However, this is not enough to generate a robust ANN. For that reason, some authors have proposed methodologies based on so-called metaheuristics that automatically allow designing an ANN, taking into account not only the optimization of the synaptic weights but also the ANN's architecture and the transfer function of each neuron. However, those methodologies do not generate a reduced design (synthesis) of the ANN. In this paper, we present an ABC-based methodology that maximizes the accuracy and minimizes the number of connections of an ANN by evolving, at the same time, the synaptic weights, the ANN's architecture and the transfer function of each neuron. The methodology is tested with several pattern recognition problems.

I. INTRODUCTION

Artificial neural networks (ANNs) are very important tools for solving different kinds of problems such as pattern classification, forecasting and regression. However, their design implies a trial-and-error mechanism that tests different architectures and transfer functions, as well as the selection of a training algorithm that permits adjusting the synaptic weights of the ANN. This design stage is very important because the wrong selection of one of these characteristics could cause the training algorithm to be trapped in a local minimum. Because of this, several metaheuristic-based methods for obtaining a good ANN design have been reported. In [1], Xin Yao presents a state-of-the-art review where evolutionary algorithms are used to evolve the synaptic weights and the architecture, in some cases with the help of classic techniques like the back-propagation algorithm. There are also works like [2] where the authors automatically evolve the design of an ANN using basic PSO, second generation PSO (2GPSO) and a new technique (NMPSO). In [3], the authors design an ANN by means of the DE algorithm and compare it with other bio-inspired techniques. In these last two works, the authors evolve, at the same time, the principal features of an ANN: the synaptic weights, the transfer functions of each neuron and the architecture. However, the architectures obtained by these two methods contain many connections. In [4] the authors train an ANN by means of the ABC algorithm.


In [5] the authors applied this algorithm to train a feed-forward network for solving the XOR, 3-Bit Parity and 4-Bit Encoder-Decoder problems. In the pattern classification area, in other works like [6] the ABC algorithm is compared with other evolutionary techniques, while in [7] an ANN is trained with it for medical pattern classification. Another problem solved by applying the ABC algorithm can be found in [8], where the authors test it with clustering problems. In [9], the authors train an RBF neural network using the ABC algorithm. In that work four characteristics of this kind of ANN are optimized: the weights between the hidden layer and the output layer, the spread parameters of the hidden layer base functions, the center vectors of the hidden layer, and the bias parameters of the neurons of the output layer. In [10] an ANN is trained to estimate and model the daily reference evapotranspiration of two USA stations.

There are other kinds of algorithms based on bee behavior that have been applied to train an ANN. For example, in [11] the bees algorithm is used to identify wood defects, while in [12] the same algorithm is applied to optimize the synaptic weights of an ANN. In [13] a good review concerning this kind of bio-inspired algorithm and its application to different problems is given; there, the ABC algorithm is described as a good optimization technique. In this paper we want to verify whether this algorithm performs well in the automatic design of an ANN, including not only the synaptic weights but also the architecture and the transfer functions of the neurons. As we will see, the architectures obtained are optimal in the sense that the number of connections is minimal without losing efficiency.

The paper is organized as follows: in Section 2 we briefly present the basics of ANNs. In Section 3 we explain the fundamental concepts of the ABC algorithm, while in Section 4 we describe how the ABC algorithm is used to design an ANN and how the ANN's architecture can be optimized. In Section 5 the experimental results using different classification problems are given. Finally, in Section 6 we present the conclusions of the work.

II. ARTIFICIAL NEURAL NETWORKS

An ANN tries to simulate the brain's behavior when it generates, stores or transforms information.


An ANN is a system made up of simple processing units. This system offers the input-output mapping property and capability [14]. This type of processing unit operates in two stages: a weighted summation and some type of non-linear function; this allows the ANN to realize a learning stage over the input data representing the problem to be solved. Each value of an input pattern a ∈ R^N is associated with its synaptic weight value W ∈ R^N, which is normally between 0 and 1. Furthermore, the summation function often takes an extra input value θ with weight value 1, representing a threshold or bias for the neuron. The summation is then performed as in Eq. 1:

$o = \sum_{i=1}^{N} a_i w_i + \theta$ (1)
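As a concrete illustration of Eq. 1 and of the two processing stages just described, the following minimal Python sketch (assuming NumPy; the function and variable names are ours, not the paper's) computes the weighted summation with bias and then applies a transfer function:

import numpy as np

def neuron_output(a, w, theta, f=np.tanh):
    """Two-stage processing unit: weighted summation (Eq. 1)
    followed by a transfer function f (here tansig/tanh)."""
    o = np.dot(a, w) + theta          # o = sum_i a_i * w_i + theta
    return f(o)

# Example with a 3-dimensional input pattern
a = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -0.3, 0.8])
print(neuron_output(a, w, theta=0.1))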

The sum of the products is then passed to the second stage to perform the activation function f(o), which generates the output of the neuron and determines the behavior of the neural model. By connecting multiple neurons, the true computing power of the ANN emerges. The most common structure for connecting neurons is by layers. In a multilayer structure, the input nodes, which receive the pattern a ∈ R^N, pass the information to the units in the first hidden layer, the outputs from this first hidden layer are passed to the next layer, and so on until reaching the output layer, thus producing an approximation of the desired output y ∈ R^M.

Basically, learning is a process by which the free parameters (i.e., synaptic weights W and bias levels θ) of an ANN are adapted through a continuous process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. On the other hand, the learning process may be classified as supervised or unsupervised. In this paper we focus on supervised learning, which assumes the availability of a labeled set of training data made up of p input-output samples (see Eq. 2):

$T_\xi = \left\{ a^\xi \in \mathbb{R}^N, d^\xi \in \mathbb{R}^M \right\} \quad \forall \xi = 1, \ldots, p$ (2)

where a is the input pattern and d the desired response. Given the training sample T_ξ, the requirement is to compute the free parameters of the neural network so that the actual output y_ξ of the ANN due to a_ξ is close enough to d_ξ for all ξ in a statistical sense. In this sense, we may use the mean-square error (MSE) given in Eq. 3 as the first objective function to be minimized:

$e = \frac{1}{p \cdot M} \sum_{\xi=1}^{p} \sum_{i=1}^{M} \left( d_i^{\xi} - y_i^{\xi} \right)^2$ (3)

One of the most used ANNs is the feed-forward neural network, trained by means of the back-propagation (BP) algorithm [15], [16]. This algorithm minimizes the objective function described by Eq. 3. Some algorithms constantly adjust the values of the synaptic weights until the value of the error no longer decreases. However, these classic algorithms can converge to a local minimum instead of to the desired global minimum. Furthermore, the architecture and the transfer function used in the design can influence the ANN's performance; consequently, the learning algorithm can be trapped in a minimum far away from the best solution.

III. ARTIFICIAL BEE COLONY ALGORITHM

The Artificial Bee Colony (ABC) algorithm is based on the metaphor of the bees' foraging behavior. The natural selection which created this beautiful system of communication can also be seen within the system. Information about different parts of the environment is like species in competition. The fitness of the species is given by the profitability of the food source it describes. Information survives by continuing to circulate within the nest, and is capable of reproducing itself by recruiting new foragers who become informed of the food source, come back to the nest and share their information [17].

The ABC algorithm was proposed by Karaboga in 2005 [18] for solving numerical optimization problems. This algorithm is based on the model proposed by Tereshko and Loengarov [17]. It consists of a set of possible solutions x_i (the population) that are represented by the positions of the food sources. In order to find the best solution, three classes of bees are used: employed bees, onlooker bees and scout bees. These bees have different tasks in the colony, i.e., in the search space.

Employed bees: Each bee searches for a new neighboring food source near its hive. After that, it compares the new food source against the old one using Eq. 4. Then, it saves in its memory the best food source:

$v_i^j = x_i^j + \phi_i^j \left( x_i^j - x_k^j \right)$ (4)

where k ∈ {1, 2, ..., SN} and j ∈ {1, 2, ..., D} are randomly chosen indexes and k ≠ i. φ_i^j is a random number in [−a, a]. After that, the bee evaluates the quality of each food source based on the amount of nectar (the information), i.e., the fitness function is calculated. Finally, it returns to the dancing area in the hive, where the onlooker bees are.

Onlooker bees: This kind of bee watches the dancing of the employed bees so as to know where the food source can be found, whether the nectar is of high quality, as well as the size of the food source. The onlooker bee probabilistically chooses a food source depending on the amount of nectar shown by each employed bee, see Eq. 5:

$p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n}$ (5)

where fit_i is the fitness value of the solution i and SN is the number of food sources, which is equal to the number of employed bees.

Scout bees: This kind of bee helps the colony to randomly create new solutions when a food source cannot be improved anymore, see Eq. 6. This phenomenon is called the "limit" or "abandonment criterion":

$x_i^j = x_{min}^j + rand(0, 1) \left( x_{max}^j - x_{min}^j \right)$ (6)

The pseudo-code of the ABC algorithm is shown next:

1: Initialize the population of solutions x_i, ∀i, i = 1, ..., SN
2: Evaluate the population x_i, ∀i, i = 1, ..., NP
3: for cycle = 1 to MCN do
4:   Produce new solutions v_i for the employed bees by using Eq. 4 and evaluate them.
5:   Apply the greedy selection process.
6:   Calculate the probability values p_i for the solutions x_i by Eq. 5.
7:   Produce the new solutions v_i for the onlookers from the solutions x_i selected depending on p_i and evaluate them.
8:   Apply the greedy selection process.
9:   Determine the abandoned solution for the scout, if it exists, and replace it with a new randomly produced solution x_i by Eq. 6.
10:  Memorize the best solution achieved so far.
11:  cycle = cycle + 1
12: end for
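To make the pseudo-code concrete, the following Python sketch implements the same loop for a generic minimization problem (here a toy sphere function). It follows Eqs. 4-6 and the greedy selection described above, but the fitness transformation 1/(1 + cost) and all parameter and function names are our assumptions, not taken verbatim from the paper:

import numpy as np

def abc_minimize(cost, dim, lo, hi, sn=25, mcn=1000, limit=100, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (sn, dim))            # food sources = candidate solutions
    cost_x = np.array([cost(s) for s in x])
    trials = np.zeros(sn, dtype=int)              # abandonment counters

    def neighbor(i):                              # Eq. 4: v_ij = x_ij + phi * (x_ij - x_kj)
        k = rng.choice([m for m in range(sn) if m != i])
        j = rng.integers(dim)
        v = x[i].copy()
        v[j] += rng.uniform(-1.0, 1.0) * (x[i, j] - x[k, j])
        return np.clip(v, lo, hi)

    def greedy(i, v):                             # keep the better of x_i and v
        cv = cost(v)
        if cv < cost_x[i]:
            x[i], cost_x[i], trials[i] = v, cv, 0
        else:
            trials[i] += 1

    for _ in range(mcn):
        for i in range(sn):                       # employed bees
            greedy(i, neighbor(i))
        fit = 1.0 / (1.0 + cost_x)                # fitness transform (assumption)
        p = fit / fit.sum()                       # Eq. 5: selection probabilities
        for i in rng.choice(sn, size=sn, p=p):    # onlooker bees
            greedy(i, neighbor(i))
        worst = np.argmax(trials)                 # scout bee: abandon an exhausted source
        if trials[worst] > limit:                 # Eq. 6: random re-initialization
            x[worst] = rng.uniform(lo, hi, dim)
            cost_x[worst] = cost(x[worst])
            trials[worst] = 0
    best = np.argmin(cost_x)
    return x[best], cost_x[best]

solution, error = abc_minimize(lambda s: float(np.sum(s ** 2)), dim=5, lo=-4.0, hi=4.0)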

IV. METHODOLOGY

The main aim of our methodology is to evolve, at the same time, the synaptic weights, the architecture (or topology), and the transfer functions of each neuron so as to obtain a minimum mean square error (MSE) as well as a minimum classification error (CER). At the same time, we look to optimize the ANN's architecture by reducing the number of neurons and their connections. The problem to be solved can be defined as follows: given a set of input patterns X = {x^1, ..., x^p}, x ∈ R^n, and a set of desired patterns D = {d^1, ..., d^p}, d ∈ R^m, find an ANN represented by a matrix W ∈ R^{q×(q+2)} such that a function defined by min(F(D, X, W)) is minimized.

The codification of the ANN's design to be evolved by the ABC algorithm is given in Fig. 1. This figure shows the food source's position representing the solution to the problem. This solution is defined by a matrix W ∈ R^{q×(q+2)} composed of three main parts: the topology (T), the synaptic weights (SW), and the transfer functions (F), where q is the maximum number of neurons (MNN); it is defined by q = 2(m + n) (remember that n is the dimension of the input pattern vector and m is the dimension of the desired pattern vector). The three parts of the matrix W take values from three different ranges: for the topology, the range is [1, 2^MNN − 1]; for the synaptic weights, it is [−4, 4]; and for the transfer functions, it is [1, nF], where nF is the number of transfer functions.

Fig. 1. Representation of the individual codifying the architecture, synaptic weights and transfer functions.
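The figure itself is not reproduced here, but the individual it depicts can be sketched in code. Under one plausible reading of the q x (q+2) matrix (our assumption about the column layout), each of the q rows describes one neuron: a decimal-coded topology entry, q candidate synaptic weights, and an index selecting the neuron's transfer function:

import numpy as np

def random_individual(n, m, n_funcs=6, seed=0):
    """One food source (candidate ANN): a q x (q+2) matrix holding
    topology (T), synaptic weights (SW) and transfer functions (F)."""
    rng = np.random.default_rng(seed)
    q = 2 * (m + n)                                   # MNN: maximum number of neurons
    topology = rng.integers(1, 2 ** q, size=(q, 1))   # decimal-coded connections, in [1, 2^MNN - 1]
    weights = rng.uniform(-4.0, 4.0, size=(q, q))     # synaptic weights in [-4, 4]
    funcs = rng.integers(0, n_funcs, size=(q, 1))     # transfer-function index per neuron
    return np.hstack([topology, weights, funcs])

W = random_individual(n=4, m=3)   # e.g. iris plant: 4 inputs, 3 classes
print(W.shape)                    # (14, 16), i.e. q x (q + 2) with q = 2 * (4 + 3)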

The ANN's topology is codified based on the binary square matrix representation of a graph x, where each component x_ij represents the connection between neuron i and neuron j when x_ij = 1. This information is codified into its decimal base. For example, suppose that the binary code "01101" represents the connections of the i-th neuron to five neurons. From this binary code, we can observe that only neurons two, three, and five are connected to neuron i. This binary code is transformed into its decimal base value, resulting in "13"; this is the number that we evolve instead of the binary value. This scheme is much faster to manipulate: instead of evolving a string of bits, we evolve a decimal base number.

The synaptic weights of the ANN are again codified by the square matrix representation of a graph x, where each component x_ij represents the synaptic weight between neuron i and neuron j. Finally, the transfer function of each neuron is represented by an integer in the range [0, 5] codifying one of the six transfer functions used in this research: logsig, tansig, sin, radbas, purelin, and hardlim. These functions were selected because they are the most popular and useful transfer functions for several kinds of problems.

When the aptitude of an individual is computed by means of the MSE function (Eq. 7), all the values of matrix W are decoded so as to obtain the desired ANN. Moreover, each solution must be tested in order to evaluate its performance. Since the methodology is tested with several pattern classification problems, it is also necessary to compute the classification error (CER), that is, to know how many patterns have been correctly classified and how many were incorrectly classified.

$F_1 = \frac{1}{p \cdot M} \sum_{\xi=1}^{p} \sum_{i=1}^{M} \left( d_i^{\xi} - y_i^{\xi} \right)^2$ (7)
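A small sketch of the decimal-to-binary connection decoding just described (the helper name is ours); the value 13 reproduces the "01101" example, where neurons two, three and five feed neuron i:

def decode_connections(value, n_neurons):
    """Decode a decimal topology entry into the list of neurons that are
    connected to neuron i, reading the padded binary string left to right."""
    bits = format(int(value), "b").zfill(n_neurons)
    return [j + 1 for j, b in enumerate(bits) if b == "1"]

print(decode_connections(13, 5))   # [2, 3, 5] -> the "01101" example from the text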

For the case of the CER fitness function, the output of the ANN is transformed by means of the winner-take-all technique; this codification is then compared with the set of the desired patterns. When the output of the ANN equals the corresponding desired pattern, this means that the pattern has been correctly classified, otherwise it was incorrectly classified. Based on the winner-take-all technique we can compute the CER, defined by Eq. 8:


$F_2 = 1 - \frac{npwc}{tnp}$ (8)


where npwc is the number of patterns well classified and tnp is the total number of patterns to be classified.

When the MSE is used, the output y_i of the ANN is computed as follows:

1: For the first n neurons, the output o_i = a_i.
2: for ne_i with i = n to MNN do
3:   Get the connections by using individual x_{1,i}.
4:   for each neuron j < i connected to ne_i do
5:     o_i = f(o), where f is the transfer function given by individual x_{m,i} and o is computed using Eq. 1.
6:   end for
7: end for
8: Finally, y_i = o_i, i = MNN − m, ..., MNN.
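The steps above, together with Eqs. 7 and 8, can be sketched as follows under our reading of the encoding (reusing the decode_connections helper and the column layout assumed earlier): the first n neurons copy the input pattern, every other neuron sums its connected predecessors (the restriction j < i keeps the graph acyclic, as noted below), and the last m neurons give the network output, from which the MSE and the winner-take-all CER are computed:

import numpy as np

TRANSFER = [lambda o: 1.0 / (1.0 + np.exp(-o)),        # logsig
            np.tanh,                                    # tansig
            np.sin,                                     # sin
            lambda o: np.exp(-o ** 2),                  # radbas
            lambda o: o,                                # purelin
            lambda o: 1.0 if o >= 0 else 0.0]           # hardlim

def ann_output(W, pattern, n, m):
    """Compute the ANN output y for one input pattern (steps 1-8 above)."""
    q = W.shape[0]                                      # q = MNN
    out = np.zeros(q)
    out[:n] = pattern                                   # the first n neurons copy the input
    for i in range(n, q):
        conn = decode_connections(W[i, 0], q)           # incoming connections of neuron i
        pred = [j - 1 for j in conn if j - 1 < i]       # only j < i, so no cycles arise
        o = sum(out[j] * W[i, 1 + j] for j in pred)     # weighted summation (Eq. 1)
        out[i] = TRANSFER[int(W[i, -1]) % 6](o)
    return out[q - m:]                                  # outputs of the last m neurons

def mse(outputs, desired):                              # Eq. 7 (F1)
    return np.mean((np.asarray(desired) - np.asarray(outputs)) ** 2)

def cer(outputs, desired):                              # Eq. 8 (F2), winner-take-all
    predicted = np.argmax(outputs, axis=1)
    target = np.argmax(desired, axis=1)
    return 1.0 - np.mean(predicted == target)           # 1 - npwc / tnp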


Fig. 2. Evolution of the error for the ten experiments for the object recognition problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Note that the restriction j < i is used to avoid the generation of cycles in the ANN.

Until now, we have defined two fitness functions that help to maximize the ANN's accuracy by minimizing its error (MSE or CER). Now, we have to propose a function that helps not only to reach a maximum accuracy but also to minimize the number of connections of the ANN. The reduction of the architecture can be represented as follows:

$F_3 = \frac{NC}{NMaxC}$ (9)

where NC is the number of connections in the ANN designed by the proposed methodology and NMaxC is the maximum number of connections that can be generated with MNN neurons. NMaxC is given by:

$NMaxC = \sum_{i=n}^{MNN} i$ (10)

It is important to notice that if F3 alone is used as the fitness function in the ABC algorithm, the proposed methodology will allow synthesizing the ANN but the accuracy will not be maximized. For that reason, we have to propose a fitness function that integrates both objectives: the minimization of the error and the synthesis of the ANN (the reduction of the number of connections). Two fitness functions are proposed to achieve this goal using the ABC algorithm. These fitness functions are composed by combining functions F1, F2 and F3. The first fitness function (FF1) is represented by Eq. 11, while the second fitness function (FF2) is represented by Eq. 12:

$FF_1 = F_1 \cdot F_3$ (11)

$FF_2 = F_2 \cdot F_3$ (12)

With these functions, as we will next see, we are able to design ANNs with a high accuracy and a very low number of connections.
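A short sketch of how Eqs. 9-12 could be computed once an individual has been decoded; the connection count NC would be the number of ones in the decoded topology, and the helper names are ours:

def connection_ratio(n_connections, n, mnn):
    """F3 = NC / NMaxC, with NMaxC = sum_{i=n}^{MNN} i (Eqs. 9-10)."""
    n_max_c = sum(range(n, mnn + 1))
    return n_connections / n_max_c

def ff1(mse_value, n_connections, n, mnn):
    """Eq. 11: FF1 = F1 * F3."""
    return mse_value * connection_ratio(n_connections, n, mnn)

def ff2(cer_value, n_connections, n, mnn):
    """Eq. 12: FF2 = F2 * F3."""
    return cer_value * connection_ratio(n_connections, n, mnn)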

V. RESULTS

Several experiments were performed in order to evaluate the accuracy of the ANNs designed by means of the proposal. The accuracy was tested with four pattern classification problems. Three of them were taken from the UCI machine learning benchmark repository [19]: the iris plant, wine and breast cancer datasets. The other database was a real object recognition problem. The main features of each dataset are as follows: for the object recognition dataset, the dimension of the input vector is 7 and the number of classes is 5 objects; for the iris plant dataset, the dimension of the input vector is 4 and the number of classes is 3; for the wine dataset, the dimension of the input vector is 13 and the number of classes is 3; finally, for the breast cancer dataset, the dimension of the input vector is 9 and the number of classes is 2.

The parameters of the ABC algorithm were set to the same values for all the dataset problems: colony size NP = 50, number of food sources NP/2, limit = 100, and maximum number of cycles MCN = 4000. Six different transfer functions were used in all experiments: SN = sin function, LS = sigmoid function, HT = hyper-tangential function, GS = Gaussian function, LN = linear function and HL = hard limit function. 20 experiments were performed using each dataset: ten for the case of fitness function FF1, and ten for the case of fitness function FF2. For each experiment, each dataset was randomly divided into two sets, a training set and a testing set, with the aim of proving the robustness and the performance of the methodology. The same parameters were used throughout the whole experimentation.

Depending on the problem, the ABC algorithm approaches the minimum error during the evolutionary learning process at different rates. For instance, for the object recognition problem, we observed that by evolving FF1 (the one using MSE), the error tends to diminish fast, and after a certain number of generations the error diminishes slowly (Figure 2(a)). On the other hand, we also observed that, in some cases when FF2 is evolved, the error reaches its minimum in a few epochs; nonetheless the error tends to diminish slowly (Figure 2(b)).
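For reference, the experimental setup just described can be written down as a small configuration plus a random train/test split; the 50/50 split fraction below is our assumption, since the paper does not state the proportion used:

import numpy as np

ABC_PARAMS = {"colony_size": 50,     # NP
              "food_sources": 25,    # NP / 2
              "limit": 100,          # abandonment criterion
              "mcn": 4000}           # maximum number of cycles

def random_split(X, D, train_fraction=0.5, seed=0):
    """Randomly divide a dataset into training and testing sets,
    as done independently for each of the ten runs per fitness function."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_fraction * len(X))
    return X[idx[:cut]], D[idx[:cut]], X[idx[cut:]], D[idx[cut:]]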


Fig. 3. Evolution of the error for the ten experiments for the Iris plant problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Fig. 4. Evolution of the error for the ten experiments for the Wine problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Fig. 6. Two different ANN designs for the object recognition problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Fig. 7. Two different ANN designs for the Iris plant problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

For the case of the iris plant dataset, we observed that by evolving FF1 (the one using MSE) or FF2 (the one using CER), the error tends to diminish fast, and after a certain number of generations it diminishes slowly (Figure 3). For the case of the wine dataset, we observed that by evolving FF1 or FF2, the error tends to diminish slowly (Figure 4). Finally, for the case of the breast cancer dataset, we observed that by evolving FF1 or FF2, the error tends to diminish fast and after a certain number of generations it diminishes slowly (Figure 5).

Figures 6, 7, 8 and 9 show two of the 20 different ANNs automatically generated with the proposed methodology for each dataset.

It is important to note that the ANNs in Figures 6(a), 7(a), 8(a) and 9(a) were automatically designed by the ABC algorithm with fitness functions FF1 and FF2 that did not include the synthesis of the ANN; in other words, these fitness functions did not use the F3 function. On the contrary, the ANNs in Figures 6(b), 7(b), 8(b) and 9(b) were automatically designed by the ABC algorithm with fitness functions FF1 and FF2 that include the synthesis of the ANN using the F3 function. Furthermore, in some cases the dimensionality of the input pattern is reduced because some features do not contribute to the ANN's output.

Fig. 5. Evolution of the error for the ten experiments for the Breast cancer problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Fig. 8. Two different ANN designs for the Wine problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.


Fig. 9. Two different ANN designs for the Breast cancer problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Fig. 10. Percentage of recognition for the object recognition problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the MSE. (b) Percentage of recognition minimizing the CER.


Table I shows the average connection number achieved with the proposed fitness functions (FF1 and FF2). In addition, we also present the average connection number achieved when F3 is not taken into account by the proposed fitness functions. As the reader can appreciate, the number of connections decreases when function F3 is used.

TABLE I
AVERAGE CONNECTION NUMBER

Dataset        | without F3: FF1 | without F3: FF2 | using F3: FF1 | using F3: FF2
Object rec.    | 74              | 66.3            | 58.4          | 65
Iris plant     | 20.9            | 19              | 15.6          | 12.7
Wine           | 104.8           | 94.6            | 86.3          | 89.9
Breast cancer  | 48              | 41.9            | 30.7          | 31.6

Fig. 11. Percentage of recognition for the Iris problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.


Once the ANN for each problem had been generated, we proceeded to test its accuracy. The next figures show the performance of the methodology with the two fitness functions. Figures 10, 11, 12 and 13 present the percentage of classification for all the experiments (executions) during the training and testing phases. Whereas for the object recognition, wine and breast cancer datasets the best percentage of recognition in both phases was achieved using the FF1 fitness function, for the iris plant dataset it was achieved using the FF2 fitness function.

Table II presents the average percentage of recognition for all the experiments using fitness function FF1 and fitness function FF2. In this table, we can observe that the best percentage of recognition for all the databases was achieved only during the training phase. The accuracy slightly diminished during the testing phase. However, the results obtained with the proposed methodology were highly acceptable and stable. This tendency can be corroborated in Table III, which shows the standard deviation of all experimental results obtained with each dataset. Tables IV and V show the maximum and minimum percentage of classification achieved in all experiments during the training and testing phases using the two fitness functions.

Fig. 12. Percentage of recognition for the Wine problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.


TABLE II
AVERAGE PERCENTAGE OF RECOGNITION

Dataset        | FF1 Training | FF1 Testing | FF2 Training | FF2 Testing
Object rec.    | 0.984        | 0.946       | 0.938        | 0.864
Iris plant     | 0.9667       | 0.9253      | 0.9693       | 0.9387
Wine           | 0.9337       | 0.8629      | 0.8764       | 0.7944
Breast cancer  | 0.973        | 0.9655      | 0.9739       | 0.9561

Moreover, the integration of the synthesis into the fitness function causes the ABC algorithm to generate ANNs with a small number of connections and a high performance. The design of the ANN consists of providing a good architecture together with the best set of transfer functions and synaptic weights. The experimentation shows that all the designs generated by the proposal present an acceptable percentage of recognition for both the training and testing phases with the two fitness functions.


Fig. 13. Percentage of recognition for the Breast cancer problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.

TABLE III
STANDARD DEVIATION OF RECOGNITION

Dataset        | FF1 Training | FF1 Testing | FF2 Training | FF2 Testing
Object rec.    | 0.0386       | 0.0962      | 0.0371       | 0.0842
Iris plant     | 0.0237       | 0.0378      | 0.0189       | 0.0373
Wine           | 0.0287       | 0.0575      | 0.0304       | 0.1164
Breast cancer  | 0.0063       | 0.0102      | 0.0111       | 0.0134

In Table IV there are many ones that represent the maximum percentage (100%) of recognition that can be achieved by the designed ANN. This is important because we found at least one configuration that solves a specific problem without misclassified patterns or with a low percentage of error. In Table V the worst values achieved with the ANN are shown. In particular, the dataset that provided the worst results was the wine problem. Nonetheless, the accuracy achieved was highly acceptable.

TABLE IV
THE BEST PERCENTAGE OF RECOGNITION

Dataset        | FF1 Training | FF1 Testing | FF2 Training | FF2 Testing
Object rec.    | 1            | 1           | 1            | 0.96
Iris plant     | 1            | 0.9733      | 1            | 0.9733
Wine           | 0.9775       | 0.9551      | 0.9213       | 0.9213
Breast cancer  | 0.9824       | 0.9766      | 0.9853       | 0.9766

TABLE V
THE WORST PERCENTAGE OF RECOGNITION

Dataset        | FF1 Training | FF1 Testing | FF2 Training | FF2 Testing
Object rec.    | 0.88         | 0.72        | 0.9          | 0.7
Iris plant     | 0.92         | 0.8533      | 0.9333       | 0.84
Wine           | 0.8989       | 0.7865      | 0.8315       | 0.5169
Breast cancer  | 0.9648       | 0.9444      | 0.9501       | 0.9386

From these experiments, we observed that the ABC algorithm was able to find the best configuration for an ANN given a specific set of patterns that define a classification problem.

VI. CONCLUSIONS

The design of an ANN is achieved using the proposed methodology. The synaptic weights, the architecture and the transfer functions of an ANN are evolved by means of the ABC algorithm. Furthermore, the connections among the neurons that belong to the ANN are synthesized. This allows generating a reduced design of the ANN with a high performance.

In this work we tested the performance of the ABC algorithm. We have also verified that this technique is a good optimization algorithm, because it does not easily get trapped in local minima. In the case of the proposed methodology, we have demonstrated its robustness; the random choice of the patterns for each experiment allowed us to obtain, statistically speaking, significant results. The experiments were performed with two different fitness functions, FF1 and FF2, based on the MSE and CER, respectively. Additionally, these fitness functions involve the synthesis of the architecture. Through these experiments, we observed that both functions achieved a highly acceptable performance. Moreover, we demonstrated that these fitness functions can considerably reduce the number of connections of an ANN while keeping the MSE and CER errors to a minimum.

On the other hand, in some of the ANN designs generated by the proposed methodology, some neurons belonging to the input layer are not used; they do not present any connections with other neurons. In this particular case, we can say that a reduction of the dimensionality of the input pattern is also obtained.

In general, the results were satisfactory. The proposed methodology allows searching for the best values that permit automatically constructing an ANN that generates a good solution for a classification problem.

ACKNOWLEDGMENT

B. Garro thanks CONACYT for the scholarship provided during her PhD studies. H. Sossa thanks SIP-IPN under grant number 20111016, COTEPABE-IPN, DAAD-PROALMEX under grant J000.426/2009, and the European Union and CONACYT under grant FONCICYT 93829 for the economical support. The content of this paper is the exclusive responsibility of CIC-IPN and it cannot be considered to reflect the position of the European Union. The authors thank the anonymous reviewers for their comments to improve the paper.


REFERENCES

[1] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, 1999.
[2] B. A. Garro, H. Sossa, and R. A. Vazquez, "Design of artificial neural networks using a modified particle swarm optimization algorithm," in Proceedings of the 2009 International Joint Conference on Neural Networks (IJCNN'09). Piscataway, NJ, USA: IEEE Press, 2009, pp. 2363-2370.
[3] B. A. Garro, H. Sossa, and R. A. Vazquez, "Design of artificial neural networks using differential evolution algorithm," in Proceedings of the 17th International Conference on Neural Information Processing: Models and Applications - Volume Part II (ICONIP'10). Berlin, Heidelberg: Springer-Verlag, 2010, pp. 201-208.
[4] D. Karaboga and B. Akay, "Artificial Bee Colony (ABC) algorithm on training artificial neural networks," in Signal Processing and Communications Applications, 2007. SIU 2007. IEEE 15th, 2007, pp. 1-4.
[5] D. Karaboga, B. Akay, and C. Ozturk, "Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks," in Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI '07). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 318-329.
[6] D. Karaboga and C. Ozturk, "Neural networks training by artificial bee colony algorithm on pattern classification," Neural Network World, vol. 19, no. 10, pp. 279-292, 2009.
[7] D. Karaboga, C. Ozturk, and B. Akay, "Training neural networks with ABC optimization algorithm on medical pattern classification," in International Conference on Multivariate Statistical Modelling and High Dimensional Data Mining, 2008.
[8] C. Ozturk and D. Karaboga, "Classification by neural networks and clustering with artificial bee colony (ABC) algorithm," in International Symposium on Intelligent and Manufacturing Systems Features, Strategies and Innovation, 2008.
[9] T. Kurban and E. Beşdok, "A comparison of RBF neural network training algorithms for inertial sensor based terrain classification," Sensors, vol. 9, pp. 6312-6329, 2009.
[10] C. Ozkan, O. Kisi, and B. Akay, "Neural networks with artificial bee colony algorithm for modeling daily reference evapotranspiration," Irrigation Science, pp. 1-11, 2010. [Online]. Available: http://dx.doi.org/10.1007/s00271-010-0254-0
[11] D. Pham, A. Soroka, A. Ghanbarzadeh, E. Koc, S. Otri, and M. Packianather, "Optimising neural networks for identification of wood defects using the bees algorithm," in Industrial Informatics, 2006 IEEE International Conference on, 2006, pp. 1346-1351.
[12] D. Pham, E. Koc, and A. Ghanbarzadeh, "Optimisation of the weights of multi-layered perceptrons using the bees algorithm," in Proceedings of the 5th International Symposium on Intelligent Manufacturing Systems, 2006.
[13] D. Karaboga and B. Akay, "A survey: algorithms simulating bee swarm intelligence," Artificial Intelligence Review, vol. 31, no. 1, pp. 61-85, Jun. 2009.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation. Cambridge, MA, USA: MIT Press, 1986, ch. 8, pp. 318-362.
[15] J. A. Anderson, An Introduction to Neural Networks. The MIT Press, 1995.
[16] P. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, Oct. 1990.
[17] V. Tereshko and A. Loengarov, "Collective decision making in honeybee foraging dynamics," Computing and Information Systems Journal, vol. 9, no. 3, pp. 1-7, 2005.
[18] D. Karaboga, "An idea based on honey bee swarm for numerical optimization," Computer Engineering Department, Engineering Faculty, Erciyes University, Tech. Rep., 2005.
[19] P. M. Murphy and D. W. Aha, "UCI Repository of machine learning databases," University of California, Department of Information and Computer Science, Irvine, CA, US, Tech. Rep., 1994.

338