International Journal of Neural Systems, Vol. 14, No. 2 (2004) 1–13
© World Scientific Publishing Company

NONLINEAR SYSTEM MODELLING VIA OPTIMAL DESIGN OF NEURAL TREES

YUEHUI CHEN∗, BO YANG† and JIWEN DONG‡
School of Information Science and Engineering, Jinan University,
Jiwei Road 106, Jinan 250022, P. R. China
∗[email protected]
†[email protected]
‡[email protected]

Received 15 June 2003
Revised 19 December 2003

This paper introduces a flexible neural tree model. The model is computed as a flexible multi-layer feedforward neural network. A hybrid learning/evolutionary approach to automatically optimize the neural tree model is also proposed. The approach includes a modified probabilistic incremental program evolution algorithm (MPIPE) to evolve and determine an optimal structure of the neural tree, and a parameter learning algorithm to optimize the free parameters embedded in the neural tree. The performance and effectiveness of the proposed method are evaluated using function approximation, time series prediction and system identification problems, and compared with related methods.

Keywords: Artificial neural networks; neural tree; modified probabilistic incremental program evolution algorithm (MPIPE); nonlinear system modelling.

1. Introduction

Artificial neural networks (ANNs) have been successfully applied to a number of scientific and engineering fields in recent years, such as function approximation, system identification and control, signal and image processing, and time series prediction. The major problems in designing an ANN for a given problem are how to design a satisfactory architecture and which learning algorithm can be used effectively to train the network. Weights and biases of ANNs can be learned by many methods, e.g. the back-propagation algorithm,1 genetic algorithms,2,3 evolutionary programming,4–6 random search algorithms19 and so on.

Usually, a neural network's performance is highly dependent on its structure; the interaction allowed between the various nodes of the network is specified by the structure alone. Different ANN structures may show different performance for a given problem, and therefore it is possible to introduce different ways of defining the structure appropriate to the problem. Depending on the problem, it may be appropriate to have more than one hidden layer, feedforward or feedback connections, different activation functions for different units, or, in some cases, direct connections between the input and output layers. In the past decades, there has been increasing interest in optimizing ANN architecture and parameters simultaneously. There have been a number of attempts at designing ANN architectures automatically. Early methods of architecture learning include constructive and pruning algorithms.7–9 The main problem of these methods is that only topological subsets rather than the complete class of ANN architectures are searched, by structural hill climbing methods.10 Recently, optimizing the architecture and weights of ANNs by evolutionary algorithms has become an active research area.

Yao et al.11,29,30 proposed an evolutionary system called EPNet for evolving the architecture and weights of ANNs simultaneously. EPNet is a hybrid technique: architectures are modified by mutation operators that add or delete nodes/connections, while weights are trained by a modified back-propagation algorithm with an adaptive learning rate and by a simulated annealing algorithm. A more recent attempt at structure optimization of ANNs is the neuroevolution of augmenting topologies (NEAT),32 which aims at the evolution of topologies and allows the evolution procedure to produce an adaptive neural network with plastic synapses by designating which connections should be adaptive and in what ways. Byoung-Tak Zhang et al. proposed a method called evolutionary induction of sparse neural trees.12 Based on the neural tree representation, the architecture and the weights of higher order sigma-pi neural networks are evolved by genetic programming and a breeder genetic algorithm, respectively.

In this paper, a flexible neural tree model is proposed. Based on pre-defined instruction/operator sets, the flexible neural tree can be created and evolved, and over-layer connections and different activation functions for different nodes are allowed. The flexible neural tree model can therefore be viewed as a kind of irregular multi-layer flexible neural network. The proposed approach combines a modified PIPE algorithm13,14,21–23,49 to find the best structure of the neural tree for a given problem with a simple random search algorithm to optimize the parameters embedded in the neural tree.

The paper is organized as follows: Sec. 2 gives the representation and calculation of the neural tree model. A hybrid learning algorithm for evolving the neural tree models is given in Sec. 3. Section 4 presents simulation results for function approximation, time series prediction and system identification problems in order to show the effectiveness and feasibility of the proposed method. Finally, some concluding remarks are presented in Sec. 5.

2. Representation and Calculation of Neural Tree

The commonly used representation/encoding methods for an ANN are the direct encoding scheme and the indirect encoding scheme.

The former uses a fixed structure (a connection matrix or bit-strings) to specify the architecture of the corresponding ANN.15,16 The latter uses rewrite rules, e.g. cellular encoding17 and graph generation grammars,18 to specify a set of construction rules that are recursively applied to yield an ANN. In this paper an alternative approach is proposed, in which a flexible neural network is constructed directly from a neural tree representation. No additional effort is needed to encode and decode between the genotype and the phenotype of a neural network, because a neural tree can be calculated directly as a flexible neural network. The direct neural tree representation therefore reduces the computational expense of evaluating the objective function. It suffers, however, from a possible problem in terms of search space: the search space may become very large if there is no additional technique to limit the architecture of the neural tree. In Ref. 31, based on a simple computation, the size of the sigma-pi neural tree structure space is 2^31 · 5^125 if 2 function operators F = {Σ, Π}, 5 terminals T = {x1, x2, x3, x4, x5}, a maximum width of 5 and a maximum depth of 3 are used. It is clear that any exhaustive search method for finding an optimal solution is impossible. To cope with this huge search space and to make use of expert knowledge, we propose a method in which the instructions of the root node, the hidden nodes and the input nodes are selected from three instruction sets. The instruction sets are as follows:

I0 = {+3, +4, ..., +N0},   (1)

I1 = {+2, +3, ..., +N1, x0, x1, ..., xn},   (2)

I2 = {x0, x1, ..., xn}.   (3)

Based on the above instruction sets, an example of a flexible neural tree is shown in Fig. 1, which contains input, hidden and output layers. The user can freely choose the instruction sets according to the given problem; for example, selecting a bigger N0 and a smaller N1 creates a neural tree with a larger width at the root node (a larger number of hidden neurons) and a smaller width at the hidden nodes. Note that the instructions of the output layer (root node), the hidden layers and the input layer are selected from the instruction sets I0, I1 and I2 with specific probabilities, respectively.
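To make the encoding concrete, the three instruction sets of Eqs. (1)–(3) can be written down as plain data structures. The sketch below (Python) is only an illustration: the tuple encoding, the value n = 3 and the choices N0 = 6, N1 = 3 mirror the example of Fig. 1 and are our assumptions, not part of the algorithm itself.

```python
# Illustrative encoding of the instruction sets of Eqs. (1)-(3):
# "+k" is encoded as ("+", k) and an input x_i as ("x", i).
n_inputs = 3          # x0, x1, x2, as in Fig. 1 (assumption)
N0, N1 = 6, 3         # user-chosen maximum node arities (assumption)

I0 = [("+", k) for k in range(3, N0 + 1)]                 # output layer (root)
I1 = ([("+", k) for k in range(2, N1 + 1)]
      + [("x", i) for i in range(n_inputs)])              # hidden layers
I2 = [("x", i) for i in range(n_inputs)]                  # input layer

print(I0, I1, I2, sep="\n")
```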

Fig. 1. A typical representation of a neural tree with three instruction sets and three input variables x0, x1 and x2. The root node +6 forms the output layer, the +2 and +3 nodes form two hidden layers, and the leaves x0, x1 and x2 form the input layer.

Note also that the instruction set I1 contains nonterminal and terminal instructions, because we consider it useful to create a neural tree with over-layer connections. A nonterminal instruction +N has N arguments, whereas a terminal instruction has no arguments. When a neural tree is created, if a nonterminal instruction +p (p = 2, 3, 4, ...) is selected, p weights are created randomly and used to represent the connection strengths between the node +p and its children. In addition, two parameters a_p and b_p are randomly created as flexible activation function parameters and attached to node +p. The candidate flexible activation function used in our experiments is

f(a, b; x) = exp(-((x - a)/b)^2).   (4)

Let I_{d,w} denote the instruction of a node with depth d and width w. The probabilities of selecting instructions for the output layer, hidden layers and input layer from the instruction sets I0, I1 and I2 are initialized as

P(I_{d,w}) = 1/l_i,  for all I_{d,w} in I_i, i = 0, 1, 2,   (5)

where l_i is the number of instructions in the instruction set I_i. For any nonterminal node +p, its input is calculated by

net_p = Σ_{i=1}^{p} w_i · y_i,   (6)

where y_i (i = 1, 2, ..., p) is the output of the ith child of the node +p. The output of node +p is then calculated by

out_p = f(a_p, b_p; net_p) = exp(-((net_p - a_p)/b_p)^2).   (7)

The overall output of the flexible neural tree can be computed recursively from left to right by a depth-first method. Note that no flexible activation function is used for the root node; it returns only a weighted sum of a number of nonlinear terms.
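As an illustration of Eqs. (4), (6) and (7), the following sketch evaluates a flexible neural tree recursively. The Node/Leaf classes and their field names are our own minimal representation under stated assumptions, not the authors' implementation; the root node returns the plain weighted sum, as stated above.

```python
import math
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    index: int                            # terminal instruction x_i

@dataclass
class Node:
    children: List[Union["Node", Leaf]]   # the p children of a +_p node
    weights: List[float]                  # connection strengths w_1..w_p
    a: float = 0.0                        # flexible activation parameters (Eq. (4))
    b: float = 1.0
    is_root: bool = False                 # the root returns the weighted sum only

def evaluate(node, x):
    """Depth-first evaluation of a flexible neural tree (Eqs. (6) and (7))."""
    if isinstance(node, Leaf):
        return x[node.index]
    net = sum(w * evaluate(c, x) for w, c in zip(node.weights, node.children))
    if node.is_root:
        return net                                      # no activation at the root
    return math.exp(-((net - node.a) / node.b) ** 2)    # Eq. (7)

# A small tree with inputs x0 and x1; structure and numbers are for illustration only.
hidden = Node(children=[Leaf(0), Leaf(1)], weights=[0.5, -0.3], a=0.1, b=1.2)
root = Node(children=[hidden, Leaf(1)], weights=[1.0, 0.4], is_root=True)
print(evaluate(root, [0.2, 0.7]))
```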

3. The Proposed Hybrid Algorithm

3.1. PIPE and MPIPE

Tree-structure based evolutionary algorithms, including Genetic Programming (GP), Probabilistic Incremental Program Evolution (PIPE) and recent variants of GP such as Gene Expression Programming (GEP)33 and Multi Expression Programming (MEP), have been an active research area in recent years. It is well known that an expression corresponding to f1(x) = x + x^2 + x^3 is usually evolved much faster by GP than one corresponding to f2(x) = 2.71x + 3.14x^2 for function regression, because it is difficult for GP to find the proper parameters of the polynomial. But if real parameters are embedded into the tree, GP can be made fast at finding the above polynomial. This also suggests a new research direction for improving the efficiency of GP and developing new tree-structure based computational models.22

PIPE is a recent discrete method for automated program synthesis, which combines probability vector coding of program instructions, population-based incremental learning and tree-coded programs like those used in variants of GP.13 The main principle of the PIPE algorithm is that it increases the probability of the best program being found by adaptively tuning the probability distribution for choosing the proper instructions. Like GP, the PIPE algorithm cannot be used to discover the structure and parameters of a nonlinear system simultaneously, and the evolved solution is usually redundant. To cope with this, in our modified version of PIPE the neural tree is constrained by introducing the specific instruction sets, and a specified data structure (real values) is embedded into the nodes of the neural tree. The main principle of the PIPE evolution strategy is maintained, but some modifications have been made in order to evolve flexible neural trees:

• a different initialization method (parameters and initial probabilities of selecting instructions);
• a different creation method for the neural tree;
• a different calculation method for the neural tree;
• a different initial mutation probability.

3.2. Structure optimization

The main learning procedure of MPIPE is taken from PIPE13 and can be summarized as follows:

(1) Initially, a population of neural trees (including the embedded parameters) is randomly generated according to the predefined probabilities of selecting instructions (Eq. (5)). Note that the initial individuals are created by using a probabilistic prototype tree (PPT). The PPT is generally a complete n-ary tree. Each node N_{d,w} contains a variable probability vector P_{d,w}, and each component P_{d,w}(I) denotes the probability of choosing instruction I at node N_{d,w}.

(2) Let PROG_b and PROG_el be the best program of the current generation (best program) and the best one found so far (elitist program), respectively. Define the probability and the target probability of the best program as

P(PROG_b) = ∏_{I_{d,w} used to generate PROG_b} P(I_{d,w})   (8)

and

P_TARGET = P(PROG_b) + (1 − P(PROG_b)) · lr · (ε + FIT(PROG_el))/(ε + FIT(PROG_b)),   (9)

where FIT(PROG_b) and FIT(PROG_el) denote the fitness of the best and the elitist program, lr is the learning rate and ε is the fitness constant (see Table 1). In order to increase the probability P(PROG_b), repeat the following update until P(PROG_b) ≥ P_TARGET:

P(I_{d,w}) = P(I_{d,w}) + c^{lr} · lr · (1 − P(I_{d,w})),   (10)

where c^{lr} is a constant influencing the number of iterations. The smaller c^{lr}, the higher the approximation precision of P_TARGET and the larger the number of required iterations. We use c^{lr} = 0.1, which turned out to be a good compromise between precision and speed.13

(3) Define the mutation probability as

PM_p = PM / ((l_0 + l_1 + l_2) · |PROG_b|),   (11)

where the user-defined parameter PM defines the overall mutation probability and |PROG_b| denotes the number of nodes in the best program. All probabilities P(I_{d,w}) are mutated with probability PM_p according to

P(I_{d,w}) = P(I_{d,w}) + mr · (1 − P(I_{d,w})),   (12)

where mr is the mutation rate defined by the user.

(4) Prune the probabilistic prototype tree: if a PPT subtree is attached to a node that contains at least one probability vector component above a predefined prune threshold T_P, the subtree is pruned.

(5) Repeat this process with the new probabilities of selecting instructions from the instruction sets until the best structure is found or the maximum number of MPIPE iterations is reached.

In order to learn the structure and the parameters of a neural tree model simultaneously, a balance between structure optimization and parameter learning should be found. If the structure of the evolved model is not appropriate, it is not useful to spend much time on parameter learning. On the contrary, if the best structure has already been found, further structure optimization guided by a decreasing fitness value may destroy that structure. In this paper, a technique for balancing structure optimization and parameter learning is proposed: if a better structure is found, then a random search is performed for a number of steps, up to the maximum allowed number of steps or until no better parameter vector has been found for a significantly long time (100 to 2000 steps in our experiments). A better structure is recognized as follows: if the fitness value of the best program is smaller than the fitness value of the elitist program, or if the fitness values of the two programs are equal but the former has fewer nodes than the latter, then we say that a better structure has been found.
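The probability adaptation and mutation rules of Eqs. (8)–(12) can be summarized in code as follows. This is a simplified sketch under our own assumptions: the PPT is represented as a flat dictionary keyed by (depth, width), and the instruction-set sizes passed to the mutation routine are example values only, not the authors' implementation.

```python
import random

def adapt_towards_best(ppt, best_nodes, fit_best, fit_elite,
                       lr=0.01, c_lr=0.1, eps=1e-6):
    """Pull the PPT towards the best program of the generation (Eqs. (8)-(10)).

    ppt        : {(depth, width): {instruction: probability}}  (simplified PPT)
    best_nodes : {(depth, width): instruction} used by the best program
    """
    def prob_best():
        p = 1.0
        for pos, instr in best_nodes.items():          # Eq. (8)
            p *= ppt[pos][instr]
        return p

    p_target = prob_best() + (1.0 - prob_best()) * lr * \
        (eps + fit_elite) / (eps + fit_best)            # Eq. (9)
    while prob_best() < p_target:
        for pos, instr in best_nodes.items():           # Eq. (10), c_lr = 0.1
            ppt[pos][instr] += c_lr * lr * (1.0 - ppt[pos][instr])

def mutate(ppt, best_nodes, pm=0.4, mr=0.4, set_sizes=(6, 2, 2)):
    """Mutate probability vectors (Eqs. (11)-(12)); set_sizes stands for l0, l1, l2."""
    pm_p = pm / (sum(set_sizes) * len(best_nodes))       # Eq. (11)
    for probs in ppt.values():
        for instr in probs:
            if random.random() < pm_p:
                probs[instr] += mr * (1.0 - probs[instr])   # Eq. (12)
        total = sum(probs.values())
        for instr in probs:
            probs[instr] /= total                       # renormalize, as in PIPE

# Tiny usage example: a PPT with a single node position and two instructions.
ppt = {(0, 0): {"+2": 0.5, "x0": 0.5}}
adapt_towards_best(ppt, {(0, 0): "+2"}, fit_best=0.2, fit_elite=0.1)
mutate(ppt, {(0, 0): "+2"})
print(ppt)
```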


3.3. Parameter optimization

For learning the parameters (weights and activation function parameters) of a neural tree model, a number of learning algorithms, such as GAs, EP and gradient descent based methods, can be used. A random search algorithm19 is selected here because of its simplicity and its effectiveness for both local and global search. Given a parameter vector θ(k) = [λ_1(k), λ_2(k), ..., λ_n(k)], where k is the random search step, let x(k) = [x_1(k), x_2(k), ..., x_n(k)] denote a small random disturbance vector generated according to a probability density function. The random search algorithm used can be summarized as follows:

(1) Randomly choose an initial value of the parameter vector to be optimized, θ(0), calculate the objective function F(θ(0)), and set k = 0.

(2) Generate a random search vector x(k).
— Calculate F(θ(k) + x(k)). If F(θ(k) + x(k)) < F(θ(k)), the current search is a success and θ(k + 1) = θ(k) + x(k); else,
— calculate F(θ(k) − x(k)). If F(θ(k) − x(k)) < F(θ(k)), the current search is also a success and θ(k + 1) = θ(k) − x(k); otherwise,
— the search is a failure, and

θ(k + 1) = θ(k),            if K_er^+ > K_er and K_er^- > K_er;
θ(k + 1) = θ(k) + x(k),     if K_er^+ < K_er^-;
θ(k + 1) = θ(k) − x(k),     if K_er^+ ≥ K_er^-,   (13)

where K_er ≥ 1 is the maximum error ratio (the user-defined value K_er = 1.001 is used in our experiments), and K_er^+ and K_er^- are defined by

K_er^+ = F(θ(k) + x(k)) / F(θ(k)),   (14)

K_er^- = F(θ(k) − x(k)) / F(θ(k)).   (15)

(3) If a satisfactory solution is found, then stop; else set k = k + 1 and go to step (2).

Note that the effectiveness of the random search depends largely on the random search vector x(k).


Here, x(k) is created and adapted by a technique similar to that of Evolution Strategies (ES):

x(k) = N(0, σ(k)^2),
σ_i(k) = σ_i(k − 1) · e^{τ̄ z} · e^{τ z_i},  i = 1, 2, ..., n,   (16)

where x(k) is the random disturbance applied to the parameter vector to be optimized, and τ, τ̄ and σ_i are the strategy parameters. The values of τ and τ̄ are fixed to

τ = (√(2√n))^{-1},  τ̄ = (√(2n))^{-1}.   (17)

The z_i and z are sampled from a standard normal distribution and characterize the mutation exercised on the strategy and the objective parameters. The σ_i, also called step-sizes, are adapted in this way without self-adaptation. Based on the above parameter adjustment mechanism and the selection procedure of the random search algorithm, the proposed learning method (without a population) works well for the parameter learning of a neural tree model.

3.4. The general learning algorithm

The general learning procedure for the optimal design of flexible neural trees can be described as follows.

(1) Set the initial values of the parameters used in the MPIPE and random search algorithms. Set the elitist program to NULL and its fitness value to the largest positive real number available on the computer. Create an initial population (the flexible neural trees and their corresponding parameters).

(2) Perform structure optimization using the MPIPE algorithm as described in Subsec. 3.2, in which the fitness function is the mean square error (MSE):

Fit(i) = (1/P) · Σ_{j=1}^{P} (y_1^j − y_2^j)^2,   (18)

where P is the total number of training samples, y_1^j and y_2^j are the actual and model outputs for the jth sample, and Fit(i) denotes the fitness value of the ith individual.

(3) If a better structure is found, then go to step (4); otherwise go to step (2).

(4) Perform parameter optimization with the random search algorithm as described in Subsec. 3.3. In this stage, the tree structure (architecture) of the flexible neural tree model is fixed, and it is the best tree obtained at the end of the MPIPE search. All of the parameters used in the best tree form a parameter vector to be optimized by the random search algorithm.

(5) If the maximum number of random search steps is reached, or no better parameter vector is found for a significantly long time (100 steps), then go to step (6); otherwise go to step (4).

(6) If a satisfactory solution is found, then stop; otherwise go to step (2).
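The parameter search of Secs. 3.3 and 3.4 can be sketched as below. The accept/reject rule follows Eq. (13) with K_er = 1.001 and the step-size adaptation follows Eqs. (16)–(17); the initial step-size of 0.1 and the small curve-fitting example at the end are our assumptions for illustration, not the authors' settings.

```python
import math
import random

def random_search(f, theta, steps=2000, k_er=1.001, sigma0=0.1):
    """Minimize f(theta) by the random search of Sec. 3.3 with ES-style steps."""
    n = len(theta)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))        # Eq. (17)
    tau_bar = 1.0 / math.sqrt(2.0 * n)
    sigma = [sigma0] * n                             # initial step-sizes (assumption)
    f_theta = f(theta)
    for _ in range(steps):
        z = random.gauss(0.0, 1.0)
        sigma = [s * math.exp(tau_bar * z) * math.exp(tau * random.gauss(0.0, 1.0))
                 for s in sigma]                     # Eq. (16)
        x = [random.gauss(0.0, s) for s in sigma]
        plus = [t + d for t, d in zip(theta, x)]
        minus = [t - d for t, d in zip(theta, x)]
        f_plus, f_minus = f(plus), f(minus)
        if f_plus < f_theta:                         # success with +x(k)
            theta, f_theta = plus, f_plus
        elif f_minus < f_theta:                      # success with -x(k)
            theta, f_theta = minus, f_minus
        else:                                        # failure: Eq. (13)
            k_plus, k_minus = f_plus / f_theta, f_minus / f_theta   # Eqs. (14)-(15)
            if k_plus < k_minus and k_plus < k_er:
                theta, f_theta = plus, f_plus
            elif k_minus <= k_plus and k_minus < k_er:
                theta, f_theta = minus, f_minus
    return theta, f_theta

# Illustrative usage: fit a and b of f(a, b; x) = exp(-((x - a) / b)**2) to samples.
data = [(x / 10.0, math.exp(-((x / 10.0 - 0.5) / 0.3) ** 2)) for x in range(11)]

def mse(p):
    a, b = p
    return sum((y - math.exp(-((x - a) / b) ** 2)) ** 2 for x, y in data) / len(data)

print(random_search(mse, [0.0, 1.0]))
```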

4. Illustrative Examples

The developed neural tree model is applied here to four problems: approximation of a static nonlinear function, prediction of the Box-Jenkins time series, prediction of the Mackey-Glass time series and identification of a nonlinear system. Well-known benchmark examples are used for the sake of easy comparison with existing models. The data related to the first two examples are available on the web site of the Working Group on Data Modelling Benchmark — IEEE Neural Network Council.20 For each benchmark problem in the following examples, the instruction sets are selected according to the following ideas: the instruction set I0 is selected according to an estimate of the complexity of the problem at hand; the instruction set I1 contains all terminal instructions and the additional nonterminal instructions {+2, ..., +p}, where p is a user-defined integer. Note that if p is larger than the number of inputs, according to our experiments, a node in the neural tree may contain redundant input information and create some difficulties for the training process; therefore we usually select p within the interval [1, number of inputs]. Finally, the instruction set I2 contains all the input variables.

4.1. Approximation of a non-linear function

A nonlinear static benchmark modelling problem20,24 is under consideration. The system is described by the equation

y = (1 + x1^{-2} + x2^{-1.5})^2,  1 ≤ x1, x2 ≤ 5.   (19)

50 training and 200 test samples are randomly generated within the interval [1, 5]. The static nonlinear function is approximated using the neural tree model with the pre-defined instruction sets I0 = {+2, +3, ..., +8}, I1 = {+2, x0, x1} and I2 = {x0, x1}, where x0 and x1 denote the input variables x1 and x2 of the static nonlinear function, respectively. The initial parameters used in MPIPE are shown in Table 1. The evolved neural tree model is obtained at generation 15 with MSE values of 0.000737 for the training data set and 0.00086 for the validation data set, respectively. The optimized neural tree is shown in Fig. 2. The optimal weights w1 to w30 are 2.09, −0.01, 3.01, 4.16, 0.72, 4.16, −0.49, 1.15, 1.96, 2.31, 1.58, 2.81, 0.08, 2.12, 1.13, 4.26, 0.65, 0.14, −0.33, −0.86, 0.69, 0.53, 0.28, −0.02, 0.71, 3.13, 0.38, −0.36, −0.97 and −0.93, respectively. The optimized activation function parameters f1 to f12 are listed in Table 2. Figures 3 and 4 present the outputs of the actual static nonlinear function and the evolved neural tree model, together with the approximation errors, for the training and validation data sets, respectively. Table 3 contains comparison results of different models for the static function approximation and provides, in addition, the results achieved with the neural tree model developed in this paper. It is obvious that the proposed neural tree model worked well in generating an approximating model of the static nonlinear system.

Table 1. Parameters used in the MPIPE algorithm for architecture optimization of the neural tree.

    Parameters                             Values
    Population size PS                     100
    Elitist learning probability Pel       0.01
    Learning rate lr                       0.01
    Fitness constant ε                     0.000001
    Overall mutation probability PM        0.4
    Mutation rate mr                       0.4
    Prune threshold TP                     0.999
    Maximum random search steps            2000
    Initial connection weights             rand[−1, 1]
    Initial parameters ap and bp           rand[0, 1]

Fig. 2. Evolved neural tree for approximating a static nonlinear function.

Table 2. The optimized flexible activation function parameters for static function approximation.

           a        b
    f1     2.74     2.47
    f2     3.18     3.17
    f3     4.76     2.48
    f4     0.08     2.12
    f5     2.26     2.40
    f6     1.88     2.59
    f7     1.67     2.15
    f8     0.74     0.93
    f9     0.27     0.46
    f10   −0.10     0.58
    f11    2.67     2.57
    f12    1.70     2.42
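As an illustration of the experimental setup, the sketch below generates random training and test samples for the benchmark function of Eq. (19). The uniform sampling over [1, 5] and the fixed seeds are assumptions; the paper only states that 50 training and 200 test samples are drawn from that interval.

```python
import random

def target(x1, x2):
    """Static nonlinear benchmark function of Eq. (19)."""
    return (1.0 + x1 ** -2 + x2 ** -1.5) ** 2

def make_samples(n, seed):
    rng = random.Random(seed)
    return [((x1, x2), target(x1, x2))
            for x1, x2 in ((rng.uniform(1.0, 5.0), rng.uniform(1.0, 5.0))
                           for _ in range(n))]

train = make_samples(50, seed=0)     # 50 training samples, as in Sec. 4.1
test = make_samples(200, seed=1)     # 200 test samples
print(train[0], len(test))
```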

Table 3. Comparison of models for the non-linear function approximation.

    Author        Learning method        MSE
    Lin27         Back-propagation       0.005
    Sugeno25      Fuzzy+WLS              0.01
    Delgado28     Fuzzy clustering       0.231
    Russo26       GA+NN+fuzzy            0.00078
    Kukolj29      Fuzzy clustering+LS    0.0015
    This model    Neural tree            0.00086

Fig. 3. Outputs of the actual system, the neural tree model and the approximation error: training data samples.

Fig. 4. Outputs of the actual system, the neural tree model and the approximation error: test data samples.

4.2. Application to Box-Jenkins time series

In this subsection, the proposed neural tree model is applied to a time series prediction problem using the gas furnace data (Series J) of Box and Jenkins (1970). The data set was recorded from a combustion process of a methane-air mixture. It is well known and frequently used as a benchmark example for testing identification and prediction algorithms. The data set consists of 296 pairs of input-output measurements. The input u(t) is the gas flow into the furnace and the output y(t) is the CO2 concentration in the outlet gas. The sampling interval is 9 s. Following previous research28,34,35 and in order to make a meaningful comparison, the inputs of the prediction model are selected as u(t − 4) and y(t − 1), and the output is y(t). In this study, 200 data samples are used for training the neural tree model and the remaining data samples are used for testing the performance of the evolved model. The instruction sets used for creating the neural tree model are I0 = {+2, ..., +8}, I1 = {+2, x0, x1} and I2 = {x0, x1}, where x0 and x1 denote the input variables u(t − 4) and y(t − 1) of the neural tree model, respectively. The parameters used in the MPIPE evolution process are the same as those described in Table 1.
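The arrangement of the gas furnace data into input-output pairs (u(t − 4), y(t − 1)) → y(t) described above can be sketched as follows; the lists u and y are placeholders for the 296 measured values, which are not reproduced here.

```python
def make_regressors(u, y, n_train=200):
    """Arrange the series into (u(t-4), y(t-1)) -> y(t) pairs."""
    pairs = [([u[t - 4], y[t - 1]], y[t]) for t in range(4, len(y))]
    return pairs[:n_train], pairs[n_train:]

# u and y stand for the 296 measured gas-flow / CO2 values of Series J.
u = [0.0] * 296     # placeholder values only
y = [0.0] * 296
train, test = make_regressors(u, y)
print(len(train), len(test))    # 200 training pairs, remaining pairs for testing
```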

After 7 generations of evolution, the optimal neural tree model was obtained with an MSE of 0.000664; the MSE value for the validation data set is 0.000701. The evolved neural tree is shown in Fig. 5, and a comparison of the actual output, the neural tree model output and the prediction error is shown in Fig. 6. The optimal weights w1 to w14 are 0.41, 0.16, −0.93, 1.23, 0.48, −0.14, 0.22, 0.20, 1.27, 0.09, 0.50, −0.26, 1.04 and 0.86, respectively. The optimal flexible activation function parameters are listed in Table 5. Table 4 shows the comparison results of different models for the Box-Jenkins data prediction problem.

Fig. 5. The evolved neural tree model for prediction of the Box-Jenkins data.

Fig. 6. The time series data, the output of the neural tree and the prediction error for training and test samples.

Table 4. Comparison of models for time series prediction.

    Model Name           Inputs    MSE
    ARMA36                 5       0.71
    Tong's model37         2       0.469
    Pedrycz's model38      2       0.320
    Xu's model39           2       0.328
    Sugeno's model40       2       0.355
    Surmann's model41      2       0.160
    Lee's model42          2       0.407
    Lin's model43          5       0.261
    Nie's model44          4       0.169
    ANFIS model45          2       0.0073
    FuNN model46           2       0.0051
    HyFIS model34          2       0.0042
    Neural Tree model      2       0.00066

Table 5. The optimal flexible activation function parameters for the Box-Jenkins model.

          f1      f2      f3      f4      f5
    a     1.37    0.21    0.43    0.77    0.86
    b     1.55    1.22    1.10    0.73    1.04

4.3. Application to Mackey-Glass time series

The chaotic Mackey-Glass differential delay equation is recognized as a benchmark problem that has been used and reported by a number of researchers.12,47 The Mackey-Glass chaotic time series is generated from the following equation:

dx(t)/dt = a·x(t − τ) / (1 + x^{10}(t − τ)) − b·x(t).   (20)

With a = 0.2, b = 0.1 and τ > 17, the equation shows chaotic behavior. To make the comparison with the original neural tree model12 fair, we predict x(t + 10) using the input variables x(t), x(t + 1), x(t + 2), x(t + 3), x(t + 4), x(t + 5), x(t + 6), x(t + 7), x(t + 8) and x(t + 9). 500 sample points are used in our study; the first 100 data pairs of the series are used as training data, while the remaining 400 data pairs are used to validate the identified model. The instruction sets used for creating the neural tree model are I0 = {+3, ..., +10}, I1 = {+2, ..., +10, x0, x1, ..., x9} and I2 = {x0, x1, ..., x9}, respectively, where xi (i = 0, 1, ..., 9) denotes x(t), x(t + 1), x(t + 2), x(t + 3), x(t + 4), x(t + 5), x(t + 6), x(t + 7), x(t + 8) and x(t + 9), respectively. The parameters used in the MPIPE evolution process are the same as those described in Table 1.

After 10 generations of evolution, the optimal neural tree model was obtained with an MSE of 2.28 × 10^{-7}; the MSE value for the validation data set is 2.55 × 10^{-7}. The evolved neural tree is shown in Fig. 7, and a comparison of the actual output, the neural tree model output and the prediction error is shown in Fig. 8. The optimal weights w1 to w8 are 0.931, 1.138, 0.126, −1.12, 0.0057, 2.355, 1.64 and 0.68, respectively. The optimal flexible activation function parameters of the hidden unit are a = 3.917 and b = 1.914. Table 6 shows the comparison results of different models for the Mackey-Glass prediction problem. It is obvious that the proposed flexible neural tree model performs better than the original sigma-pi neural tree model both in the number of adjustable parameters and in prediction accuracy.

Fig. 7. The evolved neural tree model for predicting the Mackey-Glass time series data.

Fig. 8. The time series data, the output of the neural tree and the prediction error for training and test samples.

Table 6. Comparison of prediction results for the test data of the Mackey-Glass problem.

    Method        Iba47           Zhang12         This model
    Pop Size      80              200             100
    Generation    1740            60              10
    Error         4.7 × 10^{-6}   6 × 10^{-7}     2.5 × 10^{-7}
    Units         78              59              9
    Hidden        13              13              1
    Layers        8               4               3
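The Mackey-Glass series of Eq. (20) is usually generated by numerical integration with a fixed step. The sketch below uses a simple Euler scheme with step 1, τ = 17 and a constant initial history, which are our assumptions since the paper does not state its integration details.

```python
def mackey_glass(n, a=0.2, b=0.1, tau=17, x0=1.2, dt=1.0):
    """Generate n samples of Eq. (20) by simple Euler integration with step dt."""
    history = [x0] * (tau + 1)            # constant history for t <= 0 (assumption)
    for _ in range(n):
        x_t, x_lag = history[-1], history[-(tau + 1)]
        history.append(x_t + dt * (a * x_lag / (1.0 + x_lag ** 10) - b * x_t))
    return history[tau + 1:]              # drop the artificial initial history

series = mackey_glass(520)
# Inputs x(t), ..., x(t + 9) predict x(t + 10), as in Sec. 4.3.
samples = [(series[t:t + 10], series[t + 10]) for t in range(len(series) - 10)]
train, test = samples[:100], samples[100:500]   # 100 training, 400 validation pairs
print(len(train), len(test))
```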

4.4. Nonlinear system identification

A second-order non-minimum phase system with gain 1, time constants 4 s and 10 s, a zero at 1/4 s, and output feedback with a parabolic nonlinearity is chosen to be identified.48 With sampling time T0 = 1 s this system follows the nonlinear difference equation

y(k) = −0.07289[u(k − 1) − 0.2y^2(k − 1)] + 0.09394[u(k − 2) − 0.2y^2(k − 2)] + 1.68364y(k − 1) − 0.70469y(k − 2),   (21)

where the input lies in the interval [−1, 1]. A training and a test sequence of 1000 samples each were generated.

The input sequence for training consists of pulses with random amplitude in the range [−1, 1] and with random duration between 1 and 10 sampling periods (Fig. 10(a)). The input sequence for the test consists of the specific pulses shown in Fig. 10(b). The instruction sets used are I0 = {+2, +3, ..., +10}, I1 = {+2, +3, u(k − 1), u(k − 2), y(k − 1), y(k − 2)} and I2 = {x0, x1, x2, x3}, where x0, x1, x2 and x3 represent u(k − 1), u(k − 2), y(k − 1) and y(k − 2), respectively. The output is y(k). The parameters used in the MPIPE evolution process are again the same as those described in Table 1.

After 5 generations of evolution, the optimal neural tree model was obtained with an MSE value of 0.000394; the MSE value for the test data set is 0.000437. Both the training error and the test error of the neural tree model are smaller than the errors of the local linear neuro-fuzzy model.48 The evolved optimal weights w1 to w10 are 0.05, −0.39, −0.77, 1.72, 0.48, 0.86, 0.51, 0.49, −0.18 and −0.74, respectively. The optimal flexible activation function parameters are shown in Table 7. The evolved neural tree is shown in Fig. 9, and a comparison of the actual output of the dynamic system, the neural tree model output and the identification error is shown in Fig. 11. From the above simulation results, it can be seen that the proposed neural tree model works very well for nonlinear function approximation, time series prediction and dynamic system identification problems.

Table 7. The optimal flexible activation function parameters for the dynamic system.

          f1       f2
    a    −0.03     0.42
    b     0.79     1.06

Fig. 9. Evolved neural tree for identification of a second-order non-minimum phase system.

Fig. 10. Input signals for generating the excitation and test data sets of the dynamic system: (a) the input signal for creating the training data set; (b) the input signal for the test data set.
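For completeness, the sketch below simulates the difference equation (21) under a pulse input of the kind described above; the zero initial conditions and the particular pulse generator are our assumptions.

```python
import random

def simulate(u):
    """Simulate the non-minimum phase system of Eq. (21) for an input sequence u."""
    y = [0.0, 0.0]                        # zero initial conditions (assumption)
    for k in range(2, len(u)):
        y.append(-0.07289 * (u[k - 1] - 0.2 * y[k - 1] ** 2)
                 + 0.09394 * (u[k - 2] - 0.2 * y[k - 2] ** 2)
                 + 1.68364 * y[k - 1] - 0.70469 * y[k - 2])
    return y

def pulse_input(n, seed=0):
    """Pulses with random amplitude in [-1, 1] and random duration of 1..10 samples."""
    rng = random.Random(seed)
    u = []
    while len(u) < n:
        u.extend([rng.uniform(-1.0, 1.0)] * rng.randint(1, 10))
    return u[:n]

u = pulse_input(1000)
y = simulate(u)
print(min(y), max(y))
```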

Fig. 11. Comparison between the outputs of the dynamic system and the simulated neural tree model, and the identification error: (a) for the training data set and (b) for the validation data set.

5. Conclusion

In this paper, a flexible neural tree model and its design and optimization algorithm are proposed. From the viewpoint of its calculation structure, the flexible neural tree model can be viewed as a flexible multi-layer feedforward neural network with over-layer connections and free activation function parameters. The work demonstrates that it is possible to find an appropriate way to evolve the structure and parameters of artificial neural networks simultaneously by using the flexible neural tree representation. Simulation results for the benchmark problems of nonlinear static function approximation, time series prediction and system identification showed the feasibility and effectiveness of the proposed method.

The key problem in finding an appropriate neural tree to model a given nonlinear system is how to find an optimal or near-optimal solution in the neural tree structure space and the related parameter space. This paper provides a method that alternately searches between the two spaces by using MPIPE and a random search algorithm, but other tree-structure based methods and parameter learning algorithms can also be used to solve the problem. Further research should be focused on developing new methods to reduce the search space and on finding faster convergence algorithms.

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China (NSFC), Project No. 69902005, and the Provincial Natural Science Foundation of Shandong, Project No. Y2001G09.

References

1. S. Omatu et al., Neuro-control and its Applications (Springer-Verlag, 1996).
2. G. F. Miller et al., Designing neural networks using genetic algorithms, Proc. 3rd Int. Conf. Genetic Algorithms and Their Applications (Morgan Kaufmann, San Mateo, CA, 1989), pp. 379–384.
3. D. Whitley et al., Genetic algorithms and neural networks: optimizing connections and connectivity, Parallel Computing 14 (1990), 347–361.
4. D. B. Fogel et al., Evolving neural networks, Biological Cybernetics 63 (1990), 487–493.
5. N. Saravanan et al., Evolving neural control systems, Int. J. Intelligent Systems 10 (1995), 23–27.
6. J. R. McDonnell et al., Evolving recurrent perceptrons for time-series modeling, IEEE Trans. on Neural Networks 5 (1994), 24–38.
7. S. E. Fahlman et al., The cascade-correlation learning architecture, Advances in Neural Information Processing Systems 2 (1990), 524–532.
8. J. P. Nadal, Study of a growth algorithm for a feedforward network, Int. J. Neural Systems 1 (1989), 55–59.
9. R. Setiono et al., Use of a quasi-Newton method in a feedforward neural network construction algorithm, IEEE Trans. on Neural Networks 6 (1995), 273–277.
10. P. J. Angeline et al., An evolutionary algorithm that constructs recurrent neural networks, IEEE Trans. on Neural Networks 5 (1994), 54–65.


11. X. Yao et al., A new evolutionary system for evolving artificial neural networks, IEEE Trans. on Neural Networks 8 (1997), 694–713.
12. B. T. Zhang et al., Evolutionary induction of sparse neural trees, Evolutionary Computation 5 (1997), 213–236.
13. R. Salustowicz and J. Schmidhuber, Probabilistic incremental program evolution, Evolutionary Computation 5 (1997), 123–141.
14. Y. Chen et al., System identification and control using probabilistic incremental program evolution algorithm, J. Robotics and Mechatronics 12 (2000), 657–681.
15. K. Balakrishnan et al., Evolutionary design of neural architectures, Tech. Rep. No. CS-TR-95-01, Iowa State University, Department of Computer Science, Artificial Intelligence Laboratory, Ames, IA (1995).
16. R. K. Belew et al., Evolving networks: using genetic algorithms with connectionist learning, in Artificial Life II, SFI Studies in the Science of Complexity, eds. C. G. Langton, C. Taylor, J. D. Farmer and S. Rasmussen 10 (1991).
17. H. Kitano, Designing neural networks using genetic algorithms with graph generation system, Complex Systems 4 (1990), 461–476.
18. F. Gruau, Genetic synthesis of modular neural networks, in Proc. 5th Int. Conf. on Genetic Algorithms, ed. S. Forrest (1993), pp. 318–325.
19. J. Hu et al., RasID — Random search for neural network training, J. Advanced Computational Intelligence 2 (1998), 134–141.
20. Working Group on Data Modeling Benchmark, Standard Committee of IEEE Neural Network Council. http://neural.cs.nthu.edu.tw/jang/benchmark/
21. Y. Chen and S. Kawaji, Evolutionary control of discrete-time nonlinear system using PIPE algorithm, Proc. IEEE Int. Conf. Systems, Man and Cybernetics (1999), pp. 1078–1083.
22. S. Kawaji and Y. Chen, Evolving neurofuzzy systems by hybrid soft computing approaches for system identification, J. Advanced Computational Intelligence 5 (2001), 220–228.
23. Y. Chen and S. Kawaji, Evolving ANNs by hybrid approaches of PIPE and random search algorithm for system identification, Proc. 3rd Asian Control Conf. (ASCC'2000) (2000), pp. 2206–2211.
24. M. Sugeno et al., A fuzzy-logic approach to qualitative modeling, IEEE Trans. Fuzzy Syst. 1 (1993), 7–31.
25. M. Russo et al., Genetic fuzzy learning, IEEE Trans. Evol. Comput. 4 (2000), 259–273.
26. Y. Lin et al., Using fuzzy partitions to create fuzzy systems from input-output data and set initial weights in a fuzzy neural network, IEEE Trans. Fuzzy Syst. 5 (1997), 614–621.
27. M. Delgado et al., Fuzzy clustering-based rapid prototyping for fuzzy rule-based modeling, IEEE Trans. Fuzzy Syst. 5 (1997), 223–233.

28. D. Kukolj et al., Design of adaptive Takagi-Sugeno-Kang fuzzy models, Applied Soft Computing 2 (2002), 89–103.
29. X. Yao, Evolving artificial neural networks, Proc. IEEE 87 (1999), 1423–1447.
30. X. Yao, Y. Liu and G. Lin, Evolutionary programming made faster, IEEE Trans. on Evolutionary Computation 3 (1999), 82–102.
31. B.-T. Zhang, A Bayesian evolutionary approach to the design and learning of heterogeneous neural trees, Integrated Computer-Aided Engineering 9 (2002), 73–86.
32. K. O. Stanley and R. Miikkulainen, Evolving neural networks through augmenting topologies, Evolutionary Computation 10 (2002), 99–127.
33. C. Ferreira, Genetic representation and genetic neutrality in gene expression programming, Advances in Complex Systems 5 (2002), 389–408.
34. J. Kim et al., HyFIS: Adaptive neuro-fuzzy inference systems and their application to nonlinear dynamical systems, Neural Networks 12 (1999), 1301–1319.
35. H. Surmann et al., Learning feed-forward and recurrent fuzzy systems: A genetic approach, J. Systems Architecture 47 (2001), 649–662.
36. G. E. P. Box, Time Series Analysis: Forecasting and Control (Holden-Day, San Francisco, 1970).
37. R. M. Tong, The evaluation of fuzzy models derived from experimental data, Fuzzy Sets and Systems 4 (1980), 1–12.
38. W. Pedrycz, An identification algorithm in fuzzy relational systems, Fuzzy Sets and Systems 13 (1984), 153–167.
39. C. W. Xu et al., Fuzzy model identification and self-learning for dynamic systems, IEEE Trans. on Systems, Man and Cybernetics 17 (1987), 683–689.
40. M. Sugeno et al., Linguistic modelling based on numerical data, Proc. IFSA'91 (1991), pp. 234–247.
41. H. Surmann et al., Self-organising and genetic algorithm for an automatic design of fuzzy control and decision systems, Proc. FUFIT's93 (1993), pp. 1079–1104.
42. C. C. Lee et al., A combined approach to fuzzy model identification, IEEE Trans. on Systems, Man and Cybernetics 24 (1994), 736–744.
43. Y. Lin et al., A new approach to fuzzy-neural system modelling, IEEE Trans. on Fuzzy Systems 3 (1995), 190–198.
44. J. Nie, Constructing fuzzy model by self-organising counterpropagation network, IEEE Trans. on Systems, Man and Cybernetics 25 (1995), 963–970.
45. J. S. R. Jang et al., Neuro-fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence (Prentice-Hall, Upper Saddle River, NJ, 1997).
46. N. Kasabov et al., FuNN/2 — A fuzzy neural network architecture for adaptive learning and knowledge acquisition, Information Sciences 101 (1997), 155–175.
47. H. Iba et al., System identification using structured genetic algorithms, Proc. 5th Int. Conf. Genetic Algorithms (1993), pp. 279–286.
48. O. Nelles, Nonlinear System Identification — From Classical Approaches to Neural Networks and Fuzzy Models (Springer-Verlag, 2001).


49. Y. Chen and S. Kawaji, Evolving the basis function neural networks for system identification, J. Advanced Computational Intelligence 5 (2001), 229–238.