Neural Comput & Applic (2002) 10: 357–366. Ownership and Copyright © 2002 Springer-Verlag London Limited

A Hierarchical Evolutionary Algorithm for Constructing and Training Wavelet Networks

Yongyong He, Fulei Chu and Binglin Zhong
Department of Precision Instruments, Tsinghua University, Beijing, P. R. China

Correspondence and offprint requests to: Y. He, Department of Precision Instruments, Tsinghua University, Beijing 100084, P. R. China. Email: heyy@pim.tsinghua.edu.cn

The wavelet network has been introduced as a special feed-forward neural network supported by wavelet theory, and has become a popular tool in the fields of approximation and forecasting. In this paper, an evolutionary algorithm is proposed for constructing and training wavelet networks for approximation and forecasting. The algorithm uses a hierarchical chromosome to encode both the structure and the parameters of the wavelet network, and combines a genetic algorithm with evolutionary programming so that the network is constructed and trained simultaneously through evolution. Numerical examples are presented to show the efficiency and potential of the proposed algorithm with respect to function approximation, sunspot time series forecasting and condition forecasting for a hydroturbine machine. The study also indicates that the proposed method has the potential to solve a wide range of neural network construction and training problems in a systematic and robust way.

Keywords: Approximation; Evolutionary programming; Forecast; Genetic algorithm; Wavelet network

1. Introduction

Condition forecasting is an indispensable part of condition monitoring and fault diagnosis for machines. Effective condition forecasting can reduce expensive and dangerous machine failures and plant shutdowns. Predicting the time series of the operating parameters of a machine is the main approach to condition forecasting, but it is not easy to implement with satisfactory accuracy, because the time series of the operating parameters is usually severely nonlinear and non-stationary owing to the time-varying characteristics of fault development. Conventional methods for forecasting (linear or nonlinear) time series are mainly based on the classical Auto-Regressive (AR) and Auto-Regressive Moving-Average (ARMA) models, such as the Vector Auto-Regressive (VAR) model, the bilinear model and the threshold auto-regressive model [1,2]. These models can achieve acceptable forecast results for linear systems, but have proved ineffective for nonlinear systems. In addition, they are often difficult to implement because of their complex modelling and high computational cost.

Neural network theory offers another approach to function approximation and time series forecasting, and it has gained wide acceptance in these fields [3–5]. Cybenko's [6] research has made the neural network a powerful tool for approximation and forecasting. Wavelet theory has found many applications in numerical analysis and signal processing [7], and the Wavelet Transform (WT) provides a further novel approach to the function approximation problem [8]. Wavelet networks, which combine wavelet theory and feed-forward neural networks, use wavelets as the basis functions of the network and inherit the universal approximation capability of wavelet decomposition. Furthermore, wavelet networks gain additional freedom from the two parameters of the WT, dilation and translation, and can therefore provide better approximation and forecasting performance than ordinary basis function networks, such as the Radial Basis Function (RBF) network [9,10].


Wavelet networks come in two versions [11]. The first uses an orthonormal wavelet basis to construct the network. For this sort of wavelet network, the structure is fixed once the orthonormal wavelet basis has been chosen, so there is no construction problem. In addition, linear optimisation techniques, such as the pseudo-inverse and least-mean-square techniques, can be used directly to train the network, just as for the RBF network. However, to generate an orthonormal basis the wavelet function has to satisfy strong restrictions [12,13], and some compromise is necessary between the regularity and compactness of the wavelet function. The second version of the wavelet network therefore uses a non-orthogonal wavelet family, in particular a wavelet frame, to construct the network. Informally speaking, a frame is a 'redundant basis'. In practice it is more convenient to use a redundant wavelet family than an orthonormal wavelet basis, because admitting redundancy allows us to construct a wavelet function with a simple analytical form and good spatial-spectral localisation properties. The second version of the wavelet network has therefore been adopted widely in various application areas. However, how to construct such a wavelet network (i.e. how to determine the hidden layer) remains a problem, and a number of systematic approaches have been proposed. These approaches are mainly based on 'hill climbing' [11,14,15], and the network architecture they produce is not optimal but a trade-off between optimality and efficiency. In addition, network training is still mainly based on gradient-based algorithms, so the local minimum problem has not been overcome [9,10]. Recently, the Genetic Algorithm (GA) has been used to train the network [16,17], but such attempts always separate network training from network construction, which limits their efficiency.

In this paper, to improve the performance of the wavelet network, an evolutionary algorithm is proposed for constructing and training wavelet networks for approximation and forecasting. The algorithm uses a hierarchical chromosome to encode the structure and parameters of the wavelet network, and combines a genetic algorithm with evolutionary programming to construct and train the network simultaneously through evolution. With this evolutionary algorithm, the structure of the network becomes more reasonable, and the local minimum problem in network training is also overcome effectively.

The paper is organised as follows. In Section 2, we recall some results from wavelet theory and introduce the general framework of the wavelet network with respect to function approximation.


The evolutionary algorithm for constructing and training the wavelet network is presented in detail in Section 3. Numerical examples on function approximation, sunspot time series forecasting and condition forecasting for a hydroturbine machine are given in Section 4. Finally, some conclusions are drawn in Section 5.

2. Wavelet Transforms and Wavelet Networks

We briefly recall some basic concepts about wavelet transforms that will be useful for developing wavelet networks.

2.1. Wavelet Transforms (WT)

A square-integrable function $\psi(x)$ (namely $\psi(x) \in L^2(\mathbb{R})$) is called a mother wavelet function if it satisfies the admissibility condition

$$\int_{-\infty}^{\infty} \frac{|\hat{\psi}(w)|^2}{|w|}\,dw < \infty \qquad (1)$$

where $\hat{\psi}(w)$ is the Fourier transform of $\psi(x)$. A wavelet family can be obtained from this single function by dilations and translations:

$$\Phi = \left\{ \psi_{ab}(x) = |a|^{-1/2}\,\psi\!\left(\frac{x-b}{a}\right) : a \in \mathbb{R}^{+},\ b \in \mathbb{R} \right\} \qquad (2)$$

where $\psi_{ab}(x)$ is a continuous wavelet derived from the mother wavelet function $\psi(x)$ (i.e. the wavelet basis), and $a$ and $b$ are the dilation and translation parameters, respectively. Assuming $f(x) \in L^2(\mathbb{R})$ is a finite-energy function, the wavelet transform and its inverse are defined as follows:

$$W_f(a,b) = \langle f, \psi_{ab} \rangle = |a|^{-1/2} \int_{-\infty}^{+\infty} f(x)\,\psi\!\left(\frac{x-b}{a}\right) dx \qquad (3)$$

$$f(x) = \frac{1}{C_\psi} \int_{\mathbb{R}^{+}} \int_{\mathbb{R}} \frac{1}{a^{2}}\, W_f(a,b)\, \psi_{ab}(x)\, da\, db \qquad (4)$$

$$C_\psi = \int_{\mathbb{R}} |\hat{\psi}(\omega)|^{2}\, |\omega|^{-1}\, d\omega \qquad (5)$$


It can be seen from Eq. (3) that the translation parameter $b$ controls the window position of the wavelet basis, while a change of the dilation parameter $a$ influences not only the frequency spectrum of the continuous wavelet, but also the window size and shape. Therefore, for the function $f(x)$, the resolution of its local structure can be adjusted by the parameters $a$ and $b$. It can also be seen from Eq. (4) that, as in Fourier analysis, the wavelet transform decomposes the function $f(x)$ into a series of wavelet functions and uses this series to approximate $f(x)$. Unlike Fourier analysis, however, the wavelet functions are constructed by dilation and translation, and therefore have excellent localisation properties. In addition, if $\psi$ is chosen as an appropriate wavelet satisfying certain conditions, the family $\Phi$ is dense in $L^2(\mathbb{R})$. Hence the collection of all linear combinations of elements of $\Phi$ is dense in $L^2(\mathbb{R})$, i.e. any $f(x) \in L^2(\mathbb{R})$ can be approximated by a linear combination of the wavelet functions.

For a multidimensional function $f(x_1, x_2, \ldots, x_n)$, the wavelet function can be obtained as the product of one-dimensional wavelets:

$$\psi(x) = \psi(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \psi(x_i) \qquad (6)$$

The corresponding wavelet function family is defined as

$$\Phi = \left\{ \psi_{ab}(x) = \det(A)^{1/2}\, \psi[Ax - b] : b \in \mathbb{R}^{n},\ A = \mathrm{diag}(a),\ a \in \mathbb{R}^{n}_{+} \right\} \qquad (7)$$

For more details about the wavelet transform, see Daubechies [12].
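For illustration, the short Python sketch below (ours, not the authors' MATLAB implementation) evaluates a dilated and translated one-dimensional wavelet according to Eq. (2) and a multidimensional product wavelet in the spirit of Eqs. (6) and (7), using the 'Gaussian-derivative' mother wavelet that is assumed later in Section 4; the function names and the det(A)^{1/2} normalisation are our reading of the reconstructed formulas.

```python
import numpy as np

def mother_wavelet(x):
    """'Gaussian-derivative' mother wavelet psi(x) = -x * exp(-x**2 / 2)."""
    return -x * np.exp(-0.5 * x ** 2)

def psi_ab(x, a, b):
    """One-dimensional wavelet of Eq. (2): |a|^(-1/2) * psi((x - b) / a)."""
    return np.abs(a) ** -0.5 * mother_wavelet((x - b) / a)

def psi_nd(x, a, b):
    """Multidimensional wavelet of Eqs. (6)-(7): a product of scalar wavelets
    with diagonal dilation A = diag(a), normalised by det(A)**0.5."""
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    return np.sqrt(np.prod(a)) * np.prod(mother_wavelet(a * x - b))

# Example: evaluate one 1-D wavelet with dilation a = 2.0, translation b = 1.5
print(psi_ab(np.linspace(-5.0, 5.0, 11), a=2.0, b=1.5))
```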

2.2. Wavelet Networks (WN)

Most Feed-forward Neural Networks (FNNs) are expansion models of basis functions (sigmoid, radial, spline, etc.) and have good function approximation performance. Such a network can be viewed as a complex nonlinear mapping:

$$y = f(x) = \sum_{j=1}^{L} W_{j,N}\; S(W_{M,j}\, x - \theta_j) \qquad (8)$$

where $M$ and $N$ are the node numbers of the input and output layers, respectively, $L$ is the node number of the hidden layer, $S(\cdot)$ is the activation function of the hidden nodes, $W_{M,j}$ and $W_{j,N}$ are the connection weights from the input layer to the hidden layer and from the hidden layer to the output layer, respectively, and $\theta_j$ is the threshold of hidden node $j$.

Wavelet networks are feed-forward neural networks with a single hidden layer whose nodes are wavelet functions drawn from a wavelet family. If the wavelet family (7) constitutes an orthonormal basis of $L^2(\mathbb{R}^n)$, then a wavelet network of the form of Eq. (8) built by selecting elements of this family can obviously approximate any function of $L^2(\mathbb{R}^n)$ [11]. More generally, family (7) may constitute a frame of $L^2(\mathbb{R}^n)$ instead of a basis [12]; in this case family (7) spans $L^2(\mathbb{R}^n)$ redundantly. In practice it is more convenient to use a redundant wavelet family than an orthonormal wavelet basis for constructing the wavelet network, because admitting redundancy allows us to construct wavelet functions with a simple analytical form and good spatial-spectral localisation properties. In this paper, we use such a non-orthogonal wavelet family to construct the wavelet network. The structure of the wavelet network is shown in Fig. 1, and the output of the network is

$$y_j = \sum_{i=1}^{L} w_{ij}\, \psi_{a_i b_i}(x) + \theta_j, \qquad j = 1, 2, \cdots, N \qquad (9)$$

where $w_{ij}$ is the connection weight from hidden node $i$ to output node $j$, and $\theta_j$ is the zeroing parameter; introducing $\theta_j$ enables the network to approximate functions with a nonzero mean. The network is trained to determine the parameters $(a_i, b_i, w_{ij}, \theta_j,\ i = 1, \cdots, L,\ j = 1, \cdots, N)$ by minimising the mean square training error

$$E = \frac{1}{2} \sum_{p=1}^{P} \sum_{j=1}^{N} \left( y_{jp} - \hat{y}_{jp} \right)^{2} \qquad (10)$$

where $P$ is the number of training samples, $y_{jp}$ is the network output and $\hat{y}_{jp}$ is the desired output of output node $j$ for the $p$th sample.

Fig. 1. Structure of a wavelet network.
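As a concrete restatement of Eqs. (9) and (10), the following Python sketch (an illustration with our own names; the paper's implementation was in MATLAB) computes the output of a one-input wavelet network of the form of Fig. 1 and its mean square training error.

```python
import numpy as np

def gaussian_derivative(u):
    """Mother wavelet psi(u) = -u * exp(-u**2 / 2) (the one assumed in Section 4)."""
    return -u * np.exp(-0.5 * u ** 2)

def wn_output(x, a, b, w, theta):
    """Eq. (9): y_j = sum_i w_ij * psi_{a_i b_i}(x) + theta_j  (1-D input case).
    a, b: length-L dilations/translations; w: (L, N) weights; theta: length-N."""
    hidden = np.abs(a) ** -0.5 * gaussian_derivative((x - b) / a)  # L hidden outputs
    return hidden @ w + theta                                      # N network outputs

def training_error(xs, targets, a, b, w, theta):
    """Eq. (10): E = 0.5 * sum_p sum_j (y_jp - yhat_jp)**2."""
    outputs = np.array([wn_output(x, a, b, w, theta) for x in xs])
    return 0.5 * np.sum((outputs - targets) ** 2)
```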


The wavelet function is not a global function (an infinite-energy function such as the sigmoid), but a local one. Therefore, over the whole input range of the network, a hidden node with a wavelet function influences the network output only in a local region. This reduces the interaction between nodes and benefits both the training process and the generalisation performance. The radial basis function is also local, but it lacks the spatial-spectral zooming property of the wavelet function and therefore cannot represent the local spatial-spectral characteristics of the function. Consequently, the wavelet network should give better approximation and forecasting performance than the traditional FNN.

3. Evolutionary Algorithm for Constructing and Training Wavelet Networks

3.1. Introduction to Evolutionary Computation [18,19]

Evolutionary Computations (ECs) refer to a class of population-based stochastic search algorithms developed from the ideas and principles of natural evolution. They include Genetic Algorithms (GAs), Evolutionary Programming (EP) and Evolution Strategies (ES). One important feature of all these algorithms is their population-based search strategy: individuals in a population compete and exchange information with each other in order to perform certain tasks. A general framework of ECs is described by Algorithm 1 in Fig. 2.

ECs are particularly useful for large, complex problems with many local optima. They are less likely to be trapped in local minima than traditional gradient-based search algorithms. They do not depend on gradient information, and are thus well suited to problems where such information is unavailable or very costly to obtain or estimate. They can even deal with problems for which no explicit and/or exact objective function is available.

Fig. 2. A general framework of ECs.

These features make them much more robust than many other search algorithms. Fogel [20] and Back et al. [21] give good introductions to various evolutionary computation methods for optimisation.

Various combinations of ECs, usually genetic algorithms, with Artificial Neural Networks (ANNs) have been investigated [17]. Much of this research concentrates on acquiring parameters for fixed network structures. Other work allows a variable topology, but always separates structure construction from connection weight training. Comparatively little work with ECs has been done for wavelet networks. Yao et al. [16] use EP to train the wavelet network; it is a good attempt, but the construction of the network is not considered. In this paper, we propose a hierarchical evolutionary algorithm, based on a hierarchical chromosome and combining GAs and EP, that unifies the construction and training of the wavelet network.

3.2. Hierarchical Chromosome

According to research results in biology, the chromosome is constructed from a series of genes arranged in a hierarchical manner; some genes control the state (active or inhibited) of others. Borrowing this idea, we utilise a hierarchically structured chromosome for our evolutionary algorithm to construct and train the wavelet network, as described in Fig. 3. This hierarchical chromosome consists of a control layer chromosome and a parameter layer chromosome, and the genes in the control layer control the corresponding chromosomes in the parameter layer. A binary representation is used to encode the control layer chromosome: a bit of '1' indicates that the corresponding parameter chromosome is active (i.e. valid in evolution), and a bit of '0' indicates that it is inhibited (i.e. invalid in evolution). A real-number representation is adopted to encode the parameter layer chromosome. Such a hierarchical chromosome contains more information and gives a more flexible representation of genes during evolution, and can therefore be used to deal with more complex problems. If such a hierarchical chromosome is used to encode the structure and connection weights of the wavelet network simultaneously (i.e. the control layer chromosome encodes the structure and the parameter layer chromosome encodes the connection weights), then the chromosome contains the information of both the structure and the connection weights of the network.


Fig. 3. Hierarchical chromosome for wavelet network.
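A minimal sketch of the encoding of Fig. 3, as described in this subsection: each individual carries a binary control string (one bit per candidate hidden node plus a final bit for the zeroing parameter) and real-valued parameter genes. The data structure, field names and the positive range used for the dilations are our assumptions.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HierarchicalChromosome:
    """One individual: a binary control string gating real-valued parameter genes.
    control[i] == 1 activates candidate hidden node i; the last control bit
    gates the zeroing parameter theta (layout and names are ours)."""
    control: np.ndarray   # shape (L + 1,), values in {0, 1}
    a: np.ndarray         # shape (L,), dilations
    b: np.ndarray         # shape (L,), translations
    w: np.ndarray         # shape (L,), hidden-to-output weights (one output node)
    theta: float          # zeroing parameter

def random_individual(L, rng):
    """Random initialisation: translations/weights ~ U(-0.5, 0.5) as in Section
    3.3.1; dilations are kept strictly positive here because a is in R+ in Eq. (2)."""
    return HierarchicalChromosome(
        control=rng.integers(0, 2, size=L + 1),
        a=rng.uniform(0.1, 1.0, size=L),
        b=rng.uniform(-0.5, 0.5, size=L),
        w=rng.uniform(-0.5, 0.5, size=L),
        theta=float(rng.uniform(-0.5, 0.5)),
    )

# e.g. an individual with at most L = 20 hidden nodes, as in Section 4.1:
individual = random_individual(20, np.random.default_rng(0))
```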

With an evolutionary algorithm based on such a chromosome, we can achieve optimal construction and training of the network simultaneously. For a given function approximation or forecasting problem, the input and output layers of the wavelet network can be determined beforehand, so only the hidden nodes (i.e. the wavelet functions) of the network need to be determined during construction. Thus, the hidden layer of the network can be encoded directly by the control layer chromosome, with each bit representing one hidden node. During evolution, if a bit is '1' in some generation, the corresponding hidden node is valid in the network and the corresponding network parameters in the parameter layer chromosome are evolved by the genetic operations; if the bit is '0', the corresponding hidden node is invalid and the evolution of its parameters pauses in this generation. The hierarchical chromosome for the wavelet network (described in Fig. 1) is presented in Fig. 3, where L is the maximal number of hidden nodes, set beforehand, and the last bit of the control gene string is set to '1' to control the zeroing parameter of the network.

3.3. Evolutionary Algorithm Design Based on a Hierarchical Chromosome

The hierarchical chromosome described in Fig. 3 is utilised for our evolutionary algorithm. According to the encoding scheme and the properties of the two kinds of chromosome, the genetic algorithm is used to deal with the control layer chromosome, while the parameter layer chromosome is handled by evolutionary programming.

3.3.1. Initialisation. The evolutionary algorithm creates a fixed number of initial individuals at random. For the control layer, the chromosomes are created as random binary strings. For the parameter layer, the chromosomes are created from the uniform distribution over the range [-0.5, 0.5]. Usually, a larger population implies a better solution but a longer running time. Rather than using a fixed initial population size, we perform several experiments, varying the initial population size, and select the best one.

3.3.2. Fitness evaluation. In EC, to improve the solution, each individual is evaluated using some measure of fitness. A fitness value is computed for each individual in the population, and the objective is to find the individual that has the highest fitness for the problem considered. The objective of the optimal construction and training of the wavelet network is that the training error and the network complexity reach their minima simultaneously; the result should be a trade-off between the training error and the network complexity. Therefore, the fitness function is computed as

$$f(t) = \frac{1}{\gamma \cdot \lg N + \beta \cdot E} \qquad (11)$$

where $N$ is the number of hidden nodes (i.e. the number of '1' bits in the control layer chromosome), $E$ is the training error function of Eq. (10), $t$ is the evolution step, and $\gamma$ and $\beta$ are coefficients.
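A minimal sketch of the fitness evaluation of Eq. (11), assuming 'lg' denotes the base-10 logarithm; the default coefficient values are those reported for the experiment in Section 4.1.

```python
import math

def fitness(num_hidden, train_error, gamma=0.1, beta=20.0):
    """Eq. (11): f = 1 / (gamma * lg(N) + beta * E), with N the number of '1'
    bits in the control layer and E the training error of Eq. (10). We read
    'lg' as log10; gamma/beta defaults are the Section 4.1 values. At least
    one active hidden node is assumed (lg(0) is undefined)."""
    return 1.0 / (gamma * math.log10(num_hidden) + beta * train_error)

# e.g. a network with 9 active hidden nodes and training error 0.05:
print(fitness(9, 0.05))   # larger value = fitter individual
```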

3.3.3. Genetic operators

1. Selection. The selection operation selects two individuals for mating. In our work, the classical roulette wheel method combined with an elite strategy is implemented. During roulette wheel selection, two mates are selected for reproduction with probabilities in direct proportion to their fitness values, so fitter individuals contribute more offspring to the succeeding generation. Meanwhile, the elite strategy ensures that the best individual of the current generation always survives into the next generation. The selection process is illustrated in Fig. 4.

2. Crossover. The crossover operation is the process by which new individuals (offspring) are created from existing ones (parents) during reproduction.


Fig. 4. Selection procedure.

In our work, for the control layer chromosome only, two-point crossover is implemented with probability $p_c \in (0.5, 1.0)$ by choosing two random points in the selected pair of strings and exchanging the substrings defined by the chosen points. Figure 5(a) illustrates this process.

3. Mutation. The role of the mutation operation is to introduce new genetic material (genes) into the chromosomes with probability $p_m \in (0.005, 0.1)$, thus preventing the inadvertent loss of useful genetic material in the earlier phases of evolution. The mutation schemes of the GA and of EP are different. In our work, two-element swap mutation is implemented for the control layer chromosome, as in Fig. 5(b). The EP mutation scheme is used for the parameter layer chromosome:

$$p' = p + \alpha\, T(\eta_t)\, N(0,1) \qquad (12)$$

where $p$ is a network parameter (i.e. $a_i$, $b_i$, $w_{ij}$ or $\theta_j$), $N(0,1)$ is the standard normal distribution, and $\alpha \in [0,1]$ is the severity coefficient of the mutation. $T(\eta_t)$ is the temperature of the network individual $\eta_t$ in the $t$th generation, defined as

$$T(\eta_t) = 1 - \frac{f(\eta_t)}{f_{\mathrm{sum}}(t)} \qquad (13)$$

where $f(\eta_t)$ is the fitness of the individual $\eta_t$ in the $t$th generation, and $f_{\mathrm{sum}}(t)$ is the fitness summation of the $t$th generation. Thus, the temperature of a network is determined by how close the network is to being a solution of the task. This measure of the network's performance is used to anneal the structural and parametric similarity between parent and offspring, so that networks with a high temperature are mutated severely, and those with a low temperature are mutated only slightly. This allows an initially coarse-grained search, and a progressively finer-grained search as a network approaches a solution. $T(\eta_t)$ is related to the concept of temperature in simulated annealing [22], where a higher temperature indirectly increases the variety of states that can be visited by the system. While large parametric mutations are occasionally necessary to escape parametric local minima during the search, they are more likely to adversely affect the offspring's ability to perform better than its parent. To compensate, Eq. (13) is revised as follows:

$$T(\eta_t) = U(0,1) \left( 1 - \frac{f(\eta_t)}{f_{\mathrm{sum}}(t)} \right) \qquad (14)$$

where $U(0,1)$ is the uniform distribution over the range [0, 1].
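The sketch below illustrates the three operators on the two chromosome layers: two-point crossover and two-element swap mutation for the binary control layer (Fig. 5), and the temperature-scaled EP mutation of Eqs. (12) and (14) for the real-valued parameter layer. It is an illustration with our own function names; roulette wheel selection and the elite strategy are omitted for brevity.

```python
import numpy as np

def two_point_crossover(p1, p2, rng):
    """Fig. 5(a): exchange the substring between two random cut points."""
    i, j = sorted(rng.choice(len(p1), size=2, replace=False))
    c1, c2 = p1.copy(), p2.copy()
    c1[i:j], c2[i:j] = p2[i:j], p1[i:j]
    return c1, c2

def swap_mutation(bits, rng):
    """Fig. 5(b): swap two randomly chosen elements of the control string."""
    bits = bits.copy()
    i, j = rng.choice(len(bits), size=2, replace=False)
    bits[i], bits[j] = bits[j], bits[i]
    return bits

def ep_mutation(params, fit, fit_sum, alpha, rng):
    """Eqs. (12) and (14): p' = p + alpha * T * N(0, 1), with annealed
    temperature T = U(0, 1) * (1 - f / f_sum); fitter parents are perturbed less."""
    temperature = rng.uniform() * (1.0 - fit / fit_sum)
    return params + alpha * temperature * rng.standard_normal(params.shape)

rng = np.random.default_rng(0)
child1, child2 = two_point_crossover(np.ones(6, dtype=int), np.zeros(6, dtype=int), rng)
```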

4. Experimental Results and Discussion

In this section, to investigate the feasibility and effectiveness of the proposed evolutionary algorithm for the wavelet network, three experiments are presented: one-dimensional function approximation, sunspot time series forecasting, and condition forecasting for a hydroturbine machine. All algorithms are implemented in MATLAB.

4.1. Approximation of a One-dimensional Function

Fig. 5. Crossover and mutation operation. (a) Two-point crossover scheme, (b) two-element swap mutation scheme.


The selected function is a piecewise function defined as follows:

$$f(x) = \begin{cases} -2.186x - 12.864, & -10 \le x < -2 \\ 4.246x, & -2 \le x < 0 \\ 10\, e^{-0.05x - 0.5}\, \sin[(0.3x + 0.7)x], & 0 \le x \le 10 \end{cases} \qquad (15)$$

The wavelet function we have taken is the so-called 'Gaussian-derivative' function $\psi(x) = -x\,e^{-x^2/2}$. The maximal number of hidden nodes is set to 20, i.e. the length of the control layer chromosome is 20. A wavelet network with one input node and one output node is employed. The evolutionary parameters are: PopSize = 50, $p_c$ = 0.85, $p_m$ = 0.08, MaxGen = 200. The coefficients in the fitness function (Eq. (11)) are set to $\gamma$ = 0.1 and $\beta$ = 20, and the severity coefficient in Eq. (12) is set to $\alpha$ = 0.25. A total of 2000 samples are drawn from Eq. (15); 1000 of them are used to train the network and the rest are used to test it. After 100 generations of evolution with the proposed algorithm, a wavelet network with nine nodes in the hidden layer is obtained. Figure 6 shows the results of approximating the selected piecewise function, and a comparison of the approximation performance with other methods is presented in Table 1.

Fig. 6. Approximation results for the one-dimensional function.
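For reference, a short sketch of the target function of Eq. (15) and of the sampling and train/test split described above; the coefficients are copied from the printed equation, and uniform sampling of x over [-10, 10] is our assumption.

```python
import numpy as np

def target(x):
    """Piecewise benchmark function of Eq. (15) on [-10, 10] (coefficients as printed)."""
    x = np.asarray(x, dtype=float)
    return np.piecewise(
        x,
        [x < -2, (x >= -2) & (x < 0), x >= 0],
        [lambda t: -2.186 * t - 12.864,
         lambda t: 4.246 * t,
         lambda t: 10.0 * np.exp(-0.05 * t - 0.5) * np.sin((0.3 * t + 0.7) * t)],
    )

rng = np.random.default_rng(1)
x = rng.uniform(-10.0, 10.0, size=2000)      # 2000 samples (uniform sampling assumed)
y = target(x)
x_train, y_train = x[:1000], y[:1000]        # 1000 samples to train the network
x_test, y_test = x[1000:], y[1000:]          # the remaining samples to test it
```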

4.2. Forecast of the Sunspot Time Series

The sunspot time series is a classical example of a combination of periodic and chaotic phenomena, and serves as one of the well-known benchmarks for non-stationary and nonlinear time series [23]. Much work has been done on analysing the sunspot series with linear and nonlinear methods. To allow comparison between the wavelet network and other forecasting results from the literature, the sunspot numbers for the years 1700–1920 were chosen as training samples, and single-step forecasts were made for the years 1921–1955.

The wavelet function is again the 'Gaussian-derivative' function $\psi(x) = -x\,e^{-x^2/2}$. The maximal number of hidden nodes is set to 30, i.e. the length of the control layer chromosome is 30. A wavelet network with four input nodes and one output node is employed. The evolutionary parameters are: PopSize = 70, $p_c$ = 0.80, $p_m$ = 0.10, MaxGen = 300. The coefficients in the fitness function (Eq. (11)) are set to $\gamma$ = 0.15 and $\beta$ = 18, and the severity coefficient in Eq. (12) is set to $\alpha$ = 0.30. After 150 generations of evolution with the proposed algorithm, a wavelet network with 24 nodes in the hidden layer is obtained. Figure 7 shows the forecast results for the sunspot time series, and a comparison of the forecast performance with other methods is presented in Table 1.
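The single-step forecasting set-up described above (four input nodes, one output node) amounts to training on sliding windows of the series. A hedged sketch follows, assuming the four immediately preceding yearly values are used as network inputs; the paper does not spell out the embedding.

```python
import numpy as np

def make_forecast_pairs(series, window=4):
    """Build (input, target) pairs for single-step forecasting: each input is
    `window` consecutive values and the target is the next value. window=4
    matches the four input nodes used in Section 4.2 (embedding assumed)."""
    series = np.asarray(series, dtype=float)
    inputs = np.array([series[i:i + window] for i in range(len(series) - window)])
    targets = series[window:]
    return inputs, targets

# With the yearly sunspot numbers for 1700-1955 loaded into `sunspots`:
#   X, y = make_forecast_pairs(sunspots, window=4)
# pairs drawn from 1700-1920 train the network; 1921-1955 are forecast one step ahead.
```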

Fig. 7. Forecast results of the sunspots time series.


Fig. 9. Forecast results of PP value for hydroturbine machine.

Fig. 8. Guangzhou (China) 300 MW Pumped Storage Power Unit. 1. Power house structure; 2. Upper guide bearing; 3. Generator rotor; 4. Lower guide bearing; 5. Thrust bearing; 6. Turbine Guide bearing; 7. Spiral case; 8. Runner; 9. Draft tube.

4.3. Condition Forecast of a Hydroturbine Machine

The hydroturbine machine is a 300 MW pumped storage power generator unit in Guangzhou, China (shown in Fig. 8). Its primary parameters are as follows: the operating speed is 500 rpm, the rated power is 306 MW and the weight of the rotor is 600 t. The inner diameter of the stator and the outer diameter of the rotor of the generator are 4400 mm and 4311 mm, respectively. The diameter of the shaft is 1000 mm, and the diameter of the runner is 3.985 m. The PSTA-I condition monitoring and fault diagnosis system used for this unit was designed by Tsinghua University; it can monitor 46 parameters (including vibration and non-vibration parameters) simultaneously. Although many parameters are available, only the vibration data of the upper guide bearing in the x-direction are used here as an example. From the records, 145 sets of data were selected over the 145 days from 27 December 1998, one set (512 points) per day. All the selected sets of data were sampled under the same operating conditions (pump working conditions, the same switch parameters, and a power range of 300–350 MW).

From these sets of data, a number of quantitative feature parameters such as the Peak-Peak (PP) value, the Arithmetic Mean (AM) value and the Crest Factor (CR) can be extracted. Since there are no references clearly indicating which feature parameter is most sensitive to changes in the condition of the hydroturbine, 40 commonly used feature parameters were employed in our work. The first 100 points of every feature parameter were used to train the network, and the remaining 45 points for testing. To make the results of the various feature parameters comparable, all of them are normalised to standard time series with zero mean and unit variance. Owing to space limitations, only the results for the PP value are presented in this paper. All the parameters of the wavelet network and the evolutionary algorithm are set as in Section 4.2; after 150 generations of evolution with the proposed algorithm, a wavelet network with 19 nodes in the hidden layer is obtained. Figure 9 shows the forecast results, and a comparison of the forecast performance with the other methods is presented in Table 1.
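Three of the features mentioned above have standard definitions, sketched below for a single vibration record together with the zero-mean, unit-variance standardisation described in the text; the paper's full set of 40 features is not listed, so these are only representative.

```python
import numpy as np

def peak_peak(record):
    """Peak-Peak (PP) value: maximum minus minimum of one vibration record."""
    return np.max(record) - np.min(record)

def arithmetic_mean(record):
    """Arithmetic Mean (AM) value of one record."""
    return float(np.mean(record))

def crest_factor(record):
    """Crest Factor (CR): peak absolute amplitude divided by the RMS value."""
    record = np.asarray(record, dtype=float)
    return float(np.max(np.abs(record)) / np.sqrt(np.mean(record ** 2)))

def standardise(series):
    """Normalise a feature time series to zero mean and unit variance."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std()

# e.g. one PP value per daily 512-point record, then standardised:
#   pp_series = standardise([peak_peak(rec) for rec in daily_records])
```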

From the above experimental results, we can see that a wavelet network constructed and trained by the proposed method provides better performance than the other models, not only for function approximation but also for forecasting. These results show that the proposed evolutionary algorithm for constructing and training the network is feasible and effective. In addition, it can be seen from Table 1 that having more hidden nodes (wavelet functions) does not imply better network performance. The reason is that the wavelet family used to construct the network is redundant; when such a redundant family is used, the selection of the wavelet functions (regressors) is crucial to the approximation performance of the network, and is therefore an optimisation problem with respect to the network performance. This fact also demonstrates that the proposed method is reasonable. Finally, it can be seen that the wavelet network gives a much better approximation and forecasting performance than the BP network and the time series model, for the reasons discussed in Section 1.

5. Conclusion

The wavelet network is still a feed-forward neural network, one that uses wavelets as the basis functions of an FNN; it therefore combines the merits of the wavelet transform and of neural networks. However, if a frame of the wavelet function family is used to construct the wavelet network, how to construct a sound network and how to train it efficiently become two optimisation problems that have not yet been resolved satisfactorily. In this paper, an evolutionary algorithm is proposed to deal with these optimisation problems.


Table 1. Comparison of approximation and forecast performance*.

            Function approximation        Forecast of sunspot time series   Condition forecast of hydroturbine
Model       Hidden nodes   RMS error      Hidden nodes   RMS error          Hidden nodes   RMS error
HWNN        9              0.435          24             27.4562            19             0.0617
WNN         11             0.505          29             38.3564            23             0.0839
BP          11             1.318          29             103.5678           23             0.1938
Burg        5 (order)      3.845          11 (order)     168.3582           11 (order)     0.4245

*HWNN: the wavelet network obtained by the proposed method; WNN: the wavelet network of [10]; BP: the BP network of [10]; Burg: the AR model with order selected by the AIC criterion.

In this evolutionary algorithm, a hierarchical chromosome, comprising a control layer chromosome and a parameter layer chromosome, is adopted to encode the network structure and parameters. Using such a hierarchical chromosome, the proposed algorithm combines GAs and EP to achieve network construction and training simultaneously through evolution. By this method, the structure of the wavelet network becomes more reasonable, and the local minimum problem in the training process is overcome effectively; the resulting wavelet network therefore gives better approximation and forecasting performance. Numerical examples on function approximation, sunspot time series forecasting and condition forecasting for a hydroturbine machine have been presented. The experimental results show that the proposed method for the construction and training of the wavelet network is feasible and effective. The study also indicates that the proposed method has the potential to solve a wide range of neural network construction and training problems in a systematic and robust way.

Acknowledgements. This research is financially supported by the National Natural Science Foundation of China (Grant No. 50105007 and Grant No. 19990510).

References

1. Cao P, Cui D, Zhang Z. A real-time fault detection algorithm based on time series analysis for LRE. Journal of Propulsion Technology 1996; 17(1): 33–36
2. Tse PW, Atherton DP. Prediction of machine deterioration using vibration based fault trends and recurrent neural networks. Trans ASME, Journal of Vibration and Acoustics 1999; 121: 355–362
3. Park J, Sandberg IW. Universal approximation using radial-basis-function networks. Neural Computation 1991; 3(2): 246–257
4. Gabriela A. Applications of the approximation theory by neural networks. Neural Network World 2000; 10(5): 787–795
5. He Q, Jennie S, Daniel T. Prediction of top-oil temperature for transformers using neural networks. IEEE Trans Power Delivery 2000; 15(4): 1205–1211
6. Cybenko G. Approximation by superposition of a sigmoidal function. Mathematics of Control, Signals and Systems 1989; 2(1): 303–314
7. Chui CK. Wavelets: A Tutorial in Theory and Applications. New York: Academic, 1992
8. Delyon B, Juditsky A. Accuracy analysis for wavelet approximations. IEEE Trans Neural Networks 1995; 6(2): 332–348
9. Zhang J, Walter GG, Miao Y. Wavelet neural networks for function learning. IEEE Trans Signal Processing 1995; 43(6): 1485–1496
10. Zhang Q, Benveniste A. Wavelet networks. IEEE Trans Neural Networks 1992; 3(6): 889–898
11. Zhang Q. Using wavelet network in nonparametric estimation. IEEE Trans Neural Networks 1997; 8(2): 227–236
12. Daubechies I. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: SIAM, 1992
13. Mallat SG. Multiresolution approximations and wavelet orthonormal bases of L2(R). Trans Am Math Soc 1989; 315(1): 69–81
14. Wong K, Leung AC. On-line successive synthesis of wavelet networks. Neural Processing Letters 1998; 7: 91–100
15. Kan K, Wong K. Self-construction algorithm for synthesis of wavelet networks. Electronics Letters 1998; 34(20): 1953–1955
16. Yao S, Wei CJ, He ZY. Evolving wavelet networks for function approximation. Electronics Letters 1996; 32(4): 360–361
17. He Y, Chu F, Zhong B. Artificial neural networks design and implementation based on evolutionary computation. Control and Decision 2001; 16(3): 257–262
18. Holland JH. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: The University of Michigan Press, 1975
19. Fogel LJ. Artificial Intelligence through Simulated Evolution. New York: Wiley, 1966
20. Fogel DB. An introduction to simulated evolutionary optimisation. IEEE Trans Neural Networks 1994; 5(1): 3–14
21. Back T, Hammel U, Schwefel HP. Evolutionary computation: comments on the history and current state. IEEE Trans Evolutionary Computation 1997; 1(1): 3–17
22. Kirkpatrick S, Gelatt CD, Vecchi MP. Optimisation by simulated annealing. Science 1983; 220: 671–680
23. Pandit SM, Wu SM. Time Series and System Analysis with Applications. New York: Wiley, 1983