Forecasting Time Series by Bayesian Neural Networks

Tieling Zhang, Arkira Fukushige
HAL Corporation, 6-21-17-701 Nishikasai, Edogawa-Ku, Tokyo 134-0088, Japan

Abstract — A brief review of the application and development of neural networks for time series forecasting is presented, with the emphasis placed on Bayesian neural networks (BNNs) and Bayesian evolutionary neural trees (BENTs). Furthermore, the BENT technique is applied to forecasting the daily balance of customers' accounts in a branch of a bank.

1. INTRODUCTION

In the real world, it is often necessary to forecast the future states and development tendencies of events. Doing so helps us take effective actions and measures in order to gain as much as possible in the near future. Concerning financial time series, for example, understanding the dynamics of these processes well enough to predict the direction and magnitude of change could lead to significant profit. Time series analysis has therefore attracted much research interest for a long time, and many techniques have been developed [1]-[4]. These techniques include traditional statistical methods such as the TAR (threshold autoregressive) model [5] and the Box-Jenkins method [6], as well as genetic algorithms (GAs) and neural networks. Among them, neural networks are perhaps the most significant forecasting tool applied to the financial and business markets in recent years. Their practicability for economic forecasting has already been demonstrated in a variety of applications, such as stock market and currency exchange rate prediction, market analysis and forecasting of political-economy time series [7]-[11].

In the present paper, a brief review of the application of neural networks to time series prediction is given, covering feed-forward neural networks (NNs), recurrent NNs, neuro-fuzzy networks, evolutionary artificial NNs, neuro-wavelet networks, Bayesian NNs and Bayesian evolutionary neural trees, among others. This review is given in Section 2. Bayesian NNs and BENTs, the main subject of this paper, are addressed in Section 3. BENT models developed for forecasting the daily balance of customers' accounts in a branch of a bank are discussed in Section 4. Section 5 summarizes the paper.

2. A BRIEF REVIEW ON THE APPLICATION OF NEURAL NETWORKS TO TIME SERIES FORECASTING

The neural networks first used for time series analysis were feed-forward NNs, such as the one reported by White [12] in 1988. In [12], a two-layer neural network was applied to a series of 1000 IBM stock returns. Rather than to obtain predictions, it was used to test the efficient market hypothesis. At that time, there was no clear evidence that neural networks could be the best model for financial time series.


Later, Bosarge (1993) [13] suggested an expert system with a neural network at its forecasting core. He found significant nonlinearities in different time series (S&P 500, Crude Oil, Yen/Dollar, Eurodollar, and Nikkei index), and an expert system coupled with a neural network was able to improve forecast quality considerably. Similar results have also been reported by Wong (1990) [14], Tsibouris and Zeidenberg (1995) [15], and other researchers [16].

Neural networks have recently emerged as a successful tool in the fields of sequence recognition and prediction (see, e.g., [17], [18]). This is due to the versatility of a three-layer feed-forward neural network in approximating an arbitrary static nonlinearity, and to the computational efficiency of the back-propagation algorithm [19]. In addition, the promise of neural networks comes from their ability to capture unanticipated features of a time series without requiring insight into the underlying mechanism, by virtue of their function-approximation capability. Moreover, neural networks have a direct relationship with classical statistical models, such as the Box-Jenkins models [6], one of the most widespread classes of models for time series prediction. These characteristics provide the basic motivation for using neural networks in time series prediction.

Using feed-forward multilayer (perceptron) networks for time series prediction is straightforward, because the task is to take a series of observations ending at time t and seek an accurate estimate of the value at time (t+1); it is of course assumed that the process is predictable in some way. Normally, the observed values are fed into the designed network through the input layer and the back-propagation algorithm is used to train the network (a minimal sketch of this windowed set-up is given below).

Feed-forward NNs can realize a finite impulse response but cannot store information for an indefinite time. They learn a memoryless transformation that captures the dependence of the current value of the series on the specified past observations; that is, they are well suited to stationary time series. The method, however, may fail when the temporal contingencies span unknown intervals. In addition, feed-forward NNs are inadequate for handling certain classes of time series [19]. Several modifications have therefore been made to overcome these limitations. The Jordan-Elman architecture [18] is an example in which a limited set of carefully chosen recurrent connections is used. These networks do not try to assign credit back through time but instead use the previous state as part of the current input. Such a simple approach may be seen as a natural extension of feed-forward networks.
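To make the windowed set-up above concrete, the following is a minimal sketch, not taken from the paper: a small feed-forward network with one tanh hidden layer is trained by batch gradient descent to map the last p observations to the value at the next time step. The synthetic series, window size, layer width and learning rate are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
series = np.sin(np.arange(400) * 0.1) + 0.1 * rng.standard_normal(400)  # toy series

p, hidden, lr = 8, 12, 0.05                       # window size, hidden units, step size
X = np.array([series[t:t + p] for t in range(len(series) - p)])  # sliding windows
y = series[p:]                                    # value at time t+1 for each window

W1 = rng.standard_normal((p, hidden)) * 0.1       # input -> hidden weights
b1 = np.zeros(hidden)
W2 = rng.standard_normal(hidden) * 0.1            # hidden -> output weights
b2 = 0.0

for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)                      # hidden activations
    pred = h @ W2 + b2                            # linear output unit
    err = pred - y                                # one-step-ahead prediction error
    # gradients of the mean squared error, back-propagated through tanh
    gW2 = h.T @ err / len(y)
    gb2 = err.mean()
    gh = np.outer(err, W2) * (1 - h ** 2)
    gW1 = X.T @ gh / len(y)
    gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print("training MSE:", np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))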

In order to make neural networks adaptive to more complex stochastic processes, another class of neural networks, referred to broadly as recurrent networks, has developed rapidly. Roughly speaking, recurrent networks are networks with one or more cycles. The presence of cycles leads naturally to analyzing the network as a dynamic system, in which the state of the network at one moment depends on the state at the previous moment. In some cases, however, it is more natural to view the cycles as specifying simultaneous constraints that the nodes of the network must satisfy; from this point of view the cycles need not involve any analysis of time-varying behavior. These two points of view can in principle be reconciled by thinking of the constraints as specifying the equilibrium states of a dynamic system. References [20] and [21] give an extensive overview of the types of recurrences, time windows and time delays in neural networks. By combining several types of feedback and delay, one can obtain general multi-recurrent networks (MRNs). For details concerning architectures and the theoretical background of recurrent neural networks (RNNs) applied to time series prediction, one can refer to [19], [21]-[24]. Recurrent networks have an advantage over feed-forward nets in dealing with time-varying stochastic processes (a minimal recurrent state update is sketched at the end of this passage).

It is well known that the success of a neural architecture in solving a particular problem (or class of problems) depends critically on the network topology (or on the structure of the phenotype in an evolutionary design system). For example, a multilayer perceptron (MLP) can only learn a static input-output mapping, even though MLPs are frequently considered for time series analysis. Hence, a pure feed-forward neural network is incapable of discovering or responding to temporal dependencies in its environment; a recurrent network is needed for this task [25].

Though neural networks are more powerful than stochastic methods for time series prediction, their drawback is that designing an efficient architecture and choosing the parameters involved require long processing time. In fact, learning neural network weights can be considered a hard optimization problem for which the learning time scales exponentially as the problem size grows [26], which becomes prohibitive. It seems natural, then, to devote attention to automatic procedures that exploit more powerful techniques for efficiently searching the space of network architectures [27].

After the type of architecture of a neural network has been determined for a specified problem, it is important to exploit an efficient learning method for training the network. Heuristic algorithms are often utilized in the training process [28]-[29]. More recently, genetic algorithms (GAs) have been widely used to optimize neural networks [30]-[32]. In [33], GAs and simulated annealing are employed to optimize the neural network architecture and to train the network with a particular variant of back-propagation. In [34], Breeder Genetic Algorithms (BGAs) [35]-[36] were demonstrated to be superior to GAs in designing neural networks for nonlinear system identification. In [27], BGAs are utilized to deal with network topology optimization and, at the same time, the choice of the best technique to update the weights in back-propagation training.
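Returning to the recurrent architectures described above, the following minimal sketch (illustrative, not one of the cited architectures) shows the essential Elman-style step: the previous hidden state is fed back as part of the current input, so the network carries a summary of the past rather than a fixed window. The weights are left random here; training would normally use back-propagation through time or a related algorithm.

import numpy as np

rng = np.random.default_rng(1)
series = np.sin(np.arange(100) * 0.2)             # toy series

n_hidden = 6
W_in = rng.standard_normal((1, n_hidden)) * 0.3   # input -> hidden
W_rec = rng.standard_normal((n_hidden, n_hidden)) * 0.3  # hidden -> hidden (recurrence)
W_out = rng.standard_normal(n_hidden) * 0.3       # hidden -> output

h = np.zeros(n_hidden)                            # context: previous hidden state
predictions = []
for x_t in series[:-1]:
    h = np.tanh(np.array([x_t]) @ W_in + h @ W_rec)   # new state depends on past state
    predictions.append(float(h @ W_out))              # one-step-ahead estimate

print("first five (untrained) predictions:", np.round(predictions[:5], 3))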


Furthermore, BGAs are used to optimize the related parameters of the time series prediction problem. Of course this procedure does not guarantee that the optimal architecture is found, but it allows one to rapidly attain an architecture close to the optimal performance.

Many researchers are interested in evolving artificial neural networks (ANNs) by means of GAs. The combination of ANNs and evolutionary search procedures is referred to as evolutionary ANNs (EANNs), in which evolution is another fundamental form of adaptation in addition to learning [31], [37]. A distinct feature of EANNs is their adaptability to a dynamic environment; that is, EANNs can adapt to an environment as well as to changes in the environment. In a broader sense, EANNs can be regarded as a general framework for adaptive systems, i.e., systems that can change their architectures and learning rules appropriately without human intervention [37]. Detailed reviews of EANNs can be found in [31] and [37]. Chen and Lu [38] introduced several application examples of EANNs in financial markets. In particular, a genetic adaptive neural network (GANN) is able to approximate, with high accuracy, the complex and nonlinear option-pricing function used to produce simulated option prices.

Time series prediction has very important practical applications in a diverse range of fields. As a result, there has been considerable interest in applying intelligent technologies to time series analysis, because the studied systems are nonlinear and nonstationary stochastic processes. These technologies include neural networks and fuzzy logic. Fuzzy reasoning is capable of handling imprecise and uncertain information, while neural networks are capable of treating information contained in real plant data. The combination of the advantages of both fuzzy reasoning and neural networks is referred to as neuro-fuzzy networks or fuzzy neural networks (FNNs). Fuzzy neural networks have demonstrated superior prediction capabilities compared with conventional neural networks [39]-[40]. Maguire et al. [41]-[42] proposed a three-layer fuzzy neural network with a nine-input single-output structure in which each input domain is partitioned into two fuzzy sets. This architecture can readily be implemented in the Matlab neural network toolbox and trained using conventional back-propagation algorithms. Maguire et al. [43] applied this network to chaotic time series prediction. The advantage of this network results from the approximation that the number of input fuzzy sets can represent the number of rules in a fuzzy system. A conventional fuzzy reasoning system that has n inputs, with each input domain partitioned into p fuzzy sets, requires p^n rules; for the nine-input network above with two fuzzy sets per input, that would be 2^9 = 512 rules. This relationship between computational complexity and problem dimension is one of the drawbacks of a conventional fuzzy approach, commonly referred to as the "curse of dimensionality" [44].

Singh and Quek [45] developed a novel self-organizing fuzzy neural network based on the Yager Rule of Inference [46], named the POP-Yager FNN. Its structure is a combination of two three-layer feed-forward networks. Training of the POP-Yager FNN consists of two phases; this two-phase learning process effectively configures the network without any need for fine tuning. The POP-Yager network is unique in its ability to handle both crisp and fuzzy data effectively.

It has been used for traffic flow prediction in Singapore [45]. There seem to be relatively few practical examples of applying FNNs to time series prediction, and their adaptability and effectiveness for chaotic time series still need to be studied.

Moreover, in order to improve the prediction accuracy of neural networks, wavelet transform techniques have begun to be applied to time series forecasting. In [47], the wavelet technique is combined with a Dynamical Recurrent Neural Network (DRNN); this can be referred to as a neuro-wavelet network. First, the wavelet transform is used to decompose the time series into varying scales of temporal resolution. This provides a sensible decomposition of the data so that the underlying temporal structures of the original time series become more tractable. Then, a DRNN is trained on each resolution scale with the temporal-recurrent back-propagation (TRBP) algorithm. By virtue of its internal dynamics, this general class of dynamically connected network approximates the underlying law governing each resolution level by a system of nonlinear differential equations. The individual wavelet-scale forecasts are afterwards recombined to form the current estimate. The predictive ability of this approach was assessed on the sunspot series.

Neural networks have proved to be one of the most powerful tools for time series forecasting. However, two main difficulties must be dealt with: one is controlling the complexity of the model, and the other is that conventional neural network models lack tools for analyzing the output results, e.g., confidence intervals, confidence levels (95% or 5%) and quantiles. The Bayesian approach provides a consistent way to do inference by combining the evidence from the data with prior knowledge about the problem. Bayesian methods use probability to quantify uncertainty in inferences, and the result of Bayesian learning is a probability distribution expressing our beliefs about how likely the different predictions are. Predictions are made by integrating over the posterior distribution. In case of insufficient data the prior dominates the solution, and the effect of the prior diminishes with increased evidence from the data [48]. The combination of the Bayesian approach with neural networks is referred to as Bayesian neural networks (BNNs). MacKay and Neal are two of the researchers foremost in introducing the Bayesian approach to training neural networks. MacKay [49] introduced a Bayesian approach based on a Gaussian approximation. Neal [50] adopted a hybrid Monte Carlo method that facilitates Bayesian learning for neural networks with no approximations. The main advantages of Bayesian neural networks are [48]:
• Automatic complexity control: Bayesian inference techniques allow the values of regularization coefficients to be selected using only the training data, without the need for separate training and validation data.
• The possibility to use prior information and hierarchical models for the hyper-parameters.
• Predictive distributions for the outputs.
In the following section, BNNs, as well as one specific type of them named Bayesian evolutionary neural trees, are described in a little more detail.


3. BAYESIAN NEURAL NETWORKS

3.1 Bayesian Approach to Neural Networks

The Bayesian approach is based on Bayes' rule, which is used both for learning and for the choice of neural network architectures. Consider a multivariate regression problem involving prediction of a noisy vector y of target variables given the values of a vector x of input variables. Bayesian learning starts from defining a model and a prior distribution P(θ) for the model parameters. The prior distribution expresses our initial beliefs about the parameter values before any data are observed. After new data D = {(x(1), y(1)), …, (x(n), y(n))} are observed, the prior distribution is updated to the posterior distribution using Bayes' rule

P(θ|D) = P(D|θ)·P(θ) / P(D) ∝ L(θ|D)·P(θ),   (1)

where L(θ|D) is the likelihood function of the unknown model parameters given the observed data. In the case of independent and exchangeable data points, the likelihood function is

L(θ|D) = ∏_{i=1}^{n} P(y(i) | x(i), θ),   (2)

where n is the number of data points.
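As a concrete illustration of Eqs. (1)-(2), the following sketch (a toy example, not the paper's model) evaluates the unnormalized log-posterior of a one-hidden-unit regression network under Gaussian observation noise and a Gaussian prior on the parameters; the noise level sigma and the prior width are illustrative assumptions.

import numpy as np

def model(theta, x):
    # theta = (w1, b1, w2, b2): a single tanh hidden unit with a linear output
    w1, b1, w2, b2 = theta
    return w2 * np.tanh(w1 * x + b1) + b2

def log_posterior(theta, x, y, sigma=0.1, prior_std=1.0):
    resid = y - model(theta, x)
    log_lik = -0.5 * np.sum((resid / sigma) ** 2)        # Eq. (2) with Gaussian noise, up to constants
    log_prior = -0.5 * np.sum((theta / prior_std) ** 2)  # Gaussian prior P(theta)
    return log_lik + log_prior                           # log of Eq. (1), unnormalized

x = np.linspace(-1, 1, 20)
y = np.tanh(2 * x) + 0.1 * np.random.default_rng(2).standard_normal(20)
print(log_posterior(np.array([2.0, 0.0, 1.0, 0.0]), x, y))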

To predict the new output y(n+1) for a new input x(n+1), the predictive distribution is obtained by integrating the predictions of the model with respect to the posterior distribution of the model parameters,

P(y(n+1) | x(n+1), D) = ∫_Ω P(y(n+1) | x(n+1), θ) P(θ|D) dθ,   (3)

where Ω is the space of all possible parameters. Note that the predictive distribution for y(n+1) is implicitly conditioned on hypotheses that hold throughout; to be more explicit, it can be written as [51]

P(y(n+1) | D, H) = ∫_Ω P(y(n+1) | x(n+1), θ, H) P(θ | D, H) dθ,   (4)

where H refers to the set of hypotheses or assumptions used to define the model. In practice, the posterior distribution of the parameters in Eq. (4) is very complex, with many modes, and evaluating the above integral is a difficult task. Neal [50] introduced the Markov Chain Monte Carlo (MCMC) method to perform this kind of difficult integration, and MCMC methods have since been utilized by other authors [51].
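A minimal sketch of how the predictive integrals (3)-(4) can be approximated numerically: a simple random-walk Metropolis sampler (a stand-in for the hybrid Monte Carlo method of Neal [50]) draws parameter samples from the posterior of a toy one-hidden-unit model, and the model's predictions are averaged over those samples. The model, step size and chain length are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 20)
y = np.tanh(2 * x) + 0.1 * rng.standard_normal(20)
sigma = 0.1                                       # assumed observation-noise level

def model(theta, x_in):
    w1, b1, w2, b2 = theta                        # toy one-hidden-unit network
    return w2 * np.tanh(w1 * x_in + b1) + b2

def log_post(theta):
    resid = y - model(theta, x)
    return -0.5 * np.sum((resid / sigma) ** 2) - 0.5 * np.sum(theta ** 2)

theta = np.zeros(4)
lp = log_post(theta)
samples = []
for i in range(20000):
    prop = theta + 0.05 * rng.standard_normal(4)  # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:       # Metropolis accept/reject step
        theta, lp = prop, lp_prop
    if i >= 5000 and i % 10 == 0:                 # keep thinned post-burn-in samples
        samples.append(theta.copy())

# Eq. (3): average the model over posterior samples; adding noise draws gives
# samples from the predictive distribution of y(n+1) itself.
x_new = 0.5
preds = np.array([model(s, x_new) for s in samples])
preds = preds + sigma * rng.standard_normal(len(preds))
print("predictive mean:", preds.mean(), "95% interval:", np.percentile(preds, [2.5, 97.5]))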

In most cases, Bayesian techniques are utilized for MLP learning. However, two other kinds of BNNs deserve more attention for the prediction of sophisticated time series: dynamic Bayesian NNs and Bayesian evolutionary neural trees (BENTs).

3.2 Dynamic Bayesian Neural Networks

Kjærulff [52] gave a detailed description of dynamic Bayesian networks, including their definition and theoretical basis. Many scientific fields (e.g. medical, economic, biological) involve repeated observations of a collection of random quantities. For these domains, a dynamic model is preferred. In fact, the estimation of probability distributions of domain variables based on appropriate prior knowledge and observation of other domain variables is reliable only for a limited period of time; furthermore, upon the arrival of new observations, both these and the old observations must be taken into account in the reasoning process. Thus, coping with such dynamic systems using Bayesian networks requires interconnecting multiple instances of static networks. Obviously, as time evolves, new 'slices' must be added to the model and old ones cut off. This introduces the notion of dynamic Bayesian networks (DBNs). In general, a dynamic model may be defined as a sequence of sub-models, each representing the state of a dynamic system at a particular point or interval in time; henceforth, such a time instance will be referred to as a time slice. Thus, a DBN consists of a series of, most often structurally identical, sub-networks interconnected by temporal relations. Making estimates of the variables of a dynamic system in a way that makes full use of the information about past observations requires a compact representation of this information [52]. A DBN is schematically depicted in figure 1: each time slice t1, …, tN+1 has an associated sub-model p1, …, pN+1, represented as a junction tree within a directed acyclic graph; the model pN defines the current time window, backward smoothing carries information back to earlier slices, and the next step is forecast from the current time window.

Fig. 1 A DBN consists of a series of models p1, …, pN+1, where the model pN defines a time window [52]

3.3 Bayesian Evolutionary Neural Trees (BENTs)

A neural tree is a tree-structured network composed of nodes and of weights on the links connecting pairs of nodes. The architecture of a neural tree is shown in figure 2. Terminal nodes are those that have no inputs, whereas nonterminal nodes have inputs as well as an output. The nonterminal nodes represent neural units, each having a neuron type; a neuron type is an element of the basis function set F = {neuron types}. Each terminal node is labeled with an element from the terminal set T = {x1, x2, …, xn}, where xi is the ith component of the external input vector x. Each link (j, i) represents a direct connection from node j to node i, where node i is the parent of node j and j is a child of node i. A weight value wij is associated with each link. In a neural tree, the root node is an output unit. The depth of a neural tree, dmax, is defined as the longest path length from the root node to any terminal node of the tree.

Each nonterminal node receives input signals from lower nodes and has a single output, which is calculated according to the neuron type of the node. One of the most popular neuron types is the sigma unit, which computes the sum of the weighted inputs from the lower layer,

net_i = ∑_j w_ij y_j,   (5)

where y_j are the inputs to the ith neuron. Another useful neuron type is the Pi unit, which calculates the product of the weighted inputs from its lower layer,

net_i = ∏_j w_ij y_j.   (6)

The output of a neuron is computed by the sigmoid transfer function

y_i = f(net_i) = 1 / (1 + e^(−net_i)),   (7)

where net_i is the net input to the unit computed by Eq. (5) or (6). A typical neural tree is shown in figure 2, where F = {∑, ∏}, T = {x1, x2, …, x6}, and dmax = 5.

Fig. 2 Architecture of a neural tree
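A minimal sketch of how a neural tree with the sigma and Pi units of Eqs. (5)-(7) can be evaluated: each nonterminal node combines its weighted child outputs by a sum or a product and passes the result through the sigmoid. The small example tree and its weights are illustrative, not the tree of figure 2.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))              # Eq. (7)

def evaluate(node, x):
    """node is ('x', i) for a terminal, or (kind, [(w, child), ...]) with kind in {'sum', 'prod'}."""
    if node[0] == 'x':
        return x[node[1]]                           # terminal: component of the input vector
    kind, children = node
    terms = [w * evaluate(child, x) for w, child in children]
    net = np.sum(terms) if kind == 'sum' else np.prod(terms)   # Eq. (5) or Eq. (6)
    return sigmoid(net)

# root is a sigma unit over one terminal and one Pi subtree
tree = ('sum', [(0.8, ('x', 0)),
                (1.5, ('prod', [(0.5, ('x', 1)), (-0.7, ('x', 2))]))])
print(evaluate(tree, np.array([0.2, 1.0, 0.4])))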

3.3.1 Bayesian Inference

Given a set of observed training data D, the posterior probability P(A|D) of each model A (a neural tree) can be calculated, following Eqs. (1)-(3), as

P(A|D) = P(D|A)·P(A) / P(D),   (8)

where P(A) is the prior probability of the model, P(D|A) is the likelihood of the model given the data, and P(D) is computed as

P(D) = ∫ P(D|A) P(A) dA.   (9)

With this posterior probability, the expected value of the output for unknown data x can be calculated as

E[f_A(x)] = ∫_Ω f_A(x) P(A|D) dA,   (10)

where f_A is the function implemented by a model A and Ω is the space of all possible models A. In applications, however, it is not easy to evaluate the integral in Eq. (10), as discussed in Section 3.1, and numerical calculation is impossible, especially for a high-dimensional function f_A. This value can be approximated by Bayesian evolutionary algorithms.


3.3.2 Bayesian Evolution of Neural Trees

Zhang et al. [53]-[54] proposed evolving neural trees by Bayesian evolutionary algorithms. The algorithms start with an initial population of individuals and iteratively produce the next generation of fitter individuals. The performance of the individuals is measured by a fitness function. Usually, the fitness is measured on a set of fitness cases, i.e. the training data

D = {(x_c, y_c)}, c = 1, …, N.   (11)

An individual can be considered as a model A that describes a time series problem by the input-output mapping

y_c = f(x_c) + ε(x_c),   (12)

where the noise ε(x_c) is assumed to be zero-mean Gaussian. The next generation of individuals is then selected according to their fitness values on the data D, and new generations are produced repeatedly until the termination condition is satisfied. The evolutionary procedure can be outlined in the following steps (a code sketch is given after the step list):

1) Generate an initial population A(0) of γ individuals from the prior distribution of models; for example, the number of nodes ki of each tree is drawn from a Poisson distribution and the weights wi are set from a Gaussian distribution. Set g ← 1.

2) Estimate the likelihood of each individual in the population. In generation g, the error Ei(g) of each neural tree is evaluated as


Ei(g) = ∑_{c=1}^{N} (y_c − f_{Ai}(x_c))²,   (13)

where N is the number of input vectors. This value is used to calculate the likelihood of each individual in the population.

3) Calculate the posterior probability of each Ai in the population A(g) by Eq. (8) and update the best model as

A_map^g = max{ A_map^(g−1), arg max_{Ai ∈ A(g)} P(Ai|D) }.   (14)

4) Generate λ offspring into a population A′(g) by sampling from the expected offspring distribution

P(Ai′|D) = ∑_{Ai ∈ A(g)} P(Ai|D) P(Ai′|Ai),   (15)

where the transition probability P(Ai′|Ai) is determined by the probabilities of crossover, replacement and mutation.

5) Select γ individuals from A′(g) into the next generation with acceptance probability

Pa(Ai, Ai′) = min{1, P(Ai′|D) / P(Ai|D)}.   (16)

Once a neural tree is selected, the weights of the tree are adjusted by stochastic hill-climbing: all components of the weight vector w are perturbed once at random as wj′ = wj + N(0, 1), j = 1, 2, …, ki−1, where ki is the number of nodes in tree Ai and N(0, 1) is the standard normal distribution.

6) If the termination condition is satisfied, return the optimal model A_MAP and the posterior distribution P(A_MAP|D). Otherwise, update the priors with the current distributions.

7) Set g ← g+1, and continue with step 2.
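The following sketch illustrates the flavour of the loop above under strong simplifying assumptions: fixed-length weight vectors of a toy one-hidden-unit model stand in for neural trees, selection is reduced to fitness-proportional resampling, and only the weight perturbation of step 5 is carried out (structural crossover, replacement and mutation of trees are omitted). Population size, noise level and generation count are illustrative.

import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 30)
y = np.tanh(2 * x) + 0.05 * rng.standard_normal(30)
sigma, gamma, n_gen = 0.05, 20, 50                       # noise std, population size, generations

def f(theta, x_in):                                      # the model A implemented by an individual
    w1, b1, w2, b2 = theta
    return w2 * np.tanh(w1 * x_in + b1) + b2

def log_score(theta):
    err = np.sum((y - f(theta, x)) ** 2)                 # error E_i as in Eq. (13)
    log_lik = -err / (2 * sigma ** 2)                    # Gaussian-noise likelihood
    log_prior = -0.5 * np.sum(theta ** 2)                # Gaussian prior on the weights
    return log_lik + log_prior                           # log of Eq. (8), unnormalized

pop = [rng.standard_normal(4) for _ in range(gamma)]     # step 1: initial population
best = max(pop, key=log_score)
for g in range(n_gen):
    scores = np.array([log_score(t) for t in pop])       # step 2: evaluate individuals
    best = max([best] + pop, key=log_score)              # step 3 / Eq. (14): keep the MAP model
    probs = np.exp(scores - scores.max()); probs /= probs.sum()
    parents = [pop[i] for i in rng.choice(gamma, size=gamma, p=probs)]  # step 4 (simplified)
    pop = [t + rng.standard_normal(4) for t in parents]  # step 5: perturb the weights

print("best model error:", np.sum((y - f(best, x)) ** 2))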


4. APPLICATION OF BENTs

We design BENT models to forecast the time series of the balance of customers' accounts in a branch of a bank. The purpose is to forecast the minimum amount of cash to be reserved over a future period of time in order to satisfy customers' withdrawals at any time without default. The data were selected from daily records. The first 300 data points are used to evolve the neural trees and the remaining 200 data points are used to test the predictive accuracy. The BENT models are set up as follows: a nonterminal node has fewer than 5 input branches, the input size is set to 25, the standard deviation of the noise is 0.05, the depth of a tree is limited to 6, and the maximum number of evaluations is 10^7. Each candidate population is set to 1.2 times the size of the parent population. Because the work is ongoing, these and other parameters are subject to change according to the required accuracy; a sketch of the setup is given below. The details of the work are presented at the symposium and will be given in a separate paper.
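As a rough illustration only, the stated settings can be collected in a configuration sketch; the variable names and the synthetic placeholder series below are assumptions, since the bank data are not given in the paper.

import numpy as np

config = {
    "max_branches_per_node": 5,   # a nonterminal node has fewer than 5 input branches
    "input_size": 25,             # number of past daily balances fed to each tree
    "noise_std": 0.05,            # assumed standard deviation of the observation noise
    "max_tree_depth": 6,
    "max_evaluations": 10 ** 7,
    "offspring_ratio": 1.2,       # candidate population = 1.2 x parent population
}

balance = np.cumsum(np.random.default_rng(5).standard_normal(500))  # placeholder for the daily balances
train, test = balance[:300], balance[300:]                          # 300 points to evolve trees, 200 to test
print(len(train), len(test), config["input_size"])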

5. SUMMARY

A brief review of the application of neural networks to time series forecasting has been presented. Some other aspects are not expounded here, although they are also important for the problems involved. Bayesian evolutionary NNs are promising techniques that will be further developed and widely used for time series forecasting, because most time series of current concern arise from complex and nonstationary stochastic processes. DBNNs and BENTs involve hybrid Monte Carlo methods that need a powerful computing system. In recent years, concurrent computation techniques have advanced considerably; by taking advantage of the power of an MPP (massively parallel processing) supercomputer, exploratory data analysis of time series problems can be performed in a quick and efficient manner.

References

[1] S. E. Makridakis, "The accuracy of extrapolation (time series) methods: Results of a forecasting competition," Journal of Forecasting, vol. 1, pp. 111-153, 1982.
[2] E. Chatfield, The Analysis of Time Series, fourth edition, Chapman and Hall, New York, 1991.
[3] M. Berthold and D. Hand (eds.), Intelligent Data Analysis: An Introduction, Springer, 1999.
[4] A. S. Weigend and N. A. Gershenfeld (eds.), Time Series Prediction: Predicting the Future and Understanding the Past, Addison-Wesley, Redwood City, CA, 1994.
[5] H. Tong and K. Lim, "Threshold autoregression, limit cycle and cyclical data," Journal of the Royal Statistical Society B, 42:245, 1980.
[6] G. E. P. Box, G. M. Jenkins and G. C. Reinsel, Time Series Analysis: Forecasting and Control, third edition, Prentice Hall, Englewood Cliffs, NJ, 1994.
[7] F. M. Thiesing and O. Vornberger, "Sales Forecasting Using Neural Networks," Proceedings ICNN'97, Houston, Texas, 9-12 June 1997, vol. 4, pp. 2125-2128, IEEE.
[8] F. M. Thiesing, U. Middelberg and O. Vornberger, "A Neural Network Approach for Predicting the Sale of Articles in Supermarkets," EUFIT'95, Third European Congress on Intelligent Techniques and Soft Computing, Aachen, Germany, 28-31 Aug 1995.
[9] A. Refenes, M. Azema-Barac, L. Chen and S. Karoussos, "Currency exchange rate prediction and neural network design strategies," Neural Computing & Applications, 1(1):46-58, 1993.
[10] C. Haefke and C. Helmenstein, "Neural Networks in the Capital Markets: An Application to Index Forecasting," Computational Economics, vol. 9, pp. 37-50, 1996.
[11] I. Jagielska and J. Jaworski, "Neural network for predicting the performance of credit card accounts," Computational Economics, vol. 9, pp. 77-82, 1996.
[12] H. White, "Economic Prediction Using Neural Networks: The Case of IBM Daily Stock Returns," Proceedings of the IEEE International Conference on Neural Networks II, pp. 451-458, 1988.
[13] W. E. Bosarge, "Adaptive Processes to Exploit the Nonlinear Structure of Financial Market," in R. R. Trippi and E. Turban (eds.), Neural Networks in Finance and Investing, Probus Publishing, pp. 371-402, 1993.
[14] F. S. Wong, "Time Series Forecasting Using Backpropagation Neural Networks," Neurocomputing, vol. 2, pp. 147-159, 1990.
[15] G. Tsibouris and M. Zeidenberg, "Testing the Efficient Markets Hypotheses with Gradient Descent Algorithms," in A. P. Refenes (ed.), Neural Networks in the Capital Markets, Wiley, pp. 127-136, 1995.
[16] R. Herbrich, M. Keilbach, T. Graepel, P. Bollmann-Sdorra and K. Obermayer, "Neural Networks in Economics: Background, Applications and New Developments," Advances in Computational Economics, 11:169-196, 1999.
[17] A. Aussem, F. Murtagh and M. Sarazin, "Dynamic recurrent neural networks and pattern recognition methods for time series prediction: Application to seeing and temperature forecasting in the context of ESO's VLT Astronomical Weather Station," Vistas in Astronomy: Special Issue on Neural Networks, 1994.
[18] M. C. Mozer, "Neural net architectures for temporal sequence processing," in A. S. Weigend and N. A. Gershenfeld (eds.), Time Series Prediction: Forecasting the Future and Understanding the Past (SFI Studies in the Sciences of Complexity), Addison-Wesley, Redwood City, CA, pp. 243-264, 1993.
[19] A. Aussem, F. Murtagh and M. Sarazin, "Dynamical recurrent neural networks – Towards environmental time series prediction," Int. J. of Neural Systems, vol. 6, no. 2, pp. 145-170, 1995.
[20] C. Ulbricht, "Multi-Recurrent Networks for Traffic Forecasting," Proc. of the Twelfth National Conference on Artificial Intelligence, AAAI Press/MIT Press, Cambridge, MA, pp. 883-888, 1994.
[21] C. Ulbricht, "State Formation in Neural Networks for Handling Temporal Information," Dissertation, Institut fuer Med. Kybernetik u. AI, Univ. Vienna, 1995.
[22] G. Dorffner, "Neural Networks for Time Series Processing," Neural Network World, 6(4), pp. 447-468, 1996.
[23] T. Koskela, M. Varsta, J. Heikkonen and K. Kaski, "Time series prediction using recurrent SOM with local linear models," Int. J. of Knowledge-Based Intelligent Eng'g Systems, 2(1):60-68, 1998.
[24] M. Bianchini and M. Gori, "Optimal learning in artificial neural networks: A review of theoretical results," Neurocomputing, vol. 13, pp. 313-346, 1996.
[25] K. Balakrishnan and V. Honavar, "Evolutionary design of neural architectures — A preliminary taxonomy and guide to literature," Technical Report CS TR 9501, Department of Computer Science, Iowa State University, Ames, IA, January 1995.
[26] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing, MIT Press, 1986.
[27] I. De Falco, A. Della Cioppa, A. Iazzetta, P. Natale and E. Tarantino, "Optimizing Neural Networks for Time Series Prediction," Third On-line World Conference on Soft Computing in Engineering Design and Manufacturing (WSC3), 21-30 June 1998, hosted on the Internet.
[28] J. Hiestermann, "Learning in neural nets by genetic algorithms," Parallel Processing in Neural Networks and Computers, North-Holland, pp. 165-168, 1990.
[29] R. Battiti and G. Tecchiolli, "Training neural nets with reactive tabu search," IEEE Trans. on Neural Networks, 6(5):1185-1200, 1995.
[30] J. D. Shaffer, D. Whitley and L. J. Eshelman, "Combination of Genetic Algorithms and Neural Networks: A Survey of the State of the Art," in J. D. Shaffer and D. Whitley (eds.), Combination of Genetic Algorithms and Neural Networks, pp. 1-37, 1992.
[31] X. Yao, "A Review of Evolutionary Artificial Neural Networks," Int. J. Intelligent Systems, 8(4), pp. 539-567, 1993.
[32] K. O. Stanley and R. Miikkulainen, "Evolving Neural Networks through Augmenting Topologies," Tech. Rep. TR-AI-01-290, Dept. of Computer Sciences, The Univ. of Texas at Austin, June 28, 2001.
[33] S. Stepniewski and A. J. Keane, "Pruning back propagation neural networks using modern stochastic optimization techniques," Neural Computing & Applications, 1996.
[34] I. De Falco, A. Della Cioppa, P. Natale and E. Tarantino, "Artificial neural networks optimization by means of evolutionary algorithms," Soft Computing in Engineering Design and Manufacturing, Springer-Verlag, London, 1997.
[35] H. Mühlenbein and D. Schlierkamp-Voosen, "Predictive models for the breeder genetic algorithm I. Continuous parameter optimization," Evolutionary Computation, 1(1):25-49, 1993.
[36] H. Mühlenbein and D. Schlierkamp-Voosen, "The science of breeding and its application to the breeder genetic algorithm (BGA)," Evolutionary Computation, 1(4):335-360, 1993.
[37] X. Yao, "Evolutionary Artificial Neural Networks," Proceedings of the IEEE, 87(9), pp. 1423-1447, September 1999.
[38] S.-H. Chen and C.-F. Lu, "Would Evolutionary Computation Help in Designs of Artificial Neural Nets in Forecasting Financial Time Series?" Proc. of the 1999 Congress on Evolutionary Computation (CEC'99), vol. 1, IEEE Press, pp. 267-274, 1999.
[39] Roger J. S. Jang, "Predicting chaotic time series with IF-THEN rules," Proc. IEEE 2nd Int. Conf. on Fuzzy Systems, vol. 2, pp. 1079-1084, 1993.
[40] Roger J. S. Jang and C.-T. Sun, "Neuro-fuzzy modeling and control," Proc. of the IEEE, vol. 83, no. 3, pp. 378-406, March 1995.
[41] L. P. Maguire and J. G. Campbell, "Fuzzy reasoning using a three layer neural network," Proc. Int. Fuzzy Systems Association Conf., Brazil, vol. 2, pp. 627-631, 1995.
[42] L. P. Maguire, T. M. McGinnity, A. A. Hashim and J. G. Campbell, "On-Line Identification in Control Systems Using a Fuzzy Neural Network," EUFIT'96: The 4th European Congress on Intelligent Techniques and Soft Computing, Germany, September 1996.
[43] L. P. Maguire, B. Roche, T. M. McGinnity and L. J. McDaid, "Predicting chaotic time series using a fuzzy neural network," Information Sciences, vol. 112, no. 1-4, pp. 125-136, 1998.
[44] M. Brown and C. Harris, Neurofuzzy Adaptive Modeling and Control, Prentice-Hall, 1994.
[45] A. Singh and C. Quek, "POP-Yager: A novel self-organizing fuzzy neural network based on the Yager inference," September 1999, http://www.ntu.edu.sg/sce/Link00/ss00-7.pdf
[46] J. M. Keller, R. R. Yager and H. Tahani, "Neural network implementation of fuzzy logic," Fuzzy Sets and Systems, 45(1):1-12, 1992.
[47] A. Aussem and F. Murtagh, "Combining neural network forecasts on wavelet-transformed time series," Connection Science, Special Issue on Combining Neural Nets, vol. 9, no. 9, pp. 113-121, 1997.
[48] J. Lampinen and A. Vehtari, "Bayesian Techniques for Neural Networks – Review and Case Studies," Proc. of EUSIPCO 2000, the Xth European Signal Processing Conference, Tampere, Finland, Sept. 5-8, 2000.
[49] D. J. C. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Computation, 4(3):448-472, 1992.
[50] R. M. Neal, Bayesian Learning for Neural Networks, vol. 118 of Lecture Notes in Statistics, Springer-Verlag, July 1996.
[51] A. Vehtari and J. Lampinen, "Bayesian Neural Networks for Industrial Applications," 1999 IEEE Midnight-Sun Workshop on Soft Computing Methods in Industrial Applications, Kuusamo, Finland, June 16-18, 1999.
[52] U. Kjærulff, "A computational scheme for dynamic Bayesian networks," Report R-93-2018, Institute for Electronic Systems, Aalborg University, DK-9220 Aalborg, Denmark, June 1993.
[53] B.-T. Zhang and J.-G. Joung, "Time series prediction using committee machines of evolutionary neural trees," The 1999 Congress on Evolutionary Computation (CEC99), Washington D.C., 1999.
[54] D.-Y. Cho and B.-T. Zhang, "Bayesian evolutionary algorithms for evolving neural tree methods of time series data," Proceedings of the 2000 Congress on Evolutionary Computation (CEC00), vol. 2, pp. 1451-1458, 2000.