NEURAL NETWORKS FOR TIME-SERIES FORECASTING

William Remus, Department of Decision Science, University of Hawaii

Marcus O'Connor, School of Information Systems, University of New South Wales

ABSTRACT Neural networks perform best when used for (1) monthly and quarterly time series, (2) discontinuous series, and (3) forecasts that are several periods out on the forecast horizon. Neural networks require the same good practices associated with developing traditional forecasting models, plus they introduce new complexities. We recommend cleaning data (including handling outliers), scaling and deseasonalizing the data, building plausible neural network models, pruning the neural networks, avoiding overfitting, and good implementation strategies. Keywords: Discontinuities, forecasting, neural networks, principles, seasonality.

Research has given us many methods for forecasting, many of which rely on statistical techniques. Since 1980, much research has focused on determining the conditions under which various methods perform best (Makridakis et al. 1982; Makridakis et al. 1993). In general, no single method dominates all other methods, but simple and parsimonious methods seem to perform best in many of the competitive studies. In the early 1980s, researchers proposed a new methodology for forecasting time series: neural networks. We provide principles for the use and estimation of neural networks for time-series forecasting and review the support for their merits, which ranges from mathematical proofs to empirical comparisons.

USING NEURAL NETWORKS

Neural networks are mathematical models inspired by the functioning of biological neurons. There are many neural network models. In some cases, these models correspond closely to
biological neurons, and in other cases, the models depart from biological functioning in significant ways. The most prominent, back propagation, is estimated to be used in over 80 percent of the applications of neural networks (Kaastra and Boyd 1996); this model is explained in the Appendix. Rumelhart and McClelland (1986) discuss most of the neural network models in detail.

Given sufficient data, neural networks are well suited to the task of forecasting. They excel at pattern recognition and forecasting from pattern clusters. The key issue is to determine the situations in which neural networks perform better than traditional models. Researchers suggest that neural networks have several advantages over traditional statistical methods.

Neural networks have been mathematically shown to be universal approximators of functions (Cybenko 1989; Funahashi 1989; Hornik, Stinchcombe and White 1989) and their derivatives (White, Hornik and Stinchcombe 1992). This means that neural networks can approximate whatever functional form best characterizes the time series. While this universal approximation property offers little value if the functional form is simple (e.g., linear), it allows neural networks to better model forecasting data with complex underlying functional forms. For example, in a simulation study, Dorsey and Sen (1998) found that neural networks gave levels of model fit comparable to properly specified polynomial regression models. Neural networks, however, did much better when the polynomial form of a series was not known.

Theoretically, neural networks should be able to model data as well as traditional statistical methods because neural networks can approximate traditional statistical methods. For example, neural networks have been shown to approximate ordinary least-squares and nonlinear least-squares regression (White 1992b; White and Stinchcombe 1992), nonparametric regression (White 1992a), and Fourier series analysis (White and Gallant 1992).

Neural networks are inherently nonlinear (Rumelhart and McClelland 1986; Wasserman 1989). That means that they estimate nonlinear functions well (White 1992a, 1992b; White and Gallant 1992; White and Stinchcombe 1992). Neural networks can partition the sample space and build different functions in different portions of that space. The neural network model for the Boolean exclusive OR function is a good example of such a model (Wasserman 1989, pp. 30-33). Thus, neural networks have a capability for building piecewise nonlinear models, such as forecasting models that incorporate discontinuities.
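To make the exclusive OR example concrete, the following sketch (in Python; the hard-threshold units and hand-chosen weights are purely illustrative and are not taken from Wasserman 1989) shows a 2-2-1 network that partitions the input space and reproduces XOR:

import numpy as np

def step(x):
    # Hard-threshold activation; a logistic unit (see the Appendix) would
    # behave the same way here with large enough weights.
    return (x > 0).astype(float)

# Hand-picked weights for a 2-2-1 network computing exclusive OR:
# hidden node 1 fires for "x1 OR x2", hidden node 2 fires for "x1 AND x2",
# and the output node fires for "OR but not AND".
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])
w_out = np.array([1.0, -2.0])
b_out = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(np.array(x) @ W_hidden + b_hidden)
    y = step(h @ w_out + b_out)
    print(x, "->", int(y))          # prints 0, 1, 1, 0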
It might seem that, because of their universal approximation properties, neural networks should supersede the traditional forecasting techniques. That is not true for several reasons. First, universal approximation on a data set does not necessarily lead to good out-of-sample forecasts (Armstrong 2001). Second, if the data fit the assumptions of a traditional model, generally the traditional model will be easier to develop and use. Thus, while neural networks seem a promising alternative to traditional forecasting models, we need to examine the empirical literature on their forecasting performance.

Researchers have compared point-estimate forecasts from neural networks and traditional time-series techniques (neural networks provide point-estimate forecasts but not prediction intervals). Sharda and Patil (1990) used 75 series from a 111-series subset of the M-Competition data and found that neural network models were as accurate as the automatic Box-Jenkins (Autobox) procedure. The 36 deleted series did not contain enough data to estimate either of the models. Foster, Collopy, and Ungar (1992) also used the M-Competition data. They found neural networks to be inferior to Holt's, Brown's, and the least-squares statistical models for
time series of yearly data, but comparable with quarterly data; they did not compare the models on monthly data. Kang (1991) compared neural networks and Autobox on 50 M-Competition series. Overall, Kang found Autobox to have a mean absolute percentage error (MAPE) superior or equivalent to that of 18 different neural network architectures. In addition, Kang compared the 18 neural network architectures and Autobox models on seven sets of simulated time-series patterns. Kang found the MAPE for the 18 neural network architectures was superior when the data included trend and seasonal patterns. Kang also found that neural networks often performed better when predicting points on the forecasting horizon beyond the first few periods ahead.

These results are mixed; thus, we were inspired to attempt a more comprehensive comparison of neural networks and traditional models (Hill, O'Connor and Remus 1996). The traditional models we considered were Box-Jenkins and deseasonalized exponential smoothing. Deseasonalized exponential smoothing was found to be one of the most accurate methods and Box-Jenkins a bit less accurate in the two major comparative studies of traditional forecasting methods (Makridakis et al. 1982; Makridakis et al. 1993). In addition, we used the method based on combining the forecasts from six other methods from the first competition and a naive model. The data were a systematic sample of the Makridakis et al. (1982) competition data. We standardized many other procedural differences between the earlier studies discussed.

Exhibit 1 shows the results from Hill, O'Connor and Remus (1996): the MAPE for the neural networks and several other reference methods. They were calculated on the holdout data sets from the Makridakis et al. (1982) competition; the forecast horizons are as in the competition.

Exhibit 1
MAPE for neural networks and other reference methods (number of series)

                                          Annual (16)   Quarterly (19)   Monthly (63)
Neural networks                               14.2           15.3            13.6
Deseasonalized exponential smoothing          15.9           18.7            15.2
Box-Jenkins                                   15.7           20.6            16.4
Judgment                                      12.5           20.5            16.3
Combined methods                              12.6           21.2            16.7
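The error measure reported in Exhibit 1 is the mean absolute percentage error. As a reminder of how such a figure is obtained, a minimal sketch (in Python; the actual and forecast values are invented for illustration):

import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error, expressed in percent.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Example: six holdout observations and the corresponding forecasts.
print(mape([120, 132, 129, 121, 135, 148],
           [118, 130, 133, 125, 131, 140]))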

• Neural networks may be as accurate or more accurate than traditional forecasting methods for monthly and quarterly time series.

The M-Competition data contained annual, quarterly, and monthly series; thus, the models were compared across the data period used. Foster, Collopy and Ungar (1992) found neural networks to be inferior to traditional models for annual data but comparable for quarterly data; they did not compare the models on monthly data. We found that neural networks outperformed the traditional models (including Box-Jenkins) in forecasting monthly and quarterly data series; however, they were not superior to traditional models with annual series (Hill, O'Connor and Remus 1996) (see Exhibit 1).

• Neural networks may be better than traditional extrapolative forecasting methods for discontinuous series and often are as good as traditional forecasting methods in other situations.

Some of the M-Competition series had nonlinearities and discontinuities in the model-estimation data (Armstrong and Collopy 1992; Carbone and Makridakis 1986; Collopy and Armstrong 1992; Hill, O'Connor and Remus 1996). For example, in the monthly series used by Hill, O'Connor and Remus (1996), only 57 percent of the series were linear; the remaining 43 percent included nonlinearities or discontinuities or both. We compared the effectiveness of the forecasting models with linear, nonlinear, and discontinuous series. Hill, O'Connor and Remus (1996) found that nonlinearities and discontinuities in the model-estimation data affected the forecasting accuracy of the neural networks. In particular, although neural networks performed well overall for all monthly series, they seemed to perform better in series with discontinuities in the estimation data.

• Neural networks are better than traditional extrapolative forecasting methods for long-term forecast horizons but are often no better than traditional forecasting methods for shorter forecast horizons.

Some models, such as exponential smoothing, are recommended for short-term forecasting, while regression models are often recommended for long-term forecasting. Sharda and Patil (1992) and Tang, de Almeida and Fishwick (1990) found that for time series with a long history, neural network models and Box-Jenkins models produced comparable results. Hill, O'Connor and Remus (1996) compared neural network models with the traditional models across the 18 periods in the forecast horizon. The neural network model generally performed better than traditional models in the later periods of the forecast horizon; these findings are consistent with Kang's (1991). In a simulation study, Dorsey and Sen (1998) also found that neural networks strongly dominated polynomial regression models in the later periods of the forecast horizon when estimating series with polynomial features.

• To estimate the parameters characterizing neural networks, many observations may be required. Thus, simpler traditional models (e.g., exponential smoothing) may be preferred for small data sets.

Many observations are often required to estimate neural networks. Particularly in the quarterly and monthly M-Competition series, the number of observations for model estimation varied widely. In many cases, there may not be enough observations to estimate the model (Sharda and Patil 1990). The reason for this is simple; neural networks have more parameters to estimate than most traditional time-series forecasting models.


ESTIMATING NEURAL NETWORKS

We adapted our principles for estimating neural networks from Armstrong's (2001) principles for estimating forecasting models and from results specific to neural networks. All of the general principles Armstrong presented apply to neural networks. The following principles are of critical importance:

• Clean the data prior to estimating the neural network model.

Data should be inspected for outliers prior to model building. This principle applies equally to neural networks and other forecasting models (Refenes 1995, pp. 56-60). Outliers make it difficult for neural networks to model the true underlying functional form.
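A minimal sketch of one reasonable cleaning rule (in Python; the median/MAD screen and the interpolation of flagged points are illustrative choices, not the specific procedure described by Refenes 1995):

import numpy as np

def clean_outliers(y, k=3.0):
    # Flag points more than roughly k standard deviations from the median
    # (using the robust MAD estimate of spread) and replace them by linear
    # interpolation of the neighbouring, non-flagged observations.
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = max(np.median(np.abs(y - med)), 1e-9)
    outlier = np.abs(y - med) > k * 1.4826 * mad
    idx = np.arange(len(y))
    y_clean = y.copy()
    y_clean[outlier] = np.interp(idx[outlier], idx[~outlier], y[~outlier])
    return y_clean, outlier

For strongly trending series, one would screen deviations from a fitted level rather than from the overall median.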

• Scale and deseasonalize data prior to estimating the model.

Scale the data prior to estimating the model to help the neural network learn the patterns in the data (Kaastra and Boyd 1996). As Hill, O'Connor and Remus (1996) did, modelers usually scale data between values of plus one and minus one. As in regression modeling, other transformations are occasionally applied to facilitate the modeling; Kaastra and Boyd (1996) give several examples.

Often, a time series contains significant seasonality, and deseasonalizing the data prior to estimating the forecasting model is the standard approach. Wheelwright and Makridakis (1985) found that prior deseasonalization improved the accuracy of traditional statistical forecasting methods for the M-Competition quarterly and monthly data. Deseasonalization is commonly done with neural networks as well; Hill, O'Connor and Remus (1996) statistically deseasonalized their time series before applying the technique.

Is deseasonalization necessary, or can neural networks model the seasonality that is likely to be present in a time series? Given that neural networks have been shown to be universal approximators of functions (Cybenko 1989), it seems reasonable to expect them to be able to model the patterns of seasonality in a time series. On the other hand, Kolarik and Rudorfer (1994) found neural networks had difficulty modeling seasonal patterns in time series. Nelson et al. (1999) used data from the M-Competition to investigate the ability of neural networks to model the seasonality in the series. They partitioned a systematic sample of 64 monthly series into two subsets based on the Makridakis et al. (1982) assessment of the existence of seasonality in those series. In the series with seasonality (n = 49), the MAPE for neural networks based on deseasonalized data (12.3%) was significantly lower than for neural networks based on nondeseasonalized data (15.4%). In the series without seasonality (n = 15), the MAPE for neural networks based on deseasonalized data (16.9%) was not significantly lower than for neural networks based on nondeseasonalized data (16.4%). Nelson et al. (1999) also performed post-hoc testing to establish that these findings hold across the functional form of the time series, the number of historical data points, and the periods in the forecast horizon. These results suggest that neural networks may benefit from deseasonalizing data just as statistical methods do (Wheelwright and Makridakis 1985, p. 275).
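A minimal sketch of both steps (in Python; the ratio-to-mean seasonal indices and the linear scaling to [-1, 1] are simple stand-ins for the statistical deseasonalization and scaling used in the studies cited above):

import numpy as np

def deseasonalize(y, period=12):
    # Remove a multiplicative seasonal pattern with simple seasonal indices
    # (mean of each season divided by the overall mean); assumes the series
    # covers at least one full cycle.
    y = np.asarray(y, dtype=float)
    season = np.arange(len(y)) % period
    indices = np.array([y[season == s].mean() for s in range(period)]) / y.mean()
    return y / indices[season], indices

def scale(y):
    # Linearly map the series into [-1, 1] before network estimation.
    y = np.asarray(y, dtype=float)
    lo, hi = y.min(), y.max()
    return 2.0 * (y - lo) / (hi - lo) - 1.0, (lo, hi)

def unscale(z, lo, hi):
    # Invert the scaling so that forecasts are reported in original units.
    return (np.asarray(z, dtype=float) + 1.0) / 2.0 * (hi - lo) + lo

Forecasts are produced on the deseasonalized, scaled series and then unscaled and reseasonalized (multiplied back by the appropriate index) before being reported.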

• Use appropriate methods to choose the right starting point.

The most commonly used estimation method for neural networks, back propagation, is basically a gradient descent of a nonlinear error, cost, or profit surface. This means that finding the best starting-point weights for the descent is crucial to reaching the global optimum and avoiding local optima; this has been noted by many researchers including, most recently, Faraway and Chatfield (1998). Typically, researchers choose the neural network starting-point weights randomly. It is much better to choose an algorithm to help one find good starting points. As shown by Marquez (1992), one such method is the downhill simplex method of Nelder and Mead (1965); the necessary computer code can be found in Press et al. (1988).

• Use specialized methods to avoid local optima.

When estimating neural network models, it is possible to end up at a local optimum or not to converge to an optimum at all. One can use many techniques to avoid these problems. Our preference for overcoming these problems is the downhill simplex method of Nelder and Mead (1965); Marquez (1992) gives an example of its use. Thus, one can use the downhill simplex method both initially and to escape a local optimum. Researchers have suggested many other methods to deal with this problem, including using a momentum term in the gradient descent rule (Rumelhart and McClelland 1986), using genetic algorithms (Sexton, Dorsey and Johnson 1998), local fitting of the network (Sanzogni and Vaccaro 1993), and using a dynamically adjusted learning rate (Marquez 1992).

This principle and the previous one deal with problems associated with any gradient descent algorithm (e.g., back propagation). Some researchers prefer to use nonlinear programming algorithms to avoid these problems. Eventually, such an algorithm will replace the currently popular back-propagation algorithm.
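A minimal sketch of these two principles together (in Python with scipy; the toy 4-3-1 network, the artificial data, the number of restarts, the learning rate, and the momentum value are all illustrative, and the numerical gradients merely stand in for back propagation):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# A toy one-hidden-layer network: 4 lagged inputs, 3 hidden nodes, 1 output.
N_IN, N_HID = 4, 3
N_W = N_IN * N_HID + N_HID + N_HID + 1        # weights plus biases

def forecast(w, X):
    W1 = w[:N_IN * N_HID].reshape(N_IN, N_HID)
    b1 = w[N_IN * N_HID:N_IN * N_HID + N_HID]
    w2, b2 = w[-N_HID - 1:-1], w[-1]
    return np.tanh(X @ W1 + b1) @ w2 + b2

def sse(w, X, y):
    return np.sum((forecast(w, X) - y) ** 2)

# Artificial estimation data: X holds lagged values, y the next observation.
X = rng.uniform(-1, 1, (40, N_IN))
y = np.sin(X.sum(axis=1))

# 1. Screen several random starting points with the downhill simplex
#    (Nelder-Mead) method and keep the most promising one.
best = min((minimize(sse, rng.uniform(-0.5, 0.5, N_W), args=(X, y),
                     method="Nelder-Mead", options={"maxiter": 300})
            for _ in range(10)), key=lambda r: r.fun)

# 2. Refine from that start with gradient descent plus a momentum term,
#    one common way of rolling through shallow local optima.
w, v = best.x.copy(), np.zeros(N_W)
for _ in range(500):
    grad = np.array([(sse(w + d, X, y) - sse(w - d, X, y)) / 2e-5
                     for d in np.eye(N_W) * 1e-5])
    v = 0.9 * v - 0.001 * grad
    w += v
print("final SSE:", sse(w, X, y))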

• Expand the network until there is no significant improvement in fit.

As noted by many researchers, including most recently Faraway and Chatfield (1998), much of the art of building a successful model lies in selecting a good neural-network design. Since it has been shown mathematically that only one hidden layer is necessary for a network to fit any function optimally (Funahashi 1989), we generally use only one hidden layer. If the network has n input nodes, Hecht-Nielsen (1989) has mathematically shown that there need be no more than 2n+1 hidden-layer nodes. To select the number of input nodes in time-series forecasting, we generally start with at least as many input nodes as there are periods in one cycle of the time series (e.g., at least 12 for monthly data). We then expand the network by incrementally increasing the number of input nodes until there is no improvement in fit. Then we prune the network back. This is the easiest way to build the neural-network model while avoiding overfitting. It is also common to start with a large network and reduce it to an appropriate size using pruning methods (Kaastra and Boyd 1996 discuss this approach). If one identifies a clear lag structure using traditional means, one can use that structure to set the number of nodes.

Hill, O'Connor and Remus (1996) used one output node to make a forecast; they used this forecast value to create another forecast further into the future. They did this iteratively (as in the Box-Jenkins model); this is often called a moving-window approach. Zhang, Patuwo and Hu (1998) make compelling arguments for developing neural-network models that forecast several periods ahead simultaneously. Hill, O'Connor and Remus (1996) initially used the simultaneous forecasting method but changed to the iterative method to avoid overfitting problems. We suspect that many forecasters face similar problems that will lead them to use network structures like those used by Hill, O'Connor and Remus (1996). When there is no overfitting problem, the capability to generate multiple forecasts may be useful.
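A minimal sketch of this moving-window design and iterative forecasting (in Python with scikit-learn; the 12-lag window, the six hidden nodes, and the artificial series are illustrative choices, not the configuration used by Hill, O'Connor and Remus 1996):

import numpy as np
from sklearn.neural_network import MLPRegressor

def lagged_matrix(y, n_lags):
    # Moving-window design matrix: each row holds n_lags past values,
    # the target is the next observation.
    X = np.array([y[t - n_lags:t] for t in range(n_lags, len(y))])
    return X, np.asarray(y[n_lags:])

def fit_and_forecast(y, n_lags=12, horizon=18):
    # One hidden layer; Hecht-Nielsen's bound says 2n+1 hidden nodes always
    # suffice, and a much smaller (or pruned) layer is usually preferable.
    net = MLPRegressor(hidden_layer_sizes=(min(2 * n_lags + 1, 6),),
                       max_iter=5000, random_state=0)
    X, target = lagged_matrix(y, n_lags)
    net.fit(X, target)
    # Iterative forecasting: feed each one-step forecast back in as an
    # input to produce the next one.
    window = list(y[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        f = net.predict(np.array(window[-n_lags:]).reshape(1, -1))[0]
        forecasts.append(f)
        window.append(f)
    return np.array(forecasts)

# Example: a deseasonalized, scaled monthly series (values already in [-1, 1]).
series = 0.8 * np.sin(np.arange(120) * 2 * np.pi / 12)
print(fit_and_forecast(series)[:6])

In practice one would expand or prune the number of input and hidden nodes as described above, judging the effect on a holdout sample rather than on the estimation data alone.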


• Use pruning techniques when estimating neural networks and use holdout samples when evaluating neural networks.

Overfitting is a major concern in the design of a neural network, especially for small data sets. When the number of parameters in a network is too large relative to the size of the estimation data set, the neural network tends to "memorize" the data rather than to "generalize" from it. The risk of overfitting grows with the size of the neural network. Thus, one way to avoid overfitting is to keep the neural network small. In general, it is useful to start with one hidden layer and at least as many input nodes as are in one seasonal cycle; there are mathematical proofs showing no fitting advantage from using more than one hidden layer. If the seasonal cycles are not stable, one can increase the starting number of input nodes. Then one prunes the network to a small size. For example, Marquez (1992) used Sietsma and Dow's (1991) indicators to determine where in the network to prune and then pruned the network using the methods of Weigend, Hubermann and Rumelhart (1990). Even small neural networks can often be reduced in size. For example, if a neural network has four input nodes, three intermediate nodes, and one output node, the fully connected network would have 23 parameters; many more than 23 observations would be needed to avoid overfitting. Larger networks would require hundreds of data points to avoid overfitting. Refenes (1995, pp. 28, 33-54) discusses details of pruning and alternative approaches.

One needs holdout (out-of-sample) data to compare models. Should any overfitting have occurred, the comparative measures of fit on the holdout data would not be overestimated, since overfitting affects only measures based on the estimation sample.

• Obtain software that has built-in features to address the previously described problems.

The highest cost to Hill, O'Connor and Remus (1996) and to many other neural network researchers was the effort expended to develop custom software. We spent many hours building the software and developing procedures to make the forecasts. Fortunately, most of these problems are now solved in off-the-shelf neural-network software packages. The capabilities of the packages are always improving, so one should consult recent reviews of the major packages. In looking over software specifications, look for built-in support procedures, such as procedures for finding good starting points, avoiding local optima, performing pruning, and simplifying neural networks.

• Build plausible neural networks to gain model acceptance.

Neural networks suffer from the major handicap that their forecasts seem to come from a black box. That is, examining the model parameters often does not reveal why the model made good predictions. This makes neural-network models hard to understand and difficult for some managers to accept. Some work has been done to make these models more understandable. For example, Benitez, Castro and Requena (1997) have mathematically shown that neural networks can be thought of as rule-based systems. However, the best approach is to carefully reduce the network size so that the resulting network structures are causally plausible and interpretable. This requires selecting good software to support the model estimation.

• Use three approaches to ensure that the neural-network model is valid.

Adya and Collopy (1998) describe three validation criteria: (1) comparing the neural network forecasts to the forecasts of other well-accepted reference models, (2) comparing the neural network and traditional forecasts' ex ante (out-of-sample) performance, and (3) making enough forecasts to draw inferences (they suggest 40 forecasts). Armstrong (2001) gives more details. Because neural networks are prone to overfitting, one must always validate neural network models using at least these three validation criteria. Neural-network researchers often partition their data into three parts rather than just two: one portion for model estimation, a second portion for model testing, and a third for validation. This requires a lot of data.
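A minimal sketch of such a three-way partition (in Python; the 60/20/20 fractions are illustrative):

import numpy as np

def three_way_split(y, est_frac=0.6, test_frac=0.2):
    # Partition a series into estimation, testing, and validation segments,
    # preserving time order.
    y = np.asarray(y, dtype=float)
    n_est = int(len(y) * est_frac)
    n_test = int(len(y) * test_frac)
    return y[:n_est], y[n_est:n_est + n_test], y[n_est + n_test:]

# The validation segment is reserved for the final ex ante comparison with
# the reference models; ideally it yields at least 40 forecast errors.
estimation, testing, validation = three_way_split(np.arange(200))
print(len(estimation), len(testing), len(validation))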

CONCLUSIONS

Neural networks are not a panacea, but they do perform well in many situations. They perform best when the estimation data contain discontinuities. They may be more effective for monthly and quarterly series than for annual series. Also, neural networks perform better than statistical methods do for forecasting three or more periods out on the forecast horizon. Another strength of neural networks is that they can be automated. Neural networks might be superior to traditional extrapolation models when nonlinearities and discontinuities occur. Neural networks may be better suited to some task domains than others, and we need more research to define these conditions.

We have given some guidelines on the issues and pitfalls forecasters face in estimating neural network models, which are similar to those they face with traditional extrapolation models. Forecasters need to take time to master neural networks, and they need good software.

The research cited above on neural networks is largely based on experience with time-series forecasting tasks. These principles should generalize to many non-time-series forecasting models, since neural networks have been mathematically shown to be universal approximators of functions and their derivatives, to be equivalent to ordinary linear and nonlinear least-squares regression, and to approximate nonparametric regression.

Research on neural networks is growing exponentially. Concerned practitioners should read periodic reviews of the emerging literature, like that of Zhang, Patuwo and Hu (1998). However, the standards many researchers use fall short of those discussed by Adya and Collopy (1998) and Armstrong (2001). Thus, practitioners should apply the standards of Adya and Collopy (1998) and Armstrong (2001) when evaluating the emerging literature.

APPENDIX: WHAT ARE NEURAL NETWORKS?

Neural networks consist of interconnected nodes, termed neurons, whose design is suggested by their biological counterparts. Each neuron has one or more incoming paths (Exhibit 2). Each incoming path i has a signal on it (x_i), and the strength of the path is characterized by a
weight (w_i). The neuron sums the path weight times the input signal over all paths; in addition, the node may be biased by an amount (Θ). Mathematically, the sum is expressed as follows:

sum = Σ w_i x_i + Θ

Exhibit 2: A neuron (input signals x_1, ..., x_n arrive on weighted paths, are summed with the bias, and pass through a transform to produce the output)

The output (Y) of the node is usually a sigmoid-shaped logistic transformation of the sum when the signals are continuous variables. This transformation is as shown below:

Y = 1 / (1 + e^(-sum))

Learning occurs through the adjustment of the path weights (w_i) and the node bias (Θ). The most common method used for the adjustment is called back propagation. In this method, the forecaster adjusts the weights to minimize the squared difference between the model output and the desired output. The adjustments are usually based on a gradient descent algorithm.

Many neurons combine to form a network (Exhibit 3). The network consists of an input layer, an output layer, and perhaps one or more intervening layers; the latter are termed hidden layers. Each layer consists of multiple neurons, and these neurons are connected to other neurons in adjacent layers. Since these networks contain many interacting nonlinear neurons, the networks can capture fairly complex phenomena.

Exhibit 3: A neural network (an input layer, a hidden layer, and an output layer)
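A minimal sketch of these computations (in Python; the weights, biases, and network size are arbitrary illustrative values):

import numpy as np

def neuron(x, w, theta):
    # Weighted sum of the inputs plus the bias, passed through the logistic
    # transformation described above.
    s = np.dot(w, x) + theta
    return 1.0 / (1.0 + np.exp(-s))

def feedforward(x, W_hidden, theta_hidden, w_out, theta_out):
    # One hidden layer: each hidden neuron transforms the inputs, and the
    # output neuron transforms the hidden activations.
    hidden = np.array([neuron(x, W_hidden[j], theta_hidden[j])
                       for j in range(len(theta_hidden))])
    return neuron(hidden, w_out, theta_out)

# Example: three inputs, two hidden neurons, one output.
x = np.array([0.2, -0.5, 0.7])
W_hidden = np.array([[0.4, -0.3, 0.8],
                     [-0.6, 0.1, 0.5]])
print(feedforward(x, W_hidden, np.array([0.1, -0.2]),
                  np.array([1.2, -0.7]), 0.05))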


REFERENCES

Adya, M. & F. Collopy (1998), "How effective are neural networks at forecasting and prediction? A review and evaluation," Journal of Forecasting, 17, 451-461. (Full text at hops.wharton.upenn.edu/forecast)

Armstrong, J. S. (2001), "Evaluating forecasting methods," in J. S. Armstrong (ed.), Principles of Forecasting. Norwell, MA: Kluwer Academic Publishers.

Armstrong, J. S. & F. Collopy (1992), "Error measures for generalizing about forecasting methods: Empirical comparisons," International Journal of Forecasting, 8, 69-80. (Full text at hops.wharton.upenn.edu/forecast)

Benitez, J. M., J. L. Castro & I. Requena (1997), "Are artificial neural networks black boxes?" IEEE Transactions on Neural Networks, 8, 1156-1164.

Carbone, R. & S. Makridakis (1986), "Forecasting when pattern changes occur beyond the historical data," Management Science, 32, 257-271.

Collopy, F. & J. S. Armstrong (1992), "Rule-based forecasting: Development and validation of an expert systems approach to combining time series extrapolations," Management Science, 38, 1394-1414.

Cybenko, G. (1989), "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals, and Systems, 2, 303-314.

Dorsey, R. E. & S. Sen (1998), "Flexible form estimation: A comparison of polynomial regression with artificial neural networks," Working paper: University of Mississippi.

Faraway, J. & C. Chatfield (1998), "Time series forecasting with neural networks: A comparative study using the airline data," Applied Statistics, 47, Part 2, 231-250.

Foster, B., F. Collopy & L. Ungar (1992), "Neural network forecasting of short, noisy time series," Computers and Chemical Engineering, 16, 293-297.

Funahashi, K. (1989), "On the approximate realization of continuous mappings by neural networks," Neural Networks, 2, 183-192.

Hecht-Nielsen, R. (1989), "Theory of the backpropagation neural network," Proceedings of the International Joint Conference on Neural Networks, Washington, DC, I, 593-605.

Hill, T., M. O'Connor & W. Remus (1996), "Neural network models for time series forecasts," Management Science, 42, 1082-1092.

Hornik, K., M. Stinchcombe & H. White (1989), "Multilayer feedforward networks are universal approximators," Neural Networks, 2, 359-366.

Kaastra, I. & M. Boyd (1996), "Designing a neural network for forecasting financial and economic time series," Neurocomputing, 10, 215-236.

Kang, S. (1991), An investigation of the use of feedforward neural networks for forecasting, Ph.D. Dissertation, Kent, Ohio: Kent State University.

Kolarik, T. & G. Rudorfer (1994), "Time series forecasting using neural networks," APL Quote Quad, 25, 86-94.

Makridakis, S., A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen & R. Winkler (1982), "The accuracy of extrapolation (time series) methods: Results of a forecasting competition," Journal of Forecasting, 1, 111-153.

Makridakis, S., C. Chatfield, M. Hibon, M. J. Lawrence, T. Mills, K. Ord & L. F. Simmons (1993), "The M2-Competition: A real-time judgmentally based forecasting competition," Journal of Forecasting, 9, 5-22.

Makridakis, S., M. Hibon, E. Lusk & M. Belhadjali (1987), "Confidence intervals: An empirical investigation of the series in the M-Competition," International Journal of Forecasting, 3, 489-508.


Marquez, L. (1992), Function approximation using neural networks: A simulation study, Ph.D. Dissertation, Honolulu, Hawaii: University of Hawaii.

Nelder, J. & R. Mead (1965), "The downhill simplex method," Computer Journal, 7, 308-310.

Nelson, M., T. Hill, W. Remus & M. O'Connor (1999), "Time series forecasting using neural networks: Should the data be deseasonalized first?" Journal of Forecasting, 18, 359-370.

Press, W., B. Flannery, S. Teukolsky & W. Vetterling (1988), Numerical Recipes in C: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press.

Refenes, A. P. (1995), Neural Networks in the Capital Markets. Chichester, UK: Wiley.

Rumelhart, D. & J. McClelland (1986), Parallel Distributed Processing. Cambridge, MA: MIT Press.

Sanzogni, L. & J. A. Vaccaro (1993), "Use of weighting functions for focusing of learning in artificial neural networks," Neurocomputing, 5, 175-184.

Sietsma, J. & R. Dow (1991), "Creating artificial neural networks that generalize," Neural Networks, 4, 67-79.

Sexton, R. S., R. E. Dorsey & J. D. Johnson (1998), "Toward global optimization of neural networks: A comparison of the genetic algorithm and backpropagation," Decision Support Systems, 22, 171-185.

Sharda, R. & R. Patil (1990), "Neural networks as forecasting experts: An empirical test," Proceedings of the 1990 IJCNN Meeting, 2, 491-494.

Sharda, R. & R. Patil (1992), "Connectionist approach to time series prediction: An empirical test," Journal of Intelligent Manufacturing, 3, 317-323.

Tang, Z., C. de Almeida & P. Fishwick (1990), "Time series forecasting using neural networks vs. Box-Jenkins methodology," Simulation, 57, 303-310.

Wasserman, P. D. (1989), Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold.

Weigend, A., B. Hubermann & D. Rumelhart (1990), "Predicting the future: A connectionist approach," International Journal of Neural Systems, 1, 193-209.

Wheelwright, S. & S. Makridakis (1985), Forecasting Methods for Management, 4th ed. New York: Wiley.

White, H. (1992a), "Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings," in H. White (ed.), Artificial Neural Networks: Approximations and Learning Theory. Oxford, UK: Blackwell.

White, H. (1992b), "Consequences and detection of nonlinear regression models," in H. White (ed.), Artificial Neural Networks: Approximations and Learning Theory. Oxford, UK: Blackwell.

White, H. & A. R. Gallant (1992), "There exists a neural network that does not make avoidable mistakes," in H. White (ed.), Artificial Neural Networks: Approximations and Learning Theory. Oxford, UK: Blackwell.

White, H., K. Hornik & M. Stinchcombe (1992), "Universal approximation of an unknown mapping and its derivatives," in H. White (ed.), Artificial Neural Networks: Approximations and Learning Theory. Oxford, UK: Blackwell.

White, H. & M. Stinchcombe (1992), "Approximating and learning unknown mappings using multilayer feedforward networks with bounded weights," in H. White (ed.), Artificial Neural Networks: Approximations and Learning Theory. Oxford, UK: Blackwell.


Zhang, G., B. E. Patuwo & M. Y. Hu (1998), "Forecasting with artificial neural networks: The state of the art," International Journal of Forecasting, 14, 35-62.

Acknowledgments: We appreciate the valuable comments on our paper made by Sandy Balkin, Chris Chatfield, Wilpen Gorr, and others at the 1998 International Symposium on Forecasting in Edinburgh.