Chiang Mai J. Sci. 2008; 35(1) : 1-16
www.science.cmu.ac.th/journal-science/josci.html
Contributed Paper

Artificial Neural Network Time Series Modeling For Revenue Forecasting

Siti Mariyam Shamsuddin, Roselina Sallehuddin & Norfadzila Mohd Yusof*
Faculty of Computer Science & Information System, University Technology of Malaysia
* Author for correspondence. e-mail:
Received: 20 May 2007; Accepted: 20 May 2007.

ABSTRACT
The objective of this study is to investigate the effect of applying different numbers of input nodes, activation functions and pre-processing techniques on the performance of a backpropagation (BP) network in time series revenue forecasting. Several pre-processing techniques are presented to remove non-stationarity from the time series, and their effect on artificial neural network (ANN) model learning and forecast performance is analyzed. A trial-and-error approach is used to find a sufficient number of input nodes, with the corresponding number of hidden nodes obtained using Kolmogorov's theorem. This study compares the use of the logarithmic function, and of a newly proposed ANN model that combines the sigmoid function in the hidden layer with the logarithmic function in the output layer, against the standard sigmoid function as the activation function in the nodes. A cross-validation experiment is employed to improve the generalization ability of the ANN model. The empirical findings show that an ANN model with a small number of input nodes and a correspondingly smaller network structure produces accurate forecasts, although it suffers from slow convergence. The sigmoid activation function decreases the complexity of the ANN and produces the fastest convergence and good forecast ability in most cases in this study. This study also shows that the forecasting performance of an ANN model can be considerably improved by selecting an appropriate pre-processing technique.

Keywords: Artificial neural network; Forecasting; Data pre-processing; Input nodes; Activation function

1. INTRODUCTION

Knowing the future better has attracted people for thousands of years. Forecasting methods vary greatly and depend on data availability, the quality of the models available, and the kinds of assumptions made, among other things. Generally speaking, forecasting is not an easy task, and it has therefore attracted many researchers.

Artificial neural networks (ANNs) have found increasing consideration in forecasting theory, leading to successful applications in various forecasting domains including economics, business and finance [5, 13, 24], among many more. An ANN can learn from examples (past data), recognize hidden patterns in historical observations and use them to forecast future values. In addition, ANNs are able to deal with incomplete information

or noisy data, and can be very effective especially in situations where it is not possible to define the rules or steps that lead to the solution of a problem. Despite the many satisfactory characteristics of ANNs [26], building an ANN model for a particular forecasting problem is a nontrivial task. Several authors, such as [2, 7, 13, 19, 23, 26], have provided insight into the issues in developing ANN models for forecasting. These modeling issues must be considered carefully because they may affect the performance of ANNs. Based on their studies, some of the modeling issues discussed in constructing an ANN forecasting model are the selection of the network architecture, the learning parameters, and the data pre-processing techniques applied to the time series data.

This study examines the effectiveness of the BP network model as an alternative tool for forecasting, and provides a practical introductory guide to the design of an ANN for forecasting time series data. We use the time series corresponding to revenue collection in the Royal Malaysian Customs Department to illustrate this process. This research studies the behavior of ANN models when several of their parameters are altered. It examines the effect of network parameters through a trial-and-error approach, varying the network structure in terms of the number of input nodes, the activation functions, and the data pre-processing techniques used in designing the BP network forecasting model. The relevance of applying different non-linear activation functions in the hidden and output layers of the ANN model is also examined.
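The trial-and-error search over input nodes can be paired with the Kolmogorov-based sizing rule used in this study, commonly read as suggesting 2n + 1 hidden nodes for n input nodes. A minimal sketch of such a candidate grid (the particular input-node counts below are illustrative assumptions, not values from the paper):

```python
# Candidate network structures for the trial-and-error search:
# for n input nodes, the Kolmogorov-based rule gives 2n + 1 hidden nodes.
def hidden_nodes(n_inputs: int) -> int:
    """Hidden-layer size from the 2n + 1 rule."""
    return 2 * n_inputs + 1

# Illustrative candidate input-node counts (lagged terms of the series).
candidates = [(n, hidden_nodes(n)) for n in (2, 4, 6, 8)]
print(candidates)  # [(2, 5), (4, 9), (6, 13), (8, 17)]
```

Each (input, hidden) pair would then be trained and compared on forecast accuracy.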

2. ARTIFICIAL NEURAL NETWORK (ANN)
ANN technology is an advanced computerized system of information processing. In many discussions, ANNs are credited with operating in the same way as the human brain. This is clearly an overstatement. Although loosely inspired by the structure of the brain, current ANN technology is far from being able to simulate even a simple brain function. Nevertheless, ANNs do offer a computational approach that is potentially very effective in solving problems which do not lend themselves to analysis by conventional means. The key difference between ANNs and other problem-solving methods is that ANNs, in a real sense, learn by example and modify their structure, rather than having to be programmed with specific, preconceived rules. In other words, ANNs can be seen as a nonparametric statistical procedure that uses the observed data to estimate an unknown function.

The basic unit of any ANN is the neuron or node (processor). Each node is able to sum many inputs x1, x2, …, xn, whether these inputs come from a database or from other nodes, with each input modified by an adjustable connection weight (Figure 2.1). The sum of these weighted inputs is added to an adjustable threshold for the node and then passed through a modifying (activation) function that determines the final output.

Figure 2.1 Working of a node.
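The node computation just described (weighted sum of inputs, plus an adjustable threshold, passed through an activation function) can be sketched as follows; the sigmoid is used here because it is one of the activation functions considered in this study, and the input and weight values are illustrative:

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic (sigmoid) activation: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, theta):
    """One node: weighted sum of inputs plus threshold, then activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) + theta
    return sigmoid(net)

# Illustrative inputs, connection weights, and threshold.
y = node_output([0.5, -0.2, 0.1], [0.4, 0.7, -0.3], 0.1)
```

The output y is always in (0, 1) for the sigmoid, which is why input scaling (pre-processing) matters in practice.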

the network output. Each layer has its own index variable: k for output neurons, j for hidden neurons, and i for input neurons. In a feed-forward network, the input vector x is propagated through a weight layer Wij:

o_j = f(net_j)    (2.1)

net_j = Σ_i W_ij o_i + θ_j    (2.2)

o_k = f(net_k)    (2.3)

net_k = Σ_j W_jk o_j + θ_k    (2.4)

where f is an output function (possibly the same as the f used in the hidden layer).

Figure 2.2 A feed-forward network.

2.2 Multilayer Perceptron (MLP)
The MLP network is the type of ANN most commonly used in time series forecasting [7]. Univariate time series modeling is normally carried out with this ANN by using a determined number of lagged terms of the time series as inputs and the forecasts as outputs. The number of input nodes determines the number of prior time points used in each forecast, while the number of output nodes determines the forecast horizon. Thus, a one-step-ahead forecast can be performed with an ANN with one output unit, and a k-step-ahead forecast can be performed with k output units. Figure 2.3 illustrates an MLP forecasting model in which two past terms of the series are used to forecast the value of the series one step ahead.

Figure 2.3 MLP network for one-step-ahead forecasting based on two lagged terms.
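Equations (2.1)-(2.4) amount to the following forward pass. In this sketch the two inputs play the role of the two lagged series terms of Figure 2.3; the network sizes, random weights, and zero thresholds are illustrative placeholders, not values from the paper:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W_ij, theta_j, W_jk, theta_k):
    """Forward pass per equations (2.1)-(2.4).

    x       : input vector (lagged series values)
    W_ij    : hidden-layer weights, indexed W_ij[j][i]
    theta_j : hidden-layer thresholds
    W_jk    : output-layer weights, indexed W_jk[k][j]
    theta_k : output-layer thresholds
    """
    # (2.2) then (2.1): hidden nets and hidden outputs
    o_j = [sigmoid(sum(W_ij[j][i] * x[i] for i in range(len(x))) + theta_j[j])
           for j in range(len(theta_j))]
    # (2.4) then (2.3): output nets and network outputs
    o_k = [sigmoid(sum(W_jk[k][j] * o_j[j] for j in range(len(o_j))) + theta_k[k])
           for k in range(len(theta_k))]
    return o_j, o_k

random.seed(0)
x = [0.3, 0.5]  # two lagged terms, one-step-ahead setup
W_ij = [[random.uniform(-1, 1) for _ in x] for _ in range(5)]
theta_j = [0.0] * 5
W_jk = [[random.uniform(-1, 1) for _ in range(5)]]
theta_k = [0.0]
hidden, out = forward(x, W_ij, theta_j, W_jk, theta_k)
```

With one output unit, out[0] is the one-step-ahead forecast; a k-step-ahead forecast would use k output units.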

2.3 Backpropagation (BP) Algorithm
The goal of any training algorithm is to find the logical relationship in the given input/output data, or to make the error between the real results and the results obtained with the ANN as small as possible [27]. Any network structure can be trained with BP provided that desired output patterns exist and every function used to calculate the actual output patterns is differentiable. The BP algorithm is the most computationally straightforward algorithm for training the MLP. As with conventional gradient descent (or ascent), BP works by calculating, for each modifiable weight, the gradient of the error (or cost) function with respect to that weight, and then adjusting the weight accordingly. The error function most frequently minimized in BP at the output layer is the sum of squared errors, E, defined below:

E = (1/2) Σ_k (t_k − o_k)²    (2.5)

where tk is the target output and ok is the actual
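A sketch of the gradient step at the output layer: with E as in (2.5) and sigmoid output units, the delta rule gives ΔW_jk = η δ_k o_j with δ_k = (t_k − o_k) o_k (1 − o_k). The learning rate and the values in the usage lines are illustrative assumptions:

```python
def sse(targets, outputs):
    """Equation (2.5): E = 1/2 * sum_k (t_k - o_k)^2."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

def output_deltas(targets, outputs):
    """delta_k = (t_k - o_k) * o_k * (1 - o_k) for sigmoid output units."""
    return [(t - o) * o * (1 - o) for t, o in zip(targets, outputs)]

def update_output_weights(W_jk, theta_k, deltas, o_j, lr=0.1):
    """Gradient step: W_jk[k][j] += lr * delta_k * o_j[j]; theta_k += lr * delta_k."""
    for k, d in enumerate(deltas):
        for j, oj in enumerate(o_j):
            W_jk[k][j] += lr * d * oj
        theta_k[k] += lr * d
    return W_jk, theta_k

# Illustrative single-output example.
E = sse([1.0], [0.5])                       # 0.125
d = output_deltas([1.0], [0.5])             # [0.125]
W, th = update_output_weights([[0.0]], [0.0], d, [1.0], lr=0.1)
```

The hidden-layer update is analogous, with each hidden delta formed by backpropagating the output deltas through W_jk.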

output of neuron k. For easier understanding, we have summarized the consecutive steps involved in the BP algorithm in the following table.

Table 2.2 Eight steps in designing an ANN forecasting model.

Table 2.1 The summarized steps in the BP algorithm.

Step 1: Obtain a set of training patterns.
Step 2: Set up the ANN model, consisting of a number of input neurons, hidden neurons, and output neurons.
Step 3: Set the learning rate (η) and momentum rate (α).
Step 4: Initialize all connection weights (Wij and Wjk) and bias weights (θk and θj) to random values.
Step 5: Set the minimum error, Emin.
Step 6: Start training by applying the input patterns one at a time, propagating through the layers, then calculating the total error.
Step 7: Backpropagate the error through the output and hidden layer and adapt the weights Wjk and θk.
Step 8: Backpropagate the error through the hidden and input layer and adapt the weights Wij and θj.
Step 9: Check if Error
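The steps of Table 2.1 can be sketched as one training loop. Everything here is illustrative (network sizes, learning rate, Emin, epoch limit, and the toy training set); the stopping test compares the total error against Emin, as set in Step 5:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(patterns, n_in, n_hid, lr=0.5, e_min=1e-3, max_epochs=5000):
    """Steps 1-9 of Table 2.1 for a one-hidden-layer, one-output BP network."""
    rnd = random.Random(42)
    # Step 4: initialize connection and bias weights to random values.
    W_ij = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    theta_j = [0.0] * n_hid
    W_jk = [rnd.uniform(-0.5, 0.5) for _ in range(n_hid)]
    theta_k = 0.0
    error = 0.0
    for epoch in range(max_epochs):
        error = 0.0
        for x, t in patterns:  # Step 6: apply patterns one at a time.
            o_j = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                   for row, b in zip(W_ij, theta_j)]
            o_k = sigmoid(sum(w * oj for w, oj in zip(W_jk, o_j)) + theta_k)
            error += 0.5 * (t - o_k) ** 2
            d_k = (t - o_k) * o_k * (1 - o_k)
            # Hidden deltas use the pre-update output weights.
            d_j = [o_j[j] * (1 - o_j[j]) * d_k * W_jk[j] for j in range(n_hid)]
            # Step 7: adapt output-layer weights W_jk and theta_k.
            for j in range(n_hid):
                W_jk[j] += lr * d_k * o_j[j]
            theta_k += lr * d_k
            # Step 8: adapt hidden-layer weights W_ij and theta_j.
            for j in range(n_hid):
                for i in range(n_in):
                    W_ij[j][i] += lr * d_j[j] * x[i]
                theta_j[j] += lr * d_j[j]
        if error < e_min:  # Step 9: stop once the total error reaches Emin.
            break
    return error

# Illustrative training set (XOR-like patterns, not revenue data).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
final_error = train_bp(data, n_in=2, n_hid=5)
```

A momentum term (Step 3) would add a fraction of the previous weight change to each update; it is omitted here to keep the sketch short.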