MULTI-CONTEXT-RECURRENT NEURAL NETWORK FOR LOAD FORECASTING

Tarik Rashid
Department of Computer Science, University College Dublin
Belfield, Dublin 4, Ireland
[email protected]

Tahar Kechadi
Department of Computer Science, University College Dublin
Belfield, Dublin 4, Ireland
[email protected]

ABSTRACT
A recurrent neural network is studied in this paper. A multi-context-recurrent neural network (MCRNN) is defined, trained with back propagation, and applied to the short-term energy load forecasting task: predicting the daily maximum load for an arbitrary month ahead. The MCRNN model was simulated and trained with several different training sets, its predictions were compared with those of a simple recurrent network, and the results are discussed.

KEY WORDS
Artificial neural networks, recurrent neural networks, multi-context-recurrent neural networks, short-term load forecasting.

1 Introduction

Load forecasting for a power utility system can be used to minimize operation costs and to design and plan new plants. Energy load forecasting can be classified into three types: long-term forecasting covers a period of 10 to 20 years; medium-term forecasting covers a period of a few weeks and is used for scheduling fuel supplies and maintenance programs; short-term forecasting covers the day-to-day operation and scheduling of the power system. Three families of techniques have been developed for energy load forecasting: exact parametric models, non-parametric (stochastic) methods, and heuristic-based methods. The parametric technique formulates a mathematical model of the load by examining quantitative relationships between the factors affecting it; an example is the ARMA model [6, 15]. The non-parametric method computes the load forecast directly from historical data, for instance as an average of past loads [1]. Heuristic-based methods include artificial neural networks (ANNs) [8, 15]. ANNs differ from statistical models in their ability to map inputs to outputs: they are characterized by a distributed structure of weights and neurons, and can build complex relationships between the variables under consideration without specifying those relationships explicitly [13, 14]. Recurrent neural networks (RNNs), particularly simple recurrent networks (SRNs) [5], are widely used by researchers and provide an alternative to feed-forward ANNs [13]. They help to overcome the above limitations when dealing with time-series applications; nevertheless, the SRN faces difficulties due to the simplicity of its architecture [17, 7]. As an alternative, the multi-context-recurrent neural network is presented and described in section 2.4. In the next section we describe the forecasting system development task. Section 3 presents the experimental results. Finally, we conclude and outline future directions in section 4.

2 Forecasting and System Development

The load forecasting task depends on past and current information about the load over a period of time. Our forecasting system proceeds as follows: obtain and analyze the historical data; pre-process and normalize the information; choose the training and testing sets; choose the type of network and its parameters; choose a suitable learning algorithm; and finally implement the system.

2.1 Historical Data and Analysis

The historical data used was obtained from the EUNITE 2001 symposium forecasting competition and reflects the behavior of the power plant operations of the East Slovakia Electricity Corporation. The load data were collected at half-hour intervals and cover the period from 1997 to the end of 1998, from which the daily maximum load is derived; daily average temperatures are available for the years 1995-1998, along with the average temperatures for January 1999. This historical data set was used and analyzed in [2, 4, 10, 11, 12], from which a great deal of insight into the forecasting task has been obtained.

2.2 Pre-processing and Normalization

Prediction depends heavily on the information presented to the network; useless information adds complexity and slows the network down without improving performance. Pre-processing therefore involves selecting the input data presented to the network.



In accordance with the analysis of the historical data in the literature, the following seven variables were selected as inputs to the network:

1. The forecasted day of the year (d: 1 to 365).
2. The forecasted day of the month (d: 1 to 31).
3. The forecasted day of the week (d: 1 to 7).
4. The daily average temperature of the forecasted day (Td).
5. The daily average temperature of the day before the forecasted day (Td-1).
6. The daily average temperature of the same weekday one week before the forecasted day (Td-7).
7. A binary flag: 1 for holidays, 0 for working days.

The network has a single output variable: the daily maximum load (DML). All variables were scaled linearly to lie between 0 and 1, except item 7, which enters the network directly as 1 or 0.
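As a concrete illustration of this pre-processing, the following minimal Python sketch builds the seven input variables and applies the linear scaling. It assumes the raw data sits in a date-indexed pandas DataFrame with hypothetical columns temp (daily average temperature) and holiday (binary flag); these names, and the helpers themselves, are ours and not part of the paper.

    import pandas as pd

    def build_features(df: pd.DataFrame) -> pd.DataFrame:
        """Assemble the seven input variables for each forecasted day."""
        x = pd.DataFrame(index=df.index)
        x["day_of_year"] = df.index.dayofyear        # 1 to 365
        x["day_of_month"] = df.index.day             # 1 to 31
        x["day_of_week"] = df.index.dayofweek + 1    # 1 to 7
        x["temp_d"] = df["temp"]                     # T_d
        x["temp_d1"] = df["temp"].shift(1)           # T_{d-1}
        x["temp_d7"] = df["temp"].shift(7)           # T_{d-7}
        x["holiday"] = df["holiday"]                 # 1 = holiday, 0 = working day
        return x.dropna()                            # drop days lacking lagged temperatures

    def min_max_scale(x: pd.DataFrame) -> pd.DataFrame:
        """Scale each column linearly into [0, 1]; the binary flag is left as-is."""
        scaled = x.copy()
        for col in scaled.columns:
            if col != "holiday":
                lo, hi = scaled[col].min(), scaled[col].max()
                scaled[col] = (scaled[col] - lo) / (hi - lo)
        return scaled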

2.3 Selection of the Training and Testing Sets

The goal is to predict the daily maximum load for January 1999. It is therefore useful to experiment with different training-set patterns. We selected three training sets, as follows:

1. The months of Jan 1997 and Jan 1998 (Set 1).
2. The months of Jan, Feb, Nov, and Dec of the years 1997 and 1998 (Set 2).
3. All 24 months of the years 1997 and 1998 (Set 3).

The third training set is generic and can be used to predict an arbitrary month of the year 1999. One testing set was used: the month of Jan 1999.
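Under the same assumptions as the pre-processing sketch above, selecting the three training sets amounts to filtering the feature frame by month; a hypothetical helper:

    def select_training_set(x: pd.DataFrame, which: int) -> pd.DataFrame:
        """Pick one of the three training sets (illustrative only)."""
        in_years = x.index.year.isin([1997, 1998])
        if which == 1:        # Set 1: the two Januaries
            mask = in_years & (x.index.month == 1)
        elif which == 2:      # Set 2: winter months Jan, Feb, Nov, Dec
            mask = in_years & x.index.month.isin([1, 2, 11, 12])
        else:                 # Set 3: all 24 months
            mask = in_years
        return x[mask]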

2.4 Selection of the Type of Network

The proposed multi-context-recurrent neural network (MCRNN) is based on the simple recurrent network architecture [5]. The network consists of fully interconnected layers and has feedback connections from the input, hidden and output layers, each to its own designated context layer. This is a further refinement and improvement of the single-step recurrent network idea for sequential processing. Its strength lies mainly in the combination of several context layers, which weight the influence of shorter or longer histories in the sequence differently [18, 3, 20]. The context layers are forward-connected to both the hidden and output layers, which also speeds up learning and reduces the number of neurons required in the hidden layer [17, 7]. This feature of our MCRNN is unique: it ensures the stability of the output and gives the network the ability to memorize and adapt based on previous data. A snapshot of the architecture is illustrated in Figure 1.

[Figure 1. The multi-context-recurrent neural network.]
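To make the wiring concrete, the following numpy sketch sets up the weight matrices implied by this description; the class and attribute names are ours, chosen to mirror the notation introduced in section 2.4.1 below, and the initialization scheme is an assumption.

    import numpy as np

    class MCRNNShapes:
        """Weight matrices of an MCRNN with one context layer per main layer.

        Illustrative sketch: each context layer is a copy of its source layer,
        so it has the same width, and it feeds BOTH the hidden and output layers.
        """
        def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
            rng = np.random.default_rng(seed)
            def w(rows, cols):                # small random init (assumed)
                return rng.uniform(-0.1, 0.1, size=(rows, cols))
            # feed-forward connections
            self.v = w(n_hidden, n_in)        # input -> hidden (v)
            self.w_out = w(n_out, n_hidden)   # hidden -> output (w)
            # context -> hidden connections (input/hidden/output contexts)
            self.v1 = w(n_hidden, n_in)       # v'
            self.v2 = w(n_hidden, n_hidden)   # v''
            self.v3 = w(n_hidden, n_out)      # v'''
            # context -> output connections
            self.w1 = w(n_out, n_in)          # w'
            self.w2 = w(n_out, n_hidden)      # w''
            self.w3 = w(n_out, n_out)         # w'''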

2.4.1 Basic Definition

The following definitions and notations are used to explain the MCRNN architecture.

• Neurons and layers: i, j and k are the indices of the input, hidden and output neurons respectively, and l, p and q are the indices of the corresponding context layers. n_in, n_h and n_out are the numbers of neurons in the input, hidden and output layers respectively.

• Net inputs and outputs: t is the current time step, and I_i(t), H_j(t) and O_k(t) are the outputs of the neurons of the input, hidden and output layers. C'_l, C''_p and C'''_q are the copies, from the previous time step, of the neurons of the input, hidden and output layers respectively. d_k(t) is the target of neuron k in the output layer.

• Connection weights: v_{ji} is the weight connection from the input layer to the hidden layer; v'_{jl}, v''_{jp} and v'''_{jq} are the weight connections from the context layers to the hidden layer; w'_{kl}, w''_{kp} and w'''_{kq} are the weight connections from the context layers to the output layer; and w_{kj} is the weight connection from the hidden layer to the output layer.

2.5 Learning Algorithm

The back propagation (BP) algorithm [19, 9] is an example of supervised learning, based on minimizing the error by gradient descent. BP is suitable for applications with sequences of known length [7]; since the sequence length in load forecasting is known, we use back propagation to train our network to predict the daily maximum load. The BP algorithm is explained in [7, 9], so we only briefly describe its implementation for the MCRNN. As in a feed-forward network, the outputs of the hidden-layer neurons can be expressed as a function of the outputs of the input and context layers:

\[ H_j(t) = f\Big( \sum_{i=1}^{n_{in}} I_i(t)\, v_{ji}(t) + \sum_{l=1}^{n_{in}} C'_l(t)\, v'_{jl}(t) + \sum_{p=1}^{n_h} C''_p(t)\, v''_{jp}(t) + \sum_{q=1}^{n_{out}} C'''_q(t)\, v'''_{jq}(t) \Big) \tag{1} \]

The outputs of the context-layer neurons are updated as follows:

\[ C'_i(t) = I_i(t-1), \quad C''_j(t) = H_j(t-1), \quad C'''_k(t) = O_k(t-1) \tag{2} \]

The outputs of the output-layer neurons can be expressed as a function of the outputs of the hidden and context layers:

\[ O_k(t) = f\Big( \sum_{j=1}^{n_h} H_j(t)\, w_{kj}(t) + \sum_{l=1}^{n_{in}} C'_l(t)\, w'_{kl}(t) + \sum_{p=1}^{n_h} C''_p(t)\, w''_{kp}(t) + \sum_{q=1}^{n_{out}} C'''_q(t)\, w'''_{kq}(t) \Big) \tag{3} \]

Let s be a pattern in the data set, and define the error of output neuron k for pattern s as e_{ks} = d_{ks} - O_{ks}. The objective is to minimize the network cost, obtained by summing the squared error over all past patterns:

\[ E_T = \frac{1}{2} \sum_{s=1}^{S} \sum_{k=1}^{n_{out}} (d_{ks} - O_{ks})^2 \tag{4} \]

Let LG_{ks} be the local gradient of the k-th neuron in the output layer and LG_{js} the local gradient of the j-th neuron in the hidden layer when input pattern s is presented:

\[ LG_{ks} = -e_{ks}\, f'(\tilde{o}_{ks}), \quad LG_{js} = \sum_{k=1}^{n_{out}} LG_{ks}\, w_{kj}\, f'(\tilde{h}_{js}) \tag{5} \]

where \tilde{o}_{ks} and \tilde{h}_{js} are the net inputs of the output and hidden neurons. A gradient descent model is used to calculate the error gradients. For the non-recurrent connections, the changes of the weights are

\[ \frac{\partial E_T}{\partial v_{ji}} = \sum_{s=1}^{S} LG_{js}\, I_{is}, \quad \frac{\partial E_T}{\partial w_{kj}} = \sum_{s=1}^{S} LG_{ks}\, H_{js} \tag{6} \]

and for the recurrent connections

\[ \frac{\partial E_T}{\partial w'_{kl}} = \sum_{s=1}^{S} LG_{ks}\, C'_{ls}(t), \quad \frac{\partial E_T}{\partial w''_{kp}} = \sum_{s=1}^{S} LG_{ks}\, C''_{ps}(t), \quad \frac{\partial E_T}{\partial w'''_{kq}} = \sum_{s=1}^{S} LG_{ks}\, C'''_{qs}(t), \]
\[ \frac{\partial E_T}{\partial v'_{jl}} = \sum_{s=1}^{S} LG_{js}\, C'_{ls}(t), \quad \frac{\partial E_T}{\partial v''_{jp}} = \sum_{s=1}^{S} LG_{js}\, C''_{ps}(t), \quad \frac{\partial E_T}{\partial v'''_{jq}} = \sum_{s=1}^{S} LG_{js}\, C'''_{qs}(t) \tag{7} \]

The weights for the next step are then updated using a momentum technique as in [7].
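Continuing the MCRNNShapes sketch from section 2.4, the forward pass of equations (1)-(3) can be written as follows; this is our illustrative reading of the equations, not the authors' implementation, and the sigmoid is an assumed choice for the generic activation f.

    def sigmoid(z: np.ndarray) -> np.ndarray:
        # A common choice for f; the paper does not commit to a specific activation.
        return 1.0 / (1.0 + np.exp(-z))

    class MCRNN(MCRNNShapes):
        """Forward dynamics of the MCRNN, following equations (1)-(3)."""

        def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
            super().__init__(n_in, n_hidden, n_out, seed)
            self.reset_context()

        def reset_context(self):
            # Clear the stored activations of the three context layers.
            self.c1 = np.zeros(self.v1.shape[1])   # C': copy of previous input
            self.c2 = np.zeros(self.v2.shape[1])   # C'': copy of previous hidden
            self.c3 = np.zeros(self.v3.shape[1])   # C''': copy of previous output

        def forward(self, x: np.ndarray) -> np.ndarray:
            # Equation (1): hidden activations from the input and all context layers.
            h = sigmoid(self.v @ x + self.v1 @ self.c1
                        + self.v2 @ self.c2 + self.v3 @ self.c3)
            # Equation (3): output from the hidden layer and all context layers.
            o = sigmoid(self.w_out @ h + self.w1 @ self.c1
                        + self.w2 @ self.c2 + self.w3 @ self.c3)
            # Equation (2): the context layers hold copies for the next time step.
            self.c1, self.c2, self.c3 = x.copy(), h.copy(), o.copy()
            return o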

2.5.1 Selection of MCRNN Structure and Learning Parameters

The structure of the MCRNN model for the daily maximum load was 7-5-1: seven input neurons, five hidden neurons, and one output neuron. The parameters depended heavily on the size of the training and testing sets. The model was trained with the three training sets of section 2.3. The learning rate and momentum were varied between 0.005 and 0.02, and the number of training cycles between 15000 and 40000. The training patterns were divided into subsets, and each subset was presented to the network as a time series. Every time a pattern was presented to the model, the weight connections were modified and the history of the states was updated automatically; this continued until the last pattern of the subset had been presented, at which point the stored activations in the context layers were cleared. The same process was then repeated for the next subset, iteratively.
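The gradient computations of equations (4)-(7) and the subset-wise procedure just described can be sketched as below, continuing the MCRNN sketch above. One labeled simplification: the paper accumulates gradients over all S patterns before updating, whereas this sketch updates after each pattern, a common stochastic variant; the default lr and mu are illustrative values from the ranges quoted above.

    def train_step(net: MCRNN, x: np.ndarray, d: np.ndarray, state: dict,
                   lr: float = 0.01, mu: float = 0.02):
        """One gradient step for a single pattern, following eqs (4)-(7)."""
        c1, c2, c3 = net.c1, net.c2, net.c3        # context as seen by this step
        h = sigmoid(net.v @ x + net.v1 @ c1 + net.v2 @ c2 + net.v3 @ c3)      # eq (1)
        o = sigmoid(net.w_out @ h + net.w1 @ c1 + net.w2 @ c2 + net.w3 @ c3)  # eq (3)

        # Local gradients, eq (5); for the sigmoid, f'(net) = a * (1 - a).
        lg_out = -(d - o) * o * (1.0 - o)
        lg_hid = (net.w_out.T @ lg_out) * h * (1.0 - h)

        # Error gradients, eqs (6)-(7): outer products of the local gradients
        # with the activations feeding each connection.
        grads = {
            "v": np.outer(lg_hid, x),   "w_out": np.outer(lg_out, h),
            "v1": np.outer(lg_hid, c1), "v2": np.outer(lg_hid, c2),
            "v3": np.outer(lg_hid, c3), "w1": np.outer(lg_out, c1),
            "w2": np.outer(lg_out, c2), "w3": np.outer(lg_out, c3),
        }
        # Momentum update as in [7]: delta = -lr * gradient + mu * previous delta.
        for name, g in grads.items():
            delta = -lr * g + mu * state.setdefault(name, np.zeros_like(g))
            setattr(net, name, getattr(net, name) + delta)
            state[name] = delta

        net.c1, net.c2, net.c3 = x.copy(), h.copy(), o.copy()  # eq (2)

    def train(net: MCRNN, subsets, cycles: int = 20000):
        """Present each subset as a time series, clearing context between subsets."""
        state = {}
        for _ in range(cycles):
            for subset in subsets:      # each subset: time-ordered list of (x, d)
                net.reset_context()     # clear the stored context activations
                for x, d in subset:
                    train_step(net, x, d, state)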


3 Experimental Results

Several experiments were carried out with the MCRNN trained on the three different training sets. A further experiment tested our MCRNN against the simple recurrent network (SRN) [5], with training set 3 used to train both networks. The performance of the networks on each training set was evaluated with two error measures, the mean absolute percentage error (MAPE) and the maximum error (MAX), given in equations (8) and (9):

\[ MAPE = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{L_{ri} - L_{pi}}{L_{ri}} \right| \tag{8} \]

\[ MAX = \max_{i} |L_{ri} - L_{pi}| \tag{9} \]

where n is the number of outputs predicted by the network, L_{ri} is the desired value of the DML for the i-th day, and L_{pi} is the predicted value of the DML for the i-th day.
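For completeness, a direct transcription of the two measures, assuming numpy arrays of actual and predicted daily maximum loads:

    import numpy as np

    def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
        """Mean absolute percentage error, equation (8)."""
        return 100.0 / len(actual) * float(np.sum(np.abs((actual - predicted) / actual)))

    def max_error(actual: np.ndarray, predicted: np.ndarray) -> float:
        """Maximum absolute error in MW, equation (9)."""
        return float(np.max(np.abs(actual - predicted)))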

Table 1 shows the performance of the MCRNN compared with the SRN for the different training sets. Table 2 lists the predicted DML values for both networks with the three training sets.

Table 1. MAPE and MAX for the three training sets. Columns mcrnn-s1, mcrnn-s2 and mcrnn-s3 give the errors of our network trained with training sets 1, 2 and 3 respectively; srn-s3 gives the errors of the simple recurrent network trained with set 3.

Error     | mcrnn-s1 | mcrnn-s2 | mcrnn-s3 | srn-s3
MAPE (%)  | 6.82     | 4.98     | 4.32     | 5.06
MAX (MW)  | 73.28    | 50.48    | 56.43    | 62.28

4 Conclusions and Future Work

In this paper a new network, the MCRNN, was introduced and defined. The MCRNN was trained with back propagation and applied to forecasting the daily maximum load for the month ahead. In general, the behavior of the historical data varied greatly. January is one of the more difficult months of the year to predict because it is preceded by December, which is considered an exceptional month: it approaches the new year and has more holidays than any other month, which can affect prediction performance. In [16] we built separate network models for different features of the historical data, based on the winter season (weekdays and weekends) or on load performance (different classes based on the load); in other words, many network models were created. Furthermore, in [16] unusual data were discarded (for example, days with a high daily average temperature on which the daily maximum load did not decrease, or holidays with a high daily maximum load). In this paper no such segregation was made, and only one network model was used. The selection of input variables in section 2.2 allows our model to deal with all the data, with no exceptions for unusual days. Our network was trained with three different sets. Set 3 presented all the data (all seasons) to the network in order to predict a month in winter; its MAPE of 4.32% was better than that of the other models, while its MAX was slightly higher than that of the network trained with set 2. Bear in mind that the network trained with set 3 can be used to predict any arbitrary month of the year. The MCRNN trained with set 3 showed its advantages over the simple recurrent network trained with the same set in terms of performance, as shown in Tables 1 and 2. Given the varied nature of the historical data, the MCRNN predicted the DML behavior in a compact, robust and natural representation, and generalized better. The obtained results validate this approach and compare favorably with the studies mentioned in section 2.1. The errors associated with each method depend heavily on the homogeneity of the data, the choice and size of the training sets, and the network type and its parameters. We believe that several features exhibit unusual behavior that may have affected the accuracy and performance of the MCRNN negatively. We will therefore continue to improve the network so that it deals better with days that exhibit unusual behavior.

5 Acknowledgements

The authors would like to thank the EUNITE 2001 symposium forecasting competition for providing the data. Special thanks to Mr. Peter Briggs, Mr. Maurice Coyle and Mr. Ruawan O'Donhoue of the Smart Media Institute and Multi-Agent Group for their interesting and thoughtful discussions.

References

[1] W. Charytoniuk, M.S. Chen and P. Van Olinda, 'Nonparametric regression based short-term load forecasting', IEEE Trans. on Power Systems, vol. 13, no. 3, pp. 725-730, 1998.
[2] M.W. Chang, B.J. Chen and C.J. Lin, 'EUNITE network competition: electricity load forecasting', EUNITE 2001 symposium, a forecasting competition, 2001.
[3] G. Dorffner, 'Neural networks for time series processing', Neural Network World, vol. 6, no. 4, pp. 447-468, 1996.
[4] D. Esp, 'Adaptive logic networks for East Slovakian electrical load forecasting', EUNITE 2001 symposium, a forecasting competition, 2001.
[5] J.L. Elman, 'Finding structure in time', Cognitive Science, 14(2), pp. 179-211, 1990.
[6] G. Gross and F.D. Galiana, 'Short-term load forecasting', Proceedings of the IEEE, vol. 75, no. 12, pp. 1558-1572, 1987.
[7] B.Q. Huang, T. Rashid and T. Kechadi, 'A new modified network based on the Elman network', Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, 2004.
[8] A. Khotanzad, R. Afkami, T.L. Lu, A. Abaye, M. Davis and D.J. Maratukulam, 'ANNSTLF - a neural-network-based electric load forecasting system', IEEE Trans. on Neural Networks, vol. 8, no. 4, pp. 835-846, 1997.
[9] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, NY, 1994.
[10] I. King and J. Tindle, 'Storage of half hourly electric metering data and forecasting with artificial neural network technology', EUNITE 2001 symposium, a forecasting competition, 2001.
[11] W. Kowalczyk, 'Averaging and data enrichment: two approaches to electricity load forecasting', EUNITE 2001 symposium, a forecasting competition, 2001.
[12] L. Lewandowski, F. Sandner and P. Portzel, 'Prediction of electricity load by modeling the temperature dependencies', EUNITE 2001 symposium, a forecasting competition, 2001.
[13] F.J. Marin and F. Garcia-Lagos, 'Global model for short-term load forecasting using artificial neural networks', IEE Proc. Gener. Transm. Distrib., vol. 149, no. 2, 2002.
[14] J.S. McMenamin and F.A. Monforte, 'Short term energy forecasting with neural networks', Energy J., 19(4), pp. 43-61, 1998.
[15] D. Park, M. El-Sharkawi, R. Marks, L. Atlas and M. Damborg, 'Electric load forecasting using an artificial neural network', IEEE Trans. on Power Systems, vol. 6, no. 2, pp. 442-449, 1991.
[16] T. Rashid and T. Kechadi, 'Short-term energy load forecasting using recurrent neural networks', IASTED International Conference on Artificial Intelligence and Soft Computing, Sep. 2004.
[17] D.Y. Yeung, 'A locally recurrent neural network model for grammatical inference', Proceedings of the International Conference on Neural Information Processing, pp. 1468-1473, 1995.
[18] C. Ulbricht, 'Multi-recurrent networks for traffic forecasting', Proceedings of the National Conference on Artificial Intelligence, pp. 883-888, 1994.
[19] P.J. Werbos, 'Backpropagation through time: what it does and how to do it', Proceedings of the IEEE, vol. 78, pp. 1550-1560, 1990.
[20] W.H. Wilson, 'Learning performance of networks like Elman's simple recurrent networks but having multiple state vectors', Workshop of the 7th Australian Conference on Neural Networks, Australian National University, Canberra, 1996.


Table 2. Predicted values of the daily maximum load (MW) for January 1999 with the three training sets. D is the day of the month; mcrnn-s1, mcrnn-s2 and mcrnn-s3 are our network trained with training sets 1, 2 and 3; srn-s3 is the simple recurrent network trained with set 3.

D  | mcrnn-s1 | mcrnn-s2 | mcrnn-s3 | srn-s3 | Target
1  | 794.71   | 747.96   | 732.95   | 762.91 | 751
2  | 772.79   | 734.50   | 701.74   | 724.10 | 703
3  | 745.34   | 674.97   | 714.69   | 697.75 | 677
4  | 782.14   | 767.79   | 774.43   | 780.29 | 718
5  | 811.28   | 785.36   | 779.88   | 781.84 | 738
6  | 721.65   | 749.03   | 726.45   | 730.99 | 709
7  | 785.73   | 758.09   | 755.52   | 769.07 | 745
8  | 754.90   | 755.45   | 769.30   | 777.28 | 749
9  | 770.05   | 764.10   | 761.76   | 772.99 | 734
10 | 725.78   | 709.96   | 719.36   | 711.30 | 679
11 | 793.89   | 789.22   | 770.04   | 775.77 | 748
12 | 802.20   | 789.48   | 775.03   | 774.63 | 739
13 | 794.90   | 786.06   | 779.88   | 781.98 | 756
14 | 794.01   | 789.79   | 778.11   | 786.64 | 763
15 | 785.14   | 789.45   | 768.95   | 778.49 | 752
16 | 763.31   | 776.20   | 752.24   | 761.55 | 738
17 | 705.09   | 710.98   | 709.92   | 699.57 | 699
18 | 784.55   | 785.47   | 770.55   | 773.37 | 782
19 | 805.02   | 787.91   | 779.67   | 776.97 | 782
20 | 795.11   | 788.60   | 781.92   | 782.35 | 792
21 | 786.37   | 781.85   | 779.72   | 785.28 | 801
22 | 784.07   | 776.23   | 773.33   | 787.60 | 781
23 | 776.32   | 777.73   | 761.63   | 780.31 | 731
24 | 711.03   | 716.18   | 722.67   | 707.30 | 708
25 | 779.05   | 788.62   | 782.69   | 782.00 | 789
26 | 800.32   | 791.97   | 788.54   | 782.60 | 798
27 | 794.02   | 786.51   | 788.37   | 788.39 | 791
28 | 783.78   | 780.64   | 780.76   | 784.51 | 776
29 | 771.65   | 771.13   | 773.94   | 789.71 | 792
30 | 775.60   | 771.93   | 769.16   | 798.97 | 763
31 | 721.44   | 722.39   | 740.04   | 718.68 | 743
