Energy Education Science and Technology Part A: Energy Science and Research 2011 Volume (Issue) 26(2): 221-238

Forecasting of natural gas consumption with neural network and neuro fuzzy system

Oguz Kaynar1,*, Isik Yilmaz2, Ferhan Demirkoparan1

1 Cumhuriyet University, Department of Management Information Systems, Sivas, Turkey
2 Cumhuriyet University, Department of Geoscience, Sivas, Turkey

Received: 12 April 2010; accepted 14 May 2010

Abstract

The prediction of natural gas consumption is crucial for Turkey, which depends on foreign sources for its natural gas supply and whose storage capacity is only 5% of total internal consumption. The accuracy of demand predictions influences sectoral investments and natural gas purchase agreements, and hence the development of the sector. In recent years, new techniques such as artificial neural networks and fuzzy inference systems have been widely used for natural gas consumption prediction in addition to classical time series analysis. In this study, the weekly natural gas consumption of Turkey is predicted by means of three different approaches. The first is the Autoregressive Integrated Moving Average (ARIMA), a classical time series method. The second is the Artificial Neural Network (ANN); two different ANN models, the Multi Layer Perceptron (MLP) and the Radial Basis Function Network (RBFN), are employed to predict natural gas consumption. The last is the Adaptive Neuro Fuzzy Inference System (ANFIS), which combines an ANN with a fuzzy inference system. For each method, several prediction models are constructed and the one with the best forecasting performance is selected. Predictions are then made with these models and the results are compared.

Keywords: ANN; ANFIS; ARIMA; Natural gas; Forecasting

©Sila Science. All Rights Reserved.

1. Introduction

Natural gas, which does not cause air pollution and is a clean source of energy, is used in heating, electricity production and many areas of industry owing to features such as high combustion efficiency, no waste product, price advantages and being

* Corresponding author. Tel.: +90-505-808-3971; Fax: +90-346-219-010. E-mail address: [email protected] (O. Kaynar).


easily storable and controllable. Turkey, which depends on foreign sources for this eco-friendly form of energy, supplies a large part of its natural gas requirement from Russia, Iran and Azerbaijan by means of pipelines. In addition, liquefied natural gas (LNG) is shipped to Marmara Ereglisi from Nigeria and Algeria. Although the transfer of natural gas from these sources to consumption points is guaranteed by international agreements, the gas flow from exporting countries can be blocked for a variety of reasons, causing problems such as unheated households in winter and production halts in industry. Natural gas is stored in liquid and gaseous form to meet requirements and smooth seasonal demand fluctuations. Underground lakes, rock salt seams, abandoned mines and depleted natural gas and oil fields are preferred for storage. Turkey's storage capacity is only 5% of its total internal consumption, and efforts to increase it through new investments are under way. To make the natural gas sector, in which Turkey is foreign dependent and has low storage capacity, operate more soundly and develop, it is crucial to plan demand, supply, transmission, distribution and pricing properly. Determining natural gas demand accurately is one of the most important of these tasks: because storage capacity is low, demand estimates directly affect purchase agreements and investments. The more accurate the prediction of natural gas demand, the more robust the planning will be. Two data-driven methods that have recently gained popularity as emerging computational technologies in time series analysis are artificial neural networks (ANNs) and neuro-fuzzy systems.
Time series models, which aim to make predictions about the future from past observations, are commonly used in many areas such as medicine, engineering, business, economics and finance [1-6]. Among the many models proposed for time series prediction, the most commonly used and best known are ARIMA models. ARIMA models, which assume a linear relationship between the values of a series and are able to model that relationship, are applied successfully to time series that are stationary or have been made stationary by various statistical transformations. However, many time series encountered in practice are not purely linear. Artificial Neural Networks (ANNs) can model both linear and nonlinear relationships, represent many different functional structures, and approximate any function to a given accuracy even when the functional structure of the data set cannot be fully determined [7-9]. Additionally, as nonparametric general function approximators, ANNs require no prior assumptions about the data set, in contrast to statistical methods. For these reasons, ANNs have become one of the alternative methods commonly used in time series analysis in recent years. A wide survey of studies in which ANNs are used to predict time series is given by Zhang [10]. Another soft computing technique in time series analysis, and a very powerful approach for modeling complex and nonlinear relationships between a given set of input and output data, is the fuzzy rule-based approach. Fuzzy logic and fuzzy set theory are employed to describe human thinking and reasoning in a mathematical framework.
A specific approach in neuro-fuzzy development is the adaptive neuro-fuzzy inference system (ANFIS), which combines the learning capability of a neural network with the reasoning capability of fuzzy logic to give better prediction performance than either methodology alone. Jang [11], who first proposed ANFIS, used it to predict the Mackey-Glass chaotic series [11]. Recently, ANFIS has attracted more and more attention in


time series analysis in different fields such as energy, meteorology, hydrology, business and finance [11-18]. Many studies on the prediction of natural gas consumption with ANNs exist in the literature, but only a few use ANFIS. Kaynar et al. [19] made predictions with ARIMA and MLP neural networks using the daily and weekly consumption series of the city of Ankara in Turkey and found that MLP neural networks outperform ARIMA models [19]. Ivezic [20] predicted daily and weekly natural gas consumption of Serbia using historical temperature and consumption data [20]. Garcia and Mohaghegh [21] proposed an ANN model that predicts the natural gas production of the USA until 2020. Bolen [22] used ANNs to predict LPG exports of the USA in his dissertation. Khotanzad [23] proposed a two-stage structure: in the first stage, daily natural gas consumption predictions are obtained from two parallel adaptive ANNs, and in the second stage these predictions are passed through a combining unit to obtain the final prediction values [23]. Brown et al. [24] compared a daily natural gas consumption model built with an ANN against linear regression models and showed that the ANN performs better [24]. Viet and Mandziuk [25] predicted natural gas consumption in a certain region of Poland using ANN and fuzzy ANN models and pointed out that long-term monthly predictions are better than mid-term weekly predictions. The main objective of this study is to assess the performance of nonlinear techniques, namely neural network models and the adaptive neuro-fuzzy inference system (ANFIS). The data set used in this study is the time series of weekly natural gas consumption of Turkey. The prediction accuracy of the ANFIS and neural network (RBF and MLP) models is compared with that of ARIMA, a classical linear time series model. The next section provides an overview of MLP, RBF, ANFIS and ARIMA.
The third section presents the application of the models created with the different techniques to the natural gas consumption data and the results obtained. Finally, the last section provides findings and concluding remarks.

2. Methodology

2. 1. Artificial neural networks

Artificial neural networks are data processing systems devised by imitating brain activity, with performance characteristics like those of biological neural networks. ANNs have many important capabilities such as learning from data, generalization, and working with an unlimited number of variables. An ANN is typically composed of several layers of many computing elements called nodes. Each node receives an input signal from other nodes or external inputs, processes the signals locally through a transfer function, and outputs a transformed signal to other nodes or as the final result [10]. Nowadays, many artificial neural network models appropriate for use in different areas and for specific purposes (MLP, RBF, LVQ, Hopfield, recurrent, SOM, ART, etc.) have been developed. The models most commonly used in time series analysis, and the ones used in this study, are the multilayer feed-forward artificial neural network (Multi Layer Perceptron, MLP) and the radial basis function network (RBFN).

2. 1. 1. Multi layer feed forward artificial neural networks (MLP)

Neurons and layers are organized in a feed-forward manner in MLP networks. The first layer is the input layer, which receives external information about the problem to be solved. The last layer is the output layer, from which the data processed in the network are obtained. The layer


that lies between the input and output layers is called the hidden layer. There can be more than one hidden layer in MLP networks. Fig. 1 shows the architecture of a typical MLP network.

Fig. 1. Multi layer perceptron neural network.

Technically, an ANN's basic job is to learn the structure of a sample data set and to generalize from it. To this end, the network is trained with examples of the case until it is able to generalize [26]. During the training process, input patterns or examples are presented to the input layer of the network. The activation values of the input nodes are weighted and accumulated at each node in the hidden layer. The weighted sum is passed through an appropriate transfer function to produce the node's activation value. The output of hidden node j can be calculated as follows:

z_j = f( v_{0j} + \sum_{i=1}^{N} v_{ij} x_i )    (1)

where (x_1, x_2, ..., x_N) are the input values, v_{ij} is the weight that connects the i-th input to hidden node j, v_{0j} is the bias, and f is a nonlinear transfer function. The most widely used activation functions are the sigmoid (2) and hyperbolic tangent (3) functions:

f(x) = \frac{1}{1 + e^{-x}}    (2)

f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (3)

The outputs of the hidden nodes become the inputs of the output layer. Finally, the output value of the k-th node in the output layer is obtained as follows:

y_k = f( w_{0k} + \sum_{j=1}^{p} w_{jk} z_j )    (4)

Similarly, z_j is the output of hidden node j, and w_{jk} is the weight that connects hidden node j to output node k. A linear transfer function is generally used in the output layer nodes. The aim of training is to minimize the differences between the ANN output values and the known target values over all training patterns. The most popular training algorithm is the well-known back-propagation, which is basically a gradient steepest descent method with a constant step size [27]. The algorithm is named back-propagation because it propagates the errors backwards from the output towards the input. In supervised learning, a sample data set consisting of input and output values is given to the network for training; the given target output values are referred to as the supervisor or teacher in the ANN literature. During training, the weights are adjusted to minimize the error function given in Eq. (5):

E = \frac{1}{2} \sum_{k=1}^{m} (y_k - t_k)^2    (5)

where y_k represents the output that the network produces and t_k the real output value. The connection weights are updated to minimize this error, so that the network produces output values as close as possible to the real ones. Details of the back-propagation algorithm can be found in Fauset [28]. To use MLP networks for time series prediction, the architecture of the network must be determined: the number of layers, the number of nodes in each layer and how the nodes are connected. The number of output neurons is set according to how many steps ahead are to be predicted. Determining the number of input neurons is not so easy, because the number of lagged observations needed to discover the underlying pattern of a time series is a critical question, and its answer fixes the number of input nodes. Tang and Fishwick suggest setting the number of input nodes to the order p of the ARIMA (p, d, q) model [29]. Zhang, however, argues that this approach is not appropriate, because MA models contain no AR term and Box-Jenkins models are linear, and that the number of input nodes should instead be determined through experiments [10]. A network with one hidden layer is generally sufficient for most problems. The relationship between the output value (y_t) and the inputs, consisting of N past observations (y_{t-1}, y_{t-2}, ..., y_{t-N}), is given in Eq. (6):

y_t = w_0 + \sum_{j=1}^{p} w_j f( v_{0j} + \sum_{i=1}^{N} v_{ij} y_{t-i} ) + e_t    (6)
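As an illustration, the one-hidden-layer MLP of Eq. (6), producing a one-step-ahead forecast from N lagged values, can be sketched in NumPy. The paper's own experiments were run in MATLAB; this Python version is only a hedged sketch, and the weight values are arbitrary rather than trained:

```python
import numpy as np

def mlp_forecast(lags, V, v0, w, w0):
    """One-step-ahead MLP forecast (Eq. 6): tanh hidden layer, linear output.

    lags : array of the N most recent observations (y_{t-1}, ..., y_{t-N})
    V    : (p, N) hidden-layer weights; v0 : (p,) hidden biases
    w    : (p,) output weights; w0 : scalar output bias
    """
    z = np.tanh(v0 + V @ lags)   # hidden-node outputs, Eq. (1) with tanh
    return w0 + w @ z            # linear output node, Eq. (4)

# Toy example with arbitrary (illustrative) weights
rng = np.random.default_rng(0)
N, p = 3, 2                      # 3 lagged inputs, 2 hidden neurons
V = rng.normal(size=(p, N))
v0 = rng.normal(size=p)
w = rng.normal(size=p)
y_hat = mlp_forecast(np.array([1.0, 0.8, 0.9]), V, v0, w, w0=0.1)
print(float(y_hat))
```

In a trained network the weights V, v0, w, w0 would be fitted by back-propagation so as to minimize Eq. (5) over the training patterns.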

where w_j and v_{ij} represent the weights between neurons, p is the number of hidden neurons, and f is the nonlinear activation function used in the hidden layer.

2. 1. 2. Radial basis function networks

The radial basis function network (RBFN) is traditionally used for strict interpolation problems in multi-dimensional space and has capabilities similar to those of the MLP neural network,


which can solve any function approximation problem [30]. RBFs were first used in neural network design by Broomhead and Lowe, who showed how an RBF neural network models nonlinear relationships and solves interpolation problems [31]. The advantages of the RBFN are that it can be trained in a shorter time than an MLP and approximates the best solution without getting trapped in local minima [32, 33]. Additionally, RBFNs are local networks, in contrast to feed-forward networks that perform a global mapping: an RBFN uses a set of processing units, each of which is most receptive to a local region of the input space [34]. Because of these features, RBFNs have been used in recent years as an alternative neural network model in applications such as function approximation and time series forecasting as well as classification tasks [35-43]. The structure of the RBFN is composed of three layers, as can be seen in Fig. 2. The main distinction between the MLP and the RBFN is that the RBFN has a single hidden layer containing nodes called RBF units. As the name implies, radially symmetric basis functions are used as the activation functions of the hidden nodes.

Fig. 2. Radial basis function neural network.

The input layer serves only to distribute the inputs to the hidden layer. Differently from the MLP, the values in the input layer are forwarded to the hidden layer directly, without being multiplied by weights. Each hidden layer unit measures the distance between an input vector and the centre of its radial function and produces an output value depending on that distance. The center of the radial basis function is called the reference vector; the closer the input vector is to the reference vector, the larger the value produced at the output of the hidden node. Although many radial basis functions have been suggested for the hidden layer (Gaussian, multi-quadric, generalized multi-quadric, thin plate spline), the Gaussian function is the most widely used in applications. Chen et al. indicate that the choice of radial basis function does not affect the network's performance significantly [30]. The activation function of an individual hidden node, defined by the Gaussian function, is expressed as follows:

\phi_j = \exp\left( - \frac{\| X - C_j \|^2}{2 \sigma_j^2} \right),  j = 1, 2, ..., L    (7)


where \phi_j denotes the output of the j-th node in the hidden layer, \| \cdot \| is the Euclidean distance, which is generally used in applications, X is the input vector, C_j is the center of the j-th Gaussian function, \sigma_j is the radius (width) of the Gaussian function of the j-th node, and L denotes the number of hidden layer nodes. In the next step, the neurons of the output layer compute a weighted sum of the hidden layer outputs using the weights that connect the hidden layer to the output layer. The output of the network can then be presented as a linear combination of the basis functions:

y_k = \sum_{j=1}^{L} \phi_j w_{kj} + w_{k0}    (8)
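The RBFN forward pass of Eqs. (7)-(8), Gaussian hidden units followed by a linear output layer, might be sketched as follows. This is a hedged NumPy illustration; the centers, widths and weights here are arbitrary values for demonstration, not parameters from the paper:

```python
import numpy as np

def rbfn_forward(x, centers, sigmas, W, w0):
    """RBFN output (Eqs. 7-8) for a single input vector x.

    centers : (L, d) reference vectors C_j
    sigmas  : (L,) widths sigma_j
    W       : (L,) output weights w_kj for one output node
    w0      : output bias w_k0
    """
    # Eq. (7): Gaussian activation of each hidden node
    dists = np.linalg.norm(x - centers, axis=1)
    phi = np.exp(-dists**2 / (2 * sigmas**2))
    # Eq. (8): linear combination of the basis-function outputs
    return w0 + W @ phi

centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # L = 3, d = 2
sigmas = np.array([1.0, 1.0, 1.0])
W = np.array([0.5, -0.3, 0.2])
y = rbfn_forward(np.array([1.0, 0.5]), centers, sigmas, W, w0=0.1)
print(float(y))
```

Note the local character of the network: an input far from every reference vector drives all phi values towards zero, so only nearby centers contribute to the output.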

where w_{kj} is the weight that connects hidden neuron j to output neuron k, and w_{k0} is the bias of the output neuron. Training an RBF network consists of determining the centre vectors (C_j), the radius values (\sigma_j) and the linear output weights (w_{kj}). Generally a two-stage hybrid learning algorithm is used: in the first stage, the centres and widths of the RBFs in the hidden layer are determined by unsupervised clustering algorithms or selected randomly from the input data set; in the second stage, the output weights are calculated. Several methods are proposed in the literature to determine the centres and widths of the reference vectors.

Choosing the centre vectors from the data set: With this method, the number of hidden neurons is set to the number of training examples, and all input vectors are used as centres of RBFs; in short, one radial basis function is placed at each point of the input space. This case is named Exact RBF. There are two objections to it. The first is a size problem: computation becomes very expensive when the data set is large. The second occurs when the training data include noise; the network is then overtrained on the noisy data, so its performance on test data will not be as good as its performance on the training data. To reduce the computational complexity and to deal with the overtraining problem, the number of hidden neurons can be set smaller than the number of samples in the input data set, with the centre vectors chosen randomly from the input vectors.

Pruning or growing methods: These methods start with a prespecified number of hidden neurons and iteratively add hidden neurons to, or remove them from, the RBFN. The network structure with the minimum testing and training error is selected as the final RBFN model. In this iterative process, the parameters of the hidden nodes are selected randomly from the input vectors or determined by clustering methods.

To determine the centre vectors with clustering methods, the input vectors are assigned to a certain number of clusters by clustering algorithms such as K-means, Fuzzy C-means or the Self-Organizing Map, and the cluster centres are then used as RBF centres. The width of each RBF is also an important parameter: a very narrow width leads to overfitting, while a radius wider than necessary can give even worse results. This value can be set by the user or computed with Eq. (9) below [44]:

\sigma = d_{max} / \sqrt{2M}    (9)

where d_{max} is the maximum distance between centres and M is the number of clusters. After the centres and radii have been fixed, the output layer weights can be calculated by solving the linear system in matrix form, Eq. (10), with the least squares method:

W = \Phi^{\dagger} Y    (10)
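The two-stage training just described (centres and a common width via Eq. 9, then output weights via the pseudoinverse of Eq. 10) might be sketched like this. It is a hedged NumPy illustration on a toy function, not the authors' actual code; the evenly spaced centre placement is one simple choice among those listed above:

```python
import numpy as np

def train_rbfn(X, Y, M):
    """Two-stage RBFN training sketch: fixed centres, then least squares weights."""
    # Stage 1: place M centres over the training inputs (random selection or
    # K-means clustering could be used instead) and set a common width, Eq. (9).
    idx = np.linspace(0, len(X) - 1, M).astype(int)
    centers = X[idx]
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
    sigma = d_max / np.sqrt(2 * M)
    # Stage 2: build the design matrix Phi and solve W = pinv(Phi) @ Y, Eq. (10).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-dists**2 / (2 * sigma**2))
    Phi = np.hstack([np.ones((len(X), 1)), Phi])   # bias column for w_k0
    W = np.linalg.pinv(Phi) @ Y
    return centers, sigma, W

# Fit a toy 1-D function with 8 hidden nodes
X = np.linspace(0, 1, 50)[:, None]
Y = np.sin(2 * np.pi * X[:, 0])
centers, sigma, W = train_rbfn(X, Y, M=8)
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
Phi = np.hstack([np.ones((len(X), 1)), np.exp(-dists**2 / (2 * sigma**2))])
print(np.max(np.abs(Phi @ W - Y)))  # training residual
```

Because the basis outputs enter the network linearly (Eq. 8), this second stage is an ordinary linear least squares problem, which is why RBFN training is typically much faster than MLP back-propagation.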

where \Phi^{\dagger} is the generalized inverse of \Phi. After the \phi_j values are determined, the weights w_{kj} are obtained by solving Eq. (10) with the least squares method.

2. 2. Neuro-Fuzzy modeling and ANFIS

ANFIS is a fuzzy inference system (FIS) implemented in the framework of an adaptive fuzzy neural network, and is a very powerful approach for building complex and nonlinear relationships between a given set of input and output data [11, 45]. A FIS utilizes human expertise and is composed of three conceptual components: a rule base, which contains a selection of fuzzy rules; a database, which defines the membership functions (MFs) used in the fuzzy rules; and a reasoning mechanism, which performs the inference procedure on the rules to derive an output. A FIS implements a nonlinear mapping from its input space to its output space by means of a number of fuzzy if-then rules, each of which describes the local behavior of the mapping. The parameters of the if-then rules (referred to as antecedents or premises in fuzzy modeling) define a fuzzy region of the input space, and the output parameters (consequents) specify the corresponding output [15]. The main problem in FIS design is that there is no systematic procedure for defining the membership function parameters, so the efficiency of the FIS depends on how these parameters are estimated. The construction of the fuzzy rules necessitates the definition of the premise and consequent fuzzy sets. An ANN, on the other hand, has the ability to learn from input-output pairs and adapt to them in an interactive manner. The main idea of combining fuzzy systems and neural networks is to design an architecture that uses a fuzzy system to represent knowledge in an interpretable manner while possessing the learning ability of a neural network to optimize its parameters [11].
Thus ANFIS eliminates the basic problems in fuzzy system design, namely defining the membership function parameters and designing the fuzzy if-then rules, by using the learning capability of the ANN for automatic fuzzy rule generation and parameter optimization [46]. Two types of FIS have been widely used in various applications: the Mamdani model [47] and the Takagi-Sugeno model [48]. The difference between the two models lies in the consequent part of their rules. The consequent in a Sugeno FIS is either a linear function of the premise variables, called a "first-order Sugeno model", or a constant coefficient, a "zero-order Sugeno model" [45]. For a two-input first-order Sugeno fuzzy model, the if-then rules can be expressed as:

Rule i: If x is A_i and y is B_i then f_i = p_i x + q_i y + r_i    (11)

where p_i, q_i and r_i are the linear parameters of the consequent part of the first-order Sugeno model. As can be seen in Fig. 3, five layers are used to construct this inference system. Each layer contains several nodes of different shapes and functions. There are two types of nodes: fixed nodes, denoted by circles, and adaptive nodes, denoted by squares. An adaptive node is a node whose parameters are adjusted during the training phase of the system. The function of each layer is briefly described as follows:

Layer 1 (Input layer): Every node of this layer is an adaptive node, which generates membership grades for the appropriate fuzzy sets using membership functions:

O_{1,i} = \mu_{A_i}(x),  i = 1, 2
O_{1,i} = \mu_{B_{i-2}}(y),  i = 3, 4    (12)

where x and y are the crisp inputs of node i, A_i and B_i are linguistic labels, and \mu_{A_i}, \mu_{B_i} are membership functions. Different membership functions, such as trapezoidal, triangular, bell-shaped or Gaussian, can be used to determine the membership grades. The bell-shaped function is the most popular among these and is used in this study. It is given in Eq. (13):

\mu(x) = \frac{1}{1 + \left| \frac{x - c_i}{a_i} \right|^{2 b_i}}    (13)

where {a_i, b_i, c_i} is the parameter set of the membership function, called the premise parameters. These parameters are tuned during the training phase.
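The generalized bell function of Eq. (13) is easy to evaluate directly. As a hedged sketch (the parameter values are illustrative only), note that the grade is exactly 1 at the centre c and exactly 0.5 at a distance a from it:

```python
import numpy as np

def gbell(x, a, b, c):
    """Generalized bell membership function, Eq. (13)."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

# At the centre c the grade is 1; at |x - c| = a it is exactly 0.5.
print(gbell(2.0, a=1.0, b=2.0, c=2.0))  # 1.0
print(gbell(3.0, a=1.0, b=2.0, c=2.0))  # 0.5
```

The parameter a thus controls the width of the bell, b its steepness, and c its centre, which is what makes these three values natural targets for tuning during training.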

Fig. 3. ANFIS structure having two inputs and four rules.

Layer 2 (Rule layer): In this layer all nodes are fixed and labeled \Pi. The T-norm operator (AND, here the product) is applied to obtain one output that represents the result of the antecedent of a fuzzy rule, i.e. its firing strength. The firing strength is the weight of the if-then rule's premise part, and it shapes the output function for that rule. The outputs of the second layer are the products of the corresponding membership grades obtained from the previous layer:

O_{2,i} = w_i = \mu_{A_j}(x) \times \mu_{B_j}(y),  i = 1, ..., 4;  j = 1, 2    (14)

Layer 3 (Normalization layer): Every node in this layer is a fixed node, marked by a circle and labeled N. Its node function normalizes the firing strength by computing the ratio of the i-th rule's firing strength to the sum of all rules' firing strengths. Consequently, \bar{w}_i is the normalized firing strength:

O_{3,i} = \bar{w}_i = \frac{w_i}{\sum_{k=1}^{4} w_k},  i = 1, ..., 4    (15)

Layer 4 (Consequent layer): The node function of the fourth layer computes the contribution of each rule toward the total output:

O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i),  i = 1, ..., 4    (16)

where \bar{w}_i is the i-th node's output from the previous layer. The adjustable parameter set {p_i, q_i, r_i} is referred to as the consequent parameters; these are also the coefficients of the linear combination in the Sugeno inference system.

Layer 5 (Output layer): The single node of this layer computes the overall output by summing all incoming signals. Accordingly, the defuzzification process transforms each rule's fuzzy result into a crisp output in this layer:

O_{5,i} = \sum_{i=1}^{4} \bar{w}_i f_i = \frac{\sum_{i=1}^{4} w_i f_i}{\sum_{i=1}^{4} w_i}    (17)
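Layers 1-5 above (Eqs. 12-17) can be traced for a two-input, four-rule first-order Sugeno system in a short NumPy sketch. This is a hedged illustration with arbitrary premise and consequent parameters, not the trained model from the paper:

```python
import numpy as np

def gbell(x, a, b, c):
    # Layer 1: generalized bell membership function, Eq. (13)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a 2-input, 4-rule first-order Sugeno ANFIS (Eqs. 12-17)."""
    (a1, b1, c1), (a2, b2, c2) = premise["A"]   # two MFs on input x
    (a3, b3, c3), (a4, b4, c4) = premise["B"]   # two MFs on input y
    muA = [gbell(x, a1, b1, c1), gbell(x, a2, b2, c2)]
    muB = [gbell(y, a3, b3, c3), gbell(y, a4, b4, c4)]
    # Layer 2: firing strengths, one per (A_j, B_j) pairing, Eq. (14)
    w = np.array([muA[0] * muB[0], muA[0] * muB[1],
                  muA[1] * muB[0], muA[1] * muB[1]])
    wbar = w / w.sum()                                       # Layer 3, Eq. (15)
    f = np.array([p * x + q * y + r for p, q, r in consequent])  # rule outputs
    return float(np.sum(wbar * f))                 # Layers 4-5, Eqs. (16)-(17)

premise = {"A": [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)],
           "B": [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]}
consequent = [(0.5, 0.1, 0.0), (0.2, 0.3, 0.1),
              (0.1, 0.2, 0.3), (0.4, 0.1, 0.2)]
out = anfis_forward(0.4, 0.6, premise, consequent)
print(out)
```

Because the output is a convex combination of the rule consequents f_i, it always lies between the smallest and largest f_i evaluated at the given input.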

ANFIS adapts its parameters using either gradient descent alone or a hybrid learning algorithm that combines gradient descent with least squares estimation (LSE). The gradient method could identify all the parameters of the adaptive network, but it is generally slow and likely to become trapped in local minima. Therefore, Jang [11] proposed a two-pass hybrid learning algorithm [11]. In the forward pass, the antecedent parameters (a_i, b_i, c_i) are fixed, node outputs are propagated forward up to layer 4, and the consequent parameters (p_i, q_i, r_i) are identified by least squares estimation. When the antecedent parameters are fixed, the overall output of the system can be expressed as a linear combination of the consequent parameters:

f_{out} = \sum_{i=1}^{4} \bar{w}_i f_i = \bar{w}_1 f_1 + \bar{w}_2 f_2 + \bar{w}_3 f_3 + \bar{w}_4 f_4
        = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + \bar{w}_2 r_2
        + (\bar{w}_3 x) p_3 + (\bar{w}_3 y) q_3 + \bar{w}_3 r_3 + (\bar{w}_4 x) p_4 + (\bar{w}_4 y) q_4 + \bar{w}_4 r_4    (18)

Based on n entries of training data {x_i, y_i, f_{out,i}}, and given the values of the premise parameters, Eq. (18) can be expressed in matrix form as:

f = B \theta    (19)

where \theta is an unknown vector whose elements are the consequent parameters. If the matrix B is invertible, the least squares estimator \theta^* is given by:

\theta^* = B^{-1} f    (20)

Otherwise the pseudo-inverse is used to solve for \theta^*:

\theta^* = (B^T B)^{-1} B^T f    (21)
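The forward-pass LSE step (Eqs. 19-21) amounts to an ordinary least squares solve for the 12 consequent parameters of the four rules. A hedged NumPy sketch follows; the normalized firing strengths here are synthetic random values standing in for the output of layers 1-3:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50                                   # training entries
x, y = rng.uniform(size=n), rng.uniform(size=n)

# Synthetic normalized firing strengths wbar_i for 4 rules (rows sum to 1);
# in a real ANFIS these come from layers 1-3.
w = rng.uniform(size=(n, 4))
wbar = w / w.sum(axis=1, keepdims=True)

# Build B so that f = B @ theta, theta = (p1, q1, r1, ..., p4, q4, r4), Eq. (18)
cols = []
for i in range(4):
    cols += [wbar[:, i] * x, wbar[:, i] * y, wbar[:, i]]
B = np.column_stack(cols)                # shape (n, 12)

theta_true = rng.normal(size=12)         # pretend consequent parameters
f = B @ theta_true                       # targets generated from them

# Eq. (21): least squares estimate via the pseudo-inverse form
theta_star = np.linalg.pinv(B.T @ B) @ B.T @ f
print(np.max(np.abs(theta_star - theta_true)))
```

Since the targets here are noise-free and B has full column rank, the estimate recovers the generating parameters to numerical precision, which is exactly why the forward pass can fix the consequents in a single linear solve.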

After the consequent parameters are determined, the overall system error is calculated. The overall error measure for n entries of the training data set is given as:

E = \sum_{i=1}^{n} E_i = \sum_{i=1}^{n} (d_i - f_{out,i})^2    (22)

where E_i is the error measure for the i-th entry of the given training data set, d_i is the desired output for the i-th entry and f_{out,i} is the output of the ANFIS for the i-th entry. In the backward pass, the consequent parameters are fixed and the error rate \partial E_i / \partial O is calculated for the i-th training entry and for each node output O. By the chain rule, the derivative of the overall error measure E with respect to a premise parameter \alpha is:

\frac{\partial E}{\partial \alpha} = \sum_{i=1}^{n} \frac{\partial E_i}{\partial \alpha} = \sum_{i=1}^{n} \sum_{O \in S} \frac{\partial E_i}{\partial O} \frac{\partial O}{\partial \alpha}    (23)

where \alpha is any parameter of the premise parameter set, S is the set of nodes whose outputs depend on \alpha, and O is a node output belonging to S. The premise parameters are then updated by:

\Delta \alpha = - \eta \frac{\partial E}{\partial \alpha}    (24)

Detailed information on these algorithms can be found in Jang et al. [45].

2. 3. ARIMA models

The Box-Jenkins model is a statistical forecasting method that has long been used successfully for the prediction and control of univariate time series in the short and middle term. The Box-Jenkins approach owes its popularity to features such as applicability to many different time series, strong theoretical foundations and success in applications [49]. It integrates the Autoregressive (AR) and Moving Average (MA) prediction methods and can be applied to stationary series; differencing is used to make nonstationary series stationary. The Autoregressive Integrated Moving Average (ARIMA) model, which expresses the d-times differenced variable in terms of its own past values along with current and past errors, is thus a combination of AR and MA processes. Models are generally represented as ARIMA (p, d, q), where p and q are the orders of the AR and MA parts respectively and d is the degree of differencing. ARIMA models can be written as in Eq. (25):

Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + ... + \phi_p Z_{t-p} + \delta + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - ... - \theta_q a_{t-q}    (25)
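As a minimal illustration of Eq. (25), an ARIMA(p, 1, 0) model can be fitted by differencing the series once and estimating the AR coefficients \phi_i and the constant \delta by least squares. This is a hedged NumPy sketch on a toy series; a production forecast would normally use a dedicated statistics library:

```python
import numpy as np

def fit_arima_p10(series, p):
    """Fit ARIMA(p, 1, 0) by OLS on the once-differenced series (Eq. 25 with q = 0)."""
    z = np.diff(series)                       # d = 1 differencing
    # Design matrix of p lagged values: row t is (z_{t-1}, ..., z_{t-p})
    A = np.column_stack([z[p - k - 1: len(z) - k - 1] for k in range(p)])
    A = np.hstack([A, np.ones((len(A), 1))])  # column for the constant delta
    target = z[p:]
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    phi, delta = coef[:p], coef[p]
    return phi, delta

def forecast_next(series, phi, delta):
    """One-step-ahead forecast: last value plus the predicted next difference."""
    z = np.diff(series)
    dz_hat = delta + phi @ z[-1: -len(phi) - 1: -1]  # (z_{t-1}, ..., z_{t-p})
    return series[-1] + dz_hat

t = np.arange(200, dtype=float)
series = 10 + 0.5 * t + np.sin(0.3 * t)      # toy trending series
phi, delta = fit_arima_p10(series, p=4)
fc = float(forecast_next(series, phi, delta))
print(fc)
```

Differencing removes the linear trend, after which the remaining periodic component is captured exactly by the AR terms, so the one-step forecast essentially reproduces the next value of the generating function.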

In the model, \delta is a constant; Z_t, Z_{t-1}, ..., Z_{t-p} are the d-times differenced observations and \phi_1, \phi_2, ..., \phi_p their coefficients; a_t, a_{t-1}, ..., a_{t-q} represent the error terms and \theta_1, \theta_2, ..., \theta_q their coefficients. In ARIMA modeling, forecasting is conducted in four stages. First, the stationarity of the series is checked, nonstationary series are made stationary, and a candidate Box-Jenkins model is identified. In the second stage, the parameters of this model are estimated. In the third stage, the model's fit to the data set is tested with statistical methods; if the model identified in the previous stage is approved, the procedure passes to the last stage, otherwise it returns to the first. In the last stage, the most appropriate model is used for prediction.

3. Application and results

The data used in this study are the weekly time series of natural gas consumption of Turkey between January 2002 and April 2006, obtained from Botas A. S. The last 15 observations of the data set are used to test the predictions of the MLP, RBF and ARIMA models. Of the remaining data, 80% is used for training and 20% for validation in the MLP and ANFIS models; for RBF, all data except the test data are used for training. Before being fed into the neural network and ANFIS models, the data set is normalized to the range [-1, 1] to prevent saturation of the hidden nodes; the MATLAB 'premnmx' function is used for this normalization, and the software for creating the artificial neural network and ANFIS models was developed in MATLAB. All MLP models created in this study have a three-layer architecture. By changing the number of neurons in the input layer from 1 to 12 and the number of neurons in the hidden layer from 1 to 10, 120 different MLP neural network models are obtained. A linear transfer function is used in the output layer node, whereas the tangent-sigmoid transfer function is preferred in the hidden layer nodes.
The Levenberg-Marquardt backpropagation algorithm is used to train all MLP models. The training parameters that stop training, the epoch number and the goal error rate, are set to 1000 and 0.001 respectively. In addition, to achieve good accuracy and avoid overfitting, validation vectors are used to stop training early if the network's performance on them fails to improve. The networks are trained on the training data, and the model with the minimum mean squared error (MSE) on the test data is chosen from the 120 models; the MLP model with 11 input neurons and 2 hidden neurons is determined to be the most appropriate. Two different approaches are used to make predictions with RBF networks. In the first approach, the number of hidden-layer neurons is set equal to the number of observations in the training data set, so the reference vectors coincide with the input vectors in the data set. In the second approach, a growing method is used to determine the number of hidden-layer nodes: 30 models are created by increasing the number of hidden neurons from 1 to 30, and the centers of the radial functions in the hidden layer are chosen randomly from the training data set. The appropriate number of input nodes is not easy to determine, because it is not known how many past lag values affect the value of the series at time t; therefore, in both approaches the number of input neurons is varied from 1 to 12. As a result, 12 models are tried for the first approach and 12x30 for the second. The RBF model with the minimum MSE on the test data is chosen as the best-performing model; it has 1 input neuron and 3 hidden-layer neurons.
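A minimal numpy sketch of the first RBF approach (one Gaussian center per training observation, with the linear output weights obtained by least squares) is given below; the toy data and the spread value are illustrative assumptions, not the study's actual settings:

```python
import numpy as np

def rbf_design(X, centers, spread=0.5):
    """Gaussian hidden-layer activations of each input row against each center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * spread ** 2))

# Toy 1-lag training set (the best RBF model above also used one input)
X_train = np.array([[0.1], [0.4], [0.9]])
y_train = np.array([0.2, 0.5, 1.0])

# First approach: one center per training observation
centers = X_train.copy()
H = rbf_design(X_train, centers)                 # hidden-layer activations
w, *_ = np.linalg.lstsq(H, y_train, rcond=None)  # linear output weights

y_hat = rbf_design(X_train, centers) @ w
```

With as many centers as observations the Gaussian design matrix is square and positive definite, so the network interpolates the training data exactly; the second (growing) approach trades this exact fit for fewer centers and better generalization.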

O. Kaynar et al. / EEST Part A: Energy Science and Research 26 (2011) 221-238

In constructing the ANFIS model, the FIS architecture must be determined before the network is trained. This involves determining the number of membership functions for each input variable, the rules, and the values of the parameters belonging to these functions. MATLAB provides two functions, genfis1 and genfis2, to determine the initial FIS architecture. The basic difference between them is how the rules that partition the input space are created. Genfis1 grid-partitions the input space using all possible combinations of the membership functions of each variable. The biggest disadvantage of this method is that it produces many rules and, correspondingly, many parameters to be trained. If there are N input variables, P membership functions per variable and L parameters per membership function, then P^N rules are created, so P^N(N+1) linear (consequent) parameters and N·P·L nonlinear parameters must be trained. When the number of input variables is large and the data set is small, the so-called curse of dimensionality problem occurs and the parameters cannot be computed; the number of rules must be reduced to deal with it. Genfis2 uses the subtractive clustering algorithm to reduce the number of rules, creating one rule for each cluster; in addition, the centers of the obtained clusters serve as initial values for the membership function parameters of the input variables. In this study, both the genfis1 and genfis2 rule-generation methods are used to determine the initial FIS structure, and all models are trained with the hybrid learning algorithm. For the FIS models that use genfis1, network structures are created with different lags and with the number of membership functions increased from two up to the largest value for which the curse of dimensionality problem does not occur.
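The grid-partition counts above (P^N rules, P^N(N+1) consequent parameters, N·P·L premise parameters) can be checked with a few lines of Python; the choice of a generalized bell membership function with three parameters is an illustrative assumption:

```python
def anfis_grid_counts(n_vars, n_mfs, params_per_mf):
    """Rule and parameter counts for a grid-partitioned Sugeno ANFIS."""
    rules = n_mfs ** n_vars                    # P^N rules
    consequent = rules * (n_vars + 1)          # P^N * (N + 1) linear parameters
    premise = n_vars * n_mfs * params_per_mf   # N * P * L nonlinear parameters
    return rules, consequent, premise

# e.g. 7 inputs with 2 generalized bell MFs per input (3 parameters each)
rules, lin, nonlin = anfis_grid_counts(7, 2, 3)  # 128 rules, 1024 linear, 42 nonlinear
```

The exponential growth of the rule count in the number of inputs is exactly why the data set can become too small to estimate all consequent parameters, motivating the genfis2 clustering alternative.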
The model with the best prediction accuracy among these is selected; it has one input with two membership functions, and thus two rules. The models created with genfis2 are formed similarly, by increasing the number of input variables from 1 up to the highest value for which the curse of dimensionality problem does not occur. By changing the cluster spread value used in genfis2, which determines the rules via the subtractive clustering algorithm, from 0.1 to 1 in steps of 0.05, sub-models with different numbers of valid rules are created. The best-performing model has seven inputs and three rules. The ANFIS model that uses genfis2 performs better than the model that uses genfis1. To determine an ARIMA model suitable for the data set, its stationarity is examined first. After the nonstationary natural gas consumption series is made stationary, candidate models are created, and ARIMA(4, 1, 1) is identified as the most appropriate among them according to the AIC, BIC and R2 criteria. The mean absolute percentage error (MAPE) criterion is used to compare the various models obtained from ARIMA, MLP, RBF and ANFIS. MAPE is a relative measure that is easy to interpret, independent of scale, reliable and valid; the smaller its value, the closer the forecasts are to the actual values. The results are given in Table 1: the MAPE values of the ARIMA, MLP, RBF and ANFIS models are 6.410%, 5.477%, 6.186% and 5.468% respectively, so the best result is obtained from the ANFIS model. The observed values of the data and the predicted values are also plotted in Fig. 4.


Fig. 4. The graph of observed and predicted values for natural gas consumption.

Table 1. Observed and predicted values for weekly natural gas consumption

OBSERVED   ARIMA      MLP        RBF        ANFIS
679060     659713     645465     650989     651387
613900     664790     636458     667085     645416
580091     621111     605377     605434     596450
583344     596369     573474     571854     583270
516525     590657     549467     575037     578687
493775     525974     471664     513675     512176
535075     497903     499303     494365     496823
536848     527601     554054     529842     518876
472098     525046     500118     531419     512559
414551     482008     439668     475886     473702
468225     436996     409451     421418     436500
477362     461841     462676     472520     472355
503055     464551     438897     480420     459280
484381     483920     513767     502209     487581
500604     495552     499130     486410     482129
MAPE       6.410%     5.477%     6.186%     5.468%
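As a check, the MAPE of, for example, the ARIMA forecasts can be recomputed directly from the values in Table 1:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

observed = [679060, 613900, 580091, 583344, 516525, 493775, 535075, 536848,
            472098, 414551, 468225, 477362, 503055, 484381, 500604]
arima    = [659713, 664790, 621111, 596369, 590657, 525974, 497903, 527601,
            525046, 482008, 436996, 461841, 464551, 483920, 495552]

error = mape(observed, arima)  # approx. 6.41%, matching the ARIMA row of Table 1
```

The same function applied to the MLP, RBF and ANFIS columns reproduces the remaining MAPE values in the last row of the table.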

4. Conclusion

In this research the natural gas consumption of Turkey is examined using statistical time series analysis, ANN and ANFIS methods. The data set is a weekly natural gas consumption time series obtained from official publications in Turkey [50-55]. Different models have been built, and the best-performing model, the one with the lowest RMSE on the training and test data, is determined as the final model for each technique. Predictions are then made for the last 15 observations using each technique's final model, and the prediction accuracies of these models are compared using the MAPE performance criterion.


The empirical results indicate that the MAPE values of the final models are small and very close to each other, so all models provide reasonably good forecasts and are suitable for Turkey's weekly natural gas consumption. Among the ANN models, the MLP performs slightly better than the RBF network. Among the ANFIS models, the final model that uses subtractive clustering to generate the initial FIS structure outperforms the final model that uses grid partitioning; subtractive clustering can therefore be used to reduce the number of fuzzy rules and parameters to be calculated, and thus to cope with computational complexity and the curse of dimensionality problem. Compared with the neural network models, ANFIS performs better than the RBF network and about the same as the MLP. Because ANFIS is less time consuming and more flexible than an ANN, employing fuzzy rules and membership functions that connect with real-world systems, it can be used as an alternative to neural networks. Both the ANN and ANFIS models outperform the ARIMA model when their MAPE values are compared. Based on the forecast results, when an appropriate network structure and enough data are used, ANN and ANFIS, which do not depend on statistical conditions such as the type of relation between variables or the type of data distribution, can be used as alternatives to statistical methods for predicting natural gas consumption. In this study, successful predictions are obtained from a time series containing only past values of natural gas consumption.
Causal models such as regression analysis, which use additional variables including temperature, the actual price of natural gas or the ratio of the natural gas price to that of other energy sources [56-63], the number of residences or offices heated with natural gas, and the number of industrial enterprises using natural gas as an energy source for production, can also be employed for natural gas consumption prediction. Thus, further studies can compare the results of this study with those of studies that consider other factors affecting natural gas consumption.

References

[1] Kecebas A, Alkan MA. Educational and consciousness-raising movements for renewable energy in Turkey. Energy Educ Sci Technol Part B 2009;1:157–170.
[2] Darici B, Ocal FM. The structure of European financial system and financial integration. Energy Educ Sci Technol Part B 2010;2:133–145.
[3] Demirbas A. Energy concept and energy education. Energy Educ Sci Technol Part B 2009;1:85–101.
[4] Demirbas A. Concept of energy conversion in engineering education. Energy Educ Sci Technol Part B 2009;1:183–197.
[5] Demirbas B. Biomass business and operating. Energy Educ Sci Technol Part A 2010;26:37–47.
[6] Demirbas A. Social, economic, environmental and policy aspects of biofuels. Energy Educ Sci Technol Part B 2010;2:75–109.
[7] Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Systems 1989;2:303–314.
[8] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks 1989;2:359–366.
[9] Hornik K. Approximation capability of multilayer feedforward networks. Neural Networks 1991;4:251–257.


[10] Zhang G, Patuwo BE, Hu MY. Forecasting with artificial neural networks: the state of the art. Int J Forecast 1998;14:35–62.
[11] Jang JSR. ANFIS: adaptive-network-based fuzzy inference system. IEEE Transact Syst Man Cybernetics 1993;23:665–685.
[12] Ying L-C, Pan M-C. Using adaptive network based fuzzy inference system to forecast regional electricity loads. Energy Convers Manage 2008;49:205–211.
[13] Azadeh A, Saberi M, Gitiforouz A, Saberi Z. A hybrid simulation-adaptive network based fuzzy inference system for improvement of electricity consumption estimation. Expert Syst Applic 2009;36:11108–11117.
[14] Sfetsos A. A comparison of various forecasting techniques applied to mean hourly wind speed time series. Renew Energy 2000;21:23–35.
[15] Nayak PC, Sudheer KP, Ragan DM, Ramasastri KS. A neuro fuzzy computing technique for modeling hydrological time series. J Hydrol 2004;29:52–66.
[16] Chang F-J, Chang Y-T. Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Adv Water Res 2006;29:1–10.
[17] Cheng C-H, Wei L-Y, Chen Y-S. Fusion ANFIS models based on multi-stock volatility causality for TAIEX forecasting. Neurocomputing 2009;72:3462–3468.
[18] Atsalakis GS, Valavanis KP. Forecasting stock market short-term trends using a neuro-fuzzy based methodology. Expert Sys Appl 2009;36:10696–10707.
[19] Kaynar O, Tastan S, Demirkoparan F. Yapay Sinir Aglari ile Dogal Gaz Tuketim Tahmini [Natural gas consumption forecasting with artificial neural networks]. 10. Ekonometri ve Istatistik Sempozyumu, Erzurum, 2009, 106 (in Turkish).
[20] Ivezic D. Short-term natural gas consumption forecast. FME Transactions 2006;34:165–169.
[21] Garcia A, Mohagheg SD. Forecasting US natural gas production into year 2020: a comparative study. Eastern Society of Petroleum Engineers Regional Conference, SPE 91413, 2004.
[22] Bolen MS. A New Methodology for Analyzing and Predicting US Liquefied Natural Gas Imports Using Neural Networks. Master Thesis, Texas A&M University, 2005.
[23] Khotanzad A, Elragal H, Lu T-L. Combination of artificial neural-network forecasters for prediction of natural gas consumption. IEEE Transact Neural Networks 2000;11:464–473.
[24] Brown RH, Matin L, Kharout P, Piessens LP. Development of artificial neural-network models to predict daily gas consumption. AGA Forecast Rev 1996;5:1–22.
[25] Viet NH, Mandzuik J. Neural and fuzzy neural networks for natural gas consumption prediction. IEEE XIII Workshop on Neural Networks for Signal Processing, 2003, pp. 759–768.
[26] Oztemel E. Yapay Sinir Aglari [Artificial Neural Networks]. Istanbul: Papatya Yayincilik, 2003 (in Turkish).
[27] Zhang G, Hu M, Patuwo BE, Indro DC. Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis. Europ J Operat Res 1999;116:16–32.
[28] Fausett L. Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall, 1994.
[29] Tang Z, Fishwick PA. Feedforward neural nets as models for time series forecasting. ORSA J Computing 1993;5:374–385.
[30] Park J, Sandberg IW. Approximation and radial basis function networks. Neural Comput 1993;5:305–316.
[31] Broomhead DS, Lowe D. Multivariable functional interpolation and adaptive networks. Complex Systems 1988;2:321–355.
[32] Moody J, Darken CJ. Fast learning in networks of locally-tuned processing units. Neural Comput 1989;1:281–294.
[33] Park J, Sandberg IW. Universal approximation using radial-basis-function networks. Neural Comput 1991;3:246–257.
[34] Xu K, Xie M, Tang LC, Ho SL. Application of neural networks in forecasting engine systems reliability. Applied Soft Computing 2003;2:255–268.
[35] Bianchini M, Frasconi P, Gori M. Learning without local minima in radial basis function networks. IEEE Trans Neural Networks 1995;6:749–755.


[36] Chen S, Cowan CFN, Grant PM. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans Neural Networks 1991;2:302–309.
[37] Sheta AF, Jong KD. Time-series forecasting using GA-tuned radial basis function. Inform Sci 2001;133:221–228.
[38] Harpham C, Dawson CW. The effect of different basis functions on a radial basis function network for time series prediction: a comparative study. Neurocomput 2006;69:2161–2170.
[39] Rivas VM, Merelo JJ, Castillo PA, Arenas MG, Castellano JG. Evolving RBF neural networks for time-series forecasting with EvRBF. Inform Sci 2004;165:207–220.
[40] Yu L, Lai KK, Wang S. Multistage RBF neural network ensemble learning for exchange rates forecasting. Neurocomput 2008;71:3295–3302.
[41] Foody GM. Supervised image classification by MLP and RBF neural networks with and without an exhaustively defined set of classes. Int J Remote Sensing 2004;25:3091–3104.
[42] Sarimveis H, Doganis P, Alexandridis A. A classification technique based on radial basis function neural networks. Adv Engin Software 2006;37:218–221.
[43] Zhang R, Huang G, Sundararajan N, Saratchandran P. Improved GAP-RBF network for classification problems. Neurocomput 2007;70:3011–3018.
[44] Haykin S. Neural Networks - A Comprehensive Foundation. Prentice Hall, 1999.
[45] Jang JSR, Sun CT, Mizutani E. Neuro-Fuzzy and Soft Computing. Prentice Hall, 1997.
[46] Yurdusev MA, Firat M. Neuro-fuzzy inference system and artificial neural networks for municipal water consumption prediction. J Hydroinformat 2009;365:225–234.
[47] Mamdani EH, Assilian S. An experiment in linguistic synthesis with a fuzzy logic controller. Int J Man-Machine Stud 1975;7:1–13.
[48] Takagi T, Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Transact Syst Man Cybernetics 1985;15:116–132.
[49] Frechtling DC. Practical Tourism Forecasting. Elsevier, 1996.
[50] Yumurtaci M, Kecebas A. Renewable energy and its university level education in Turkey. Energy Educ Sci Technol Part B 2011;3:143–152.
[51] Ulusarslan D, Gemici Z, Teke I. Currency of district cooling systems and alternative energy sources. Energy Educ Sci Technol Part A 2009;23:31–53.
[52] Kirtay E. The role of renewable energy sources in meeting Turkey's electrical energy demand. Energy Educ Sci Technol Part A 2009;23:15–30.
[53] Kan A. General characteristics of waste management: a review. Energy Educ Sci Technol Part A 2009;23:55–69.
[54] Demirbas AH. Inexpensive oil and fats feedstocks for production of biodiesel. Energy Educ Sci Technol Part A 2009;23:1–13.
[55] Sahin Y. Environmental impacts of biofuels. Energy Educ Sci Technol Part A 2011;26:129–142.
[56] Demirbas AH. Biofuels for future transportation necessity. Energy Educ Sci Technol Part A 2010;26:13–23.
[57] Saidur R. Energy, economics and environmental analysis for chillers in office buildings. Energy Educ Sci Technol Part A 2010;25:1–16.
[59] Balat H. Prospects of biofuels for a sustainable energy future: a critical assessment. Energy Educ Sci Technol Part A 2010;24:85–111.
[60] Saidur R, Lai YK. Parasitic energy savings in engines using nanolubricants. Energy Educ Sci Technol Part A 2010;26:61–74.
[61] Saidur R, Lai YK. Nanotechnology in vehicle's weight reduction and associated energy savings. Energy Educ Sci Technol Part A 2011;26:87–101.
[62] Ertas M, Alma MH. Slow pyrolysis of chinaberry (Melia azedarach L.) seeds: Part I. The influence of pyrolysis parameters on the product yields. Energy Educ Sci Technol Part A 2011;26:143–154.


[63] Altun S. Fuel properties of biodiesels produced from different feedstocks. Energy Educ Sci Technol Part A 2011;26:165–174.
