April 3, 2014 9:5 WSPC/2335-6804

1450004

International Journal of Energy and Statistics, Vol. 2, No. 1 (2014) 49–57
© Institute for International Energy Studies
DOI: 10.1142/S2335680414500045

Int. J. Energy Stat. 2014.02:49-57. Downloaded from www.worldscientific.com by 52.2.249.46 on 09/07/15. For personal use only.

A brief review of recent data mining applications in the energy industry

Mansi Ghodsi
The Statistical Research Centre, Executive Business Centre, Bournemouth University, 89 Holdenhurst Road, Bournemouth, BH8 8EB, UK

Received 5 January 2014
Revised 25 January 2014
Accepted 5 February 2014
Published 31 March 2014

Data Mining has revolutionized the modern world and is rapidly becoming a key ingredient in discovering and extracting relationships, hidden patterns and trends in the energy sector. The emergence and availability of Big Data in the energy industry further increases the importance of Data Mining techniques, as they provide feasible solutions for mining and exploiting information that is crucial for management decision making and for achieving productivity gains in the energy industry. This succinct review paper outlines the various Data Mining techniques that have recently been adopted for solving energy related problems in the industry. The evidence suggests that a variety of Data Mining techniques have been evaluated for this purpose in the recent past, including cluster analysis, classification trees, neural networks, genetic algorithms, dynamic regressions, Bayesian models, support vector machines, k-nearest neighbours and random forests. This paper finds evidence supporting the claim that cluster analysis and classification trees are the most frequently adopted Data Mining techniques in the energy industry.

Keywords: Data mining; energy; applications; review.

Nomenclature

AMR: Automated Meter Reading
CART: Classification and Regression Trees
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
BP: Back Propagation
MLP: Multi-Layer Perceptron
kWNN: Weighted k-Nearest Neighbours
GA: Genetic Algorithm
DR: Dynamic Regression

AR: Autoregressive
PDF: Probability Density Function
SVM: Support Vector Machines
LN: Linear Networks
RBF: Radial Basis Functions
PNN: Probabilistic Neural Networks
GRNN: Generalized Regression Neural Networks
SOM: Self-Organizing Map
kNN: k-Nearest Neighbours
SVR: Support Vector Regression
PCA: Principal Component Analysis

1. Introduction

Data Mining has already been introduced into a variety of diverse fields ranging from chemistry to astronomy [1, 2], and the increasing availability of Big Data in the energy industry adds to the importance of applying Data Mining techniques in this field. The energy sector is vital to any economy: given our dependence on energy in day-to-day life, it has the potential to aid or halt economic growth. Comprehensive coverage of the various definitions of Data Mining and its evolution over time can be found in [3]. In brief, Data Mining is concerned with discovering relationships, hidden trends and patterns [4] which would otherwise remain unnoticed.

The application of Data Mining techniques in the energy industry is more important today than ever before, for three reasons. Firstly, given the increasing uncertainty which continues to intensify the volatility and unpredictability of the energy industry, risk management (i.e. maximising opportunities whilst minimising or preventing associated threats [5]) is imperative for ensuring the survival and going concern of organisations in the energy sector. Data Mining can support risk management in the energy industry by helping organisations mine and exploit their data to find previously unknown trends, patterns and relationships, yielding valuable information that could enhance their risk management processes. Secondly, following recent advances in technology and other developments, there has been a surge in the quantity of information available in the energy industry. As the industry moves into high tech, the sector has amassed a wealth of information, or Big Data, which, if analysed using appropriate techniques, could uncover useful information for the industry. Indeed, a recent poll conducted by KDnuggets shows Big Data to be the hottest Data Mining topic [3]. Thirdly, classical time series analysis and forecasting techniques cannot accurately process the increasingly available Big Data in the energy industry, as they were not designed to handle such large quantities of information. Data Mining techniques, on the other hand, are known for their ability to handle Big Data.

In addition to the points mentioned above, several further examples illustrate the importance of Data Mining in the energy industry. For example, the European electricity industry is evaluating AMR systems, which provide a whole range of AMR data that, if mined using appropriate techniques, could be used to enable demand response and improve daily operations [6]. Moreover, Kim et al. [7] note that ever-increasing volumes of energy related data greatly increase the number of factors that require consideration, which in turn limits the applicability of traditional data analysis methods. Finally, energy data are now unlikely to conform to the parametric assumptions underlying popular time series analysis techniques, whilst Data Mining techniques include a range of nonparametric models which enable energy data to be modelled without incurring the information losses that result from data transformations.

The aim of this review paper is to summarise the recent applications of Data Mining techniques in the energy industry, capturing the wide range of techniques adopted. Recently, a review of Data Mining and its applications in official statistics was provided in [3], where the authors identify Bayesian regression, decision trees, neural networks, association analysis, genetic programming, inductive learning algorithms and cluster analysis as the Data Mining techniques which have been evaluated to date. It would be interesting to identify whether there are differences between the Data Mining techniques applied in the energy sector and those applied in official statistics. The remainder of this paper is organised as follows: Section 2 reviews Data Mining applications in energy, and Section 3 presents a discussion and conclusion.

2. Applications of Data Mining in Energy

This section presents and summarises the applications of Data Mining techniques in the energy industry over the years. A brief explanation of the related Data Mining techniques, such as cluster analysis, classification methods and association analysis, can be found in [8]. A comprehensive review of Data Mining applications for predicting wind power up until 2012 can also be found in [8]; those applications are therefore not considered here, as doing so would repeat work already completed. A further review of Data Mining techniques and forecasting problems associated with power systems can be found in [9]. In line with the aim of this paper, the various applications are summarised below according to the related Data Mining technique.

2.1. Cluster analysis

Cluster analysis is a method whereby data objects are grouped into clusters based on their similarities, so that objects within a cluster share certain characteristics. According to [10], cluster analysis is one of the two most common Data Mining techniques for discovering hidden patterns in Big Data. Liu et al. [6] adopt a visual Data Mining approach alongside clustering to exploit AMR data in order to enhance decision making relating to the design of demand response tariffs and differentiated pricing. Their results indicate that the process enables electricity suppliers to gain a better understanding of consumers' electricity consumption patterns. K-Means clustering and DBSCAN were adopted in [10] for fault detection analysis in an Italian office building by analysing hourly real-time energy and power consumption data. Dent et al. [11] adopted clustering for profiling energy usage in domestic residences in the UK. K-Means clustering is combined with the SOM technique in [12] to divide the sample into classes which represent various electricity consumption patterns. In [13], the SOM clustering and data visualisation technique is used to classify customers and to support the selection of demand response policies in new electricity markets.
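
The clustering step in studies such as [10] and [12] can be illustrated with plain k-means (Lloyd's algorithm): assign each consumption profile to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignments stop changing. The sketch below is a minimal, hypothetical illustration; the `kmeans` function, the four-period load profiles and the random initialisation are assumptions made for demonstration and do not reproduce the pipelines of the cited works, which operate on much larger AMR data sets.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm). Returns (labels, centroids).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean)
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical normalised daily load profiles (four periods per day)
profiles = [
    [0.9, 0.4, 0.3, 0.5],  # morning-peak household
    [1.0, 0.5, 0.3, 0.4],  # morning-peak household
    [0.2, 0.3, 0.5, 1.1],  # evening-peak household
    [0.3, 0.2, 0.6, 1.0],  # evening-peak household
]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # the two morning-peak profiles share one label, the evening-peak pair the other
```

Because k-means can settle in a local optimum that depends on the initial centroids, practical applications usually run it from several starting points and keep the solution with the lowest within-cluster sum of squares.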

2.2. Classification trees

Khan et al. [10] applied the CART algorithm in combination with clustering methods to analyse real-time energy and power consumption data from an office building in Rome. Decision trees, more specifically the C4.5 algorithm, were adopted in [7] for feature subset selection in order to identify which elements of a building were most likely to improve energy efficiency. A clustering algorithm other than k-means is used in [14] to predict the output PDF of the power market clearing price in the New England market. Decision trees are used in [15] to extract the rule base needed to construct a fuzzy system for estimating an electricity demand function. In [12], the C5.0 decision tree algorithm is exploited to develop a classification model that assigns consumers to the electricity consumption classes obtained through cluster analysis. Huang et al. [16] evaluate the use of C4.5 decision trees to classify electricity prices by simulating electricity market prices in New York, Ontario and Alberta. In [17], C4.5 decision trees are used to optimise energy consumption management in a building by analysing internal and external ambient conditions. A J48 decision tree algorithm (effectively a Java implementation of C4.5) is used in [18] to show that decision trees can classify various parameters associated with energy dissipation over a low gabion-stepped weir.
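
To illustrate how a classification tree chooses its splits, the sketch below performs the node-splitting search used by CART (the algorithm applied in [10]): every candidate threshold on every feature is tried, and the one that minimises the size-weighted Gini impurity of the two child nodes is kept. The `gini` and `best_split` functions and the demand/temperature data with "high"/"low" price labels are hypothetical. A full tree would apply this search recursively to each child node, and C4.5 ranks candidate splits by an information-based criterion (gain ratio) rather than Gini impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Search every feature/threshold pair for the split `row[f] <= t` that
    minimises the size-weighted Gini impurity of the two child nodes.
    Returns (feature_index, threshold, weighted_impurity)."""
    n = len(rows)
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical hourly observations: [demand in GW, temperature in deg C]
rows = [[20, 10], [22, 12], [25, 15], [30, 28], [32, 30], [35, 31]]
labels = ["low", "low", "low", "high", "high", "high"]  # price class
print(best_split(rows, labels))  # (0, 27.5, 0.0): split on demand <= 27.5
```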

2.3. Neural networks

Neural networks are powerful nonparametric models capable of machine learning and pattern recognition. A fuzzy BP neural network model was adopted in [19] to forecast electricity prices in Tamil Nadu; its results were compared against an MLP neural network model, and the fuzzy BP model was found to perform better. A variety of neural network models, including LN, MLP, RBF, PNN and GRNN, are evaluated in [20] for forecasting system imbalance volumes in electricity markets; the authors find that no single neural network model provides optimal results under all market conditions. A three-layer MLP is used in [16] (amongst other techniques) for classifying electricity prices. A neural network model is adopted in [21] for wind power forecasting, but the authors find that random forests provide more accurate results.
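
The BP networks mentioned above are trained by backpropagation: a forward pass computes the prediction, and the gradient of the error is then propagated backwards through the layers to update every weight. The minimal sketch below trains a one-hidden-layer MLP in this way on a synthetic sine-shaped curve standing in for a normalised load or price series. The `train_mlp` function, the network size, the learning rate and the data are illustrative assumptions and are not taken from [16], [19], [20] or [21].

```python
import math
import random


def train_mlp(xs, ys, hidden=8, lr=0.1, epochs=2000, seed=1):
    """Train a 1-input, 1-output MLP (one tanh hidden layer, linear output)
    with backpropagation and full-batch gradient descent on mean squared error.
    Returns (predict, loss_history). Hypothetical helper for illustration."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            loss += err ** 2 / n
            # Backward pass: propagate dLoss/dOutput back through the layers
            d_out = 2 * err / n
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_pre = d_out * w2[j] * (1 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                g_w1[j] += d_pre * x
                g_b1[j] += d_pre
        history.append(loss)
        # Gradient-descent update, applied after the full pass over the data
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2

    return predict, history


# Hypothetical normalised series: a smooth, daily-load-like curve on [0, 1]
xs = [i / 10 for i in range(11)]
ys = [math.sin(math.pi * x) for x in xs]
predict, history = train_mlp(xs, ys)
print(f"MSE first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

Full-batch gradient descent is used only to keep the example short; practical forecasting models typically use mini-batches, input scaling and a held-out validation set.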

2.4. Genetic algorithm

GA is a search heuristic from the field of artificial intelligence, inspired by natural selection, and is a popular choice for finding solutions to complex search and optimization problems. Lora et al. [22] used a GA to estimate the weights of a kWNN algorithm for forecasting hourly market energy prices in Spain. They chose a GA because of its reputation for tackling optimization problems, which allows it to deliver weights that enhance the proposed kWNN algorithm.

2.5. Dynamic regressions

DR models allow lagged values of the independent variables to be included in the model. In [22], a DR model is used for 24-hour energy price forecasting. The authors compare the DR model with a kWNN algorithm and find that the DR model yields a lower percentage average relative error than the kWNN algorithm when providing daily energy price forecasts.

2.6. Bayesian models

A Bayesian-based classification and AR method is used in [14] to forecast the power market clearing price. In [23], Bayesian classification techniques are adopted to mine the internal relationships between electricity market price spikes and to forecast those spikes, with an application to Queensland electricity market data, whilst [24] uses a Bayesian classifier to predict the range of the forecasted electricity market price spike. Moreover, Wu et al. [25] combine a Bayesian expert with a Bayesian statistical method to predict electricity price spikes in a regional Chinese electricity market; the Bayesian method was mainly used for classification, whilst the Bayesian expert method was used for forecasting.

2.7. Support vector machines

An SVM model is used in [24] in combination with a probability classification algorithm and a common forecasting method to determine the probability of electricity market price spikes. In [25], SVM is used in combination with a Bayesian expert to forecast electricity prices and spikes in a Chinese regional market. Zhao et al. [26] evaluate several Data Mining techniques, including C4.5, neural networks, k-nearest neighbour methods and SVM, and find SVM to be the most appropriate technique for electricity price spike forecasting. SVM was adopted for interval forecasting of electricity prices in [27], where the authors found that it outperformed GARCH and ARIMA models, which are well-established time series analysis and forecasting methods. Fugon et al. [21] used SVM for wind power forecasting; however, the technique failed to outperform random forests. An SVR model is adopted in [28] to analyse the impact of various factors on the aggregated electric vehicle load. In [29], SVM is evaluated with and without PCA preprocessing of the data for obtaining multi-criteria predictions of wind energy; the authors find that SVM without PCA preprocessing produced slightly better results.

2.8. K-nearest neighbours algorithm

kNN is a nonparametric method, closely associated with pattern recognition, that is used for both classification and regression. A weighted kNN algorithm was used in [22] to forecast energy prices based on hourly data, whilst Zhao et al. [26] evaluated kNN for electricity price spike forecasting but found that it did not outperform SVM. Moreover, Lora et al. [30] used the kNN technique to obtain point forecasts of future electricity prices. Huang et al. [16] adopted kNN as an option for electricity price classification and found that it achieved a lower classification error than both an MLP and C4.5 decision trees.

2.9. Random forest

Random forest is an ensemble machine learning technique which builds many decision trees during training and outputs the mode of the individual trees' predictions (for regression, the mean of the individual predictions is used instead). In [21], the random forest technique was evaluated for wind power forecasting, and the authors found that it outperformed both neural networks and SVM.
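
A random forest of the kind described above combines three ingredients: bootstrap resampling of the training data, a random subset of candidate features at each split, and a modal (majority) vote across the trees. The sketch below wires these together in plain Python for classification. The turbine condition features (vibration level and gearbox temperature), the "normal"/"fault" labels and all function names are hypothetical; this is a toy illustration, not the configuration used in [21] or [31].

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth, max_depth, n_features, rng):
    """Grow a classification tree. Leaves are class labels; internal nodes are
    (feature, threshold, left_subtree, right_subtree) tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    # Random-forest twist: only a random subset of features is tried per split
    for f in rng.sample(range(len(rows[0])), n_features):
        # Thresholds exclude the largest value, so both children are non-empty
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # the sampled features are constant here: make a leaf
        return majority
    _, f, t = best
    children = []
    for goes_left in (True, False):
        idx = [i for i, r in enumerate(rows) if (r[f] <= t) == goes_left]
        children.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                   depth + 1, max_depth, n_features, rng))
    return (f, t, children[0], children[1])


def predict_tree(node, row):
    """Follow the splits from the root until a leaf (class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(rows, labels, n_trees=25, max_depth=3, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample drawn with replacement
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Classification output: the modal (most common) vote across the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical turbine condition data: [vibration level, gearbox temperature in deg C]
rows = [[0.10, 40], [0.20, 45], [0.25, 50], [0.30, 48], [0.35, 52], [0.40, 55],
        [0.80, 75], [0.85, 82], [0.90, 88], [1.00, 90]]
labels = ["normal"] * 6 + ["fault"] * 4
forest = random_forest(rows, labels)
print(predict_forest(forest, [0.05, 38]), predict_forest(forest, [0.95, 86]))
```

Drawing one feature per split (`n_features=1`) mirrors the common default of roughly the square root of the number of features; it decorrelates the trees, which is what lets the vote outperform a single tree on noisier data than this toy example.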

In [31], the authors evaluate five Data Mining techniques (random forest, bagging, rotation forest, RIPPER and kNN) and conclude that random forest performs best for predicting the status patterns of wind turbines.

3. Conclusion

This review paper attempts to identify and summarise the various Data Mining techniques that have been applied in the energy industry. The review indicates that classification trees have been the most popular choice of Data Mining technique in the energy sector; according to [3], decision trees in particular are one of the most widely used Data Mining techniques in the modern age. Based on the number of recent applications, SVM appears to be the second most popular Data Mining technique in the energy industry, closely followed by cluster analysis. Based on the review, the Data Mining models that have been evaluated in the energy industry to date include cluster analysis, classification trees, neural networks,
genetic algorithms, dynamic regression, Bayesian models, support vector machines, k-nearest neighbours and random forests. Interestingly, when compared with the recent review of Data Mining applications in official statistics reported in [3], the energy industry appears far quicker and more open-minded in terms of the range of Data Mining techniques it has used to date. For example, techniques such as random forests, dynamic regressions and k-nearest neighbours are yet to be adopted for mining official statistics. With the emergence of Big Data in the energy industry, Data Mining is likely to play a major role in the energy sector over the next decade or two, and it would not be surprising if Data Mining became a mandatory statistical component for the energy sector. Given the recent trends and the increased application of Data Mining techniques in energy (as evident from this review), it is reasonable to expect further innovative and diverse applications in the near future.

References

[1] Friedman, J. H. (1997). Data Mining and Statistics: What's the Connection? In: 29th Symposium on the Interface: Computing Science and Statistics, 14–17 May 1997, Houston, TX, 3–9.
[2] Hand, D. J. (1998). Data Mining: Statistics and More? The American Statistician, 52(2), 112–118.
[3] Hassani, H., Saporta, G. and Silva, E. S. (2014). Data Mining and Official Statistics: The Past, the Present and the Future. Big Data, 2(1), 1–10.
[4] Hassani, H., Gheitanchi, S. and Yeganegi, M. R. (2010). On the Application of Data Mining to Official Data. Journal of Data Science, 8(1), 75–89.
[5] Silva, E. S., Wu, Y. and Ojiako, U. (2013). Developing Risk Management as a Competitive Capability. Strategic Change, 22(5–6), 281–294.
[6] Liu, H., Yao, Z., Eklund, T. and Back, B. (2012). A Data Mining Application in Energy Industry. In: Proceedings of the 12th Industrial Conference, ICDM 2012, July 13–20, Berlin, Germany.
[7] Kim, H., Stumpf, A. and Kim, W. (2011). Analysis of an energy efficient building design through data mining approach. Automation in Construction, 20(1), 37–43.
[8] Colak, I., Sagiroglu, S. and Yesilbudak, M. (2012). Data mining and wind power prediction: A literature review. Renewable Energy, 46, 241–247.
[9] Negnevitsky, M. and Srivastava, A. K. (2009). An Overview of Forecasting Problems and Techniques in Power Systems. In: IEEE Power & Energy Society General Meeting, PES ’09, 26–30 July 2009, Calgary, AB, 1–4.
[10] Khan, I., Capozzoli, A., Corgnati, S. P. and Cerquitelli, T. (2013). Fault Detection Analysis of Building Energy Consumption Using Data Mining Techniques. Energy Procedia, 42, 557–566.
[11] Dent, I., Aickelin, U. and Rodden, T. (2011). The Application of a Data Mining Framework to Energy Usage Profiling in Domestic Residencies using UK Data. In: Proceedings of the Research Students Conference on "Buildings Don't Use Energy, People Do?" Domestic Energy Use and CO2 Emissions in Existing Dwellings, 28 June 2011, Bath, UK.
[12] Figueiredo, V., Rodrigues, F. and Vale, Z. (2005). An Electric Energy Consumer Characterization Framework Based on Data Mining Techniques. IEEE Transactions on Power Systems, 20(2), 596–602.
[13] Valero, S., Ortiz, M., Senabre, C., Alvarez, C., Franco, F. J. G. and Gabaldón, A. (2007). Methods for Customer and Demand Response Policies Selection in New Electricity Markets. IET Generation, Transmission & Distribution, 1(1), 104–110.
[14] Li, E. and Luh, P. B. (2001). Forecasting power market clearing price and its discrete PDF using a Bayesian-based classification method. In: Proceedings of the IEEE Power Engineering Society Winter Meeting, 28 January–1 February, Columbus, OH, 1518–1523.
[15] Azadeh, A., Saberi, M., Ghaderi, S. F., Gitiforouz, A. and Ebrahimipour, V. (2008). Improved estimation of electricity demand function by integration of fuzzy system and data mining approach. Energy Conversion and Management, 49(8), 2165–2177.
[16] Huang, D., Zareipour, H., Rosehart, W. D. and Amjady, N. (2012). Data Mining for Electricity Price Classification and the Application to Demand-Side Management. IEEE Transactions on Smart Grid, 3(2), 808–817.
[17] Gao, Y., Tumwesigye, E., Cahill, B. and Menzel, K. (2010). Using Data Mining in Optimisation of Building Energy Consumption and Thermal Comfort Management. In: 2nd International Conference on Software Engineering and Data Mining (SEDM), 23–25 June, Chengdu, 434–439.
[18] Salmasi, F., Sattari, M. T. and Pal, M. (2012). Application of Data Mining on Evaluation of Energy Dissipation Over Low Gabion-Stepped Weir. Turkish Journal of Agriculture and Forestry, 36(1), 95–106.
[19] Devi, M. R. and Manonmani, R. (2012). Electricity Forecasting Using Data Mining Techniques in Tamil Nadu and Other Countries - A Survey. International Journal of Emerging Trends in Engineering and Development, 6(2), 295–302.
[20] Garcia, M. P. and Kirschen, D. S. (2006). Forecasting System Imbalance Volumes in Competitive Electricity Markets. IEEE Transactions on Power Systems, 21(1), 240–248.
[21] Fugon, L., Juban, J. and Kariniotakis, G. (2008). Data Mining for Wind Power Forecasting. In: European Wind Energy Conference & Exhibition EWEC 2008, Brussels, Belgium, 1–6.
[22] Lora, A. T., Santos, J. R., Santos, J. R., Exposito, A. G. and Ramos, J. L. M. (2002). A comparison of two techniques for next-day electricity price forecasting. Intelligent Data Engineering and Automated Learning IDEAL 2002, Lecture Notes in Computer Science, 2412, 384–390.
[23] Lu, X., Dong, Z. Y. and Li, X. (2005). Electricity Market Price Spike Forecast with Data Mining Techniques. Electric Power Systems Research, 73(1), 19–29.
[24] Zhao, J. H., Dong, Z. Y., Li, X. and Wong, K. P. (2007). A General Method for Electricity Market Price Spike Analysis. IEEE Transactions on Power Systems, 22(1), 376–385.
[25] Wu, W., Zhou, J., Mo, L. and Zhu, C. (2006). Forecasting electricity market price spikes based on Bayesian expert with support vector machine. Advanced Data Mining and Applications, Lecture Notes in Computer Science, 4093, 205–212.
[26] Zhao, J. H., Dong, Z. Y. and Li, X. (2007). Electricity Market Price Spike Forecasting and Decision Making. IET Generation, Transmission & Distribution, 1(4), 647–654.
[27] Zhao, J. H., Dong, Z. Y., Zhao, X. and Wong, K. P. (2008). A Statistical Approach for Interval Forecasting of the Electricity Price. IEEE Transactions on Power Systems, 23(2), 267–276.
[28] Guo, Q., Wang, Y., Sun, H., Li, Z., Xin, S. and Zhang, B. (2012). Factor Analysis of the Aggregated Electric Vehicle Load Based on Data Mining. Energies, 5(6), 2053–2070.
[29] Gill, K. and Moon, D. (2009). Data Mining For Multi-Criteria Energy Predictions. In: Proceedings of the World Congress on Engineering and Computer Science, WCECS 2009, October 20–22, San Francisco, USA.
[30] Lora, A., Santos, J., Exposito, A., Ramos, J. and Santos, J. (2007). Electricity Market Price Forecasting Based on Weighted Nearest Neighbors Techniques. IEEE Transactions on Power Systems, 22(3), 1294–1301.
[31] Kusiak, A. and Verma, A. (2011). Prediction of Status Patterns of Wind Turbines: A Data-Mining Approach. Journal of Solar Energy Engineering, 133(1), 1–10.