Proceedings of the 2016 International Conference on Machine Learning and Cybernetics, Jeju, South Korea, 10-13 July, 2016

CLASSIFICATION PERFORMANCE USING GATED RECURRENT UNIT RECURRENT NEURAL NETWORK ON ENERGY DISAGGREGATION

THI-THU-HUONG LE, JIHYUN KIM, HOWON KIM
Department of Computer Science and Engineering, Pusan National University, Busan, Republic of Korea
E-MAIL: [email protected], [email protected], [email protected]

Abstract:
Energy disaggregation, or NILM, is one of the most promising ways to reduce our consumption of electricity. Many machine learning algorithms have been applied to this field; however, their classification results are not as good as expected. In this paper, we propose a new approach to constructing a classifier for energy disaggregation using deep learning. We apply a Gated Recurrent Unit (GRU) based Recurrent Neural Network (RNN) and train our model on the UK-DALE dataset. In addition, we compare our approach to the original RNN on energy disaggregation. Applying the GRU RNN, we achieve accuracy and F-measure for energy disaggregation in the ranges [89%-98%] and [81%-98%], respectively. These experimental results confirm that the deep learning approach is effective for NILM.

Keywords: Energy disaggregation; NILM; Recurrent Neural Network; Gated Recurrent Unit

1. Introduction

Nowadays, the world is focused on reducing electricity consumption and on conserving other resources such as natural gas, water, and grey water. This is a hard problem for many researchers and developers, and many technologies have been developed to address it. NILM is one of the most effective techniques for saving cost. NILM algorithms can determine which appliances are running within a home from analysis of the power line. Without a solution to NILM, there is no accurate way to filter consumption data, report consumption issues, or present conservation measures that would allow homeowners to understand their home's consumption and take action to conserve. Research in this area began in the late 1980s and early 1990s, for example by Kitching et al. [14] in 1989 and Hart [15] in 1992. Performing load disaggregation accurately is a hard problem for a NILM algorithm, but it has the benefit of helping homeowners and occupants conserve energy by showing how appliances within the home are used and how much energy they consume.

We extend this research with deep learning techniques. In our work, we propose a new approach to NILM that achieves higher accuracy: we build a classifier that obtains higher classification performance than previous NILM algorithms. The remainder of this paper is organized as follows: we introduce related work in Section 2 and give a brief description of the GRU RNN in Section 3. In Section 4, we explain the dataset created for training and testing the model. Next, we present the experiments in Section 5. Finally, we conclude our work in Section 6.

2. Related Works

One of the major tasks in NILM is to classify which appliance has triggered an event, and many different machine learning algorithms have been used for this classification task. NILM algorithms that use supervised learning have implemented machine learning classifiers in a standard way; the most common supervised classifiers are Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and various Nearest Neighbour (k-NN) algorithms. NILM algorithms that use unsupervised learning classifiers have not been implemented in a standard way; derivations have been presented and formalized in recent publications, with a focus on different implementations of factorial Hidden Markov Models (FHMMs).
- ANNs: Chang et al. [5] in 2010 provided a NILM algorithm using an ANN classifier with which they achieved 100% accuracy, but only if appliances were used one at a time. When multiple appliances were used, the training and testing accuracy diminished significantly (59% and 39%, respectively).
- SVM: Figueiredo et al. [6] (2011) used a pairwise SVM with a linear kernel and were able to disaggregate loads with a high accuracy of 99%. Kolter et al. [7] (2010) used an SVM with "a variety of hand engineered features" at a low sampling rate (hourly), with a classification accuracy of 59%.
- k-NN: In 2010, Gupta et al. [8] obtained 100% accuracy using "KNN-based classifiers" but provided no details. Berges et al. [9] used 1-NN with Fourier regression coefficients to classify unseen transient signatures with 79% accuracy in 2010.
- FHMM: Kim et al. [16] (2010) used a combination of four FHMM variants and claimed accuracies of between 69% and 98%. Kolter & Johnson [10] in 2011 trained an FHMM model using Baum-Welch with an accuracy of 65%. Kolter and Jaakkola [11], using a MAP algorithm, obtained an average accuracy of 71% for classifying 7 appliances in 2012.

These algorithms, and some forms of HMM that use supervised learning, cannot reach high classification performance for energy disaggregation.

3. Gated Recurrent Unit Recurrent Neural Network

3.1. Recurrent Neural Network

Following Graves in 2012 [1], recurrent neural networks (RNNs) have recently proven an attractive method for machine learning tasks. An RNN is an extension of a traditional neural network that is able to handle a variable-length sequence input; the variable-length sequence is handled by a recurrent hidden layer whose activation is updated at each time step. Formally, given an input sequence $x = \{x_1, x_2, \dots, x_t\}$, a hidden layer $h_t$, and an output sequence $y = \{y_1, y_2, \dots, y_t\}$, the recurrent hidden layer is updated as

$h_t = \sigma(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$   (1)

where $\sigma$ is an activation function (commonly the sigmoid or the hyperbolic tangent), $W$ is a weight matrix connecting layers of the RNN, and $b$ denotes a bias vector. The output layer is computed as

$y_t = W_{hy} h_t + b_y$   (2)

To train the RNN, we use the BPTT algorithm [2]. However, it is difficult to train conventional RNNs to capture long-term dependencies because of the vanishing gradient problem identified by Bengio et al. [3]. Therefore, Cho et al. [4] proposed a new method to address this problem in 2014, named the Gated Recurrent Unit (GRU). In Section 3.2 we briefly describe the GRU.

3.2. Gated Recurrent Unit Recurrent Neural Network (GRU RNN)

As mentioned in Section 3.1, Cho et al. address the vanishing gradient problem by replacing the hidden node of the traditional RNN with a GRU node. Figure 1 shows the architecture of the GRU.

Figure 1. Gated Recurrent Unit

Each GRU node consists of two gates, an update gate $z_t$ and a reset gate $r_t$. The update gate decides how much the unit updates its activation, or content; it is computed by equation (3). The reset gate allows the unit to forget the previously computed state; it is calculated by equation (4). The hidden layer is computed by equation (6) using the candidate activation $H_t$, which is calculated by equation (5). The model parameters of the GRU RNN include the input $x_t$ at time $t$ and the weight matrices $W_z, W_r, W_H, U_z, U_r, U_H$:

$z_t = \sigma(W_z x_t + U_z h_{t-1})$   (3)

$r_t = \sigma(W_r x_t + U_r h_{t-1})$   (4)

$H_t = \tanh(W_H x_t + U_H (r_t \odot h_{t-1}))$   (5)

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot H_t$   (6)
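To make these updates concrete, the following is a minimal NumPy sketch (our own illustration, not a reference implementation) of one forward step of the plain RNN of equation (1) and of the GRU cell of equations (3)-(6); the weight matrices are assumed to be given.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # Equation (1): plain recurrent hidden-layer update.
    return sigmoid(W_xh @ x_t + W_hh @ h_prev + b_h)

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_H, U_H):
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)           # update gate, eq. (3)
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)           # reset gate, eq. (4)
    H_t = np.tanh(W_H @ x_t + U_H @ (r_t * h_prev))   # candidate activation, eq. (5)
    return (1.0 - z_t) * h_prev + z_t * H_t           # interpolated state, eq. (6)
```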

4. Dataset

We use the UK-DALE dataset published by Jack Kelly [12] for our experiments. There are five houses (House 1 to House 5) in UK-DALE; the recordings cover both the whole-house mains power demand and the power demand of individual appliances, sampled every six seconds and stored in CSV files whose first column is a UNIX timestamp. We construct the training and testing data from House 1 to House 5. We select appliances at random, covering both single-state and multi-state appliances. We change the time window from six seconds to one minute by choosing the maximum power value in every minute. After that, we synchronize the time, and then we combine the record files by time. The input files contain the power values, and the output files are binary sequences that give the state of each appliance: an appliance's state is denoted by 0 if the appliance is OFF and by 1 if it is ON. The label of the last column is 1 if all appliances in the combination are operating, and 0 otherwise. Figure 2 shows in detail the structure of the input and output files of our training and testing dataset.

Figure 2. The structure of the input and output files of the dataset
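As an illustration of this preprocessing, the sketch below (ours, with assumed file paths, channel numbers, and an assumed ON/OFF power threshold) resamples the six-second UK-DALE channels to one-minute maxima, synchronizes them by time, and produces the binary state labels:

```python
import pandas as pd

ON_THRESHOLD = 10.0  # watts; assumed threshold separating OFF (0) from ON (1)

def load_channel(path):
    # Each channel file holds "UNIX-timestamp power" records every six seconds.
    df = pd.read_csv(path, sep=r"\s+", names=["timestamp", "power"])
    df.index = pd.to_datetime(df["timestamp"], unit="s")
    # Widen the time window from six seconds to one minute by
    # keeping the maximum power value observed in each minute.
    return df["power"].resample("1min").max()

mains = load_channel("house_1/channel_1.dat")          # whole-house demand
appliances = {
    "kettle": load_channel("house_1/channel_10.dat"),  # channel ids are assumptions
    "fridge": load_channel("house_1/channel_12.dat"),
}

# Synchronize the time, then combine the record files by time.
data = pd.concat({"mains": mains, **appliances}, axis=1, join="inner")

# Output labels: 0 if an appliance is OFF, 1 if it is ON; the last
# column is 1 only when all appliances in the combination are operating.
labels = (data[list(appliances)] > ON_THRESHOLD).astype(int)
labels["all_on"] = labels.all(axis=1).astype(int)
```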

5. Experimental results

In this section, we perform experiments to evaluate performance on energy disaggregation, using evaluation metrics to measure classification performance. Our experimental environment is as follows:
- CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
- GPU: NVIDIA GeForce GTX 750
- RAM: 8GB
- OS: Windows 7

5.1. Evaluation Metric

For our results, we measure Precision, Recall, F-measure, Accuracy, and False Alarm Rate (FAR) using the confusion matrix shown in Figure 3.

Figure 3. Confusion matrix

Here, TP (True Positive) denotes the number of appliances classified as ON that actually were ON; TN (True Negative) the number classified as OFF that actually were OFF; FP (False Positive) the number classified as ON that actually were OFF; and FN (False Negative) the number classified as OFF that actually were ON. The metrics are defined as follows:

$recall = \frac{TP}{TP + FN}$   (7)

$precision = \frac{TP}{TP + FP}$   (8)

$Accuracy = \frac{TP + TN}{(TP + FP) + (FN + TN)}$   (9)

$F\text{-}measure = \frac{2 \cdot precision \cdot recall}{precision + recall}$   (10)

The False Alarm Rate (FAR) is the ratio of the number of misclassified normal (OFF) instances to the total number of normal instances:

$FAR = \frac{FP}{FP + TN}$   (11)

5.2. The hyperparameter set-up

Hyperparameters are one of the factors that affect the performance we can reach when training the GRU RNN model: when the values of the hyperparameters change, the performance of our model changes. Bengio [13] analyzed the impact of hyperparameters and gives suggestions for choosing their values. For our model, the hyperparameters consist of the learning rate, the momentum, the number of epochs, and the hidden layer size. Table 1 shows the values of these hyperparameters, which were chosen manually for training our model. Using these hyperparameters, we achieve the classification performance reported in Section 5.3.

TABLE 1. The values of hyperparameters

Hyperparameter      Value
Hidden layer size   20
Learning rate       0.01
Momentum            0.5
Epochs              700
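As an illustration of how these values might be wired together, the following Keras sketch (our assumption; the paper does not specify its implementation framework) builds a GRU classifier configured with the hyperparameters of Table 1:

```python
import tensorflow as tf

def build_gru_model(window_len, n_appliances):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window_len, 1)),  # one-minute aggregate power readings
        tf.keras.layers.GRU(20),                       # hidden layer size = 20
        tf.keras.layers.Dense(n_appliances, activation="sigmoid"),  # ON/OFF per appliance
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.5),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Training would then run for the 700 epochs of Table 1, e.g.:
# model.fit(X_train, y_train, epochs=700)
```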

5.3. The result of measurement performance

We perform two experiments with two models. First, we evaluate the classification performance of the GRU RNN model. Second, we train an RNN model to compare against ours, and from this comparison we conclude that our model classifies better than the conventional RNN. We use the confusion matrix to evaluate the classification performance of our model, as described in Section 5.1. In Section 4, we introduced how the training and testing data are generated from the original dataset: the appliances in each house are combined at random, with a maximum of 20 appliances. We compute the accuracy and F-measure for each house. Figure 4 and Figure 5 show, respectively, the accuracy and F-measure of the GRU RNN model for each house as the number of appliances increases. In the accuracy case, almost all combinations achieve over 80%.

Figure 4. The accuracy of the GRU RNN model for the five-house dataset

In the F-measure case, almost all houses likewise achieve over 80%, except for some combinations in House 1 that reach just over 60%, at 10, 17, and 18 appliances; these cases contain many appliances that are multi-state and have similar power consumption.

Figure 5. The F-measure of the GRU RNN model for the five-house dataset

On the other hand, we compute the average accuracy and F-measure over the five houses in the dataset. In the worst case, the accuracy and F-measure reach 89.49% and 80.62%, respectively; in the best case, we obtain 98.13% and 98.3%. The detailed results are shown in Figure 6.

Figure 6. The result of classification performance using GRU RNN

To confirm that our model is a better classifier than the traditional RNN, we also train an RNN model on the same training and testing dataset. In Figure 7, we compare the accuracy of the two models; the accuracy of the GRU RNN is clearly higher than that of the RNN.

Figure 7. The result of accuracy from applying the two models

Similar to the accuracy comparison, we calculate the F-measure of the two models. The F-measure obtained from the GRU RNN is as good as its accuracy. The result is shown in Figure 8.

Figure 8. The result of F-measure from applying the two models


Besides, we compute the FAR and conclude that our model is quite good: its ratio of misclassified instances is no larger than the RNN's in each house. Table 2 shows the FAR of the two models when applied to energy disaggregation.

TABLE 2. The result of FAR from applying the two models

Model      FAR (%)
           House 1   House 2   House 3   House 4   House 5
GRU RNN    1.0       13.03     5.8       0.44      0.22
RNN        1.02      13.07     7.88      0.44      1.53

Furthermore, we compare the two models on several criteria: best, worst, and average performance. Table 3 shows that the classification performance of the GRU RNN is good and better than that of the RNN.

TABLE 3. The comparison of accuracy and F-measure between the two models

Criteria   RNN                     GRU RNN
           Accuracy   F-measure    Accuracy   F-measure
Best       98.13      98.34        98.13      98.3
Average    92.96      86.34        93.83      87.64
Worst      87.86      77.48        89.49      80.62

In addition, we compare the accuracy and F-measure of our algorithm with previous classifier algorithms that take the same input, the power consumption. The result is shown in Table 4. Once again, we find that appliances are well classified by the GRU RNN model.

TABLE 4. The comparison with other algorithms

Classifier Algorithm            Accuracy (%)   F-measure (%)
Bayes                           80-92          -
SVM [17]                        75-92          -
HMM [11][16]                    75-87          -
FHMM [18]                       -              80-90
FHMM variants [16]              69-98          -
FHMM using MAP algorithm [11]   71             -
RNN                             88-98          77-98
GRU RNN                         89-98          81-98

6. Conclusions

In this paper, we implemented an energy disaggregation classifier based on the GRU RNN and evaluated its performance. For the training and testing phases, we generated a dataset by extracting instances from the UK-DALE dataset. By also implementing an RNN model, we confirmed that our approach outperforms previous research in this field. Besides, we compared our model with other classifier algorithms on energy disaggregation and found that our model predicts equally well.

Acknowledgements

This paper is supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 10043907, Development of high performance IoT device and Open Platform with Intelligent Software).

References

[1] Graves, "Supervised Sequence Labelling with Recurrent Neural Networks", Studies in Computational Intelligence, Springer, 2012.
[2] Werbos, Paul J., "Backpropagation through time: what it does and how to do it", Proceedings of the IEEE, 78(10), pp. 1550-1560, 1990.
[3] Bengio, Yoshua, Simard, Patrice, Frasconi, Paolo, "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, 5(2), pp. 157-166, 1994.
[4] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches", arXiv preprint arXiv:1409.1259, 2014.
[5] Chang, H.-H., Lin, C.-L., & Lee, J.-K., "Load identification in nonintrusive load monitoring using steady-state and turn-on transient energy algorithms", In Computer Supported Cooperative Work in Design (CSCWD), pp. 27-32, 2010.
[6] Figueiredo, M., de Almeida, A., & Ribeiro, B., "An experimental study on electrical signature identification of non-intrusive load monitoring (NILM) systems", Adaptive and Natural Computing Algorithms, pp. 31-40, 2011.
[7] Kolter, J., Batra, S., & Ng, A., "Energy disaggregation via discriminative sparse coding", In Proc. Neural Information Processing Systems, 2010.
[8] Gupta, S., Reynolds, M., & Patel, S., "ElectriSense: single-point sensing using EMI for electrical event detection and classification in the home", In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, pp. 139-148, 2010.
[9] Berges, M. E., Goldman, E., Matthews, H. S., & Soibelman, L., "Enhancing electricity audits in residential buildings with nonintrusive load monitoring", Journal of Industrial Ecology, 14(5), pp. 844-858, 2010.
[10] Kolter, J., & Johnson, M., "REDD: A public data set for energy disaggregation research", In Workshop on Data Mining Applications in Sustainability (SIGKDD), San Diego, CA, 2011.
[11] Kolter, J., & Jaakkola, T., "Approximate inference in additive factorial HMMs with application to energy disaggregation", Journal of Machine Learning Research - Proceedings Track, 22, pp. 1472-1482, 2012.
[12] Jack Kelly, William Knottenbelt, "The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes", Scientific Data, 2015.
[13] Bengio, "Practical Recommendations for Gradient-Based Training of Deep Architectures", Tech report, Université de Montréal, arXiv:1206.5533, 2012.
[14] Kitching, H., Abbott, R., & Hadden, S., "Requirements for an advanced utility load monitoring system (Tech. Rep.)", Electric Power Research Inst., Palo Alto, CA (USA); New England Power Service Co., Westborough, MA (USA); Plexus Research, Inc., Acton, MA (USA), 1989.
[15] Hart, G., "Nonintrusive appliance load monitoring", Proceedings of the IEEE, 80(12), pp. 1870-1891, 1992.
[16] Kim, H., Marwah, M., Arlitt, M., Lyon, G., Han, J., "Unsupervised disaggregation of low frequency power measurements", In 11th International Conference on Data Mining, pp. 747-758, 2010.
[17] Kato, T., Cho, H.S., Lee, D., "Appliance Recognition from Electric Current Signals for Information-Energy Integrated Network in Home Environments", In Proceedings of the 7th International Conference on Smart Homes and Health Telematics, Tours, France, Volume 5597, pp. 150-157, 1-3 July 2009.
[18] Zoha, A., Gluhak, A., Nati, M., Imran, M.A., "Low-power appliance monitoring using Factorial Hidden Markov Models", IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing, pp. 527-532, 2013.