arXiv:1705.05767v1 [cs.OH] 14 May 2017

RAE: THE RAINFOREST AUTOMATION ENERGY DATASET FOR SMART GRID METER DATA ANALYSIS

Stephen Makonin⋆†, Z. Jane Wang⋆, and Chris Tumpach†


⋆ Electrical and Computer Engineering, University of British Columbia, Canada
† Rainforest Automation, Inc., Burnaby, Canada

Email: [email protected], [email protected], [email protected]

ABSTRACT

Datasets are important for researchers to build models and test how well their machine learning algorithms perform. This paper presents the Rainforest Automation Energy (RAE) dataset to help smart grid researchers test algorithms that make use of smart meter data. RAE contains 72 days of 1Hz data from a residential house's mains and 24 sub-meters, resulting in 6.2 million samples for each sub-meter. In addition to power data, environmental and sensor data from the house's thermostat is included. Sub-meter data includes heat pump and rental suite captures, which are of interest to power utilities. We also show (by example) how RAE can be used to test non-intrusive load monitoring (NILM) algorithms.

Index Terms: dataset, residential, smart grid, disaggregation, NILM

1. INTRODUCTION

Datasets are becoming increasingly relevant for measuring the accuracy of smart grid algorithms and for seeing how well they might perform in a real-world situation. Testing accuracy with real-world datasets is crucial in this field of research. Synthesized data does not realistically represent an actual dataset, as "a real-world dataset would normally have certain complexity that is harder to predict and in many cases can be very difficult to deal with" [1, p. 114]. For smart grid research it is valuable to have public datasets that show how smart meters report aggregate power readings, together with the accompanying sub-meter data for the different loads that make up that aggregate reading. This is especially true when testing non-intrusive load monitoring (NILM) algorithms [2, 3]. NILM (sometimes referred to as load disaggregation) is a computational approach to determining which appliances are running in a given house (or building) just by examining the aggregate power signal from a smart meter.
We introduce the Rainforest Automation Energy (RAE)¹ dataset, a free and publicly available dataset. This dataset contains 59 files and is roughly 7.8 GB in size. There are 24 sub-meters (one for each breaker on the house's main power panel) sampled at 1Hz, which capture eleven electrical data-points (voltage, current, frequency, power factor, real power/energy, reactive power/energy, and apparent power/energy). There are 72 days of captures, which resulted in 6.2 million one-second samples for each sub-meter. Mains are sampled at a typical "smart meter communication to in-home display" rate (every 15s), and there are roughly 414 thousand samples over the 72 days of capture. We also include environmental and sensor data from the house's thermostat. Figure 1 depicts an arbitrary Sunday (a 24-hour period) to give the reader a visual idea of what the load consumption pattern can look like.

In addition to smart grid and NILM research, this dataset can be used in research that looks at: statistical signal processing and blind source separation, energy use behaviour, eco-feedback and eco-visualizations, application and verification of theoretical algorithms/models, appliance studies, demand forecasting, smart home frameworks, grid distribution analysis, time-series data analysis, energy efficiency studies, occupancy detection, energy policy and socio-economic frameworks, and advanced metering infrastructure (AMI) analytics.

¹ Download from Harvard Dataverse via doi:10.7910/DVN/ZJW4LC.
This work was funded in part by NSERC Engage Grant EGP-501582-16.

2. RELATION TO PRIOR WORK

Previously, we created a widely used dataset, named the Almanac of Minutely Power dataset (AMPds1 [4] and AMPds2 [5]), which contained data sampled at 1-minute intervals. This new dataset has all power panel circuits sampled at 1Hz. Besides AMPds and this dataset, at the time of writing there exists no other open public Canadian dataset.
One of the first and best-known datasets, the Reference Energy Disaggregation Data Set (REDD) [6], which was released in 2011 (USA homes), has a low-frequency sampling version in which the mains are sampled at a higher frequency (1Hz, or per second) than the sub-metered loads (every 3 seconds). It is worth noting that a more recent dataset, the UK Domestic Appliance-Level Electricity (UK-DALE) dataset [7], has used this methodology as well. The RAE dataset takes a different approach. We feel it is best to sample the sub-metered loads at a higher sampling frequency in order to capture interesting features from each appliance's power signature. Further, we wanted the mains data sampled at a frequency that would be common to most smart meter in-home displays (e.g., Rainforest Automation's EMU2).

The aforementioned datasets (in the area of NILM) are considered low-frequency sampling (≤1Hz) datasets. High-frequency sampling datasets also exist. REDD does have a high-frequency version of its data. Two such examples are: the Building-Level fUlly labeled Electricity Disaggregation dataset (BLUED) [8], sampled at 12kHz (USA data); and the Controlled On/Off Loads Library dataset (COOLL) [9], sampled at 100kHz (USA data). While these datasets provide valuable data for high-resolution applications, we feel that it is more realistic to use low-frequency sampled data for most smart grid and NILM systems, especially where the processor is constrained in storage and speed.

Fig. 1. Plot of all loads over 24-hours on Sunday, March 20, 2016.

[Figure 2 components: Smart Meter (Itron Open Way); In-Home Display (Rainforest EMU2); Breaker Panel Sub-Metering (PowerScout 24); Data Acquisition (Raspberry Pi 2B); Local Storage (USB Drive). Link labels: ZigBee Smart Energy; USB RS-232 XML; USB RS-485 MODBUS; USB.]

Fig. 2. Diagram of data capturing hardware/setup.

3. DATA ANALYSIS

For the initial release of the RAE dataset we have a single house, which is called House 1. We perform an analysis of the data that we collected and describe House 1 in detail below.

3.1. Capture Methodology

When designing the data capture system for RAE we prioritized accuracy and reliability. Hence, we chose commercial-grade metering equipment. We chose the Rainforest Automation EMU2 in-home display² to capture smart meter data. See Table 1 (column IDs 4 and 5) for the data we capture from the EMU2. The EMU2 reads data from a ZigBee-enabled smart meter at roughly 15s intervals. To capture sub-meter data we chose a Class 1 branch circuit power meter from DENT, the PowerScout 24³. We had prior experience with the DENT PowerScout 18 meter. See Table 1 (column IDs 1, 2, and 3) and Table 2 for the data we captured from the PowerScout 24. The PowerScout 24 can monitor up to 24 circuits at a rate of 1Hz.

Thermostat data was collected from the EcoBee3 thermostat⁴ at 5-minute intervals (a product limitation). Data includes set-points, operation mode (heat/cool and stage), outdoor temperature and wind speed, and inside humidity. Inside temperature and motion are reported from the thermostat and three remote sensors (living room, basement rec room, and master bedroom).

The hardware setup used to capture data for RAE is depicted in Figure 2, and we have released (as open source) the code⁵ used to capture, store, and convert the raw data. This setup is minimal and will allow us to easily install this equipment in a different house to capture data and add it to RAE.

² See https://rainforestautomation.com/rfa-z105-2-emu-2/.
³ See https://www.dentinstruments.com.
⁴ See https://www.ecobee.com.
⁵ Code available on GitHub at https://github.com/smakonin/RAEdataset.

Table 1. Measurements captured in Mains files.

Column  Description                   Units
0       Unix Timestamp (since Epoch)  s
1       Voltage, Leg 1 (V)            V
2       Voltage, Leg 2 (V)            V
3       Frequency (f)                 Hz
4       Real Power (P)                W
5       Real Energy (Pt)              Wh
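As an illustration of how the Table 1 layout might be consumed, the following is a minimal sketch that maps one row of a mains file to named fields. It assumes a header-less, comma-separated file whose columns appear in Table 1 order; the field names and the helper functions `parse_mains_row` and `read_mains` are hypothetical and are not part of the released capture code. Empty fields are mapped to `None`, and the sample row is the one quoted in Section 3.3.

```python
import csv
import io
from typing import Optional

# Field names are hypothetical labels for the Table 1 columns (IDs 0 to 5).
MAINS_COLUMNS = ["unix_ts", "v_leg1", "v_leg2", "freq_hz", "real_power", "real_energy"]


def parse_mains_row(row: list[str]) -> dict[str, Optional[float]]:
    """Map one CSV row to the Table 1 columns; empty fields become None."""
    if len(row) != len(MAINS_COLUMNS):
        raise ValueError(f"expected {len(MAINS_COLUMNS)} fields, got {len(row)}")
    return {
        name: (float(cell) if cell.strip() else None)
        for name, cell in zip(MAINS_COLUMNS, row)
    }


def read_mains(text: str) -> list[dict[str, Optional[float]]]:
    """Parse CSV text (assumed to have no header line) into a list of records."""
    return [parse_mains_row(row) for row in csv.reader(io.StringIO(text)) if row]


# Example row quoted in Section 3.3: voltage and frequency readings are missing.
sample = "1457282030,,,,4.582,38193.4\n"
records = read_mains(sample)
print(records[0])
```

If the released files carry a header line, it would need to be skipped before calling `parse_mains_row`.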

Table 2. Measurements captured in Sub# files.

Column  Description                      Units
0       Unix Timestamp (since Epoch)     s
2       Current (I)                      A
4       Displacement Power Factor (dPF)  ratio
5       Apparent Power Factor (aPF)      ratio
6       Real Power (P)                   W
7       Real Energy (Pt)                 Wh
8       Reactive Power (Q)               VAR
9       Reactive Energy (Qt)             VARh
10      Apparent Power (S)               VA
11      Apparent Energy (St)             VAh
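Several of the Table 2 quantities are related through the standard power triangle. The sketch below shows those textbook relations for a purely sinusoidal load, where S = sqrt(P² + Q²) and the power factor is P/S. Whether the PowerScout 24 derives its reported values this way is an assumption, and loads with harmonic distortion will deviate from these relations. The function names are hypothetical.

```python
import math
from typing import Optional


def apparent_power(real_w: float, reactive_var: float) -> float:
    """Apparent power S (VA) from real power P (W) and reactive power Q (VAR)."""
    return math.hypot(real_w, reactive_var)


def power_factor(real_w: float, apparent_va: float) -> Optional[float]:
    """Ratio P/S; None when the load draws no apparent power."""
    if apparent_va == 0:
        return None
    return real_w / apparent_va


# Illustrative values only (not taken from the dataset).
s = apparent_power(300.0, 400.0)
print(s, power_factor(300.0, s))
```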

Table 3. Sub-metered loads and appliances.

Sub-Meter  Description
1 + 2      Kitchen Wall Oven
3 + 4      Kitchen Counter Plugs
5 + 6      Clothes Dryer
7          Upstairs Bedroom Arc-Fault Plugs
8          Kitchen Fridge
9          Clothes Washer
10         Kitchen Dishwasher
11         Furnace and Hot Water Unit (including Furnace Room Plug)
12         Basement Plugs and Lights (including Outside Plugs)
13 + 14    Heat Pump
15 + 16    Garage Sub-Panel
17 + 18    Upstairs Plugs and Lights (including Bathroom Lights and Vent Fan, Smoke Alarms, Living Room Plugs)
19         Basement Blue Plugs (including Entertainment Centre: TV, Amp, DVD, PVR)
20         Bathrooms (including 3 GFCI Plugs, 2 Lights, 1 Vent Fan, Chest Freezer)
21 + 22    Rental Suite Sub-Panel
23         Misc. Plugs (including Dining Room, Gas Cooktop, Microwave)
24         Home Office (including Telco, Cable, Net, Security Equipment)
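Several loads in Table 3 span two sub-meters (for example, 13 + 14 for the heat pump). A minimal sketch of how per-load real power might be obtained is shown below. It assumes that the power of a paired load is simply the sum of its two sub-meter readings. The dictionary keys, the helper `load_power`, and the sample readings are hypothetical; only the sub-meter groupings come from Table 3.

```python
# Sub-meter groupings copied from Table 3 (a subset); keys are hypothetical labels.
LOAD_GROUPS: dict[str, tuple[int, ...]] = {
    "wall_oven": (1, 2),
    "clothes_dryer": (5, 6),
    "fridge": (8,),
    "heat_pump": (13, 14),
    "rental_suite": (21, 22),
}


def load_power(readings: dict[int, float], load: str) -> float:
    """Sum the real power (W) of every sub-meter that feeds the given load."""
    return sum(readings[meter] for meter in LOAD_GROUPS[load])


# Illustrative one-second readings in watts, keyed by sub-meter number.
readings = {8: 95.0, 13: 1200.0, 14: 1180.0}
print(load_power(readings, "heat_pump"))
print(load_power(readings, "fridge"))
```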

3.2. House 1 Overview

House 1 has a 200A power service that is a split 1-phase service consisting of two 120V lines (L1 and L2), which is typical for a Canadian house. The house is located in the city of Burnaby, BC, Canada and is 80m above sea level. The house was built in 1955 and had major renovations in 2005 and 2006. This house is a single detached home with a main floor and a basement, having roughly 99.5m² of livable space. The HVAC system consists of a high-efficiency natural gas furnace and a 2-stage heat pump. More details about the house can be found in one of our previous publications [5]. Table 3 lists which appliances/loads were monitored by each sub-meter.

3.3. Missing Readings

Missing data is represented by a timestamp followed by one or more null data-points. For comma-separated value (CSV) files this means no data between commas. For example, "1457282030,,,,4.582,38193.4" would mean that in a mains file the voltage and frequency readings are missing. Table 4 summarizes how many sampling times had missing data due to the occasional USB/serial timeout or because of the change to Daylight Saving Time. At this point we are not sure why these timeouts occurred. One reason might be that the USB/serial capture programs ran in Linux user mode rather than as daemons. House 1 is located in the Pacific Timezone, and on Sunday, March 13, at 3:00 AM timezones in Canada switched to Daylight Saving Time. Table 4 attributes the 3600 missing readings on that date (one hour of 1Hz samples) to this change.

Table 4. Days with missing readings and the number of missing readings.

Date                               No. of Readings
2016-03-06                         16
2016-03-13 (Daylight Saving Time)  3600
2016-04-25                         3
2016-05-01                         2

3.4. Energy Consumption Analysis

The three highest consumers of energy over the capture period are HVAC & Heat Pump (570kWh), Plugs & Lights (531kWh), and Rental Suite (430kWh), as shown in Figure 3. Over the 72-day capture period the smart meter reported a total energy consumption of 1,982kWh. A total of 1,971kWh is obtained when the real energy accumulators of the 24 sub-meters are summed. The 11kWh discrepancy is due to rounding errors in each sub-meter accumulator, as each sub-meter reports only whole-watt measurements.
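The totals quoted in Section 3.4 can be cross-checked with a few lines of arithmetic. The sketch below uses the smart meter and sub-meter totals from the text and the per-category values printed in Fig. 3; the variable names are hypothetical. The Fig. 3 categories add up to 1,972kWh, within 1kWh of the reported 1,971kWh, which is presumably an effect of rounding each category to a whole kWh.

```python
# Totals quoted in Section 3.4 (kWh).
SMART_METER_KWH = 1982
SUBMETER_SUM_KWH = 1971

# Per-category values as printed in Fig. 3 (kWh).
FIG3_KWH = {
    "HVAC & Heat Pump": 570,
    "Plugs & Lights": 531,
    "Rental Suite": 430,
    "Home Office": 215,
    "Fridge": 90,
    "Clothes Dryer": 87,
    "Dishwasher": 30,
    "Wall Oven": 10,
    "Clothes Washer": 8,
    "Garage": 1,
}

discrepancy_kwh = SMART_METER_KWH - SUBMETER_SUM_KWH
relative_discrepancy = discrepancy_kwh / SMART_METER_KWH
category_total = sum(FIG3_KWH.values())

# Share of each category relative to the sum of the Fig. 3 values.
shares = {name: kwh / category_total for name, kwh in FIG3_KWH.items()}
top_three = sorted(FIG3_KWH, key=FIG3_KWH.get, reverse=True)[:3]

print(f"discrepancy: {discrepancy_kwh} kWh ({relative_discrepancy:.2%})")
print(f"sum of Fig. 3 categories: {category_total} kWh")
print("top three:", top_three)
```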

3.5. Occupancy Notes

House 1 has four occupants. Three of the occupants live in the main house and a single occupant resides in the basement rental suite. Of the main house occupants, two are adults in their early 40s who work from home. The third occupant is an elementary school child. The rental suite occupant is in his 20s and works away from home during the day.

During the capturing of House 1 there were some occupancy anomalies. During school Spring Break (March 11–28) the child was at home. From the afternoon of March 11 to the early morning of March 21, two additional occupants stayed in the main house's basement guest room, which is also known as the rec room. Additionally, one of the regular main house occupants went on an out-of-town trip (March 17–20).

[Figure 3 values: HVAC & Heat Pump 570kWh; Plugs & Lights 531kWh; Rental Suite 430kWh; Home Office 215kWh; Fridge 90kWh; Clothes Dryer 87kWh; Dishwasher 30kWh; Wall Oven 10kWh; Clothes Washer 8kWh; Garage 1kWh.]

Fig. 3. Percent of energy consumed (in kWh) over the 72-day period for a total of 1,971kWh.

3.6. File Organization

The file all houses.txt contains summary information on all the houses in the dataset. Each house has a labels file that describes the loads each sub-meter monitored (for House 1, this contains the sub-meter data in Table 3) and a panel file that depicts the house's power breaker panel that was sub-metered. Each house can have one or more contiguous sampling blocks (blk). For House 1, see Table 5 for details on the two sampled blocks that were recorded. Each block has the following files associated with it:

mains file(s), which hold smart meter measurements sampled at roughly 15s intervals (see Table 1);
power and smeter file(s), which contain all real power measurements from mains and sub-meters (good for testing NILM);
a number of sub# files, one for each sub-meter, sampled at 1Hz (see Table 2);
tstat file(s), which contain data from the house's thermostat.

Table 5. Contiguous reading blocks for House 1.

Block No.             Date Range                No. of Days
1                     2016-02-07 to 2016-02-15  9
2                     2016-03-06 to 2016-05-07  63
Total number of days                            72
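The day counts in Table 5 and the sample counts quoted elsewhere in the paper follow from simple date arithmetic. The sketch below is a sanity check under the assumption of nominal 86,400-second days, with both end dates inclusive and no missing readings. The names `BLOCKS` and `block_days` are hypothetical.

```python
from datetime import date

# Block date ranges from Table 5 (both end dates inclusive).
BLOCKS = {
    1: (date(2016, 2, 7), date(2016, 2, 15)),
    2: (date(2016, 3, 6), date(2016, 5, 7)),
}
SECONDS_PER_DAY = 86_400   # nominal day length; ignores the DST change
MAINS_INTERVAL_S = 15      # approximate mains sampling interval


def block_days(start: date, end: date) -> int:
    """Number of calendar days in an inclusive date range."""
    return (end - start).days + 1


days = {blk: block_days(start, end) for blk, (start, end) in BLOCKS.items()}
total_days = sum(days.values())

sub_samples = total_days * SECONDS_PER_DAY           # 1Hz sub-meter samples
mains_samples = sub_samples // MAINS_INTERVAL_S      # roughly one sample per 15s
block2_samples = days[2] * SECONDS_PER_DAY           # test set used in Section 4

print(days, total_days)
print(sub_samples, mains_samples, block2_samples)
```

Under these assumptions the counts come to 6,220,800 one-second samples per sub-meter, 414,720 mains samples, and 5,443,200 samples in block 2, which agrees with the rounded figures of 6.2 million, 414 thousand, and 5.4 million given in the text.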

4. NILM EXAMPLE

We wanted to demonstrate the RAE dataset being used to test the accuracy of a NILM algorithm. We used the SparseNILM algorithm [3] for our test. SparseNILM uses a variant of the Viterbi algorithm to find the most likely set of appliances that are ON (and their power level) in each time period, at a rate matching the dataset used, which in this case is 1Hz. We ran our test on a MacBook Pro (13-inch, Late 2016) having a 3.3GHz Intel Core i7 processor with 16 GB of memory.

First, we removed the rental suite sub-panel power data so that we could test for a single-occupancy home. Second, we picked six high-consuming loads (clothes dryer, furnace, heat pump, oven, fridge, and dishwasher) to disaggregate. Third, we trained the algorithm using the data from the first block file (9 days). This resulted in the creation of a 2000-state hidden Markov model (HMM) that modelled all six loads. The training phase took 58 seconds to complete. Next, we tested the accuracy of our HMM by having it disaggregate the data from the second block file (63 days). Testing took 46 minutes to complete, disaggregating 5.4 million samples with an average disaggregation time of 330µs per sample. We report overall accuracy results in Table 6. Figure 4 shows the accuracy results for each appliance/load that was disaggregated.

Table 6. Overall accuracy results for our NILM test.

Accuracy Metric                              Score
Precision                                    87.86%
Recall                                       85.01%
F-score                                      86.41%
Finite-State F-score (FS-fscore) [10]        80.47%
Normalized Disaggregation Error (NDE) [11]   0.71%
Root-Mean-Square Error (RMSE)                62.14
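The precision, recall, and F-score values in Table 6 are consistent with the standard definition of the F-score as the harmonic mean of precision and recall. The following sketch reproduces that relationship. It is a generic illustration rather than the evaluation code used for the paper, the function names are hypothetical, and the finite-state variant (FS-fscore) of [10] is not covered.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Table 6.
reported = f_score(0.8786, 0.8501)
print(f"{reported:.2%}")  # 86.41%, matching the F-score row of Table 6

# Illustrative counts only (not taken from the paper).
p, r = precision_recall(tp=8, fp=2, fn=2)
print(p, r, f_score(p, r))
```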

[Figure 4: bar chart comparing Test Results against Ground Truth for the Clothes Dryer, Furnace, Heat Pump, Wall Oven, Fridge, and Dishwasher; axis from 0.00% to 50.00%.]

Fig. 4. Appliance/load specific accuracy results (in percent of the total disaggregated, not of the total house).

5. CONCLUSIONS

In this paper we have presented a dataset that smart grid researchers can use when creating models and algorithms that use smart meter and appliance/load data. Sub-meter data includes heat pump and rental suite captures, which are of interest to power utilities. Although the initial release of RAE contains only one house, we are actively assessing other houses that can be monitored and added to this dataset. The monitoring system that we present is an accurate and reliable data capture system that can be easily installed in a house to collect data in the same format and at the same frequency. Researchers interested in installing this system and adding data to RAE can contact the lead author.

6. REFERENCES

[1] F. Hadzic, H. Tan, and T. S. Dillon, Mining of data with complex structures. Springer, 2011, vol. 333.
[2] G. W. Hart, "Nonintrusive appliance load monitoring," Proceedings of the IEEE, vol. 80, no. 12, pp. 1870–1891, Dec 1992.
[3] S. Makonin, F. Popowich, I. V. Bajić, B. Gill, and L. Bartram, "Exploiting HMM sparsity to perform online real-time nonintrusive load monitoring," IEEE Trans. Smart Grid, vol. 7, no. 6, pp. 2575–2585, 2016.
[4] S. Makonin, F. Popowich, L. Bartram, B. Gill, and I. V. Bajić, "AMPds: A public dataset for load disaggregation and eco-feedback research," in 2013 IEEE Electrical Power Energy Conference, Aug 2013.
[5] S. Makonin, B. Ellert, I. Bajić, and F. Popowich, "Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014," Scientific Data, vol. 3, no. 160037, pp. 1–12, 2016.
[6] J. Z. Kolter and M. J. Johnson, "REDD: A public data set for energy disaggregation research," in Workshop on Data Mining Applications in Sustainability (SIGKDD), San Diego, CA, 2011, pp. 59–62.
[7] J. Kelly and W. Knottenbelt, "The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes," Scientific Data, vol. 2, no. 150007, pp. 1–14, 2015.
[8] K. Anderson, A. F. Ocneanu, D. Benitez, D. Carlson, A. Rowe, and M. Berges, "BLUED: A fully labeled public dataset for event-based non-intrusive load monitoring research," in 2nd Workshop on Data Mining Applications in Sustainability (SustKDD), 2011.
[9] T. Picon, M. N. Meziane, P. Ravier, G. Lamarque, C. Novello, J.-C. L. Bunetel, and Y. Raingeaud, "COOLL: Controlled On/Off Loads Library, a Public Dataset of High-Sampled Electrical Signals for Appliance Identification," arXiv preprint arXiv:1611.05803, 2016.
[10] S. Makonin and F. Popowich, "Nonintrusive load monitoring (NILM) performance evaluation," Energy Efficiency, vol. 8, no. 4, pp. 809–814, 2014.
[11] O. Parson, S. Ghosh, M. Weal, and A. Rogers, "Non-Intrusive Load Monitoring Using Prior Models of General Appliance Types," in AAAI, 2012.