A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System

Linh Ngo and Amy Apon
Computer Science and Computer Engineering
University of Arkansas, Fayetteville, Arkansas, USA

Doug Hoffman
Acxiom Corporation, 301 Industrial Blvd., Conway, Arkansas, USA

Abstract
This paper demonstrates the feasibility and potential of applying empirical mode decomposition (EMD) to forecast the arrival time behaviors in a parallel batch system. An analysis of the workload records shows the existence of daily and weekly patterns within the workload. Results show that the intrinsic mode functions (IMFs), products of the sifting/decomposition process of EMD, produce a better prediction than the original arrival histogram when used in a simple weight-matching prediction technique. Promising applications include the implementation of an EMD/neural network combination.

Key Words: Empirical Mode Decomposition, forecasting, time series, neural network, workload.

1. Introduction
Previous work [1] demonstrated the ability of Empirical Mode Decomposition (EMD) [2] to characterize workloads. While EMD-based characterization offers better accuracy, the technique requires a nontrivial amount of manual calibration to achieve optimal results. Consequently, it is often still more practical to use a traditional probabilistic-distribution-based technique to characterize workloads. Aside from improving the performance of EMD-based characterization, another approach that justifies the use of EMD is the possibility of incorporating the intrinsic mode function (IMF) products of EMD into the workload forecasting process in addition to the characterization process. The results of this research support the use of EMD as a preprocessing technique for data forecasting.

2. Related Work
Lo divides the workload management process into three parts: workload characterization, workload forecasting, and workload control [3]. Within workload forecasting, three standard time windows are used by the data processing industry: short-range forecasting (monthly, quarterly), medium-range forecasting (quarterly, annual), and long-range forecasting (far into the future; strategic forecasting). The longer the forecasting range, the less accurate the forecast becomes. The most common approach to data forecasting uses time series techniques [3].
Lowe and Webb point out the difficulties with the two traditional forecasting approaches, model-based and statistics-based techniques, and propose an alternative approach using neural networks. Forecasting with neural networks on a variety of time series experiments yields more accurate results than the standard linear models [4]. In recent years, the field of data forecasting has seen combinations of time series analysis techniques and neural network techniques: time series analysis helps isolate the patterns hidden inside the time series, while neural networks help create a robust, adaptive, and accurate prediction mechanism. Excellent examples include [5-9].
Since EMD is a time series decomposition technique [2], it is possible to forecast data by using EMD to decompose the time series data into IMFs at different frequency levels. These IMFs can be used to parameterize a neural network that produces a set of predicted IMFs, and the recombination of these synthetic IMFs produces the forecast data. EMD has previously been applied to forecasting in non-computing fields. Hamad predicts short-term travel time on freeways using a combination of EMD and neural networks [10]. Wang and Liu combine EMD and a statistical learning machine, the Support Vector Machine (SVM), to predict the silicon content in hot metal [11]. Li and Wang use EMD with a traditional time series analysis technique, ARMA, to forecast short-term wind speed for a wind farm [12]. While EMD has proven helpful for prediction accuracy in the above examples, those predictions either belong to a predetermined range (traffic speed) or have a known upper limit (silicon content and wind speed).
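The decompose/forecast/recombine idea above can be illustrated with a toy stand-in. In this sketch a simple two-component moving-average decomposition replaces EMD's sifting, a naive persistence forecast replaces the neural network, and the input series is fabricated; it only demonstrates the shape of the pipeline, not EMD itself.

```python
def moving_average(x, w=5):
    """Centered moving average of window w; a crude smoother."""
    half = w // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def decompose(x):
    """Toy two-component decomposition into an oscillatory residual and a
    smooth trend. EMD would instead sift out several IMFs plus a trend."""
    trend = moving_average(x)
    residual = [xi - ti for xi, ti in zip(x, trend)]
    return residual, trend

# Fabricated hourly arrival counts with a repeating daily-like shape.
series = [float((h % 24) % 7) for h in range(72)]
residual, trend = decompose(series)

# The decomposition is exact: the components sum back to the original.
assert all(abs(r + t - s) < 1e-9 for r, t, s in zip(residual, trend, series))

# Forecast each component separately (naive persistence here, where the
# papers cited above use a neural network per component), then recombine.
next_hour_forecast = residual[-1] + trend[-1]
print(next_hour_forecast)
```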
The data of interest in this research belongs to a production cluster of computers and exhibits nonlinear and non-stationary characteristics. The data also contains a large number of arrival patterns due to the complex work schedules of the cluster users. This, in addition to the lack of a theoretical basis for EMD [13], motivates a careful analytical study of the relationship between cluster workload data and EMD, with the potential for developing an improved workload forecasting technique.

3. Workload Data
Fifteen months of data, from January 2006 until March 2007, from one of Acxiom Corporation's production clusters provide the data for this project. This particular cluster contains 128 heterogeneous computing nodes. Each arriving job requests a certain number of nodes, and these requests are allocated based on the availability of nodes and the policy of the scheduler. Once their nodes are allocated, jobs run mutually exclusively until completion, at which time they release the nodes back to the cluster for reallocation. Figure 1 shows the conceptual architecture. A detailed study of this cluster has been reported [14].
For this study, the months containing abnormal data are removed. This excludes the effects of several routine maintenance tasks throughout the year and reduces the effects of workload flurries [15], since hourly histograms are used to express the nature of workload arrivals. From the remaining months, the data from April, May, and June 2006 are chosen as a continuous set of arrival data to be analyzed in detail with the EMD technique.
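Hourly arrival histograms of the kind analyzed here can be built directly from job submission timestamps. The following is a minimal sketch; the timestamps are fabricated and do not reflect Acxiom's actual log format.

```python
from collections import Counter
from datetime import datetime

def hourly_arrival_histogram(timestamps):
    """Build a 24-bin histogram of job arrivals by hour of day."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

# Fabricated example: three job arrivals on one day.
jobs = [datetime(2006, 5, 31, 8, 15),   # 08:15
        datetime(2006, 5, 31, 8, 40),   # 08:40
        datetime(2006, 5, 31, 18, 5)]   # 18:05
hist = hourly_arrival_histogram(jobs)
print(hist[8], hist[18])  # 2 1
```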

Figure 1: The Conceptual Architecture of the Acxiom Production Cluster [7]

Based on the experimental data set (April-June 2006), the following observations can be made from the daily arrival histograms:
- The daily arrival patterns are similar in shape but exhibit different magnitudes in their peaks. For example, Fig. 2 demonstrates the similarities in the arrival patterns between the Wednesday and Thursday of the final week of May 2006: the number of arrivals starts to increase around 08:00 AM, dips around noon, and gradually decreases after 06:00 PM.
- Contrary to a normal working schedule, the workload data shows that jobs arriving to the cluster may not follow the traditional Monday through Friday schedule. In Fig. 3, during the month of June 2006, arrivals tend to decrease on Friday and Saturday and increase on Sunday. This is not an abnormality of a single day; rather, the Sunday pattern is maintained throughout the entire month.
The two behaviors above also occur for the other days of the week. Thus, it is reasonable to assume that the same days of different weeks exhibit similar arrival patterns. Fig. 4 shows an example of this observation for all Wednesdays of June 2006.

Figure 2: Arrival Time Histogram Comparison (Wednesday and Thursday, last week of May 2006)

Figure 3: Arrival Time Histogram Comparison (Sundays, June 2006)

Figure 4: Comparison of all Wednesdays, June 2006

While the above observations are only made for the months of May and June 2006, it is not unreasonable to assume that the remaining months of the data set exhibit similar arrival patterns. It is noted that the first month of the data set, January 2006, had only 16,984 jobs, while the last month of the set, March 2007, had 42,353 jobs. This rate of increase is not monotonic throughout the data set. Further analysis of this behavior is outside the scope of this work.

4. EMD Application
Based on the observations of the data in the previous section, the workload of the succeeding day is predicted from:
- the arrival information of the previous day, and
- the differences between the two similar days of the previous week.
A simple weight-based algorithm is used. Applying EMD to the arrival histogram generates several IMF sets and a trend set. This section explores whether the IMFs and trend patterns are appropriate for prediction by comparing the predictions obtained from them against the predictions obtained from using the histograms directly as inputs. The details of the IMF sifting technique can be found in the original paper by Huang [2]. A study of the application of EMD to this particular experimental data set is discussed in [1].

4.1. Validation and Calibration
The validation of a characterization process compares the real data against the synthetic data, and can be performed immediately upon generation of the synthetic data. In contrast, validation of a forecasting process compares the predicted data against the future data, which can only be done after the future has happened. While this is straightforward for traditional time series and probabilistic distributions, whose timing functions can be extended into the future, it is different for EMD. Since the sifting process of EMD relies on empirical data, in addition to validating predicted data against actual data, it is necessary to show that the IMFs generated from the actual data resemble the IMFs generated from the predicted data. In a sense, this attempts to answer the question of whether a synthetically generated IMF would behave the same way as a timing function extended into the future. This provides a basis for validating and calibrating the forecasting system.
To test the above requirement, 10,000 consecutive jobs are randomly taken from the data set. These jobs are divided into two groups: the first group contains jobs 1 to 5,000 and the second group contains jobs 4,153 to 10,000. The purpose of the overlapping jobs is to check for the continuity of IMFs on data subsets. The EMD process is applied to the two groups as well as to the original 10,000-job group, and the IMFs from each group are compared against each other. After the sifting process, the following observations are made:
- Group 1 generates only 4 IMFs while group 2 generates 6 IMFs. The overall group generates 7 IMFs.
- When the four highest-frequency IMFs from each group are compared against each other, they match up visually, even over the overlapping periods (Figure 5).
- Due to the differences in the number of IMFs within each group, only the trends are compared initially. While producing fewer IMFs than the second group, the first group exhibits trends that better match the overall data set.
- When the trend of the first group is compared against the sum of the trends and the unused IMFs of the second group and the overall group, the results are similar to the first observation.
These results show that IMFs from the EMD sifting process capture a sense of continuity. This forms the basis for the validation and calibration of an EMD-based forecasting system.

4.2. Estimation and Prediction
The prediction algorithm is divided into two parts: estimation and prediction. First, the weights are calculated from the estimation source and target dates. Second, these weights are applied to the prediction source in order to calculate the prediction target.
Let $n$ represent the number of hour indices included in the process. The value of $n$ is identical for the estimation source day, the estimation target day, the prediction source day, the prediction target day, and for all the IMFs and trends generated from those dates. Let $m$ represent the number of IMFs generated by the EMD. It is not guaranteed that the numbers of IMFs generated from the estimation source and the prediction source will be identical. For this feasibility study, when the numbers differ, the set with the larger number is reduced by adding its low-frequency components into the final trend until the two numbers of IMFs are identical.
Let $imf_{ij}$ represent the value of IMF $j$ at hour index $i$, and let $w_{ij}$ represent the weight for hour index $i$ and IMF $j$, with $i \in [0, n-1]$ and $j \in [0, m-1]$. The value of $w_{ij}$ is between 0 and 1, and the modification step $\alpha$ of $w_{ij}$ is 0.1. Initially, all the weights are set to 1. Let $e_i^s$, $e_i^t$, $p_i^s$, and $p_i^t$ represent the estimation source, estimation target, prediction source, and prediction target at hour index $i$, respectively.
Fig. 6 shows the estimation algorithm. With the weights acquired from the estimation process, the prediction target is calculated from the IMFs of the prediction source:

$$ p_i = \sum_{j=0}^{m-1} w_{ij} \, imf_{ij} $$
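In code, the prediction step amounts to a weighted sum per hour index. A minimal sketch (the weights and IMF values below are fabricated for illustration):

```python
def predict_day(weights, imfs):
    """p_i = sum_j w_ij * imf_ij for each hour index i.

    weights[i][j] and imfs[i][j] index hour i and IMF j; the weights
    come from the estimation step, the IMFs from the prediction source.
    """
    return [sum(w * v for w, v in zip(w_i, imf_i))
            for w_i, imf_i in zip(weights, imfs)]

# Two hours, two IMFs (fabricated values):
w = [[1.0, 0.9], [0.8, 1.0]]
imf = [[3.0, -1.0], [2.5, 0.5]]
p = predict_day(w, imf)   # approximately [2.1, 2.5]
print(p)
```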

To evaluate the accuracy of the algorithm, the Mean Absolute Percentage Error (MAPE) of the estimation and prediction processes is used:

$$ MAPE_e = \frac{1}{n} \sum_{i=0}^{n-1} \frac{\left| e_i - e_i^t \right|}{e_i^t}, \qquad MAPE_p = \frac{1}{n} \sum_{i=0}^{n-1} \frac{\left| p_i - p_i^t \right|}{p_i^t} $$
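A direct implementation of this error measure follows, with one caveat: hours with zero actual arrivals would divide by zero, and skipping them here is an assumption, as the paper does not say how such hours are handled.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    terms = [abs(p - a) / a
             for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)

# Fabricated hourly targets vs. estimates:
print(mape([10, 20, 40], [11, 18, 40]))  # about 6.67
```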

Figure 5: Comparing IMFs of adjacent groups. 1.1, 1.2, 1.3, and 1.4 are the IMFs of the first group (jobs 1-5,000); 2.1, 2.2, 2.3, and 2.4 are those of the second group (jobs 4,153-10,000); and 3.1, 3.2, 3.3, and 3.4 are those of the overall group (jobs 1-10,000)

for i = 0 to n-1:
    t_i = sum_{j=0}^{m-1} w_ij * imf_ij
    if t_i > e_i^t:
        do:
            for j = 0 to m-1:
                if imf_ij >= 0:
                    w_ij = w_ij - alpha
                    t'_i = sum_{j=0}^{m-1} w_ij * imf_ij
                    if t'_i > e_i^t: continue
                    else: w_ij = w_ij + alpha
        until no w_ij can be further reduced
    if t_i < e_i^t:
        do:
            for j = 0 to m-1:
                if imf_ij < 0:
                    w_ij = w_ij - alpha
                    t'_i = sum_{j=0}^{m-1} w_ij * imf_ij
                    if t'_i < e_i^t: continue
                    else: w_ij = w_ij + alpha
        until no w_ij can be further reduced

Figure 6: The Algorithm of the IMF Estimation Process
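The estimation loop of Figure 6 can be sketched in Python as follows. This is an interpretation of the published pseudocode, not the authors' implementation; the example data at the bottom is fabricated.

```python
ALPHA = 0.1  # weight modification step from the paper

def weighted_sum(w_i, imf_i):
    return sum(w * v for w, v in zip(w_i, imf_i))

def estimate_weights(imf, e_target, alpha=ALPHA):
    """Per hour index i, nudge weights w_ij in [0, 1] so that
    sum_j w_ij * imf_ij approaches the estimation target e_i^t.
    imf[i][j] is the value of IMF j at hour i."""
    n, m = len(imf), len(imf[0])
    w = [[1.0] * m for _ in range(n)]
    for i in range(n):
        changed = True
        while changed:
            changed = False
            t_i = weighted_sum(w[i], imf[i])
            if t_i == e_target[i]:
                break
            above = t_i > e_target[i]
            for j in range(m):
                # Overshoot: reduce weights of nonnegative IMF values
                # (lowers the sum). Undershoot: reduce weights of
                # negative IMF values (raises the sum).
                wanted = (imf[i][j] >= 0) if above else (imf[i][j] < 0)
                if not wanted or w[i][j] < alpha:
                    continue
                w[i][j] -= alpha
                t_new = weighted_sum(w[i], imf[i])
                still = t_new > e_target[i] if above else t_new < e_target[i]
                if still:
                    changed = True
                else:
                    w[i][j] += alpha  # restore: the step crossed the target
    return w

# One hour, two IMFs: the initial estimate 2.0 overshoots the target 1.5,
# so the weight on the nonnegative IMF is reduced from 1.0 to 0.9.
w_out = estimate_weights([[3.0, -1.0]], [1.5])
print(w_out)
```

Each kept step lowers some weight by alpha and weights are bounded below by zero, so the loop terminates.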

4.3. Prediction Results
The algorithm was initially applied to the May and June 2006 data, with the estimation source and target being the last Wednesday and Thursday of May, and the prediction source and target being the first Wednesday and Thursday of June. The EMD process was applied to the arrival data for the 24 hours of each of those days. Both the estimation and prediction sources generated two IMFs and one trend.
In the experiments reported below, the same estimation and prediction sources and targets are maintained, but the range of data on which the EMD process is applied is extended. In one experiment, the data range includes the full month of May until the first Wednesday of June; in the other, the data range spans April and May until the first Wednesday of June. In both experiments, the numbers of IMFs of the estimation and prediction sources are identical. A comparison point is also included: the prediction result based on the original histogram sources and targets for Wednesday and Thursday of May, with no EMD applied; that is, the weight is simply the ratio of the estimation source and target.
Table 1 shows the numerical results of these experiments. Experiment 1 is the experiment using the histogram to predict the first Thursday of June 2006 based on: 1) the data from the first Wednesday of June 2006, and 2) the last Wednesday and Thursday of May 2006.

Experiment 2 uses the same data set, but this time the days are decomposed into IMFs using the EMD technique. Experiments 3 and 4 also perform a prediction with a 3:1 ratio, using the IMFs of the first Wednesday of June and the last Wednesday and Thursday of May to predict the first Thursday of June; however, the ranges of the empirical data used by the EMD process are extended. For experiment 3, the data range includes the full month of May until the first Wednesday of June; experiment 4 also includes the full month of April 2006. While experiments 3 and 4 use much more data, all four experiments use the same source/target pairs: the last Wednesday and Thursday of May and the first Wednesday of June to predict the first Thursday of June. The only difference between experiments 1-2 and 3-4 is the size of the preprocessed data from which the partial IMFs containing the sources and targets are derived.
From Table 1 it is clear that all experiments using EMD offer a better prediction result than the one using only the original data. In addition, the fourth experiment, with more IMFs, produced a better MAPE than the third experiment. It is not clear whether the second experiment should be compared to the third and fourth experiments, since the number of IMFs in the second experiment is significantly lower, as is its estimation accuracy. The better prediction MAPE of the second experiment might be due to its much larger estimation MAPE, while in the third and fourth experiments the estimation MAPEs are very accurate (less than 5%).

Table 1: Estimation and Prediction Evaluations for Different Data Ranges

Experiment   IMF Count   Estimation MAPE   Prediction MAPE
1            0           0.0%              53.89%
2            2           13.68%            32.27%
3            10          2.65%             39.08%
4            12          0.9%              36.2%

5. Conclusion
The error rates calculated by the prediction MAPE decrease noticeably with the increased number of IMFs that results from a longer data length. These results show promising potential in utilizing EMD as a data preprocessing platform before applying a prediction technique. By separating the original histograms into intrinsic mode functions, EMD creates an improved set of data for prediction purposes. With better predictive tools, it should be possible to further increase the accuracy of EMD-based predictions. The future directions of this research include:
- Development of an optimal predictive tool for such experimental data sets. A neural network seems to be a promising choice.
- Investigation of alternative time series decomposition techniques, such as Fourier or wavelet analysis, in comparison with the EMD technique.
- Investigation of the effects that end points have upon the generation of the IMFs, as well as the possibilities of reducing the differences between the endpoints of adjacent data sets. For example, different spline fitting techniques that reduce the inaccuracy at the end points should be investigated.
- Implementation of a complete workload characterization and forecasting software package. Given the amount of manual tuning required for optimal performance on different data sets, it is imperative that the EMD characterization and forecasting process be encapsulated into a software package that provides users with an easy-to-customize interface. A promising candidate for the construction of such a package is MATLAB, which provides implementations of EMD as well as the architectural components for neural networks.

References
[1] L. Ngo, B. Lu, H. Bui, A. Apon, N. Hamm, L. Dowdy, D. Hoffman, and D. Brewer. "Application of Empirical Mode Decomposition to the Arrival Time Characterization of a Parallel Batch System Using System Logs." In Proceedings of the 2009 International Conference on Modeling, Simulation, and Visualization Methods. July 2009.
[2] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. Yen, C. C. Tung, and H. H. Liu. "The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-stationary Time Series Analysis." Proceedings of the Royal Society of London, Series A, Vol. 454, Issue 1971. March 1998.
[3] T. L. Lo. "The Evolution of Workload Management in Data Processing Industry: A Survey." In Proceedings of the 1986 ACM Fall Joint Computer Conference. 1986.
[4] D. Lowe and A. R. Webb. "Time Series Prediction by Adaptive Networks: A Dynamical Systems Perspective." IEE Proceedings F. February 1991.
[5] N. Toda and S. Usui. "A Numerical Approach for Estimating Higher Order Spectra Using Neural Network Auto-Regressive Model." In Proceedings of Neural Networks for Signal Processing. 1995.
[6] R. Drossu and Z. Obradovic. "Rapid Design of Neural Networks for Time Series Prediction." IEEE Computational Science and Engineering, Vol. 3, Issue 2. June 1996.
[7] T. Matsumoto, H. Hamagishi, and Y. Chonan. "A Hierarchical Bayes Approach to Nonlinear Time Series Prediction with Neural Nets." In Proceedings of the 1997 International Conference on Neural Networks. 1997.
[8] X. Zhao, J. Lu, W. Ptranto, and T. Yahagi. "Nonlinear Time Series Prediction Using Wavelet Networks with Kalman Filter Based Algorithm." In Proceedings of the 2005 IEEE International Conference on Industrial Technology. 2005.
[9] G. Gomes, A. Maia, T. Ludermir, F. Carvalho, and A. Araujo. "Hybrid Model with Dynamic Architecture for Forecasting Time Series." In Proceedings of the 2006 International Joint Conference on Neural Networks. 2006.
[10] K. Hamad. "Hybrid Empirical Mode Decomposition-Neuro Model for Short-Term Travel Time Prediction on Freeways." Ph.D. Dissertation, University of Delaware. 2004.
[11] Y. Wang and X. Liu. "Prediction of Silicon Content in Hot Metal Based on the Combined Model of EMD and SVM." In Proceedings of the Second International Symposium on Intelligent Information Technology Application. 2008.
[12] R. Li and Y. Wang. "Short-Term Wind Speed Forecast for Wind Farm Based on Empirical Mode Decomposition." In Proceedings of the International Conference on Electrical Machines and Systems. 2008.
[13] S. Kizhner, K. Blank, T. Flatley, N. E. Huang, D. Petrick, and P. Hestnes. "On Certain Theoretical Developments Underlying the Hilbert-Huang Transform." In Proceedings of the IEEE Aerospace Conference. 2006.
[14] B. Lu, A. Apon, L. Dowdy, F. Robinson, D. Hoffman, and D. Brewer. "A Case Study on Grid Performance Modeling." In Proceedings of the 2006 International Conference on Parallel and Distributed Computing and Systems. 2006.
[15] D. Tsafrir and D. G. Feitelson. "Instability in Parallel Job Scheduling Simulation: The Role of Workload Flurries." In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS). April 2006.