Exponential smoothing model selection for forecasting

International Journal of Forecasting 22 (2006) 239–247 www.elsevier.com/locate/ijforecast

Baki Billah (a), Maxwell L. King (b), Ralph D. Snyder (b,*), Anne B. Koehler (c,1)

(a) Department of Epidemiology and Preventive Medicine, Monash University, VIC 3800, Australia
(b) Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia
(c) Department of Decision Sciences and Management Information Systems, Miami University, Oxford, OH 45056, USA
Abstract

Applications of exponential smoothing to forecasting time series usually rely on three basic methods: simple exponential smoothing, trend corrected exponential smoothing and a seasonal variation thereof. A common approach to selecting the method appropriate to a particular time series is based on prediction validation on a withheld part of the sample, using criteria such as the mean absolute percentage error. A second approach is to rely on the most general of the three methods: for annual series this is trend corrected exponential smoothing; for sub-annual series it is the seasonal adaptation of trend corrected exponential smoothing. The rationale for this approach is that a general method automatically collapses to its nested counterparts when the pertinent conditions hold in the data. A third approach may be based on an information criterion when maximum likelihood methods are used in conjunction with exponential smoothing to estimate the smoothing parameters. In this paper, these approaches to selecting the appropriate forecasting method are compared in a simulation study. They are also compared on real time series from the M3 forecasting competition. The results indicate that the information criterion approaches provide the best basis for automated method selection, with the Akaike information criterion having a slight edge over its counterparts.

© 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Model selection; Exponential smoothing; Information criteria; Prediction; Forecast validation

* Corresponding author. Fax: +61 613 9905 5474. E-mail addresses: [email protected] (B. Billah), [email protected] (M.L. King), [email protected] (R.D. Snyder), [email protected] (A.B. Koehler). 1 Fax: +1 513 529 9689.

1. Introduction

The exponential smoothing methods are relatively simple but robust approaches to forecasting. They are widely used in business for forecasting demand for inventories (Gardner, 1985). They have also performed surprisingly well in forecasting competitions against more sophisticated approaches (Makridakis et al., 1982; Makridakis & Hibon, 2000). Three basic variations of exponential smoothing are commonly used: simple exponential smoothing (Brown, 1959); trend corrected exponential smoothing (Holt, 1957); and Holt–Winters' method (Winters, 1960). A distinctive feature of these approaches is that a) time series are assumed to be built from unobserved components such as the level, growth and seasonal effects; and b) these components need to be adapted over time when demand series display the effects of

0169-2070/$ - see front matter © 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.ijforecast.2005.08.002

structural changes in product markets. As these components may be combined by addition or multiplication operators, 24 variations of the exponential smoothing methods may be identified (Hyndman, Koehler, Snyder, & Grose, 2002). Given this proliferation of options, an automated approach to method selection becomes most desirable (Gardner, 1985; McKenzie, 1985).

Hyndman et al. (2002) provided a statistical framework for exponential smoothing based on the earlier work of Ord, Koehler, and Snyder (1997). The framework incorporated stochastic models underlying the various forms of exponential smoothing and enabled the calculation of maximum likelihood estimates of the smoothing parameters. It also enabled the use of Akaike's information criterion (Akaike, 1973) for method selection. One issue not addressed was the preference for Akaike's information criterion over possible alternatives such as those of Schwarz (1978), Hannan and Quinn (1979), Mallows (1964), Golub, Heath, and Wahba (1979), and Akaike (1970). One aim of this paper, therefore, is to determine whether Akaike's information criterion (AIC) has a superior performance compared to its alternatives. Given that it was developed to minimise the forecast mean squared error, it might be hypothesised that the AIC has a natural advantage over the alternatives in forecasting applications, except possibly for Akaike's FPE, which is asymptotically equivalent to the AIC.

The exponential smoothing methods were traditionally implemented without reference to a statistical framework, so other approaches were devised to resolve the method selection problem. Prediction validation (Makridakis, Wheelwright, & Hyndman, 1998) is one such approach. The sample is divided into two parts: the fitting sample and the validation sample. The fitting sample is used to find sensible values for the smoothing parameters, often with a sum of squared one-step-ahead prediction errors criterion.
The validation sample is used to evaluate the forecasting capacity of a method with a criterion such as the mean absolute percentage error (MAPE). Another approach applies a general version of exponential smoothing on the assumption that it effectively reduces to an appropriate nested method when this is warranted by the data: trend corrected exponential smoothing is applied to annual time series; Winters' method is applied to sub-annual time series. A second

aim is to gauge the effectiveness of these traditional approaches relative to the information criterion approach to method selection.

The plan of this paper is as follows. State space models for exponential smoothing and an approach to their estimation are introduced in Section 2. Criteria to be used in model selection and a measure for comparing the resulting forecast errors are explained in Section 3. A simulation study is discussed in Section 4. An application of the model selection criteria to the M3 competition data (Makridakis & Hibon, 2000) is given in Section 5. The paper ends with some concluding remarks in Section 6.

2. State space models

The state space framework in Snyder (1985), and its extension in Ord et al. (1997), provides the basis of an efficient method of likelihood evaluation, a sound mechanism for generating prediction distributions, and the possibility of model selection with information criteria. Important special cases, known as structural models, that capture common features of time series such as trend and seasonal effects, provide the foundations for simple exponential smoothing, trend corrected exponential smoothing and Holt–Winters' seasonal exponential smoothing. Of the 24 versions of exponential smoothing found in Hyndman et al. (2002), the scope of this study is limited to three linear cases. The focus is on a time series that is governed by the innovations model (Snyder, 1985):

y_t = h′x_{t−1} + e_t    (2.1)

x_t = F x_{t−1} + a e_t    (2.2)

Eq. (2.1), called the measurement equation, relates an observable time series value y_t in typical period t to a random k-vector x_{t−1} of unobservable components from the previous period. h is a fixed k-vector, while the e_t, the so-called innovations, are independent and normally distributed random variables with mean zero and a common variance σ². The inter-temporal dependencies in the time series are defined in terms of the unobservable components with the so-called transition equation (2.2). F is a fixed k × k "transition" matrix and a is a k-vector of smoothing parameters.
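The pair of equations (2.1)–(2.2) can be sketched in code. The following is a minimal, illustrative simulation of the linear innovations model; the function name and interface are invented for this example, not taken from the paper:

```python
import numpy as np

def simulate_innovations(h, F, a, x0, n, sigma=1.0, rng=None):
    """Simulate y_t = h'x_{t-1} + e_t and x_t = F x_{t-1} + a e_t."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    y = np.empty(n)
    for t in range(n):
        e = rng.normal(0.0, sigma)   # innovation e_t ~ N(0, sigma^2)
        y[t] = h @ x + e             # measurement equation (2.1)
        x = F @ x + a * e            # transition equation (2.2)
    return y
```

For instance, h = [1], F = [[1]] and a = [α] reproduce the local level model described below.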


The following special cases, termed structural models, provide the statistical underpinning for common forms of exponential smoothing. The models most naturally relate to the error correction versions of exponential smoothing (Gardner, 1985), but also underpin the more traditional and equivalent "weighted average" versions of the method.

- Local Level Model (LLM): y_t = S_{t−1} + e_t, where S_t is a local level governed by the recurrence relationship S_t = S_{t−1} + αe_t with 0 ≤ α ≤ 1. It underpins the simple exponential smoothing method (Brown, 1959).
- Local Trend Model (LTM): y_t = S_{t−1} + b_{t−1} + e_t, where b_t is a local growth rate. The local level and local growth rate are governed by the equations S_t = S_{t−1} + b_{t−1} + αe_t and b_t = b_{t−1} + βe_t, respectively, where 0 ≤ α ≤ 1 and 0 ≤ β ≤ α. Note that a′ = [α β]. This model underpins trend corrected exponential smoothing (Holt, 1957).
- Additive Seasonal Model (ASM): y_t = S_{t−1} + b_{t−1} + s_{t−m} + e_t, where s_t is the local seasonal component. The local level, growth and seasonal components are governed by S_t = S_{t−1} + b_{t−1} + αe_t, b_t = b_{t−1} + βe_t and s_t = s_{t−m} + γe_t, respectively, where m is the number of seasons in a year, 0 ≤ α ≤ 1, 0 ≤ β ≤ α, and 0 ≤ γ ≤ 1 − α. In this case x′_t = [S_t b_t s_t ... s_{t−m+1}] and a′ = [α β γ 0 ... 0]. This model is the basis of Holt–Winters' additive method (Winters, 1960).

Traditionally, the smoothing parameters α, β and γ were set to fixed values determined subjectively by users on the basis of personal experience. The studies of Chatfield (1978) and Bartolomei and Sweet (1989) show that this can be problematic and that the parameters are best estimated from data. Ord et al. (1997) recommend that estimates of the parameters be obtained by maximizing the conditional log-likelihood.
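For illustration, the (h, F, a) triples that make Eqs. (2.1)–(2.2) reproduce the three structural models can be assembled as follows. This is a sketch under the state ordering x′_t = [S_t b_t s_t ... s_{t−m+1}]; the function is invented for this example:

```python
import numpy as np

def structural_model(kind, alpha, beta=None, gamma=None, m=None):
    """Return (h, F, a) for the LLM, LTM or ASM special cases."""
    if kind == "LLM":
        return np.array([1.0]), np.array([[1.0]]), np.array([alpha])
    if kind == "LTM":
        h = np.array([1.0, 1.0])                 # y_t = S_{t-1} + b_{t-1} + e_t
        F = np.array([[1.0, 1.0], [0.0, 1.0]])   # S_t = S + b, b_t = b
        return h, F, np.array([alpha, beta])
    if kind == "ASM":
        k = 2 + m
        h = np.zeros(k)
        h[0] = h[1] = h[-1] = 1.0                # S_{t-1} + b_{t-1} + s_{t-m}
        F = np.zeros((k, k))
        F[0, 0] = F[0, 1] = 1.0                  # S_t = S_{t-1} + b_{t-1} + alpha*e_t
        F[1, 1] = 1.0                            # b_t = b_{t-1} + beta*e_t
        F[2, -1] = 1.0                           # s_t = s_{t-m} + gamma*e_t
        for i in range(3, k):                    # shift the remaining seasonal states
            F[i, i - 1] = 1.0
        a = np.zeros(k)
        a[0], a[1], a[2] = alpha, beta, gamma    # a' = [alpha beta gamma 0 ... 0]
        return h, F, a
    raise ValueError(kind)
```

With e_t = 0 the ASM transition simply advances the level by the growth rate and rotates the m seasonal states, which is a quick sanity check on the construction.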
For the class of linear state space models (2.1) and (2.2), the conditional likelihood function based on a sample of size n is given by

log L = −(n/2) log(2π) − (n/2) log σ² − (1/(2σ²)) Σ_{t=1}^{n} e_t²    (2.3)

where the errors are calculated recursively with a general linear form of exponential smoothing defined by the relationships e_t = y_t − h′x_{t−1} and x_t = F x_{t−1} + a e_t for t = 1, 2, ..., n.


This conditional likelihood is not only a function of a but also of the unobserved random vector x_0. Following a tradition from exponential smoothing, the seed state vector x_0 is approximated with plausible heuristics such as the following:

- Local level model: the seed level estimate equals the average of the first three observations in the sample.
- Local trend model: fit a linear trend line to the first five observations. The seed level is set equal to its intercept and the seed rate to its slope.
- Local seasonal model: fit a linear trend line with seasonal dummy variables to the first two years of data. The seed level and rate are chosen as before. The seasonal factors are set equal to the coefficients of the seasonal dummies after normalisation to sum to zero.

The resulting heuristic estimates x̂_0 of x_0 are entered into the conditional likelihood function, so that the latter is then only maximized with respect to the parameter vector a. The exact likelihood function could potentially be obtained by integrating x_0 out of (2.3). The conditional likelihood is used instead because the state variables in the models are generated by non-stationary processes, so that the seed state vector has an improper unconditional distribution. The situation is similar to one considered in Bartlett's paradox (Bartlett, 1957) from Bayesian statistics. Exact likelihood values for models with different state dimensions are non-comparable, and information criteria based on them are therefore also non-comparable. This is not an issue for the conditional likelihood because the use of an improper unconditional distribution of the seed state vector is avoided. It is for this reason that the conditional likelihood is used instead of the exact likelihood for estimating the parameter vector a.

On obtaining the estimates x̂_0 and â, the exponential smoothing algorithm is used to obtain the corresponding estimate x̂_n of the state vector at the end of the sample. Point forecasts are then generated recursively with the equations ŷ_n(j) = h′x̂_{n+j−1} and x̂_{n+j} = F x̂_{n+j−1} for j = 1, 2, ..., r, where r is the prediction horizon.
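The estimation scheme can be sketched as follows. This is an illustrative implementation, assuming NumPy and SciPy; one convenient way to maximise (2.3) is to concentrate σ² out as Σe²/n, which is an implementation choice of this sketch rather than something stated in the paper. The local level model is shown, with its heuristic seed (average of the first three observations):

```python
import numpy as np
from scipy.optimize import minimize

def conditional_loglik(params, y, build):
    """Eq. (2.3): run the smoothing recursions to get the one-step
    errors, with sigma^2 concentrated out as its MLE, sum(e^2)/n."""
    h, F, a, x = build(params)          # build() returns (h, F, a, x0-hat)
    n = len(y)
    e = np.empty(n)
    for t in range(n):
        e[t] = y[t] - h @ x             # e_t = y_t - h'x_{t-1}
        x = F @ x + a * e[t]            # x_t = F x_{t-1} + a e_t
    sigma2 = e @ e / n
    return -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1.0)

def fit_llm(y):
    """Fit the local level model by maximum conditional likelihood."""
    build = lambda p: (np.array([1.0]), np.array([[1.0]]),
                       np.array([p[0]]), np.array([np.mean(y[:3])]))
    res = minimize(lambda p: -conditional_loglik(p, y, build),
                   x0=[0.5], bounds=[(0.0, 1.0)])
    return res.x[0], -res.fun           # alpha-hat, maximised log-likelihood
```

The same recursion, run once more with the fitted parameters, yields x̂_n and hence the point forecasts ŷ_n(j).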

3. Model selection approaches and a measure for comparing them

An information criterion has the general form log L(â) − p(n, q), where p(n, q) is the so-called


penalty function, q being the number of free parameters. Various forms of the penalty have been suggested, as may be seen from Table 1. Note that q* is the number of free parameters in the smallest model that nests all models under consideration and c = n − q*. No clear theory exists for deciding which of these information criteria is best suited for choosing the appropriate method of exponential smoothing. Thus, a simulation study was undertaken to compare them. The simulation also included a comparison with two other approaches for model selection. The prediction validation approach (Val) selects the model with the smallest mean absolute percentage error (MAPE) for forecasting withheld data, and the encompassing model approach always selects the LTM for annual data and the ASM for quarterly and monthly data.

The performance of each approach was gauged in terms of the median absolute prediction error as a percentage of the standard deviation, given by

MdAPES = median( |y_{n+j} − ŷ_n(j)| / sqrt( Σ_{t=1}^{n} (y_t − ȳ)² / (n − 1) ) × 100 )
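The measure can be computed directly; a minimal sketch (function name invented for this example):

```python
import numpy as np

def mdapes(actuals, forecasts, fitting_sample):
    """Median absolute prediction error as a percentage of the
    fitting-sample standard deviation (the MdAPES measure)."""
    s = np.std(fitting_sample, ddof=1)   # sqrt(sum((y_t - ybar)^2) / (n - 1))
    apes = np.abs(np.asarray(actuals) - np.asarray(forecasts)) / s * 100.0
    return np.median(apes)
```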

This measure was chosen for several reasons. It is a unit-free measure that permits comparisons between real time series measured in different units. It avoids the problems encountered with the MAPE (and its variations) for series values close to zero. Most importantly, it gives a fair comparison when applied to time series with different standard deviations. The forecast error may display a tendency to be larger for time series with a large standard deviation. The APES is a measure that does not produce larger values just because there is more variability in the time series. Thus, such time series will not necessarily cause an increase in the MdAPES just because of their larger variability. More variable time series can still have APES values near the median value and play a central role in the evaluation process. In the comparisons, both simulated data and real data with different amounts of variability are included. The median was used instead of the average to eliminate the distorting effect of outlier APES.

Table 1
Alternative penalty functions

Criterion   Penalty function                        Source
AIC         q                                       Akaike (1973)
BIC         q log(n) / 2                            Schwarz (1978)
HQ          q log(log(n))                           Hannan and Quinn (1979)
MCp         n log(1 + 2q/c) / 2                     Mallows (1964)
GCV         −n log(1 − q/n)                         Golub et al. (1979)
FPE         (n log(n + q) − n log(n − q)) / 2       Akaike (1970)

q is the number of free parameters. q* is the number of free parameters in the smallest model nesting all other models. c = n − q*.
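The penalty functions of Table 1, and selection by maximising the penalised log-likelihood log L(â) − p(n, q), can be sketched as follows (names invented for this example):

```python
import math

# Penalty functions from Table 1. q = free parameters, n = sample size,
# q_star = free parameters of the smallest model nesting all candidates,
# c = n - q_star.
PENALTIES = {
    "AIC": lambda n, q, c: q,
    "BIC": lambda n, q, c: q * math.log(n) / 2,
    "HQ":  lambda n, q, c: q * math.log(math.log(n)),
    "MCp": lambda n, q, c: n * math.log(1 + 2 * q / c) / 2,
    "GCV": lambda n, q, c: -n * math.log(1 - q / n),
    "FPE": lambda n, q, c: (n * math.log(n + q) - n * math.log(n - q)) / 2,
}

def select_model(candidates, criterion, n, q_star):
    """candidates: dict of name -> (loglik, q).
    Picks the model maximising loglik - penalty."""
    c = n - q_star
    pen = PENALTIES[criterion]
    return max(candidates,
               key=lambda name: candidates[name][0]
               - pen(n, candidates[name][1], c))
```

A large enough gain in log-likelihood outweighs the extra parameters of a bigger model; otherwise the penalty tips the choice toward the smaller one.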

4. Simulation study

A simulation study was conducted to determine whether any of the approaches to model selection displayed a superior performance, measured in terms of forecast accuracy rather than the more usual criterion of the proportion of correctly selected models. It consisted of many experiments carried out under a wide variety of conditions. Depending on the type of data, the time series were generated by the three models: the local level model (LLM), local trend model (LTM) and additive seasonal model (ASM). For the ASM, the seed seasonal components were generated from the equation s_{j−m} = A sin(2jπ/m), j = 1, 2, ..., m, where A is the seasonal amplitude and m is the number of seasons in a year. Candidate values for the various factors in these models were:

σ = 10, 20 (for all models)
n = 24, 40 (for LLM and LTM)
n = 24, 60 (for quarterly/m = 4 ASM)
n = 48, 96 (for monthly/m = 12 ASM)
α = 0.1, 0.5, 0.9; S_0 = 100 (for LLM)
(α, β) = (0.1, 0.05), (0.4, 0.1), (0.7, 0.1); (S_0, b_0) = (100, 1), (100, 5) (for LTM)
(α, β, γ) = (0.1, 0.05, 0.1), (0.4, 0.1, 0.2), (0.7, 0.1, 0.3); (S_0, b_0) = (100, 1), (100, 5); A = 0, 25, 50 (for ASM)

The various combinations of these factors lead to 180 scenarios for the simulation study. The 180 scenarios were repeated 10 times (i.e., 10 trials) so that the study consisted of 1800 simulation experiments. Each experiment consisted of the following four steps:

1. Generate a time series, from a specified model, consisting of a) a tuning sample of a specified size n, and b) an evaluation sample for r succeeding periods: for annual data r = 6, for quarterly data r = 8, and for monthly data r = 18.
2. Fit a collection of models to the tuning sample using the conditional likelihood function: the LLM and LTM for annual data, and additionally the ASM for quarterly and monthly data.
3. Select the best model by one of the model selection approaches:
   a) For the six information criteria approaches, choose the model that is best according to each information criterion.
   b) For the prediction validation approach (Val): i) withhold the last r periods of the tuning sample and fit the local level model, local trend model, and additive seasonal model to the first n − r values by maximizing the conditional likelihood function; ii) choose the model with the smallest MAPE for the forecasts of the r periods of withheld data.
   c) For the encompassing approach (Enc): i) choose the local trend model for annual data; ii) choose the additive seasonal model for quarterly and monthly data.
4. Using the estimates from Step 2 for the model chosen in Step 3: i) Generate predictions for each of the r periods in the evaluation sample.


ii) Calculate the absolute prediction error as a percentage of the standard deviation of the tuning sample (APES) for each of the time periods in the evaluation sample.

Overall, 20,880 APES were calculated for each model selection method. As their relative frequency distribution displayed a strong positive skew, their central tendency was summarised by their median. The effect of sampling error on the median was gauged with a bootstrap study. The 20,880 APES were treated as a surrogate population, and 500 samples of size 20,880 were randomly selected from it with replacement. The distribution of the medians of these 500 samples is summarised by its median and a 90% confidence interval, shown for each model selection method in Fig. 1a. It may be observed that:

a) The confidence intervals are quite tight, so that the effect of sampling error has largely been eliminated by the use of the large surrogate population, thereby reducing room for ambiguity in the comparison of the various model selection methods.
b) There are no marked differences between the various information criteria.
c) Information criteria are better for model selection than the prediction validation approach.
d) The encompassing approach, to the surprise of the authors, is just as effective as the information criteria.

Observation c) is important because the prediction validation approach is often recommended and used in practice (Makridakis et al., 1998). The reason for its inferior performance is most likely that it does not rely on the entire sample for the fitting phase, so that its parameter estimates lack the statistical efficiency of the other approaches.

The effects of sample size, collation period and prediction horizon are shown in Figs. 1b, c and d. To interpret these results, it should be understood that:

a. A small sample is n = 24 for annual and quarterly series, and n = 48 for monthly series;
b. A large sample is n = 40 for annual series, n = 60 for quarterly series and n = 96 for monthly series;
c. The short run consists of future periods 1–3 for annual series, 1–4 for quarterly series and 1–9 for monthly series;

Fig. 1. Simulation results (median APES and 90% confidence intervals for the criteria given in Table 1): (a) all, (b) sample size, (c) collation period, (d) prediction horizon.

d. The long run comprises future periods 4–6 for annual data, 5–8 for quarterly data and 10–18 for monthly data.
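The bootstrap used to produce the medians and 90% confidence intervals can be sketched as follows (an illustrative implementation; the function name is invented):

```python
import numpy as np

def bootstrap_median_ci(apes, n_boot=500, level=0.90, rng=None):
    """Resample the APES 'surrogate population' with replacement and
    return the median of the bootstrap medians plus a 90% interval."""
    rng = np.random.default_rng(rng)
    apes = np.asarray(apes)
    meds = np.array([np.median(rng.choice(apes, size=apes.size, replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.quantile(meds, [(1 - level) / 2, (1 + level) / 2])
    return np.median(meds), (lo, hi)
```

Applied to each selection method's pool of APES, this yields the tight intervals reported in Fig. 1.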

Table 2
Surrogate population sizes for bootstrap

                      Simulation   M3 data
Collation period
  Annual              2160         3870
  Quarterly           5760         6048
  Monthly             12,960       25,704
Sample size
  Small               10,440       19,472
  Large               10,440       16,150
Prediction horizon
  Short run           10,440       17,811
  Long run            10,440       17,811

Partitioning the original sample of 20,880 APES according to these factors results in many smaller surrogate populations, their sizes being those shown in the second column of Table 2. The bootstrap was applied as described above to these smaller populations to again obtain the median and 90% confidence intervals of the median APES. The results obtained for sample size and prediction horizon were as expected: larger sample sizes lead to smaller APES, and there is more volatility in long run predictions. A key point, however, is that the basic conclusions above remain unchanged for these cases. The poorer performance of the prediction validation method does not change with sample size or prediction horizon. The results for the collation period tell a slightly different story. They


should be highly correlated with the sample size results because the monthly time series are longer than the quarterly time series, which in turn are longer than the annual series. Interestingly, in the case of the annual series, the median for prediction validation is lower than the medians of the other approaches, but the wider confidence intervals suggest that this difference is not statistically significant.

5. Application to the M3 competition data

The use of simulated data does raise the criticism that in real life the true model is unknown. Furthermore, real series are not as well behaved as simulated series, even when random errors and outliers are included in the latter. In this section, we investigate how forecasting performance on real data is affected by the eight approaches to model selection. The eight model selection approaches were applied to the M3 competition data (Makridakis & Hibon, 2000) to see whether the results for simulated data carry through to real data. It was necessary to remove time series that were too short. Each time series had a tuning sample of a specified size n and an additional evaluation sample of size r, where again r = 6 for annual data, r = 8 for quarterly data, and r = 18 for monthly data. For the prediction validation approach, it was necessary to fit models to n − r observations, so n must exceed r. Moreover, to obtain plausible results with a satisfactory level of statistical reliability,

Fig. 2. M3 data results (median APES and 90% confidence intervals for the criteria given in Table 1): (a) all, (b) sample size, (c) collation period, (d) prediction horizon.

the focus of the study was restricted to those series where n ≥ 20 for annual data, n ≥ 28 for quarterly data, and n ≥ 72 for monthly data. After culling series that were too short according to this definition, 1452 of the 2829 time series in the M3 data remained for the comparative study. The procedures described in Steps 2 and 3 of Section 4 for the simulation study were applied to the 1452 time series from the M3 competition data. The results involved 35,622 APES for each method. The bootstrapped sampling distribution of medians is summarised in Fig. 2a. The results are similar to those from the simulation study, except that the encompassing approach is no longer competitive with the information criteria. When examined by sample size, collation period and prediction horizon, as shown in Figs. 2b, c and d, the results are again similar to those from the simulation study. Now, however, the quarterly time series have the lowest MdAPES. Moreover, although the validation method appears to be better than its competitors for this particular case, the difference is barely significant at the 90% level. The distinction between the small and large samples used in this study depends on the collation period: a small sample was one whose length did not exceed the median length of all the samples with the corresponding collation period.

6. Conclusions In this paper various approaches to model selection were compared using time series simulated from statistical models underlying exponential smoothing. They were also evaluated using a subset of the time series from the M3 competition database. Their relative performances were judged in terms of the prediction capacities of their selected models. Results from the simulated and real time series data proved to be remarkably consistent. They indicated that there is little to distinguish the various information criteria. The most important finding was that the information criteria approaches appear to be superior to the commonly used prediction validation approach. Moreover, there was some evidence, using the real data, that suggested that the information criteria approaches are better than the encompassing approach.

Overall, these studies indicate that best practice would be to adopt an information criterion approach for choosing between the various common exponential smoothing methods. They indicate that the AIC probably has a slight edge over its counterparts. As expected, however, the FPE had a similar performance to the AIC, reflecting the asymptotic equivalence of these criteria.

References

Akaike, H. (1970). Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22, 203–217.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
Bartlett, M. S. (1957). A comment on D. V. Lindley's statistical paradox. Biometrika, 44, 533–534.
Bartolomei, S. M., & Sweet, A. L. (1989). A note on a comparison of exponential smoothing methods for forecasting seasonal series. International Journal of Forecasting, 5, 111–116.
Brown, R. G. (1959). Statistical forecasting for inventory control. New York: McGraw-Hill.
Chatfield, C. (1978). The Holt–Winters forecasting procedure. Applied Statistics, 27, 264–279.
Gardner, E. S. (1985). Exponential smoothing: The state of the art. Journal of Forecasting, 4, 1–28.
Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21, 215–223.
Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B, 41, 190–195.
Holt, C. C. (1957). Forecasting trends and seasonals by exponentially weighted averages. ONR Memorandum No. 52, Carnegie Institute of Technology, Pittsburgh, USA (published in International Journal of Forecasting, 2004, 20, 5–13).
Hyndman, R. J., Koehler, A. B., Snyder, R. D., & Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, 18, 439–454.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., et al. (1982). The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting, 1, 111–153.
Makridakis, S., & Hibon, M. (2000). The M3-Competition: Results, conclusions and implications. International Journal of Forecasting, 16, 451–476.
Makridakis, S., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and applications. New York: John Wiley & Sons.
Mallows, C. L. (1964). Choosing variables in a linear regression: A graphical aid. Paper presented at the Central Regional Meeting of the Institute of Mathematical Statistics, Manhattan, Kansas.
McKenzie, E. (1985). Comments on "Exponential smoothing: The state of the art" by E. S. Gardner. Journal of Forecasting, 4, 32–36.
Ord, J. K., Koehler, A. B., & Snyder, R. D. (1997). Estimation and prediction for a class of dynamic nonlinear statistical models. Journal of the American Statistical Association, 92, 1621–1629.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Snyder, R. D. (1985). Recursive estimation of dynamic linear statistical models. Journal of the Royal Statistical Society, Series B, 47, 272–276.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6, 324–342.