Some problems in the overspecification of ARMA and ARIMA processes using ARFIMA models

Nuno Crato¹ and Bonnie K. Ray²

Abstract

Nonstationary ARIMA processes and nearly nonstationary ARMA processes, such as autoregressive processes having a root of the AR polynomial close to the unit circle, have sample autocovariance and spectral properties that are, in practice, almost indistinguishable from those of a stationary long-memory process, such as a Fractionally Integrated ARMA (ARFIMA) process. Because of this, model misspecification may occur in trying to distinguish between the different processes. An appealing strategy would be to overspecify the model and estimate the integration parameter. In this paper, we investigate the effects of this type of misspecification on parameter estimates using either spectral regression or frequency-domain maximum likelihood methods.

KEYWORDS: frequency-domain likelihood, long-range forecasting, nonstationarity, spectral regression

1 Introduction

In modeling a realization of a time series, a generally accepted strategy is to difference the series until it is stationary before fitting an ARMA model. What is not so generally accepted, however, is the best way to determine the degree of differencing necessary to achieve stationarity. In some borderline cases, distinguishing between a stationary series that should not be differenced and a nonstationary series that needs to be differenced is not a trivial matter. For example, a stationary ARMA(p + 1, q) process with a root of the AR polynomial close to the unit circle can be said to be

¹ Contact: Department of Mathematics, Stevens Institute of Technology, Hoboken, NJ 07040, USA; Phone: (201) 216-5446; FAX: (201) 216-8321; internet: [email protected]
² Contact: Department of Mathematics and Center for Applied Mathematics and Statistics, New Jersey Institute of Technology, Newark, NJ 07102, USA; Phone: (201) 642-4496; FAX: (201) 596-6467; internet: [email protected]


nearly nonstationary; its sample functions, such as its sample spectrum or sample autocovariance function, are almost indistinguishable from those of an ARIMA(p, 1, q) process, which should be differenced once to achieve stationarity. Both underdifferencing an ARIMA(p, 1, q) series and overdifferencing an ARMA(p + 1, q) series lead to problems in modeling and forecasting the series. Wichern (1973), in a well-known paper, claimed that for a given number of series observations, there are competing models, stationary and nonstationary, which have almost indistinguishable sample autocorrelation properties. To illustrate his analysis, Wichern presented two models that have since been widely studied. Anderson and de Gooijer (1979, 1990), in particular, have studied in detail the identification of Wichern's models, which are given below.

Let (X_t) represent an observable time series and (ε_t) represent a white noise process, i.e., a zero-mean uncorrelated process with constant variance σ_ε². Let B denote the backward shift operator, B X_t = X_{t−1}, and let ∇ represent the differencing operator, ∇X_t = (1 − B)X_t = X_t − X_{t−1}. The autoregressive moving-average process, ARMA(p, q), has the form

\[
X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},
\]

written φ(B)X_t = θ(B)ε_t, where φ(·) and θ(·) represent the polynomials φ(z) = 1 − φ_1 z − ⋯ − φ_p z^p and θ(z) = 1 + θ_1 z + ⋯ + θ_q z^q. We will assume that φ(z) and θ(z) have all roots outside the unit circle [|z| ≤ 1] and no common roots. If we consider the process Y_t = ∇X_t = X_t − X_{t−1}, where (Y_t) is ARMA(p, q), then (X_t) is called an autoregressive integrated moving-average process, ARIMA(p, 1, q). With this notation, the first model presented by Wichern is the nearly nonstationary ARMA(1,1) model

\[
X_t - .95\,X_{t-1} = \varepsilon_t - .74\,\varepsilon_{t-1}. \tag{1}
\]

The second model is a special case of an ARIMA model having p = 0, the IMA(1,1) model

\[
X_t - X_{t-1} = \varepsilon_t - .8\,\varepsilon_{t-1}. \tag{2}
\]

The standard ARIMA model can be generalized by allowing the order of differencing to take nonintegral values. Fractionally Integrated ARMA (ARFIMA) models were introduced independently by Granger and Joyeux (1980) and Hosking (1981) in order to model processes having long-range dependence. Let d be any real number. The fractional difference of the process (X_t) is defined through the binomial expansion

\[
\nabla^d = (1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k,
\]

where \binom{d}{k} is the binomial coefficient d(d − 1)⋯(d − k + 1)/k!. The process (X_t) follows an ARFIMA(p, d, q) model if ∇^d X_t is an ARMA(p, q) process. When d ∈ (−.5, .5) and all the roots of the MA and AR polynomials lie outside the unit circle, (X_t) is stationary and invertible. The ARMA and ARIMA models can be thought of as particular cases of ARFIMA models having d = 0 and d = 1, 2, …, respectively.

In fact, the sample functions of a stationary ARFIMA(p, d, q) series are sometimes very similar to those of a nearly nonstationary ARMA(p, q) process or a nonstationary ARIMA(p, 1, q) process. Thus, in addition to the possibility of misspecifying series generated by models such as (1) using an ARIMA model, or misspecifying series generated by models such as (2) using an ARMA model, we have the added possibility of identifying such series as ARFIMA processes. One might ask if it is possible to distinguish between ARMA and ARIMA processes by explicitly estimating d, since both models are special cases of the more general ARFIMA model. In fact, one might be tempted to suggest that an ARFIMA model should always be used, with the hope that the correct d (including d = 0 or d = 1) will be estimated. If d is not correctly estimated, then at least one could hope that the resulting misspecified model would have finite-sample properties very similar to those of the original process and that the forecasts given by the misspecified model would be close to the optimal ones. In this paper, we investigate this strategy for modeling nearly nonstationary ARMA and nonstationary ARIMA processes.

For the types of processes discussed above, other methods of determining the most appropriate model, such as the Dickey-Fuller test for a unit root in the AR polynomial (Dickey and Fuller, 1979), do not perform well. The Dickey-Fuller test has been found to have low power when the process has an AR root close to the unit

circle. It also has low power against fractional d (Diebold and Rudebusch, 1991).

Misspecification problems dealing with long-memory models have been previously considered from the viewpoint of modeling ARFIMA processes using ARIMA models. Ray (1993) examines the case in which an ARFIMA process is modeled as an autoregressive process. The analysis indicates that this type of misspecification can lead to serious forecasting errors when d is near 1/2. Crato (1992) investigates the case in which a stationary ARFIMA process is wrongly identified as a nonstationary ARIMA process. The analysis indicates that the resultant forecasting errors can increase without bound. Yajima (1993) gives results concerning the parameter estimates of an ARFIMA process modeled as an ARMA process, but does not address the forecasting issue. Crato and Ray (1995) present a large-scale simulation study showing that a large number of observations is necessary in order to obtain any significant advantage from the use of ARFIMA forecasting models.

In this paper we investigate the reverse misspecification problem: can ARFIMA models be used to characterize nearly nonstationary ARMA and nonstationary ARIMA processes? In Section 2, we discuss misspecification problems resulting from errors in estimating the fractional differencing parameter d only. In Section 3, we extend the discussion to include the estimation of all the parameters of an ARFIMA model for series generated by (1) and (2) using two different estimation procedures: the spectral regression or GPH method (Geweke and Porter-Hudak, 1983) and the frequency-domain maximum likelihood (ML) method. Recent work (see, e.g., Agiakloglou, Newbold, and Wohar, 1993) has indicated that the GPH estimator can be significantly biased, particularly for processes having roots of the AR polynomial close to the unit circle. We illustrate these biases.
With regard to ML estimation, Smith, Sowell, and Zin (1993) claim that ARFIMA model parameters estimated using ML methods have smaller finite-sample biases. We investigate this claim for the Wichern models. In Section 4, we summarize our findings and discuss the implications.
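As a concrete illustration, the fractional difference operator defined above can be applied to data by truncating the binomial expansion at the start of the sample. The sketch below is ours, not the authors'; the recurrence for the weights follows directly from the definition of \binom{d}{k}:

```python
import numpy as np

def frac_diff(x, d):
    """Apply the truncated binomial expansion of (1 - B)^d to a series x.
    The weight of B^k is (-1)^k * binom(d, k), computed by the recurrence
    c_0 = 1, c_k = c_{k-1} * (k - 1 - d) / k."""
    n = len(x)
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (k - 1 - d) / k
    # y_t = sum_{k<=t} c_k x_{t-k}, truncating at the start of the sample
    return np.array([np.dot(c[:t + 1], x[t::-1]) for t in range(n)])
```

For d = 1 the weights reduce to (1, −1, 0, …), i.e., ordinary differencing, and applying ∇^d followed by ∇^{−d} recovers the original series exactly on a finite sample.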


2 Errors Resulting from the Estimation of the Differencing Parameter

The estimation of the fractional differencing parameter d is fraught with difficulties. All known methods produce relatively large estimation errors, even in large samples. For relatively small samples (such as those studied in the quoted papers of Wichern and of Anderson and de Gooijer, where n = 100), the problem can be very serious. A stationary ARMA model (having d = 0) is likely to produce estimated d̂ values significantly different from zero if the ARMA components are large.

Regarding the increase in forecast error due to the erroneous estimation of d, some insight can be gained from looking at the case in which the true process is white noise misspecified as a fractional noise. This corresponds to the case in which the short-memory components are exactly known.

In what follows, $\hat X_{t+h|t}$ will represent the best linear predictor of X_{t+h}, h > 0, made with the knowledge of X_s, s ≤ t, and $\tilde X_{t+h|t} := \hat X_{t+h|t} - X_{t+h}$ will represent the forecast error for the best linear predictor. A star will be systematically used to denote a misspecification. Thus $\hat X^*_{t+h|t}$ will represent the misspecified forecast and $\tilde X^*_{t+h|t} := \hat X^*_{t+h|t} - X_{t+h}$ will represent the misspecified forecast error.

Let (X_t) be a white noise process, X_t = ε_t, misspecified as a fractional noise ∇^{d*} X_t = ε_t, d* ∈ (−.5, .5), and let ψ*_k represent the coefficients of the moving-average operator ∇^{−d*}, with ψ*_k := \binom{−d*}{k}(−1)^k. The moving-average representation of the misspecified process is

\[
X_t = \sum_{j=0}^{\infty} \psi^*_j \, \varepsilon^*_{t-j},
\]

where ("t ) is in fact a fractional noise, "t = rd "t . The corresponding h-step P  " ~ predictor at time t = n is X^ n+hjn = 1 j =h j n+h,j and the error is Xn+hjn =

Ph,1  " . Thus the mean square forecasting error at step h, V  (h), is j =0 j n+h,j V  (h) =

,1 hX ,1 X

h

=0 j =0

i

5

  E[" " ]2 j t,i t,j

i

=

,1 hX ,1 X

h

=0 j =0

i

  " (i , j );

i

(3)

j

where " (k) denotes the autocovariance of the ("t ) process at lag k. Using the fact that " (k) = "2 (,1)k (,2d )!=[(k , d )!(,k , d )!] (Hosking, 1981, p. 167), V  (h) can be computed for di erent values of d and h. For this process, the mean square forecasting error with the correct model is always V (h) = "2 , for any step h. Then, the relative increase is given by the formula

0

1

,1 hX ,1   V  (h) , V (h) = @hX   (,d )!(i , j + d , 1)! , 1A i j V (h) (d , 1)!(i , j , d )! i=0 j =0

(4)

Figure 1 shows this relative increase for different values of h and d*. Clearly, d* close to zero does not produce significant errors, but d* close to −.5 or .5 significantly increases the error of the immediate forecasts. Interestingly enough, the relative increase in mean square error decreases rapidly as h increases. These results can be useful as a reference; however, more significant errors in the estimation of the differencing parameter arise when short-memory parameters disturb the estimation of d. As mentioned previously, the spectral estimator of d suggested by Geweke and Porter-Hudak (1983) can be significantly biased. Our results show that the maximum likelihood approach can also lead to significant biases. We concentrate next on the estimation of the specific models presented by Wichern (1973).

3 Errors Using Misspecified Models

3.1 ARMA Process Misspecified as ARFIMA

As previously discussed, the nearly nonstationary ARMA model (1) studied by Wichern has finite-sample properties that can lead the analyst to identify it as an ARFIMA process. In this section, we show that if the analyst proceeds to estimate an ARFIMA model using either a spectral regression based method or ML estimation, the expected value of d* will be nonzero.

Figure 1: Relative increase in forecasting error variance (ordinates) for h = 1, 5, 10, versus the fractional differencing parameter d* (abscissa).

3.1.1 GPH Estimation

In order to obtain theoretical results, we assume that the analyst follows a two-step procedure, as suggested by Geweke and Porter-Hudak (1983). First, a spectral estimator of the parameter d is obtained. Second, the data is fractionally differenced using the estimated d and the ARMA parameters of the differenced series are estimated.

For a sample of size n = 100, the expected value of d for the stationary ARMA model (1) of Wichern is computed by the method outlined in Agiakloglou, Newbold, and Wohar (1993) using Fourier frequencies ω_j = 2πj/n, j = (l + 1), …, m, with l = [n^{0.1}] = 1 and m = [n^{0.55}] = 12. (The lower truncation of periodogram ordinates was suggested by Robinson (1991).) The estimated value of the fractional parameter in this case is d* = .44.

Next, we assume the analyst fits an ARMA(1,1) model to the differenced data and estimates φ and θ by autocorrelation methods. In order to assess the expected value of these parameters, we use the expected value of the sample autocorrelation coefficients. The differenced series is now an ARFIMA(1, −.44, 1) model having autoregressive coefficient φ = .95 and moving-average coefficient θ = −.74. Hosking (1984) gives the expected value of the sample autocorrelation coefficients for an ARFIMA process as

\[
E\,r_k = \rho(k) - \big(1 - \rho(k)\big)\,\frac{\Gamma(1-d)}{\Gamma(1+d)\,\Gamma(1-2d)\,\Gamma(d)}\, f_{\nabla^d X}(0)\; n^{2d-1},
\]

where ρ(k) represents the autocorrelation at lag k and f_{∇^d X} represents the spectral density of the ARMA part of the process. In this case f_{∇^d X}(0) = (1 − .95)^{−2}(1 − .74)² σ_ε²/(2π). Let r_k be the estimates obtained through this process. For this model,

E r_1 = .7584 and E r_2 = .4745.

We approximate the expected estimates of the ARMA(1,1) parameters by substituting E r_1 and E r_2 into the well-known equations that relate the ARMA(1,1) parameters to the theoretical correlation coefficients (see, e.g., Box and Jenkins 1976, p. 77):

\[
\rho(1) = \frac{(1 + \phi\theta)(\phi + \theta)}{1 + \theta^2 + 2\phi\theta}, \qquad \rho(2) = \phi\,\rho(1). \tag{5}
\]

The corresponding values for the ARMA parameters obtained from these relations are φ* = .63 and θ* = −.48. The complete expected ARFIMA model is then

\[
(1 - .63B)\,\nabla^{.44} X_t = (1 - .48B)\,\varepsilon^*_t. \tag{6}
\]
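The substitution just described amounts to inverting (5): given the first two autocorrelations, recover (φ, θ). A sketch of that inversion (ours, not the authors' code; it uses the θ(z) = 1 + θz convention of the text and keeps the invertible root of the resulting quadratic):

```python
import numpy as np

def arma11_from_acf(r1, r2):
    """Solve Eq. (5) for (phi, theta) given autocorrelations r1, r2,
    with the theta(z) = 1 + theta*z convention."""
    phi = r2 / r1                      # from rho(2) = phi * rho(1)
    # rho(1) = (1 + phi*theta)(phi + theta) / (1 + theta^2 + 2*phi*theta)
    # rearranges to the quadratic
    # (phi - r1) theta^2 + (1 + phi^2 - 2 r1 phi) theta + (phi - r1) = 0
    roots = np.roots([phi - r1, 1 + phi ** 2 - 2 * r1 * phi, phi - r1])
    theta = roots[np.abs(roots) < 1][0]  # the invertible root
    return phi, float(theta.real)
```

The product of the two quadratic roots is 1, so exactly one lies inside the unit circle; selecting it gives the invertible ARMA(1,1) representation.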

3.1.2 ML Estimation

Since it is generally accepted that the two-step GPH method performs poorly for processes containing large ARMA components, one may choose to estimate the parameters of the ARFIMA model using ML estimation. Using the Whittle method (Brockwell and Davis, 1991, p. 529, Eq. (13.2.26)), we minimize

\[
\ln\!\left( n^{-1} \sum_{j=-[n/2]}^{[n/2]} \frac{I_n(\omega_j)}{g(\omega_j)} \right) + n^{-1} \sum_{j=-[n/2]}^{[n/2]} \ln g(\omega_j), \tag{7}
\]


Figure 2: Spectral likelihood function for the ARMA process estimated as an ARFIMA with φ and θ fixed at the minimizing values; d is in the abscissa.

where g(ω_j) σ_ε²/(2π) is the theoretical spectrum of the ARFIMA model we are estimating and I_n(ω_j) is the periodogram of the series. We derive the expected ML parameter estimates under misspecification by replacing I_n(ω_j) by its expected value f_X(ω_j) σ_ε²/(2π) and numerically minimizing the result as a function of the parameters (d, φ_1, …, φ_p, θ_1, …, θ_q) of g(ω_j). For the particular case of process (1) modeled as an ARFIMA(1, d*, 1) process, we obtain d* = .70, φ* = .10, and θ* = −.57, i.e., the ARFIMA model

\[
(1 - .10B)\,\nabla^{.70} X_t = (1 - .57B)\,\varepsilon^*_t. \tag{8}
\]

This result is quite interesting! It indicates that a nonstationary ARFIMA model (d > .5) is likely to be estimated when the true generating process is stationary. Figures 2 and 3 provide some insight into this phenomenon and show the serious problems encountered using ML estimation in this case. The spectral likelihood


Figure 3: Spectral likelihood function for the ARMA process estimated as an ARFIMA with θ fixed at the minimizing value; d (right) and the AR parameter φ (left) are the independent variables.

function (7) is very flat over the region 0 < d < 1, 0 < φ < 1. This could result from the interplay between d and φ in the autocorrelation of an ARFIMA(1, d, 1) process. In practice, the estimation of a significantly nonzero d is highly plausible.
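To make the Whittle criterion concrete, the sketch below (ours, not the paper's code) minimizes objective (7) for the simplest case of a pure fractional noise, using only the positive Fourier frequencies; extending g to an ARFIMA(1, d, 1) spectrum would add the AR and MA factors:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def whittle_d(x):
    """Whittle estimate of d for a pure fractional noise, minimizing
    criterion (7) over the positive Fourier frequencies w_j = 2*pi*j/n."""
    n = len(x)
    m = n // 2
    w = 2 * np.pi * np.arange(1, m + 1) / n
    # periodogram I_n(w_j); its overall scale only shifts (7) by a constant
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2 * np.pi * n)

    def crit(d):
        g = np.abs(1 - np.exp(-1j * w)) ** (-2 * d)  # ARFIMA(0,d,0) spectrum shape
        return np.log(np.mean(I / g)) + np.mean(np.log(g))

    return minimize_scalar(crit, bounds=(-0.49, 0.99), method="bounded").x
```

Applied to a simulated white-noise sample, the estimate concentrates near d = 0; the flat-likelihood problem described above appears once the AR and MA parameters are estimated jointly with d.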

3.2 ARIMA Processes Misspecified as ARFIMA Models

The nonstationary IMA model (2) studied by Wichern also presents finite-sample properties that can lead the observer to identify it as an ARFIMA process. In this section, we derive the estimated ARFIMA model parameters and the resulting forecasts.

3.2.1 GPH Estimation

If the analyst suspects that the process is generated by an ARFIMA model and estimates the model parameters using the two-step method outlined in the previous section, E d̂ = d* = .35 for a sample of size n = 100. This value is obtained in a manner similar to that for the stationary ARMA process, but using the expected value of the periodogram of the first differences of the process with n = 99 and adding one to the value given by the spectral regression, since the spectrum is not defined for the nonstationary model (2). This result is quite interesting; it shows that a stationary long-memory model (0 < d* = .35 < .50) is likely to be identified when the two-step method is applied to a series generated by a nonstationary IMA.

Next, we assume that the analyst fractionally differences the observations as before and estimates the ARMA(1,1) model by autocorrelation methods. In order to assess the expected values of φ* and θ*, the expected values of the sample autocorrelation coefficients are needed. The difficulty here is that the autocorrelations of the nonstationary model (2) are not defined. We derive the expected values of the sample autocorrelation coefficients in the Appendix using the simplifying assumption X_t = 0, t ≤ 0. The expected sample covariances of order k divided by the expected sample variances can be used as an approximation to the expected sample correlation coefficients of the differenced process (∇^{d*} X_t). Let E r_k be the estimates obtained through this process. The expected estimates of the ARMA parameters are obtained by substituting E r_1 and E r_2 into (5). With the model (2) and a sample of size n = 100, the values for the autocorrelations are computed from (13), obtaining

E r_1 = .97, E r_2 = .95.

The corresponding values for the ARMA parameters computed from the relations (5) are φ* = .98 and θ* = −.22. The complete expected estimated ARFIMA model is then

\[
(1 - .98B)\,\nabla^{.35} X_t = (1 - .22B)\,\varepsilon^*_t. \tag{9}
\]
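The first stage of the two-step procedure used in this and the previous subsection is the GPH log-periodogram regression; a minimal sketch (ours), with the truncation exponents and Robinson's lower cutoff chosen as in the text:

```python
import numpy as np

def gph_estimate(x, power=0.55, lower_power=0.1):
    """GPH log-periodogram regression estimate of d: regress log I(w_j)
    on log(4 sin^2(w_j / 2)) for j = l+1, ..., m, with m = [n^power] and
    l = [n^lower_power] (Robinson's lower truncation); the slope is -d."""
    n = len(x)
    m = int(n ** power)
    l = int(n ** lower_power)
    j = np.arange(l + 1, m + 1)
    w = 2 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2 * np.pi * n)  # periodogram ordinates
    reg = np.log(4 * np.sin(w / 2) ** 2)
    slope = np.polyfit(reg, np.log(I), 1)[0]
    return -slope
```

For a nonstationary series such as (2), the estimator would be applied to the first differences and one added to the result, as described above.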

3.2.2 ML Estimation

Using the ML approach described above, we again face the problem that the spectrum of (2) is not defined. We use the periodogram of the differenced process and obtain quite precise estimates (d̂ − 1) = .00, φ̂ = .00, θ̂ = −.74, which correspond to the true generating process (2). Thus in this case, there is no increase in forecast error resulting from using the estimated ARFIMA model.

4 Conclusions

Using autocorrelation and spectral methods, we have shown that the misspecification of a nearly nonstationary ARMA process or a nonstationary ARIMA process as a fractionally integrated process is likely in practice. In fact, this misspecification may even arise deliberately if one tries to model a series having d = 0 or d = 1 using the more general ARFIMA(p, d, q) model in order to incorporate both of the models discussed as special cases. Our results, however, illustrate that serious errors may result from this type of misspecification.

The two-step spectral estimation procedure gives, in the first step, a highly biased estimate of d, which leads to erroneous estimates of the autoregressive and moving-average parameters of the process. In particular, it leads to the identification of a stationary process for observations generated by the nonstationary IMA model. The frequency-domain maximum likelihood procedure correctly identifies the nonstationary IMA, but it leads to the identification of a nonstationary ARFIMA for observations generated by the stationary ARMA model.

If ARFIMA models are to be used, extreme care is necessary to identify the proper model and to estimate the d parameter. Our results suggest that only with a large number of observations can reasonable estimates be obtained.

APPENDIX

Sample Autocorrelation Coefficients of an IMA(1,1) Process

First, we condition on the value of the first observation and assume, without loss of generality, that X_0 = 0, ε_0 = 0. This latter assumption does not significantly affect the results and is consistent with the common simplification of assuming the unknown variables (t ≤ 0) to be null.

We have then

\[
\begin{aligned}
X_0 &= 0,\\
X_1 &= (1 - \theta B)\,\varepsilon_1,\\
X_2 &= (1 - \theta B)(1 + B)\,\varepsilon_2,\\
&\;\;\vdots\\
X_t &= (1 - \theta B)(1 + B + B^2 + \cdots + B^{t-1})\,\varepsilon_t.
\end{aligned} \tag{10}
\]

From (10), we obtain the expression for the fractionally differenced observations (∇^{d*} X_t). Simplifying the notation by using β = 1 − θ and writing π_i for the coefficients of the operator ∇^{−d*} (π_0 = 1, π_1 = d*, π_2 = d*(d* + 1)/2, …), we have

\[
\begin{aligned}
\nabla^{d^*} X_0 &= 0,\\
\nabla^{d^*} X_1 &= \varepsilon_1,\\
\nabla^{d^*} X_2 &= \varepsilon_2 + \pi_1\varepsilon_1 + \beta\varepsilon_1,\\
&\;\;\vdots\\
\nabla^{d^*} X_t &= \varepsilon_t + \pi_1\varepsilon_{t-1} + \pi_2\varepsilon_{t-2} + \cdots + \pi_{t-2}\varepsilon_2 + \pi_{t-1}\varepsilon_1\\
&\quad + \beta\big(\varepsilon_{t-1} + \pi_1\varepsilon_{t-2} + \cdots + \pi_{t-3}\varepsilon_2 + \pi_{t-2}\varepsilon_1\big)\\
&\quad + \cdots + \beta\big(\varepsilon_2 + \pi_1\varepsilon_1\big) + \beta\varepsilon_1,
\end{aligned} \tag{11}
\]

or, rearranging,

\[
\nabla^{d^*} X_t = \sum_{j=0}^{t-1} \left( \pi_j + \beta \sum_{i=0}^{j-1} \pi_i \right) \varepsilon_{t-j}. \tag{12}
\]

From these expressions we compute the expected values of the sample covariances for a realization of length n.

\[
E\!\left( \sum_{t=k+1}^{n} \nabla^{d^*}X_t \, \nabla^{d^*}X_{t-k} \right)
= \sigma_\varepsilon^2 \sum_{j=0}^{n-k-1} (n-j-k) \left( \pi_j + \beta \sum_{i=0}^{j-1} \pi_i \right) \left( \pi_{j+k} + \beta \sum_{i=0}^{j+k-1} \pi_i \right). \tag{13}
\]
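Under this reading of (13) (with σ_ε² = 1), the expected sample autocorrelations quoted in Section 3.2.1 can be checked numerically; the helper below is our sketch, not the authors' code:

```python
import numpy as np

def expected_sample_acf_ima(theta, d_star, n, lags=(1, 2)):
    """Expected sample autocorrelations of the fractionally differenced
    IMA(1,1) series, from the weights of Eqs. (12)-(13); sigma^2 = 1."""
    beta = 1.0 - theta
    # pi_i: coefficients of (1 - B)^{-d*}: pi_0 = 1, pi_i = pi_{i-1}(i - 1 + d*)/i
    pi = np.empty(n + max(lags))
    pi[0] = 1.0
    for i in range(1, len(pi)):
        pi[i] = pi[i - 1] * (i - 1 + d_star) / i
    # a_j = pi_j + beta * sum_{i<j} pi_i, the weight of eps_{t-j} in Eq. (12)
    a = pi + beta * np.concatenate(([0.0], np.cumsum(pi)[:-1]))

    def exp_cov(k):  # expected sum of products at lag k, Eq. (13)
        j = np.arange(0, n - k)
        return np.sum((n - j - k) * a[j] * a[j + k])

    c0 = exp_cov(0)
    return [exp_cov(k) / c0 for k in lags]
```

With θ = .8, d* = .35, and n = 100, the resulting ratios fall close to the E r_1 = .97 and E r_2 = .95 reported in the text.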

References

Agiakloglou, Christos, Newbold, Paul, and Wohar, Mark (1993). "Bias in an estimator of the fractional difference parameter," Journal of Time Series Analysis, 14, 3, 235-246.

Anderson, Oliver D. and De Gooijer, Jan G. (1979). "On discriminating between IMA(1,1) and ARMA(1,1) processes: Some extensions to a paper by Wichern," The Statistician, 28, 2, 119-133.

Anderson, Oliver D. and De Gooijer, Jan G. (1990). "Discrimination between nonstationary and nearly nonstationary processes, and its effect on forecasting," Recherche Operationnelle/Operations Research, 24, 1, 67-91.

Box, George E. P. and Jenkins, Gwilym M. (1970). Time Series Analysis: Forecasting and Control, Holden Day, San Francisco.

Brockwell, Peter J. and Davis, Richard A. (1991). Time Series: Theory and Methods, second edition, Springer-Verlag.

Crato, Nuno (1992). "Long-memory models misspecified as nonstationary ARIMA," American Statistical Association, 1992 Proceedings of the Business and Economic Statistics Section, 82-87.

Crato, Nuno and Ray, Bonnie K. (1995). "Model selection and forecasting for long-range dependent processes," Department of Mathematics, New Jersey Institute of Technology.

Dickey, D. A. and Fuller, W. A. (1979). "Distribution of the estimators for autoregressive time series with a unit root," Journal of the American Statistical Association, 74, 427-431.

Diebold, Francis X. and Rudebusch, Glenn D. (1991). "On the power of Dickey-Fuller tests against fractional alternatives," Economics Letters, 35, 155-160.

Geweke, John and Porter-Hudak, Susan (1983). "The estimation and application of long memory time series models," Journal of Time Series Analysis, 4, 4, 221-238.

Granger, Clive W. J. and Joyeux, Roselyne (1980). "An introduction to long memory time series models and fractional differencing," Journal of Time Series Analysis, 1, 1, 15-29.

Hosking, Jon R. M. (1981). "Fractional differencing," Biometrika, 68, 1, 165-176.

Hosking, Jon R. M. (1984). "Asymptotic distributions of the sample mean, autocovariances and autocorrelation of long memory time series," Technical Report 2752, Mathematics Research Center, University of Wisconsin-Madison.

Ray, Bonnie K. (1993). "Modeling long memory processes for optimal long-range prediction," Journal of Time Series Analysis, 14, 5, 511-525.

Robinson, Peter M. (1991). "Log-periodogram regression of time series with long range dependence," London School of Economics.

Smith, A. A., Sowell, F., and Zin, S. E. (1993). "Fractional integration with drift: Estimation in small samples," Carnegie Mellon University.

Wichern, D. W. (1973). "The behaviour of the sample autocorrelation function for an integrated moving-average process," Biometrika, 60, 235-239.

Yajima, Y. (1993). "Asymptotic properties of estimates in incorrect ARMA models for long memory time series," in New Directions in Time Series Analysis, Part II, eds. Brillinger et al., 375-382.