Comput Stat (2014) 29:931–943 DOI 10.1007/s00180-013-0470-1 ORIGINAL PAPER

Second-order least-squares estimation for regression models with autocorrelated errors Dedi Rosadi · Shelton Peiris

Received: 21 December 2011 / Accepted: 25 November 2013 / Published online: 14 December 2013 © Springer-Verlag Berlin Heidelberg 2013

Abstract  In their recent paper, Wang and Leblanc (Ann Inst Stat Math 60:883–900, 2008) have shown that the second-order least squares estimator (SLSE) is more efficient than the ordinary least squares estimator (OLSE) when the errors are independent and identically distributed with nonzero third moments. In this paper, we generalize the theory of the SLSE to regression models with autocorrelated errors. Under certain regularity conditions, we establish the consistency and asymptotic normality of the proposed estimator and provide a simulation study to compare its performance with the corresponding OLSE and generalized least squares estimator (GLSE). It is shown that the SLSE performs well, giving relatively small standard error and bias (or mean square error) in estimating the parameters of such regression models with autocorrelated errors. Based on our study, we conjecture that for less correlated data the standard errors of the SLSE lie between those of the OLSE and GLSE, which can be interpreted as meaning that adding second-moment information can improve the performance of an estimator.

Keywords  Second-order least square · Asymptotic normality · Regression model · Autocorrelated errors · Ordinary least square · Generalized least square · Consistency

D. Rosadi (B), Department of Mathematics, Gadjah Mada University, Sekip Utara, Yogyakarta, Indonesia. e-mail: [email protected]
S. Peiris, School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia. e-mail: [email protected]

1 Introduction

Considerable attention has been paid in the literature to a class of models that is useful and applicable in many scientific endeavours, including economics, business,


engineering, environmental and health sciences: the class of regression models with autocorrelated errors. This regression model is given by

$$ y_i = f(X_i, \beta) + u_i, \quad i = 1, 2, \ldots, N, \tag{1} $$

where y_i ∈ R is the response variable, X_i ∈ R^k is the vector of fixed predictor variables (this assumption can be relaxed, and the results presented here also apply to stochastic regressors that are independent but not identically distributed), β is the vector of unknown regression parameters in the parameter space Ω ⊂ R^q, and u_i follows a covariance-stationary time series such that the covariances cov(u_i, u_{i+h}) depend only on the lag h and not on the position i (in time). Further, the regression response function f(X_i, β) is a fully specified linear or nonlinear function of β, and we assume E(f(X_i, β) u_i | X_i) = 0 for all β in Ω and all i. The variance–covariance matrix Λ_N of the disturbance vector u = (u_1, u_2, …, u_N)' consists of the elements γ_{ij} = cov(u_i, u_j) = γ(|i − j|) = γ(h), h = |i − j|, where γ(h) is the autocovariance function at lag h of the process {u_t, t ∈ Z}, i.e. γ(h) = cov(u_i, u_{i+h}), h = 0, ±1, ±2, …. As a special case of interest, this paper considers the case where the process {u_t, t ∈ Z} follows a causal, stationary autoregressive (AR) process of order p satisfying

$$ u_t = a_1 u_{t-1} + a_2 u_{t-2} + \cdots + a_p u_{t-p} + \upsilon_t, \tag{2} $$

where υ_t is a white noise sequence such that E(υ_t) = 0, E(υ_t²) = σ² and E(υ_t υ_{t'}) = 0 for all t ≠ t'. Further, υ_t is independent of X_t for all t. Given a sample of N observations (X_1, y_1), …, (X_N, y_N), we wish to estimate the unknown parameters β, a = (a_1, …, a_p)' and σ². In the case of a nonlinear regression model, an efficient estimator of β when Λ_N is known is the generalized least squares estimator (GLSE). Specifically, one estimates β by β̂_GLS, which minimizes the squared Mahalanobis length of the residual vector,

$$ Q(\beta) = (y - f(\beta))' \Lambda_N^{-1} (y - f(\beta)), $$

where y = (y_1, y_2, …, y_N)' and f(β) = (f(X_1, β), f(X_2, β), …, f(X_N, β))'. In the linear case, this yields the closed-form solution

$$ \hat{\beta}_{GLS} = (X' \Lambda_N^{-1} X)^{-1} X' \Lambda_N^{-1} y. $$
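For concreteness, with AR(1) errors the autocovariances are γ(h) = σ² a^h / (1 − a²), so Λ_N is a Toeplitz matrix. The following minimal numpy sketch (our own illustration, not code from the paper; the function name and arguments are hypothetical) evaluates the closed-form β̂_GLS for a linear model with a known AR(1) error structure:

```python
import numpy as np
from scipy.linalg import toeplitz

def gls_linear_ar1(X, y, a, sigma2):
    """Closed-form GLS for a linear model y = X beta + u with AR(1) errors,
    assuming the AR(1) coefficient `a` and innovation variance `sigma2` are known."""
    N = len(y)
    gamma = sigma2 * a ** np.arange(N) / (1.0 - a ** 2)   # gamma(h) = sigma^2 a^h / (1 - a^2)
    Lam = toeplitz(gamma)                                  # Lambda_N[i, j] = gamma(|i - j|)
    Li_X = np.linalg.solve(Lam, X)                         # Lambda_N^{-1} X
    Li_y = np.linalg.solve(Lam, y)                         # Lambda_N^{-1} y
    return np.linalg.solve(X.T @ Li_X, X.T @ Li_y)         # (X' Lam^{-1} X)^{-1} X' Lam^{-1} y
```

Solving the linear systems with np.linalg.solve avoids forming Λ_N^{-1} explicitly; for very long series one would instead exploit the banded structure of the AR(1) precision matrix.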

When Λ_N is not known, an obvious approach is to replace it by a consistent estimator in the formula above [see, for example, Gallant and Goebel (1976), Prais and Winsten (1954)] to evaluate β̂_GLS. When the errors follow a stationary AR(p)-type process,


the estimation involves p + 1 additional parameters. In this case Λ_N can be written in terms of those additional parameters, and one then minimizes the function Q(ψ), where the vector ψ contains all p + q + 1 parameters in models (1) and (2). As an alternative, simpler approach, one could ignore the autocorrelation structure of the errors and apply the ordinary least squares estimator (OLSE), which minimizes SSE(β) given by

$$ SSE(\beta) = \sum_{i=1}^{N} \bigl(y_i - E(y_i \mid X_i)\bigr)^2 = \sum_{i=1}^{N} \bigl(y_i - f(X_i, \beta)\bigr)^2, $$

where the corresponding estimator is denoted by β̂_OLS. There are numerous articles describing the efficiency of the estimator β̂_OLS (which ignores the serial correlation of the errors) relative to β̂_GLS, which takes the correlation into account; see, for example, Chipman (1979), Krämer (1980), Ullah et al. (1983), and Krämer and Marmol (2002). In the linear regression setting with AR(1) errors, Chipman (1979) gave the greatest lower bound for the efficiency of the OLS estimator of β relative to that of the GLS estimator. Further, Krämer (1980) proved that the efficiency (defined as the trace of the variance–covariance matrix) of OLS is as good as that of GLS when a_1 is close to one, provided the regression model includes only a constant term. In addition, Krämer and Marmol (2002) have shown that OLS and GLS are asymptotically equivalent in the linear regression model with AR(p) disturbances. Asymptotic properties of the Prais–Winsten estimator have been discussed in Ullah et al. (1983).

In order to improve the efficiency of the OLSE, which uses only the information contained in the first-order moment of the data, this paper suggests an approach based on the additional information given by the second moments, namely the second-order least squares estimator (SLSE). Thus, unlike the OLSE, the SLSE approach takes the autocorrelation structure of the errors into account. With that view in mind, the rest of this paper is organized as follows. In Sect. 2, we define the SLSE for the problem considered in this paper and provide the necessary details, together with the consistency and asymptotic normality of the estimator under certain regularity conditions. Section 3 presents a numerical illustration together with a Monte Carlo simulation study to support the properties of the suggested estimator. A conclusion and a summary of the significant results from this work are given in Sect. 4.

2 Second-order least squares estimation

The SLSE method was proposed by Wang (2003, 2004), who explored the theory of estimating parameters of regression models with independent and identically distributed (iid) measurement errors. Further, Abarin and Wang (2006) and Wang and Leblanc (2008) extended this approach to estimate nonlinear models with homoscedastic errors. Abarin (2008) generalized the SLSE method to the heteroscedastic (non-autocorrelated) case by using the quasi-likelihood variance function (QVF) approach. Abarin and Wang (2009) used the SLSE to estimate censored


regression models, while Li (2011) and Li and Wang (2011) applied the SLSE to estimate linear mixed models. Chen et al. (2011) proposed a method to improve the robustness of the SLSE. Kim and Ma (2011) generalized the SLSE method to a general framework of semiparametric approaches.

2.1 The SLSE

Let ψ = (β', a', σ²)' be the vector of parameters in the space Γ = Ω × Θ × Σ ⊂ R^{q+p+1} and let ψ_0 = (β_0', a_0', σ_0²)' be the vector of true parameters. Following Wang and Leblanc (2008), the SLSE ψ̂_SLSE of ψ is defined as the measurable function that minimizes

$$ Q_N(\psi) = \sum_{i=1}^{N} \rho_i'(\psi) W_i \rho_i(\psi), $$

where ρ_i(ψ) = [y_i − E(y_i | X_i), y_i² − E(y_i² | X_i)]' and W_i = W(X_i) is a suitably chosen 2 × 2 nonnegative definite matrix which may depend on X_i. Under model (1) with errors from the stationary AR(p) process given in (2), we obtain E(y_i | X_i) = f(X_i, β) and E(y_i² | X_i) = f²(X_i, β) + γ(0), where γ(h) is the autocovariance function at lag h of {u_t, t ∈ Z} and Cov(X_t, u_t) = 0 for all t. We establish the consistency and asymptotic normality of the proposed SLSE, ψ̂_SLSE, under the regularity conditions given in Sect. 2.2 below.
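To make the objective concrete, the following sketch (our own illustration, assuming the linear response f(X_i, β) = X_i'β, AR(1) errors so that γ(0) = σ²/(1 − a²), and identity weights W_i = I_2; all names are hypothetical) evaluates Q_N(ψ) in a form that can be passed to a generic optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def q_n_identity(psi, X, y):
    """Q_N(psi) = sum_i rho_i(psi)' rho_i(psi), i.e. W_i = I_2, assuming a linear
    response f(X_i, beta) = X_i' beta and AR(1) errors; psi = (beta, a, sigma2)."""
    beta, a, sigma2 = psi[:-2], psi[-2], psi[-1]
    if not (-1.0 < a < 1.0) or sigma2 <= 0.0:
        return np.inf                          # keep the search in the stationary region
    gamma0 = sigma2 / (1.0 - a ** 2)           # gamma(0) of an AR(1) error process
    m1 = X @ beta                              # E(y_i | X_i)
    m2 = m1 ** 2 + gamma0                      # E(y_i^2 | X_i)
    rho = np.column_stack([y - m1, y ** 2 - m2])
    return float(np.sum(rho ** 2))

# First-stage SLSE with identity weights (X, y and the starting value are hypothetical):
# psi0 = np.r_[np.zeros(X.shape[1]), 0.1, 1.0]
# psi_hat = minimize(q_n_identity, psi0, args=(X, y), method="Nelder-Mead").x
```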


2.2 Asymptotic properties of SLSE

The following assumptions are required to establish the consistency and asymptotic normality of the proposed estimator ψ̂_SLSE.

Assumption 1  f(x, β) is a measurable function of x for every β ∈ Ω and is continuous in β on Ω, a.s.-P.

Assumption 2  For some δ > 0, E[ ||W_i||^{1+δ} (sup_Ω |f(X_i, β)|^{4+δ} + 1) ] ≤ B_1 < ∞, where ||·|| denotes the Euclidean norm given by ||M|| = √(trace(M M')).

Assumption 3  The parameter space Γ ⊂ R^{q+p+1} is compact and ψ_0 is an interior point of Γ.

Assumption 4  For any ψ ∈ Γ, E[(ρ_i(ψ) − ρ_i(ψ_0))' W_i (ρ_i(ψ) − ρ_i(ψ_0))] = 0 if and only if ψ = ψ_0.

Assumption 5  f(x, β) is a twice continuously differentiable function in Ω. Furthermore, for some δ > 0, the first two derivatives satisfy

$$ E\left[ \|W(X)\|^{1+\delta} \sup_{\Omega} \left\| \frac{\partial f(X, \beta)}{\partial \beta} \right\|^{4+\delta} \right] \le B_2 < \infty \quad \text{and} \quad E\left[ \|W(X)\|^{1+\delta} \sup_{\Omega} \left\| \frac{\partial^2 f(X, \beta)}{\partial \beta \, \partial \beta'} \right\|^{4+\delta} \right] \le B_3 < \infty. $$

Assumption 6  The matrix

$$ B = \frac{1}{N} \sum_{i=1}^{N} E\left[ \frac{\partial \rho_i'(\psi_0)}{\partial \psi} \, W_i \, \frac{\partial \rho_i(\psi_0)}{\partial \psi'} \right] $$

is nonsingular.

Notes  Assumption 1 ensures that the objective function Q_N(ψ) is a continuous function of ψ. Assumptions 2 and 3 are needed to guarantee that Q_N(ψ) converges. Assumption 4 (the identification condition) guarantees that Q(ψ) attains a unique minimum at the true parameter value ψ_0 ∈ Γ. Assumption 5 ensures the uniform convergence of the derivative of Q_N(ψ). Assumption 6 is necessary for the existence of the variance of ψ̂_N.

We now state the following theorems, which establish the consistency and asymptotic normality of the proposed SLSE.

Theorem 1  Under Assumptions 1–4, ψ̂_SLSE → ψ_0 a.s. as N → ∞.

Theorem 2  Under Assumptions 1–6, as N → ∞, √N (ψ̂_SLSE − ψ_0) →_L N(0, B⁻¹ C B⁻¹), where

$$ B = \frac{1}{N} \sum_{i=1}^{N} E\left[ \frac{\partial \rho_i'(\psi_0)}{\partial \psi} \, W_i \, \frac{\partial \rho_i(\psi_0)}{\partial \psi'} \right] $$

and

$$ \begin{aligned} C &= \frac{1}{N} \sum_{i=1}^{N} E\left[ \frac{\partial \rho_i'(\psi_0)}{\partial \psi} \, W_i \, \rho_i(\psi_0) \rho_i'(\psi_0) \, W_i \, \frac{\partial \rho_i(\psi_0)}{\partial \psi'} \right] \\ &\quad + \frac{1}{N} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E\left[ \frac{\partial \rho_j'(\psi_0)}{\partial \psi} \, W_j \, \rho_j(\psi_0) \rho_{j-i}'(\psi_0) \, W_{j-i} \, \frac{\partial \rho_{j-i}(\psi_0)}{\partial \psi'} \right] \\ &\quad + \frac{1}{N} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E\left[ \frac{\partial \rho_{j-i}'(\psi_0)}{\partial \psi} \, W_{j-i} \, \rho_{j-i}(\psi_0) \rho_j'(\psi_0) \, W_j \, \frac{\partial \rho_j(\psi_0)}{\partial \psi'} \right]. \end{aligned} \tag{3} $$

Proofs of the above two theorems are given in the Appendices. The following subsection considers a numerical approach to illustrate the above theorems.


2.3 A numerical illustration

The solution ψ̂_SLSE can be obtained by minimizing the function Q_N(ψ) using a suitable numerical method such as the Newton–Raphson algorithm. Alternatively, one could apply other methods that use the gradient and/or second derivative of Q_N(ψ), or the Nelder–Mead simplex method, to minimize Q_N(ψ). In theory, any form of W_i satisfying the regularity conditions could be used to find the solution. Wang and Leblanc (2008) proposed the choice of W_i that gives the minimum asymptotic variance–covariance matrix of ψ̂_N, i.e. the most efficient (optimal) estimator, in the iid case. A comparable matrix is more difficult to find in the autocorrelated case; for computational purposes, therefore, this paper uses the most efficient W_i of the iid case in computing the estimator. Wang and Leblanc (2008) and Abarin and Wang (2006) show that in the iid case the most efficient SLSE is obtained by setting W_i = U_i^{-1}, with U_i = E(ρ_i(ψ_0) ρ_i'(ψ_0) | X_i). A direct calculation shows that in the AR(p) errors case

$$ \begin{aligned} W_{opt}(X) = (U(X))^{-1} &= \frac{1}{\det(U(X))} \, E\left[ \begin{pmatrix} (u^2 + 2u f(X, \beta_0) - \gamma(0))^2 & -u(u^2 + 2u f(X, \beta_0) - \gamma(0)) \\ -u(u^2 + 2u f(X, \beta_0) - \gamma(0)) & u^2 \end{pmatrix} \,\middle|\, X \right] \\ &= \frac{1}{\det(U(X))} \begin{pmatrix} \mu_4 + 4\mu_3 f(X, \beta_0) + 4\gamma(0) f^2(X, \beta_0) - \gamma^2(0) & -\mu_3 - 2 f(X, \beta_0)\gamma(0) \\ -\mu_3 - 2 f(X, \beta_0)\gamma(0) & \gamma(0) \end{pmatrix}, \end{aligned} \tag{4} $$

where μ_3 = E(u³ | X) and μ_4 = E(u⁴ | X). The calculation of W_opt involves unknown parameters, which have to be estimated first. Wang and Leblanc (2008) proposed the following two-stage procedure. First, Q_N(ψ) is minimized using W = I_2 to obtain the first-stage estimator ψ̂_1. Secondly, the elements of U^{-1} are estimated using ψ̂_1 and the corresponding moments of the errors computed from the residuals, and Q_N(ψ) is minimized again with W_i = Û_i^{-1} to obtain the second-stage estimator ψ̂_2. In general, U can be estimated using any suitable method. For example, a method based on sample means gives

$$ \hat{W}_{opt}(X) = \frac{1}{\det(\hat{U})} \begin{pmatrix} \frac{1}{N}\sum_{i=1}^{N} \bigl(r_i^2 + 2 r_i f(X_i, \hat{\beta}) - \hat{\gamma}(0)\bigr)^2 & -\frac{1}{N}\sum_{i=1}^{N} r_i \bigl(r_i^2 + 2 r_i f(X_i, \hat{\beta}) - \hat{\gamma}(0)\bigr) \\ -\frac{1}{N}\sum_{i=1}^{N} r_i \bigl(r_i^2 + 2 r_i f(X_i, \hat{\beta}) - \hat{\gamma}(0)\bigr) & \frac{1}{N}\sum_{i=1}^{N} r_i^2 \end{pmatrix}, $$

where r_i = y_i − f(X_i, β̂) denotes the residuals obtained at each stage. Section 3 considers a simulation study to show that the SLSE performs well in finite samples when the errors follow a first-order autoregressive model.
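Before turning to the simulations, here is a minimal sketch of the two-stage computation just described (our own illustration, assuming a linear f(X_i, β) = X_i'β and AR(1) errors; μ_3, μ_4 and γ(0) are estimated from the stage-one residuals and plugged into (4) observation by observation, and the function names are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

def q_n(psi, X, y, W):
    """Q_N(psi) = sum_i rho_i(psi)' W_i rho_i(psi) for linear f and AR(1) errors.
    W has shape (N, 2, 2); psi = (beta_1, ..., beta_q, a, sigma2)."""
    beta, a, sigma2 = psi[:-2], psi[-2], psi[-1]
    if not (-1.0 < a < 1.0) or sigma2 <= 0.0:
        return np.inf                                    # stay in the stationary region
    gamma0 = sigma2 / (1.0 - a ** 2)                     # gamma(0) of the AR(1) error process
    m1 = X @ beta                                        # E(y_i | X_i)
    rho = np.column_stack([y - m1, y ** 2 - (m1 ** 2 + gamma0)])
    return float(np.einsum("ni,nij,nj->", rho, W, rho))

def two_stage_slse(X, y, psi0):
    N, q = X.shape
    I2 = np.tile(np.eye(2), (N, 1, 1))
    # Stage 1: minimize Q_N with identity weights W_i = I_2
    psi1 = minimize(q_n, psi0, args=(X, y, I2), method="Nelder-Mead",
                    options={"maxiter": 50000}).x
    # Stage 2: estimate mu_3, mu_4, gamma(0) from the stage-1 residuals, plug them
    # into (4) to form W_i = U(X_i)^{-1}, and minimize Q_N again
    f1 = X @ psi1[:q]
    r = y - f1                                           # stage-1 residuals
    g0, mu3, mu4 = np.mean(r ** 2), np.mean(r ** 3), np.mean(r ** 4)
    W = np.empty((N, 2, 2))
    for i in range(N):
        U = np.array([[g0, mu3 + 2.0 * g0 * f1[i]],
                      [mu3 + 2.0 * g0 * f1[i],
                       mu4 - g0 ** 2 + 4.0 * mu3 * f1[i] + 4.0 * g0 * f1[i] ** 2]])
        W[i] = np.linalg.inv(U)                          # W_opt(X_i) = U(X_i)^{-1}
    return minimize(q_n, psi1, args=(X, y, W), method="Nelder-Mead",
                    options={"maxiter": 50000}).x
```

The stage-one estimate also serves as the starting value for the second minimization.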


3 A Monte Carlo study

This section investigates the finite-sample behaviour of the SLSE and compares it with the corresponding OLSE and GLSE through a simulation study. To illustrate the procedure, we consider a linear regression model with errors from an AR(1) process satisfying y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + u_i, where u_t = a u_{t−1} + v_t and |a| < 1 (below we also write ρ for the AR(1) coefficient a, as in Tables 1 and 2). For this model we have

$$ E(u_i \mid X_i) = 0, \qquad E(u_i^2 \mid X_i) = \frac{\sigma^2}{1 - \rho^2}, \qquad E(u_i u_j \mid X_i) = \frac{\sigma^2 \rho^{|i-j|}}{1 - \rho^2}, $$

and

$$ E(y_i \mid X_i) = X_i' \beta, \qquad E(y_i y_j \mid X_i) = (X_i' \beta)(X_j' \beta) + \frac{\sigma^2 \rho^{|i-j|}}{1 - \rho^2}. $$

The true vector of regression parameters is β = (β_0, β_1, β_2)' = (25, 2, −3)'. The observation vector (X_{i1}, X_{i2})' is assumed to follow a multivariate normal distribution N_2(μ, Σ) with mean vector μ = (5, 8)' and covariance matrix Σ = diag(4, 9). The true values for a are a = 0.2, 0.4, 0.6, 0.8, and v_t = (χ²(5) − 5)/√10 follows a normalized χ²(5) distribution such that var(v_t) = σ² = 1. The sample sizes considered in this study are n = 50 and n = 100. This particular choice of parameters has also been used by Chen et al. (2011) to study the robust SLSE. In our simulation study we conduct s = 500 simulation runs and compute the bias and the standard error of ψ̂ under the OLSE, GLSE and SLSE. Suppose θ̂_i^{(j)} is an estimate of the parameter θ_i at the jth simulation run, j = 1, 2, …, s. Then we compute the following for each parameter θ_i:

– Mean

$$ \bar{\hat{\theta}}_i = \frac{1}{s} \sum_{j=1}^{s} \hat{\theta}_i^{(j)} $$

– Estimated average bias

$$ \mathrm{bias}(\hat{\theta}_i) = \frac{1}{s} \sum_{j=1}^{s} \bigl(\hat{\theta}_i^{(j)} - \theta_i\bigr) = \bar{\hat{\theta}}_i - \theta_i $$

– Estimated standard error

$$ \mathrm{SE}(\hat{\theta}_i) = \left( \frac{1}{s-1} \sum_{j=1}^{s} \bigl(\hat{\theta}_i^{(j)} - \bar{\hat{\theta}}_i\bigr)^2 \right)^{1/2} $$
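As an illustration of this design, the following sketch (our own code, not the authors' implementation) generates replications with normalized χ²(5) innovations and AR(1) errors, fits the OLSE, and reports the estimated bias and standard error; the fitting step can be replaced by the GLSE or the two-stage SLSE sketched in Sect. 2.3:

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = np.array([25.0, 2.0, -3.0])
mu, Sigma = np.array([5.0, 8.0]), np.diag([4.0, 9.0])

def simulate(n, a):
    """One sample from the design: AR(1) errors driven by normalized chi^2(5) innovations."""
    Xr = rng.multivariate_normal(mu, Sigma, size=n)        # (X_i1, X_i2)
    v = (rng.chisquare(5, size=n) - 5.0) / np.sqrt(10.0)   # E(v_t) = 0, var(v_t) = 1
    u = np.empty(n)
    u[0] = v[0] / np.sqrt(1.0 - a ** 2)                    # scaled to the stationary variance (approximate start)
    for t in range(1, n):
        u[t] = a * u[t - 1] + v[t]
    X = np.column_stack([np.ones(n), Xr])                  # design matrix with intercept
    return X, X @ beta_true + u

def mc_study(n=50, a=0.6, s=500):
    est = np.empty((s, 3))
    for j in range(s):
        X, y = simulate(n, a)
        est[j] = np.linalg.lstsq(X, y, rcond=None)[0]      # OLSE; swap in the GLSE or SLSE here
    return est.mean(axis=0) - beta_true, est.std(axis=0, ddof=1)   # (bias, SE)

print(mc_study())
```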

The corresponding simulation results are reported in Tables 1 and 2. These results show that the SLSE performs well in estimating all parameters of the model and gives small bias. For small to medium values of a (0 ≤ a ≤ 0.6), the standard errors of the SLSE for β_0, β_1 and β_2 lie between those of the OLSE and GLSE, which can be interpreted as meaning that adding the second-moment information to OLS can improve the performance of the estimator in the case of autocorrelated errors. For large values of a the data become more volatile and the SLSE is then less robust than the OLSE. The SEs of the GLSE are the smallest, which is consistent with the theoretical result (the Gauss–Markov–Aitken theorem for the linear regression model).

Table 1  Simulation results, n = 50

| True value | Parameter | OLSE Bias | OLSE SE | SLSE Bias | SLSE SE | GLSE Bias | GLSE SE |
|------------|-----------|-----------|---------|-----------|---------|-----------|---------|
| 25  | β0 | −0.04557 | 0.58021 | −0.03917 | 0.56475 | −0.03916 | 0.56474 |
| 2   | β1 | 0.00199  | 0.07849 | 0.00052  | 0.07564 | 0.00051  | 0.07562 |
| −3  | β2 | 0.00304  | 0.05030 | 0.00318  | 0.04901 | 0.00320  | 0.04904 |
| 0.2 | ρ  | −0.04040 | 0.12495 | −0.02884 | 0.13375 | −0.02884 | 0.13375 |
| 1   | σ² | −0.05311 | 0.28251 | −0.05037 | 0.28334 | −0.05037 | 0.28334 |
| 25  | β0 | −0.16998 | 2.30815 | −0.08476 | 1.67191 | −0.08535 | 1.67165 |
| 2   | β1 | −0.01890 | 0.19479 | −0.00855 | 0.14736 | −0.01008 | 0.14415 |
| −3  | β2 | 0.02510  | 0.27671 | 0.01539  | 0.19988 | 0.01425  | 0.19812 |
| 0.4 | ρ  | −0.06389 | 0.12606 | −0.03857 | 0.13287 | −0.03894 | 0.13188 |
| 1   | σ² | −0.00437 | 0.31009 | 0.00854  | 0.31048 | 0.00844  | 0.31021 |
| 25  | β0 | −0.05940 | 0.71123 | −0.02884 | 0.58101 | −0.03091 | 0.57806 |
| 2   | β1 | 0.00548  | 0.08347 | 0.00200  | 0.06687 | 0.00119  | 0.06526 |
| −3  | β2 | 0.00143  | 0.05789 | −0.00277 | 0.06465 | 0.00011  | 0.04179 |
| 0.6 | ρ  | −0.09134 | 0.12620 | −0.06035 | 0.12611 | −0.06105 | 0.12481 |
| 1   | σ² | 0.01040  | 0.31521 | 0.02710  | 0.32401 | 0.02685  | 0.32391 |
| 25  | β0 | 0.03921  | 1.07482 | 0.03933  | 1.07470 | −0.02160 | 0.78781 |
| 2   | β1 | 0.00634  | 0.10816 | 0.00703  | 0.10940 | 0.00269  | 0.05722 |
| −3  | β2 | −0.00990 | 0.07055 | −0.00803 | 0.08240 | −0.00065 | 0.03973 |
| 0.8 | ρ  | −0.03500 | 0.04433 | −0.03492 | 0.04464 | −0.01120 | 0.04573 |
| 1   | σ² | 0.13517  | 0.36275 | 0.13518  | 0.36274 | 0.15835  | 0.37500 |


Table 2  Simulation results, n = 100

| True value | Parameter | OLSE Bias | OLSE SE | SLSE Bias | SLSE SE | GLSE Bias | GLSE SE |
|------------|-----------|-----------|---------|-----------|---------|-----------|---------|
| 25  | β0 | 0.022251  | 0.399331 | 0.020763  | 0.385956 | 0.020763  | 0.385955 |
| 2   | β1 | 0.000149  | 0.052535 | −0.000055 | 0.049943 | −0.000051 | 0.049938 |
| −3  | β2 | −0.002324 | 0.033754 | −0.002079 | 0.032925 | −0.002082 | 0.032935 |
| 0.2 | ρ  | −0.025597 | 0.097289 | −0.019103 | 0.100738 | −0.019103 | 0.100739 |
| 1   | σ² | −0.022733 | 0.195856 | −0.021296 | 0.196248 | −0.021296 | 0.196248 |
| 25  | β0 | −0.034181 | 0.434496 | −0.028483 | 0.392330 | −0.028482 | 0.392327 |
| 2   | β1 | 0.003453  | 0.054821 | 0.002644  | 0.046731 | 0.002645  | 0.046734 |
| −3  | β2 | 0.000398  | 0.036123 | 0.000169  | 0.032599 | 0.000169  | 0.032602 |
| 0.4 | ρ  | −0.029533 | 0.087968 | −0.017121 | 0.089937 | −0.017121 | 0.089937 |
| 1   | σ² | −0.012052 | 0.218836 | −0.007288 | 0.220121 | −0.007288 | 0.220120 |
| 25  | β0 | −0.010551 | 0.544125 | −0.008302 | 0.426951 | −0.008670 | 0.426403 |
| 2   | β1 | 0.000264  | 0.063288 | −0.000753 | 0.049277 | −0.001776 | 0.044151 |
| −3  | β2 | 0.001596  | 0.040270 | 0.004507  | 0.046846 | 0.002713  | 0.027569 |
| 0.6 | ρ  | −0.044281 | 0.083945 | −0.028193 | 0.085033 | −0.028663 | 0.083764 |
| 1   | σ² | 0.020653  | 0.225843 | 0.030235  | 0.230371 | 0.030088  | 0.230230 |
| 25  | β0 | −0.074765 | 0.785222 | −0.074756 | 0.785230 | −0.051382 | 0.543196 |
| 2   | β1 | 0.009274  | 0.078726 | 0.009792  | 0.079047 | 0.000420  | 0.039020 |
| −3  | β2 | −0.003521 | 0.054461 | −0.001325 | 0.068416 | −0.000956 | 0.026126 |
| 0.8 | ρ  | −0.059670 | 0.070302 | −0.059591 | 0.070501 | −0.041803 | 0.068644 |
| 1   | σ² | 0.059794  | 0.228647 | 0.059803  | 0.228645 | 0.074950  | 0.236140 |

4 Conclusion

This paper extends the study of Wang and Leblanc (2008) on the SLSE to regression models with autocorrelated errors. We have established the consistency and asymptotic normality of the proposed SLSE. The performance of the SLSE has been studied and compared with the corresponding OLSE and GLSE. Based on a simulation study using the special case of linear regression with AR(1) errors, we have provided results illustrating the merits of the SLSE. In particular, we have shown that the SLSE performs well in estimating the parameters of such regression models, giving small bias. In addition:

– for less correlated data, the SLSE can improve the performance over the OLSE;
– for highly correlated data, the data become more volatile and the SLSE is less robust than the OLSE.

These results indicate that for highly autocorrelated errors in regression, the SLSE method needs to be robustified. This topic is the subject of future research.


Acknowledgments  The financial support from DIKTI Indonesia via the Program Academic Recharging (PAR) greatly helped D. Rosadi to initiate this project in 2011. The financial support from Hibah Kompetensi in 2012 and 2013 is also gratefully acknowledged. This work was completed while D. Rosadi was visiting the School of Mathematics and Statistics, The University of Sydney, in 2011 and 2012. The authors would like to thank the anonymous referee and the editor of this journal for their constructive comments and useful suggestions, which improved the quality and readability of this manuscript.

Appendices

Appendix 1: Proof of Theorem 1

We show that the assumptions of the theorem fulfil the conditions of Lemma 3 of Amemiya (1973). Assumption 1 implies that Q_N(ψ) is measurable and continuous in ψ ∈ Γ with probability one. Using Assumptions 2 and 3 and the Cauchy–Schwarz inequality, we have

$$ \|W_i\| \sup_{\Omega} (Y_i - f(X_i, \beta))^2 \le 2\|W_i\|^{1+\delta} Y_i^{2+\delta} + 2\|W_i\|^{1+\delta} \sup_{\Omega} |f(X_i, \beta)|^{2+\delta} $$

and

$$ \|W_i\| \sup_{\Gamma} \bigl(Y_i^2 - f^2(X_i, \beta) - \gamma(0)\bigr)^2 \le 3\|W_i\|^{1+\delta} Y_i^{4+\delta} + 3\|W_i\|^{1+\delta} \sup_{\Omega} |f(X_i, \beta)|^{4+\delta} + 3\|W_i\|^{1+\delta} \sup_{\Sigma} (\gamma(0))^{2+\delta}, $$

which imply sup_Γ ρ_i'(ψ) W_i ρ_i(ψ) ≤ ||W_i|| sup_Γ ||ρ_i(ψ)||², where E[ ||W_i||^{1+δ} sup_Γ ||ρ_i(ψ)||^{2+δ} ] < ∞ for some δ > 0. It follows from the uniform strong law of large numbers of Domowitz and White (1984), Theorem 2.3, that N⁻¹ Q_N(ψ) and Q(ψ) = N⁻¹ Σ_{i=1}^N E(ρ_i'(ψ) W_i ρ_i(ψ)) converge almost surely and uniformly for all ψ in Γ to the same limit, say Q̄(ψ). Furthermore, since ρ_i(ψ) − ρ_i(ψ_0) does not depend on Y_i, we have

$$ E\bigl[(\rho_i(\psi) - \rho_i(\psi_0))' W_i \rho_i(\psi_0)\bigr] = E\bigl[(\rho_i(\psi) - \rho_i(\psi_0))' W_i E(\rho_i(\psi_0) \mid X_i)\bigr] = 0 $$

(the last equality holds because E(ρ_i(ψ_0) | X_i) = 0), which implies Q(ψ) = Q(ψ_0) + N⁻¹ Σ_{i=1}^N E[(ρ_i(ψ) − ρ_i(ψ_0))' W_i (ρ_i(ψ) − ρ_i(ψ_0))]. It follows that Q(ψ) ≥ Q(ψ_0) and, by Assumption 4, equality holds if and only if ψ = ψ_0. Thus, applying Lemma 3 of Amemiya (1973), we have ψ̂_SLSE → ψ_0 a.s. as N → ∞.

Appendix 2: Proof of Theorem 2

The first derivative ∂Q_N(ψ)/∂ψ exists by Assumption 5 and has a first-order Taylor expansion in Γ. Since ψ̂_SLSE → ψ_0 a.s., for sufficiently large N we have


$$ \frac{\partial Q_N(\hat{\psi}_{SLSE})}{\partial \psi} = \frac{\partial Q_N(\psi_0)}{\partial \psi} + \frac{\partial^2 Q_N(\tilde{\psi}_N)}{\partial \psi \, \partial \psi'} (\hat{\psi}_{SLSE} - \psi_0) = 0, \tag{5} $$

where ||ψ̃_N − ψ_0|| ≤ ||ψ̂_SLSE − ψ_0||. The first derivative of Q_N(ψ) in Eq. (5) is given by

$$ \frac{\partial Q_N(\psi)}{\partial \psi} = 2 \sum_{i=1}^{N} \frac{\partial \rho_i'(\psi)}{\partial \psi} \, W_i \, \rho_i(\psi), $$

where

$$ \frac{\partial \rho_i'(\psi)}{\partial \psi} = - \begin{pmatrix} \dfrac{\partial f(X_i, \beta)}{\partial \beta} & 2 f(X_i, \beta) \dfrac{\partial f(X_i, \beta)}{\partial \beta} \\ 0 & \dfrac{\partial \gamma(0)}{\partial a} \\ 0 & \dfrac{\partial \gamma(0)}{\partial \sigma^2} \end{pmatrix}. \tag{6} $$

The second derivative of Q_N(ψ) in Eq. (5) is given by

$$ \frac{\partial^2 Q_N(\psi)}{\partial \psi \, \partial \psi'} = 2 \sum_{i=1}^{N} \left[ \frac{\partial \rho_i'(\psi)}{\partial \psi} \, W_i \, \frac{\partial \rho_i(\psi)}{\partial \psi'} + \bigl(\rho_i'(\psi) W_i \otimes I_{q+p+1}\bigr) \frac{\partial \mathrm{vec}(\partial \rho_i'(\psi)/\partial \psi)}{\partial \psi'} \right], $$

where

$$ \frac{\partial \mathrm{vec}(\partial \rho_i'(\psi)/\partial \psi)}{\partial \psi'} = - \begin{pmatrix} \dfrac{\partial^2 f(X_i, \beta)}{\partial \beta \, \partial \beta'} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 2 f(X_i, \beta) \dfrac{\partial^2 f(X_i, \beta)}{\partial \beta \, \partial \beta'} + 2 \dfrac{\partial f(X_i, \beta)}{\partial \beta} \dfrac{\partial f(X_i, \beta)}{\partial \beta'} & 0 & 0 \\ 0 & \dfrac{\partial^2 \gamma(0)}{\partial a \, \partial a'} & \dfrac{\partial^2 \gamma(0)}{\partial a \, \partial \sigma^2} \\ 0 & \dfrac{\partial^2 \gamma(0)}{\partial \sigma^2 \, \partial a'} & \dfrac{\partial^2 \gamma(0)}{\partial (\sigma^2)^2} \end{pmatrix}. \tag{7} $$

Here we obtain

$$ \sup_{\Gamma} \left\| \frac{\partial \rho_i'(\psi)}{\partial \psi} \, W_i \, \frac{\partial \rho_i(\psi)}{\partial \psi'} \right\| \le \|W_i\| \sup_{\Gamma} \left\| \frac{\partial \rho_i(\psi)}{\partial \psi'} \right\|^2 . $$

Using Assumption 5, the Cauchy–Schwarz inequality, Eq. (6) and arguments similar to those in the proof of Theorem 2 of Wang and Leblanc (2008) and Theorem 1 above, we can show that

$$ \|W_i\| \sup_{\Gamma} \left\| \frac{\partial \rho_i(\psi)}{\partial \psi'} \right\|^2 \le \|W_i\|^{1+\delta} \sup_{\Gamma} \left\| \frac{\partial \rho_i(\psi)}{\partial \psi'} \right\|^{2+\delta}, $$

where E[ ||W_i||^{1+δ} sup_Γ ||∂ρ_i(ψ)/∂ψ'||^{2+δ} ] < ∞ for some δ > 0. We also have

$$ \begin{aligned} \sup_{\Gamma} \left\| \bigl(\rho_i'(\psi) W_i \otimes I_{q+p+1}\bigr) \frac{\partial \mathrm{vec}(\partial \rho_i'(\psi)/\partial \psi)}{\partial \psi'} \right\| &\le (q+p+1) \, \|W_i\| \sup_{\Gamma} \|\rho_i(\psi)\| \left\| \frac{\partial \mathrm{vec}(\partial \rho_i'(\psi)/\partial \psi)}{\partial \psi'} \right\| \\ &\le (q+p+1) \left[ \|W_i\| \sup_{\Gamma} \|\rho_i(\psi)\|^2 \left( \|W_i\| \sup_{\Gamma} \left\| \frac{\partial \mathrm{vec}(\partial \rho_i'(\psi)/\partial \psi)}{\partial \psi'} \right\|^2 \right) \right]^{1/2}. \end{aligned} $$

Using Assumption 5, the Cauchy–Schwarz inequality, Eq. (7) and a similar argument as in the proof of Theorem 2 of Wang and Leblanc (2008) and Theorem 1 above, we can show that, for some δ > 0,

$$ \|W_i\| \sup_{\Gamma} \left\| \frac{\partial \mathrm{vec}(\partial \rho_i'(\psi)/\partial \psi)}{\partial \psi'} \right\|^2 \le \|W_i\|^{1+\delta} \sup_{\Gamma} \left\| \frac{\partial \mathrm{vec}(\partial \rho_i'(\psi)/\partial \psi)}{\partial \psi'} \right\|^{2+\delta}, $$

where E[ ||W_i||^{1+δ} sup_Γ ||∂vec(∂ρ_i'(ψ)/∂ψ)/∂ψ'||^{2+δ} ] < ∞.