ISSN 1440-771X

Department of Econometrics and Business Statistics, Australia
http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/

Orthogonal Series Estimation in Nonlinear Cointegrating Models with Endogeneity

Biqing Cai, Chaohua Dong, Jiti Gao

September 2015

Working Paper 18/15

Orthogonal Series Estimation in Nonlinear Cointegrating Models with Endogeneity

Biqing Cai†‡, Chaohua Dong†§ and Jiti Gao†

‡ University of Bergen, Norway; † Monash University, Australia; § Southwestern University of Finance and Economics, China

September 30, 2015

Abstract

This paper considers a nonlinear time series model associated with both nonstationarity and endogeneity. The proposed model is then estimated by a nonparametric series method. An asymptotic theory is established for the estimator in both the point-wise and the space-metric sense. Monte Carlo simulation results show that the performance of the proposed estimator is numerically satisfactory.

Key words: Cointegration, endogeneity, Hermite functions, series estimator, unit root

JEL Classification Numbers: C14; C22; G17.



Corresponding author: Jiti Gao, Department of Econometrics and Business Statistics, Monash University, Caulfield East, Victoria 3145, Australia. Email: [email protected].

1 Introduction

Since Engle and Granger (1987), the concept of cointegration has become popular in economics because cointegration relationships are often used to describe economic variables that share common stochastic trends or have long-run equilibrium relationships. Implicit in the definition, however, is the idea that every small deviation from the long-run equilibrium leads instantaneously to an error correction mechanism. As argued by Balke and Fomby (1997), the presence of fixed costs of adjustment may prevent economic agents from adjusting continuously, so the movement towards the long-run equilibrium need not occur in every period, and linear cointegration may fail. There is also a consensus in econometrics that nonlinearity is now the norm rather than the exception (as discussed in Granger 1995; Gao 2007; Teräsvirta et al. 2010, for example), and misspecifying a linear cointegration model may lead to a failure to find cointegration.

Nonlinear cointegration models have therefore attracted considerable recent attention in econometrics. Park and Phillips (1999) discuss asymptotics for nonlinear transformations of a unit root process, and Park and Phillips (2001) for nonlinear regression with a unit root process. Furthermore, asymptotic properties of nonparametric estimators of nonlinear cointegration models have been derived by Wang and Phillips (2009a,b). Meanwhile, Karlsen and Tjøstheim (2001) and Karlsen et al. (2007) also derive limit theory for nonparametric estimation of nonlinear cointegration based on different assumptions on the data generating process and different mathematical techniques. Chen et al. (2012) consider estimation issues in a partially linear model with nonstationary regressors. Gao and Phillips (2013) consider semiparametric estimation in triangular systems of equations with nonstationarity and endogeneity.

In addition to the kernel-based estimation proposed in the literature, the series method is another commonly used estimation method. When the data are either independent and identically distributed or stationary, estimation theory based on series methods has been discussed in Andrews (1991), Newey (1997), Chen and Shen (1998) and Gao (2007), for example. However, as far as we know, when the data are assumed to be unit root nonstationary, there are only a couple of studies based on series estimation. Dong and Gao (2013, 2014) were among the first to consider series expansions for nonstationary data; Dong and Gao (2013) discuss a series expansion for Lévy processes, which can be viewed as an orthogonal series expansion based on time-varying probability densities. By contrast, we propose using a Hermite series expansion, which is orthogonal with respect to the Lebesgue measure, without specifying the distribution of the innovations to the unit root process. We thus allow for much more general data generating assumptions. It is well known that series estimation has some advantages over kernel-based estimation: it is easy to impose certain types of restrictions, such as additive separability, and it is computationally convenient.

In this paper, we consider a class of integrable regression models and propose a Hermite series estimation method for such a class of cointegration models where the time series


regressor is nonstationary and endogenous with respect to the error process. Without necessarily using an instrumental variable approach, we show that the proposed nonparametric series estimator is still asymptotically consistent and normally distributed under this type of endogeneity. The nonparametric series approach under endogeneity complements the existing kernel-based method of Wang and Phillips (2014). It should be pointed out that while similar asymptotic results, such as Theorem 3.1 and Corollary 3.1 listed in Section 3.2 below, may be obtained by either the kernel or the series method, both the formulations and the proofs of the asymptotic results are quite different. It should also be pointed out that while the class of integrable models may be restrictive, such integrable models have their own empirical applications for appropriately balancing the relationship between a stationary time series on the left-hand side and a highly nonstationary regressor on the right-hand side (see, for example, Marmer 2008). Meanwhile, we establish an asymptotic distributional theory for a matrix of partial sums of nonlinear nonstationary time series in Theorem 3.2 listed in Section 3.3 below. Such a result is generally applicable for dealing with the inverses of matrices of unit root nonstationary time series. As a consequence, we are able to establish uniform consistency results and an asymptotic normality for the series estimator with a rate of convergence of $T^{-1/4}p^{1/2}$, where $p$ is the truncation parameter involved in the series approximation and $T$ is the sample size.

The organisation of this paper is as follows. In Section 2, we propose the model and discuss its estimation and assumptions. In Section 3, we derive the uniform consistency and asymptotic normality of the series estimator. In Section 4, we conduct a Monte Carlo simulation to evaluate the finite sample performance of the nonparametric series estimator. Section 5 discusses potential extensions, followed by Section 6, which concludes the paper. Several lemmas are presented in Appendix A; they are crucial for the proofs of our main results in Appendix B. The proofs of the lemmas listed in Appendix A, as well as a detailed proof of one theorem, are given in Appendix C of the paper.

Throughout this paper, $\to_D$, $\to_P$ and $\to_{a.s.}$ denote convergence in distribution, in probability and almost surely, respectively. For a vector, $\|\cdot\|$ stands for the Euclidean norm; for a matrix $A = (a_{ij})_{n\times m}$, $\|A\|^2 = \sum_{i=1}^n\sum_{j=1}^m a_{ij}^2$. Moreover, $\int g(x)dx$ stands for an integral over $(-\infty, +\infty)$.

2 Model estimation and assumptions

2.1 Preliminaries of the Hermite functions

In this paper, we use the Hermite functions to estimate square integrable functionals of a unit root process. Let $\{H_i(x)\}_{i=0}^\infty$ be the Hermite polynomial system orthogonal with respect to the weight function $\exp(-x^2)$, given by
$$H_i(x) = (-1)^i \exp(x^2)\frac{d^i}{dx^i}\exp(-x^2), \qquad i \ge 0. \qquad (2.1)$$
It is known that $\{H_i(x)\}_{i=0}^\infty$ is a complete orthogonal system in the Hilbert space $L^2(\mathbb{R}, \exp(-x^2)) = \{g(x): \int g^2(x)e^{-x^2}dx < \infty\}$ satisfying the orthogonality $\int H_i(x)H_j(x)e^{-x^2}dx = \sqrt{\pi}\,2^i i!\,\delta_{ij}$, where $\mathbb{R} = (-\infty,\infty)$ and $\delta_{ij}$ is the Kronecker delta. Define
$$F_i(x) = \frac{1}{\sqrt[4]{\pi}\sqrt{2^i i!}}\,H_i(x)\exp(-x^2/2), \qquad i \ge 0. \qquad (2.2)$$
Then $\{F_i(x)\}_{i=0}^\infty$ is the so-called Hermite series, or Hermite functions, in the literature, complete and orthonormal in $L^2(\mathbb{R}) = \{g(x): \int g^2(x)dx < \infty\}$, satisfying $\int F_i(x)F_j(x)dx = \delta_{ij}$. Consequently, any continuous function $f(x) \in L^2(\mathbb{R})$ has an infinite orthogonal series expansion
$$f(x) = \sum_{i=0}^\infty \theta_i F_i(x), \qquad \text{where } \theta_i = \int f(x)F_i(x)dx. \qquad (2.3)$$
Moreover, $F_i(x)$ is bounded uniformly in both $i$ and $x \in \mathbb{R}$ (see Szegő, 1975, p. 242).
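The Hermite functions in (2.2) are easy to generate numerically. The following is a minimal sketch (our own illustration, not from the paper): it builds $F_0,\ldots,F_{p-1}$ by the standard three-term recurrence implied by (2.1)-(2.2) and checks the orthonormality $\int F_i(x)F_j(x)dx = \delta_{ij}$ on a grid.

```python
# Sketch: Hermite functions F_i(x) of (2.2) via the three-term recurrence
# F_{i+1}(x) = sqrt(2/(i+1)) x F_i(x) - sqrt(i/(i+1)) F_{i-1}(x),
# with F_0(x) = pi^{-1/4} exp(-x^2/2) and F_1(x) = sqrt(2) x F_0(x).
import numpy as np

def hermite_functions(x, p):
    """Return the p x len(x) matrix whose rows are F_0(x), ..., F_{p-1}(x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    F = np.empty((p, x.size))
    F[0] = np.pi ** (-0.25) * np.exp(-x ** 2 / 2.0)
    if p > 1:
        F[1] = np.sqrt(2.0) * x * F[0]
    for i in range(1, p - 1):
        F[i + 1] = np.sqrt(2.0 / (i + 1)) * x * F[i] - np.sqrt(i / (i + 1.0)) * F[i - 1]
    return F

x = np.linspace(-20, 20, 20001)
F = hermite_functions(x, 8)
gram = F @ F.T * (x[1] - x[0])           # Riemann-sum approximation of the Gram matrix
print(np.max(np.abs(gram - np.eye(8))))  # close to 0: orthonormal in L^2(R)
```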

2.2 Model estimation and assumptions

Consider a nonparametric regression model of the form
$$y_t = f(x_t) + e_t, \qquad x_t = x_{t-1} + v_t, \qquad t = 1, 2, \cdots, T, \qquad (2.4)$$
where $v_t$ is a stationary linear process, $x_0 = O_P(1)$, $e_t$ is also a stationary linear process, and $f(\cdot)\in L^2(\mathbb{R})$. In view of (2.3), for each $t$ we have $y_t = Z_p^\tau(x_t)\theta + \gamma_p(x_t) + e_t$, where $p$ is some positive integer, $Z_p^\tau(\cdot) = (F_0(\cdot), \ldots, F_{p-1}(\cdot))$, $\theta^\tau = (\theta_0,\cdots,\theta_{p-1})$ and $\gamma_p(\cdot) = \sum_{j=p}^\infty \theta_j F_j(\cdot)$ is the residue after truncation; or, in matrix form,
$$Y = Z\theta + \gamma + e, \qquad (2.5)$$
where $Y^\tau = (y_1,\cdots,y_T)$, $Z = (Z_p(x_1),\cdots,Z_p(x_T))^\tau$ is a $T\times p$ matrix, $\gamma = (\gamma_p(x_1),\cdots,\gamma_p(x_T))^\tau$ and $e = (e_1,\cdots,e_T)^\tau$. Hence, by the ordinary least squares (OLS) method, $\theta$ is estimated by
$$\hat\theta = (Z^\tau Z)^{-1} Z^\tau Y. \qquad (2.6)$$
Then, naturally, the series estimator of the function $f(x)$ for any $x\in\mathbb{R}$ is $\hat f(x) = Z_p^\tau(x)\hat\theta$. To proceed further, we introduce the following technical assumptions.

Assumption 1.
(a) Let $\{\epsilon_j, j\in\mathbb{Z}\}$ be a sequence of independent and identically distributed (iid) continuous random variables satisfying $E\epsilon_0 = 0$, $E\epsilon_0^2 = 1$ and $E\epsilon_0^4 < \infty$. Let $\varphi(u)$, the characteristic function of $\epsilon_0$, satisfy $\int |u\,\varphi(u)|du < \infty$.
(b) Let $\{v_t\}$ be a linear process defined by $v_t = \sum_{j=0}^\infty \psi_j \epsilon_{t-j}$, where $\psi_0 = 1$, $\psi := \sum_{j=0}^\infty \psi_j \ne 0$ and $\sum_{j=0}^\infty j|\psi_j| < \infty$.
(c) Let $x_t = x_{t-1} + v_t$ with $x_0 = O_P(1)$. Let $e_t = \sum_{j=0}^\infty \phi_j \epsilon_{t-j}$ with $\phi_0 = 1$, $\sum_{j=0}^\infty \phi_j \ne 0$ and $\sum_{j=0}^\infty j|\phi_j| < \infty$.
(d) For any given $u\in\mathbb{R}$, define $h(u) = \varphi'(u)/\varphi(u)$. Suppose that there is a nonnegative function $k(\lambda)$ such that $\max_{j\ge 0} |h(\lambda\phi_j)| \le k(\lambda)$ and $\int_{-\infty}^{\infty} k(\lambda)\,|\Gamma(\lambda)|\,d\lambda < \infty$, where $\Gamma(\lambda) = \prod_{i=0}^\infty \varphi(\lambda\phi_i)$ is the characteristic function of $e_t$.

Condition (a) states the requirements on the underlying process $\{\epsilon_j, j\in\mathbb{Z}\}$ that determines the properties of the regressor and the error term. The moment conditions are commonly used in the literature. The integrability of $|\lambda\varphi(\lambda)|$ in (a) is used to derive some properties of the density functions related to $x_t$, and the condition on $h(u)$ is satisfied in many cases, such as symmetric stable variables with $\alpha\in[1,2]$, for which $h(u) = C_1 u^{\alpha-1}$ and $k(u) = C_2 u^{\alpha-1}$ for some finite $C_1$ and $C_2$. Meanwhile, Assumption 1(d) is also satisfied in the case where $\varphi(u) = 2/(e^u + e^{-u})$, in which $h(u) = \frac{e^{-u}-e^{u}}{e^{u}+e^{-u}}$ and $k(u) = 1$ (Lukacs, 1970, p. 88).

The regressor $x_t$ is integrated from the linear process $v_t$, while the linear processes $v_t$ and $e_t$ have the same iid sequence $\{\epsilon_j, j\in\mathbb{Z}\}$ as building blocks; the endogeneity of the structural cointegration model is incurred accordingly. While the same type of endogeneity is used in Wang and Phillips (2014) for the kernel estimation method, the estimation method as well as the formulation and the proofs of the main results in this paper are quite different from the kernel case.

By the Beveridge-Nelson decomposition (Phillips and Solo, 1992, p. 972), $v_t = \psi\epsilon_t + \tilde v_{t-1} - \tilde v_t$, where $\tilde v_t = \sum_{j=0}^\infty \tilde\psi_j \epsilon_{t-j}$ with $\tilde\psi_j = \sum_{k=j+1}^\infty \psi_k$. Note that $\tilde v_t$ is a stationary process since $\sum_{j=0}^\infty |\tilde\psi_j|^2 < \infty$ due to (b) of Assumption 1; a similar condition is used in Phillips and Solo (1992). It follows that $x_t = \psi\sum_{j=1}^t \epsilon_j + \tilde v_0 - \tilde v_t$ and hence $d_t := \sqrt{E x_t^2} = |\psi|\sqrt{t}\,(1+o(1))$. Define, for $0\le u\le 1$,
$$W_T(u) = \frac{1}{d_T}\, x_{[Tu]}. \qquad (2.7)$$
It is known that $W_T(u) \to_D B(u)$, a standard Brownian motion. Straightforwardly, $E[x_t e_t] = \psi\sum_{j=1}^t \phi_{t-j} + E[\tilde v_0 e_t] - E[\tilde v_t e_t]$, where $|E[\tilde v_0 e_t]| < t^{-1}$ and $E[\tilde v_t e_t] = \sum_{j=0}^\infty \tilde\psi_j \phi_j$ is a constant. Generally, $E[x_t e_t]\ne 0$, but $E[d_t^{-1} x_t e_t]\to 0$ as $t\to\infty$. This implies that $d_t^{-1}x_t$ and $e_t$ are asymptotically uncorrelated for large $t$. More importantly, in Lemma A.5 below we claim that $d_t^{-1}x_t$ and $e_s$ are asymptotically independent for all large $t$ and $s$. Our asymptotic theory is built upon this asymptotic independence.

Meanwhile, our asymptotic theory relies on the local time process $L_B(t,s)$ of $B(u)$, defined by
$$L_B(t,s) = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^t I\{|B(u)-s|<\varepsilon\}\,du, \qquad (2.8)$$
where $I(A)$ denotes the conventional indicator function. Roughly speaking, the local time can be interpreted as a spatial occupation density in $s$ for the Brownian motion $B(u)$. The local time is a key tool in studying the intersection of nonlinearity and nonstationarity, e.g., Park and Phillips (1999, 2001) and Wang and Phillips (2009a). Phillips (2001) provides some examples where local time can be used to analyse economic time series, an approach called "spatial analysis of time series".

Assumption 2. Let $f(x)\in L^2(\mathbb{R})$ be differentiable. Moreover, there exists a positive integer $r$ such that $x^i f^{(r-i)}(x)\in L^2(\mathbb{R})$ for all $i = 0,\cdots,r$.

Assumption 2 requires that $f(x)$ be sufficiently smooth with a thin tail, so that the orthogonal expansion converges at a fast rate; see Lemma A.3 in Appendix A. The same assumption in a different form is used in Lemma 3 of Schwartz (1967). The class of such $f$ includes Gaussian functions, Laplace functions and functions with compact support.

Assumption 3. Let the truncation parameter $p$ of the Hermite series expansion satisfy $p = [c\cdot T^\alpha]$, where $c>0$ is a constant and $\frac{1}{2(r-1)} < \alpha < \frac{1}{5}$.

Assumption 3 restricts the truncation parameter $p$ to guarantee the convergence of the regression matrix $Z^\tau Z$, and restricts the smoothness order $r$ to ensure that the truncation residue $\gamma_p(\cdot)$ does not affect the limit distribution studied below. The conditions on $r$ and $\alpha$ also imply $r > 7/2$, which can be satisfied by $r\ge 4$ in Assumption 2.
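The estimator in (2.6) is ordinary least squares on the Hermite design matrix $Z$. A minimal sketch (an assumed implementation, not the authors' code) follows; it reuses the hermite_functions() helper from Section 2.1 and the truncation rule $p = [2\cdot T^{1/8}]$ used later in Section 4.

```python
# Sketch: series estimator (2.6). Regress y on the first p Hermite
# functions of x and return both theta_hat and the fitted function f_hat.
import numpy as np

def fit_series(x, y, p):
    Z = hermite_functions(x, p).T                         # T x p regression matrix
    theta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)     # OLS: (Z'Z)^{-1} Z'y
    def f_hat(x_new):
        return hermite_functions(x_new, p).T @ theta_hat  # f_hat(x) = Z_p(x)' theta_hat
    return theta_hat, f_hat

# usage: theta_hat, f_hat = fit_series(x_obs, y_obs, p=int(2 * T ** (1 / 8)))
```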

3 Asymptotic theory

3.1 Consistency of the series estimator

In this subsection, we discuss the asymptotic consistency of the series estimator.

Lemma 3.1. Under Assumptions 1-3, we have, as $T\to\infty$, $\|\hat\theta - \theta\| = o_P(1)$ and $\sup_x |\hat f(x) - f(x)| = o_P(1)$.

Lemma 3.1 shows that the estimated coefficients converge to the true coefficients and that the series estimator $\hat f(x)$ of $f(x)$ converges uniformly. When the data are stationary time series, polynomials or splines are usually used as basis functions, e.g., in Andrews (1991), Newey (1997) and Gao (2007); in those settings, uniform consistency usually rests on more restrictive assumptions than point-wise consistency. By contrast, in a nonparametric and nonstationary context it is very difficult, if not impossible, to obtain uniform convergence on the entire real line using the kernel method. Gao et al. (2009), Chan and Wang (2014) and Wang and Chan (2014) study uniform convergence that, however, holds only on a compact subset of the real line. In our study, due to the uniform boundedness of the Hermite series, uniform consistency requires the same conditions as point-wise consistency. This is one of the advantages that series estimation has over kernel estimation.

3.2 Asymptotic distribution

In this subsection, we establish the asymptotic distribution of the series estimator. There are two kinds of approximation of $\hat f(x)$ to $f(x)$: one is point-wise, $\hat f(x) - f(x) = Z_p^\tau(x)(\hat\theta-\theta) - \gamma_p(x)$ for any $x\in\mathbb{R}$; the other is in the $L^2$ sense, $\|\hat f(x)-f(x)\|^2_{L^2(\mathbb{R})} = \|\hat\theta-\theta\|^2 + \|\gamma_p(x)\|^2_{L^2(\mathbb{R})}$, where by definition $\|g(x)\|_{L^2(\mathbb{R})} = \left(\int g^2(x)dx\right)^{1/2}$ is the norm of $g(x)\in L^2(\mathbb{R})$. The following theorem gives the asymptotic distribution of $\hat f(x)$ in both the point-wise and the $L^2$-norm sense.

Theorem 3.1. Under Assumptions 1-3, we have, as $T\to\infty$,
$$\sqrt{\frac{L_B(1,0)}{\sigma_e^2\|Z_p(x)\|^2}\cdot\frac{T}{d_T}}\left(\hat f(x)-f(x)\right)\to_D N(0,1), \qquad (3.1)$$
and moreover
$$\sqrt{\frac{1}{\sigma_e^2}\cdot\frac{T}{d_T}}\cdot p^{-1/2}\,\|\hat f(x)-f(x)\|_{L^2(\mathbb{R})}\to_D L_B^{-1/2}(1,0), \qquad (3.2)$$
where $L_B(1,0)$ is a local-time random variable with cumulative distribution function
$$F_L(x) = P(L_B(1,0)\le x) = \begin{cases} 2\Phi(x)-1, & x\ge 0,\\ 0, & x<0,\end{cases} \qquad (3.3)$$
in which $\Phi(x)$ is the cdf of $N(0,1)$.

Since $\|Z_p(x)\|^2 = O(p)$ uniformly in $x$ and $d_T = O(T^{1/2})$, the rate of convergence of the series estimator in both the point-wise and the $L^2$-norm sense is $(T^{1/4}p^{-1/2})^{-1}$. Meanwhile, the rate of convergence of the kernel estimator is $(T^{1/4}h^{1/2})^{-1}$ (see, for example, Wang and Phillips 2014), where $h$ is the bandwidth parameter; the two rates are equivalent when $h$ is replaced by $p^{-1}$.

Note also that there are three nuisance parameters involved in the large sample theory of (3.1), namely $\psi$ in $d_T = |\psi|\sqrt{T}(1+o(1))$, $\sigma_e^2$ and the local time $L_B(1,0)$, which should be replaced by consistent estimates. However, noting the structure of $L_B(1,0)/d_T$ in (3.1) and the limit $\frac{d_T}{T}\sum_{t=1}^T\phi(x_t)\to_P L_B(1,0)$ in a rich probability space, where $\phi(x) = \frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$, we may estimate the ratio $L_B(1,0)/|\psi|$ by $\frac{1}{\sqrt{T}}\sum_{t=1}^T\phi(x_t)$. Moreover, we estimate $\sigma_e^2$ by
$$\hat\sigma_e^2 := \frac{1}{T}\sum_{t=1}^T \hat e_t^2, \qquad \text{where } \hat e_t := y_t - \hat f(x_t). \qquad (3.4)$$
It is also possible to estimate $\psi$ individually if we stipulate a parametric structure for the linear process $v_t$ in Assumption 1; see Dong and Gao (2014) for details. Thus, in practice the limit in (3.2) can also be used for inference, noting that $L_B(1,0)$ follows the same distribution as $|N|$, where $N$ is a standard normal variable. Nonetheless, we focus only on (3.1), since its limit is normal and it does not require an estimate of $\psi$.

Corollary 3.1. Under Assumptions 1-3, we have, as $T\to\infty$, $\hat\sigma_e^2\to_P\sigma_e^2$ and $\frac{1}{\sqrt{T}}\sum_{t=1}^T\phi(x_t)\to_P L_B(1,0)/|\psi|$; consequently,
$$\frac{1}{\hat\sigma_e\|Z_p(x)\|}\sqrt{\sum_{t=1}^T\phi(x_t)}\left(\hat f(x)-f(x)\right)\to_D N(0,1). \qquad (3.5)$$

The proofs of Lemma 3.1, Theorem 3.1 and Corollary 3.1, given in Appendix B, employ an asymptotic approximation of the regression matrix $Z^\tau Z$ by a diagonal matrix, stated as Theorem 3.2 in the next subsection.
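Since the self-normalised statistic in (3.5) is free of nuisance parameters, feasible point-wise confidence intervals follow directly. A minimal sketch (our own illustration, reusing fit_series() and hermite_functions() from Section 2) is:

```python
# Sketch: 95% pointwise band from (3.5):
# f_hat(x) +/- 1.96 * sigma_hat * ||Z_p(x)|| / sqrt(sum_t phi(x_t)).
import numpy as np

def confidence_band(x_obs, y_obs, x_grid, p, z=1.96):
    theta_hat, f_hat = fit_series(x_obs, y_obs, p)
    resid = y_obs - f_hat(x_obs)
    sigma_hat = np.sqrt(np.mean(resid ** 2))                         # (3.4)
    occupancy = np.sum(np.exp(-x_obs ** 2 / 2) / np.sqrt(2 * np.pi)) # sum_t phi(x_t)
    Zx = hermite_functions(x_grid, p)                                # p x len(grid)
    half = z * sigma_hat * np.sqrt((Zx ** 2).sum(axis=0)) / np.sqrt(occupancy)
    centre = f_hat(np.asarray(x_grid, dtype=float))
    return centre - half, centre + half
```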

3.3 Asymptotic property of $Z^\tau Z$

As mentioned in the introductory section and seen in the above discussion, the least squares estimator of $\theta$ involves the inverse of the matrix $Z^\tau Z$, which causes both theoretical and computational difficulties. In the literature, such difficulties are avoided by using a transformed version of $\hat\theta$ of the form $\tilde\theta = Z^\tau Z\cdot\hat\theta$ (see, for example, Dong and Gao 2014). As a consequence, it is difficult to obtain a rate of convergence for $\hat\theta$, although a rate of convergence for $\tilde\theta$ is available. We therefore tackle this difficulty by studying the convergence of $\frac{d_T}{T}Z^\tau Z$ directly.

Theorem 3.2. Let $p = [c\cdot T^\alpha]$ for $c>0$ and $0<\alpha<\frac15$. Suppose that Assumption 1 holds. Then, in an expanded probability space, we have, as $T\to\infty$,
$$\Big\|\frac{d_T}{T}Z^\tau Z - L_B(1,0)\,I_p\Big\|\to_P 0, \qquad (3.6)$$
where $I_p$ is the identity matrix of dimension $p\times p$.

It follows from the definition of $Z$ that
$$\Big\|\frac{d_T}{T}Z^\tau Z - L_B(1,0)\,I_p\Big\|^2 = \sum_{i=0}^{p-1}\Big(\frac{d_T}{T}\sum_{t=1}^T F_i^2(x_t) - L_B(1,0)\Big)^2 + \sum_{i\ne j=0}^{p-1}\Big(\frac{d_T}{T}\sum_{t=1}^T F_i(x_t)F_j(x_t)\Big)^2.$$
Since $p\to\infty$, existing results (Wang and Phillips 2009a, 2011, for example) regarding the individual terms in these sums are not applicable. Thus, the proof of Theorem 3.2 is not trivial: the key steps used in deriving the rates of convergence of these terms rely on new ideas and various properties of the orthogonal series. Theorem 3.2 is also of independent interest in the nonparametric nonstationary series estimation context. Its implication is that the regression matrix $Z^\tau Z$ of the parameterised model, after normalisation, is asymptotically a diagonal matrix with $L_B(1,0)$ on the diagonal, so that the eigenvalues satisfy $\lambda_{\min}(\frac{d_T}{T}Z^\tau Z) = L_B(1,0)+o_P(1)$ and $\lambda_{\max}(\frac{d_T}{T}Z^\tau Z) = L_B(1,0)+o_P(1)$. Our experience suggests that this convergence may be applied to significantly simplify the construction of existing estimation and specification procedures, such as those discussed in Dong and Gao (2013, 2014). The proof of Theorem 3.2 is given in Appendix C of the supplementary material.

In Section 4 below, we examine the finite sample performance of the series estimator.
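Before that, a quick numerical illustration of Theorem 3.2 (our own check, not from the paper). Although $d_T$ depends on the unknown $\psi$, the statement can be checked in a $\psi$-free form: since $d_T = |\psi|\sqrt T(1+o(1))$ and, by Corollary 3.1, $T^{-1/2}\sum_t\phi(x_t)$ estimates $L_B(1,0)/|\psi|$, the factor $|\psi|$ cancels and $T^{-1/2}Z^\tau Z$ should be close to $\big(T^{-1/2}\sum_t\phi(x_t)\big)I_p$.

```python
# Sketch: numerical check that T^{-1/2} Z'Z is approximately a multiple of
# the identity, with the multiple estimated by T^{-1/2} sum_t phi(x_t).
import numpy as np

rng = np.random.default_rng(0)
T, p = 200_000, 8
eps = rng.standard_normal(T)
v = np.zeros(T)
for t in range(1, T):                    # v_t = 0.2 v_{t-1} + eps_t, as in Section 4
    v[t] = 0.2 * v[t - 1] + eps[t]
x = np.cumsum(v)                         # unit root regressor

Z = hermite_functions(x, p).T
lhs = Z.T @ Z / np.sqrt(T)               # T^{-1/2} Z'Z
lb = np.sum(np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)) / np.sqrt(T)
print(np.max(np.abs(lhs - lb * np.eye(p))))  # small relative to lb (convergence is slow)
```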

4 Simulation study

In this section, we conduct Monte Carlo experiments to assess the finite sample performance of the proposed nonparametric series estimator. The data generating process is as follows. Let $\{\epsilon_t, e_t\}$ be an independent and identically distributed sequence with $(\epsilon_t, e_t)\sim N(0,\Sigma)$, where
$$\Sigma = 0.1^2\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}.$$
The regressor $x_t$ is integrated from an AR(1) process $v_t$, i.e.
$$x_t = x_{t-1} + v_t \qquad\text{and}\qquad v_t = 0.2\,v_{t-1} + \epsilon_t,$$
where $x_0 = O_P(1)$. The following models are used to investigate the performance (a simulation sketch follows below):
$$\text{Model 1}: \quad y_t = \frac{1}{1+x_t^4} + e_t, \qquad t = 1,\ldots,T; \qquad (4.1)$$
$$\text{Model 2}: \quad y_t = (1+\sin(x_t))\exp(-x_t^2/2) + e_t, \qquad t = 1,\ldots,T. \qquad (4.2)$$
We consider two cases for $\rho$: $\rho = 0$, implying exogeneity, and $\rho = 0.9$, implying the existence of endogeneity.
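A minimal sketch of this data generating process (our own illustration):

```python
# Sketch: DGP of (4.1)-(4.2). (eps_t, e_t) jointly normal with correlation
# rho and standard deviation 0.1; v_t an AR(1) with coefficient 0.2; x_t a
# unit root process driven by v_t.
import numpy as np

def simulate(T, rho, model=1, seed=None):
    rng = np.random.default_rng(seed)
    cov = 0.1 ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    eps, e = rng.multivariate_normal([0.0, 0.0], cov, size=T).T
    v = np.zeros(T)
    for t in range(1, T):
        v[t] = 0.2 * v[t - 1] + eps[t]
    x = np.cumsum(v)                                     # x_t = x_{t-1} + v_t
    f = 1.0 / (1.0 + x ** 4) if model == 1 else (1.0 + np.sin(x)) * np.exp(-x ** 2 / 2)
    return x, f + e, f                                   # data and true regression values
```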

4.1 Bias and standard deviation

Let $T = 400, 800, 1200$ and $1800$ be the sample sizes. The number of replications is $N = 2000$. Using the generalised cross-validation method proposed in Gao et al. (2002), the truncation parameter is chosen as $p = [2\cdot T^{1/8}]$, so that it varies with the sample size and satisfies the theoretical requirement in Assumption 3. The sample bias, standard deviation (Std) and root mean square error (RMSE) are defined by
$$\text{Bias} = \frac{1}{N}\frac{1}{T}\sum_{n=1}^N\sum_{t=1}^T\left(\hat f(x_{n,t}) - f(x_{n,t})\right),$$
$$\text{Std} = \left(\frac{1}{N}\frac{1}{T}\sum_{n=1}^N\sum_{t=1}^T\left(\hat f(x_{n,t}) - \bar{\hat f}(x_{n,t})\right)^2\right)^{1/2},$$
$$\text{RMSE} = \left(\frac{1}{N}\frac{1}{T}\sum_{n=1}^N\sum_{t=1}^T\left(\hat f(x_{n,t}) - f(x_{n,t})\right)^2\right)^{1/2},$$
respectively, where $(x_{n,1},\cdots,x_{n,T})$ denotes the simulated data in the $n$-th replication, $\hat f(\cdot)$ is the series estimator of the regression function, $\bar{\hat f}(\cdot) = Z_p(\cdot)^\tau\bar{\hat\theta}$ with $\bar{\hat\theta}$ the average of $\hat\theta_n$ over the Monte Carlo replications, and $N$ is the number of replications; a sketch of these computations follows after this paragraph. It should be pointed out that the sample size needed in simulations for nonstationary integrable regression models usually has to be much larger than that for stationary regression models; the reason is the slower rate of convergence in the former case. The results of the simulation are summarised in Table 1.
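A minimal sketch of the Monte Carlo summary statistics just defined (our own illustration, reusing simulate() and fit_series() above; a small default N keeps the runtime modest, whereas the paper uses N = 2000):

```python
# Sketch: Bias, Std and RMSE over replications; f_bar_hat is the estimator
# built from the average of theta_hat over replications.
import numpy as np

def mc_metrics(T, rho, model, N=200, seed=0):
    p = int(2 * T ** (1 / 8))                      # truncation rule p = [2 T^{1/8}]
    X, fits, truths, thetas = [], [], [], []
    for n in range(N):
        x, y, f_true = simulate(T, rho, model, seed=seed + n)
        theta_hat, f_hat = fit_series(x, y, p)
        X.append(x); fits.append(f_hat(x)); truths.append(f_true); thetas.append(theta_hat)
    fits, truths = np.array(fits), np.array(truths)
    theta_bar = np.mean(thetas, axis=0)
    fbar = np.array([hermite_functions(x, p).T @ theta_bar for x in X])
    bias = np.mean(fits - truths)
    std = np.sqrt(np.mean((fits - fbar) ** 2))
    rmse = np.sqrt(np.mean((fits - truths) ** 2))
    return bias, std, rmse
```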

Table 1: Simulation results for Bias, Std and RMSE

                      ρ = 0                  ρ = 0.9
           T       Model 1   Model 2     Model 1   Model 2
  Bias     400     0.0819    0.0146      0.0899    0.0465
           800     0.0714    0.0132      0.0774    0.0333
           1200    0.0551    0.0121      0.0589    0.0221
           1800    0.0256    0.0110      0.0250    0.0067
  Std      400     0.0904    0.0218      0.1037    0.0538
           800     0.0803    0.0194      0.0901    0.0385
           1200    0.0620    0.0185      0.0680    0.0293
           1800    0.0265    0.0171      0.0289    0.0104
  RMSE     400     0.0583    0.0471      0.0586    0.0473
           800     0.0297    0.0080      0.0293    0.0075
           1200    0.0288    0.0071      0.0283    0.0066
           1800    0.0252    0.0061      0.0247    0.0054

It can be seen from Table 1 that both the bias and the standard deviation decrease as the sample size increases, which verifies that the proposed estimator approximates the true regression function. Comparing the two models, however, Model 2 outperforms Model 1 for all sample sizes. In our experience this is mainly due to the difference in the tails of the two regression functions: the tail of $1/(1+x^4)$ is much heavier than that of $(1+\sin(x))\exp(-x^2/2)$, and a heavier tail results in slower convergence of the orthogonal series expansion. Consequently, Model 2 yields better results than Model 1. Additionally, the results for the case $\rho = 0$ and the case $\rho = 0.9$ show no evidence of a systematic difference. Based on these results, the endogeneity does not affect the nonparametric estimator in the proposed models and, more importantly, this coincides with our theoretical findings in the preceding section.

4.2 Normal approximation and confidence interval curves

Corollary 3.1 gives the normality of our estimator $\hat f(x)$ with all nuisance parameters estimated from the observations $\{(x_t, y_t), t = 1,\cdots,T\}$. Accordingly, we are able to construct a confidence interval at any significance level and any point. This subsection is devoted to visualising the normality. To do so, using the ksdensity function in MatLab, we first estimate the density of a set of values of $\hat f(x^*) - f(x^*)$, normalised according to Corollary 3.1, at the particular point $x^* = 0$ for

Model 2 with $T = 200, 400$ and $800$, $\rho = 0$ and $N = 1000$, where the truncation parameter is taken using the same formula as before, viz., $p = [2\times T^{1/8}]$.

[Figure 1: Normal density approximation and confidence interval curves. Panel (a), normal density approximation: estimated density curves for T = 200, 400 and 800 against the N(0,1) density. Panel (b), confidence interval curves: the true function, the fitted function and the upper and lower bounds.]

Technically, we only use the replications in which the numbers of observations below and above zero are both greater than $0.2\,T$. The reason is that, due to the divergence of the integrated data, the generated data $(x_{n,1},\cdots,x_{n,T})$, where $n$ indexes the replications, may be located mostly on one side of zero in a given replication, which gives a poor estimate of the density, particularly for the kernel method underlying the ksdensity function in MatLab. A similar discussion is available in Section 5 of Karlsen et al. (2007).

Figure 1a shows the three estimated density curves corresponding to the different sample sizes $T$. It can be seen that the densities gradually approach the standard normal density as the sample size increases; we may conclude that the theoretical normality result of Corollary 3.1 is borne out in this experiment. Second, at the 95% confidence level, we draw for Model 2 the lower and upper confidence curves based on the result of (3.5), namely $\hat f(x) \pm 1.96\,\hat\sigma_e\|Z_p(x)\|\big(\sum_{t=1}^T\phi(x_t)\big)^{-1/2}$, where $\phi(\cdot)$ is the density function of a standard normal variable. Here $T = 800$ and $p$ is the same as before. Figure 1b displays the true regression function, the estimated function averaged over replications, and the confidence interval curves. As can be seen, the estimated curve $\hat f(x)$ is located well between the lower and upper bounds, implying the reliability of inference based on our estimator.
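A minimal sketch of the density check behind Figure 1a (our own illustration, using scipy's gaussian_kde in place of MatLab's ksdensity, and reusing simulate(), fit_series() and hermite_functions() from above):

```python
# Sketch: self-normalised statistic of (3.5) at x* = 0 for Model 2 with
# rho = 0, across replications, compared with the standard normal density.
import numpy as np
from scipy.stats import gaussian_kde, norm

def normalised_stats(T, N=1000, seed=0):
    p, stats = int(2 * T ** (1 / 8)), []
    for n in range(N):
        x, y, _ = simulate(T, rho=0.0, model=2, seed=seed + n)
        if min(np.sum(x < 0), np.sum(x > 0)) <= 0.2 * T:
            continue                                  # keep only balanced replications
        theta_hat, f_hat = fit_series(x, y, p)
        sigma_hat = np.sqrt(np.mean((y - f_hat(x)) ** 2))
        occ = np.sum(np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi))
        Z0 = hermite_functions(0.0, p)[:, 0]
        f0_true = 1.0                                 # f(0) = (1 + sin 0) e^0 for Model 2
        stats.append(np.sqrt(occ) / (sigma_hat * np.linalg.norm(Z0))
                     * (f_hat(0.0)[0] - f0_true))
    return np.array(stats)

s = normalised_stats(T=800)
grid = np.linspace(-4, 4, 200)
print(np.max(np.abs(gaussian_kde(s)(grid) - norm.pdf(grid))))  # shrinks as T grows
```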

5 Discussion

It is worth discussing potential extensions of our method to models whose regression functions are not in $L^2(\mathbb{R})$; the following is a brief discussion of this issue (a code sketch follows at the end of this section). Consider $y_t = f(x_t) + e_t$, where $x_t$ and $e_t$ still satisfy Assumption 1 but $f(x)\in L^2(\mathbb{R}, e^{-x^2})$. It follows that $\tilde f(x) := f(x)\varphi(x)\in L^2(\mathbb{R})$, where $\varphi(x) = e^{-x^2/2}$. This motivates multiplying both sides of the model by $\varphi(x_t)$, giving
$$\tilde y_t = \tilde f(x_t) + \tilde e_t, \qquad t = 1,\cdots,T, \qquad (5.1)$$
where $\tilde y_t = y_t\varphi(x_t)$ and $\tilde e_t = e_t\varphi(x_t)$. Now, model (5.1) has exactly the same form as model (2.4). Expand $\tilde f(x)$ into an orthogonal series in terms of $\{F_i(x)\}$:
$$\tilde f(x) = \sum_{i=0}^\infty\tilde\theta_i F_i(x) = Z_p^\tau(x)\tilde\theta + \tilde\gamma_p(x), \qquad\text{with }\tilde\theta_i = \int\tilde f(x)F_i(x)dx, \qquad (5.2)$$
where, for any $p\ge 1$, $\tilde\theta = (\tilde\theta_0,\cdots,\tilde\theta_{p-1})^\tau$, $Z_p(x)$ is the same as before and $\tilde\gamma_p(x) = \sum_{i=p}^\infty\tilde\theta_i F_i(x)$. Suppose further that $\tilde f(x)$ and the truncation parameter $p$ satisfy Assumptions 2 and 3. We can then obtain an estimator of $\tilde f(x)$ following exactly the same procedure as in Section 2.2:
$$\hat{\tilde f}(x) = Z_p^\tau(x)\hat{\tilde\theta}, \qquad\text{where }\hat{\tilde\theta} = (Z^\tau Z)^{-1}Z^\tau\tilde Y, \qquad (5.3)$$
in which $\hat{\tilde\theta}$ is an estimate of $\tilde\theta$, $Z$ is the same as before and $\tilde Y = (\tilde y_1,\cdots,\tilde y_T)^\tau$. Denote for later use $\tilde e = (\tilde e_1,\cdots,\tilde e_T)^\tau$ and $\tilde\gamma = (\tilde\gamma_p(x_1),\cdots,\tilde\gamma_p(x_T))^\tau$.

To derive the asymptotic distribution of $\hat{\tilde f}(x)$, notice that, for any $x\in\mathbb{R}$, $\hat{\tilde f}(x) - \tilde f(x) = Z_p^\tau(x)(\hat{\tilde\theta}-\tilde\theta) - \tilde\gamma_p(x)$ and
$$\|\hat{\tilde f}(x)-\tilde f(x)\|^2_{L^2(\mathbb{R})} = \int[\hat{\tilde f}(x)-\tilde f(x)]^2dx = \|\hat{\tilde\theta}-\tilde\theta\|^2 + \int\tilde\gamma_p^2(x)dx - 2\int Z_p^\tau(x)(\hat{\tilde\theta}-\tilde\theta)\,\tilde\gamma_p(x)dx = \|\hat{\tilde\theta}-\tilde\theta\|^2 + \|\tilde\gamma(x)\|^2_{L^2(\mathbb{R})},$$
where the cross term vanishes by orthogonality. Hence, following a similar line of argument, we should be able to establish the asymptotic distribution of $\hat{\tilde f}(x)$ in both the point-wise and the $L^2$ sense.

Meanwhile, it is possible to extend the approach in Sections 2 and 3 to a partially linear single-index model of the form $y_t = x_t^\tau\beta_0 + f(x_t^\tau\theta_0) + e_t$, where $x_t$ is a vector of integrated time series, $(\beta_0,\theta_0)$ is a vector of unknown parameters and $f(\cdot)$ is an unknown integrable function. In empirical applications, a vector of macroeconomic time series, such as income and the real interest rate, may be chosen as $x_t$, and $y_t$ can be an expenditure variable when one is interested in establishing the relationship between $y_t$ and $x_t$. In order to establish results similar to Theorem 3.1 and Corollary 3.1, some new techniques may be needed. We therefore leave such extensions to future research.
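A minimal sketch of the transformation device of (5.1)-(5.3) (our own illustration, reusing fit_series() from Section 2):

```python
# Sketch: estimate f in L^2(R, e^{-x^2}) by weighting both sides of the
# model with phi(x_t) = exp(-x_t^2/2) and applying the series estimator to
# the transformed data (5.1).
import numpy as np

def fit_series_transformed(x, y, p):
    w = np.exp(-x ** 2 / 2.0)                             # phi(x_t)
    theta_tilde, f_tilde_hat = fit_series(x, y * w, p)    # regress y_tilde on Z_p(x)
    def f_hat(x_new):
        x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
        # undo the weight; reliable only where phi(x) is not negligible
        return f_tilde_hat(x_new) * np.exp(x_new ** 2 / 2.0)
    return f_hat
```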

6 Conclusions

In this paper, we have established the uniform consistency and asymptotic distribution in both the point–wise and L2 sense for the Hermite series estimator of the proposed integrable cointegration model accommodating endogeneity. The endogeneity is of a general form. Possible extensions from integrable models to non-integrable models have been discussed. The finite sample experiments show that the proposed series estimator performs well for models satisfying our assumptions. 12

Nonetheless, there are some problems that may be studied in our future research. The choice of the truncation parameter should be discussed in more detail and a data driven choice of the truncation parameter should be investigated. The theory may be extended to an additive multivariate model with both stationary and nonstationary regressors or a partially linear cointegration model.

7 Acknowledgements

The original version of this paper was presented at a lunch time seminar in the Department of Econometrics and Business Statistics, Monash University, Australia. The first author acknowledges some useful comments by the audience. The authors also acknowledge comments by Farshid Vahid and the financial support by the Australian Research Council Discovery Grants Program under Grant Numbers: DP1096374 and DP1314229.

A Lemmas

Five useful lemmas are given in this section; all their proofs can be found in Appendix C of this paper. Throughout the rest of this paper, we use $0<C<\infty$ to denote a generic constant which may take different values at different places. Meanwhile, we write $\|\cdot\|_{L^2}$ for $\|\cdot\|_{L^2(\mathbb{R})}$ in the proofs.

We shall consider several versions of a decomposition of $x_t$. Without loss of generality, in what follows let $x_0 = 0$ almost surely. It follows that
$$x_t = \sum_{\ell=1}^t v_\ell = \sum_{\ell=1}^t\sum_{i=-\infty}^\ell \psi_{\ell-i}\epsilon_i = \sum_{i=-\infty}^t\Big(\sum_{\ell=\max(1,i)}^t\psi_{\ell-i}\Big)\epsilon_i =: \sum_{i=-\infty}^t b_{t,i}\epsilon_i.$$
Let $j\le t$ be fixed. Thus we have
$$x_t = b_{t,j}\epsilon_j + x_{t/j}, \qquad\text{with } x_{t/j} := \sum_{i=-\infty,\,i\ne j}^t b_{t,i}\epsilon_i, \qquad (A.1)$$
where $x_{t/j}$ is the variable obtained by removing the term containing $\epsilon_j$ from $x_t$. Obviously, $x_{t/j}$ and $\epsilon_j$ are mutually independent. Additionally, letting $1\le s<j\le t$, $x_t$ also has the decomposition
$$x_t = x_s^* + x_{ts} = x_s^* + b_{t,j}\epsilon_j + x_{ts/j}, \qquad (A.2)$$
where $x_s^* = x_s + \bar x_s$ with $\bar x_s = \sum_{i=s+1}^t\sum_{a=-\infty}^s\psi_{i-a}\epsilon_a$ containing all the information available up to $s$, and $x_{ts} = \sum_{i=s+1}^t b_{t,i}\epsilon_i$, while obviously $x_{ts/j} = \sum_{i=s+1,\,i\ne j}^t b_{t,i}\epsilon_i$. Evidently, $x_{ts}$ captures all the information contained in $x_t$ on the time periods $(s,t]$, while $x_{ts/j}$ captures all the information contained in $x_t$ on the time periods $(s,j)\cup(j,t]$. Let $d_{ts} := (Ex_{ts}^2)^{1/2}$ throughout the rest of this paper. Moreover, $\bar x_s = O_P(1)$ by virtue of Assumption 1.

Lemma A.1. Suppose that Assumption 1 holds. For $t$ or $t-s$ large:

(1) $d_t^{-1}x_t$ has a uniformly bounded density $f_t(x)$ over all $t$ and $x$ satisfying the uniform Lipschitz condition $\sup_x|f_t(x+y)-f_t(x)|\le C|y|$ for any $y$ and some constant $C>0$. In addition, $\sup_x|f_t(x)-\phi(x)|\to 0$ as $t\to\infty$, where $\phi(x)$ is the standard normal density function. Let $1\le s<t$. Then $d_{ts}^{-1}x_{ts}$, with $x_{ts}$ given by (A.2), has a uniformly bounded density $f_{ts}(x)$ over all $t,s$ and $x$, satisfying the same uniform Lipschitz condition.

(2) Let $j\le t$. Then $d_t^{-1}x_{t/j}$, with $x_{t/j}$ given by (A.1), has a uniformly bounded density $f_{t/j}(x)$ over all $t,j$ and $x$, satisfying the uniform Lipschitz condition of the above form. Let $1\le s<j\le t$. Then $d_{ts}^{-1}x_{ts/j}$, with $x_{ts/j}$ given by (A.2), has a uniformly bounded density $f_{ts/j}(x)$ over all $t,j,s$ and $x$, satisfying the same uniform Lipschitz condition.

Lemma A.2. Suppose that Assumption 1 holds. Let $j$ be a fixed integer with $j\le t$. For any functions $U, g:\mathbb{R}\mapsto\mathbb{R}$ such that $\int|U(w)|dw<\infty$ and $E|\epsilon_j g(\epsilon_j)|<\infty$, and for large $t$ or $t-s$, we have:

(1) $E[U(x_t)g(\epsilon_j)] = E[U(x_{t/j})]E[g(\epsilon_j)] + c_U\frac1t$, where $x_{t/j}$ defined by (A.1) is independent of $\epsilon_j$, and $c_U$ is a quantity such that $|c_U|\le C\,E|\epsilon_j g(\epsilon_j)|\int|U(w)|dw$. In particular, if $Eg(\epsilon_j)=0$, then $E[U(x_t)g(\epsilon_j)] = c_U\frac1t$.

(2) $E|U(x_t)g(\epsilon_j)|\le C\frac{1}{\sqrt t}E|g(\epsilon_j)|\int|U(w)|dw$.

(3) For any $\ell$ with $j\ne\ell\le t$, $E[U(x_t)g(\epsilon_j)|\epsilon_\ell] = E[U(x_{t/j})|\epsilon_\ell]E[g(\epsilon_j)] + \frac1t\eta_\ell$, where $x_{t/j}$ is defined by (A.1) and $\eta_\ell$ is a random variable depending on $\epsilon_\ell$ such that $|\eta_\ell|\le C\int|U(w)|dw$ almost surely. If $E[g(\epsilon_j)]=0$, then $E[U(x_t)g(\epsilon_j)|\epsilon_\ell] = \frac1t\eta_\ell$. Meanwhile, $E[|U(x_t)g(\epsilon_j)|\,|\epsilon_\ell]\le C\frac{1}{\sqrt t}E|g(\epsilon_j)|\int|U(w)|dw$ almost surely.

(4) For $1\le s<j\le t$, $E[U(x_t)g(\epsilon_j)|\mathcal{F}_s] = E[g(\epsilon_j)]E[U(x_{t/j})|\mathcal{F}_s] + \frac{1}{t-s}\xi_s$, where $|\xi_s|\le O(1)E|\epsilon_j g(\epsilon_j)|\int|U(x)|dx$ almost surely; meanwhile, $E[|U(x_t)g(\epsilon_j)|\,|\mathcal{F}_s]\le O(1)\frac{1}{\sqrt{t-s}}E|g(\epsilon_j)|\int|U(w)|dw$ a.s.

Lemma A.3. (1) (i) $\sup_x\|Z_p(x)\|^2 = O(1)p$; (ii) $\int\|Z_p(x)\|^2dx = p$; (iii) $\int\|Z_p(x)\|dx = O(1)p^{11/12}$; (iv) $\int x^2F_i^2(x)dx = (2i+1)/2$.

(2) Let Assumption 2 hold. Then (i) $\sup_x|\gamma_p(x)| = o\big(p^{-(r-1)/2-1/12}\big)$; (ii) $\int\gamma_p^2(x)dx = o(1)p^{-r}$.

Lemma A.4. $Z_p^\tau(x)Z_p(y)\to\delta(x-y)$ as $p\to\infty$, where $\delta(u)$ is the Dirac delta function. The Dirac delta function $\delta(u)$ is a generalised function satisfying $\delta(u)=0$ for any $u\ne 0$ and $\int\delta(u)du = 1$; see Kanwal (1983, p. 5).

Lemma A.5. Let $m_T$ be a sequence such that $m_T\to\infty$ and $m_T/T\to 0$ as $T\to\infty$. Let $\{a_j\}$ be any sequence of nonnegative real numbers satisfying $\sum_{i=m_T}^T a_i = 1$.

(1) For any $s\ge 1$ and $t\ge m_T$, $e_s$ and $d_t^{-1}x_t$ are asymptotically independent. Consequently, for any given $k$ and $t\ge m_T$, $\epsilon_k$ and $d_t^{-1}x_t$ are asymptotically independent.

(2) For any $s\ge 1$, $e_s$ and $\sum_{j=m_T}^T a_j d_j^{-1}x_j$ are asymptotically independent.

B Proofs of the main results

This appendix gives the proofs of Lemma 3.1, Theorem 3.1 and Corollary 3.1.

Proof of Lemma 3.1. Notice by (2.6) and Theorem 3.2 that
$$\hat\theta - \theta = (Z^\tau Z)^{-1}Z^\tau Y - \theta = (Z^\tau Z)^{-1}Z^\tau(e+\gamma) = (1+o_P(1))L_B^{-1}(1,0)\frac{d_T}{T}Z^\tau(e+\gamma) := (1+o_P(1))L_B^{-1}(1,0)(A_{1T}+A_{2T}),$$
where we ignore the small-order term "$o_P(1)$" for notational simplicity in the rest of the derivations; we also use "$O(1)$" to denote any finite positive constant. Observe that
$$\|A_{1T}\|^2 = \frac{d_T^2}{T^2}\Big\|\sum_{t=1}^T Z_p(x_t)e_t\Big\|^2 = \frac{d_T^2}{T^2}\sum_{t=1}^T\|Z_p(x_t)\|^2e_t^2 + 2\frac{d_T^2}{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}Z_p^\tau(x_t)Z_p(x_s)e_te_s$$
$$= \sigma_e^2\frac{d_T^2}{T^2}\sum_{t=1}^T\|Z_p(x_t)\|^2 + \frac{d_T^2}{T^2}\sum_{t=1}^T\|Z_p(x_t)\|^2(e_t^2-\sigma_e^2) + 2\frac{d_T^2}{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}Z_p^\tau(x_t)Z_p(x_s)e_te_s := A_{11T}+A_{12T}+2A_{13T},$$
say. Using the density $f_t(x)$ of $d_t^{-1}x_t$ and its uniform boundedness, we have
$$\frac{d_T^2}{T^2}\sum_{t=1}^T E\|Z_p(x_t)\|^2 = \frac{d_T^2}{T^2}\sum_{t=1}^T d_t^{-1}\int\|Z_p(x)\|^2 f_t(d_t^{-1}x)dx \le O(1)\frac1T\sum_{t=1}^T d_t^{-1}\int\|Z_p(x)\|^2dx = O(1)T^{-1/2}p,$$
implying that $A_{11T} = O_P(T^{-1/2}p)$. For the second term $A_{12T}$, notice that $e_t^2-\sigma_e^2 = \sum_{j=0}^\infty\phi_j^2(\epsilon_{t-j}^2-1) + \sum_{j_1\ne j_2}^\infty\phi_{j_1}\phi_{j_2}\epsilon_{t-j_1}\epsilon_{t-j_2}$. Moreover, by (1) and (3) of Lemma A.2 and a conditioning argument,
$$|E[A_{12T}]| \le \frac{d_T^2}{T^2}\sum_{t=1}^T\sum_{j=-\infty}^t\phi_{t-j}^2\big|E[\|Z_p(x_t)\|^2(\epsilon_j^2-1)]\big| + \frac{d_T^2}{T^2}\sum_{t=1}^T\sum_{j_1=-\infty}^t\sum_{j_2=-\infty}^{j_1-1}\big|\phi_{t-j_1}\phi_{t-j_2}E[\|Z_p(x_t)\|^2\epsilon_{j_1}\epsilon_{j_2}]\big|$$
$$\le O(1)\frac1T\sum_{t=1}^T d_t^{-2}\sum_{j=-\infty}^t\phi_{t-j}^2\int\|Z_p(x)\|^2dx + O(1)\frac1T\sum_{t=1}^T d_t^{-2}\sum_{j_1=-\infty}^t\sum_{j_2=-\infty}^{j_1-1}|\phi_{t-j_1}\phi_{t-j_2}|\int\|Z_p(x)\|^2dx = O(1)\frac1T\,p\ln(T).$$
For $A_{13T}$, notice that, for $t>s$,
$$e_te_s = e_s\sum_{j=s+1}^t\phi_{t-j}\epsilon_j + \sum_{j=-\infty}^s\phi_{s-j}\phi_{t-j}\epsilon_j^2 + \sum_{j_1=-\infty}^s\sum_{j_2=-\infty}^{j_1-1}\phi_{s-j_2}\phi_{t-j_1}\epsilon_{j_1}\epsilon_{j_2}.$$
Meanwhile, we only need to tackle the terms in $A_{13T}$ with $t-s>m_T$, where $m_T$ satisfies $m_T^4/T\to0$ and $m_T\to\infty$ as $T\to\infty$, because the remaining terms can be made as small as we wish. Moreover, since the probability that $x_t=x_s$ is zero, we exclude these regions in the calculation of expectations below; normally the exclusion makes no difference, but because of Lemma A.4 here it really matters. Thus, it follows from (3) and (4) of Lemma A.2 that
$$|E[A_{13T}]| \le \frac{d_T^2}{T^2}\sum_{t=m_T+1}^T\sum_{s=1}^{t-m_T}\sum_{j=s+1}^t|\phi_{t-j}|\,\big|E[Z_p^\tau(x_t)Z_p(x_s)e_s\epsilon_j]\big| + \frac{d_T^2}{T^2}\sum_{t=m_T+1}^T\sum_{s=1}^{t-m_T}\sum_{j=-\infty}^s|\phi_{s-j}\phi_{t-j}|\,\big|E[Z_p^\tau(x_t)Z_p(x_s)\epsilon_j^2]\big|$$
$$+ \frac{d_T^2}{T^2}\sum_{t=m_T+1}^T\sum_{s=1}^{t-m_T}\sum_{j_1=-\infty}^s\sum_{j_2=-\infty}^{j_1-1}|\phi_{s-j_2}\phi_{t-j_1}|\,\big|E[Z_p^\tau(x_t)Z_p(x_s)\epsilon_{j_1}\epsilon_{j_2}]\big|$$
$$\le O(1)\frac1T\sum_{t=m_T+1}^T\sum_{s=1}^{t-m_T}d_{ts}^{-2}d_s^{-1}\sum_{j=s+1}^t|\phi_{t-j}|\iint_{x\ne y}|Z_p^\tau(x)Z_p(y)|dxdy + O(1)\frac1T\sum_{t=m_T+1}^T\sum_{s=1}^{t-m_T}d_{ts}^{-1}d_s^{-1}\sum_{j=-\infty}^s|\phi_{s-j}\phi_{t-j}|\iint_{x\ne y}|Z_p^\tau(x)Z_p(y)|dxdy$$
$$+ O(1)\frac1T\sum_{t=m_T+1}^T\sum_{s=1}^{t-m_T}d_{ts}^{-1}d_s^{-1}\sum_{j_1=-\infty}^s\sum_{j_2=-\infty}^{j_1-1}|\phi_{s-j_2}\phi_{t-j_1}|\iint_{x\ne y}|Z_p^\tau(x)Z_p(y)|dxdy$$
$$\le o(1)\frac1T\sqrt T + o(1)\frac1T\sum_{t=m_T+1}^T\sum_{s=1}^{t-m_T}d_{ts}^{-1}d_s^{-1}(t-s)^{-1} = o\big(T^{-1/2}\big),$$
where we have used Assumption 1, namely $\sum_j j|\phi_j|<\infty$, to derive $\sum_{j\le s}|\phi_{t-j}| = O(1)(t-s)^{-1}$, and Lemma A.4 to conclude that the double integral is $o(1)$ by virtue of the Dirac function. These calculations give $A_{1T} = O_P(1)T^{-1/4}p^{1/2}$.

Next, consider the term $A_{2T}$. Note that
$$E\|A_{2T}\| \le \frac{d_T}{T}\sum_{t=1}^T E\|Z_p(x_t)\gamma_p(x_t)\| \le O(1)\frac{d_T}{T}\sum_{t=1}^T d_t^{-1}\int\|Z_p(x)\gamma_p(x)\|dx \le O(1)\Big(\int\|Z_p(x)\|^2dx\Big)^{1/2}\Big(\int|\gamma_p(x)|^2dx\Big)^{1/2} = O(1)p^{1/2}\|\gamma_p(x)\|_{L^2},$$
in view of Lemma A.3. The assertion for $\hat\theta$ follows; precisely, $\|\hat\theta-\theta\| = O_P(1)p^{1/2}\max(T^{-1/4}, \|\gamma_p(x)\|_{L^2})$. For the second part, by the result for $\hat\theta-\theta$,
$$\sup_x|\hat f(x)-f(x)| \le \sup_x|Z_p^\tau(x)(\hat\theta-\theta)| + \sup_x|\gamma_p(x)| \le \sup_x\|Z_p(x)\|\|\hat\theta-\theta\| + \sup_x|\gamma_p(x)| = O_P(1)p\max(T^{-1/4},\|\gamma_p(x)\|_{L^2}) + \sup_x|\gamma_p(x)| = o_P(1). \qquad\Box$$

Proof of Theorem 3.1. This proof consists of two parts, Part One and Part Two, which establish (3.1) and (3.2), respectively.

Part One. Notice by Theorem 3.2 that
$$\hat f(x)-f(x) = Z_p^\tau(x)(\hat\theta-\theta)-\gamma_p(x) = Z_p^\tau(x)(Z^\tau Z)^{-1}Z^\tau(\gamma+e)-\gamma_p(x) = \frac{d_T}{T}L_B^{-1}(1,0)Z_p^\tau(x)Z^\tau(\gamma+e)(1+o_P(1))-\gamma_p(x)$$
$$= L_B^{-1}(1,0)\frac{d_T}{T}\sum_{t=1}^T Z_p^\tau(x)Z_p(x_t)e_t + \frac{d_T}{T}L_B^{-1}(1,0)Z_p^\tau(x)Z^\tau\gamma - \gamma_p(x),$$
where $Z_p^\tau(x)Z_p(x_t) = \sum_{i=0}^{p-1}F_i(x)F_i(x_t)$ and we have replaced "$1+o_P(1)$" by 1. It follows that
$$\frac{T}{d_T}L_B(1,0)[\hat f(x)-f(x)] = \sum_{t=1}^T Z_p^\tau(x)Z_p(x_t)e_t + Z_p^\tau(x)Z^\tau\gamma - \frac{T}{d_T}L_B(1,0)\gamma_p(x) := A_{1T}+A_{2T}-A_{3T}, \qquad (B.1)$$
say. Moreover, choose $m_T\to\infty$ with $\frac{p\,m_T}{\sqrt T}\to0$. Similarly to the evaluation of $A''_{1T}$ below, we may show that
$$A_{1T} = \sum_{t=1}^{m_T-1}Z_p^\tau(x)Z_p(x_t)e_t + \sum_{t=m_T}^T Z_p^\tau(x)Z_p(x_t)e_t = (1+o_P(1))\sum_{t=m_T}^T Z_p^\tau(x)Z_p(x_t)e_t.$$
Denote $A_{1T}^* = \sum_{t=m_T}^T Z_p^\tau(x)Z_p(x_t)e_t$ for convenience, and let $B_T = B_T(x; x_{m_T},\cdots,x_T) = \big[\sum_{t=m_T}^T(Z_p^\tau(x)Z_p(x_t))^2\big]^{1/2}$. We then show that
$$B_T^{-1}A_{1T}\to_D N(0,\sigma_e^2) \qquad\text{and}\qquad B_T^{-1}A_{iT}\to_P 0 \qquad (B.2)$$
for $i=2,3$, where $\sigma_e^2 = E[e_1^2]$.

We start with the first part of (B.2); it suffices to show that $B_T^{-1}A_{1T}^*\to_D N(0,\sigma_e^2)$ as $T\to\infty$. As shown in Lemma A.5, for any $s\ge1$ and $t\ge m_T$, $e_s$ and $d_t^{-1}x_t$ are asymptotically independent. Therefore, the dominated convergence theorem implies that, as $T\to\infty$,
$$\big|P(B_T^{-1}A_{1T}^*<u)-\Phi(u)\big| \le E\big|E[I(B_T^{-1}A_{1T}^*<u)\mid d_{m_T}^{-1}x_{m_T},\cdots,d_T^{-1}x_T]-\Phi(u)\big|\to0, \qquad (B.3)$$
provided that, for any $u\in\mathbb{R}$,
$$\big|P(B_T^{-1}A_{1T}^*<u\mid d_{m_T}^{-1}x_{m_T},\cdots,d_T^{-1}x_T)-\Phi(u)\big|\to_P0 \qquad (B.4)$$
as $T\to\infty$, where $\Phi(u)$ is the distribution function of a standard normal random variable.

Recall that $e_t = \sum_{j=0}^\infty\phi_j\epsilon_{t-j}$. Notice that
$$A_{1T}^* = \sum_{t=m_T}^T Z_p^\tau(x)Z_p(x_t)\sum_{j=-\infty}^t\phi_{t-j}\epsilon_j = \sum_{j=m_T}^T\Big(\sum_{t=j}^T Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\Big)\epsilon_j + \sum_{j=-\infty}^{m_T-1}\Big(\sum_{t=\max(m_T,j)}^T Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\Big)\epsilon_j = A'_{1T}+A''_{1T},$$
say. Let $D_T := D_T(x; x_{m_T},\cdots,x_T) = \big[\sum_{j=m_T}^T\big(\sum_{t=j}^T Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\big)^2\big]^{1/2}$. We then have
$$\frac{T}{d_T}D_T^{-1}L_B(1,0)[\hat f(x)-f(x)] = D_T^{-1}A_{1T}^*(1+o_P(1)) + D_T^{-1}A_{2T} - D_T^{-1}A_{3T} = D_T^{-1}A'_{1T} + D_T^{-1}A''_{1T} + D_T^{-1}A_{2T} - D_T^{-1}A_{3T}.$$
Denote $D_T^{-1}A'_{1T} = \sum_{j=m_T}^T v_{jT}\epsilon_j$, in which $v_{jT} = D_T^{-1}\sum_{t=j}^T Z_p^\tau(x)Z_p(x_t)\phi_{t-j}$. As shown in the derivation of $D_T$ below, $D_T^2 = B_T^2\sigma_e^2(1+o_P(1))$. In order to prove (B.4), in view of Lemma A.5 ($\epsilon_j$ and $d_t^{-1}x_t$ are asymptotically independent for any $j\ge m_T$ and $t\ge m_T$), it suffices to show that, for $u\in\mathbb{R}$ and as $T\to\infty$,
$$\big|P(D_T^{-1}A'_{1T}<u\mid d_{m_T}^{-1}x_{m_T},\cdots,d_T^{-1}x_T)-\Phi(u)\big|\to_P0. \qquad (B.5)$$

We now employ Lemma 1 of Robinson (1997) to prove (B.5). Condition (2.2) of that lemma is satisfied automatically because $\sum_j v_{jT}^2 = 1$; hence what we need to show is that condition (2.3) is fulfilled, i.e., $\lim_{T\to\infty}\max_{1\le j\le T}|v_{jT}| = 0$ in probability.

To begin, note that
$$D_T^2 = \sum_{j=m_T}^T\Big(\sum_{t=j}^T Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\Big)^2 = \sum_{j=m_T}^T\sum_{t=j}^T\big(Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\big)^2 + 2\sum_{j=m_T}^T\sum_{t=j+1}^T\sum_{s=j}^{t-1}Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\,Z_p^\tau(x)Z_p(x_s)\phi_{s-j} =: D_{1T}+D_{2T}.$$
The leading term of $D_T^2$ is $D_{1T}$. Indeed,
$$D_{1T} = \sum_{t=m_T}^T[Z_p^\tau(x)Z_p(x_t)]^2\sum_{j=m_T}^t\phi_{t-j}^2 = \sum_{t=m_T}^T[Z_p^\tau(x)Z_p(x_t)]^2\sum_{j=0}^{t-m_T}\phi_j^2 = \sigma_e^2(1+o(1))\sum_{t=m_T}^T[Z_p^\tau(x)Z_p(x_t)]^2 = \sigma_e^2(1+o(1))B_T^2,$$
where $\sum_{j=0}^{t-m_T}\phi_j^2 = \sigma_e^2(1+o(1))$ uniformly in $t\ge m_T+m_T^q$ for any $0<q<1$. Notice further that, using the densities of $d_t^{-1}x_t$ in Lemma A.1,
$$E[B_T^2] = \sum_{t=m_T}^T E[Z_p^\tau(x)Z_p(x_t)]^2 = \sum_{t=m_T}^T d_t^{-1}\int[Z_p^\tau(x)Z_p(y)]^2f_t(d_t^{-1}y)dy$$
$$= \phi(0)\sum_{t=m_T}^T d_t^{-1}\int[Z_p^\tau(x)Z_p(y)]^2dy + \sum_{t=m_T}^T d_t^{-1}\int[Z_p^\tau(x)Z_p(y)]^2\big[f_t(d_t^{-1}y)-f_t(0)+f_t(0)-\phi(0)\big]dy$$
$$= C\sqrt T\,Z_p^\tau(x)\Big(\int Z_p(y)Z_p^\tau(y)dy\Big)Z_p(x)(1+o(1)) = O(1)\sqrt T\,\|Z_p(x)\|^2 = O(1)\sqrt T\,p,$$
where we have used the orthogonality of the basis, $\int Z_p(y)Z_p^\tau(y)dy = I_p$, as well as the Lipschitz condition for $f_t(\cdot)$ and the uniform approximation $\sup_x|f_t(x)-\phi(x)|\to0$ of Lemma A.1.

For $D_{2T}$, using the densities of $d_{ts}^{-1}x_{ts}$ and $d_s^{-1}x_s$ in Lemma A.1, we have
$$E|D_{2T}| = E\Big|\sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}Z_p^\tau(x)Z_p(x_t)\,Z_p^\tau(x)Z_p(x_s)\sum_{j=0}^{s-1}\phi_{t-s+j}\phi_j\Big| \le \sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}E|Z_p^\tau(x)Z_p(x_t)\,Z_p^\tau(x)Z_p(x_s)|\sum_{j=0}^{s-1}|\phi_{t-s+j}\phi_j|$$
$$\le O(1)\sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}d_{ts}^{-1}d_s^{-1}\iint|Z_p^\tau(x)Z_p(y)\,Z_p^\tau(x)Z_p(z)|dydz\sum_{j=0}^{s-1}|\phi_{t-s+j}\phi_j| \le O(1)\sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}(t-s)^{-3/2}s^{-1/2}\Big(\int|Z_p^\tau(x)Z_p(y)|dy\Big)^2 = O(1)\sqrt T,$$
where we have used $\int|Z_p^\tau(x)Z_p(y)|dy\to\int\delta(x-y)dy = 1$ as $p\to\infty$ by Lemma A.4, and the convergence of $\sum_k k|\phi_k|<\infty$.

Thus $D_T^2 = O_P(pT^{1/2})$. In order to prove $\lim_{T\to\infty}\max_{m_T\le j\le T}|v_{jT}| = 0$ in probability, it therefore suffices to show that $p^{-1/2}T^{-1/4}\max_{m_T\le j\le T}\big|\sum_{t=j}^T Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\big| = o_P(1)$. In fact,
$$p^{-1/2}T^{-1/4}\max_{m_T\le j\le T}\Big|\sum_{t=j}^T Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\Big| \le p^{-1/2}T^{-1/4}\max_{m_T\le j\le T}\sum_{t=j}^T\|Z_p(x)\|\|Z_p(x_t)\||\phi_{t-j}| = O(1)p^{1/2}T^{-1/4}\sum_{j=0}^\infty|\phi_j| = o(1)$$
almost surely, by Assumption 3. Therefore (B.5) holds.

In what follows, we prove that $p^{-1/2}T^{-1/4}A''_{1T} = o_P(1)$, $p^{-1/2}T^{-1/4}A_{2T} = o_P(1)$ and $p^{-1/2}T^{-1/4}A_{3T} = o_P(1)$, in view of $D_T = O_P(p^{1/2}T^{1/4})$. Indeed, expanding the square and applying Lemma A.2 to each term,
$$p^{-1}T^{-1/2}E[A''_{1T}]^2 = T^{-1/2}p^{-1}E\sum_{j=-\infty}^{m_T-1}\sum_{t=m_T}^T[Z_p^\tau(x)Z_p(x_t)]^2\phi_{t-j}^2\epsilon_j^2 + 4T^{-1/2}p^{-1}E\sum_{j=-\infty}^{m_T-1}\sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\,Z_p^\tau(x)Z_p(x_s)\phi_{s-j}\,\epsilon_j^2$$
$$+ 2T^{-1/2}p^{-1}E\sum_{j=-\infty}^{m_T-1}\sum_{i=-\infty}^{j-1}\sum_{t=m_T}^T[Z_p^\tau(x)Z_p(x_t)]^2\phi_{t-j}\phi_{t-i}\,\epsilon_j\epsilon_i + 4T^{-1/2}p^{-1}E\sum_{j=-\infty}^{m_T-1}\sum_{i=-\infty}^{j-1}\sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}Z_p^\tau(x)Z_p(x_t)\phi_{t-j}\,Z_p^\tau(x)Z_p(x_s)\phi_{s-i}\,\epsilon_j\epsilon_i$$
$$\le T^{-1/2}p^{-1}\sum_{t=m_T}^T\frac1{\sqrt t}\int[Z_p^\tau(x)Z_p(y)]^2dy\sum_{j=-\infty}^{m_T-1}\phi_{t-j}^2 + 4T^{-1/2}p^{-1}\sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}\frac1{\sqrt{t-s}}\frac1{\sqrt s}\Big(\int|Z_p^\tau(x)Z_p(y)|dy\Big)\Big(\int|Z_p^\tau(x)Z_p(z)|dz\Big)\sum_{j=-\infty}^{m_T-1}|\phi_{s-j}\phi_{t-j}|$$
$$+ 2T^{-1/2}p^{-1}\sum_{t=m_T}^T\frac1{\sqrt t}\int[Z_p^\tau(x)Z_p(y)]^2dy\sum_{j=-\infty}^{m_T-1}\sum_{i=-\infty}^{j-1}|\phi_{t-j}\phi_{t-i}| + 4T^{-1/2}p^{-1}\sum_{t=m_T+1}^T\sum_{s=m_T}^{t-1}\frac1{\sqrt{t-s}}\frac1{\sqrt s}\Big(\int|Z_p^\tau(x)Z_p(y)|dy\Big)^2\sum_{j=m_T}^\infty\sum_{i=j+1}^\infty|\phi_{t+j}\phi_{s+i}|$$
$$\le O(1)T^{-1/2}p\sum_{j=m_T}^\infty\phi_j^2 + O(1)T^{-1/2}p^{11/6} + O(1)T^{-1/2}p\,\frac1T\sum_{t=1}^T\frac1{\sqrt t} + O(1)T^{-1/2}p^{11/6} \le O(1)T^{-1/2}p + O(1)T^{-1/2}p^{11/6} = o(1),$$
where we have used $\sum_j j|\phi_j|<\infty$ and the restriction on $p$ in Assumption 3. Moreover,
$$T^{-1/4}p^{-1/2}E|A_{2T}| = T^{-1/4}p^{-1/2}E\Big|\sum_{t=m_T}^T Z_p^\tau(x)Z_p(x_t)\gamma_p(x_t)\Big| \le O(1)T^{-1/4}\sum_{t=m_T}^T E\|Z_p(x_t)\||\gamma_p(x_t)| \le O(1)T^{-1/4}\sum_{t=m_T}^T\frac1{\sqrt t}\int\|Z_p(y)\||\gamma_p(y)|dy$$
$$\le O(1)T^{1/4}\Big(\int\|Z_p(y)\|^2dy\Big)^{1/2}\Big(\int|\gamma_p(y)|^2dy\Big)^{1/2} = o(1)T^{1/4}(p\cdot p^{-r})^{1/2} = o(1)T^{1/4-(r-1)\alpha/2} = o(1),$$
again by Assumption 3. Finally, for $A_{3T}$, note that
$$T^{1/4}p^{-1/2}|\gamma_p(x)| = o(1)T^{1/4}p^{-1/2}p^{-(r-1)/2-1/12} = o(1)T^{1/4-r\alpha/2-\alpha/12} = o(1)$$
by Lemma A.3. The proof of (3.1) is completed.

Part Two. To establish the limit in (3.2), it suffices to derive the limit of $\|\hat f(x)-f(x)\|^2_{L^2}$, due to the continuous mapping theorem. Notice that
$$\|\hat f(x)-f(x)\|^2_{L^2} = \int[Z_p^\tau(x)(\hat\theta-\theta)-\gamma_p(x)]^2dx = \int[Z_p^\tau(x)(\hat\theta-\theta)]^2dx + \int\gamma_p^2(x)dx - 2(\hat\theta-\theta)^\tau\int Z_p(x)\gamma_p(x)dx = \|\hat\theta-\theta\|^2 + \|\gamma_p(x)\|^2_{L^2},$$
because of the orthogonality of the Hermite function sequence. It follows from a Taylor expansion and Theorem 3.2 that
$$\|\hat\theta-\theta\|^2 = (e+\gamma)^\tau Z(Z^\tau Z)^{-2}Z^\tau(e+\gamma) = \frac{d_T^2}{T^2}(e+\gamma)^\tau Z\Big(L_B(1,0)I_p + \Big(\frac{d_T}{T}Z^\tau Z-L_B(1,0)I_p\Big)\Big)^{-2}Z^\tau(e+\gamma)$$
$$= \frac{d_T^2}{T^2}(e+\gamma)^\tau Z\Big(L_B(1,0)I_p + O_P(1)\Big\|\frac{d_T}{T}Z^\tau Z-L_B(1,0)I_p\Big\|\Big)^{-2}Z^\tau(e+\gamma) = L_B^{-2}(1,0)\frac{d_T^2}{T^2}(e+\gamma)^\tau ZZ^\tau(e+\gamma)(1+o_P(1))$$
$$= L_B^{-2}(1,0)\frac{d_T^2}{T^2}\big[e^\tau ZZ^\tau e + \gamma^\tau ZZ^\tau\gamma + 2e^\tau ZZ^\tau\gamma\big].$$
Rescaling gives
$$\frac{T}{d_Tp}\|\hat f(x)-f(x)\|^2_{L^2} = L_B^{-2}(1,0)\frac{d_T}{Tp}\big[e^\tau ZZ^\tau e + \gamma^\tau ZZ^\tau\gamma + 2e^\tau ZZ^\tau\gamma\big] + \frac{T}{d_Tp}\|\gamma_p(x)\|^2_{L^2}.$$
Noticing $e^\tau ZZ^\tau e = \sum_{t=1}^T\|Z_p(x_t)\|^2e_t^2 + 2\sum_{t=2}^T\sum_{s=1}^{t-1}Z_p^\tau(x_t)Z_p(x_s)e_te_s$, and using exactly the same arguments as for $L_{1n}$ in the proof of Theorem 3.1 of Dong and Gao (2014), we have
$$\frac{d_T}{Tp}\sum_{t=1}^T\|Z_p(x_t)\|^2e_t^2\to_D\sigma_e^2L_B(1,0) \qquad\text{and}\qquad \frac{d_T}{Tp}\sum_{t=2}^T\sum_{s=1}^{t-1}Z_p^\tau(x_t)Z_p(x_s)e_te_s = o_P(1)$$
as $T\to\infty$. To complete the proof of (3.2), by virtue of the Cauchy-Schwarz inequality we only need to show that
$$\frac{d_T}{Tp}\gamma^\tau ZZ^\tau\gamma = o_P(1) \qquad\text{and}\qquad \frac{T}{d_Tp}\|\gamma_p(x)\|^2_{L^2} = o(1).$$
In fact, by (2) of Lemma A.3, $\frac{T}{d_Tp}\|\gamma_p(x)\|^2_{L^2} = o(1)T^{1/2}p^{-1}p^{-r} = o(1)T^{1/2-(1+r)\alpha} = o(1)$, due to Assumption 3. Moreover, using the densities in Lemma A.1, we have
$$\frac{d_T}{Tp}E[\gamma^\tau ZZ^\tau\gamma] = \frac{d_T}{Tp}\sum_{t=1}^TE[\|Z_p(x_t)\|^2\gamma_p^2(x_t)] + 2\frac{d_T}{Tp}\sum_{t=2}^T\sum_{s=1}^{t-1}E[Z_p^\tau(x_t)Z_p(x_s)\gamma_p(x_t)\gamma_p(x_s)]$$
$$\le O(1)\frac{d_T}{Tp}\sum_{t=1}^T\frac1{\sqrt t}\int\|Z_p(x)\|^2\gamma_p^2(x)dx + O(1)\frac{d_T}{Tp}\sum_{t=2}^T\sum_{s=1}^{t-1}\frac1{\sqrt s}\frac1{\sqrt{t-s}}\iint|Z_p^\tau(x)Z_p(y)\gamma_p(x)\gamma_p(y)|dxdy$$
$$\le O(1)\int\gamma_p^2(x)dx + O(1)T^{1/2}p^{-1}\Big(\int\|Z_p(x)\||\gamma_p(x)|dx\Big)^2 \le o(1) + O(1)T^{1/2}p^{-1}\int\|Z_p(x)\|^2dx\int|\gamma_p(x)|^2dx = O(1)T^{1/2-r\alpha} = o(1),$$
in view of Lemma A.3 and, again, the Cauchy-Schwarz inequality. $\qquad\Box$

Proof of Corollary 3.1. We first show that $\hat\sigma_e^2\to_P\sigma_e^2$ as $T\to\infty$. Note that
$$\hat\sigma_e^2 = \frac1T\sum_{t=1}^T\big(e_t+f(x_t)-\hat f(x_t)\big)^2 = \frac1T\sum_{t=1}^Te_t^2 + \frac1T\sum_{t=1}^T\big(f(x_t)-\hat f(x_t)\big)^2 + \frac2T\sum_{t=1}^Te_t\big(f(x_t)-\hat f(x_t)\big).$$
To begin with, we show that $\frac1T\sum_{t=1}^Te_t^2\to_P\sigma_e^2$. Recall that $\sigma_e^2 = Ee_t^2 = \sum_{j=0}^\infty\phi_j^2$. By the independence of $\{\epsilon_i\}$, we obtain
$$E\Big(\frac1T\sum_{t=1}^Te_t^2-\sigma_e^2\Big)^2 = \frac1{T^2}\sum_{t=1}^TE(e_t^2-\sigma_e^2)^2 + \frac2{T^2}E\sum_{t=2}^T\sum_{s=1}^{t-1}(e_t^2-\sigma_e^2)(e_s^2-\sigma_e^2)$$
$$= \frac1TVar(e_1^2) + \frac2{T^2}E\sum_{t=2}^T\sum_{s=1}^{t-1}\Big(\sum_{j=-\infty}^s\phi_{t-j}^2(\epsilon_j^2-1)\Big)\Big(\sum_{\ell=-\infty}^s\phi_{s-\ell}^2(\epsilon_\ell^2-1)\Big) + \frac2{T^2}E\sum_{t=2}^T\sum_{s=1}^{t-1}\Big(\sum_{j=-\infty}^s\sum_{j_1=-\infty,\ne j}^s\phi_{t-j}\phi_{t-j_1}\epsilon_j\epsilon_{j_1}\Big)\Big(\sum_{\ell=-\infty}^s\sum_{\ell_1=-\infty,\ne\ell}^s\phi_{s-\ell}\phi_{s-\ell_1}\epsilon_\ell\epsilon_{\ell_1}\Big)$$
$$= \frac1TVar(e_1^2) + \frac2{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}\sum_{j=-\infty}^s\phi_{t-j}^2\phi_{s-j}^2E(\epsilon_j^2-1)^2 + \frac4{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}\sum_{j=-\infty}^{s-1}\sum_{j_1=j+1}^s\phi_{t-j}\phi_{t-j_1}\phi_{s-j}\phi_{s-j_1}E\epsilon_j^2\epsilon_{j_1}^2$$
$$\le O(1)\frac1{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}\sum_{j=-\infty}^s\phi_{t-j}^2 + O(1)\frac1{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}\sum_{j=-\infty}^{s-1}\sum_{j_1=j+1}^s|\phi_{t-j}\phi_{t-j_1}| = O(1)\frac1{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}(t-s)^{-2} = O(1)\frac{T}{T^2} = o(1),$$
by virtue of $\sum_j j|\phi_j|<\infty$.

Moreover, by Lemma 3.1, $\sup_x|f(x)-\hat f(x)| = o_P(1)$, implying $|f(x_t)-\hat f(x_t)| = o_P(1)$ uniformly in $t$. Thus the second term is $o_P(1)$, and so is the third one. The other two assertions are trivially valid in view of Theorem 3.1. $\qquad\Box$

Appendix C: Proofs of Lemmas A.1–A.5 The related notation is rephrased. Without loss of generality, in what follows let x0 = 0 almost surely. It follows that xt =

t X `=1

v` =

t ` X X

ψ`−i i =

`=1 i=−∞

t X





t X

ψ`−i  i =:

 i=−∞

t X

bt,i i .

i=−∞

`=max(1,i)

Let j ≤ t be fixed. Thus we have xt = bt,j j + xt/j ,

with xt/j :=

t X

bt,i i ,

i=−∞,6=j

where xt/j is the variable deducting the term containing j in xt . Obviously, xt/j and j are mutually independent. Additionally, letting 1 ≤ s < j ≤ t, xt also has the following decomposition: xt =x∗s + xts = x∗s + bt,j j + xts/j , P P where x∗s = xs + x ¯s with x ¯s = ti=s+1 sa=−∞ ψi−a a containing all information available up to s and P P xts = ti=s+1 bt,i i , while obviously xts/j = ti=s+1,6=j bt,i i . Evidently, xts captures all information containing in xt on the time periods (s, t], while xts/j captures all information containing in xt on the time periods (s, j) ∪ (j, t]. Let dts := (Ex2ts )1/2 for late use. Moreover, x ¯s = OP (1) by virtue of Assumption 1. Lemma A.1. Suppose that Assumption 1 holds. For t or t − s is large,

22

(1) d−1 t xt have uniformly bounded densities ft (x) over all t and x satisfying a uniform Lipschitz condition supx |ft (x + y) − ft (x)| ≤ C|y| for any y and some constant C > 0. In addition, supx |ft (x) − φ(x)| → 0 as t → ∞ where φ(x) is the standard normal density function. Let 1 ≤ s < t. d−1 ts xts have uniformly bounded densities fts (x) over all (t, s) and x satisfying the above uniform Lipschitz condition as well. (2) Let j ≤ t. d−1 t xt/j have uniformly bounded densities ft/j (x) over all (t, j) and x satisfying uniform Lipschitz condition in the above form. Let 1 ≤ s < j ≤ t. d−1 ts xts/j have uniformly bounded densities fts/j (x) over all (t, j) and (s, x) satisfying the above uniform Lipschitz condition as well. Proof of Lemma A.1: We shall prove the assertion about d−1 t xt only. All the other claims follow in the same fashion. R

|λϕ(λ)|dλ < ∞. Let

x− t ,

where x+ t includes all

Denote by ϕ(λ) the characteristic function of 0 . Under Assumption 1, Φt (α) be the characteristic function of

d−1 t xt

for α ∈ R. Denote xt =

x+ t

+

j with j > 0 in xt , while x− t includes all j with j ≤ 0 in xt . It follows that Z Z Z −1 + |α||Φt (α)|dα = |α||E exp(iαdt xt )|dα ≤ |α||E exp(iαd−1 t xt )|dα    Y Z Z t t X   −1 −1     = |α| E exp i αdt bt,j j E exp iαdt bt,j j dα dα = |α| j=1 j=1 Z t Y ϕ(αd−1 bt,j ) dα. = |α| t j=1

It is clear that there exists a δ0 > 0 such that |ϕ(λ)| < e−|λ|

2 /4

whenever |λ| ≤ δ0 and |ϕ(λ)| < η

if |λ| > δ0 for some 0 < η < 1 (Wang and Phillips, 2009a, p. 730). Note also that bt,j = ψ0 + · · · + ψt−j . If t − j is large, bt,j = ψ(1 + o(1)) where ψ =

P

j

ψj 6= 0.

Let ν = νt be a function of t such that ν → ∞ and ν/t → 0 as t → ∞. Thus, for 1 ≤ j ≤ t − ν, there exist constants c1 , c2 such that 0 < c1 < c2 < ∞ and c1 < |bt,j | < c2 . Indeed, we may take c1 = |ψ|/2 and c2 = 3|ψ|/2. Therefore, letting δ = δ0 /c2 , Z

Z t t−ν Y Y −1 ϕ(αd−1 bt,j ) dα |α| ϕ(αdt bt,j ) dα ≤ |α| t j=1

j=1

Z

!

Z

=

|α|

+ |α|≤dt δ

Z ≤

|α|e

|α|>dt δ −α2 d−2 t

t−ν Y

ϕ(αd−1 bt,j ) dα t

j=1

Pt−ν

2 j=1 bt,j /4

dα + η t−ν−1

|α|≤dt δ

|α|>dt δ

Z ≤

|α|e

−α2 c

1 (1−ν/t)/4

|α|≤dt δ

Z ≤

−α2 c1 /4

|α|e

Z

dα +

dα +

2 t−ν−1 b−2 t,1 dt η

2 t−ν−1 b−2 t,1 dt η

 |α| ϕ αd−1 t bt,1 dα

Z |α| |ϕ(α)| dα |α|>δ

Z |α| |ϕ (α)| dα < ∞,

where we have used the fact that d2t η t−ν−1 → 0 and bt,1 → ψ 6= 0 as t → ∞. The integrability of |Φt (α)| implies the uniform boundedness of the densities ft (x) due to the inverse formula. Similarly, the integrability of |α||Φt (α)| gives the uniform boundedness of the derivative of

23

$f_t(x)$. As a matter of fact, we have
\[
\Big|\frac{d}{dx}f_t(x)\Big| = \Big|\frac{1}{2\pi}\frac{d}{dx}\int e^{-i\alpha x}\Phi_t(\alpha)\,d\alpha\Big|
= \Big|\frac{1}{2\pi}\int(-i\alpha)e^{-i\alpha x}\Phi_t(\alpha)\,d\alpha\Big|
\le \frac{1}{2\pi}\int|\alpha||\Phi_t(\alpha)|\,d\alpha \le C.
\]
It follows immediately from the mean value theorem that $\sup_x|f_t(x+y) - f_t(x)| \le C|y|$. The normality approximation can be found in the literature; see, for example, equation (5.11) of Wang and Phillips (2009a, p. 729). □

Lemma A.2. Suppose that Assumption 1 holds. Let $j$ be a fixed integer with $j \le t$. For any functions $U$ and $g: \mathbb{R} \mapsto \mathbb{R}$ such that $\int|U(w)|dw < \infty$ and $E|\varepsilon_j g(\varepsilon_j)| < \infty$, and for large $t$ or $t-s$, we have:

(1) $E[U(x_t)g(\varepsilon_j)] = E[U(x_{t/j})]E[g(\varepsilon_j)] + c_U d_t^{-2}$, where $c_U$ is such that $|c_U| \le O(1)E|\varepsilon_j g(\varepsilon_j)|\int|U(w)|dw$. In particular, if $Eg(\varepsilon_j) = 0$, then $E[U(x_t)g(\varepsilon_j)] = c_U d_t^{-2}$;

(2) $E|U(x_t)g(\varepsilon_j)| \le O(1)d_t^{-1}E|g(\varepsilon_j)|\int|U(w)|dw$;

(3) for any $\ell$ with $j \ne \ell \le t$, $E[U(x_t)g(\varepsilon_j)\,|\,\varepsilon_\ell] = E[U(x_{t/j})\,|\,\varepsilon_\ell]E[g(\varepsilon_j)] + d_t^{-2}\eta_\ell$, where $\eta_\ell$ is a random variable depending on $\varepsilon_\ell$ such that $|\eta_\ell| \le O(1)\int|U(w)|dw$ almost surely. If $E[g(\varepsilon_j)] = 0$, then $E[U(x_t)g(\varepsilon_j)\,|\,\varepsilon_\ell] = d_t^{-2}\eta_\ell$. Meanwhile, $E[|U(x_t)g(\varepsilon_j)|\,|\,\varepsilon_\ell] \le O(1)d_t^{-1}E|g(\varepsilon_j)|\int|U(w)|dw$ almost surely;

(4) for $1 \le s < j \le t$, $E[U(x_t)g(\varepsilon_j)\,|\,\mathcal{F}_s] = E[g(\varepsilon_j)]E[U(x_{t/j})\,|\,\mathcal{F}_s] + d_{ts}^{-2}\xi_s$, where $|\xi_s| \le O(1)E|\varepsilon_j g(\varepsilon_j)|\int|U(x)|dx$ almost surely; meanwhile, $E[|U(x_t)g(\varepsilon_j)|\,|\,\mathcal{F}_s] \le O(1)d_{ts}^{-1}E[|g(\varepsilon_j)|]\int|U(w)|dw$ a.s.

Proof of Lemma A.2: (1) Let $f(\cdot)$ be the density of $\varepsilon_0$. Recalling that $x_t = b_{t,j}\varepsilon_j + x_{t/j}$ with $b_{t,j} = \sum_{i=1\vee j}^{t}\psi_{i-j} = O(1)$, and that $d_t^{-1}x_{t/j}$ has a uniformly bounded density $f_{t/j}(x)$ satisfying the Lipschitz condition, we have
\begin{align*}
E[U(x_t)g(\varepsilon_j)] &= E[U(b_{t,j}\varepsilon_j + x_{t/j})g(\varepsilon_j)] = \iint U(b_{t,j}v + d_t w)g(v)f(v)f_{t/j}(w)\,dv\,dw\\
&= d_t^{-1}\iint U(w)g(v)f(v)f_{t/j}\Big(\frac{w - b_{t,j}v}{d_t}\Big)dv\,dw\\
&= d_t^{-1}\iint U(w)g(v)f(v)f_{t/j}\Big(\frac{w}{d_t}\Big)dv\,dw + d_t^{-1}\iint U(w)g(v)f(v)\Big[f_{t/j}\Big(\frac{w - b_{t,j}v}{d_t}\Big) - f_{t/j}\Big(\frac{w}{d_t}\Big)\Big]dv\,dw\\
&= d_t^{-1}\int g(v)f(v)dv\int U(w)f_{t/j}\Big(\frac{w}{d_t}\Big)dw + d_t^{-2}c_U\\
&= E[g(\varepsilon_j)]\int U(d_t w)f_{t/j}(w)\,dw + d_t^{-2}c_U = E[U(x_{t/j})]E[g(\varepsilon_j)] + d_t^{-2}c_U,
\end{align*}
where $c_U := d_t\iint U(w)g(v)f(v)\big[f_{t/j}\big(\frac{w - b_{t,j}v}{d_t}\big) - f_{t/j}\big(\frac{w}{d_t}\big)\big]dv\,dw$ satisfies
\[
|c_U| \le d_t\int|g(v)|f(v)\int|U(w)|\Big|f_{t/j}\Big(\frac{w - b_{t,j}v}{d_t}\Big) - f_{t/j}\Big(\frac{w}{d_t}\Big)\Big|dw\,dv
\le O(1)\int|g(v)|f(v)\int|U(w)||b_{t,j}v|\,dw\,dv = O(1)E|\varepsilon_j g(\varepsilon_j)|\int|U(w)|dw,
\]
using the Lipschitz condition for $f_{t/j}$ from Lemma A.1. Clearly, $E[U(x_t)g(\varepsilon_j)] = d_t^{-2}c_U$ if $Eg(\varepsilon_j) = 0$.

(2) It follows that
\begin{align*}
E|U(x_t)g(\varepsilon_j)| &= E|U(b_{t,j}\varepsilon_j + x_{t/j})g(\varepsilon_j)| = \iint|U(b_{t,j}v + d_t w)g(v)|f(v)f_{t/j}(w)\,dv\,dw\\
&= d_t^{-1}\iint|U(w)g(v)|f(v)f_{t/j}\Big(\frac{w - b_{t,j}v}{d_t}\Big)dv\,dw \le O(1)d_t^{-1}\iint|U(w)g(v)|f(v)\,dv\,dw\\
&= O(1)d_t^{-1}\int|g(v)|f(v)dv\int|U(w)|dw = O(1)d_t^{-1}E|g(\varepsilon_j)|\int|U(w)|dw.
\end{align*}

(3) By similarity we only consider the case $\ell > j > 0$. In this case we have the decomposition $x_t = b_{t,j}\varepsilon_j + b_{t,\ell}\varepsilon_\ell + x_{t/j\ell}$, where $x_{t/j\ell}$ collects all terms in $x_t$ except those involving $\varepsilon_\ell$ and $\varepsilon_j$. Moreover, $Ex_{t/j\ell}^2 = Ex_t^2 - b_{t,j}^2 - b_{t,\ell}^2 = O(1)t$ and, similarly to Lemma A.1, we may show that $d_t^{-1}x_{t/j\ell}$ has a density $f_{t/j\ell}(x)$ satisfying the Lipschitz condition uniformly on $\mathbb{R}$. Recalling that $\varepsilon_j$ has density $f(v)$,
\begin{align*}
E[U(x_t)g(\varepsilon_j)\,|\,\varepsilon_\ell] &= E[U(b_{t,j}\varepsilon_j + b_{t,\ell}\varepsilon_\ell + x_{t/j\ell})g(\varepsilon_j)\,|\,\varepsilon_\ell] = \iint U(b_{t,j}v + b_{t,\ell}\varepsilon_\ell + d_t w)g(v)f(v)f_{t/j\ell}(w)\,dv\,dw\\
&= d_t^{-1}\iint U(w)g(v)f(v)f_{t/j\ell}\Big(\frac{w - b_{t,j}v - b_{t,\ell}\varepsilon_\ell}{d_t}\Big)dv\,dw\\
&= d_t^{-1}\iint U(w)g(v)f(v)f_{t/j\ell}\Big(\frac{w - b_{t,\ell}\varepsilon_\ell}{d_t}\Big)dv\,dw
+ d_t^{-1}\iint U(w)g(v)f(v)\Big[f_{t/j\ell}\Big(\frac{w - b_{t,j}v - b_{t,\ell}\varepsilon_\ell}{d_t}\Big) - f_{t/j\ell}\Big(\frac{w - b_{t,\ell}\varepsilon_\ell}{d_t}\Big)\Big]dv\,dw\\
&= E[U(x_{t/j\ell} + b_{t,\ell}\varepsilon_\ell)\,|\,\varepsilon_\ell]E[g(\varepsilon_j)] + d_t^{-2}\eta_\ell = E[U(x_{t/j})\,|\,\varepsilon_\ell]E[g(\varepsilon_j)] + d_t^{-2}\eta_\ell,
\end{align*}
and, using the Lipschitz condition,
\[
|\eta_\ell| \le O(1)\int|g(v)|f(v)\int|U(w)||b_{t,j}v|\,dw\,dv = O(1)\int|vg(v)|f(v)\,dv\int|U(w)|dw.
\]
When $E[g(\varepsilon_j)] = 0$ we have $E[U(x_t)g(\varepsilon_j)\,|\,\varepsilon_\ell] = d_t^{-2}\eta_\ell$. Additionally,
\begin{align*}
E[|U(x_t)g(\varepsilon_j)|\,|\,\varepsilon_\ell] &= \iint|U(b_{t,j}v + b_{t,\ell}\varepsilon_\ell + d_t w)g(v)|f(v)f_{t/j\ell}(w)\,dv\,dw
= d_t^{-1}\iint|U(w)g(v)|f(v)f_{t/j\ell}\Big(\frac{w - b_{t,j}v - b_{t,\ell}\varepsilon_\ell}{d_t}\Big)dv\,dw\\
&\le O(1)d_t^{-1}\iint|U(w)g(v)|f(v)\,dv\,dw = O(1)d_t^{-1}E|g(\varepsilon_j)|\int|U(w)|dw,
\end{align*}
almost surely.

(4) Recalling that $x_t = x_s^* + b_{t,j}\varepsilon_j + x_{ts/j}$ and that $d_{ts}^{-1}x_{ts/j}$ has a uniformly bounded density $f_{ts/j}(x)$ satisfying the uniform Lipschitz condition,
\begin{align*}
E[U(x_t)g(\varepsilon_j)\,|\,\mathcal{F}_s] &= E[U(x_s^* + b_{t,j}\varepsilon_j + x_{ts/j})g(\varepsilon_j)\,|\,\mathcal{F}_s] = \iint U(x_s^* + b_{t,j}v + d_{ts}x)g(v)f(v)f_{ts/j}(x)\,dx\,dv\\
&= d_{ts}^{-1}\iint U(x)g(v)f(v)f_{ts/j}\Big(\frac{x - b_{t,j}v - x_s^*}{d_{ts}}\Big)dv\,dx\\
&= d_{ts}^{-1}\iint U(x)g(v)f(v)f_{ts/j}\Big(\frac{x - x_s^*}{d_{ts}}\Big)dv\,dx
+ d_{ts}^{-1}\iint U(x)g(v)f(v)\Big[f_{ts/j}\Big(\frac{x - b_{t,j}v - x_s^*}{d_{ts}}\Big) - f_{ts/j}\Big(\frac{x - x_s^*}{d_{ts}}\Big)\Big]dv\,dx\\
&= d_{ts}^{-1}\int g(v)f(v)dv\int U(x)f_{ts/j}\Big(\frac{x - x_s^*}{d_{ts}}\Big)dx + d_{ts}^{-2}\xi_s
= E[g(\varepsilon_j)]E[U(x_{ts/j} + x_s^*)\,|\,\mathcal{F}_s] + d_{ts}^{-2}\xi_s,
\end{align*}
where $\xi_s = d_{ts}\iint U(x)g(v)f(v)\big[f_{ts/j}\big(\frac{x - b_{t,j}v - x_s^*}{d_{ts}}\big) - f_{ts/j}\big(\frac{x - x_s^*}{d_{ts}}\big)\big]dv\,dx$ and, using the Lipschitz condition,
\[
|\xi_s| \le C\iint|U(x)g(v)|f(v)|b_{t,j}v|\,dv\,dx = O(1)E|\varepsilon_j g(\varepsilon_j)|\int|U(x)|dx \quad\text{a.s.}
\]
Consequently, when $E[g(\varepsilon_j)] = 0$, $E[U(x_t)g(\varepsilon_j)\,|\,\mathcal{F}_s] = d_{ts}^{-2}\xi_s$. Meanwhile,
\begin{align*}
E[|U(x_t)g(\varepsilon_j)|\,|\,\mathcal{F}_s] &= \iint|U(x_s^* + b_{t,j}v + d_{ts}x)g(v)|f(v)f_{ts/j}(x)\,dx\,dv
= d_{ts}^{-1}\iint|U(x)g(v)|f(v)f_{ts/j}\Big(\frac{x - b_{t,j}v - x_s^*}{d_{ts}}\Big)dv\,dx\\
&\le O(1)d_{ts}^{-1}\iint|U(x)g(v)|f(v)\,dx\,dv = O(1)d_{ts}^{-1}E|g(\varepsilon_j)|\int|U(x)|dx. \qquad\Box
\end{align*}

Lemma A.3. (1) (i) $\|Z_p(x)\|^2 = O(1)p$; (ii) $\int\|Z_p(x)\|^2dx = p$; (iii) $\int\|Z_p(x)\|dx = O(1)p^{11/12}$; (iv) $\int x^2F_i^2(x)dx = (2i+1)/2$.

(2) Let Assumption 2 hold. Then (i) $\sup_x|\gamma_p(x)| = o\big(p^{-(r-1)/2 - 1/12}\big)$; (ii) $\int\gamma_p^2(x)dx = o(1)p^{-r}$.

Proof of Lemma A.3: (1) The assertions (i) and (ii) follow trivially, since the $F_i(x)$ are uniformly bounded and $\int F_i^2(x)dx = 1$. To prove (iii), from the Christoffel--Darboux formula, $\|Z_p(x)\|^2 = pF_{p-1}^2(x) - \sqrt{(p-1)p}\,F_{p-2}(x)F_p(x)$, which implies
\[
\|Z_p(x)\| \le \sqrt{p}\,|F_{p-1}(x)| + \sqrt[4]{(p-1)p}\,\sqrt{|F_{p-2}(x)F_p(x)|}.
\]
Meanwhile, by Askey and Wainger (1965, p. 700) there exist two positive constants $c_1$ and $c_2$ such that $|F_i(x)| \le c_1(|N - x^2| + N^{1/3})^{-1/4}$ whenever $x^2 < N := 2i+1$, and $|F_i(x)| < c_1\exp(-c_2x^2)$ otherwise. A straightforward calculation yields $\int|F_i(x)|dx = O(1)i^{5/12}$, which, along with the above inequality, implies the assertion.

The assertion (iv) holds because of the recursion relation for Hermite functions,
\[
xF_0(x) = \frac{1}{\sqrt{2}}F_1(x), \qquad xF_i(x) = \frac{1}{\sqrt{2}}\big(\sqrt{i}\,F_{i-1}(x) + \sqrt{i+1}\,F_{i+1}(x)\big),
\]
and the orthogonality of the Hermite functions.

(2) We calculate the coefficients $\theta_i$ in the orthogonal expansion (2.3). Let $\phi(x) = \exp(-x^2)$ and $b_i^2 = \sqrt{\pi}\,2^i i!$. For large $i$, integration by parts gives
\begin{align*}
\theta_i(g) &= \int g(x)F_i(x)dx = \frac{1}{b_i}\int g(x)H_i(x)e^{-x^2/2}dx
= (-1)^i\frac{1}{b_i}\int g(x)\phi^{(i)}(x)e^{x^2/2}dx = (-1)^i\frac{1}{b_i}\int g(x)e^{x^2/2}\,d\phi^{(i-1)}(x)\\
&= (-1)^i\frac{1}{b_i}\,g(x)e^{x^2/2}\phi^{(i-1)}(x)\Big|_{-\infty}^{\infty} - (-1)^i\frac{1}{b_i}\int\phi^{(i-1)}(x)\big[g(x)e^{x^2/2}\big]'dx
= (-1)^{i-1}\frac{1}{b_i}\int\phi^{(i-1)}(x)\big[g(x)e^{x^2/2}\big]'dx\\
&= \frac{1}{b_i}\int\big[g(x)e^{x^2/2}\big]'H_{i-1}(x)\phi(x)dx
= \frac{b_{i-1}}{b_i}\int\big[g(x)e^{x^2/2}\big]'e^{-x^2/2}F_{i-1}(x)dx
= \frac{1}{\sqrt{2i}}\,\theta_{i-1}(\tilde{g}_1),
\end{align*}
where, for notational convenience, we define $\tilde{g}_m := [g\phi^{-1/2}]^{(m)}\phi^{1/2}$ for positive integers $m$. Repeated use of the above derivation yields
\[
|\gamma_p(x)| = \Big|\sum_{i=p}^{\infty}\theta_iF_i(x)\Big|
= O(1)\Big|\sum_{i=p}^{\infty}i^{-r/2}\theta_{i-r}(\tilde{g}_r)F_i(x)\Big|
\le o(1)\Big(\sum_{i=p}^{\infty}i^{-r}F_i^2(x)\Big)^{1/2}
\le o(1)\Big(\sum_{i=p}^{\infty}i^{-r-1/6}\Big)^{1/2}
= o(1)p^{-(r-1)/2 - 1/12};
\]
hence $|\gamma_p(x)| = o(1)p^{-(r-1)/2 - 1/12}$ uniformly in $x$. In addition, by the orthogonality,
\[
\int\gamma_p^2(x)dx = \sum_{i=p}^{\infty}\theta_i^2 = O(1)\sum_{i=p}^{\infty}i^{-r}\theta_{i-r}^2(\tilde{g}_r)
\le O(1)p^{-r}\sum_{i=p}^{\infty}\theta_{i-r}^2(\tilde{g}_r) = o(1)p^{-r}. \qquad\Box
\]
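Although it is not part of the proof, the Hermite-function facts used above are easy to confirm numerically. The following Python sketch is our own illustration: the helper `F` implements the orthonormal Hermite functions $F_i(x) = H_i(x)e^{-x^2/2}/(\pi^{1/4}2^{i/2}\sqrt{i!})$, and the grid and test indices are arbitrary choices.

```python
# A minimal numerical check (not part of the proof) of the facts in Lemma A.3.
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def F(i, x):
    """Orthonormal Hermite function F_i evaluated on x."""
    c = np.zeros(i + 1); c[i] = 1.0            # selects the degree-i polynomial H_i
    return hermval(x, c) * np.exp(-x**2 / 2) / sqrt(sqrt(pi) * 2**i * factorial(i))

x = np.linspace(-20, 20, 200001)
dx = x[1] - x[0]

for i in [0, 1, 5, 20]:
    fi = F(i, x)
    print(i,
          np.sum(fi**2) * dx,                  # = 1: orthonormality
          np.sum(x**2 * fi**2) * dx,           # = (2i+1)/2: Lemma A.3(1)(iv)
          np.abs(fi).max())                    # slowly decaying sup-norm

# three-term recursion x F_i(x) = (sqrt(i) F_{i-1}(x) + sqrt(i+1) F_{i+1}(x)) / sqrt(2)
i = 5
gap = x * F(i, x) - (sqrt(i) * F(i - 1, x) + sqrt(i + 1) * F(i + 1, x)) / sqrt(2)
print(np.abs(gap).max())                       # ~ 0 up to rounding error
```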

Lemma A.4. $Z_p^{\tau}(x)Z_p(y) \to \delta(x - y)$ as $p \to \infty$, where $\delta(u)$ is the Dirac delta function.

Proof of Lemma A.4: For any smooth $f(x) \in L^2(\mathbb{R})$, we have
\[
\lim_{p\to\infty}\int f(x)\big[Z_p^{\tau}(x)Z_p(y)\big]dx
= \lim_{p\to\infty}\int f(x)\sum_{i=0}^{p-1}F_i(x)F_i(y)\,dx
= \lim_{p\to\infty}\sum_{i=0}^{p-1}\Big(\int f(x)F_i(x)dx\Big)F_i(y)
= \lim_{p\to\infty}\sum_{i=0}^{p-1}\theta_iF_i(y) = f(y),
\]
due to the smoothness of $f(x)$. Hence, $Z_p^{\tau}(x)Z_p(y)$ is a delta-convergent sequence as defined in Kanwal (1983, p. 14), and the assertion holds by the definition of a delta-convergent sequence. □
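Lemma A.4 can likewise be visualized: integrating a smooth $f$ against the partial kernel $K_p(x,y) = \sum_{i<p}F_i(x)F_i(y)$ reproduces $f(y)$ as $p$ grows. A sketch under the same illustrative setup as the previous snippet (the test function, grid and truncation levels are our own arbitrary choices):

```python
# Numerical illustration (not part of the proof) of Lemma A.4: the kernel
# K_p(x, y) = Z_p'(x) Z_p(y) = sum_{i<p} F_i(x) F_i(y) is delta-convergent,
# so integrating a smooth f in L^2(R) against it recovers f(y) as p grows.
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def F(i, x):
    c = np.zeros(i + 1); c[i] = 1.0
    return hermval(x, c) * np.exp(-x**2 / 2) / sqrt(sqrt(pi) * 2**i * factorial(i))

x = np.linspace(-15.0, 15.0, 30001)
dx = x[1] - x[0]
f = np.exp(-(x - 0.5)**2)                      # smooth test function; f(0.5) = 1
y = 0.5                                        # reconstruction point

for p in [5, 20, 80]:
    K_py = sum(F(i, x) * F(i, y) for i in range(p))   # K_p(., y) on the grid
    print(p, np.sum(f * K_py) * dx)            # tends to f(y) = 1 as p grows
```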



Lemma A.5. Let $m_T$ be a sequence such that $m_T \to \infty$ and $m_T/T \to 0$ as $T \to \infty$, and let $\{a_j\}$ be any sequence of nonnegative real numbers satisfying $\sum_{i=m_T}^{T}a_i = 1$.

(1) For any $s \ge 1$ and $t \ge m_T$, $e_s$ and $d_t^{-1}x_t$ are asymptotically independent. Consequently, for any given $k$ and $t \ge m_T$, $\varepsilon_k$ and $d_t^{-1}x_t$ are asymptotically independent.

(2) For any $s \ge 1$, $e_s$ and $\sum_{j=m_T}^{T}a_jd_j^{-1}x_j$ are asymptotically independent.

Proof of Lemma A.5: (1) Since $e_s$ is a stationary process, its density and characteristic function do not depend on $s$. Denote by $\rho(u)$ the density of $e_s$ and by $\kappa_t(x,u)$ the joint density of $(d_t^{-1}x_t, e_s)$. Let $\Psi_t(\alpha,\lambda)$, $\Phi_t(\alpha)$ and $\Gamma(\lambda)$ be the characteristic functions of $(d_t^{-1}x_t, e_s)$, $d_t^{-1}x_t$ and $e_s$, respectively.

Recall that $x_t = \sum_{j=-\infty}^{t}b_{t,j}\varepsilon_j$, where $b_{t,j} = \psi_0 + \cdots + \psi_{t-j}$ for $j \ge 1$ and $b_{t,j} = \psi_{1-j} + \cdots + \psi_{t-j}$ for $j \le 0$. Thus, the sequence $\{|b_{t,j}|\}_j$ is uniformly bounded by $b_0 := \sum_{j=0}^{\infty}|\psi_j| < \infty$, by summability. Let $\nu_t$ be a positive sequence chosen such that $\nu_t \to \infty$ and $\nu_t/\sqrt{t} \to 0$ as $t \to \infty$. Note that $e_s = \sum_{j=-\infty}^{s}\phi_{s-j}\varepsilon_j$ and $\sum_{j=0}^{\infty}|\phi_j| < \infty$. As a result, $\phi_{\nu_t} = o(1)$ for large $t$, and without loss of generality we assume that $|\phi_j| \le |\phi_{\nu_t}|$ for all $j \ge \nu_t$, since $\phi_j \to 0$ as $j \to \infty$.

For any given $\epsilon > 0$, denote by $R_\epsilon \equiv \{(\alpha,\lambda): d_t^{-1}b_0|\alpha| + |\phi_{\nu_t}\lambda| < \epsilon\}$ a region in $\mathbb{R}^2$ and by $R'_\epsilon$ its complement. From the inversion formula we have
\begin{align*}
\kappa_t(x,u) - f_t(x)\rho(u) &= \frac{1}{(2\pi)^2}\iint e^{-i(\alpha x + \lambda u)}\big[\Psi_t(\alpha,\lambda) - \Phi_t(\alpha)\Gamma(\lambda)\big]d\alpha\,d\lambda\\
&= \frac{1}{(2\pi)^2}\iint_{R_\epsilon}e^{-i(\alpha x + \lambda u)}\big[\Psi_t(\alpha,\lambda) - \Phi_t(\alpha)\Gamma(\lambda)\big]d\alpha\,d\lambda
+ \frac{1}{(2\pi)^2}\iint_{R'_\epsilon}e^{-i(\alpha x + \lambda u)}\big[\Psi_t(\alpha,\lambda) - \Phi_t(\alpha)\Gamma(\lambda)\big]d\alpha\,d\lambda := T_1 + T_2,
\end{align*}
where $f_t(x)$ is the density of $d_t^{-1}x_t$.

It has been shown in Lemma A.1 that $\int|\alpha||\Phi_t(\alpha)|d\alpha < \infty$, and similarly we may show that $\iint\|(\alpha,\lambda)\||\Psi_t(\alpha,\lambda)|d\alpha\,d\lambda < \infty$ and $\int|\lambda||\Gamma(\lambda)|d\lambda < \infty$. By virtue of these, it is easily seen that $T_2 = o(1)$. In fact,
\[
|T_2| \le \frac{1}{(2\pi)^2}\iint_{R'_\epsilon}|\Psi_t(\alpha,\lambda) - \Phi_t(\alpha)\Gamma(\lambda)|d\alpha\,d\lambda
\le \frac{1}{(2\pi)^2\epsilon}\iint_{R'_\epsilon}\big(d_t^{-1}b_0|\alpha| + |\phi_{\nu_t}\lambda|\big)|\Psi_t(\alpha,\lambda) - \Phi_t(\alpha)\Gamma(\lambda)|d\alpha\,d\lambda
\le C_1d_t^{-1}\epsilon^{-1} + C_2|\phi_{\nu_t}|\epsilon^{-1} = o(1).
\]

We then deal with $T_1$. For ease of exposition, suppose that $t \ge s$; the case $t < s$ is similar and easier. To see this, note that for $t < s$ the innovations $\varepsilon_j$, $j = t+1,\cdots,s$, do not enter $x_t$, so that they are independent of $x_t$, whereas for $t \ge s$ all the information in $e_s$ is contained in $x_t$. Observe that
\[
\Psi_t(\alpha,\lambda) = E\exp\big[i(\alpha d_t^{-1}x_t + \lambda e_s)\big]
= E\exp\Big[i\sum_{j=s+1}^{t}\alpha d_t^{-1}b_{t,j}\varepsilon_j + i\sum_{j=-\infty}^{s}\big(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j}\big)\varepsilon_j\Big]
= \prod_{j=s+1}^{t}\varphi(\alpha d_t^{-1}b_{t,j})\prod_{j=-\infty}^{s}\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j}),
\]
where $\varphi(\cdot)$ is the characteristic function of $\varepsilon_1$. Meanwhile,
\[
\Phi_t(\alpha) = E\exp(i\alpha d_t^{-1}x_t) = \prod_{j=-\infty}^{t}\varphi(\alpha d_t^{-1}b_{t,j}),
\qquad
\Gamma(\lambda) = E\exp(i\lambda e_s) = \prod_{j=-\infty}^{s}\varphi(\lambda\phi_{s-j}).
\]
Hence,
\[
\Psi_t(\alpha,\lambda) - \Phi_t(\alpha)\Gamma(\lambda)
= \Phi_t(\alpha)\Gamma(\lambda)\Big[\frac{\Psi_t(\alpha,\lambda)}{\Phi_t(\alpha)\Gamma(\lambda)} - 1\Big]
= \Phi_t(\alpha)\Gamma(\lambda)\Big[\prod_{j=-\infty}^{s}\frac{\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})}{\varphi(\alpha d_t^{-1}b_{t,j})\varphi(\lambda\phi_{s-j})} - 1\Big].
\]
We now consider
\[
\prod_{j=-\infty}^{s}\frac{\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})}{\varphi(\alpha d_t^{-1}b_{t,j})\varphi(\lambda\phi_{s-j})} - 1
= \prod_{j=-\infty}^{s-\nu_t}\frac{\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})}{\varphi(\alpha d_t^{-1}b_{t,j})\varphi(\lambda\phi_{s-j})}
\times\prod_{j=s-\nu_t+1}^{s}\frac{1}{\varphi(\alpha d_t^{-1}b_{t,j})}
\times\prod_{j=s-\nu_t+1}^{s}\frac{\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})}{\varphi(\lambda\phi_{s-j})} - 1
:= A_1 \times A_2 \times A_3 - 1, \quad\text{say.}
\]
We shall show that $A_1 - 1 = o(1)$, $A_2 - 1 = o(1)$ and $A_3 - 1 = o(1)$, which together imply $A_1A_2A_3 - 1 = o(1)$.

By the definition of $R_\epsilon$, for $j \le s - \nu_t$ we have $|\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j}| < \epsilon$, $|\alpha d_t^{-1}b_{t,j}| < \epsilon$ and $|\lambda\phi_{s-j}| < \epsilon$ simultaneously on $R_\epsilon$. As a result, all characteristic functions in $A_1$ can be expanded at zero by Taylor expansion; that is, for $j \le s - \nu_t$,
\begin{align*}
\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j}) &= 1 - \tfrac{1}{2}(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})^2(1 + o(1)),\\
\varphi(\alpha d_t^{-1}b_{t,j}) &= 1 - \tfrac{1}{2}(\alpha d_t^{-1}b_{t,j})^2(1 + o(1)),\\
\varphi(\lambda\phi_{s-j}) &= 1 - \tfrac{1}{2}(\lambda\phi_{s-j})^2(1 + o(1)).
\end{align*}
Therefore, up to higher-order terms,
\begin{align*}
A_1 - 1 &= \prod_{j=-\infty}^{s-\nu_t}\frac{\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})}{\varphi(\alpha d_t^{-1}b_{t,j})\varphi(\lambda\phi_{s-j})} - 1
= \frac{\prod_{j=-\infty}^{s-\nu_t}\big[1 - \tfrac{1}{2}(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})^2\big]}{\prod_{j=-\infty}^{s-\nu_t}\big[1 - \tfrac{1}{2}(\alpha d_t^{-1}b_{t,j})^2\big]\prod_{j=-\infty}^{s-\nu_t}\big[1 - \tfrac{1}{2}(\lambda\phi_{s-j})^2\big]} - 1\\
&= -\frac{1}{2}\sum_{j=-\infty}^{s-\nu_t}(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})^2 + \frac{1}{2}\sum_{j=-\infty}^{s-\nu_t}(\alpha d_t^{-1}b_{t,j})^2 + \frac{1}{2}\sum_{j=-\infty}^{s-\nu_t}(\lambda\phi_{s-j})^2
= -\alpha\lambda\sum_{j=-\infty}^{s-\nu_t}d_t^{-1}b_{t,j}\phi_{s-j},
\end{align*}
implying that $|A_1 - 1| \le C|\alpha\lambda|d_t^{-1}$ due to $|b_{t,j}| \le b_0$ and $\sum_{j=0}^{\infty}|\phi_j| < \infty$.

Meanwhile, we have
\[
A_2 - 1 = \prod_{j=s-\nu_t+1}^{s}\frac{1}{\varphi(\alpha d_t^{-1}b_{t,j})} - 1
= \sum_{j=s-\nu_t+1}^{s}\frac{1}{2}(\alpha d_t^{-1}b_{t,j})^2(1 + o(1))
= \frac{1}{2}\alpha^2d_t^{-2}\sum_{j=s-\nu_t+1}^{s}b_{t,j}^2(1 + o(1))
\le \frac{1}{2}b_0^2\alpha^2d_t^{-2}\nu_t \le C\epsilon|\alpha|d_t^{-1}\nu_t,
\]
due to $b_{t,j}^2 < b_0^2$ and $b_0|\alpha|d_t^{-1} < \epsilon$ on $R_\epsilon$.

Moreover, since $|\alpha d_t^{-1}b_{t,j}| < \epsilon$ on $R_\epsilon$, for $s - \nu_t + 1 \le j \le s$ we may expand $\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})$ at the point $\lambda\phi_{s-j}$, giving $\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j}) = \varphi(\lambda\phi_{s-j}) + \varphi'(\lambda\phi_{s-j})\alpha d_t^{-1}b_{t,j}(1 + o(1))$. It follows that
\[
A_3 - 1 = \prod_{j=s-\nu_t+1}^{s}\frac{\varphi(\alpha d_t^{-1}b_{t,j} + \lambda\phi_{s-j})}{\varphi(\lambda\phi_{s-j})} - 1
= \prod_{j=s-\nu_t+1}^{s}\Big[1 + \frac{\varphi'(\lambda\phi_{s-j})}{\varphi(\lambda\phi_{s-j})}\alpha d_t^{-1}b_{t,j}\Big] - 1
= \sum_{j=s-\nu_t+1}^{s}h(\lambda\phi_{s-j})\alpha d_t^{-1}b_{t,j},
\]
where we omit the higher-order terms and $h(u) = \varphi'(u)/\varphi(u)$ is defined in Assumption 1(d). Invoking the condition on $h(u)$ in Assumption 1 gives
\[
|A_3 - 1| \le \sum_{j=s-\nu_t+1}^{s}|h(\lambda\phi_{s-j})\alpha d_t^{-1}b_{t,j}|
\le |\alpha|d_t^{-1}b_0\sum_{j=0}^{\nu_t}|h(\lambda\phi_j)| \le Cd_t^{-1}\nu_t|\alpha|k(\lambda),
\]
where we have used Assumption 1(d) to deduce $\max_{j\ge 0}|h(\lambda\phi_j)| \le k(\lambda)$, and $0 < C < \infty$ is a constant.

Finally, it follows that
\begin{align*}
|T_1| &\le \frac{1}{(2\pi)^2}\iint_{R_\epsilon}|\Psi_t(\alpha,\lambda) - \Phi_t(\alpha)\Gamma(\lambda)|d\alpha\,d\lambda
= \frac{1}{(2\pi)^2}\iint_{R_\epsilon}|\Phi_t(\alpha)\Gamma(\lambda)|\Big|\frac{\Psi_t(\alpha,\lambda)}{\Phi_t(\alpha)\Gamma(\lambda)} - 1\Big|d\alpha\,d\lambda\\
&= \frac{1}{(2\pi)^2}\iint_{R_\epsilon}|\Phi_t(\alpha)\Gamma(\lambda)||A_1A_2A_3 - 1|d\alpha\,d\lambda
= C\iint_{R_\epsilon}|\Phi_t(\alpha)\Gamma(\lambda)|\big(|A_1 - 1| + |A_2 - 1| + |A_3 - 1|\big)d\alpha\,d\lambda\,(1 + o(1))\\
&\le C_1d_t^{-1}\iint|\alpha\lambda\,\Phi_t(\alpha)\Gamma(\lambda)|d\alpha\,d\lambda
+ C_2d_t^{-1}\nu_t\epsilon\iint|\alpha||\Phi_t(\alpha)\Gamma(\lambda)|d\alpha\,d\lambda
+ C_3d_t^{-1}\nu_t\iint|\alpha\,k(\lambda)\Phi_t(\alpha)\Gamma(\lambda)|d\alpha\,d\lambda = o(1),
\end{align*}
where we have used Assumption 1(d) again. This shows that $|\kappa_t(x,u) - f_t(x)\rho(u)| \to 0$ as $t \to \infty$.

(2) Recalling that $x_i = \sum_{j=-\infty}^{i}b_{i,j}\varepsilon_j$, for any nonnegative real numbers $\{a_i\}$ satisfying $\sum_{i=m_T}^{T}a_i = 1$ we define
\[
\xi_T = \sum_{i=m_T}^{T}a_id_i^{-1}x_i = \sum_{i=m_T}^{T}a_id_i^{-1}\sum_{j=-\infty}^{i}b_{i,j}\varepsilon_j
= \sum_{j=-\infty}^{T}\Big(\sum_{i=\max(j,m_T)}^{T}a_id_i^{-1}b_{i,j}\Big)\varepsilon_j.
\]
For ease of exposition, let $B_{T,j} := \sum_{i=\max(j,m_T)}^{T}a_id_i^{-1}b_{i,j}$ in what follows.

In order to prove part (2) of Lemma A.5, it suffices to show that $e_s$ and $\xi_T$ are asymptotically independent. Notice that the $B_{T,j}$ play the same role in $\xi_T$ as the $d_t^{-1}b_{t,j}$ do in $d_t^{-1}x_t$. First, the $b_{i,j}$ are uniformly bounded across both $i$ and $j$; that is, $|b_{i,j}| < b_0 = \sum_k|\psi_k|$ for all $i$ and $j$. Second, for given $a_i$, $|B_{T,j}| \le c_0d_{m_T}^{-1}$ for some constant $c_0$. Therefore, $\xi_T$ can be treated in the same way as $d_T^{-1}x_T$ and, following the same steps as in part (1), we can show the asymptotic independence of $\xi_T$ and $e_s$ for any given $s \ge 1$. □
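The content of Lemma A.5 can be made concrete with a small simulation. In the sketch below (our own illustration: the weights $\psi_k$, $\phi_k$, the transforms $U$, $g$ and the lag truncation are arbitrary choices, not quantities from the paper), $e_s$ and $d_t^{-1}x_t$ share the innovations $\varepsilon_1,\ldots,\varepsilon_s$, yet the covariance between transforms of the two dies out as $t$ grows:

```python
# Monte Carlo sketch (illustration only) of Lemma A.5(1): e_s and d_t^{-1} x_t
# are asymptotically independent even though both are driven by the same
# innovations. psi_k, phi_k, U, g and the lag truncation are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
R, s = 100000, 5                              # replications; fixed s
phi = 0.5 ** np.arange(50)                    # truncated MA weights of e_s

U = lambda w: np.exp(-(w - 0.3)**2)           # integrable transform of d_t^{-1} x_t
g = lambda e: np.sin(e)                       # bounded transform of e_s

for t in [50, 400, 3200]:
    head = rng.standard_normal((R, 50))       # eps_{-44}, ..., eps_5
    e_s = head @ phi[::-1]                    # e_5 = sum_{k>=0} phi_k eps_{5-k}
    # x_t = eps_1 + ... + eps_t (psi_0 = 1, psi_k = 0 for k > 0): the five
    # innovations shared with e_5 plus an independent N(0, t-5) remainder
    x_t = head[:, 45:].sum(axis=1) + np.sqrt(t - 5.0) * rng.standard_normal(R)
    a, b = U(x_t / np.sqrt(t)), g(e_s)
    print(t, np.mean(a * b) - np.mean(a) * np.mean(b))   # -> 0 as t grows
```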

PT

t=1 Fi−1 (xt )Fj−1 (xt )

at the place of (i, j). For the sake of conve-

nience, denote that Fij (·) = Fi (·)Fj (·) for i 6= j. It follows that !2 p−1

2 p−1 p−1

T X X X X

dT τ d T

Z Z − LB (1, 0)Ip = Fij (xt ) +

T T t=1

i=0 j=0,6=i

:=A1T + A2T ,

i=0

!2 T dT X 2 Fi (xt ) − LB (1, 0) T t=1

say.

Firstly, we shall show that A1T = oP (1) by proving E[A1T ] → 0 as T → ∞. Noting that A1T

p p−1 X p−1 p−1 T T t−1 X X X d2T X 2 d2T X X F (x ) + 2 [Fij (xt )Fij (xs )] = t ij T2 T2 i=0 j=0,6=i

t=1

i=0 j=0,6=i

t=2 s=1

:=Aa1T + Ab1T , it suffices to show E[Aa1T ] → 0 and E[Ab1T ] → 0 as T → ∞. Using the uniformly boundedness of the density ft (x) for d−1 t xt from Lemma A.1, Z p−1 X p−1 p−1 X p−1 n T X X d2T X −1 d2T X 2 E[Aa1T ] = E[Fij (xt )] = dt Fij2 (x)ft (d−1 t x)dx T2 T2 i=0 j=0,6=i

≤C

t=1

T d2T X −1/2 t T2 t=1

p−1 X p−1 X i=0 j=0,6=i

i=0 j=0,6=i

Z

t=1

1 Fi2 (x)Fj2 (x)dx = C(p2 − p) √ = O(1)p2 T −1/2 = o(1), T

by Assumption 3, where we have used the fact that

R

Fi2 (x) = 1 and the uniform boundedness of the

sequence Fi2 (x) as well as ft (x). Note that, for t > s, xt = xts + x∗s . Recall Lemma A.1 that d−1 ts xts have densities fts (x) that are uniformly bounded over all t, s and x ∈ R, and that satisfy Lipschitz condition. Thus, by the R orthogonality Fij (x)dx = 0, we have E[Ab1T ] =

p−1 X p−1 T t−1 X d2T X X E[Fij (xt )Fij (xs )] T2 i=0 j=0,6=i

t=2 s=1

31

=

p−1 X p−1 T t−1 X d2T X X E[Fij (xts + x∗s )Fij (xs )] T2 t=2 s=1

i=0 j=0,6=i

  Z  p−1 X p−1 T t−1 X d2T X X −1 x − x∗s dxFij (xs ) dts E = Fij (x)fts T2 dts t=2 s=1

i=0 j=0,6=i

=

Z    ∗    p−1 X p−1 T t−1 X d2T X X −1 x − x∗s −xs d E F (x) f − f dxF (x ) ij ts ts ij s ts T2 dts dts t=2 s=1

i=0 j=0,6=i

which gives Z p−1 X p−1 T t−1 X d2T X X −2 |E[Ab1T ]| ≤C dts |xFij (x)|dxE|Fij (xs )| T2 t=2 s=1

i=0 j=0,6=i

≤C

p−1 X p−1 X i=0 j=0,6=i

≤C

Z Z T t−1 d2T X X −2 −1 dts ds |xFij (x)|dx |Fij (x)|dx T2 t=2 s=1

p−1 X T t−1 i−1 p X d2 X X 2j + 1 T2 (t − s)−1 s−1/2 T t=2 s=1

i=2 j=0

=O(1)p5/2

T t−1 d2T X X (t − s)−1 s−1/2 = O(1)p5/2 T −1/2 ln(T ) = o(1), T2 t=2 s=1

R

R

x2 Fj2 (x)dx

1/2

= (j + 1/2)1/2

|xFij (x)|dx =

R

|xFi (x)Fj (x)|dx ≤

by a recursive relation for Hermite functions and

R

|Fij (x)|dx ≤ 1 due to Cauchy-Schwarz inequality.

where we have used the facts that

Secondly, to tackle A2T , define for any i : 0 ≤ i ≤ p − 1 and  > 0, φ (x) = P R φ(z) = φ1 (z), and UT (i, ) = dTT Tt=1 Fi2 (xt + dT x)φ(x)dx.

2 2 √ 1 e−x /(2 ) , 2π

Observe that A2T

!2 T dT X 2 = Fi (xt ) − LB (1, 0) T t=1 i=0 !2 !2 p−1 p−1 T T X X dT X 2 1X −1 ≤4 Fi (xt ) − UT (i, ) + 4 φ (dT xt ) UT (i, ) − T T t=1 t=1 i=0 i=0 !2 Z 1 2 Z 1 T 1X −1 + 4p φ (dT xt ) − φ (B(r))dr + 4p φ (B(r))dr − LB (1, 0) T 0 0 p−1 X

t=1

:=4Aa2T + 4Ab2T + 4Ac2T + 4Ad2T . Hence, it is sufficient to show that E[A`2T ] = o(1) for ` = a, b, c and d. For the first term Aa2T , notice that !2 T Z dT X = [Fi2 (xt ) − Fi2 (xt + dT x)]φ(x)dx T t=1 t=1 !2 !2 Z X T T 2 Z X d [Fi2 (xt ) − Fi2 (xt + dT x)]φ(x)dx ≤ T2 [Fi2 (xt ) − Fi2 (xt + dT x)] φ(x)dx, T

T dT X 2 Fi (xt ) − UT (i, ) T

=

d2T T2

!2

t=1

t=1

by Cauchy-Schwarz inequality.

32

Recalling xt = xts + x∗s and d−1 ts xts has density fts (·) which satisfies Lipschitz condition, we then have T X

d2T E T2 d2 ≤ T2 T

!2 [Fi2 (xt )



Fi2 (xt

+ dT x)]

t=1

T X

E[Fi2 (xt ) − Fi2 (xt + dT x)]2

t=1

T t−1 d2T X X 2 2 2 2 E{[F (x ) − F (x + d x)] · [F (x ) − F (x + d x)]} t t s s T T i i i i T2 t=2 s=1   Z T T t−1 2 X dT d2 X X −1 z −1 2 2 2 ≤C 2 dz + 2 T2 dt [Fi (z) − Fi (z + dT x)] ft dts T dt T t=1 t=2 s=1 Z   ∗  2   2  u − x s 2 2 du · Fi (xs ) − Fi (xs + dT x) × E Fi (u) − Fi (u + dT x) fts dts

+2

T T t−1 d2T X −1 d2T X X −1 d + C dts 2 t T2 T2 t=1 t=2 s=1 Z       2  u − x∗s u − dT x − x∗s 2 2 × E Fi (u) fts − fts du · Fi (xs ) − Fi (xs + dT x) dts dts

≤C1

T t−1 d2T X X −2 dts E|Fi2 (xs ) − Fi2 (xs + dT x)| T2 t=2 s=1 Z T t−1 d2T X X −2 −1 −1/2 ≤C1 · T + C2 |x|dT 2 dts ds |Fi2 (u) − Fi2 (u + dT x)|du T

≤C1 · T −1/2 + C2 |x|dT

t=2 s=1

≤C1 · T

−1/2

+ C2 |x| ln(T )

where 0 < C1 , C2 < ∞ are some constants which may be different at each appearance, and we have used variable change in the integrals, i.e.        Z Z  2  u − x∗s u − x∗s u − dT x − x∗s 2 2 Fi (u) − Fi (u + dT x) fts du = Fi (u) fts − fts du dts dts dts R R and then the Lipschitz condition is applied; meanwhile, |Fi2 (u) − Fi2 (u + dT x)|du ≤ 2 Fi2 (u)du = 2 is also derived from a variable change. Then, noting that there is φ(x) in the above equation, E[Aa2T ] is bounded by p−1  X

C1 T −1/2 + C2  T −1/2 ln(T )



Z |x|φ(x)dx

= C1 T −1/2 p + C2  ln(T ) p = o(1)

i=0

by a proper choice of . To deal with the second term Ab2T , denote GiT (x) := dT

Rx

Fi2 (dT u)du =

R dT x

Fi2 (u)du, so that

dGiT (x) = dT Fi2 (dT x)dx. Also define, G(x) = 1 if x > 0, and G(x) = 0 if x < 0. We have for any R fixed i, GiT (x) → G(x) as T → ∞ at all continuous points of G(x) since Fi2 (x)dx = 1. Notice that T Z T Z dT X dT X 2 UT (i, ) = Fi (xt + dT x)φ(x)dx = Fi2 (dT x)φ (x − d−1 T xt )dx T T t=1 t=1 Z T 1X φ (x − dT−1 xt ))dGiT (x). = T t=1

33

Hence, we obtain Ab2T =

p−1 X

T 1X UT (i, ) − φ (d−1 T xt ) T

!2

t=1

i=0

!2 p−1 Z T T X 1X 1X −1 −1 φ (x − dT xt ))dGiT (x) − φ (dT xt ) = T T t=1

i=0

≤2

p−1 X

Z

i=0

+2

p−1 X i=0

|x|≤v

t=1

T X

1 T

φ (x −

d−1 T xt )dGiT (x)

t=1

Z |x|>v

!2 T 1X −1 − φ (dT xt ) T t=1 !2

T 1X φ (x − d−1 T xt ))dGiT (x) T

,

t=1

where v > 0 is sufficiently large and fixed. For the second term above, observe that !2 !2 p−1 Z p−1 Z T X X 1X −1 φ (x − dT xt ))dGiT (x) ≤ OP (1) dGiT (x) T |x|>v |x|>v t=1 i=0 i=0 !2  2 Z Z p−1 p−1 X 1 X 2 2 |x|Fi (x)dx =OP (1) ≤ OP (1) Fi (x)dx (dT v)2 |x|>dT v i=0 i=0 p−1 Z p−1 1 X 1 X 2 2 ≤OP (1) x Fi (x)dx = OP (1) i = OP (1)p2 d−2 T , (dT v)2 (dT v)2 i=0 i=0 R by Cauchy-Swarchitz inequality and (iv) of Lemma A.3, x2 Fi2 (x)dx = (2i + 1)/2, where we have −1 used φ (x − d−1 T xt )) = OP (1) on the region |x| > v for all  > 0 since dT xt = OP (1) and v is large

enough so that x − d−1 T xt 6= 0 in probability on the region |x| > v. Now, divide the interval [−v, v] into 2m + 1 subintervals with equal length by a grid {sm,` : ` = −m, · · · , m} where sm,−m = −v < sm,−m+1 < · · · < sm,m < sm,m+1 = v. Note also that 0 ∈ (sm,0 , sm,1 ). We have Z m−1 T T Z sm,`+1 X 1X 1X −1 φ (x − d−1 x ))dG (x) − φ (s − d x )dG (x) t t  iT iT m,` T T |x|≤v T T t=1 t=1 sm,` `=−m m # " Z sm,`+1 T X 1X −1 x ))dG (x) = (φ (x − d−1 x ) − φ (s − d t  t iT m,` T T T sm,` t=1 `=−m Z Z 2v 2v 2v ≤C dGiT (x) = C Fi2 (x)dx ≤ C 2m + 1 |x|≤v 2m + 1 |z|≤dT v 2m + 1 due to the boundedness of the derivative of φ (·). Moreover, we have m T Z m−1 T Z sm,`+1 X X 1X 1 X sm,`+1 −1 φ (sm,` − d−1 x )dG (x) − φ (s − d x )dG(x) t t  iT m,` T T T T t=1 sm,` t=1 sm,` `=−m `=−m Z sm,`+1 T m−1 X 1X −1 = φ (sm,` − dT xt ) d(GiT (x) − G(x)) T sm,` t=1 `=−m Z m−1 m Z sm,`+1 sm,1 X Z sm,`+1 X ≤ d(G (x) − G(x)) = d(G (x) − G(x)) + 2 dG (x) iT iT iT sm,` sm,0 sm,` `=−m `=1 m Z dT sm,`+1 X ≤|GiT (sm,1 ) − 1 − GiT (sm,0 )| + 2 Fi2 (x)dx `=1

dT sm,`

34

Z d sm,1 Z dT sm,0 Z ∞ T 2 2 = Fi (x)dx − 1 − Fi (x)dx + 2 Fi2 (x)dx dT sm,1 Z Z ∞ Z dT sm,0 Fi2 (x)dx Fi2 (x)dx + 2 F 2 (x)dx + = dT sm,1 i dT sm,1 Z 1/2 Z ∞ √ 2 −1 2 2 Fi (x)dx ≤ 4(dT sm,1 ) =4 x Fi (x)dx = O(1)(dT v/m)−1 i, dT sm,1

where we have used the facts that Fi2 (x) is an even function as Fi (x) is either even or odd function, R 2 R Fi (x)dz = 1, x2 Fi2 (x)dx = O(1)i and Cauchy-Schwarz inequality. R sm,`+1 dG(x) = 0 if 0 6∈ (sm,` , sm,`+1 ) and 1 otherwise, we have In addition, by sm,` m−1 Z sm,`+1 T n X 1X X 1 −1 dG(x) − φ (sm,` − d−1 φ (d x ) x )  t t T T T T sm,` t=1 t=1 `=−m T 1 X v −1 [φ (sm,0 − d−1 = x ) − φ (−d x )] ≤ C|sm,0 | = C , t  t T T T m t=1

by the symmetry of φ (·) and the boundedness of its derivative. It follows that the second term Tb2T is surely bounded by p−1 X

!2 T dT X 2 2 −2 φ (d−1 ≤ C(p2 d−2 UT (i, ) − T xt ) T + p(v/m) + p (dT v/m) ). T t=1

i=0

τ Noting that v is fixed, pd−1 T = o(1) as T → ∞, we may choose a m = mT = T such that not only

p/m2 → 0 but also pm2 d−2 T → 0. This is fulfilled if we let τ satisfy α < 2τ < 1 − α. Such τ does exist due to Assumption 3. For the third term Ac2T , notice that !2 Z 2 Z 1 Z 1 T 1 1X 1 −1 −1 φ (B(r))dr φ (dT xt ) − = φ (WT (r))dr − φ (B(r))dr + φ (dT xT ) T T 0 0 0 t=1 Z 1 2     Z 1 1 1 2 ≤2 [φ (WT (r)) − φ (B(r))]dr + OP ≤2 [φ (WT (r)) − φ (B(r))] dr + OP , 2 T T2 0 0 where WT (r) := d−1 T x[T r] . By the strong approximation, in a richer probability space we have sup0≤r≤1 |WT (r) − B(r)| = o(T −1/4 log(T )) Ac2T

a.s. (Phillips, 2001, p. 391) in an expanded probability space. Thus, the third term   is almost surely bounded by oP (pT −1/2 log2 (T ) ) + OP pT −2 , by noting that the derivative

of φ (·) is bounded and neglecting some constants. For the last term Ad2T , note that the derivative of the Heaviside function I(x ≥ 0) is the Deric delta d dx I(x

≥ 0) = δ(x), or equivalently, Heaviside function I(x ≥ 0) is the distribution R function of the delta function. Thus, using the property of the delta function (i.e. f (x)δ(x)dx = f (0) R R for any continuous function f (x)), for any  > 0 we have LB (1, x)δ(x)dx = LB (1, x)δ(x)dx = function δ(x), i.e.,

LB (1, 0) because δ(x) = δ(x)/. See Gel’fand and Shilov (1964) for detailed facts on the generalized functions. It follows from the occupation time formula that Z 1 Z φ (B(r))dr − LB (1, 0) = φ (x)LB (1, x)dx − LB (1, 0) 0

35

Z

Z φ(x)LB (1, x)dx −

=

Z LB (1, x)δ(x)dx =

LB (1, x)d(Φ(x) − I(x ≥ 0))

Z :=

LB (1, x)dΨ(x),

where Φ(x) is the distribution function of a standard normal variable and Ψ(x) := Φ(x) − I(x ≥ 0), so that Ψ(∞) = Ψ(−∞) = 0. R λ Lemma 2.1 of Borodin (1986, p. 239) shows that E LB (1, x)dΨ(x) ≤ Cλ/2 for any λ = 1, 2, · · · . Therefore, taking λ = 1, E[Ad2T ]2 ≤ p. If we choose  = o(p−1 ), we will conclude the assertion.
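The smoothing device behind $A_{2T}^c$ and $A_{2T}^d$ also lends itself to a quick numerical check: for small $\epsilon$, the smoothed occupation average $(1/T)\sum_t\phi_\epsilon(d_T^{-1}x_t)$ stabilizes near the same realized local time that appears as the common diagonal value in the previous snippet (same seed, hence the same simulated path; the bandwidth values are illustrative only):

```python
# Numerical check of the smoothing device: (1/T) sum_t phi_eps(d_T^{-1} x_t)
# approximates integral_0^1 phi_eps(B(r)) dr and hence L_B(1,0) as eps -> 0.
# Same seed as the previous snippet, so the same simulated path x_t.
import numpy as np

rng = np.random.default_rng(1)
T = 100000
w = np.cumsum(rng.standard_normal(T)) / np.sqrt(T)   # W_T(r) = d_T^{-1} x_{[Tr]}

for eps in [0.2, 0.05, 0.01]:
    phi_eps = np.exp(-w**2 / (2 * eps**2)) / (np.sqrt(2 * np.pi) * eps)
    print(eps, phi_eps.mean())             # stabilizes near the realized L_B(1,0)
```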



References

Andrews, D. W. K. (1991). Asymptotic normality of series estimators for nonparametric and semiparametric regression models. Econometrica, 59:307-345.

Askey, R. and Wainger, S. (1965). Mean convergence of expansions in Laguerre and Hermite series. American Journal of Mathematics, 87:695-708.

Balke, N. S. and Fomby, T. B. (1997). Threshold cointegration. International Economic Review, 38:627-645.

Borodin, A. N. (1986). On the character of convergence to Brownian local time. I. Probability Theory and Related Fields, 72:231-250.

Chan, N. and Wang, Q. (2014). Uniform convergence for nonparametric estimators with nonstationary data. Econometric Theory, 30:1110-1133.

Chen, J., Gao, J., and Li, D. (2012). Estimation in semiparametric regression with nonstationary regressors. Bernoulli, 18:678-702.

Chen, X. and Shen, X. (1998). Sieve extremum estimates for weakly dependent data. Econometrica, 66:289-314.

Dong, C. and Gao, J. (2013). Orthogonal expansion of functionals of Lévy processes: theory and practice. Working paper, http://www.buseco.monash.edu.au/ebs/pubs/wpapers/2013/03-13.php.

Dong, C. and Gao, J. (2014). Specification testing in structural nonparametric cointegration: theory and practice. Working paper, http://www.buseco.monash.edu.au/ebs/pubs/wpapers/2014/wp02-14.php.

Engle, R. F. and Granger, C. W. J. (1987). Co-integration and error correction: representation, estimation and testing. Econometrica, 55:251-276.

Gao, J. (2007). Nonlinear Time Series: Semiparametric and Nonparametric Methods. Chapman & Hall/CRC, New York.

Gao, J., Li, D., and Tjøstheim, D. (2009). Uniform consistency for nonparametric estimators in null recurrent time series. Working paper, http://economics.adelaide.edu.au/research/papers/doc/wp2009-26.pdf.

Gao, J. and Phillips, P. C. B. (2013). Semiparametric estimation in triangular system equations with nonstationarity. Journal of Econometrics, 176:59-79.

Gel'fand, I. M. and Shilov, G. E. (1964). Generalized Functions. Academic Press, New York.

Granger, C. W. J. (1995). Modelling nonlinear relationships between extended-memory variables. Econometrica, 63(2):265-279.

Kanwal, R. P. (1983). Generalized Functions: Theory and Technique. Academic Press, New York.

Karlsen, H. A., Myklebust, T., and Tjøstheim, D. (2007). Nonparametric estimation in a nonlinear cointegration type model. Annals of Statistics, 35:252-299.

Karlsen, H. A. and Tjøstheim, D. (2001). Nonparametric estimation in null recurrent time series. Annals of Statistics, 29:372-416.

Lukacs, E. (1970). Characteristic Functions. 2nd edition. Griffin, London.

Marmer, V. (2008). Nonlinearity, nonstationarity and spurious forecasts. Journal of Econometrics, 142:1-27.

Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators. Journal of Econometrics, 79:147-168.

Park, J. Y. and Phillips, P. C. B. (1999). Asymptotics for nonlinear transformations of integrated time series. Econometric Theory, 15:269-298.

Park, J. Y. and Phillips, P. C. B. (2001). Nonlinear regressions with integrated time series. Econometrica, 69(1):117-161.

Phillips, P. C. B. (2001). Descriptive econometrics for nonstationary time series with empirical applications. Journal of Applied Econometrics, 16:389-413.

Phillips, P. C. B. and Solo, V. (1992). Asymptotics for linear processes. Annals of Statistics, 20(2):971-1001.

Robinson, P. M. (1997). Large-sample inference for nonparametric regression with dependent errors. Annals of Statistics, 25:2054-2083.

Schwartz, S. C. (1967). Estimation of probability density by an orthogonal series. Annals of Mathematical Statistics, 38:1261-1265.

Szegő, G. (1975). Orthogonal Polynomials. Colloquium Publications XXIII. American Mathematical Society, Providence, Rhode Island.

Teräsvirta, T., Tjøstheim, D., and Granger, C. W. J. (2010). Modelling Nonlinear Economic Time Series. Advanced Texts in Econometrics. Oxford University Press, New York.

Wang, Q. and Chan, N. (2014). Uniform convergence rates for a class of martingales with application in nonlinear cointegrating regression. Bernoulli, 20:207-230.

Wang, Q. and Phillips, P. C. B. (2009a). Asymptotic theory for local time density estimation and nonparametric cointegrating regression. Econometric Theory, 25:710-738.

Wang, Q. and Phillips, P. C. B. (2009b). Structural nonparametric cointegrating regression. Econometrica, 77:1901-1948.

Wang, Q. and Phillips, P. C. B. (2011). Asymptotic theory for zero energy functionals with nonparametric regression applications. Econometric Theory, 27:235-259.

Wang, Q. and Phillips, P. C. B. (2014). Nonparametric cointegrating regression with endogeneity and long memory. Working paper, http://www.maths.usyd.edu.au/u/pubs/publist/preprints/2014/wang12.pdf.