ASYMPTOTIC THEORY FOR NONLINEAR QUANTILE REGRESSION UNDER WEAK DEPENDENCE*

Walter Oberhofer and Harry Haupt
Department of Economics and Econometrics, University of Regensburg, Universitätsstr. 31, 93053 Regensburg, Germany. Tel.: +49-941-943-2739, fax: +49-941-943-4917. [email protected], [email protected]

Running title: NONLINEAR QUANTILE REGRESSION

Address correspondence to: Harry Haupt, Department of Economics and Econometrics, University of Regensburg, Universitätsstr. 31, 93053 Regensburg, Germany, e-mail: [email protected].

Abstract: This paper derives the consistency and asymptotic normality of the nonlinear quantile regression estimator with weakly dependent errors under mild assumptions. The notion of weak dependence introduced in this paper can be considered in part as a quantile specific local variant of recently introduced concepts. In addition, we provide new and detailed results on the connection between the measure of dependence and the rate of convergence of regression quantiles, and introduce a new probability inequality for nonlinear regression estimators based on the $L_1$-norm, extending existing results for least squares estimation. There is an obvious connection between the derived asymptotic results and the corresponding results for least squares estimation.



We wish to thank the participants of the workshop “New trouble for standard regression analysis”, November 2005, Regensburg/Germany, and the participants of the ICMS workshop “Quantile Regression, LMS Method and Robust Statistics in the 21st Century”, June 2006, Edinburgh/Scotland for many valuable comments. The second author gratefully acknowledges the support of the International Centre for Mathematical Sciences (ICMS). We also thank three anonymous referees and an associate editor for helpful comments.

1 Introduction

The concept of quantile regression, introduced in the seminal paper of Koenker and Bassett (1978), has become a widely used and accepted technique in many areas of theoretical and applied econometrics. Only recently the first monograph on this topic was published by Roger Koenker (2005), covering a wide scope of well established foundations and (even a 'twilight zone' of) current research frontiers. In addition, many of the numerous contributions in this fast evolving field have been reviewed and summarized in recent survey articles (see inter alia Buchinsky, 1998, and Yu et al., 2003). In contrast to the more methodological literature, there are also important, nontechnical attempts to bring the key concepts and especially the applicability of quantile estimation to a wider audience outside the statistical profession (see for example Koenker and Hallock, 2001).

In this paper we consider quantile regressions where the dependent variable $y$ and covariates $x_1, \ldots, x_K$ satisfy a nonlinear model with additive errors. Nonlinear quantile regression models have been discussed by several authors. Oberhofer (1982) and Wang (1995) considered the consistency and the asymptotic normality, respectively, of the least absolute deviations (LAD) estimator under the assumption of independent and identically distributed (i.i.d.) errors. The i.i.d. assumption has been challenged in different ways in the quantile regression literature. Koenker and Bassett (1982) first investigated the case of heteroscedasticity based on regression quantiles; other authors discussed this case for the most prominent quantile, the median (see for example Knight, 1999, Zhao, 2001, and the literature cited there). Quantile regression with dependent errors has been studied for LAD estimation by Phillips (1991) and Weiss (1991), for unconditional quantiles in a parametric context by Oberhofer and Haupt (2005), and in a nonparametric context by De Gooijer and Zerom (2003) and Ioannides (2004). In the context of pure time series models, the nonparametric estimation of regression quantiles under dependence has been discussed by Cai (2002), who also provides a survey of the preceding literature. Other relevant works in this context include Portnoy (1991), Koenker and Park (1994), Jureckova and Prochazka (1994), and, more recently, Mukherjee (1999, 2000), Chernozhukov and Umantsev (2001), Chen et al. (2004), and Engle and Manganelli (2004).

We contribute to this literature by providing mild, quantile specific conditions for the consistency and asymptotic normality of nonlinear regression quantiles when the errors are weakly dependent. We also provide an in-depth analysis of the connection between the rate of convergence and the measure of dependence, and derive a new probability inequality for nonlinear regression estimators based on the $L_1$-norm, extending existing results for least squares estimation.

The remainder of the paper is organized as follows: Section 2 introduces the model framework and derives the loss function of conditional quantile estimation. In Section 3, after a discussion of the basic consistency assumptions, we prove consistency. In Section 4 we discuss rates of convergence and prove consistency in a wider sense for both linear and nonlinear regression functions. In Section 5 we derive the assumptions for asymptotic normality of regression quantiles under weak dependence, again for both linear and nonlinear regression functions.
Finally, after short conclusions in Section 6, in two Appendices we prove several preliminary results required for consistency and asymptotic normality, respectively.

2 Regression quantiles

Let $(\Omega, \mathcal{F}, P)$ be a complete probability space and let $\{y_t\}_{t\in\mathbb{N}}$ be an $\mathcal{F}$-measurable scalar random sequence. Then, consider the regression model
$$ y_t - g(x_t, \beta_0) = u_t, \qquad 1 \le t \le T, \tag{2.1} $$
where $\beta_0 \in D_\beta \subset \mathbb{R}^K$ is a vector of unknown parameters, the $1 \times L$ vectors $x_t$ are deterministic and given, the dependent variables $y_t$ are observable, $g(x, \beta)$ is an in general nonlinear function from $D_x \times D_\beta \to \mathbb{R}$, defined for $x \in D_x$ and $\beta \in D_\beta$, where $x_t \in D_x$ for all $t$, and $u_t$ is an error term. Our aim is to analyze the asymptotic behavior of the $\vartheta$-quantile regression estimator $\hat\beta_T$, i.e. the value $\beta = \hat\beta_T$ minimizing the objective function
$$ \sum_{t=1}^{T} \Big[ \vartheta\, |y_t - g(x_t, \beta)|_+ + (1-\vartheta)\, |y_t - g(x_t, \beta)|_- \Big], \tag{2.2} $$
where $0 < \vartheta < 1$ and, for $z \in \mathbb{R}$, we define
$$ |z|_+ \stackrel{\text{def}}{=} \begin{cases} z & \text{if } z > 0, \\ 0 & \text{if } z \le 0, \end{cases} \qquad\quad |z|_- \stackrel{\text{def}}{=} \begin{cases} 0 & \text{if } z > 0, \\ -z & \text{if } z \le 0. \end{cases} $$
For technical reasons, i.e. to avoid the need of using $E(u_t)$, instead of the objective function (2.2) we use the equivalent objective function
$$ \sum_{t=1}^{T} \Big[ \vartheta\, |y_t - g(x_t, \beta)|_+ + (1-\vartheta)\, |y_t - g(x_t, \beta)|_- - \vartheta\, |u_t|_+ - (1-\vartheta)\, |u_t|_- \Big]. \tag{2.3} $$
For the derivation and discussion of asymptotic results it is convenient to introduce some further definitions. Using the abbreviation
$$ h_t = h_t(\beta) \stackrel{\text{def}}{=} g(x_t, \beta) - g(x_t, \beta_0), $$
the objective function (2.3) can be written as
$$ \sum_{t=1}^{T} \Big[ \vartheta\, |u_t - h_t|_+ + (1-\vartheta)\, |u_t - h_t|_- - \vartheta\, |u_t|_+ - (1-\vartheta)\, |u_t|_- \Big]. \tag{2.4} $$
A typical element of the sum (2.4) is denoted by $a_t(\beta)$, leading to the objective function
$$ A_T(\beta) \stackrel{\text{def}}{=} \sum_{t=1}^{T} a_t(\beta), \qquad \text{where } a_t(\beta) = \begin{cases} -\vartheta h_t & \text{if } u_t > \max(0, h_t), \\ u_t - \vartheta h_t & \text{if } h_t < u_t \le 0, \\ -u_t + (1-\vartheta) h_t & \text{if } 0 < u_t \le h_t, \\ (1-\vartheta) h_t & \text{if } u_t \le \min(0, h_t). \end{cases} \tag{2.5} $$
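Before turning to the asymptotic analysis, the following sketch shows how $\hat\beta_T$ can be computed in practice by direct minimization of (2.2). It is not part of the paper: the exponential regression function, the data generating process, and the use of a derivative-free Nelder-Mead search are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Sketch: theta-quantile regression for a nonlinear model
# g(x, beta) = beta_1 * exp(beta_2 * x); design chosen for illustration only.
rng = np.random.default_rng(0)
theta, T = 0.75, 500
x = rng.uniform(0.0, 2.0, size=T)
beta0 = np.array([1.0, 0.5])
# Shift the errors so that P(u_t <= 0) = theta (cf. assumption (C5) below)
u = rng.standard_normal(T) - norm.ppf(theta)
y = beta0[0] * np.exp(beta0[1] * x) + u

def objective(beta):
    # Objective (2.2): sum over t of theta*|r|_+ + (1 - theta)*|r|_-
    r = y - beta[0] * np.exp(beta[1] * x)
    return np.sum(np.where(r > 0.0, theta * r, (theta - 1.0) * r))

fit = minimize(objective, x0=np.array([0.5, 0.1]), method="Nelder-Mead")
print(fit.x)  # should lie close to beta0 = (1.0, 0.5)
```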

Then split $a_t(\beta)$ into
$$ a_t(\beta) = b_t(\beta) + c_t(\beta), \tag{2.6} $$
where
$$ c_t(\beta) \stackrel{\text{def}}{=} \begin{cases} -\vartheta h_t & \text{for } u_t > 0, \\ (1-\vartheta) h_t & \text{for } u_t \le 0, \end{cases} \tag{2.7} $$
and, consequently,
$$ b_t(\beta) \stackrel{\text{def}}{=} \begin{cases} 0 & \text{if } u_t > \max(0, h_t), \\ u_t - h_t & \text{if } h_t < u_t \le 0, \\ -u_t + h_t & \text{if } 0 < u_t \le h_t, \\ 0 & \text{if } u_t \le \min(0, h_t). \end{cases} \tag{2.8} $$
Furthermore, we define $B_T(\beta) = \sum_{t=1}^{T} b_t(\beta)$ and $C_T(\beta) = \sum_{t=1}^{T} c_t(\beta)$.

The expression $c_t(\beta)$ defined in (2.7) has an interesting interpretation. It contains the factor $h_t$, which arises from the deviation between the regression function and its true value, and, as a second component, a Bernoulli random variable $\gamma_t$, where $\gamma_t = -\vartheta$ for $y_t$ lying above and $\gamma_t = 1-\vartheta$ for $y_t$ lying below the true regression surface. The decomposition of $a_t(\beta)$ in (2.6) allows us to study the asymptotic behavior of the objective function by studying separately that of $b_t(\beta)$ and $c_t(\beta)$ in Lemmas 1C-3C, given in Appendix C (for consistency), and in Lemmas 1N-3N, given in Appendix N (for asymptotic normality), respectively.

In order to illustrate equations (2.6)-(2.8), consider least squares for the linear regression model $y_t - x_t\beta = u_t$. Using the definition of $h_t$ and simple algebra, in analogy to (2.4), the least squares loss function can be written as $\sum_{t=1}^{T}(u_t - h_t)^2 - \sum_{t=1}^{T} u_t^2 = -2\sum_{t=1}^{T} u_t h_t + \sum_{t=1}^{T} h_t^2$. Hence, the decomposition is given by $B_T(\beta) = \sum_{t=1}^{T} h_t^2$ and $C_T(\beta) = -2\sum_{t=1}^{T} u_t h_t$.
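The decomposition can also be checked numerically. The following sketch is not part of the paper; the simulated design and all names are illustrative. It draws errors $u_t$ and deviations $h_t$ at random, evaluates $a_t$, $b_t$, and $c_t$ from (2.5), (2.7), and (2.8), and confirms $a_t(\beta) = b_t(\beta) + c_t(\beta)$ elementwise.

```python
import numpy as np

# Numerical sanity check of the decomposition (2.6)-(2.8).
rng = np.random.default_rng(0)
theta = 0.25
u = rng.standard_normal(1000)           # errors u_t
h = rng.uniform(-1.0, 1.0, size=1000)   # deviations h_t = g(x_t, b) - g(x_t, b0)

pos = lambda z: np.maximum(z, 0.0)      # |z|_+
neg = lambda z: np.maximum(-z, 0.0)     # |z|_-

# a_t: typical element of the objective (2.4)
a = (theta * pos(u - h) + (1 - theta) * neg(u - h)
     - theta * pos(u) - (1 - theta) * neg(u))

# c_t from (2.7): h_t times the Bernoulli variable gamma_t
gamma = np.where(u > 0, -theta, 1 - theta)
c = h * gamma

# b_t from (2.8): nonzero only when u_t lies between 0 and h_t
b = (np.where((h < u) & (u <= 0), u - h, 0.0)
     + np.where((0 < u) & (u <= h), h - u, 0.0))

assert np.allclose(a, b + c)            # a_t = b_t + c_t holds exactly
```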

3 Consistency

3.1 Basic consistency assumptions

In this framework, the existence of a measurable estimator $\hat\beta_T$ is usually ensured by Theorem 3.10 of Pfanzagl (1969), which, if $\beta_0$ is an inner point of a compact set $D_\beta$, is valid under the assumptions stated below. By $F_t(z)$ we denote the distribution of $u_t$ and by $F_{s,t}(z,w)$ the joint distribution of $(u_s, u_t)$ for $s \ne t$. Assumptions (C1) to (C3) refer to properties of the covariates and the regression function, assumptions (C4) to (C6) refer to the error process.

(C1) The $1 \times L$ vectors $x_t$ are deterministic and known, $t = 1, 2, \ldots$

(C2) $D_\beta$ is compact and
$$ \limsup_{T,\, \beta \in D_\beta} \frac{1}{T} \sum_{t=1}^{T} \big[ g(x_t, \beta) - g(x_t, \beta_0) \big]^2 < \infty. $$

(C3) For every $\epsilon > 0$ there exists a positive $\delta_g$ such that
$$ \liminf_{T} \inf_{\|\beta - \beta_0\| \ge \epsilon} \frac{1}{T} \sum_{t=1}^{T} \big| g(x_t, \beta) - g(x_t, \beta_0) \big| > \delta_g. $$

(C4) There exist a positive $f_0$ and a positive $\delta_0$ such that for all $|x| \le \delta_0$
$$ \liminf_{t} \Big\{ \min\big[ F_t(|x|) - F_t(0),\ F_t(0) - F_t(-|x|) \big] \Big\} \ge f_0 |x|. $$

(C5) $F_t(0) = P(u_t \le 0) = \vartheta$, $0 < \vartheta < 1$, for all $t$.

(C6) For $1 \le s, t$ and $s \ne t$ define $\alpha_{s,t} = \sup |P(F \cap G) - P(F)P(G)|$, where the supremum is taken over all $F$ in the $\sigma$-algebra $\sigma(u_s)$ and all $G$ in the $\sigma$-algebra $\sigma(u_t)$. Additionally define $\alpha_k = \sup_t |\alpha_{t,t+k}|$ for $k = 1, 2, 3, \ldots$ Then $\alpha_k \to 0$.

It is not essential that the regressors are deterministic, as postulated in assumption (C1), since similar behavior can be expected of random regressors $\{x_t\}$ independent of the disturbances $\{u_t\}$. Consider the example of a linear regression function and let $\{x_t\}$ be a stationary sequence with $E(x_t' x_t)$ finite and non-singular. Then almost all realizations would have the necessary limiting properties. Assumption (C2) is a weaker version of the standard dominance condition given in Pötscher and Prucha (1997, p. 28). In the linear case assumption (C2) is implied by the assumption that $T^{-1}\sum x_t' x_t$ converges to a finite matrix. The identifiable uniqueness condition in assumption (C3) can be compared with the analogous condition in the linear regression model estimated by least squares, i.e.
$$ \liminf_{T} \inf_{\|\beta-\beta_0\| \ge \epsilon} \frac{1}{T} \sum_{t=1}^{T} \big( x_t\beta - x_t\beta_0 \big)^2 = \liminf_{T} \inf_{\|\beta-\beta_0\| \ge \epsilon} \frac{1}{T} (\beta - \beta_0)' X'X (\beta - \beta_0) > 0, $$
which is implied by the non-singularity of the limit of the matrix $T^{-1}X'X$. On the one hand assumption (C2) rules out too strong a growth of the covariates; on the other hand the identification condition (C3) guarantees enough variation. The resulting trade-off between assumptions (C2) and (C3) is rather involved in the general nonlinear case and lies beyond the focus of the present paper. Familiar throughout the literature on quantile regression is the assumption that the density $f_t(z)$ of $F_t(z)$ exists in a neighborhood of $z = 0$ with $f_t(0) \ge f_0 > 0$ for all $t$, which implies assumption (C4). Violations of this assumption are treated in the literature on so-called non-standard conditions (e.g., Knight, 1998, and Rogers, 2001). Assumption (C5) is a common normalization in quantile regression when $g$ contains an intercept, implying $E(c_t(\beta)) = 0$. By virtue of assumption (C6) the error process $\{u_t\}$ is strongly mixing, which excludes too strong a dependence. Interestingly, we do not require any moment assumptions.

THEOREM 1. Under assumptions (C1)-(C6) the estimator $\hat\beta_T$ is consistent.

PROOF. From the Chebyshev inequality it follows, for a random variable $Y$ and a constant $M$, where both expectation and variance of $Y$ exist, that
$$ P\big( M > Y - E(Y) > -M \big) > 1 - \frac{\operatorname{Var}(Y)}{M^2}. $$
Hence, for $Y = A_T(\beta)/T$ and $M = E\big(A_T(\beta)/T\big)/2 \ne 0$ we have
$$ P\Big( A_T(\beta)/T > \tfrac{1}{2}\, E\big(A_T(\beta)/T\big) \Big) > 1 - \frac{4\operatorname{Var}\big(A_T(\beta)/T\big)}{\big[ E\big(A_T(\beta)/T\big) \big]^2}. \tag{3.1} $$
According to Lemma 1C and assumption (C3), for $\|\beta - \beta_0\| \ge \epsilon$, where $\epsilon$ is an arbitrary positive number, it follows that
$$ E\big(A_T(\beta)\big)/2 > 0 = A_T(\beta_0) \ge A_T(\hat\beta_T). $$
Thus, equation (3.1) implies
$$ P\big( A_T(\beta) > A_T(\hat\beta_T) \big) > 1 - \frac{4\operatorname{Var}\big(A_T(\beta)/T\big)}{\big[ E\big(A_T(\beta)/T\big) \big]^2}. \tag{3.2} $$
Due to assumptions (C2) and (C6) and according to Lemma 2C and Lemma 3C we obtain
$$ \operatorname{Var}\big(A_T(\beta)/T\big) = o(1). \tag{3.3} $$
Thus, from (3.2) and (3.3), and making use of assumption (C3) and Lemma 1C,
$$ P\big( A_T(\beta) > A_T(\hat\beta_T) \big) > \begin{cases} 1 - \dfrac{o(1)}{(f_0\,\delta_0^2)^2} & \text{for } \beta \in B_{>,T}(\epsilon), \\[2ex] 1 - \dfrac{o(1)}{f_0^2\,(\delta_g/4)^4} & \text{for } \beta \in B_{\le,T}(\epsilon). \end{cases} \tag{3.4} $$
Due to $B_{>,T}(\epsilon) \cup B_{\le,T}(\epsilon) = \{\beta \mid \|\beta - \beta_0\| \ge \epsilon\}$, (3.4) implies for $T \to \infty$
$$ P\big( \|\hat\beta_T - \beta_0\| > \epsilon \big) \to 0, \tag{3.5} $$
which proves the assertion. ∎

4 Rate of convergence

For the proof of consistency of $\hat\beta_T - \beta_0$ it is sufficient that $\{\alpha_k\}$ is a null sequence. We now replace assumption (C6) by

(C6') $\alpha_k = O(k^{-p})$, where $0 < p < 1$,

in order to make an assertion about the convergence rate of the consistent estimator $\hat\beta_T$. As intuition suggests, this rate depends on the size of the mixing coefficients $\alpha_k$. Hence, in contrast to the argument in Theorem 1, we now show the consistency of $T^r(\hat\beta_T - \beta_0)$, where $0 < r < p/2$.

Due to Theorem 1 we can assume without loss of generality that $\beta \in B_{\le,T}(\epsilon)$. It is convenient to introduce the variable
$$ \alpha \stackrel{\text{def}}{=} T^r(\beta - \beta_0). \tag{4.1} $$
Then, substitute $\beta$ by $\beta_0 + \alpha/T^r$ in the objective function and consider $\beta$ as a function of $\alpha$: $\beta = \beta(\alpha)$. From the minimization of $A_T(\beta(\alpha))$ we obtain $\hat\alpha_T$. Thus, following from (3.2),
$$ P\big( A_T(\beta(\alpha)) > A_T(\beta(\hat\alpha_T)) \big) > 1 - \frac{4\operatorname{Var}\big(A_T(\beta(\alpha))/T\big)}{\big[ E\big(A_T(\beta(\alpha))/T\big) \big]^2}. \tag{4.2} $$
Having expressed (3.2) in terms of $\beta(\alpha)$ by considering (4.2), we are in a position to derive the extended consistency result outlined above and stated in Theorems 2 and 3 at the end of this section. For reasons of clarity we distinguish the cases of linear and nonlinear regression functions.

4.1 Linear regression functions

Instead of (2.1) we consider
$$ y_t - x_t\beta_0 = u_t, \qquad 1 \le t \le T, $$
where $L = K$. Hence $h_t = x_t(\beta - \beta_0)$, and the sums in assumptions (C2) and (C3) are given by
$$ \limsup_{T,\,\beta\in D_\beta} \frac{1}{T}\sum_{t=1}^{T} \big( x_t(\beta-\beta_0) \big)^2 < \infty $$
and
$$ \liminf_{T} \inf_{\|\beta-\beta_0\|\ge\epsilon} \frac{1}{T}\sum_{t=1}^{T} \big| x_t(\beta-\beta_0) \big| > \delta_g, $$
respectively. According to Theorem 1 and assumption (C2), we can choose $\beta - \beta_0$ small enough such that
$$ \frac{1}{4T}\sum_{t=1}^{T} |h_t| < \delta_0. \tag{4.3} $$

From assumption (C6') it follows that $T^{-1}\sum_{k=1}^{T} \alpha_k = O(T^{-p})$. In addition, from equation (4.1) follows $h_t = x_t(\beta - \beta_0) = x_t\alpha/T^r$. Together with Lemmas 2C and 3C this leads to
$$ \operatorname{Var}\big( A_T(\beta(\alpha))/T \big) = O(T^{-p})\, T^{-2r}\, \frac{1}{T}\sum_{t=1}^{T} (x_t\alpha)^2, \tag{4.4} $$
and, due to $E(C_T(\beta)) = 0$, Lemma 1C, and (4.3),
$$ E\big( A_T(\beta(\alpha))/T \big) \ge T^{-2r} f_0 \left[ \frac{1}{4T}\sum_{t=1}^{T} |x_t\alpha| \right]^2. \tag{4.5} $$
From (4.4) and (4.5), and by virtue of assumptions (C2) and (C3), it follows that the right hand side of (4.2) converges to 1 for every fixed $\alpha$ with $\|\alpha\| \ne 0$, provided $p > 2r$. Consequently, for $T \to \infty$,
$$ P(\|\hat\alpha_T\| > \epsilon) \to 0. \tag{4.6} $$
Since the numerator on the right hand side of (4.2) is of order $\|\alpha\|^2$ and the denominator is of order $\|\alpha\|^4$, convergence still holds for $\|\alpha\| \to \infty$, implying $\hat\beta_T = \beta_T(\hat\alpha_T)$ and $P(T^r\|\hat\beta_T - \beta_0\| > \epsilon) \to 0$. Since $\epsilon$ can be chosen arbitrarily small, this implies the consistency of $\hat\beta_T$ of order $T^{-r}$, where $r < p/2$.

4.2 Nonlinear regression functions

In the nonlinear case the regression function $g(x, \beta)$ must be smooth in a neighborhood of $\beta = \beta_0$. We assume that $g(x, \beta)$ has the following Taylor expansion (with remainder) for all $x \in D_x$ and $\beta$ in a neighborhood of $\beta_0$:
$$ g(x,\beta) = g(x,\beta_0) + \frac{\partial g(x,\beta)}{\partial\beta'}\bigg|_{\beta=\beta_0} (\beta-\beta_0) + \frac{1}{2}\,(\beta-\beta_0)'\, \frac{\partial^2 g(x,\beta)}{\partial\beta\,\partial\beta'}\bigg|_{\beta=\beta^*} (\beta-\beta_0), \tag{4.7} $$
where $\beta^* = \beta_0 + \xi(\beta - \beta_0)$ and $0 < \xi < 1$. For ease of notation we introduce the row vector
$$ z_t \stackrel{\text{def}}{=} \frac{\partial g(x_t,\beta)}{\partial\beta'}\bigg|_{\beta=\beta_0}, \tag{4.8} $$
and the $K \times K$ matrix
$$ W_t(\beta_t^*) \stackrel{\text{def}}{=} \frac{1}{2}\, \frac{\partial^2 g(x_t,\beta)}{\partial\beta\,\partial\beta'}\bigg|_{\beta=\beta_t^*}, \tag{4.9} $$
where $\beta_t^* = \beta_0 + \xi_t(\beta - \beta_0)$ and $0 < \xi_t < 1$. Thus we obtain
$$ h_t = g(x_t,\beta) - g(x_t,\beta_0) = z_t(\beta-\beta_0) + (\beta-\beta_0)'\, W_t(\beta_t^*)\, (\beta-\beta_0). \tag{4.10} $$
Then we assume:

(C7) If all $\beta_t$ are in a neighborhood of $\beta_0$, then, for a fixed $M_W < \infty$, $\sup_{t,\,\|\alpha\|=1} \alpha' W_t(\beta_t)\alpha \le M_W$.

In addition we need local versions of assumptions (C2) and (C3):

(C8) $\limsup_{T,\,\|\alpha\|=1} T^{-1}\sum_{t=1}^{T} (z_t\alpha)^2 \le \lambda_0 < \infty$,

(C9) $\liminf_{T,\,\|\alpha\|=1} T^{-1}\sum_{t=1}^{T} |z_t\alpha| \ge \delta_0 > 0$.

Obviously, in the linear case assumption (C8) corresponds to assumption (C2), and assumption (C9) corresponds to assumption (C3). Note that, according to the consistency established in Theorem 1, without loss of generality we can restrict $\beta$ such that $\|\beta - \beta_0\| < \eta$ for an arbitrary $\eta > 0$.

Then, due to assumptions (C7) and (C8),
$$ \frac{1}{T}\sum_{t=1}^{T} h_t^2 \le \frac{2}{T}\sum_{t=1}^{T} \big( z_t(\beta-\beta_0) \big)^2 + \frac{2}{T}\sum_{t=1}^{T} \big( (\beta-\beta_0)' W_t(\beta_t)(\beta-\beta_0) \big)^2, $$
and, by inserting $\beta - \beta_0 = \alpha/T^r$,
$$ \frac{1}{T}\sum_{t=1}^{T} h_t^2 \le T^{-2r}\, 2\lambda_0 \|\alpha\|^2 + T^{-4r}\, 2M_W \|\alpha\|^4 = O(T^{-2r}). $$
Then, in analogy to the linear case,
$$ \operatorname{Var}\big( A_T(\beta(\alpha))/T \big) = O(T^{-p-2r}). \tag{4.11} $$
Due to assumptions (C7) and (C9), for $T$ large enough,
$$ \frac{1}{2}\,\frac{1}{T}\sum_{t=1}^{T} \big| z_t(\beta-\beta_0) \big| \le \frac{1}{T}\sum_{t=1}^{T} |h_t|, \tag{4.12} $$
and in Lemma 1C we can use the left hand side of (4.12) as lower bound instead of the right hand side. Thus, according to (4.11) and (4.12), inequality (4.2) can be applied in analogy to the linear case.

The considerations in the previous two subsections can be subsumed in the following result:

THEOREM 2. For linear regression functions under assumptions (C1)-(C5) and (C6'), and for nonlinear regression functions under assumptions (C1)-(C5), (C6') and (C7)-(C9), for $T \to \infty$,
$$ P\big( T^r\|\hat\beta_T - \beta_0\| > \epsilon \big) \to 0 $$
for every $\epsilon > 0$ and $r < p/2 < 1/2$.

We now prove a new probability inequality for nonlinear regression estimators based on the $L_1$-norm, which extends results for nonlinear least squares estimation established by Ivanov (1976) for i.i.d. errors, and by Prakasa Rao (1984) and Hu (2004) for mixing errors. Note that the assumptions of Theorem 3 are less strict than those applied in Hu (2004), which in turn are milder than those of Prakasa Rao (1984). We have to strengthen assumption (C6') in the following way:

(C6'') $\alpha_k = O(k^{-p})$, where $p > 1$.

Assumption (C6'') implies the convergence of $\sum_{k=1}^{\infty} \alpha_k$.

THEOREM 3. For linear regression functions under assumptions (C1)-(C5) and (C6''), and for nonlinear regression functions under assumptions (C1)-(C5), (C6'') and (C7)-(C9),
$$ P\big( \sqrt{T}\,\|\hat\beta_T - \beta_0\| > \rho \big) \le O(\rho^{-2}). $$

PROOF. For the linear case consider equations (4.4) and (4.5) and set, technically, $p = 1$ and $r = 1/2$. Then the right hand side of (4.2) equals
$$ 1 - \frac{T^{-1}\sum_{t=1}^{T} (x_t\alpha)^2}{f_0^2 \left( T^{-1}\sum_{t=1}^{T} |x_t\alpha|/4 \right)^4}. \tag{4.13} $$
The fraction in (4.13) is of order $O(\|\alpha\|^{-2})$. Thus, for all $\alpha$ satisfying $\|\alpha\| \ge \rho > 0$, we get $P(\|\hat\alpha_T\| \le \rho) \ge 1 - O(\rho^{-2})$, which proves the assertion. The proof of the nonlinear case is analogous to the proof of Theorem 2 and is left to the reader. ∎

5 Asymptotic normality

For the derivation of asymptotic normality we need consistency and some additional assumptions. Typically, these assumptions are local. First, the regression function $g(x,\beta)$ must be smooth in a neighborhood of $\beta = \beta_0$; we assume the Taylor expansion (4.7) and apply the notation introduced in (4.8)-(4.10). Because we want to show the asymptotic normality of $\sqrt{T}(\hat\beta_T - \beta_0)$, it is convenient to introduce the variable
$$ \alpha \stackrel{\text{def}}{=} \sqrt{T}(\beta - \beta_0) \tag{5.1} $$
and replace $\beta$ by $\beta_0 + \alpha/\sqrt{T}$ in the objective function. In the proof of asymptotic normality we need compactness of the argument of the objective function, in this case $\alpha$. If $\alpha$ is an element of a compact set and $A_T(\beta)$ is minimized, then $\beta$ must be contained in a set depending on $T$ which contracts in the limit to the single point $\beta_0$. Due to Theorem 3, we can use this reparametrization and minimize with respect to $\alpha$. Prakasa Rao (1987) investigates these issues for least squares estimation of (2.1).

After replacing $\beta$ by $\beta_0 + \alpha/\sqrt{T}$ we get
$$ h_t = \frac{1}{\sqrt{T}}\, z_t\alpha + \frac{1}{T}\, \alpha' W_t(\beta_t^*)\, \alpha. \tag{5.2} $$
Note that $\beta_t^*$ can be rewritten as $\beta_t^*(\alpha) = \beta_0 + \xi_t\alpha/\sqrt{T}$. Then, the left hand side of the model (2.1) can be transformed in the following way:
$$ y_t - g(x_t,\beta) = g(x_t,\beta_0) + u_t - g(x_t,\beta) = u_t - \frac{1}{\sqrt{T}}\, z_t\alpha - \frac{1}{T}\, \alpha' W_t(\beta_t^*(\alpha))\, \alpha, \qquad 1 \le t \le T. $$
Also note that $h_t$ depends on $\beta$; due to the substitution of $\beta$ by $\alpha$, $h_t$ should properly be denoted $h_t(\beta(\alpha))$. For the same reason we write $a_t(\beta(\alpha))$, $b_t(\beta(\alpha))$, and $c_t(\beta(\alpha))$. The following assumptions are used for the cases of linear (see Subsection 5.1) and nonlinear (see Subsection 5.2) regression functions, respectively. Note that assumption (N1) implies assumption (C4).

(N1) For some $\epsilon_f > 0$ the density $f_t(z)$ of $F_t(z)$ exists for $|z| \le \epsilon_f$, is continuous at $z = 0$ uniformly in $t$, and $\sup_{1\le t\le T} f_t(0) = o(\sqrt{T})$.

(N2) The density $f_{s,t}(z,w)$ of $F_{s,t}(z,w)$ exists for $|z| \le \epsilon_f$ and $|w| \le \epsilon_f$, is continuous at $(z,w) = (0,0)$ uniformly in $s$ and $t$, and $\alpha_0(k|u) \to 0$, where $\alpha_0(k|u) = \sup_t |f_{t,t+k}(0,0) - f_t(0)f_{t+k}(0)|$ and the supremum is taken over $t, k, t+k \in \mathbb{N}$.

(N3) For $1 \le s < t \le T$ define $\omega_{s,t} = F_{s,t}(0,0) - F_s(0)F_t(0)$ and $\omega_{t,t} = F_t(0) - F_t(0)^2$. Then $T^{-1} Z_T'\Omega_T Z_T$ converges for $T \to \infty$ to a $K \times K$ matrix $\Sigma$, where the rows $z_t$ are collected in the $T \times K$ matrix $Z_T$, and $\Omega_T$ is a $T \times T$ matrix with generic element $\omega_{s,t}$.

(N4) Define $\omega_k = \sup_t |\omega_{t,t+k}|$, $k = 0, 1, 2, \ldots$ Then $\omega_k = O(k^{-\lambda})$ for some $\lambda > q > 2$.

(N5) $T^{-1} Z_T'\Phi_T Z_T$ converges for $T \to \infty$ to a non-singular $K \times K$ matrix $V$, where $\Phi_T$ is a $T \times T$ diagonal matrix with diagonal elements $\varphi_t = f_t(0)$, $1 \le t \le T$.

As some of these assumptions may appear rather abstract at first sight, the following discussion is accompanied by a comparison of linear least squares and quantile regression in Table 1, in order to facilitate intuition. If the disturbances are i.i.d., assumptions (N1) and (N2) are implied by the existence of $f_1(z)$ in a neighborhood of $z = 0$ and the continuity of $f_1(z)$ at $z = 0$. Usually the stronger assumption that $f_t(0)$ is uniformly bounded is made, implying assumption (N1). Note that without loss of generality we use the same upper bound $\epsilon_f$ in assumptions (N1) and (N2).

Assumptions (N2), (N3), and (N4) restrict the dependence structures imposed on the quantile regression model. Assumption (N2) can be considered as an infinitesimal weak dependence condition and can be interpreted as a quantile specific variant of the "dependence index sequence" introduced by Castellana and Leadbetter (1986). In the case of independence, the sum in assumption (N2) is equal to zero for all $T$. Assumption (N3) ensures the existence of the covariance matrix in the limit. Obviously, independence of the two events $\{u_s \le 0\}$ and $\{u_t \le 0\}$ implies $\omega_{s,t} = 0$. At the same time (N3) reflects the dependence structure and heterogeneity of the error process. Note that too strong a dependence hinders convergence in assumption (N3). If $x_{1t} = 1$, then, according to assumption (N3), $T^{-1}\sum_{s,t=1}^{T} \omega_{s,t}$ must converge. Further, it is important to note that the mixing property does not require the whole $\sigma$-algebras $\sigma(u_t \mid t \le s)$ and $\sigma(u_t \mid t \ge s+k)$, for all $s = 1, 2, \ldots$ A peculiarity of quantile regression lies in the fact that the only thing that matters is a local mixing condition at the point $(0,0)$. In this sense, for $s \ne t$, we can view $\omega_{s,t} = F_{s,t}(0,0) - F_s(0)F_t(0)$ as a local measure of dependence (or a local mixing coefficient), and $f_{s,t}(0,0) - f_s(0)f_t(0)$ as an analogous infinitesimal measure. If the limit of the matrix $Z_T'\Omega_T Z_T$ is singular, then the limiting distribution of $\sqrt{T}(\hat\beta - \beta_0)$ is singular, too. In least squares estimation the matrix $Z_T'\Omega_T Z_T$ corresponds to $T^{-1} X_T' E(uu') X_T / \sigma^2$, and $Z_T'\Phi_T Z_T$ in assumption (N5), which controls the form of heteroscedasticity, corresponds to $T^{-1}\sigma^2 X'X$. Obviously, no moments of the error process are required for quantile estimation. As we will see, $(\vartheta - \vartheta^2)/f_t(0)^2$ corresponds to variances and $(F_{s,t}(0,0) - \vartheta^2)/f_s(0)f_t(0)$ to covariances (see Table 1).

Consider the Bernoulli process $\gamma_t = 1 - \vartheta$ for $u_t \le 0$ and $\gamma_t = -\vartheta$ for $u_t > 0$. Then, from assumption (N4) it follows that the process $\{\gamma_t\}$ is strongly mixing of size $-r$, where $r > 2$. Generally, the properties of the Bernoulli process $\{\gamma_t\}$ and the behavior of the distribution functions and densities near $z = 0$ and $(z,w) = (0,0)$, respectively, are vital for weak dependence concepts in the quantile estimation framework.

| | Least squares regression | Quantile regression |
|---|---|---|
| model | $y_t = x_t\beta + u_t$, $1 \le t \le T$ | $y_t = x_t\beta + u_t$, $1 \le t \le T$ |
| loss function | $\sum_t u_t^2$ | $\sum_t [\vartheta \lvert u_t\rvert_+ + (1-\vartheta)\lvert u_t\rvert_-]$ |
| stochastics | first and second moments exist | $f_t(z)$ exists, $f_t(0) > 0$, $f_{s,t}(z,w)$ exists |
| normalization | $E(u_t) = 0$ | $F_t(0) = \vartheta$ |
| "variance" | $E(u_t^2) = \sigma^2$ | $\omega_{t,t}/f_t^2(0) = \vartheta(1-\vartheta)/f_t^2(0)$ |
| "covariance" | $E(u_s u_t) = \sigma^2\omega_{s,t}$ | $\omega_{s,t}/[f_s(0)f_t(0)] = [F_{s,t}(0,0) - F_s(0)F_t(0)]/[f_s(0)f_t(0)]$ |
| asymptotic "covariance" matrix | $\sigma^2 (X'X/T)^{-1}(X'\Omega X/T)(X'X/T)^{-1}$ | $(X'\Phi X/T)^{-1}(X'\Omega X/T)(X'\Phi X/T)^{-1}$ |

Table 1: Comparison of least squares and quantile regression. In the quantile regression column $\Phi$ is diagonal with $\varphi_t = f_t(0)$ (cf. assumption (N5)), and under (C5), $\omega_{s,s} = \vartheta(1-\vartheta)$ and $\omega_{s,t} = F_{s,t}(0,0) - \vartheta^2$.
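To give these local dependence coefficients a concrete form, consider a stationary Gaussian AR(1) error process; this example is an illustrative assumption, not part of the paper. For a bivariate normal vector with standard margins and correlation $r$, the classical orthant probability is $P(u_t \le 0,\, u_{t+k} \le 0) = 1/4 + \arcsin(r)/(2\pi)$, so with $\vartheta = 1/2$ we get $\omega_k = \arcsin(\rho^k)/(2\pi)$, which decays geometrically and thus easily satisfies the polynomial rate required in (N4).

```python
import numpy as np

# Local dependence coefficients omega_k = F_{t,t+k}(0,0) - theta^2 for a
# stationary Gaussian AR(1) process u_t = rho*u_{t-1} + e_t with Var(u_t) = 1.
# Here F_t(0) = 1/2, so theta = 1/2, and the bivariate-normal orthant formula
# gives omega_k = arcsin(rho**k) / (2*pi).  (Illustrative sketch.)
rho = 0.8
k = np.arange(1, 21)
omega_k = np.arcsin(rho ** k) / (2.0 * np.pi)
print(np.round(omega_k, 4))   # geometric decay, consistent with (N4)
```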

5.1 Linear regression functions

Our aim is to derive the asymptotic distribution of the $\vartheta$-quantile regression estimator $\hat\beta_T$ defined in Section 2. In this subsection we discuss only the special case of a linear regression function; the general nonlinear case is treated in the next subsection. Again, instead of (2.1) we consider the linear regression model $y_t - x_t\beta_0 = u_t$, $1 \le t \le T$. Then, due to $g(x_t,\beta) = x_t\beta$, we obtain $z_t = x_t$, $W_t = 0$, and $h_t = x_t(\beta - \beta_0)$. In the linear case, in addition to assumptions (N1)-(N5) we have to assume:

(N6) $\sup_{1\le t\le T} T^{-r}\|x_t\| = o(1)$ for an $r < 1/2$ for which the assumptions of Theorem 2 hold, and $T^{-1}\sum_{t=1}^{T} \|x_t\|^4 = O(1)$.

According to Theorem 2, we have $\operatorname{plim} T^r\|\hat\beta_T - \beta_0\| = 0$. For an arbitrary $\epsilon > 0$ we minimize $A_T(\beta)$ for $T^r\|\beta - \beta_0\| \le \epsilon$. The resulting estimator is denoted $\beta_{T,\epsilon}$. Thus, from Theorem 2 follows the consistency of $\hat\beta_T$, implying $P(\hat\beta_T \ne \beta_{T,\epsilon}) \to 0$ for $T \to \infty$. Hence, if $\beta_{T,\epsilon}$ converges in distribution, the same applies to $\hat\beta_T$. Then, due to $h_t = x_t(\beta-\beta_0)$,
$$ |h_t| \le T^{-r}\|x_t\| \cdot T^r\|\beta - \beta_0\|. \tag{5.3} $$
Due to assumption (N6) and Theorem 2, (5.3) implies for every fixed $\alpha$ and for $T \to \infty$
$$ \sup_{1\le t\le T} h_t \to 0. \tag{5.4} $$
As a consequence, we can assume without loss of generality that $|h_t| \le \epsilon_f$, and therefore apply assumptions (N1) and (N2). According to Theorem 2, we can consider $\beta = \beta_0 + \alpha/\sqrt{T}$.

THEOREM 4. For linear regression functions under assumptions (C1)-(C3), (C5), (C6'), and (N1)-(N6), the minimizing value $\hat\alpha_T$ of $A_T(\beta(\alpha))$, where $\alpha = \sqrt{T}(\beta - \beta_0)$, converges in distribution to a normal distribution with mean zero and covariance matrix
$$ V^{-1} \Big[ \lim_{T\to\infty} \frac{1}{T}\, X_T'\Omega_T X_T \Big] V^{-1}. $$

PROOF. The proof of asymptotic normality is based on three preliminary Lemmas (given in Appendix N). In Lemma 1N it is shown that $E[B_T(\beta(\alpha))]$ converges to
$$ \lim_{T\to\infty} \frac{1}{2}\, \alpha'\, \frac{1}{T} X_T'\Phi_T X_T\, \alpha = \frac{1}{2}\, \alpha' V\alpha. $$
Lemma 2N establishes
$$ \lim_{T\to\infty} \operatorname{Var}[B_T(\beta(\alpha))] = 0. $$
In both cases the convergence is uniform for $\alpha$ in a compact set. Finally, in Lemma 3N it is shown that $C_T(\beta(\alpha))$ converges in distribution to $C\alpha$ for all $\alpha$, where the $1 \times K$ random vector $C$ is normally distributed with mean zero and covariance matrix
$$ \lim_{T\to\infty} \frac{1}{T}\, X_T'\Omega_T X_T. $$
As a consequence, $A_T(\beta(\alpha))$ converges in distribution to
$$ A(\beta(\alpha)) = \frac{1}{2}\, \alpha' V\alpha + C\alpha, $$
with the minimizing value $\hat\alpha = -V^{-1}C'$, which is normal with mean zero and covariance matrix $V^{-1}\lim_{T\to\infty}[T^{-1}X_T'\Omega_T X_T]\, V^{-1}$. For the convergence in distribution of the minimizing value $\hat\alpha_T$ to $\hat\alpha$, it is sufficient that the function $A_T(\beta(\alpha))$ is convex and converges uniformly for $\alpha \in C$, where $C$ is any compact subset of $\mathbb{R}^K$. The former requirement has been shown to hold by Pollard (1991) and Geyer (1996), the latter in Lemmas 1N-3N. Of course this convexity argument can only be applied in the linear case. ∎

Interestingly, for linear regression functions we do not need a Taylor approximation in the proof. The analogy to the corresponding covariance matrix of least squares for nonspherical disturbances is obvious (see the last line in Table 1).
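The covariance matrix of Theorem 4 can be checked by simulation in the simplest design. The following sketch is not part of the paper and rests on illustrative assumptions: $x_t = 1$, $\beta_0 = 0$, and $\vartheta = 1/2$, so that $\hat\beta_T$ is the sample median, with stationary Gaussian AR(1) errors. Then $V = f(0) = 1/\sqrt{2\pi}$, $T^{-1}X_T'\Omega_T X_T \to \Sigma = \omega_0 + 2\sum_{k\ge1}\omega_k$ with $\omega_0 = 1/4$ and $\omega_k = \arcsin(\rho^k)/(2\pi)$, and the Monte Carlo variance of $\sqrt{T}\,\hat\beta_T$ should be close to $V^{-1}\Sigma V^{-1} = 2\pi\Sigma$.

```python
import numpy as np

# Monte Carlo check of the sandwich variance in Theorem 4 for the median
# (theta = 1/2) in a pure location model with Gaussian AR(1) errors.
rng = np.random.default_rng(1)
rho, T, reps = 0.5, 2000, 2000

def ar1_path(T, rho, rng):
    e = rng.standard_normal(T) * np.sqrt(1.0 - rho ** 2)
    u = np.empty(T)
    u[0] = rng.standard_normal()       # start in the stationary distribution
    for t in range(1, T):
        u[t] = rho * u[t - 1] + e[t]
    return u

medians = np.array([np.median(ar1_path(T, rho, rng)) for _ in range(reps)])
mc_var = T * medians.var()             # Monte Carlo variance of sqrt(T)*median

k = np.arange(1, 200)
sigma = 0.25 + 2.0 * np.sum(np.arcsin(rho ** k) / (2.0 * np.pi))
theory_var = 2.0 * np.pi * sigma       # V^{-1} Sigma V^{-1} with V = 1/sqrt(2*pi)
print(mc_var, theory_var)              # the two values should be close
```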

5.2 Nonlinear regression functions

In the nonlinear case (2.1), in addition to assumptions (N1)-(N5) we have to assume:

(N6') $\sup_{1\le t\le T} T^{-r}\|z_t\| = o(1)$ for an $r < 1/2$ for which the assumptions of Theorem 2 hold, and $T^{-1}\sum_{t=1}^{T} \|z_t\|^4 = O(1)$.

THEOREM 5. The assertion of Theorem 4 holds for nonlinear regression functions under assumptions (C1)-(C3), (C5), (C6'), (C7)-(C9), (N1)-(N5), and (N6').

PROOF. From Theorem 2 follows $\operatorname{plim} T^r\|\hat\beta_T - \beta_0\| = 0$. Then we obtain
$$ |h_t| \le T^{-r}\|z_t\| \cdot T^r\|\beta-\beta_0\| + T^r(\beta-\beta_0)'\, T^{-2r} W_t(\beta_t^*)\, T^r(\beta-\beta_0), $$
and, due to assumptions (N6') and (C7), for $T \to \infty$, similarly to (5.4), $\sup_{1\le t\le T} h_t \to 0$. As a consequence, we can assume without loss of generality that $|h_t| \le \epsilon_f$, and therefore apply assumptions (N1) and (N2).

According to Theorem 3 we can consider $\beta = \beta_0 + \alpha/\sqrt{T}$. As a first step we show that the three preliminary Lemmas 1N-3N in Appendix N remain valid in the case of a nonlinear regression function if we replace assumption (N6) by assumption (N6') and add (C7). Lemma 1N remains valid if

(i) $\sum_{t=1}^{T} h_t^2 = O(1)$, and

(ii) $\lim_{T\to\infty} \sum_{t=1}^{T} h_t^2 f_t(0) = \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} (z_t\alpha)^2 f_t(0) = \lim_{T\to\infty} \alpha'\, T^{-1} Z_T'\Phi_T Z_T\, \alpha = \alpha' V\alpha$

hold true. From (5.2) and assumptions (C7) and (C8) follows (i). Making additional use of assumption (N5), analogous reasoning establishes (ii). According to assumptions (N6') and (N2) the assertion of Lemma 2N remains true. Finally, Lemma 3N also remains valid if
$$ \lim_{T\to\infty} \sum_{s,t=1}^{T} h_s\, \omega_{s,t}\, h_t = \lim_{T\to\infty} \frac{1}{T}\, \alpha' Z_T'\Omega_T Z_T\, \alpha = \alpha'\Sigma\alpha. \tag{5.5} $$
Due to $h_t = z_t\alpha/\sqrt{T} + \alpha' W_t(\beta_t^*)\alpha/T$ we have
$$ \frac{1}{T^{3/2}} \sum_{s,t=1}^{T} (z_s\alpha)\, \omega_{s,t}\, \big( \alpha' W_t(\beta_t^*)\alpha \big) \le \frac{2}{\sqrt{T}} \sum_{k=0}^{T-1} \omega_k\, \sqrt{ \frac{1}{T}\sum_{t=1}^{T} (z_t\alpha)^2 }\ \sqrt{ \frac{1}{T}\sum_{t=1}^{T} \big( \alpha' W_t(\beta_t^*)\alpha \big)^2 } \tag{5.6} $$
and
$$ \frac{1}{T^2} \sum_{s,t=1}^{T} \big( \alpha' W_s(\beta_s^*)\alpha \big)\, \omega_{s,t}\, \big( \alpha' W_t(\beta_t^*)\alpha \big) \le \frac{2}{T} \sum_{k=0}^{T-1} \omega_k\, \frac{1}{T}\sum_{t=1}^{T} \big( \alpha' W_t(\beta_t^*)\alpha \big)^2, \tag{5.7} $$
where the right hand sides of (5.6) and (5.7) tend to zero for $T \to \infty$ according to assumptions (N4), (C7), and (C8). Then, by virtue of assumption (N3), we have established equation (5.5).

Due to the fact that in the nonlinear case the loss function $A_T(\alpha)$ is in general not convex, the proof of Theorem 4 cannot be extended in such a straightforward way. In Lemma 1N in Appendix N we consider the matrix $T^{-1} Z_T'\Phi_T Z_T$ with limit $V$. According to assumption (N5), the matrix $V$ is nonsingular and we define
$$ \bar\alpha_T \stackrel{\text{def}}{=} -V^{-1}\Gamma_T, $$
where $\Gamma_T$ is implicitly used in Lemma 3N, since
$$ C_T(\beta(\alpha)) = \sum_{t=1}^{T} h_t\gamma_t = \alpha' \sum_{t=1}^{T} \left[ \frac{1}{\sqrt{T}}\, z_t' + \frac{1}{T}\, W_t(\beta_t^*)\,\alpha \right] \gamma_t \stackrel{\text{def}}{=} \alpha'\Gamma_T. $$
By virtue of Lemma 3N and assumptions (C7) and (C8), $\Gamma_T$ converges in distribution to $C$, and $C$ is normally distributed. Therefore, in order to prove Theorem 4 for the nonlinear case, we have to show $\operatorname{plim}(\hat\alpha_T - \bar\alpha_T) = 0$. Due to Lemmas 1N and 2N, the loss function can be written as
$$ A_T(\beta(\alpha)) = B_T(\beta(\alpha)) + C_T(\beta(\alpha)) = \lim_{T\to\infty} E[B_T(\beta(\alpha))] + \alpha'\Gamma_T + R_T(\alpha) = \frac{1}{2}\,\alpha' V\alpha + \alpha'\Gamma_T + R_T(\alpha), $$
where $\operatorname{plim}_{T\to\infty} R_T(\alpha) = 0$ for $\alpha \in C$, where $C$ is any compact set.

Due to the definition of $\bar\alpha_T$, we obtain
$$ A_T(\beta(\alpha)) - A_T(\beta(\bar\alpha_T)) = \frac{1}{2}\, (\alpha - \bar\alpha_T)'\, V\, (\alpha - \bar\alpha_T) + R_T(\alpha) - R_T(\bar\alpha_T). \tag{5.8} $$
Finally, due to (5.8) and the positive definiteness of $V$, for $\alpha = \hat\alpha_T$ (the minimizing value of $A_T(\beta(\alpha))$) and for every $\epsilon > 0$, $\eta > 0$ there exists a $T_0$ such that for $T > T_0$
$$ P\big[ (\hat\alpha_T - \bar\alpha_T)'(\hat\alpha_T - \bar\alpha_T) > \epsilon \big] \le \eta. \tag{5.9} $$
This implies $\operatorname{plim}(\hat\alpha_T - \bar\alpha_T) = 0$, and we have shown the assertion. ∎

6 Conclusions

We establish consistency (Theorem 1) and asymptotic normality for linear (Theorem 4) and nonlinear (Theorem 5) regression quantiles under weak dependence, using a set of rather weak assumptions. The results suggest that there are even weaker, quantile specific conditions for modeling weak dependence structures. In fact, it can be shown that these conditions are a special case of Doukhan and Louhichi's (1999) notion of weak dependence, where the latter notion is implied by near epoch dependence, as has been shown recently by Nze and Doukhan (2004). These issues remain to be investigated in detail, especially in the light of possible relevant econometric and statistical applications.

In addition, we provide a detailed analysis of the connection between the rate of convergence of the quantile regression estimator and the degree of dependence measured by the size of the mixing coefficient (Theorem 2). As a further result of this analysis (Theorem 3), we have extended existing probability inequalities of Ivanov (1976, i.i.d. errors), Prakasa Rao (1984, mixing errors), and Hu (2004, general forms of dependence) for nonlinear least squares estimators to the case of nonlinear regression estimators based on the $L_1$-norm and weakly dependent errors.

Appendix C: Proofs of Lemmas 1C-3C

LEMMA 1C. Under assumptions (C1) and (C4), for every $\epsilon > 0$,
$$ E\left[ \frac{1}{T}\sum_{t=1}^{T} b_t(\beta) \right] \ge \begin{cases} f_0\,\delta_0^2 & \text{for } \beta \in B_{>,T}(\epsilon), \\[1.5ex] f_0 \left( \dfrac{1}{4T}\sum_{t=1}^{T} |h_t| \right)^2 & \text{for } \beta \in B_{\le,T}(\epsilon), \end{cases} $$
where
$$ B_{>,T}(\epsilon) = \Big\{ \beta \ \Big|\ \frac{1}{4T}\sum_{t=1}^{T} |h_t| > \delta_0,\ \|\beta-\beta_0\| \ge \epsilon \Big\}, \qquad B_{\le,T}(\epsilon) = \Big\{ \beta \ \Big|\ \frac{1}{4T}\sum_{t=1}^{T} |h_t| \le \delta_0,\ \|\beta-\beta_0\| \ge \epsilon \Big\}. $$

PROOF. Due to assumption (C4) and taking into account the monotonicity of $F_t(x)$, for all $t$ and all $\delta \le \delta_0$,
$$ \min\big[ F_t(|x|) - F_t(0),\ F_t(0) - F_t(-|x|) \big] \ge \begin{cases} f_0 |x| & \text{for } |x| \le \delta, \\ f_0 \delta & \text{for } |x| > \delta. \end{cases} $$
From the definition of $b_t(\beta)$ follows
$$ E\big[ b_t(\beta) \big] = \begin{cases} \int_0^{h_t} (h_t - z)\, dF_t(z) & \text{for } h_t > 0, \\[0.5ex] \int_{h_t}^{0} (z - h_t)\, dF_t(z) & \text{for } h_t \le 0. \end{cases} \tag{C.1} $$
By limiting the integration domain in (C.1) to $[0, h_t/2]$ and $[h_t/2, 0]$, respectively, we obtain
$$ E\left[ \frac{1}{T}\sum_{t=1}^{T} b_t(\beta) \right] \ge \frac{f_0}{T} \sum_{t:\,|h_t|\le 2\delta} \left( \frac{h_t}{2} \right)^2 + \frac{f_0}{T} \sum_{t:\,|h_t|> 2\delta} \left| \frac{h_t}{2} \right| \delta \ge \frac{f_0\delta}{2T} \sum_{t:\,|h_t|>2\delta} |h_t| \ge \frac{f_0\delta}{2} \left( \frac{1}{T}\sum_{t=1}^{T} |h_t| - 2\delta \right). $$
The assertion follows from setting
$$ \delta = \delta_0 \quad \text{for } \beta \in B_{>,T}(\epsilon), \qquad \text{and} \qquad \delta = \frac{1}{4T}\sum_{t=1}^{T} |h_t| \quad \text{for } \beta \in B_{\le,T}(\epsilon), $$
for $T$ large enough. ∎

LEMMA 2C. Under assumptions (C1) and (C6),
$$ \operatorname{Var}\left( \frac{1}{T}\sum_{t=1}^{T} b_t(\beta) \right) \le \frac{8}{T}\sum_{k=0}^{T-1} \alpha_k\, \frac{1}{T}\sum_{t=1}^{T} h_t^2, \tag{C.3} $$
where $\alpha_0 = 1$.

PROOF. Due to Lemma 3 of Doukhan (1994) and from the definition of $b_t(\beta)$, we get for $s > t$
$$ \big| \operatorname{Cov}\big( b_s(\beta), b_t(\beta) \big) \big| \le 4 |h_s||h_t|\, \alpha_{s,t} \qquad \text{and} \qquad \operatorname{Var}\big( b_t(\beta) \big) \le h_t^2, $$
where $\alpha_{s,t}$ is defined in assumption (C6). Hence, according to assumption (C6),
$$ \operatorname{Var}\left( \frac{1}{T}\sum_{t=1}^{T} b_t(\beta) \right) \le \frac{1}{T}\, \frac{1}{T}\sum_{t=1}^{T} h_t^2 + \frac{4}{T^2}\sum_{s\ne t} |h_s||h_t|\, \alpha_{|s-t|} \le \left( \frac{1}{T} + \frac{8}{T}\sum_{k=1}^{T-1}\alpha_k \right) \frac{1}{T}\sum_{t=1}^{T} h_t^2, $$
which proves the assertion. ∎

LEMMA 3C. Under assumptions (C1) and (C6),
$$ \operatorname{Var}\left( \frac{1}{T}\sum_{t=1}^{T} c_t(\beta) \right) \le \frac{8}{T}\sum_{k=0}^{T-1} \alpha_k\, \frac{1}{T}\sum_{t=1}^{T} h_t^2, \tag{C.4} $$
where $\alpha_0 = 1$.

PROOF. For $s \ne t$, the covariance between $c_s(\beta) = h_s\gamma_s$ and $c_t(\beta) = h_t\gamma_t$ is given by
$$ h_s h_t \big[ \vartheta^2 P(u_s > 0, u_t > 0) + (1-\vartheta)^2 P(u_s \le 0, u_t \le 0) - \vartheta(1-\vartheta) P(u_s > 0, u_t \le 0) - \vartheta(1-\vartheta) P(u_s \le 0, u_t > 0) \big] $$
$$ = h_s h_t \big\{ \vartheta^2 \big[ 1 - F_{s,t}(\infty,0) - F_{s,t}(0,\infty) + F_{s,t}(0,0) \big] + (1-\vartheta)^2 F_{s,t}(0,0) - \vartheta(1-\vartheta) \big[ F_{s,t}(\infty,0) - F_{s,t}(0,0) + F_{s,t}(0,\infty) - F_{s,t}(0,0) \big] \big\}. $$
Thus, $\operatorname{Cov}[h_s\gamma_s, h_t\gamma_t] = h_s h_t \big[ F_{s,t}(0,0) - \vartheta^2 \big]$. For $1 \le s < t$ we define $\omega_{s,t} = F_{s,t}(0,0) - \vartheta^2$. Then we have $|\omega_{s,t}| \le \alpha_{s,t}$, and analogously to the proof of Lemma 2C, according to the definition of $c_t(\beta)$,
$$ \operatorname{Var}\left( \frac{1}{T}\sum_{t=1}^{T} c_t(\beta) \right) \le \frac{1}{T^2}\sum_{t=1}^{T} h_t^2 (\vartheta - \vartheta^2) + \frac{2}{T}\sum_{k=1}^{T-1}\alpha_k\, \frac{1}{T}\sum_{t=1}^{T} h_t^2 \le \frac{8}{T}\sum_{k=0}^{T-1}\alpha_k\, \frac{1}{T}\sum_{t=1}^{T} h_t^2. \tag{C.5} $$
∎

Appendix N: Proofs of Lemmas 1N-3N

LEMMA 1N. For linear regression functions under assumptions (N1), (N5), and (C2), for $\beta = \beta_0 + \alpha/\sqrt{T}$,
$$ \lim_{T\to\infty} E\left[ \sum_{t=1}^{T} b_t(\beta(\alpha)) \right] = \frac{1}{2}\, \alpha' \Big[ \lim_{T\to\infty} \frac{1}{T}\, Z_T'\Phi_T Z_T \Big] \alpha = \frac{1}{2}\, \alpha' V\alpha. $$
The convergence is uniform for $\alpha \in C \subset \mathbb{R}^K$, where $C$ is any compact set.

PROOF. In the following, according to (5.4), we assume $|h_t| \le \epsilon_f$. Then, by the definitions of $h_t$ and $b_t(\alpha)$, under assumption (N1) we get
$$ E\big[ b_t(\beta(\alpha)) \big] = \begin{cases} \int_0^{h_t} (h_t - z) f_t(z)\, dz & \text{if } h_t > 0, \\[0.5ex] \int_{h_t}^{0} (z - h_t) f_t(z)\, dz & \text{if } h_t \le 0, \end{cases} $$
and for $h_t > 0$,
$$ \frac{1}{2}\, h_t^2 \inf_{0\le z\le h_t} f_t(z) \ \le\ E\big[ b_t(\beta(\alpha)) \big] \ \le\ \frac{1}{2}\, h_t^2 \sup_{0\le z\le h_t} f_t(z). $$
The argument is analogous for the case $h_t < 0$ and is left to the reader. From assumption (C2) follows $\sum_{t=1}^{T} h_t^2 = T^{-1}\sum_{t=1}^{T} (x_t\alpha)^2 = O(1)$. Then, due to (5.4) and assumption (N1),
$$ \lim_{T\to\infty} E\left[ \sum_{t=1}^{T} b_t(\beta(\alpha)) \right] = \lim_{T\to\infty} \frac{1}{2}\sum_{t=1}^{T} h_t^2 f_t(0) \tag{N.1} $$
uniformly in $\alpha \in C$. Thus, due to assumption (N5) and the definition of $h_t$, (N.1) proves the assertion. ∎

LEMMA 2N. For linear regression functions under assumptions (N1), (N2) and (N6), for $\beta = \beta_0 + \alpha/\sqrt{T}$,
$$ \lim_{T\to\infty} \operatorname{Var}\left[ \sum_{t=1}^{T} b_t(\beta(\alpha)) \right] = 0. $$
The convergence is uniform for $\alpha \in C \subset \mathbb{R}^K$, where $C$ is any compact set.

PROOF. In the following, according to (5.4), we assume $|h_t| < \epsilon_f$. By definition
$$ \operatorname{Var}\left[ \sum_{t=1}^{T} b_t(\beta(\alpha)) \right] = \sum_{s,t=1}^{T} \big\{ E[b_s(\beta(\alpha)) b_t(\beta(\alpha))] - E[b_s(\beta(\alpha))]\, E[b_t(\beta(\alpha))] \big\}. $$
Then, for $s \ne t$, $h_s > 0$, $h_t > 0$, and by the definition of $b_t(\beta)$,
$$ \operatorname{Cov}\big( b_s(\beta(\alpha)), b_t(\beta(\alpha)) \big) = \int_{0\le z\le h_s} \int_{0\le w\le h_t} (h_s - z)(h_t - w) \big[ f_{s,t}(z,w) - f_s(z)f_t(w) \big]\, dz\, dw \le \frac{1}{4}\, h_s^2 h_t^2 \sup_{\substack{0\le z\le h_s \\ 0\le w\le h_t}} \big| f_{s,t}(z,w) - f_s(z)f_t(w) \big|. $$
By analogous argumentation for the remaining cases ($h_s > 0$, $h_t < 0$, etc.), we finally get
$$ \bigg| \sum_{\substack{s,t=1 \\ s\ne t}}^{T} \operatorname{Cov}\big[ b_s(\beta(\alpha)), b_t(\beta(\alpha)) \big] \bigg| \le \frac{1}{4} \sum_{\substack{s,t=1 \\ s\ne t}}^{T} h_s^2 h_t^2 \sup_{\substack{|z|\le|h_s| \\ |w|\le|h_t|}} \big| f_{s,t}(z,w) - f_s(z)f_t(w) \big|, \tag{N.2} $$
where the expression on the right hand side of (N.2) is bounded from above by
$$ \frac{1}{2} \sum_{k=1}^{T-1}\ \sup_{t,\, t+k\in\mathbb{N}}\ \sup_{\substack{|z|\le|h_t| \\ |w|\le|h_{t+k}|}} \big| f_{t,t+k}(z,w) - f_t(z)f_{t+k}(w) \big|\ \sum_{s=1}^{T-k} h_s^4. \tag{N.3} $$
Thus, according to (N2) we have
$$ \bigg| \sum_{\substack{s,t=1 \\ s\ne t}}^{T} \operatorname{Cov}\big[ b_s(\beta(\alpha)), b_t(\beta(\alpha)) \big] \bigg| \le \frac{1}{T}\sum_{k=1}^{T-1} \alpha_0(k|u)\, \frac{1}{T}\sum_{t=1}^{T-k} (x_t\alpha)^4. \tag{N.4} $$
Due to (N2) and (N6) the left hand side of (N.2) tends to zero as $T \to \infty$. Analogously, for $s = t$ we get
$$ \sum_{t=1}^{T} \operatorname{Var}\big[ b_t(\beta(\alpha)) \big] \le \frac{1}{3}\sum_{t=1}^{T} |h_t|^3 \sup_{|z|\le|h_t|} |f_t(z)| \le \frac{1}{T}\sum_{t=1}^{T} |x_t\alpha|^3\, \frac{1}{\sqrt{T}}\, \sup_{1\le s\le T} |f_s(0)|, \tag{N.5} $$
and due to (N1) and (N6) the left hand side of (N.5) tends to zero as $T \to \infty$. Thus the assertion follows. ∎

LEMMA 3N. For linear regression functions under assumptions (N3) and (N4) and for $\beta = \beta_0 + \alpha/\sqrt{T}$, $C_T(\beta(\alpha)) = \sum c_t(\beta(\alpha))$ converges for $T \to \infty$ in distribution to $C\alpha$, where the $1 \times K$ random vector $C$ is normally distributed with mean zero and covariance matrix $\lim_{T\to\infty} T^{-1} Z_T'\Omega_T Z_T$.

PROOF. Obviously, from the definition of $c_t(\beta) = h_t\gamma_t$, we have $E(h_t\gamma_t) = 0$ and $\operatorname{Var}(h_t\gamma_t) = \vartheta(1-\vartheta) h_t^2$. In addition, according to the considerations in the proof of Lemma 3C, for $s \ne t$,
$$ \operatorname{Cov}[h_s\gamma_s, h_t\gamma_t] = h_s h_t \big[ F_{s,t}(0,0) - \vartheta^2 \big] = h_s h_t\, \omega_{s,t}. $$
Thus,
$$ \operatorname{Var}\big[ C_T(\beta(\alpha)) \big] = \frac{1}{T}\, \alpha' Z_T'\Omega_T Z_T\, \alpha. $$
Then, due to assumptions (N3) and (N4), $\{\gamma_t\}$ is $\alpha$-mixing of size $-q$, where $q > 2$, and the proof of the assertion follows from the CLT given in Pötscher and Prucha (1997, Theorem 10.2) upon application of the Cramér-Wold device. ∎

References

[1] Buchinsky, M. (1998) Recent advances in quantile regression. Journal of Human Resources, 33, 88-126.
[2] Cai, Z. (2002) Regression quantiles for time series. Econometric Theory, 18, 169-192.
[3] Castellana, J.V., & M.R. Leadbetter (1986) On smoothed probability density estimation for stationary processes. Stochastic Processes and their Applications, 21, 179-193.
[4] Chen, L.-A., L.T. Tran, & L.-C. Lin (2004) Symmetric regression quantile and its application to robust estimation for the nonlinear regression model. Journal of Statistical Planning and Inference, 126, 423-440.
[5] Chernozhukov, V., & L. Umantsev (2001) Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics, 26, 271-292.
[6] De Gooijer, J.G., & D. Zerom (2003) On additive conditional quantiles with high-dimensional covariates. Journal of the American Statistical Association, 98, 135-146.
[7] Doukhan, P. (1994) Mixing. Springer-Verlag, New York.
[8] Doukhan, P., & S. Louhichi (1999) A new weak dependence condition and applications to moment inequalities. Stochastic Processes and their Applications, 84, 313-342.
[9] Engle, R.F., & S. Manganelli (2004) CAViaR: Conditional autoregressive Value at Risk by regression quantiles. Journal of Business and Economic Statistics, 22, 367-381.
[10] Geyer, C.J. (1996) On the asymptotics of convex stochastic optimization. Unpublished manuscript.
[11] Hu, S.H. (2004) Consistency for the least squares estimator in nonlinear regression model. Statistics and Probability Letters, 67, 183-192.
[12] Ioannides, D.A. (2004) Fixed design regression quantiles for time series. Statistics and Probability Letters, 68, 235-245.
[13] Ivanov, A.V. (1976) An asymptotic expansion for the distribution of the least squares estimator of the non-linear regression parameter. Theory of Probability and its Applications, 21, 557-570.

[14] Jureckova, J., & B. Prochazka (1994) Regression quantiles and trimmed least squares estimators in nonlinear regression models. Journal of Nonparametric Statistics, 3, 201-222.
[15] Knight, K. (1998) Limiting distributions for L1 regression estimators under general conditions. Annals of Statistics, 26, 755-770.
[16] Knight, K. (1999) Asymptotics for L1-estimators of regression parameters under heteroscedasticity. Canadian Journal of Statistics, 27, 497-507.
[17] Koenker, R. (2005) Quantile Regression. Econometric Society Monographs No. 38. Cambridge University Press.
[18] Koenker, R., & G. Bassett (1978) Regression quantiles. Econometrica, 46, 33-50.
[19] Koenker, R., & G. Bassett (1982) Robust tests for heteroscedasticity based on regression quantiles. Econometrica, 50, 43-61.
[20] Koenker, R., & K.F. Hallock (2001) Quantile regression. Journal of Economic Perspectives, 15, 143-156.
[21] Koenker, R., & B. Park (1994) An interior point algorithm for nonlinear quantile regression. Journal of Econometrics, 71, 265-283.
[22] Mukherjee, K. (1999) Asymptotics of quantiles and rank scores in nonlinear time series. Journal of Time Series Analysis, 20, 173-192.
[23] Mukherjee, K. (2000) Linearization of randomly weighted empiricals under long range dependence with applications to nonlinear regression quantiles. Econometric Theory, 16, 301-323.
[24] Nze, P.A., & P. Doukhan (2004) Weak dependence: models and applications to econometrics. Econometric Theory, 20, 169-192.
[25] Oberhofer, W. (1982) The consistency of nonlinear regression minimizing the L1-norm. Annals of Statistics, 10, 316-319.
[26] Oberhofer, W., & H. Haupt (2005) The asymptotic distribution of the unconditional quantile estimator under dependence. Statistics and Probability Letters, 73, 243-250.
[27] Pfanzagl, J. (1969) On the measurability and consistency of minimum contrast estimates. Metrika, 14, 249-272.
[28] Phillips, P.C.B. (1991) A shortcut to LAD estimator asymptotics. Econometric Theory, 7, 450-463.
[29] Pötscher, B.M., & I.R. Prucha (1997) Dynamic Nonlinear Econometric Models: Asymptotic Theory. Springer-Verlag.


[30] Pollard, D. (1991) Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7, 186-199.
[31] Portnoy, S. (1991) Asymptotic behavior of regression quantiles in non-stationary, dependent cases. Journal of Multivariate Analysis, 38, 100-113.
[32] Prakasa Rao, B.L.S. (1984) The rate of convergence of the least squares estimator in a non-linear regression model with dependent errors. Journal of Multivariate Analysis, 14, 315-322.
[33] Rogers, A.J. (2001) Least absolute deviations regression under nonstandard conditions. Econometric Theory, 17, 820-852.
[34] Wang, J. (1995) Asymptotic normality of L1-estimators in nonlinear regression. Journal of Multivariate Analysis, 54, 227-238.
[35] Weiss, A.A. (1991) Estimating nonlinear dynamic models using least absolute error estimation. Econometric Theory, 7, 46-68.
[36] Yu, K., Z. Lu, & J. Stander (2003) Quantile regression: applications and current research areas. The Statistician, 52, 331-350.
[37] Zhao, Q. (2001) Asymptotically efficient median regression in the presence of heteroscedasticity of unknown form. Econometric Theory, 17, 765-784.
