Quantile Regression on Quantile Ranges

Chung-Ming Kuan, Christos Michalopoulos and Zhijie Xiao

(First version: December 2010) This version: February 12, 2014

Abstract

Motivated by empirical evidence that a linear specification in a quantile regression setting is unable to describe the non-linear relations among economic variables, we formulate a general threshold quantile regression model and derive the asymptotic properties of the model parameters assuming heteroskedastic and autocorrelated errors. We derive the limiting distribution of the threshold value under the asymptotic framework of fixed and asymptotically shrinking change-point. We construct confidence intervals for the estimated threshold value via a likelihood-ratio-type statistic and investigate via simulation the coverage probability of the confidence intervals for different quantiles. We develop inferential procedures to identify heterogeneous effects of different covariate quantile ranges on different quantiles of the response and show that the proposed sup-Wald statistic converges to a two-parameter Gaussian process that generalizes that of Galvao et al. (2011). Monte Carlo simulations provide evidence on size and power. Our asymptotic results complement those found in the existing literature on threshold regression models.

∗ Author for correspondence: Christos Michalopoulos, Economics Dept. Soochow University, 56, KueiYang St., Sec. 1, Taipei, Taiwan. Email: [email protected]. ∗∗ Chung-Ming Kuan, Finance Dept. National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan. Email: [email protected]. ∗∗∗ Zhijie Xiao, Economics Dept. Boston College, Chestnut Hill, MA, 02467, USA. Email: zhi [email protected]. This paper has benefited from comments received at the SETA 2010 conference in Singapore and at the 2013 International Conference of Financial Econometrics at Shandong University. Michalopoulos acknowledges the financial support received from the National Science Council of Taiwan: NSC 100-2420-H-002-013-DR.

1 Introduction

Quantile regression, as introduced in Koenker and Bassett (1978), is an alternative to least squares regression that is robust to outliers and flexible with respect to the error distribution, capable of exploring the whole conditional distribution of the response and not just its mean. The considerable interest it has generated in applied and theoretical statistics and econometrics is reflected in the monograph of Koenker (2005). While in its simplest and most widely used setting a linear model is specified, a growing body of empirical research has shown that such a specification is unable to describe the intricate and often non-linear relations among economic variables. Economic factors such as technical changes, policy shocks and other unforeseen events in the economic environment often have a non-linear effect on the relationship between economic variables. A flexible approach to capturing such non-linear effects in the data, without assuming a specific non-linear functional form for the covariates, is the so-called "threshold model" with known or unknown, one or multiple threshold variables. Such models split the sample into different "classes" or "regimes" according to the magnitude of the threshold variable, hence admitting a threshold-type non-linear effect on the conditional distribution of the response. Threshold regression models have attracted increasing interest in applied and theoretical statistics and econometrics research, with a wide array of applications; see Tong (1983, 1990). In the least squares context, threshold models have been used in many fields.
In economics, they have been utilized to explain the cross-country behavior of GDP growth, Durlauf and Johnson (1995)¹ and Hansen (2000); modeling the different regimes of GNP and unemployment, Potter (1995), Chan and Tsay (1998) and Gonzalo and Wolf (2005); investigating the non-linear adjustment in the deviation of exchange rates from their equilibrium, Kilian and Taylor (2003); and searching for threshold effects in the relationship between growth and inflation, Khan and Senhadji (2001). In finance, self-exciting threshold autoregressive models have been used to check for mean reversion in interest rates, Pfann et al. (1996) and Gospodinov (2005), and to classify stock market regimes, as in Chen et al. (2009). Further theoretical results and applications in linear least squares with threshold effects can be found in Chan (1993), Chan and Tsay (1998), Hansen (1996, 2000), Caner and Hansen (2001) and Seo and Linton (2007).

It is useful and natural to consider threshold effects on conditional quantile functions rather than conditional mean functions in order to capture non-linear effects of different quantiles of a covariate on different quantiles of the response. Recently, it has been noticed that different ranges of some covariate can have a different impact on the conditional distribution of the response. Koenker and Machado (1999) found heterogeneous effects of public consumption and terms of trade on different quantiles of the GDP growth rate for a panel of countries. Kuan and Chen (2009) studied the effects of National Health Insurance (NHI) on precautionary saving in Taiwan and showed that different quantile ranges of the income and age covariates affect the conditional quantiles of savings differently. Also, Koenker (2005), analyzing Engel's data, showed that different quantiles of household income affect the shape and location of the conditional quantile distribution of food expenditure in different ways. We therefore aim in this paper to move beyond the linear least squares framework and study a quantile regression threshold model, called quantile regression on quantile ranges (QRQR), with some known or unknown threshold value(s). The issue has not received much attention in the econometrics literature. To the best of our knowledge, only two papers have used the threshold regression principle in quantile regression, but they have done so in the dynamic setting of quantile autoregressive time-series models assuming non-serially correlated errors. In particular, Cai and Stander (2008) propose a quantile self-exciting threshold autoregressive time-series model and adopt Bayesian inferential procedures. Galvao et al. (2011) consider a threshold related to a particular quantile of the errors process; for some known function of covariates, they assume there is a threshold effect at some point on this function related to the errors process. However, neither paper considers inference regarding the unknown threshold value.

¹ The authors use regression tree analysis, but as Hansen (2000) notes, this is another form of a threshold regression model.
In this paper, we formulate a more general quantile regression threshold model with the aim of estimating directly the effects of a specific quantile range of a covariate on the same or different quantiles of the response distribution, when the errors are possibly serially correlated. By partitioning the state space into several subspaces dictated by appropriate threshold values, we model the unknown non-linearity of the underlying data process by a piecewise linear approximation, without assuming some complicated non-linear parametric form, therefore avoiding misspecification issues; see Angrist et al. (2006). More specifically, we first consider the case of estimating one unknown threshold value on quantile functions. Our framework is slightly more general than Hansen (2000),


in that we consider covariates that do not exhibit threshold effects themselves but are allowed to be affected by the covariate subject to regime change. We give consistency and rates-of-convergence results for the model parameters and derive the Bahadur representation assuming serially correlated errors, thus generalizing the results of Galvao et al. (2011). We derive the limiting distribution of the threshold quantile under two asymptotic frameworks: the first assumes a constant magnitude of the threshold effect, see Bai (1998); the second assumes an asymptotically diminishing threshold effect, see Picard (1985) and Hansen (2000). In the first case, the limiting distribution depends on the distribution of the regressors and errors, hence inference is prohibited. The second asymptotic framework enables us to circumvent this problem and to obtain a limiting distribution invariant to the distribution of the regressors and errors. We then construct confidence intervals for the unknown threshold value using a likelihood-ratio-type statistic. Our simulation results suggest that the confidence regions significantly undercover as we move closer to the extreme quantiles. We then develop inferential procedures to identify heterogeneous effects of different covariate quantile ranges on different quantiles of the response and show that our proposed sup-Wald test converges to a two-parameter Gaussian process. The Monte Carlo simulation exercise uses a Newey-West-type estimator to capture the effects of serially correlated errors and shows that our test has decent empirical size and power.

The paper is organized as follows. Section 2 introduces the model, its estimation and the asymptotic properties, including the Bahadur representation. Section 3 derives the limiting distribution of the estimated partition quantile under shrinking magnitude of shifts, and contains the construction of its confidence intervals and the corresponding simulation results.
Inference is treated in Section 4, where the sup-Wald test is introduced under known and unknown partition quantile and its limiting distribution is given, together with the simulation exercise. Section 5 includes a summary and future research issues. All proofs are relegated to the Technical Appendix at the end of the paper. We also leave for the Appendix the proof of the limiting distribution of the estimated partition quantile under fixed magnitude of shifts.


2 The QRQR Model: Estimation and Asymptotics

2.1 The model

Let {y_i : i = 1, . . . , n} be the regressand and {z_i : i = 1, . . . , n} a p × 1 vector of random variables. We consider the linear regression model

    y_i = z_i^⊤ θ + ε_i,                                         (1)

for a sequence of random errors {ε_i : i = 1, . . . , n}, not necessarily i.i.d. We partition z_i into two groups, z_{1i} and z_{2i}, where z_{2i} may contain a single continuous covariate x_i that is subject to regime change. We assume that this regime change affects the covariates in z_{2i}, but it does not affect the covariates in z_{1i}. Thus we may rewrite (1) in the partitioned form:

    y_i = z_{1i}^⊤ γ + z_{2i}^⊤ β + ε_i.                         (2)

We assume the τ-th conditional quantile function of y_i is written as

    Q_{y_i}(τ | z_i, x_i) = z_{1i}^⊤ γ(τ) + z_{2i}^⊤ β(τ, x_i),  (3)

where Q_{y_i}(τ | z_i, x_i) ≡ inf{y : F_{y|z,x}(y | z_i, x_i) ≥ τ} defines the quantile function of y_i conditional on z_i and x_i, and the parameter β(τ, x_i) reflects the possible regime-change behavior of x_i at quantile τ. For convenience of discussion, we consider the case where the regime shift occurs at the q*-th quantile of x_i, i.e.

    β(τ, x_i) = β(τ),         if x_i ≤ q*,
                β(τ) + δ(τ),  if x_i > q*,

where q* ≡ F_X^{-1}(τ_x), assuming the covariate x_i is continuous. We call q* the partition quantile. Allowing both parameters γ and β to vary over quantiles τ ∈ [0, 1], we can rewrite (2) as

    Q_{y_i}(τ | z_i) = z_{1i}^⊤ γ(τ) + z_{2i}^⊤ β(τ) + z_{q*i}^⊤ δ(τ) = Z_{q*i}^⊤ α(τ),   (4)

where δ(τ) denotes the size of change in the slope coefficient of z_{2i} due to the regime-change effect, Z_{q*i} = (z_{1i}^⊤, z_{2i}^⊤, z_{q*i}^⊤)^⊤, z_{q*i} = z_{2i} 1{x_i ≤ q*} and α(τ) = (γ(τ)^⊤, β(τ)^⊤, δ(τ)^⊤)^⊤.

The formulation of threshold regression models shares similarities with the modeling of structural breaks, since in the latter the threshold variable is just the index i or t for cross-section and time-series data, respectively. But it differs from a change-point model in that regime changes are driven by some random variable, which can be an exogenous and unobservable Markov chain in Markov-switching models, or another random variable in threshold models, thereby allowing much richer behavior of the data to be modeled. For structural-break models in the linear least squares framework, see, among others, Andrews (1993), Kuan and Hornik (1995), Bai (1996), Kuan and Hsu (1998) and Bai and Perron (1998). For structural-break models in the median and quantile regression framework, see Bai (1995), Qu (2008) and Su and Xiao (2008, 2009).

Note that our model nests (and generalizes) the threshold quantile autoregressive model of Galvao et al. (2011) if we write the conditional quantile function of a p-th-order autoregressive specification as

    Q_{y_i}(τ | F_{i−1}) = z_{1i}^⊤ γ(τ) + z_{2i}^⊤ β(τ) + z_{q*i}^⊤ δ(τ),

where z_{1i} = (1, y_{i−1}, · · · , y_{i−(p−1)})^⊤, z_{2i} = y_{i−p} and z_{q*i} = y_{i−p} 1{y_{i−p} ≤ q*}, with F_{i−1} the sigma-algebra generated by lagged values of y_i, i ∈ Z.
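To fix ideas, the augmented regressor Z_{q,i} in (4) is easy to construct for any candidate partition quantile q. The following sketch (our illustration in Python/NumPy; all names are ours, not the paper's) builds it for an intercept-only z_{1i} and a scalar z_{2i} = x_i:

```python
import numpy as np

def build_design(z1, z2, x, q):
    """Build Z_q with rows (z1_i', z2_i', z2_i' * 1{x_i <= q}), as in equation (4)."""
    ind = (x <= q).astype(float)     # regime indicator 1{x_i <= q}
    zq = z2 * ind[:, None]           # threshold-interacted block z_{q,i}
    return np.hstack([z1, z2, zq])

# toy example: intercept-only z1, scalar z2 = x itself, candidate threshold q = 0
rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=n)
z1 = np.ones((n, 1))
z2 = x[:, None]
Z = build_design(z1, z2, x, q=0.0)
print(Z.shape)   # (8, 3): columns (1, x_i, x_i * 1{x_i <= 0})
```

The last block of columns is exactly the part whose coefficient δ(τ) captures the regime change.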

2.2 Estimation

In this section, we focus on estimating equation (4). When the partition quantile of interest q* is known, we simply run quantile regressions using the specification of equation (4), hence obtaining estimates of the parameters γ(τ), β(τ) and δ(τ). It is more realistic to assume an unknown partition quantile q* and proceed to estimate it along with the other model parameters. We first define the sum of asymmetrically weighted deviations as

    S_n(τ, q, α) = Σ_{i=1}^{n} ρ_τ(y_i − Z_{q,i}^⊤ α),

for ρ_τ(·) the asymmetric loss function (check function) defined as ρ_τ(u) = u(τ − 1{u < 0}). Given q ∈ Q = [q_L, q_U], a compact set for lower and upper quantiles q_L and q_U respectively, we obtain the estimates of our parameters γ(τ), β(τ) and δ(τ) as

    α̂(τ, q) = (γ̂(τ, q)^⊤, β̂(τ, q)^⊤, δ̂(τ, q)^⊤)^⊤ = arg min_α S_n(τ, q, α),

for all q ∈ Q. We then estimate the unknown partition quantile q*,

    q̂ = arg min_{q∈Q} S_n(τ, q, α̂(τ, q)),

therefore obtaining the estimated model parameters as α̂*(τ) = α̂(τ, q̂).

Some comments are required here. We have assumed that only the covariate x_i exhibits regime-change behavior; therefore we search for the partition quantile among the x_{(k)}, the order statistics of x_i, k = 1, . . . , n. Since there are n observations, computing the partition quantile requires fewer than n function evaluations. To control the tails, we consider the regression quantile process on a sub-interval of [0, 1], Q = {q : q_L ≤ q ≤ q_U}, where q_L = F_x^{-1}(τ_L^x) and q_U = F_x^{-1}(τ_U^x); for more details, see Su and Xiao (2008). A safe but conservative choice is π = 0.15; see Andrews (1993) and Franses and van Dijk (2000).

Another well-known issue in quantile regression is that of quantile crossings, since the quantile representation of equation (3) is not guaranteed to be monotonic. We do not investigate this problem here, but we note that Galvao et al. (2011) discuss the issue and, in a simulation exercise, show that the monotonicity requirement was violated in about 5% of the cases examined.

We have simulated the performance of the above estimation procedure and the results are displayed in the tables and figures that follow. The tables display statistics for the empirical distribution of the estimated partition quantile for different specifications of the error distribution, e ∼ N(0, 1), e ∼ t_3 and e ∼ 0.7 × N(0, 3) + 0.3 × N(1, 2), different magnitudes of the regime change, i.e. δ(τ) = 0.5, 2, and different values of the sample size, n = 200, 500. The threshold has been set to occur at q* = 0.
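A minimal sketch of the profiled estimation just described, assuming SciPy is available: α̂(τ, q) is computed by solving the standard linear-programming form of the quantile regression problem, and q̂ minimizes S_n over the trimmed order statistics of x_i. This is our illustration, not the authors' code, and it uses a z_2-only design for brevity:

```python
import numpy as np
from scipy.optimize import linprog

def rq_fit(Z, y, tau):
    """Quantile regression via its LP form:
    min tau*1'u + (1-tau)*1'v  s.t.  Z a + u - v = y, u, v >= 0, a free."""
    n, k = Z.shape
    c = np.r_[np.zeros(k), tau * np.ones(n), (1 - tau) * np.ones(n)]
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k], res.fun            # (alpha_hat, value of S_n)

def profile_threshold(z2, x, y, tau, trim=0.15):
    """Profile S_n(tau, q, alpha_hat) over trimmed order statistics of x."""
    n = len(x)
    qs = np.sort(x)[int(trim * n): int((1 - trim) * n)]
    best = None
    for q in qs:
        Z = np.hstack([z2, z2 * (x <= q)[:, None]])
        a, s = rq_fit(Z, y, tau)
        if best is None or s < best[2]:
            best = (q, a, s)
    return best

# simulated example with a large location shift (delta = 2) at q* = 0
rng = np.random.default_rng(42)
n = 150
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + 2.0 * (x <= 0) + rng.normal(size=n)
z2 = np.column_stack([np.ones(n), x])
q_hat, a_hat, s_min = profile_threshold(z2, x, y, tau=0.5)
print(round(float(q_hat), 2))
```

With a shift this large, the profiled objective is sharply minimized near the true partition quantile, consistent with the fast convergence rate discussed in Section 2.3.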


Table 1: Quantiles of the q̂ distribution: ε_i ∼ N(0, 1), q_0 = 0.

                       n = 200                        n = 500
δ = 0.5      5%       50%      95%         5%       50%      95%
τ = 0.10  -1.504   -0.021    1.331     -0.598    0.017    0.707
τ = 0.25  -1.064   -0.178    1.289     -0.301    0.002    0.127
τ = 0.50  -1.313   -0.063    1.048     -0.648    0.003    0.669
τ = 0.75  -1.217   -0.023    1.270     -0.214   -0.006    0.160
τ = 0.90  -1.636   -0.046    1.382     -0.526    0.002    0.547

δ = 2
τ = 0.10  -0.074   -0.048   -0.016     -0.051    0.001    0.014
τ = 0.25  -0.019    0.014    0.020     -0.048   -0.025    0.001
τ = 0.50  -0.119   -0.016    0.067     -0.038   -0.034   -0.025
τ = 0.75  -0.027   -0.024    0.044     -0.036    0.001    0.018
τ = 0.90  -0.066   -0.007    0.076     -0.012    0.000    0.001

Table 2: Quantiles of the q̂ distribution: ε_i ∼ t_3, q_0 = 0.

                       n = 200                        n = 500
δ = 0.5      5%       50%      95%         5%       50%      95%
τ = 0.10  -1.663    0.056    1.704     -1.410    0.008    1.317
τ = 0.25  -1.642   -0.096    1.437     -0.369   -0.007    0.633
τ = 0.50  -1.396   -0.007    1.264     -1.038    0.006    0.673
τ = 0.75  -1.555   -0.037    1.399     -0.585   -0.051    0.676
τ = 0.90  -1.611   -0.051    1.661     -1.342    0.006    1.208

δ = 2
τ = 0.10  -0.334   -0.030    0.051     -0.017    0.007    0.064
τ = 0.25  -0.026   -0.011    0.022     -0.008    0.002    0.071
τ = 0.50  -0.049    0.017    0.223     -0.053    0.012    0.098
τ = 0.75  -0.059   -0.005    0.075     -0.019    0.003    0.010
τ = 0.90  -0.095    0.071    0.288     -0.042    0.001    0.053


Figure 1: Empirical distribution functions of the estimated partition quantile q̂, for e ∼ N(1, 2), t_3 and 0.7 × N(0, 3) + 0.3 × N(1, 2), δ = 2, n = 200, 500.


Table 3: Quantiles of the q̂ distribution: ε_i ∼ 0.7 × N(0, 3) + 0.3 × N(1, 2), q_0 = 0.

                       n = 200                        n = 500
δ = 0.5      5%       50%      95%         5%       50%      95%
τ = 0.10  -1.576   -0.180    1.529     -1.299    0.001    1.491
τ = 0.25  -1.709   -0.089    1.248     -1.138   -0.007    1.180
τ = 0.50  -1.466   -0.113    1.552     -1.437    0.029    1.040
τ = 0.75  -1.410   -0.066    1.412     -1.043   -0.019    0.688
τ = 0.90  -1.675    0.016    1.719     -1.345    0.001    1.409

δ = 2
τ = 0.10  -0.348   -0.035    0.368     -0.042    0.003    0.065
τ = 0.25  -0.249   -0.015    0.0236    -0.077   -0.009    0.042
τ = 0.50  -0.180   -0.002    0.287     -0.056    0.003    0.042
τ = 0.75  -0.201   -0.062    0.081     -0.021   -0.007    0.030
τ = 0.90  -0.189    0.023    0.208     -0.106    0.003    0.032

2.3 Asymptotic Properties

In this section we provide the asymptotic analysis for the estimated model parameters under known and unknown partition quantile q, and for the estimated partition quantile itself. Details and proofs are relegated to the Technical Appendix. In what follows, we denote by ‖·‖ the classical Euclidean norm, by ⇒ weak convergence, and by →_P convergence in probability with respect to the outer probability; see van der Vaart and Wellner (1996). Also, q* denotes the "true" partition quantile. We need the following assumptions:

[A.1] (i) The distribution functions F_i of y_i, conditional on the regressors, have continuous Lebesgue densities f_i uniformly bounded away from 0 and ∞ at the points F_i^{-1}(τ), uniformly in τ ∈ T, i.e. 0 < f_i(F_i^{-1}(τ)) < ∞ for τ ∈ T.
(ii) {y_i, z_i, x_i}_{i=1}^{n} form a β-mixing sequence of vectors with mixing coefficients β_m satisfying m^{p/(p−2)} (log m)^{2(p−1)/(p−2)} β_m → 0 as m → ∞.


[A.2] For any δ > 0, there exists some σ(δ) > 0 such that

    sup_{τ∈T} | f_i(F_i^{-1}(τ) + c) − f_i(F_i^{-1}(τ)) | < δ,

for all |c| < σ(δ) and all 1 ≤ i ≤ n.

[A.3] For all τ ∈ T and q ∈ Q, α(τ, q) = arg min_{α∈A} E[ρ_τ(y_i − Z_{qi}^⊤ α(τ))] exists and is unique. Furthermore, α(τ) is in the interior of the parameter space A, with A compact and convex.

[A.4] E‖z_i‖^{2+ε} < ∞ for some ε > 0, and max_{1≤i≤n} ‖z_i‖ = o_P(n^{1/2}).

[A.5] E(z_i z_i^⊤) > E(Z_{qi} Z_{qi}^⊤) for all q ∈ Q.

Assumption [A.1] is standard in the quantile regression literature. The assumption about the errors is the same as in Arcones and Yu (1994) and is required for the empirical process to satisfy a functional central limit theorem.² Assumption [A.2] is borrowed from the structural-breaks literature and imposes smoothness of the conditional densities in some neighborhood of F_i^{-1}(τ), uniformly in i = 1, · · · , n; see Su and Xiao (2008). Assumption [A.3] is needed for identification, while assumption [A.4] is used to verify that our regressors satisfy the Lindeberg condition for a central limit theorem; see Qu (2008). Finally, assumption [A.5] excludes the possibility that the true partition quantile lies on the boundary of the support of z_i.

The next theorem is Lemma 2 of Galvao et al. (2011) adapted to our model.

Theorem 2.1 (Consistency) Given assumptions [A.1]-[A.5], and for some fixed τ ∈ T,

    α̂(τ, q̂) →_P α_0(τ, q*)  and  q̂ →_P q*,

and

    n^{1/2} ‖α̂(τ, q̂) − α_0(τ, q*)‖ = O_P(1)  and  n |q̂ − q*| = O_P(1).

We can see that the estimated partition quantile is super-consistent, due to the discontinuous threshold-effect specification of our model. The fast rate of convergence implies that, in the estimation procedure described before, we can treat the partition quantile q as "known" and proceed to the estimation of the remaining parameters without worrying about threshold-estimation effects on the slope coefficients.

² See Theorem 2.1 of Arcones and Yu (1994). The concept of β-mixing is defined there as well.


For a known partition quantile q ∈ Q, it is straightforward to deduce the asymptotic normality of our estimated parameters. We need some further definitions and assumptions. Define

    H_n(τ, q) = n^{-1} Σ_{i=1}^{n} f_i(Z_{q*,i}^⊤ α_0(τ)) Z_{q,i} Z_{q,i}^⊤

and

    J_n(τ, q) = n^{-1/2} Σ_{i=1}^{n} ψ_τ(y_i − Z_{q*,i}^⊤ α_0(τ)) Z_{q,i},

where α_0(τ) = (γ_0(τ)^⊤, β_0(τ)^⊤, δ_0(τ)^⊤)^⊤, with δ_0(τ) = 0 under no regime change. Also, define ψ_τ(y_i − x_i^⊤ α) = τ − 1{y_i − x_i^⊤ α < 0}, the influence function in quantile regression.

[A.6] We assume that
(i) sup_{q∈Q} ‖n^{-1} Σ_{i=1}^{n} Z_{q,i} Z_{q,i}^⊤ − Q_0(q)‖ = o_P(1), for some finite, symmetric and positive definite matrix Q_0(q) = E(Z_{qi} Z_{qi}^⊤);
(ii) sup_{(τ,q)∈T×Q} ‖H_n(τ, q) − H_0(τ, q)‖ = o_P(1), for some finite, symmetric and positive definite matrix H_0(τ, q) = E[f_i(Z_{q*,i}^⊤ α_0(τ)) Z_{qi} Z_{qi}^⊤], for τ ∈ T and q ∈ Q.

Assumption [A.6] facilitates the asymptotic analysis and can easily be modified to accommodate a location-shift specification. Under the above assumptions and assuming the partition quantile q is known, we have the following result.

Proposition 2.2 Under assumptions [A.1]-[A.6], for known q ∈ Q and for all τ ∈ T,

    sup_{τ∈T} ‖ n^{1/2}(α̂(τ, q) − α_0(τ, q)) − H_n(τ, q)^{-1} J_n(τ, q) ‖ = o_P(1),

with the o_P(1) uniform in τ, and

    J_n(τ, q) ≡ n^{-1/2} Σ_{i=1}^{n} ψ_τ(y_i − Z_{q*,i}^⊤ α_0(τ)) Z_{q,i} ⇒ B(τ),

where B(τ) is a mean-zero Gaussian process with variance/covariance matrix given by

    E[J_n(τ_1, q) J_n(τ_2, q)^⊤] = (τ_1 ∧ τ_2 − τ_1 τ_2) E(Z_{qi} Z_{qi}^⊤)
        + Σ_{j=−∞, j≠0}^{∞} E(Z_{qi} Z_{q,i+j}^⊤) E[ ψ_{τ_1}(y_i − Z_{qi}^⊤ α_0(τ_1)) ψ_{τ_2}(y_{i+j} − Z_{q,i+j}^⊤ α_0(τ_2)) | z_i, x_i ].
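In practice this long-run variance/covariance must be estimated. A Bartlett-kernel (Newey-West-type) estimator applied to the quantile scores g_i = ψ_τ(y_i − Z_{q,i}^⊤ α̂) Z_{q,i}, in the spirit of the simulations reported later, can be sketched as follows (our illustration; the bandwidth value is an arbitrary choice for the example):

```python
import numpy as np

def hac_score_cov(Z, u, tau, bandwidth):
    """Newey-West estimate of the long-run covariance of g_i = psi_tau(u_i) * Z_i,
    where u_i are quantile-regression residuals and psi_tau(u) = tau - 1{u < 0}."""
    psi = tau - (u < 0).astype(float)
    g = Z * psi[:, None]                   # n x k matrix of scores
    n = len(u)
    S = g.T @ g / n                        # lag-0 term
    for j in range(1, bandwidth + 1):
        w = 1.0 - j / (bandwidth + 1)      # Bartlett weight
        G = g[j:].T @ g[:-j] / n           # lag-j cross products
        S += w * (G + G.T)
    return S

rng = np.random.default_rng(1)
n, tau = 500, 0.5
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])
u = rng.normal(size=n)                     # i.i.d. residuals for illustration
S = hac_score_cov(Z, u, tau, bandwidth=4)
print(S.shape)   # (2, 2)
```

With i.i.d. residuals the lag terms are asymptotically negligible and S approaches the leading term τ(1 − τ) E(Z_i Z_i^⊤), which is how the serially-uncorrelated special case below arises.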

This result takes into account the possibly serially correlated errors and makes the necessary adjustment of adding the covariance terms in the Gaussian process above. Assuming uncorrelated errors, we obtain the result of Gutenbrunner and Jureckova (1992); that is,

    J_n(τ, q) ≡ n^{-1/2} Σ_{i=1}^{n} ψ_τ(y_i − Z_{q,i}^⊤ α_0(τ)) Z_{q,i} ⇒ (Q_0(q*))^{1/2} B(τ),

where B(τ) is a vector of p independent Brownian bridges in C[0, 1], the space of continuous functions with domain [0, 1]. This implies

    n^{1/2}(α̂(τ, q*) − α_0(τ, q*)) ⇒ (H_0(τ, q*))^{-1} (Q_0(q*))^{1/2} B(τ).

For fixed τ, from Slutsky's theorem and the central limit theorem, we have

    n^{1/2}(α̂(τ, q*) − α_0(τ, q*)) ⇒ N(0, V(τ, q*)),

where N is a zero-mean normal distribution with variance/covariance matrix V(τ, q*) = τ(1 − τ) H_0(τ, q*)^{-1} Q_0(q*) H_0(τ, q*)^{-1}.

When the partition quantile is unknown, we have to derive the limiting distribution of the normalized estimator n^{1/2}(α̂(τ, q) − α_0(τ, q)) in order to develop non-linearity tests later on. The following theorem does that.

Theorem 2.3 Under assumptions [A.1]-[A.6], for some unknown q ∈ Q = [q_L, q_U] and for all τ ∈ T,

    sup_{(τ,q)∈T×Q} ‖ n^{1/2}(α̂(τ, q) − α_0(τ, q)) − H_n(τ, q)^{-1} J_n(τ, q) ‖ = o_P(1),

and

    J_n(τ, q) ≡ n^{-1/2} Σ_{i=1}^{n} ψ_τ(y_i − Z_{q*,i}^⊤ α_0(τ)) Z_{q,i} ⇒ B*(τ, q),

where B*(τ, q) is a two-parameter Gaussian process with mean zero and variance/covariance matrix given by

    E[J_n(τ_1, q_1) J_n(τ_2, q_2)^⊤] = (τ_1 ∧ τ_2 − τ_1 τ_2) E(Z_{q_1 i} Z_{q_2 i}^⊤)
        + Σ_{j=−∞, j≠0}^{∞} E(Z_{q_1 i} Z_{q_2,i+j}^⊤) E[ ψ_{τ_1}(y_i − Z_{q*,i}^⊤ α_0(τ_1)) ψ_{τ_2}(y_{i+j} − Z_{q*,i+j}^⊤ α_0(τ_2)) | z_i, x_i ].

The above asymptotic representation implies that

    n^{1/2}(α̂(τ, q) − α_0(τ, q)) ⇒ (H_0(τ, q))^{-1} B*(τ, q),

for the same B*(τ, q) as above. This is a generalization of Theorem 1 in Galvao et al. (2011), since in their model the errors are uncorrelated; for uncorrelated errors, we recover the same Gaussian process as in their Theorem 1. We proceed now to the asymptotic analysis of the partition quantile q, which is not done in Galvao et al. (2011).
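For inference at a known partition quantile, the sandwich matrix V(τ, q) = τ(1 − τ) H_0^{-1} Q_0 H_0^{-1} above can be estimated by plugging in a kernel estimate of H_0, a standard device in quantile regression; the sketch below is our illustration, and the rule-of-thumb bandwidth is an assumption of the example, not a recommendation of the paper:

```python
import numpy as np

def sandwich_cov(Z, u, tau):
    """Estimate V = tau(1-tau) H^{-1} Q H^{-1}, with H estimated by a uniform-kernel
    density weight on the residuals u near zero (a Powell-type estimator)."""
    n, k = Z.shape
    h = 1.06 * np.std(u) * n ** (-1 / 5)              # illustrative rule-of-thumb bandwidth
    kern = (np.abs(u) <= h).astype(float) / (2 * h)   # uniform kernel at zero
    H = (Z * kern[:, None]).T @ Z / n
    Q = Z.T @ Z / n
    Hinv = np.linalg.inv(H)
    return tau * (1 - tau) * Hinv @ Q @ Hinv

rng = np.random.default_rng(2)
n, tau = 2000, 0.5
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])
u = rng.normal(size=n)           # residuals at the true parameter, for illustration
V = sandwich_cov(Z, u, tau)
print(V.shape)   # (2, 2)
```

Standard errors for α̂(τ, q) then follow as the square roots of diag(V)/n, in the serially-uncorrelated case covered by the display above.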

3 The limiting distribution of the partition quantile under shrinking magnitude of shifts

In Galvao et al. (2011), an asymptotic framework for the threshold parameter q is lacking. Here, we attempt to fill this void in the literature by deriving the limiting properties of q̂. We first need some additional definitions. We define the following matrix functionals:

    D(τ, q) = E[ f_i(F_i^{-1}(τ) | z_i) z_{2i} z_{2i}^⊤ | x_i = q ]  and  V(τ, q) = E[ z_{2i} z_{2i}^⊤ | x_i = q ].

Denote by g(q) the density of x_i at the partition quantile q, assumed continuous here, and by g_0 = g(q_0) the same density at the true partition quantile q_0. Similarly, define V_0 = V(τ, q_0) and D_0 = D(τ, q_0). We need to make some additional assumptions:

[A.7] For all q ∈ Q, E[‖z_{2i}‖^4 | x_i = q] < ∞ and E[f_i(F_i^{-1}(τ) | z_i)^2 ‖z_{2i}‖^4 | x_i = q] < ∞, with 0 < g(q) ≤ ḡ < ∞.

[A.8] The matrix functionals V_0 and D_0 and the density g_0 are continuous.

[A.9] δ(τ) = c n^{−α}, with c ≠ 0 and α ∈ (0, 1/2).³

[A.10] c^⊤ V_0 c > 0 and c^⊤ D_0 c > 0.

The above assumptions resemble those of Hansen (2000) and Caner (2002), and we have tried to be faithful to their notation to ease comparison of their results with ours. Some discussion is necessary, though. Assumption [A.7] bounds the conditional moments of our regressors, while [A.8] requires the distribution of the regime-change regressor x_i to be continuous. Assumption [A.9] requires that the difference in regression slopes gets smaller as the sample size gets bigger; see Picard (1986) and Bai (1995). This shrinking-threshold (shrinking-shifts) asymptotic framework is necessary if we want a nuisance-parameter-free asymptotic distribution of the partition quantile q. Finally, assumption [A.10] is a full-rank condition, required in order to have a non-degenerate asymptotic distribution. It also excludes the case of a continuous threshold model. For more details, see Hansen (2000), Section 3.1.

We can now state the asymptotic distribution of the estimated partition quantile q̂ in the following theorem.

Theorem 3.1 Under assumptions [A.1] and [A.7]-[A.10], and for α ∈ (0, 1/2),

    n^{1−2α}(q̂ − q_0) ⇒ ω T,

where

    ω = τ(1 − τ) (c^⊤ V_0 c) / ((c^⊤ D_0 c)^2 g_0)  and  T = arg max_{r∈R} ( W(r) − |r|/2 ),

with W(r) a two-sided Wiener process.

Some comments on the above theorem are needed. The difference between a threshold and a change-point model is that in the first, the asymptotic precision of q̂ is proportional to the conditional moment matrix E(z_{2i} z_{2i}^⊤ | x_i = q), while in the second it is proportional to the unconditional moment matrix E(z_{2i} z_{2i}^⊤). In addition, the asymptotic distribution of q̂ becomes less dispersed with larger g_0, that is, when there is an increasing number of observations near the true partition quantile q_0. Comparing our result with the current literature, our scale term ω generalizes that of Caner (2002), while in Hansen (2000) the same term, under the assumption of conditional homoskedasticity, takes the form

    ω = σ² / ((c^⊤ E[z_{2i} z_{2i}^⊤] c) g_0),

for σ² the variance of the error term. In Theorem 3(iii) of Bai (1995), for LAD estimation, the same term is given by

    ω = 1 / (4 f(0)² (c^⊤ E[z_{2i} z_{2i}^⊤] c) g_0).

³ For simplicity, we suppress notation that makes c dependent on τ, that is, c = c_τ.

The parameter α controls the rate at which δ, the size of the partition-quantile effect, decreases to zero. Since we have adopted a shrinking-thresholds asymptotic framework, we cannot have a super-consistent partition quantile estimator: intuitively, it is harder to detect a very small shift in the data, so the convergence rate slows down. Notice, though, that if α is small enough, the rate of convergence approaches the super-consistent rate n; in this case, the threshold effect is large enough to be detected. The two-sided Wiener process W(r) appearing in the formula above is defined by

    W(r) = W_1(r)   when r ≥ 0,
           W_2(−r)  when r < 0,

for two independent Wiener processes W_1(r), W_2(r) on the non-negative half line with W_1(0) = W_2(0) = 0. The distribution function of T is found in Bhattacharya and Brockwell (1976) and, for x ≥ 0, is given by

    P(T ≤ x) = 1 + (x/(2π))^{1/2} exp(−x/8) − ((x + 5)/2) Φ(−x^{1/2}/2) + (3/2) exp(x) Φ(−3x^{1/2}/2),

where Φ(x) denotes the standard normal cumulative distribution function, and P(T ≤ x) = 1 − P(T ≤ −x) for x < 0. The two-sided Wiener process results from the discontinuity created by the threshold parameter q in the regression function; see Chan (1993). We conclude by mentioning that, assuming i.i.d. errors, the formula for the scale term ω can be simplified: since in this case D_0 = f(F^{-1}(τ)) V_0, we have

    ω = τ(1 − τ) / ( f(F^{-1}(τ))² (c^⊤ V_0 c) g_0 ).
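The distribution function of T is straightforward to evaluate numerically. The sketch below (ours) implements it and checks two properties implied by the symmetry of the two-sided Wiener process: P(T ≤ 0) = 1/2 and P(T ≤ x) = 1 − P(T ≤ −x).

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_T(x):
    """P(T <= x) from Bhattacharya and Brockwell (1976), T = argmax W(r) - |r|/2."""
    if x < 0:
        return 1.0 - cdf_T(-x)
    s = math.sqrt(x)
    return (1.0 + math.sqrt(x / (2.0 * math.pi)) * math.exp(-x / 8.0)
            - (x + 5.0) / 2.0 * Phi(-s / 2.0)
            + 1.5 * math.exp(x) * Phi(-1.5 * s))

print(round(cdf_T(0.0), 6))   # 0.5, by symmetry of the two-sided Wiener process
```

Quantiles of ωT, and hence asymptotic confidence intervals for q̂, can then be obtained by numerically inverting this cdf.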

In the Appendix, we derive the limiting distribution of the partition quantile under fixed magnitude of shifts.

3.1 Confidence Intervals for the unknown threshold estimator

The approach here follows Hansen (2000) and Caner (2002). We construct a Likelihoodratio-type test to test the null hypothesis H0 : q = q0 but also to construct confidence intervals for our estimator. The reason we have chosen this test is that confidence intervals based on Wald statistics are found to be less reliable. In particular, Dufour (1997) has shown that inverting a Wald-type statistic to build confidence intervals for some parameter that is locally almost unidentified (LAU - as is the case in threshold regression models), will lead to confidence intervals with true level that deviates arbitrarily from its nominal level. In addition, approximations of the statistic based on Edgeworth expansions or the bootstrap will not help. On the contrary, likelihood-ratio tests are found to be more reliable. For more, see Gleser and Hwang (1987), Nelson and Savin (1990) and Dufour (1997). Our Likelihood-ratio-type statistic4 takes the following form: LRn (τ, q, α) =

Sn (τ, q0 , α) − Sn (τ, qˆ , α) , τ(1 − τ)

where Sn (·) denotes the sum of asymmetrically weighted absolute residuals for the restricted (under the null) and the unrestricted model for each τ ∈ T . We reject the null for large values of LRn (τ, q, α). Our statistic extends that of Caner (2002) where he considers only the case τ = 1/2, and also extends the statistic of Koenker and Basset (1982) where they considered conditionally homoskedastic errors with no threshold effects. Theorem 3.2 Under assumptions A[1]-A[9], assuming q = q0 , we have LRn (τ, q0 , α)

η2 ξ,

where   |   c V c |r| 0 η2 = | and ξ = maxW(r) − , r∈R c D0 c 2 for a two-sided Wiener process W(r) and V0 , D0 as defined before. Furthermore, the 4

It is a “LR-type” statistic since we do not assume normality of the errors.

16

distribution function of ξ is given by P(ξ ≤ t) = (1 − exp(−t))2 . The above Theorem allows us to tabulate critical values in the special case of conditional homoskedasticity. Notice that in that case, since η2 =

1 / f̂i(Fi⁻¹(τ)),

and the likelihood-ratio-type statistic can be written as

LRn(τ, q, α) = f̂i(Fi⁻¹(τ)) [Sn(τ, q, α) − Sn(τ, q̂, α)] / [τ(1 − τ)] ⇒ ξ.

This requires only the estimation of the τth quantile of the unconditional density of the errors, hence it is nuisance-free. As Caner (2002) notes, any kernel estimator can be used (e.g. Epanechnikov). In the following table, we provide critical values for a range of quantiles τ, hence expanding Table 1 of Hansen (2000)⁵.

Table 4: Critical Values for the LR statistic

  τ \ 1−α   0.80     0.85     0.90     0.925    0.95     0.975    0.99
  0.50      4.497    5.101    5.939    6.528    7.352    8.751    10.592
  0.55      4.519    5.127    5.969    6.561    7.389    8.796    10.645
  0.60      4.589    5.206    6.062    6.663    7.504    8.932    10.810
  0.65      4.714    5.347    6.226    6.843    7.707    9.174    11.103
  0.70      4.906    5.566    6.481    7.123    8.022    9.549    11.556
  0.75      5.192    5.890    6.858    7.538    8.490    10.105   12.230
  0.80      5.621    6.376    7.424    8.160    9.190    10.939   13.240
  0.85      6.297    7.143    8.317    9.141    10.295   12.254   14.831
  0.90      7.495    8.502    9.899    10.880   12.254   14.586   17.653
  0.95      10.316   11.702   13.626   14.977   16.867   20.077   24.299
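As a consistency check on the table: under Hansen's (2000) parametrization P(ξ ≤ x) = (1 − e^{−x/2})², the level-β critical value is −2 log(1 − √β), which reproduces the τ = 0.50 row exactly (Hansen's Table 1); the remaining rows are numerically consistent with rescaling that value by [2√(τ(1 − τ))]⁻¹. The snippet below is our own sketch of this check, not the authors' code:

```python
from math import log, sqrt

def lr_critical_value(beta, tau=0.5):
    """Invert P(xi <= x) = (1 - exp(-x/2))^2 at confidence level beta.
    The rescaling by 2*sqrt(tau*(1-tau)) is an empirical pattern that
    matches the tabulated rows; it is exact at tau = 0.50."""
    c = -2.0 * log(1.0 - sqrt(beta))
    return c / (2.0 * sqrt(tau * (1.0 - tau)))

# tau = 0.50 row of Table 4 coincides with Hansen (2000), Table 1
print(round(lr_critical_value(0.95), 3))
```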

We assess the performance of the proposed confidence regions for the estimated partition quantile q̂ by simulation, under the assumption of homoskedastic and heteroskedastic errors, following the test-inversion method of Hansen (2000). (The row for τ = 0.50 in Table 4 corresponds to Table 1 of Hansen (2000).) In the case of a homoskedastic error term, we write

Q̂ = {q : LRn(τ, q, α) ≤ cξ(β)},

where β denotes the asymptotic confidence level (90%, 95%, etc.), while cξ(β) is the critical value taken from Table 4. In the case of heteroskedasticity, we have

Q̂* = {q : LR*n(τ, q, α) ≤ cξ(β)},

where LR*n(·) = LRn(·)/η̂², for a consistent estimator η̂ of η. Therefore, Q̂* is a heteroskedasticity-robust confidence interval for the estimated threshold value q̂.

The simulation design is the same as in Hansen (2000) and Caner (2002). In particular, in the simple linear regression model

yi = zi′γ + zq,i′δ + εi,

we have set zi = (1, xi)′, δ = (δ1, δ2)′, δ1 = 0, xi, εi ~ N(0, 1) and q = 0.75. We have used threshold sizes δ2 = 0.50, 1 and 2 and sample sizes n = 100, 200, 300.

For the homoskedastic case, η² = 1. Therefore, in order to compute the likelihood-ratio-type statistic, an estimator of the unconditional density of the errors at the quantile of interest is required. This can be done by using a kernel estimator, such as the Epanechnikov kernel K(u) = (3/4)(1 − u²)1{|u| ≤ 1}, with bandwidth hn such that hn → 0, √(n hn) → ∞ and Khn(u) = hn⁻¹K(u/hn); see Caner (2002) and Hardle and Linton (1994). Here, we have used the bandwidth recommended on page 81 of Koenker (2005), that is

hn = min{σ̂y, Ry/1.34} [Φ⁻¹(τ + bn) − Φ⁻¹(τ − bn)],

where Φ is the Gaussian cumulative distribution function with density φ, Ry = Q̂y(0.75) − Q̂y(0.25) is the interquartile range, and bn is given by

bn = n^{−1/5} [4.5 φ⁴(Φ⁻¹(τ)) / (2Φ⁻¹(τ)² + 1)²]^{1/5}.

The second case assumes heterogeneous errors; we therefore follow Hansen (2000) and denote

r1i = (ĝi,τ′δ(τ))²,  r2i = (fv(Fv⁻¹(τ)|zi, xi) ĝi,τ′δ(τ))²  and  η² = E(r1i | xi = q0,τ) / E(r2i | xi = q0,τ).
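The bandwidth rule quoted above is straightforward to compute; the sketch below implements it with scipy (function and variable names are ours, and τ ± bn must lie in (0, 1)):

```python
import numpy as np
from scipy.stats import norm

def koenker_bandwidth(y, tau, n):
    """Bandwidth of Koenker (2005, p. 81):
    bn = n^{-1/5} [4.5 phi^4(z_tau) / (2 z_tau^2 + 1)^2]^{1/5},
    hn = min(sigma_hat, IQR/1.34) * [Phi^{-1}(tau+bn) - Phi^{-1}(tau-bn)]."""
    z = norm.ppf(tau)
    bn = n ** (-0.2) * (4.5 * norm.pdf(z) ** 4 / (2 * z ** 2 + 1) ** 2) ** 0.2
    iqr = np.quantile(y, 0.75) - np.quantile(y, 0.25)
    scale = min(np.std(y, ddof=1), iqr / 1.34)
    return scale * (norm.ppf(tau + bn) - norm.ppf(tau - bn))
```

As the formula suggests, the bandwidth shrinks with the sample size n and widens with the dispersion of the data.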


Following Weiss (1991) and Caner (2002), we estimate η² by the kernel regression

η̂² = [Σᵢ₌₁ⁿ Khn(q̂τ − xi)(ĝi,τ′δ̂(τ))²] / [Σᵢ₌₁ⁿ Khn(q̂τ − xi) Khn(ûi)(ĝi,τ′δ̂(τ))²].
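A minimal sketch of this kernel-regression ratio with the Epanechnikov kernel. Here `g_delta_sq` and `f_hat` stand in for the (ĝ′δ̂)² terms and the error-density values (the Khn(ûi) factor above); in practice both come from the fitted model, so the interface and names are our assumptions:

```python
import numpy as np

def epanechnikov(u):
    # K(u) = 3/4 (1 - u^2) on |u| <= 1
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def eta2_hat(x, g_delta_sq, f_hat, q_hat, h):
    """Ratio of kernel-weighted local averages near the estimated
    threshold q_hat -- a sketch of the Weiss (1991)/Caner (2002) idea,
    not the authors' exact code."""
    w = epanechnikov((q_hat - x) / h) / h
    num = np.sum(w * g_delta_sq)
    den = np.sum(w * f_hat * g_delta_sq)
    return num / den
```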

Since it is known that the kernel function matters less than the bandwidth, which is the important parameter in nonparametric estimation (see Li and Racine (2007)), we have tried two bandwidths: the first is Silverman's "rule of thumb" and the second is based on cross-validation; see Li and Racine (2007). In the heterogeneity tables, we display the coverage probability estimated with Silverman's "rule of thumb", while in brackets we display the coverage probability obtained with the bandwidth selected by cross-validation.

Table 5: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.50).

σx = 1, σε = 1    δ = 0.50   δ = 1.00   δ = 2.00
n = 100            85.3%      89.0%      92.3%
n = 200            82.6%      86.0%      88.6%
n = 300            86.6%      88.6%      89.6%

σx = 1, σε = 2
n = 100            85.6%      86.0%      82.6%
n = 200            83.6%      81.3%      84.0%
n = 300            82.3%      84.0%      86.6%

σx = 1, σε = 4
n = 100            78.0%      80.0%      81.6%
n = 200            76.6%      79.0%      76.6%
n = 300            80.3%      81.6%      78.6%


Table 6: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.50, σε = 1).

σx = 2            δ = 0.50   δ = 1.00   δ = 2.00
n = 100            94.3%      94.0%      94.1%
n = 200            88.3%      87.3%      89.6%
n = 300            86.0%      93.3%      92.3%

σx = 4
n = 100            96.0%      91.6%      93.4%
n = 200            96.0%      96.3%      98.6%
n = 300            90.6%      92.0%      94.4%

Table 7: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.10).

σx = 1, σε = 1    δ = 0.50   δ = 1.00   δ = 2.00
n = 100            94.3%      87.0%      95.3%
n = 200            92.0%      90.3%      88.3%
n = 300            91.3%      89.0%      90.0%

σx = 1, σε = 2
n = 100            85.0%      96.0%      94.3%
n = 200            86.3%      93.6%      91.6%
n = 300            90.6%      89.0%      89.6%

σx = 1, σε = 4
n = 100            88.6%      90.3%      87.3%
n = 200            87.3%      86.6%      88.6%
n = 300            88.0%      89.6%      89.3%

Table 8: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.10, σε = 1).

σx = 2            δ = 0.50   δ = 1.00   δ = 2.00
n = 100            91.6%      96.6%      98.3%
n = 200            91.0%      95.0%      96.3%
n = 300            92.6%      92.0%      94.6%

σx = 4
n = 100            99.0%      99.3%      99.0%
n = 200            96.3%      98.6%      98.6%
n = 300            88.3%      95.3%      97.3%

Table 9: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.25).

σx = 1, σε = 1    δ = 0.50   δ = 1.00   δ = 2.00
n = 100            99.6%      99.3%      99.0%
n = 200            97.3%      98.6%      99.3%
n = 300            97.0%      98.3%      98.0%

σx = 1, σε = 2
n = 100            94.3%      89.3%      87.6%
n = 200            89.6%      91.6%      86.6%
n = 300            90.6%      91.3%      88.9%

σx = 1, σε = 4
n = 100            88.3%      86.6%      84.6%
n = 200            88.0%      84.3%      87.6%
n = 300            85.0%      87.6%      89.0%

Table 10: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.25, σε = 1).

σx = 2            δ = 0.50   δ = 1.00   δ = 2.00
n = 100            83.6%      80.6%      79.6%
n = 200            87.3%      85.3%      84.3%
n = 300            81.6%      80.0%      86.0%

σx = 4
n = 100            79.8%      80.0%      87.0%
n = 200            82.0%      79.6%      92.3%
n = 300            82.6%      81.0%      87.6%

Table 11: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.75).

σx = 1, σε = 1    δ = 0.50   δ = 1.00   δ = 2.00
n = 100            86.9%      92.3%      91.0%
n = 200            92.6%      92.0%      89.6%
n = 300            88.3%      89.3%      89.3%

σx = 1, σε = 2
n = 100            80.6%      83.0%      89.0%
n = 200            82.0%      87.6%      93.6%
n = 300            79.6%      87.0%      92.9%

σx = 1, σε = 4
n = 100            85.0%      88.6%      80.6%
n = 200            85.6%      87.3%      85.0%
n = 300            85.3%      88.6%      87.6%

Table 12: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.75, σε = 1).

σx = 2            δ = 0.50   δ = 1.00   δ = 2.00
n = 100            93.0%      91.0%      89.6%
n = 200            91.0%      89.6%      90.3%
n = 300            86.6%      93.0%      91.6%

σx = 4
n = 100            93.0%      92.6%      91.3%
n = 200            92.3%      94.0%      89.9%
n = 300            92.6%      91.6%      88.6%

Table 13: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.90).

σx = 1, σε = 1    δ = 0.50   δ = 1.00   δ = 2.00
n = 100            94.3%      97.0%      97.6%
n = 200            93.0%      96.0%      96.6%
n = 300            93.3%      95.6%      94.0%

σx = 1, σε = 2
n = 100            78.6%      88.0%      96.3%
n = 200            78.3%      83.6%      96.0%
n = 300            82.6%      85.6%      92.3%

σx = 1, σε = 4
n = 100            75.0%      76.0%      76.6%
n = 200            70.3%      75.0%      82.0%
n = 300            71.6%      76.6%      86.6%

Table 14: 90% Coverage Probabilities for q̂: Homogeneity (τ = 0.90, σε = 1).

σx = 2            δ = 0.50   δ = 1.00   δ = 2.00
n = 100            98.0%      97.3%      98.0%
n = 200            96.0%      96.3%      96.3%
n = 300            97.3%      97.0%      95.6%

σx = 4
n = 100            97.0%      96.6%      96.0%
n = 200            97.0%      96.3%      95.3%
n = 300            94.3%      94.0%      94.6%

Table 15: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.50; cross-validation bandwidth in brackets).

σx = 1, σε = 1    δ = 0.50         δ = 1.00         δ = 2.00
n = 100            89.3% (89.6%)    96.0% (97.3%)    94.0% (95.6%)
n = 200            85.3% (84.6%)    95.0% (95.1%)    95.6% (96.0%)
n = 300            88.3% (88.6%)    94.2% (94.6%)    93.3% (93.8%)

σx = 1, σε = 2
n = 100            87.0% (86.6%)    91.0% (90.6%)    93.6% (94.3%)
n = 200            82.0% (82.3%)    83.0% (83.0%)    87.6% (88.0%)
n = 300            88.3% (80.6%)    84.3% (83.0%)    89.6% (90.0%)

σx = 1, σε = 4
n = 100            75.3% (74.0%)    76.6% (75.0%)    82.3% (81.3%)
n = 200            73.1% (71.3%)    76.3% (76.0%)    77.0% (74.0%)
n = 300            74.0% (73.0%)    78.3% (78.0%)    76.6% (74.3%)

Table 16: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.50, σε = 1; cross-validation bandwidth in brackets).

σx = 2            δ = 0.50         δ = 1.00         δ = 2.00
n = 100            97.3% (97.0%)    98.3% (98.0%)    99.3% (99.0%)
n = 200            89.3% (90.0%)    97.0% (95.6%)    96.0% (95.3%)
n = 300            87.3% (88.6%)    96.6% (96.0%)    95.6% (94.3%)

σx = 4
n = 100            98.6% (97.3%)    98.6% (98.0%)    99.3% (99.0%)
n = 200            99.0% (98.6%)    98.0% (97.3%)    98.0% (98.0%)
n = 300            97.6% (95.9%)    98.0% (97.0%)    96.6% (95.3%)

Table 17: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.10; cross-validation bandwidth in brackets).

σx = 1, σε = 1    δ = 0.50         δ = 1.00         δ = 2.00
n = 100            96.3% (92.3%)    88.0% (91.6%)    84.6% (86.6%)
n = 200            92.6% (89.6%)    85.0% (84.6%)    80.6% (82.0%)
n = 300            88.3% (88.9%)    91.0% (89.6%)    82.6% (84.6%)

σx = 1, σε = 2
n = 100            81.3% (81.6%)    97.0% (96.0%)    96.0% (93.6%)
n = 200            76.0% (78.6%)    93.0% (91.3%)    87.6% (92.0%)
n = 300            80.6% (86.6%)    86.6% (85.6%)    92.6% (88.6%)

σx = 1, σε = 4
n = 100            79.0% (83.3%)    95.6% (93.6%)    97.6% (92.3%)
n = 200            76.0% (76.3%)    87.6% (92.0%)    88.6% (92.0%)
n = 300            73.0% (74.6%)    81.6% (82.3%)    86.6% (87.3%)

Table 18: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.10, σε = 1; cross-validation bandwidth in brackets).

σx = 2            δ = 0.50         δ = 1.00         δ = 2.00
n = 100            86.3% (88.0%)    96.6% (96.0%)    97.6% (93.6%)
n = 200            88.6% (91.6%)    95.0% (94.6%)    93.0% (90.6%)
n = 300            86.6% (88.3%)    86.0% (86.6%)    92.3% (91.6%)

σx = 4
n = 100            99.0% (99.0%)    99.6% (98.6%)    98.3% (98.0%)
n = 200            98.6% (98.0%)    99.0% (98.3%)    99.0% (98.3%)
n = 300            98.0% (98.0%)    98.0% (98.0%)    98.6% (97.6%)

Table 19: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.25; cross-validation bandwidth in brackets).

σx = 1, σε = 1    δ = 0.50         δ = 1.00         δ = 2.00
n = 100            95.3% (92.6%)    96.0% (94.6%)    87.6% (93.3%)
n = 200            86.0% (92.0%)    88.6% (91.6%)    94.6% (92.3%)
n = 300            84.6% (91.6%)    88.3% (92.4%)    86.6% (91.0%)

σx = 1, σε = 2
n = 100            95.6% (94.6%)    96.0% (94.3%)    93.0% (92.6%)
n = 200            97.6% (97.3%)    95.6% (94.0%)    86.3% (88.6%)
n = 300            91.3% (91.0%)    95.3% (93.6%)    94.2% (93.6%)

σx = 1, σε = 4
n = 100            92.0% (81.6%)    97.0% (92.6%)    93.6% (86.0%)
n = 200            91.0% (79.6%)    92.0% (83.0%)    93.0% (87.6%)
n = 300            86.6% (84.6%)    93.0% (82.6%)    90.6% (88.0%)

Table 20: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.25, σε = 1; cross-validation bandwidth in brackets).

σx = 2            δ = 0.50         δ = 1.00         δ = 2.00
n = 100            99.0% (99.0%)    99.6% (99.0%)    99.6% (98.6%)
n = 200            98.0% (97.2%)    98.6% (98.3%)    98.0% (98.0%)
n = 300            98.6% (98.3%)    98.0% (98.0%)    97.6% (97.3%)

σx = 4
n = 100            99.0% (98.0%)    98.6% (98.0%)    98.6% (98.3%)
n = 200            98.6% (98.3%)    96.3% (94.0%)    98.0% (97.0%)
n = 300            97.0% (96.0%)    96.0% (95.3%)    96.6% (96.3%)

Table 21: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.75; cross-validation bandwidth in brackets).

σx = 1, σε = 1    δ = 0.50         δ = 1.00         δ = 2.00
n = 100            94.0% (89.0%)    87.3% (88.6%)    91.0% (90.3%)
n = 200            85.3% (87.0%)    92.0% (91.6%)    86.6% (87.6%)
n = 300            81.6% (84.6%)    82.6% (92.6%)    87.0% (92.3%)

σx = 1, σε = 2
n = 100            86.0% (90.0%)    87.0% (89.3%)    96.3% (94.6%)
n = 200            83.6% (88.0%)    92.0% (90.3%)    96.6% (96.0%)
n = 300            90.3% (89.0%)    93.0% (92.0%)    93.6% (92.3%)

σx = 1, σε = 4
n = 100            71.3% (84.3%)    92.6% (90.0%)    86.0% (88.0%)
n = 200            74.3% (87.0%)    89.6% (89.3%)    88.3% (87.9%)
n = 300            83.6% (94.0%)    93.0% (87.6%)    92.0% (86.3%)

Table 22: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.75, σε = 1; cross-validation bandwidth in brackets).

σx = 2            δ = 0.50         δ = 1.00         δ = 2.00
n = 100            99.0% (99.0%)    99.6% (99.0%)    99.6% (98.6%)
n = 200            98.0% (97.2%)    98.6% (98.3%)    98.0% (98.0%)
n = 300            98.6% (98.3%)    98.0% (98.0%)    97.6% (97.3%)

σx = 4
n = 100            99.0% (98.0%)    98.6% (98.0%)    98.6% (98.3%)
n = 200            98.6% (98.3%)    96.3% (94.0%)    98.0% (97.0%)
n = 300            97.0% (96.0%)    96.0% (95.3%)    96.6% (96.3%)

Table 23: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.90; cross-validation bandwidth in brackets).

σx = 1, σε = 1    δ = 0.50         δ = 1.00         δ = 2.00
n = 100            80.0% (88.0%)    87.0% (94.0%)    86.6% (92.3%)
n = 200            79.6% (85.0%)    93.0% (86.0%)    91.0% (88.0%)
n = 300            76.6% (86.3%)    85.0% (87.0%)    87.0% (88.0%)

σx = 1, σε = 2
n = 100            83.3% (89.3%)    80.6% (85.6%)    88.0% (88.3%)
n = 200            86.6% (87.0%)    86.3% (87.0%)    95.0% (92.0%)
n = 300            94.6% (93.6%)    94.0% (93.3%)    92.6% (90.3%)

σx = 1, σε = 4
n = 100            84.0% (84.6%)    82.6% (83.0%)    77.3% (78.6%)
n = 200            71.0% (75.6%)    74.0% (76.6%)    85.0% (85.3%)
n = 300            70.3% (72.6%)    74.6% (77.3%)    87.0% (88.0%)

Table 24: 90% Coverage Probabilities for q̂: Heterogeneity (τ = 0.90, σε = 1; cross-validation bandwidth in brackets).

σx = 2            δ = 0.50         δ = 1.00         δ = 2.00
n = 100            86.3% (86.6%)    88.3% (89.6%)    91.0% (88.0%)
n = 200            97.0% (87.6%)    96.0% (94.6%)    95.6% (94.6%)
n = 300            96.6% (90.6%)    98.3% (93.3%)    92.3% (89.3%)

σx = 4
n = 100            95.6% (92.3%)    96.3% (96.0%)    97.0% (97.3%)
n = 200            94.0% (90.6%)    93.0% (90.3%)    95.0% (92.3%)
n = 300            95.3% (90.3%)    94.0% (90.0%)    94.0% (92.0%)

We briefly comment on the simulation results. For the median τ = 0.50, which coincides with the LAD estimator of Caner (2002), the confidence regions under homogeneity are close to their nominal level but start to overcover when we increase the variance of the regressor x; increasing the variance of the error ε results in confidence regions that undercover. The confidence regions under heterogeneity generally overcover, though not by much as the sample size increases together with the size of the jump. The overcoverage worsens when we increase the variability of ε, while increasing the variability of x again results in undercovering confidence regions. For the other quantiles, the picture is different. Increasing the variability of x results in conservative confidence regions for the extreme quantiles τ = 0.10 and τ = 0.90. Increasing the variability of ε has a smaller effect on the confidence regions when the sample size and the size of the jump both increase. The same quantiles under heterogeneity show similar behavior. For the quantiles τ = 0.25 and τ = 0.75 under homogeneity, increasing the variability of both x and ε improves the behavior of the confidence sets. Under heterogeneity, increasing the variance of ε improves the confidence sets, but increasing the variance of x makes them conservative. Concerning the bandwidth parameter, the results from cross-validation look better in general and should be preferred on the basis of this simulation exercise. We should also mention that when the confidence regions overcover, it helps to choose a larger bandwidth, that is, to undersmooth; we have tried this and it helped with conservative confidence regions. We do not report these results, since we cannot formally advise how large the bandwidth needs to be set to improve the results.
In general, it is known that confidence regions based on Hansen's threshold least squares estimator behave better, since the errors are assumed standard normally distributed and in this setting the least squares estimator is more efficient than the LAD estimator. In particular, for the case of homoskedastic errors, the efficiency gains are captured by the ratio of the scale terms in least squares and quantile regression respectively, i.e.

Eff = ωLS / ωQR = σ² fi(Fi⁻¹(τ)|zi, xi)² / [τ(1 − τ)].

This can be large for non-Gaussian error distributions; see Caner (2002).


4 Inference

For some given τ, we may be interested in testing for threshold effects of some covariate xi on the response yi. This hypothesis can be formulated as follows: H0 : δ(τ) = 0 for any q ∈ Q vs. H1 : δ(τ) ≠ 0 for some q ∈ Q, τ ∈ T. Note that under the null hypothesis, q is not identified; this is the so-called "Davies problem", see Davies (1977, 1987). In the least squares regression setting, the testing problem has been tackled by Andrews (1993) for structural-break models, Andrews and Ploberger (1994) and Hansen (1996). In the quantile regression setting, Galvao et al. (2011) recently developed double sup-Wald tests⁶, while Lee et al. (2011) explored the use of sup-likelihood-ratio tests. The latter paper is of particular interest since it deals with testing for threshold effects in a wide range of regression models, such as binary response, censored and truncated regression, maximum score and maximum rank correlation. Here we consider testing procedures under known and unknown threshold variable (partition quantile). In the first case, we consider known testing procedures, i.e. tests already existing in the literature, and discuss how they can be utilized in our threshold quantile regression framework. In the second case, we consider a sup-Wald statistic based on the hypothesis above.

4.1 Testing under known threshold

If the partition quantile q is known, then we are in a slightly more general situation than equation (9) of Koenker and Machado (1999). Therefore, adopting the same regularity conditions and assumptions as in Koenker and Machado (1999), the Bahadur asymptotic representation takes the same form as above, i.e.

√n (α̂(τ, q̂) − α0(τ, q*)) = Ĥn(τ, q)⁻¹ Ĵn(τ, q) + oP(1),

for the vector of parameter estimates α̂(τ)′ = (γ̂(τ)′, β̂(τ)′, δ̂(τ)′). We test [Rα(τ)]′ = δ(τ)′ = 0, for some unknown quantile τ ∈ T, at some unknown partition quantile q ∈ Q of the covariate xi, where R = [0, 0, Iq] and Iq is an identity matrix.⁶

⁶ Though, the results were not included in the published version of the paper.

To test any linear hypothesis of interest, we can use a Wald statistic of the form

Wn(τ, q) = n δ̂(τ)′ (R[Hn(τ, q)]⁻¹ Jn(τ, q) [Hn(τ, q)]⁻¹ R′)⁻¹ δ̂(τ).

By Lemma 2 of Gutenbrunner and Jureckova (1992), we know that

√n (α̂(τ, q̂) − α0(τ, q*)) ⇒ [H0(τ, q)]⁻¹ Q0(q)^{1/2} B(τ),

for B(τ) a p-dimensional vector of independent Brownian bridge processes. Following Koenker and Machado (1999) and extending their Theorem 2 to fit our modeling framework, the limiting distribution of the Wald test under the null hypothesis follows a squared non-central Bessel process Q²p(τ) of order p with non-centrality parameter ζ(τ). We give this result in the following lemma.

Lemma 4.1 Under assumptions A[1]-A[5] and for some known q ∈ Q, we have

Wn(τ, q) ⇒ Q²p,ζ(τ)(τ),

for a squared non-central Bessel process Q²p,ζ(τ) of order p, where the non-centrality parameter is defined as

ζ(τ) = v(τ)′ ([H0(τ, q)]⁻¹ Q0(q) [H0(τ, q)]⁻¹)⁻¹ v(τ) / ζ̄²(τ),

for some fixed continuous function v(τ) : [0, 1] → Rp, useful in investigating local alternatives, while ζ̄(τ) = √(τ(1 − τ)) [fi(Fi⁻¹(τ))]⁻¹.

Note that for any fixed τ ∈ (0, 1), Q²p,ζ(τ) ~ χ²p,ζ(τ), a non-central χ² random variable with p degrees of freedom and non-centrality parameter ζ(τ). Moreover, searching over all τ ∈ T, by the Continuous Mapping Theorem the supremum of the Wald test statistic converges in distribution to the supremum of the squared non-central Bessel process of order p, i.e.

sup_{τ∈T} Wn(τ) ⇒ sup_{τ∈T} Q²p,ζ(τ)(τ).

We note that we can also construct rank tests in the same way Koenker and Machado (1999) have done. We do not follow this direction here; for more details, see Koenker and Machado (1999), section 2.4.
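The Wald statistic above is a standard quadratic form; a small numerical sketch (function and variable names are ours) given plug-in estimates of [Hn(τ, q)]⁻¹ and Jn(τ, q):

```python
import numpy as np

def wald_stat(n, delta_hat, H_inv, J, R):
    """Wn = n * d' [R H^{-1} J H^{-1} R']^{-1} d, with d = delta_hat the
    threshold-coefficient block selected by R; H_inv and J are plug-in
    estimates of [Hn(tau, q)]^{-1} and Jn(tau, q)."""
    V = R @ H_inv @ J @ H_inv @ R.T   # covariance of the selected block
    return float(n * delta_hat @ np.linalg.solve(V, delta_hat))
```

With H_inv = J = I and R selecting the last two of four coefficients, the statistic reduces to n times the squared norm of δ̂, which gives a quick sanity check.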

4.2 Testing under unknown threshold

The more interesting and difficult case is when the partition quantile q is unknown. We need to extend the aforementioned Wald test to an unknown partition quantile q ∈ Q for fixed τ ∈ T. Suppose we are interested in testing for threshold effects on some fixed quantile τ ∈ T under unknown q ∈ Q. Then, since

n^{1/2} (α̂(τ, q) − α0(τ, q)) ⇒ [H0(τ, q0)]⁻¹ Q0(q)^{1/2} B(q),

for B(q) a vector of p independent Brownian bridges in C[0, 1], we get the following result.

Corollary 4.2 Under assumptions A[1]-A[6] and under the null hypothesis, for some fixed τ ∈ T and unknown q ∈ Q, we have

sup_{q∈Q} Wn(τ, q) ⇒ sup_{q∈Q} W0(τ, q),    (5)

where the process W0(τ, q) is given by

(R[H0(τ, q)]⁻¹ Q0(q)^{1/2} B(q))′ (R[H0(τ, q)]⁻¹ Q0(q) [H0(τ, q)]⁻¹ R′)⁻¹ (R[H0(τ, q)]⁻¹ Q0(q)^{1/2} B(q)),

for B(q) a Brownian-bridge process as defined before.

If we are interested in testing for threshold effects for (τ, q) ∈ T × Q, the Wald test is defined as

sup_{(τ,q)∈T×Q} Wn(τ, q) = sup_{(τ,q)∈T×Q} n [Rα̂(τ, q)]′ (R[Hn(τ, q)]⁻¹ Jn(τ, q) [Hn(τ, q)]⁻¹ R′)⁻¹ [Rα̂(τ, q)].    (6)

In particular, to test the hypothesis of no heterogeneity in the effects of xi on the conditional distribution of the response yi, that is, the null hypothesis H0 : δ(τ) = 0 against H1 : δ(τ) ≠ 0 for some τ ∈ T, we reject the null if sup_{(τ,q)∈T×Q} Wn(τ, q) > w0, for some constant w0 as suggested by Davies (1977, 1987). In the next theorem, we derive the null asymptotic distribution of the supremum test sup_{(τ,q)∈T×Q} Wn(τ, q).

Theorem 4.3 Under assumptions A[1]-A[6], under the null hypothesis and for unknown τ ∈ T and unknown q ∈ Q, we have

sup_{(τ,q)∈T×Q} Wn(τ, q) ⇒ sup_{(τ,q)∈T×Q} W0(τ, q),    (7)

where the process W0(τ, q) is given by

(R[H0(τ, q)]⁻¹ B*(τ, q))′ (R[H0(τ, q)]⁻¹ Q0(q) [H0(τ, q)]⁻¹ R′)⁻¹ (R[H0(τ, q)]⁻¹ B*(τ, q)),

and B*(τ, q) is a 2-parameter Gaussian process with mean zero and covariance kernel defined in Theorem 2.3.

Notice that the limiting distribution of our test is not pivotal, and therefore critical values cannot be tabulated. Despite this, it is not hard to simulate them according to some adopted modeling framework. An easy way is to utilize the Bahadur representation of Theorem 2.3 and notice that under the null

Jn(τ, q) = n^{−1/2} Σᵢ₌₁ⁿ ψτ(yi − zi′γ) zi = n^{−1/2} Σᵢ₌₁ⁿ (τ − 1{ui ≤ τ}) zi,

where ui ~ Unif[0, 1]; see Lee et al. (2011). The other matrices can also be estimated using known methods; see Koenker (2005).

Using the above result, testing the null hypothesis H0 : δ(τ) = 0 against H1 : δ(τ) ≠ 0 requires an estimator of the asymptotic variance of the estimator δ̂(τ). From Theorem 2.3, and under the null hypothesis, we have

√n (δ̂(τ) − δ0(τ)) ⇒ [H0(τ, q)]⁻¹ B*(τ, q).    (8)

All quantities stay as defined before. The asymptotic covariance matrix of √n (δ̂(τ) − δ0(τ)) under the null is therefore given by

Avar(√n (δ̂(τ) − δ0(τ))) = τ(1 − τ) R [Hn(τ, q)]⁻¹ B*(τ, q) [Hn(τ, q)]⁻¹ R′,
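Because the null score depends only on ui ~ Unif[0, 1] and the design, its distribution can be simulated directly by Monte Carlo; a minimal sketch (sample sizes, seed and names are illustrative choices of ours):

```python
import numpy as np

def simulate_score(z, tau, rng):
    """One draw of J_n(tau, q) = n^{-1/2} sum_i (tau - 1{u_i <= tau}) z_i
    with u_i ~ Uniform[0, 1], holding the design z fixed."""
    n = z.shape[0]
    u = rng.uniform(size=n)
    return (tau - (u <= tau)) @ z / np.sqrt(n)

rng = np.random.default_rng(0)
z = np.column_stack([np.ones(200), rng.normal(size=200)])
draws = np.array([simulate_score(z, 0.5, rng) for _ in range(500)])
```

Each component of the intercept score has variance τ(1 − τ), which the simulated draws should reproduce approximately.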

for an appropriate selector matrix R.

In the following, we investigate the empirical size and power of the sup-Wald test through a simulation study. The empirical size of the test is assessed using a simple linear model with no threshold effects, at theoretical sizes of α = 5 and 10%. The power of the test is assessed under the data generating process with one threshold given by

yi = β0 + β1 zi + δ xi + εi,

where β0 = β1 = 1, δ = 1.0 and εi = φ εi−1 + ei, with ei ~ N(0, 1) and φ = 0.4; that is, following Weiss (1990), we have serially correlated errors in the form of an AR(1) specification for the error process. We therefore need to estimate the correlations appearing in the matrix Jn(τ, q) of the variance/covariance of our quantile-regression estimator as given in Theorem 2.3. We do this as in Weiss (1990), adopting a Newey-West-type estimation of the correlations in Jn(τ, q). The next result shows that we can perform this estimation consistently. Define J0(τ, q) = E[Jn(τ1, q) Jn(τ2, q)] and the estimator

Ĵn(τ, q) = (τ1 ∧ τ2 − τ1τ2) (1/n) Σᵢ₌₁ⁿ Zq,i Zq,i′ + (1/n) Σⱼ₌₁ᵐ Σᵢ₌ⱼ₊₁ⁿ ϖ(j, m) ψτ(ε̂i(τ1)) ψτ(ε̂i−j(τ2)),

where ε̂i(τk) = yi − Zqk,i′α̂(τk), k = 1, 2, and the weight function is ϖ(j, m) = 1 − j/(m + 1), such that for each j, ϖ(j, m) → 1 as m → ∞ and m = m(n) = o(n^{1/4}); see Newey and West (1987) and Weiss (1990).

Theorem 4.4 We have Ĵn(τ, q) →P J0(τ, q).

In our simulation exercise, the truncation lag in the Newey-West-type estimator is set equal to 3 (other choices of the truncation lag, such as 2, 4 and 5, did not significantly affect the results). We also want to see what happens when the variance of the error term or the covariate xi increases; therefore we consider the cases εi ~ N(0, 2), N(0, 4) and xi ~ N(0, 2), N(0, 4). The sample size is n = 200 and we have used 500 repetitions. Note that to obtain the critical values, we have used a sample size of n = 500. The results of this simulation exercise for size and power are summarized in the following table and plots. At the theoretical sizes of α = 5 and 10%, the results show that the test is undersized around the middle quantiles and oversized at the extreme quantiles, but the block-bootstrap gives better results, especially for the middle quantiles. The "oversize" problem can be tackled by increasing the bandwidth when estimating at those extreme quantiles; see also Qu (2008) for a similar problem.
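The Bartlett-weighted (Newey-West) construction can be sketched generically for a score series; this is a sketch of the standard estimator applied to the quantile scores, not the authors' exact Ĵn (which also carries the τ1 ∧ τ2 − τ1τ2 design term):

```python
import numpy as np

def bartlett_lrv(scores, m):
    """Bartlett-weighted (Newey-West) long-run covariance of an (n, p)
    score series; weights w(j, m) = 1 - j/(m+1) make the estimate PSD."""
    n, p = scores.shape
    s = scores - scores.mean(axis=0)
    omega = s.T @ s / n                     # lag-0 term
    for j in range(1, m + 1):
        w = 1.0 - j / (m + 1.0)
        gamma = s[j:].T @ s[:-j] / n        # lag-j autocovariance
        omega += w * (gamma + gamma.T)
    return omega
```

With the truncation lag m = 3 used in the paper's simulations, only the first three autocovariances enter the sum.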


[Figure 2 here: power against threshold size at α = 5% and 10%, for the designs x ~ N(0,1), N(0,2), N(0,4) with e ~ N(0,1), and e ~ N(0,2), N(0,4) with x ~ N(0,1).]

Figure 2: Power Curves for τ = 0.10, 0.25 under different simulation settings.

[Figure 3 here: power against threshold size, same designs as Figure 2.]

Figure 3: Power Curves for τ = 0.50, 0.75 under different simulation settings.

Table 25: Empirical Size of the sup-Wald Statistic.

            Newey-West              Block-bootstrap
τ        α = 5%   α = 10%        α = 5%   α = 10%
0.10      8.5%     13.5%          10.5%    18.0%
0.20      3.5%      7.5%           6.5%    13.0%
0.30      3.0%      6.5%           5.5%     8.5%
0.40      2.5%      4.5%           3.5%     6.0%
0.50      1.5%      3.5%           3.0%     5.5%
0.60      2.0%      3.5%           2.5%     6.0%
0.70      3.0%      5.5%           3.5%     6.5%
0.80      4.0%      6.5%           4.5%     7.0%
0.90      4.5%      7.5%           5.5%     8.5%

[Figure 4 here: power against threshold size, same designs as Figure 2.]

Figure 4: Power Curves for τ = 0.90 under different simulation settings.

In general the power is satisfactory for all quantiles and increases as we move closer to the extreme quantiles. Increasing the variability of x reduces the power, while increasing the variability of the error ε increases the power even when the size of the jump is small. When the size of the jump increases, the power also increases in all cases considered in this simulation exercise. We now display the power simulation results for the block-bootstrap.


[Figure 5 here: block-bootstrap version, same designs as Figure 2.]

Figure 5: Power Curves for τ = 0.10, 0.25 under different simulation settings (Block-bootstrap).

[Figure 6 here: block-bootstrap version, same designs as Figure 2.]

Figure 6: Power Curves for τ = 0.50, 0.75 under different simulation settings (Block-bootstrap).

[Figure 7 here: block-bootstrap version, same designs as Figure 2.]

Figure 7: Power Curves for τ = 0.90 under different simulation settings (Block-bootstrap).

From the plots above, we can see that the power is higher than under the first simulation method for all quantiles and simulation designs. The effects of increasing the variances of x and ε follow the same pattern as in the previous table.
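The block bootstrap used in the size and power comparisons can be sketched as a moving-block resampler for serially dependent data; block length and names are our illustrative choices:

```python
import numpy as np

def moving_block_indices(n, block_len, rng):
    """Index vector for one moving-block bootstrap resample: draw random
    block start points, concatenate blocks of consecutive observations,
    then truncate to the original sample size n."""
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])
    return idx[:n]
```

Resampling whole blocks, rather than individual observations, preserves the short-run serial dependence of the AR(1) errors within each block.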

5 Concluding Remarks

In this chapter we have formulated a threshold quantile regression model with one threshold value, which we call the "partition quantile". We have derived the limiting distribution of the estimators as well as of the estimated partition quantile, under a fixed and shrinking-shifts asymptotic framework for q. We have constructed confidence intervals for the estimated partition quantile q̂ by inverting a likelihood-ratio-type test statistic, whose limiting distribution we have derived. The coverage probability of the estimated confidence regions is assessed through a simulation exercise for a variety of quantiles, and the results are satisfactory. An alternative approach that perhaps gives better results for all simulation settings considered in this chapter is sub-sampling; see Gonzalo and Wolf (2005). This is a topic for future research.

We have also developed inferential procedures to identify heterogeneous effects of different covariate quantile ranges on different quantiles of the response, for both known and unknown partition quantile. By fixing the quantile of interest, we can test for threshold effects at each quantile of the dependent variable using a Wald-type test. We have shown that the supremum of our Wald test converges to a two-parameter Gaussian process that generalizes that of Galvao et al. (2011) in that we include possibly serially correlated errors. A simulation exercise shows good size and power properties of the proposed test for different quantiles. The model is extended to accommodate multiple regimes in an i.i.d. setting in Kuan et al. (2013). It is of interest to allow for a continuous threshold by replacing the indicator function with some integrated kernel, as in Seo and Linton (2006), as well as to develop other inferential procedures, such as rank-score statistics, that do not require estimating the error density, which is usually difficult and done imprecisely at extreme quantiles. The authors are currently working on developing such a test.


6 Technical Appendix

Proof of Theorem 2.1: This proof follows closely Lemma 2 of Galvao et al. (2011) with minor modifications and is hence omitted.

Proof of Proposition 2.2: This result follows from standard quantile regression asymptotics; see Gutenbrunner and Jureckova (1992), Koenker and Machado (1999) or Kato (2009) (here q is the known partition quantile that splits the covariate into two sub-samples); hence it is omitted.

Proof of Theorem 2.3: We proceed as in Su and Xiao (2008). We first define some quantities. Write

$$\alpha(\tau) = \big(\gamma(\tau)^\top, \beta(\tau)^\top, \delta(\tau)^\top\big)^\top, \qquad \alpha_{0,\tau} = \alpha_0(\tau) = \big(\gamma_{0,\tau}^\top, \beta_{0,\tau}^\top, \delta_{0,\tau}^\top\big)^\top,$$

and

$$\hat\alpha_{\tau,q} = \hat\alpha(\tau,q) = \big(\hat\gamma(\tau,q)^\top, \hat\beta(\tau,q)^\top, \hat\delta(\tau,q)^\top\big)^\top = \big(\hat\gamma_{\tau,q}^\top, \hat\beta_{\tau,q}^\top, \hat\delta_{\tau,q}^\top\big)^\top.$$

Also define

$$\hat\phi_{\tau,q} = \begin{pmatrix} n^{1/2}(\hat\gamma_{\tau,q} - \gamma_{0,\tau}) \\ n^{1/2}(\hat\beta_{\tau,q} - \beta_{0,\tau}) \\ n^{1/2}(\hat\delta_{\tau,q} - \delta_{0,\tau}) \end{pmatrix}, \qquad \phi_\tau = \begin{pmatrix} n^{1/2}(\gamma(\tau) - \gamma_{0,\tau}) \\ n^{1/2}(\beta(\tau) - \beta_{0,\tau}) \\ n^{1/2}(\delta(\tau) - \delta_{0,\tau}) \end{pmatrix}, \qquad Z_{q,i} = \big(z_{1,i}^\top, z_{2,i}^\top, z_{q,i}^\top\big)^\top.$$

Notice that

$$\hat\phi_{\tau,q} = \arg\min_{\phi_\tau \in \mathbb{R}^p} \sum_{i=1}^n \rho_\tau\!\left(y_i - \alpha_{0,\tau}^\top Z_{q,i} - n^{-1/2}\phi_\tau^\top Z_{q,i}\right).$$

Set

$$S_n(\tau,q;\phi_\tau) = n^{-1/2}\sum_{i=1}^n \psi_\tau\!\left(y_i - \alpha_{0,\tau}^\top Z_{q,i} - n^{-1/2}\phi_\tau^\top Z_{q,i}\right) Z_{q,i}$$

and

$$\bar S_n(\tau,q;\phi_\tau) = n^{-1/2}\sum_{i=1}^n E\!\left[\psi_\tau\!\left(y_i - \alpha_{0,\tau}^\top Z_{q,i} - n^{-1/2}\phi_\tau^\top Z_{q,i}\right) Z_{q,i}\right],$$
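For readers less familiar with the notation, the check function $\rho_\tau$ and the score $\psi_\tau(u) = \tau - 1\{u < 0\}$ used in these definitions can be verified numerically: at the true $\tau$-th quantile of the error, $\psi_\tau$ has mean zero. A small illustration (ours, not part of the proof):

```python
import numpy as np

def rho(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def psi(u, tau):
    """Subgradient psi_tau(u) = tau - 1{u < 0}; mean zero at the true tau-quantile."""
    return tau - (u < 0)

rng = np.random.default_rng(1)
eps = rng.normal(size=200_000)
tau = 0.25
m = psi(eps - np.quantile(eps, tau), tau).mean()   # approximately zero
loss_pos = rho(np.array([2.0]), tau)[0]            # 2 * 0.25 = 0.5
loss_neg = rho(np.array([-2.0]), tau)[0]           # -2 * (0.25 - 1) = 1.5
```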


and observe that $-\phi_\tau^\top S_n(\tau,q;v\phi_\tau)$ is an increasing function of $v \ge 1$. We need some lemmas in order to prove Theorem 2.3. The first is a variant of Bernstein's inequality that will be used in the proofs that follow.

Lemma 6.1 (Bernstein's inequality): Let $z_i$ be a sequence of independent random variables with zero mean and $|z_i| \le \mu$ for some $\mu > 0$. Let $V \ge \sum_{i=1}^n E z_i^2$. Then for all $s \in (0,1)$ and $a \in [0, V/(s\mu)]$,

$$P\left(\Big|\sum_{i=1}^n z_i\Big| > a\right) \le 2\exp\left(-\frac{a^2 s(1-s)}{V}\right).$$

Proof: See Babu (1989) and Bai (1995).

Denote $\Delta z_{q,i} = z_i\big(1\{x_i \le q\} - 1\{x_i \le q_0\}\big)$ and notice that $1\{x_i \le q\} - 1\{x_i \le q_0\} = o_P(1)$.

We have the following lemma, which generalizes Lemma A.1 of Caner (2002).

Lemma 6.2 Under assumptions [A.1]-[A.6],

$$\inf_{(\tau,q)\in T\times Q}\,\inf_{\phi}\,\sum_{i=1}^n \left[\rho_\tau\!\left(\epsilon_i(\tau) - n^{-1/2}\phi_\tau^\top Z_{q,i} - \delta_\tau^\top \Delta z_{q,i}\right) - \rho_\tau\!\left(\epsilon_i(\tau) - \delta_\tau^\top \Delta z_{q,i}\right)\right] = o_P(\log n).$$

Proof: By Lemma 6 of Babu (1989), replacing $\epsilon_i$ with $\epsilon_i(\tau) - \delta_\tau^\top \Delta z_{q,i}$, we have $\|\phi_\tau\| \le C_0\sqrt{\log n}$ with probability one as $n\to\infty$, for a positive constant $C_0$, uniformly over $q$ for fixed quantile $\tau\in T$. Hence, as $n\to\infty$,

$$P\left(\|\hat\phi_\tau\| > C\sqrt{\log n},\ \forall q\in Q\right) < \epsilon,$$

for $\epsilon > 0$. Denoting $M_n = C\sqrt{\log n}$, we define

$$\eta_i(\tau,q;\phi) = \rho_\tau\!\left(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,i} - n^{-1/2}\phi_\tau^\top Z_{q,i}\right) - \rho_\tau\!\left(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,i}\right).$$

For any $A>0$ and fixed quantile $\tau$, we can write the following decomposition:

$$P\left(\sup_{(\tau,q)\in T\times Q}\Big|\inf_\phi\sum_{i=1}^n \eta_i(\tau,q;\phi)\Big| > 2A\log n\right) \le P\left(\|\hat\phi_\tau\| > M_n,\ \forall q\in Q\right) + P\left(\sup_{(\tau,q)\in T\times Q}\Big|\inf_{\|\phi\|\le M_n}\sum_{i=1}^n \eta_i(\tau,q;\phi)\Big| > 2A\log n\right)$$

$$\le \epsilon + P\left(\sup_{(\tau,q)\in T\times Q}\sup_{\|\phi\|\le M_n}\Big|\sum_{i=1}^n \eta_i(\tau,q;\phi)\Big| > 2A\log n\right), \qquad (A.1)$$

uniformly in $q\in Q$. To show that (A.1) is asymptotically negligible, we need to show that, with probability one,

$$\limsup_{n\to\infty}\,\sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M_n}\Big|\sum_{i=1}^n \eta_i(\tau,q;\phi)\Big| \le A\log n, \qquad (A.2)$$

for some $A>0$. We follow Babu (1989) and Bai (1995) and apply the chaining technique. We divide $\{\|\phi\|\le M_n\}$ into $C_{p,q}\,n^{(p+q)/2}$ cells, for some constant $C_{p,q}$, such that for functions $g, f$ belonging to the same cell we have $\|g-f\| \le M_n n^{-1/2}$. Take $\phi_{\tau,r}$ in the $r$-th cell and notice that

$$\sup_{(\tau,q)\in T\times Q}\sup_{\|\phi\|\le M_n}\Big|\sum_{i=1}^n \eta_i(\tau,q;\phi)\Big| \le \underbrace{\sup_{(\tau,q)\in T\times Q}\sup_{r}\Big|\sum_{i=1}^n \eta_i(\tau,q;\phi_{\tau,r})\Big|}_{(\mathrm{I})} + \underbrace{\sup_{(\tau,q)\in T\times Q}\sup_{\|g-f\|\le M_n n^{-1/2}}\Big|\sum_{i=1}^n \big[\eta_i(g) - \eta_i(f)\big]\Big|}_{(\mathrm{II})}. \qquad (A.3)$$

For (II), we have

$$\Big|\sum_{i=1}^n \big[\eta_i(g,\tau,q) - \eta_i(f,\tau,q)\big]\Big|(\log n)^{-1} \le n^{-1/2}\sum_{i=1}^n \|Z_{q,i}\|\,\|g-f\|\,(\log n)^{-1} \le n^{-1/2}\sum_{i=1}^n \|z_i\|\,\|g-f\|\,(\log n)^{-1}, \qquad (A.4)$$

since $\|Z_{q,i}\| \le \|z_i\|\,1\{x_i\le q\} \le \|z_i\|$. Because $\|g-f\| \le n^{-1/2}M_n$ and $M_n = M\sqrt{\log n}$, we get

$$n^{-1/2}\sum_{i=1}^n \|z_i\|\,\|g-f\|\,(\log n)^{-1/2} \le n^{-1}\sum_{i=1}^n \|z_i\|\,M(\log n)^{-1/2} \le C_1 M(\log n)^{-1/2} \to 0, \quad\text{as } n\to\infty, \qquad (A.5)$$

for $C_1 \ge n^{-1}\sum_{i=1}^n \|z_i\|$.


For (I), we have

$$\sum_{i=1}^n \eta_i(\tau,q;\phi_{\tau,r}) = \sum_{i=1}^n \big(\eta_i(\tau,q;\phi_{\tau,r}) - E\eta_i(\tau,q;\phi_{\tau,r})\big) + \sum_{i=1}^n E\eta_i(\tau,q;\phi_{\tau,r}). \qquad (A.6)$$

Using Knight's identity (see Knight (1998)), we have

$$E\eta_i(\tau,q;\phi_{\tau,r}) = E\left[-n^{-1/2}Z_{q,i}^\top\phi_{\tau,r}\big(\tau - 1\{\epsilon_i\le 0\}\big) + \int_0^{n^{-1/2}Z_{q,i}^\top\phi_{\tau,r}}\big(1\{\epsilon_i\le s\} - 1\{\epsilon_i\le 0\}\big)\,ds\right] = \frac{1}{2}\,f(0|z_i)\,\big\|n^{-1/2}Z_{q,i}^\top\phi_{\tau,r}\big\|^2\big(1+o(1)\big), \qquad (A.7)$$

since $n^{-1/2}Z_{q,i}^\top\phi_{\tau,r} \approx 0$, which holds because $\|\phi_\tau\|\le M_n$. Therefore we get

$$\Big|\sum_{i=1}^n E\eta_i(\tau,q;\phi_{\tau,r})\Big| \le 2\phi_{\tau,r}^\top\left(n^{-1}\sum_{i=1}^n f(0|z_i)Z_{q,i}Z_{q,i}^\top\right)\phi_{\tau,r} \qquad (A.8)$$

$$\le 2M\log n, \qquad (A.9)$$

for some constant $M$ such that $M \ge \big\|n^{-1}\sum_{i=1}^n z_i z_i^\top\big\| \ge \big\|n^{-1}\sum_{i=1}^n Z_{q,i}Z_{q,i}^\top\big\|$.

Using (A.7), (A.8) and (A.9), we get

$$\Big|E\sum_{i=1}^n \eta_i(\tau,q;\phi_{\tau,r})\Big| \le 2M\log n. \qquad (A.10)$$

We now analyze the term $\sum_{i=1}^n\big(\eta_i(\tau,q;\phi_{\tau,r}) - E\eta_i(\tau,q;\phi_{\tau,r})\big) \equiv \sum_{i=1}^n \xi_i(\tau,q;\phi_{\tau,r})$. Notice that

$$E\xi_i(\tau,q;\phi_{\tau,r}) = 0 \qquad\text{and}\qquad E\xi_i^2(\tau,q;\phi_{\tau,r}) \le 4\,E\max_{i\le n}\|z_i\|^2\, n^{-1}\log n.$$

Then

$$\big|\xi_i(\tau,q;\phi_{\tau,r})\big| \le 2n^{-1/2}M_n\max_{i\le n}\|z_i\| = 2n^{-1/2}\max_{i\le n}\|z_i\|\sqrt{\log n}, \qquad (A.11)$$

and

$$E\sum_{i=1}^n \xi_i^2(\tau,q;\phi_{\tau,r}) \le 4M_n^2\, n^{-1}\sum_{i=1}^n \|z_i\|^2 \le 4\left(n^{-1}\sum_{i=1}^n \|z_i\|^2\right)\log n, \qquad (A.12)$$

since $\xi_i$ has mean zero. Using Theorem 2.2.3 of Gyorfi, Hardle, Sarda and Vieu (1989), we write

$$E\sum_{i=1}^n \xi_i^2(\tau,q;\phi_{\tau,r}) \le 4\left(n^{-1}E\max_{i\le n}\|z_i\|^2\right)\log n \le M_1\log n, \qquad (A.13)$$

by assumption. Employing the exponential inequality for $\rho$-mixing random variables of Gyorfi, Hardle, Sarda and Vieu (1989) (Theorem 2.2.2) under our assumptions, together with (A.13), we get

$$P\left(\Big|\sum_{i=1}^n \xi_i(\tau,q;\phi_{\tau,r})\Big| > \lambda\log n\right) \le C\exp\{-c\lambda\log n\} = Cn^{-c\lambda},$$

for constants $C, c > 0$. We have shown

$$\limsup_{n\to\infty}\,\sup_{(\tau,q)\in T\times Q}\,\sup_{r\le C_{p,q}n^{(p+q)/2}}\Big|\sum_{i=1}^n \xi_i(\tau,q;\phi_r)\Big| \le \lambda\log n, \qquad (A.14)$$

which concludes the proof. Theorem 2.3 is a consequence of the following lemmata, as in Su and Xiao (2008).

Lemma 6.3 Under assumptions [A.1]-[A.6],

$$\sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\Big\|S_n(\tau,q;\phi) - S_n(\tau,q;0) - \big[\bar S_n(\tau,q;\phi) - \bar S_n(\tau,q;0)\big]\Big\| = o_P(1).$$

Proof: Define

$$\Xi_n(\tau,q;\phi) = -\Big(S_n(\tau,q;\phi) - S_n(\tau,q;0) - \big[\bar S_n(\tau,q;\phi) - \bar S_n(\tau,q;0)\big]\Big) = n^{-1/2}\sum_{i=1}^n \xi_{n,i}(\tau,q;\phi),$$

where

$$\xi_{n,i}(\tau,q;\phi) = \bar\xi_{n,i}(\tau,q;\phi) - E_i\,\bar\xi_{n,i}(\tau,q;\phi)$$

and

$$\bar\xi_{n,i}(\tau,q;\phi) = \Big[1\big\{y_i \le (\alpha_{0,\tau} + n^{-1/2}\phi)^\top Z_{q,i}\big\} - 1\big\{y_i \le \alpha_{0,\tau}^\top Z_{q,i}\big\}\Big] Z_{q,i}.$$

We need to show that $\|\Xi_{n,k}(\tau,q;\phi)\| = o_P(1)$ for fixed $\tau, q, \phi$ and $k = 1,2,\dots,2p$, where $\Xi_{n,k}(\cdot)$ is the $k$-th element of $\Xi_n(\cdot)$; we correspondingly define the $k$-th elements $Z_{q,i,k}$, $\bar\xi_{n,i,k}$, $\xi_{n,i,k}$ of $Z_{q,i}$, $\bar\xi_{n,i}$, $\xi_{n,i}$. If each $\|\Xi_{n,k}(\tau,q;\phi)\|$ is $o_P(1)$, we can then conclude that $\|\Xi_n(\tau,q;\phi)\| = o_P(1)$. By arguments similar to the proof of Lemma 6.2, in particular the chaining technique, we can show that

$$E\xi_{n,i}(\tau,q;\phi) = E\bar\xi_{n,i}(\tau,q;\phi) - E\big[E_i\,\bar\xi_{n,i}(\tau,q;\phi)\big] = 0.$$

We now calculate the variance of $\Xi_{n,k}(\cdot)$: by Jensen's inequality,

$$\mathrm{Var}\big[\Xi_{n,k}(\tau,q;\phi)\big] = n^{-1}\sum_{i=1}^n E\,\mathrm{Var}_i\big[\xi_{n,i,k}(\tau,q;\phi)\big] \le n^{-1}\sum_{i=1}^n E\,E_i\big[\bar\xi_{n,i,k}(\tau,q;\phi)^2\big] \le n^{-1}\sum_{i=1}^n E\Big[\Big|F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi)^\top Z_{q,i}\big) - F_i\big(\alpha_{0,\tau}^\top Z_{q,i}\big)\Big|\, Z_{q,i,k}^2\Big]$$

$$\le n^{-3/2}\sum_{i=1}^n C_{1i}\,\|\phi^\top Z_{q,i}\|\, Z_{q,i,k}^2 \le C\Big(\max_{i\le n} n^{-1/2}\|Z_{q,i}\|\Big)\, n^{-1}\sum_{i=1}^n C_{1i}\|Z_{q,i}\|^2 \le C\Big(\max_{i\le n} n^{-1/2}\|z_i\|\Big)\, n^{-1}\sum_{i=1}^n C_{1i}\|z_i\|^2 = o_P(1),$$

by assumption [A.4] and because $\|Z_{q,i}\| \le \|z_i\|$. An application of Chebyshev's inequality yields $\|\Xi_{n,k}(\tau,q;\phi)\| = o_P(1)$. We need to prove that this asymptotic negligibility holds uniformly over $\Phi = \{\phi : \|\phi\|\le M\}$, for fixed $\tau\in T$, $q\in Q$ and constant $M\in(0,\infty)$. For this, it is sufficient to prove the following:

$$\sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\big|\Xi^+_{n,k}(\tau,q;\phi)\big| = o_P(1) \qquad\text{and}\qquad \sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\big|\Xi^-_{n,k}(\tau,q;\phi)\big| = o_P(1), \qquad (A.17)$$

where $\Xi^+_{n,k}(\tau,q;\phi)$ and $\Xi^-_{n,k}(\tau,q;\phi)$ are defined as $\Xi_{n,k}(\tau,q;\phi)$ but with $Z_{q,i,k}$ replaced by $Z^+_{q,i,k} \equiv \max(Z_{q,i,k}, 0)$ and $Z^-_{q,i,k} \equiv \max(-Z_{q,i,k}, 0)$, respectively. We show only the first relation in (A.17), as the second follows along similar lines. Define

$$\bar\Xi^+_{n,k}(\tau,q;\phi,v) = n^{-1/2}\sum_{i=1}^n\Big[1\big\{y_i \le (\alpha_{0,\tau}+n^{-1/2}\phi)^\top Z_{q,i} + v\|n^{-1/2}Z_{q,i}\|\big\} - F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi)^\top Z_{q,i} + v\|n^{-1/2}Z_{q,i}\|\big) - 1\big\{y_i \le \alpha_{0,\tau}^\top Z_{q,i}\big\} + F_i\big(\alpha_{0,\tau}^\top Z_{q,i}\big)\Big] Z^+_{q,i,k}.$$

Notice that when $v=0$, $\bar\Xi^+_{n,k}(\tau,q;\phi,0) = \Xi^+_{n,k}(\tau,q;\phi)$. As in Su and Xiao (2008), we will show that (A.17) follows from

$$\sup_{(\tau,q)\in T\times Q}\big|\bar\Xi^+_{n,k}(\tau,q;\phi,v)\big| = o_P(1), \qquad (A.18)$$

for fixed $v$ and $\phi$. Since $\Phi$ is compact, we can partition it into a finite number $N(\sigma)$ of subsets $\{\Phi_1,\dots,\Phi_{N(\sigma)}\}$, each of diameter not greater than $\sigma$. Fix $s\in\{1,\dots,N(\sigma)\}$ and $\phi_s\in\Phi_s$ and note that

$$\phi^\top Z_{q,i} \le \phi_s^\top Z_{q,i} + \sigma\|Z_{q,i}\|, \qquad \forall\,\phi\in\Phi_s.$$

By the monotonicity of the indicator function and the non-negativity of $Z^+_{q,i,k}$, for any $\phi\in\Phi_s$ we have

$$\Xi^+_{n,k}(\tau,q;\phi) \le \bar\Xi^+_{n,k}(\tau,q;\phi_s,\sigma) + n^{-1/2}\sum_{i=1}^n\Big[F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi_s)^\top Z_{q,i} + \sigma\|n^{-1/2}Z_{q,i}\|\big) - F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi_s)^\top Z_{q,i}\big)\Big] Z^+_{q,i,k}.$$

The reverse inequality holds for $-\sigma$, for all $\phi_s\in\Phi_s$. We have

$$\sup_{\tau\in T}\Big|n^{-1/2}\sum_{i=1}^n\Big[F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi_s)^\top Z_{q,i} + \sigma\|n^{-1/2}Z_{q,i}\|\big) - F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi)^\top Z_{q,i}\big)\Big] Z^+_{q,i,k}\Big| \le \sigma\, n^{-1}\sum_{i=1}^n C_{1,i}\|Z_{q,i}\|Z^+_{q,i,k} \le \sigma\, n^{-1}\sum_{i=1}^n C_{1,i}\|z_i\|Z^+_{q,i,k} = \sigma\, O_P(1),$$

with the $O_P(1)$ term uniform over $q\in Q$. Therefore,

$$\sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\big|\Xi^+_{n,k}(\tau,q;\phi)\big| \le \sup_{s\le N(\sigma)}\,\sup_{(\tau,q)\in T\times Q}\big|\bar\Xi^+_{n,k}(\tau,q;\phi_s,\sigma)\big| + \sup_{s\le N(\sigma)}\,\sup_{(\tau,q)\in T\times Q}\big|\bar\Xi^+_{n,k}(\tau,q;\phi_s,-\sigma)\big| + \sigma\, O_P(1).$$

Since $\Phi$ is compact, $\sigma$ can be made arbitrarily small and $N(\sigma)$ is finite; therefore (A.17) follows from (A.18). To show (A.18), we use the chaining technique again. Fix $v$ and $\phi$, let $N_1 \equiv N_1(n)$ be an integer such that $N_1 = [n^{1/2+d}] + 1$ for $d\in(0,1/2)$, where $[\cdot]$ denotes the integer part of the argument, and divide $T$ into $N_1$ sub-intervals with endpoints $c_1 = \tau_0 < \tau_1 < \dots < \tau_{N_1} = 1 - c_1$, the length of each sub-interval being $\delta^* = (1-2c_1)/N_1$. By assumption A.2(i) in Su and Xiao (2008), for all $\tau_i, \tau_j\in T$ such that $|\tau_i - \tau_j|\le\delta^*$, we get $\|\alpha_{0,\tau_j} - \alpha_{0,\tau_i}\| \le (p+q)C_0|\tau_i - \tau_j| \le C_0\delta^* \equiv C^*$. By the monotonicity of both the indicator function and the distribution function $F_i(\cdot)$, for $\tau_s \le \tau \le \tau_{s+1}$ we have

$$\bar\Xi^+_{n,k}(\tau,q;\phi,v) - \bar\Xi^+_{n,k}(\tau_{s+1},q;\phi,v) \le n^{-1/2}\sum_{i=1}^n\Big[F_i\big((\alpha_{0,\tau_{s+1}}+n^{-1/2}\phi)^\top Z_{q,i} + v\|n^{-1/2}Z_{q,i}\|\big) - F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi)^\top Z_{q,i} + v\|n^{-1/2}Z_{q,i}\|\big)\Big] Z^+_{q,i,k}$$

$$+\; n^{-1/2}\sum_{i=1}^n\Big[1\{y_i \le \alpha_{0,\tau_{s+1}}^\top Z_{q,i}\} - F_i\big(\alpha_{0,\tau_{s+1}}^\top Z_{q,i}\big) - 1\{y_i \le \alpha_{0,\tau}^\top Z_{q,i}\} + F_i\big(\alpha_{0,\tau}^\top Z_{q,i}\big)\Big] Z^+_{q,i,k},$$

since $\alpha_{0,\tau}^\top Z_{q,i} = \zeta_{0,\tau}^\top z_i \le \zeta_{0,\tau_{s+1}}^\top z_i = \alpha_{0,\tau_{s+1}}^\top Z_{q,i}$, where the first equality uses that under the null hypothesis $\delta_\tau = 0$, hence $\zeta_{0,\tau} = (\gamma_{0,\tau}^\top, \beta_{0,\tau}^\top)^\top$. Also notice that $\zeta_{0,\tau}^\top z_i$ is the $\tau$-th quantile of $y_i$ given $z_i$. A reverse inequality holds with $\tau_{s+1}$ replaced by $\tau_s$. Therefore, we get

$$\sup_{(\tau,q)\in T\times Q}\big|\bar\Xi^+_{n,k}(\tau,q;\phi,v)\big| \le \max_{0\le s\le N_1}\,\sup_{q\in Q}\big|\bar\Xi^+_{n,k}(\tau_s,q;\phi,v)\big| \qquad (A.19)$$

$$+\; \max_{0\le s\le N_1-1}\,\sup_{q\in Q}\Big|n^{-1/2}\sum_{i=1}^n\Big[F_i\big(\alpha_{0,\tau_{s+1}}^\top Z_{q,i} + n^{-1/2}(\phi^\top Z_{q,i} + v\|Z_{q,i}\|)\big) - F_i\big(\alpha_{0,\tau_s}^\top Z_{q,i} + n^{-1/2}(\phi^\top Z_{q,i} + v\|Z_{q,i}\|)\big)\Big] Z^+_{q,i,k}\Big| \qquad (A.20)$$

$$+\; \sup_{\tau_\ell,\tau_m\in T,\ |\tau_\ell-\tau_m|\le\delta^*}\,\sup_{q\in Q}\Big|n^{-1/2}\sum_{i=1}^n\Big[1\{y_i\le \alpha_{0,\tau_\ell}^\top Z_{q,i}\} - F_i\big(\alpha_{0,\tau_\ell}^\top Z_{q,i}\big) - 1\{y_i\le \alpha_{0,\tau_m}^\top Z_{q,i}\} + F_i\big(\alpha_{0,\tau_m}^\top Z_{q,i}\big)\Big] Z^+_{q,i,k}\Big|. \qquad (A.21)$$

By a mean value expansion, we deduce that (A.20) is $o_P(1)$. Now, under the null hypothesis, (A.21) becomes

$$\sup_{\tau_\ell,\tau_m\in T,\ |\tau_\ell-\tau_m|\le\delta^*}\,\sup_{q\in Q}\Big|n^{-1/2}\sum_{i=1}^n\Big[1\{F_i(y_i)\le\tau_\ell\} - \tau_\ell - 1\{F_i(y_i)\le\tau_m\} + \tau_m\Big] Z^+_{q,i,k}\Big|. \qquad (A.22)$$

It therefore suffices to prove that (A.22) is $o_P(1)$. We know that the $F_i(y_i)$ are i.i.d. $U(0,1)$ from Diebold et al. (1998). We need to show that (A.22) is stochastically equicontinuous, which can be done using Lemma A.1 of Qu (2008). Consider the subgradient process under the null, defined by

$$S_n(\tau,q,\alpha) = n^{-1/2}\sum_{i=1}^n \Big[1\big\{y_i \le \alpha_{0,\tau}^\top Z_{q,i}\big\} - F_i\big(\alpha_{0,\tau}^\top Z_{q,i}\big)\Big] Z^+_{q,i}.$$

Take $T\times Q$ as the parameter space and define the metric

$$\rho\big(\{\tau_1,q_1\},\{\tau_2,q_2\}\big) = |q_1 - q_2| + |\tau_1 - \tau_2|.$$

We need to show that the stochastic process $S_n(\tau,q,\alpha)$ is stochastically equicontinuous on the metric space $(T\times Q, \rho)$; that is, for any $\epsilon, \eta > 0$ there exists some $\delta > 0$ such that, for large $n$,

$$P\left(\sup_{[\delta]}\big\|S_n(\tau_1,q_1,\alpha(\tau_1)) - S_n(\tau_2,q_2,\alpha(\tau_2))\big\| > \eta\right) < \epsilon,$$

with $[\delta] = \big\{(s_1,s_2)\in T\times Q : s_1 = \{\tau_1,q_1\},\ s_2 = \{\tau_2,q_2\},\ \rho(s_1,s_2) < \delta\big\}$. The proof is as in Lemma A.1 of Qu (2008) and is therefore omitted. We conclude that (A.21) is $o_P(1)$. We now proceed to show that (A.19) is $o_P(1)$. Take $\epsilon > 0$ and notice the following:

$$P\left(\max_{0\le s\le N_1}\sup_{q\in Q}\big|\bar\Xi^+_{n,k}(\tau_s,q;\phi,v)\big| > \epsilon\right) \le (N_1+1)\max_{0\le s\le N_1} P\left(\sup_{q\in Q}\big|\bar\Xi^+_{n,k}(\tau_s,q;\phi,v)\big| > \epsilon\right). \qquad (A.23)$$

As before, $Z_{q,i} = (z_{1,i}^\top, z_{2,i}^\top, z_{q,i}^\top)^\top = (z_i^\top, z_{q,i}^\top)^\top$, and define $\phi = (\phi_1^\top, \phi_2^\top)^\top$, with $\phi_1, \phi_2$ $p$-vectors. Let

$$\eta_{n,1,i} = n^{-1/2}\big(\phi_1^\top z_i + v\|z_i\|\big) \qquad\text{and}\qquad \eta_{n,2,i} = n^{-1/2}\big(\phi_1^\top z_i + \phi_2^\top z_i + \sqrt{2}\,v\|z_i\|\big).$$

Then $\eta_{n,q,i} = \eta_{n,2,i}$ in the regime where $x_i\le q$, while $\eta_{n,q,i} = \eta_{n,1,i}$ in the regime where $x_i > q$. Define, for $j = 1, 2$,

$$s^*_{n,j,i} = 1\big\{y_i \le \zeta_{0,\tau_s}^\top z_i + \eta_{n,j,i}\big\} - F_i\big(\zeta_{0,\tau_s}^\top z_i + \eta_{n,j,i}\big) - 1\big\{y_i \le \zeta_{0,\tau_s}^\top z_i\big\} + F_i\big(\zeta_{0,\tau_s}^\top z_i\big).$$

We bound the probability on the RHS of (A.23) by considering two cases corresponding to our two regimes.

CASE 1: Regime where $x_i\le q$. We have $Z^+_{q,i,k} = z^+_{q,i,k} = x^+_{i,k}1\{x_i\le q\}$. Therefore

$$\bar\Xi^+_{n,k}(\tau_s,q;\phi,v) = n^{-1/2}\sum_{i=1}^n s^*_{n,2,i}\, z^+_{q,i,k} + n^{-1/2}\sum_{i=1}^n s^*_{n,1,i}\, z^+_{i,k},$$

and we get

$$P\left(\sup_{q\in Q}\big|\bar\Xi^+_{n,k}(\tau_s,q;\phi,v)\big| > \epsilon\right) \le \underbrace{P\left(\sup_{q\in Q}\Big|n^{-1/2}\sum_{i=1}^n s^*_{n,2,i}z^+_{q,i,k}\Big| > \epsilon/2\right)}_{(\mathrm{A})} + \underbrace{P\left(\sup_{q\in Q}\Big|n^{-1/2}\sum_{i=1}^n s^*_{n,1,i}z^+_{i,k}\Big| > \epsilon/2\right)}_{(\mathrm{B})}.$$

Since $\big\{(s^*_{n,2,i}z^+_{q,i,k}, \mathcal{F}_i),\ 1\le i\le n\big\}$ is a martingale difference sequence with respect to the $\sigma$-algebra $\mathcal{F}_i$ generated by the regressors and their lagged values, using Doob's inequality we get

$$(\mathrm{A}) \le \frac{16}{\epsilon^4 n^2}\, E\Big|\sum_{i=1}^n s^*_{n,2,i}z^+_{q,i,k}\Big|^4.$$

Rosenthal's inequality gives

$$E\Big|\sum_{i=1}^n s^*_{n,2,i}z^+_{q,i,k}\Big|^4 \le C\underbrace{\sum_{i=1}^n E\big[(s^*_{n,2,i}z^+_{q,i,k})^4\big]}_{(\mathrm{C})} + C\underbrace{E\Big[\sum_{i=1}^n E\big((s^*_{n,2,i}z^+_{q,i,k})^2 \mid \mathcal{F}_i\big)\Big]^2}_{(\mathrm{D})}.$$

Now, (C) is $o_P(n^{1/2})$, and for (D), since $z^+_{q,i,k}$ is measurable with respect to $\mathcal{F}_{i-1}$, we have

$$E\big[(s^*_{n,2,i})^2 \mid \mathcal{F}_{i-1}\big] \le F_i\big(\zeta_{0,\tau_s}^\top z_i + \eta_{n,2,i}\big) - F_i\big(\zeta_{0,\tau_s}^\top z_i\big) \le C_{1,i}\,\eta_{n,2,i},$$

and therefore

$$(\mathrm{D}) \le C\, E\Big[\sum_{i=1}^n C_{1,i}\,\eta_{n,2,i}(z^+_{q,i,k})^2\Big]^2 = O(n).$$

We conclude that $E\big|\sum_{i=1}^n s^*_{n,2,i}z^+_{q,i,k}\big|^4 = O(n)$. Also

$$P\left(\sup_{q\in Q}\Big|n^{-1/2}\sum_{i=1}^n s^*_{n,2,i}z^+_{q,i,k}\Big| > \epsilon/2\right) = O(1/n) \qquad\text{and}\qquad P\left(\sup_{q\in Q}\Big|n^{-1/2}\sum_{i=1}^n s^*_{n,1,i}z^+_{i,k}\Big| > \epsilon/2\right) = O(1/n),$$

therefore

$$P\left(\max_{0\le s\le N_1}\sup_{q\in Q}\big|\bar\Xi^+_{n,k}(\tau_s,q;\phi,v)\big| > \epsilon\right) = O(N_1/n) = o(1).$$

CASE 2: Regime where $x_i > q$. We have $Z^+_{q,i,k} = z^+_{i,k}$, and it follows that

$$\bar\Xi^+_{n,k}(\tau_s,q;\phi,v) = n^{-1/2}\sum_{i=1}^n s^*_{n,1,i}\, z^+_{i,k}.$$

As before, we have

$$P\left(\max_{0\le s\le N_1}\sup_{q\in Q}\big|\bar\Xi^+_{n,k}(\tau_s,q;\phi,v)\big| > \epsilon\right) = O(N_1/n) = o(1).$$

Putting the two cases together and applying Chebyshev's inequality, we conclude that (A.19) is $o_P(1)$, and the proof of the Lemma is concluded.

Lemma 6.4 Under assumptions [A.1]-[A.6],

$$\sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\Big\|\bar S_n(\tau,q;\phi) - \bar S_n(\tau,q;0) + H_n(\tau,q)\phi\Big\| = o_P(1).$$

Proof: Recall that $H_n(\tau,q) = n^{-1}\sum_{i=1}^n f_i\big(Z_{q,i}^\top\alpha_{0,\tau}\big)Z_{q,i}Z_{q,i}^\top$. Under assumptions [A.1]-[A.6], we have

$$\sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\Big\|\bar S_n(\tau,q;\phi) - \bar S_n(\tau,q;0) + H_n(\tau,q)\phi\Big\|$$

$$= \sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\Big\|n^{-1/2}\sum_{i=1}^n \Big[F_i\big((\alpha_{0,\tau}+n^{-1/2}\phi)^\top Z_{q,i}\big) - F_i\big(\alpha_{0,\tau}^\top Z_{q,i}\big)\Big]Z_{q,i} - H_n(\tau,q)\phi\Big\|$$

$$= \sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\Big\|n^{-1}\sum_{i=1}^n \int_0^1\Big[f_i\big((\alpha_{0,\tau}+sn^{-1/2}\phi)^\top Z_{q,i}\big) - f_i\big(\alpha_{0,\tau}^\top Z_{q,i}\big)\Big]ds\; Z_{q,i}Z_{q,i}^\top\phi\Big\|$$

$$\le \sup_{(\tau,q)\in T\times Q}\,\sup_{\|\phi\|\le M}\Big\|n^{-1}\sum_{i=1}^n C_{2,i}\big(n^{-1/2}|\phi^\top Z_{q,i}|\big)\, Z_{q,i}Z_{q,i}^\top\phi\Big\| \le 2M^2\max_{1\le i\le n}\big[n^{-1/2}\|z_i\|\big]\; n^{-1}\sum_{i=1}^n C_{2,i}\|z_i\|^2 = o_P(1).$$

Lemma 6.5 Under assumptions [A.1]-[A.6],

$$\sup_{(\tau,q)\in T\times Q}\big\|S_n(\tau,q;\hat\phi_\tau)\big\| = o_P(1).$$

Proof: By the proof of Lemma 2 in Ruppert and Carroll (1980), we have

$$\sup_{(\tau,q)\in T\times Q}\big\|S_n(\tau,q;\hat\phi_\tau)\big\| = \sup_{(\tau,q)\in T\times Q}\Big\|n^{-1/2}\sum_{i=1}^n \psi_\tau\big(y_i - \hat\alpha_\tau^\top Z_{q,i}\big)Z_{q,i}\Big\| \le \sup_{(\tau,q)\in T\times Q}\, n^{-1/2}\sum_{i=1}^n 1\big\{y_i - \hat\alpha_\tau^\top Z_{q,i} = 0\big\}\|Z_{q,i}\| \le 2\sqrt{2}\,(p+q)\,n^{-1/2}\max_{1\le i\le n}\|z_i\| = o_P(1).$$

Proof of Theorem 2.5 - Conclusion: Recall that $J_n(\tau,q) = -n^{-1/2}\sum_{i=1}^n \psi_\tau\big(y_i - Z_{q,i}^\top\alpha_0(\tau)\big)Z_{q,i}$. Under the null hypothesis, $\big\{\tau - 1\{y_i \le Z_{q,i}^\top\alpha_0(\tau)\}\big\}$ is a martingale difference sequence with respect to the $\sigma$-algebra $\{\mathcal{F}_{i-1}, i\in\mathbb{Z}\}$, because $E\big[1\{y_i \le Z_{q,i}^\top\alpha_0(\tau)\}\big] = \tau$. The Martingale Central Limit Theorem yields the finite-dimensional convergence of the process.

To show tightness, write, for $Z_i = (z_{1i}^\top, z_{2i}^\top)^\top$ and $\alpha(\tau) = (\gamma(\tau)^\top, \beta(\tau)^\top)^\top$ under the null hypothesis, the following process:

$$U_n(\tau,q) = \tau\frac{1}{\sqrt n}\sum_{i=1}^n \phi^\top Z_{q,i} - \frac{1}{\sqrt n}\sum_{i=1}^n \phi^\top Z_{q,i}\,1\big\{y_i \le Z_i^\top\alpha(\tau)\big\} = \tau\frac{1}{\sqrt n}\sum_{i=1}^n\big(\phi^\top Z_{q,i} - E[\phi^\top Z_{q,i}]\big) - S(\tau,q,0) \equiv \tau U_1(q) - S(\tau,q,0),$$

with $S(\cdot)$ as in Lemma 6.3 before. Modifying the (semi)metric used in the proof of Lemma 6.3 to

$$\rho\big((q_1,\alpha(\tau_1)),(q_2,\alpha(\tau_2))\big) = \Big(E\big|\phi^\top Z_{0,i}\big|^p\,\big|1\{x_i\le q_1\}1\{y_i\le Z_{q,i}^\top\alpha(\tau_1)\} - 1\{x_i\le q_2\}1\{y_i\le Z_{q,i}^\top\alpha(\tau_2)\}\big|^p\Big)^{1/p}$$

for $p>2$, since $\rho\big((q_1,\alpha(\tau_1)),(q_2,\alpha(\tau_2))\big) \to 0$ as $\|(\tau_1-\tau_2, q_1-q_2)\| \to 0$, it follows that $S(\tau,q,0)$ is tight on $T\times Q$. The same applies to the process $U_1(q)$. Because the errors are allowed to be correlated, the variance-covariance is given as in Theorem 2.3 above, that is,

$$E\big[J_n(\tau_1,q)J_n(\tau_2,q)^\top\big] = (\tau_1\wedge\tau_2 - \tau_1\tau_2)\,E\big(Z_{qi}Z_{qi}^\top\big) + \sum_{i=1}^n\sum_{j=-\infty,\,j\ne 0}^{\infty} E\big(Z_{qi}Z_{qi+j}^\top\big)\,E\big[\psi_\tau\big(y_i - Z_{qi}^\top\alpha_0(\tau_1)\big)\,\psi_\tau\big(y_{i+j} - Z_{qi+j}^\top\alpha_0(\tau_2)\big)\,\big|\,z_i,x_i\big].$$
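The second, serial-correlation term in this variance-covariance expression is why a heteroskedasticity- and autocorrelation-consistent (HAC) estimator, such as that of Newey and West (1987) cited in the references, is needed in practice. A minimal scalar sketch (our own illustration; `L` is an assumed Bartlett truncation lag, not a quantity from the chapter):

```python
import numpy as np

def newey_west_lrv(u, L):
    """Bartlett-kernel (Newey-West) long-run variance of a scalar series u:
    gamma_0 + 2 * sum_{j=1..L} (1 - j/(L+1)) * gamma_j, with gamma_j the lag-j
    autocovariance of the demeaned series."""
    u = np.asarray(u, dtype=float) - np.mean(u)
    n = u.size
    lrv = u @ u / n                                  # gamma_0
    for j in range(1, L + 1):
        gamma_j = u[j:] @ u[:-j] / n                 # lag-j autocovariance
        lrv += 2.0 * (1.0 - j / (L + 1)) * gamma_j   # Bartlett weight
    return lrv

rng = np.random.default_rng(2)
tau = 0.5
eps = rng.normal(size=100_000)
score = tau - (eps < np.quantile(eps, tau))   # psi_tau sequence, i.i.d. here
v = newey_west_lrv(score, L=5)                # close to tau*(1-tau) = 0.25
```

For an i.i.d. score sequence the autocovariance corrections are negligible and the estimate reduces to approximately $\tau(1-\tau)$, the first term of the formula above.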

This concludes the proof of the Theorem.

Proof of Theorem 3.1: In order to prove the Theorem, we need to study the local behavior of the objective function (see Bai (1995)), given by

$$Q_n(v) = \sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - n^{-1/2}\alpha_{0,\tau}^\top Z_{q,i}\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big] - \sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - n^{-1/2}\phi_\tau^\top Z_{q,i} - \delta_\tau^\top\Delta z_{q,v,i}\big) - \rho_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big)\Big] - \sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big],$$

where $\Delta z_{q,v,i} = z_i\big(1\{x_i\le q_0 + v/a_n\} - 1\{x_i\le q_0\}\big)$, with $a_n = O(n^{1-2\alpha})$, $\alpha\in(0,1/2)$, and $v\in V$ for some compact set $V\subset\mathbb{R}$. We need the following lemma.

Lemma 6.6 Under assumptions [A.1]-[A.11], and uniformly in $v\in V$ ($V$ a compact set), we have

$$n^{2\alpha-1}\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - n^{-1/2}\alpha_{0,\tau}^\top Z_{q,i}\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big] - n^{2\alpha-1}\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - n^{-1/2}\phi_\tau^\top Z_{q,i} - \delta_\tau^\top\Delta z_{q,v,i}\big) - \rho_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big)\Big] = o_P(1).$$

Proof: Following Pollard (1991) and Bai (1995), we can write the first term as

$$\zeta^\top n^{-1/2}\sum_{i=1}^n z_i\,\psi_\tau\big(\epsilon_i(\tau)\big) + \delta^\top n^{-1/2}\sum_{i=1}^n z_{q_0,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) + \zeta^\top\Big[n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)z_i z_i^\top\Big]\zeta + \delta^\top\Big[n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)z_{q,i}z_{q,i}^\top\Big]\delta + o_P(1). \qquad (A.24)$$

The second term, using Pollard (1991) (in quantile form), can be written as

$$\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - n^{-1/2}\phi_\tau^\top Z_{q,i} - \delta_\tau^\top\Delta z_{q,v,i}\big) - \rho_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big)\Big]$$

$$= \zeta^\top\Big[n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)z_i z_i^\top\Big]\zeta + \delta^\top\Big[n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)z_{q,v,i}z_{q,v,i}^\top\Big]\delta$$

$$+\; n^{-1/2}\zeta^\top\sum_{i=1}^n z_i\Big[\psi_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big) - E\,\psi_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big)\Big] + \sum_{i=1}^n\big[R_i(v) - ER_i(v)\big], \qquad (A.25)$$

where

$$R_i(v) = \rho_\tau\big(\epsilon_i(\tau) - n^{-1/2}\phi_{0,\tau}^\top Z_{q,i} - \delta_\tau^\top\Delta z_{q,v,i}\big) - \rho_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big) - n^{-1/2}\zeta^\top z_i\,\psi_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big).$$

Following Pollard (1991) (in quantile form), we can write

$$E\Big|\sum_{i=1}^n\big[R_i(v) - ER_i(v)\big]\Big|^2 \le 4\|\phi\|^2\, E\,1\Big\{\big\|\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big\| \le \|\phi\|\, n^{-1/2}\max_i\|Z_{q,v,i}\|\Big\}\; n^{-1}\sum_{i=1}^n\|Z_{q,v,i}\|^2. \qquad (A.26)$$

Since $\|Z_{q,v,i}\| \le \|Z_{q,i}\|$ and $n^{-1/2}\max_i\|z_i\| = o_P(1)$, by the consistency of $\hat q$ the expectation term on the RHS of (A.26) converges to zero as $n\to\infty$. Also, since $\|\phi\|\le M$ and $n^{-1}\sum_{i=1}^n Z_{q,i}Z_{q,i}^\top$ converges to some positive definite matrix by assumption, the last term on the RHS of (A.26) converges to zero in mean square, uniformly over $v\in V$, and we conclude that

$$\sum_{i=1}^n\big[R_i(v) - ER_i(v)\big] = o_P(1). \qquad (A.27)$$

From (A.25) and (A.27), we have

$$\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - n^{-1/2}\phi_\tau^\top Z_{q,i} - \delta_\tau^\top\Delta z_{q,v,i}\big) - \rho_\tau\big(\epsilon_i(\tau) - \delta_\tau^\top\Delta z_{q,v,i}\big)\Big]$$

$$= \phi^\top\Big[n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)Z_{q,v,i}Z_{q,v,i}^\top\Big]\phi + n^{-1/2}\phi^\top\sum_{i=1}^n Z_{q,v,i}\Big[\psi_\tau\big(\epsilon_i(\tau) - \zeta_\tau^\top\Delta z_{q,v,i}\big) + E\,\psi_\tau\big(\epsilon_i(\tau) - \zeta_\tau^\top\Delta z_{q,v,i}\big)\Big] + o_P(1). \qquad (A.28)$$

From equation 23 in the proof of Lemma A.5 in Hansen (2000), we have

$$n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)Z_{q,v,i}Z_{q,v,i}^\top \overset{P}{\to} H_0(\tau,q,v)$$

and

$$n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)Z_{q_0,i}Z_{q_0,i}^\top \overset{P}{\to} H_0(\tau,q),$$

where $H_0(\tau,q) = E\big[f_i\big(F_i^{-1}(\tau|z_i)\big)Z_{q_0,i}Z_{q_0,i}^\top\big]$. Therefore,

$$\phi^\top\Big[n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)Z_{q_0,i}Z_{q_0,i}^\top\Big]\phi - \phi^\top\Big[n^{-1}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)Z_{q,v,i}Z_{q,v,i}^\top\Big]\phi \overset{P}{\to} 0, \qquad (A.29)$$

for $\|\phi\|\le M$. Now we get

$$\delta^\top n^{-1/2}\sum_{i=1}^n z_{q_0,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) - \delta^\top n^{-1/2}\sum_{i=1}^n z_{q,v,i}\Big[\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big) - E\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\Big],$$

where we have subtracted (A.28) from (A.24) using (A.29). In addition, adding and subtracting $n^{-1/2}\sum_{i=1}^n z_{q,v,i}\,\psi_\tau(\epsilon_i(\tau))$, we get

$$\delta^\top n^{-1/2}\sum_{i=1}^n z_{q,v,i}\Big[\psi_\tau\big(\epsilon_i(\tau)\big) - \psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\Big] + \delta^\top n^{-1/2}\sum_{i=1}^n z_{q,v,i}\,E\big[\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\big] - \delta^\top n^{-1/2}\sum_{i=1}^n \Delta z_{q,v,i}\,\psi_\tau\big(\epsilon_i(\tau)\big). \qquad (A.30)$$

The first two terms of (A.30) can be bounded using Hölder's inequality:

$$n^{-1}\sum_{i=1}^n E\Big|z_{q,v,i}\Big[\psi_\tau\big(\epsilon_i(\tau)\big) - \psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big) + E\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\Big]\Big|^2 \le \Big(n^{-1}\sum_{i=1}^n z_{q,v,i}^2\Big)\; n^{-1}\sum_{i=1}^n E\Big[\psi_\tau\big(\epsilon_i(\tau)\big) - \psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big) + E\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\Big]^2. \qquad (A.31)$$

The first term is finite by assumption, while the second term can be analyzed as follows:

$$E\Big[\psi_\tau\big(\epsilon_i(\tau)\big) - \psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big) + E\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\Big]^2 = E\big[\psi_\tau(\epsilon_i(\tau))^2\big] + E\big[\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)^2\big] + \big[E\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\big]^2$$

$$-\,2E\big[\psi_\tau(\epsilon_i(\tau))\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\big] - 2\big[E\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\big]^2 + 2E\big[\psi_\tau(\epsilon_i(\tau))\big]\,E\big[\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\big].$$

The second and fifth terms are asymptotically negligible by the zero $\tau$-th-quantile assumption and the additional shrinking-slope assumption. Furthermore,

$$E\big[\psi_\tau(\epsilon_i(\tau))^2\big] + E\big[\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)^2\big] - 2E\big[\psi_\tau(\epsilon_i(\tau))\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\big] = E\Big[\psi_\tau\big(\epsilon_i(\tau)\big) - \psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\Big]^2 \overset{P}{\to} 0,$$

by the shrinking-slope assumption, the consistency of the partition quantile $\hat q$, and equation 47 of Hansen (2000). Adding all the above results gives

$$\delta^\top n^{-1/2}\sum_{i=1}^n z_{q,v,i}\Big[\psi_\tau\big(\epsilon_i(\tau)\big) - \psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big) + E\,\psi_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big)\Big] = o_P(1), \qquad (A.32)$$

uniformly over $v\in V$. By Hansen (2000) and Caner (2002), $\sum_{i=1}^n \Delta z_{q,v,i}\,\psi_\tau(\epsilon_i(\tau)) = o_P(n^\alpha)$, and since $\alpha < 1/2$,

$$\delta^\top n^{-1/2}\sum_{i=1}^n \Delta z_{q,v,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) = o_P(1). \qquad (A.33)$$

Combining (A.30), (A.32) and (A.33), we get the required result.

Proof of Theorem 3.1 - Conclusion: We analyze the local objective function $Q_n(v)$ defined before to complete the proof of the limiting distribution of the unknown partition quantile $q$. The third term of $Q_n(v)$ can be analyzed as in Pollard (1991), equation A.41. In quantile form, we have

$$\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - \delta^\top\Delta z_{q,v,i}\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big] = -\delta^\top\sum_{i=1}^n \Delta z_{q,v,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) + \delta^\top\Big[\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)\Delta z_{q,v,i}\Delta z_{q,v,i}^\top\Big]\delta + o_P(1), \qquad (A.34)$$

uniformly over $v\in V$. By Lemma A.10 of Hansen (2000),

$$\frac{a_n}{n}\, c^\top\Big[\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i)\big)\Delta z_{q,v,i}\Delta z_{q,v,i}^\top\Big] c \;\Rightarrow\; \mu|v|, \qquad (A.35)$$

where $\mu = c^\top D(q)c\,g(q)$ and $a_n = O(n^{1-2\alpha})$. Furthermore, by Lemma A.3 of Caner (2002), we have

$$\frac{a_n}{n^{1-2\alpha}}\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - \Delta z_{q,v,i}^\top\delta\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big] \;\Rightarrow\; \sqrt{\lambda}\,W(v) - \frac{1}{2}\,\mu|v|, \qquad (A.36)$$

with $\lambda = c^\top V(q)c\,g(q)$ and $W(\cdot)$ a standard Wiener process. Finally, using Lemma 6.6, we conclude that

$$\frac{a_n}{n^{1-2\alpha}}\,Q_n(v) \;\Rightarrow\; \sqrt{\lambda}\,W(v) - \frac{1}{2}\,\mu|v| \equiv Q(v),$$

and this completes the proof.

Let

$$Q_n(q) = S_n(q_0) - S_n(q) = \sum_{i=1}^n \rho_\tau\Big(\epsilon_i(\tau) - n^{-1/2}z_{1i}^\top\big(\sqrt n(\gamma-\gamma(\tau))\big) - n^{-1/2}z_{2i}^\top\big(\sqrt n(\beta-\beta(\tau))\big) - n^{-1/2}z_{q_0 i}^\top\big(\sqrt n(\delta-\delta(\tau))\big)\Big)$$

$$-\;\sum_{i=1}^n \rho_\tau\Big(\epsilon_i(\tau) - n^{-1/2}z_{1i}^\top\big(\sqrt n(\gamma-\gamma(\tau))\big) - n^{-1/2}z_{2i}^\top\big(\sqrt n(\beta-\beta(\tau))\big) - n^{-1/2}z_{qi}^\top\big(\sqrt n(\delta-\delta(\tau))\big) - \Delta z_{q,i}^\top\delta(\tau)\Big).$$

Then we have

$$n^{2\alpha-1}Q_n(q) = -n^{2\alpha-1}\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - \Delta z_{q,i}^\top\delta(\tau)\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big] + o_P(1),$$

which means that the effect of the threshold dominates. We now need to find the limiting distribution of

$$n^{2\alpha-1}\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - \Delta z_{q,i}^\top\delta(\tau)\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big].$$

From Knight's identity,

$$\rho_\tau(u - v) - \rho_\tau(u) = -v\,\psi_\tau(u) + \int_0^v\big(1\{u\le s\} - 1\{u < 0\}\big)\,ds,$$
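Knight's identity can be checked numerically. The sketch below (our own illustration; the integral is approximated on a fine midpoint grid) verifies the identity for random draws of $u$, $v$ and $\tau$:

```python
import numpy as np

def rho(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def psi(u, tau):
    """Subgradient psi_tau(u) = tau - 1{u < 0}."""
    return tau - (u < 0)

def knight_rhs(u, v, tau, n_grid=200_000):
    """RHS of Knight's identity: -v*psi_tau(u) + int_0^v (1{u<=s} - 1{u<0}) ds,
    with the signed integral evaluated on a midpoint grid over [0, v]."""
    s = (np.arange(n_grid) + 0.5) / n_grid * v
    return -v * psi(u, tau) + v * np.mean((u <= s) - float(u < 0))

rng = np.random.default_rng(3)
gap = 0.0
for _ in range(20):
    u, v, tau = rng.normal(), rng.normal(), rng.uniform(0.05, 0.95)
    gap = max(gap, abs(rho(u - v, tau) - rho(u, tau) - knight_rhs(u, v, tau)))
```

The maximum discrepancy `gap` stays at the level of the grid-approximation error of the integral.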


we can write

$$-n^{2\alpha-1}Q_n(q) = -n^{2\alpha-1}\,\delta(\tau)^\top\sum_{i=1}^n \Delta z_{q,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) + n^{2\alpha-1}\sum_{i=1}^n\int_0^{\Delta z_{q,i}^\top\delta(\tau)}\big(1\{\epsilon_i(\tau)\le s\} - 1\{\epsilon_i(\tau) < 0\}\big)\,ds$$

$$= -n^{2\alpha-1}\,\delta(\tau)^\top\sum_{i=1}^n \Delta z_{q,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) + n^{2\alpha-1}\,\delta(\tau)^\top\Big[\frac{1}{2}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i,x_i)\big)\Delta z_{q,i}\Delta z_{q,i}^\top\Big]\delta(\tau) + o_P(1).$$

We write

$$S_n(\tau,q) = \sum_{i=1}^n \rho_\tau\big(y_i - Z_{qi}^\top\hat\alpha(\tau,q)\big) = \sum_{i=1}^n \rho_\tau\Big(\epsilon_i(\tau) - n^{-1/2}z_{1i}^\top\big(\sqrt n(\hat\gamma(\tau)-\gamma(\tau))\big) - n^{-1/2}z_{2i}^\top\big(\sqrt n(\hat\beta(\tau)-\beta(\tau))\big) - n^{-1/2}z_{qi}^\top\big(\sqrt n(\hat\delta(\tau)-\delta(\tau))\big) - \Delta z_{q,i}^\top\delta(\tau)\Big).$$

 √ | √ ˆ − δ(τ))) − ∆z| δ(τ) . ˆ − β(τ))) − n−1/2 z| ( n(δ(τ) −n−1/2 z2i ( n(β(τ) qi q,i We write, h i qˆ = argminSn (τ, q) = argmin Sn (τ, q0 ) − Sn (τ, q) . q∈Q

q∈Q

Then, for $\Delta z_{q,v,i} = z_{2i}\big(1\{x_i\le q_0 + v/a_n\} - 1\{x_i\le q_0\}\big)$, where $v = a_n(q - q_0)$ and $q = q_0 + a_n^{-1}v$, we have

$$Q_n(q) = Q_n(v) = S_n(\tau,q_0) - S_n(\tau,q) = -\sum_{i=1}^n\Big[\rho_\tau\big(\epsilon_i(\tau) - \Delta z_{q,v,i}^\top\delta(\tau)\big) - \rho_\tau\big(\epsilon_i(\tau)\big)\Big] + o_P(1)$$

$$= \delta(\tau)^\top\sum_{i=1}^n \Delta z_{q,v,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) - \delta(\tau)^\top\Big[\frac{1}{2}\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i,x_i)\big)\Delta z_{q,v,i}\Delta z_{q,v,i}^\top\Big]\delta(\tau) + o_P(1).$$

Now, for the first term we have

$$\frac{a_n}{n^{1-2\alpha}}\,\delta(\tau)^\top\sum_{i=1}^n \Delta z_{q,v,i}\,\psi_\tau\big(\epsilon_i(\tau)\big) = n^{-\alpha}c^\top\sum_{i=1}^n z_{2i}\big(1\{x_i\le q_0 + v/a_n\} - 1\{x_i\le q_0\}\big)\psi_\tau\big(\epsilon_i(\tau)\big)$$

$$\Rightarrow\; B\big(\tau(1-\tau)\,c^\top V_0 c\,g_0\big),$$

for $B$ a Brownian motion with covariance matrix consisting of terms defined in the main text. For the second term, we have

$$\frac{a_n}{n^{1-2\alpha}}\,\delta(\tau)^\top\Big[\sum_{i=1}^n f_i\big(F_i^{-1}(\tau|z_i,x_i)\big)\Delta z_{q,v,i}\Delta z_{q,v,i}^\top\Big]\delta(\tau) \;\Rightarrow\; c^\top D_0 c\,g_0\,|v|,$$

where the matrix functional $D_0$ is also defined in the main text. Therefore,

$$\frac{a_n}{n^{1-2\alpha}}\,Q_n(v) \;\Rightarrow\; \sqrt{\tau(1-\tau)\,c^\top V_0 c\,g_0}\;W(v) - \frac{1}{2}\,c^\top D_0 c\,g_0\,|v| = \sqrt{\lambda_\tau}\,W(v) - \frac{1}{2}\,\mu_\tau|v|,$$

with $\lambda_\tau = \tau(1-\tau)\,c^\top V_0 c\,g_0$ and $\mu_\tau = c^\top D_0 c\,g_0$. More analytically, we have

$$n^{1-2\alpha}(\hat q - q_0) \;\Rightarrow\; \arg\min_{-\infty < v < \infty}\Big\{-\frac{1}{2}\,c^\top D_0 c\,g_0\,|v| + \big(\tau(1-\tau)\,c^\top V_0 c\,g_0\big)^{1/2}W(v)\Big\}.$$

For the fixed-threshold case, define the process $J^\#_n(q,v)$ separately on $q \le q_0$ and $q > q_0$. Note that, as in Bai (1995), we can take $J^\#_n(q_0,v) = 0$. We can see that $J^\#_n(q,v)$ has the same distribution as $J(q-q_0)$, with the latter defined in the main text. From the proof of Theorem 3 of Bai (1995) and Pollard (1991), we have the following weak convergence result:

$$S_n\big(\tau,q,\theta_0 + n^{-1/2}\theta\big) - S_n\big(\tau,q_0,\theta_0\big) \;\Rightarrow\; \theta^\top Q_0(q)^{1/2}Z + \theta^\top Q_0(q)\theta f(0) + J(q-q_0)$$

on the set $\|\theta\|\le M$ and $|q-q_0|\le M$ for $M>0$. Assuming that the arguments inside the sums of equations (x) and (xx) have a continuous distribution, we can uniquely define $\arg\min_{q-q_0} J(q-q_0)$, and by the Continuous Mapping Theorem we have that

$$n(\hat q - q_0) \;\Rightarrow\; J(q - q_0).$$
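The two-sided argmax law appearing in the shrinking-shift limit above can be visualized by direct simulation. The sketch below is our own illustration (grid approximation of the two-sided Wiener process on a truncated support, with hypothetical values λ = μ = 1); it draws from the argmax of $\sqrt{\lambda}\,W(v) - \mu|v|/2$, whose distribution is symmetric about zero:

```python
import numpy as np

def argmax_two_sided(lam, mu, rng, half_width=50.0, step=0.01):
    """Draw argmax_v { sqrt(lam)*W(v) - 0.5*mu*|v| } for a two-sided Wiener
    process W, approximated on a grid of spacing `step` over [-half_width, half_width]."""
    n = int(half_width / step)
    inc = rng.normal(scale=np.sqrt(step), size=(2, n))
    w_pos = np.concatenate([[0.0], np.cumsum(inc[0])])   # W on [0, half_width]
    w_neg = np.concatenate([[0.0], np.cumsum(inc[1])])   # independent copy for v < 0
    v = np.arange(n + 1) * step
    obj_pos = np.sqrt(lam) * w_pos - 0.5 * mu * v
    obj_neg = np.sqrt(lam) * w_neg - 0.5 * mu * v
    if obj_pos.max() >= obj_neg.max():
        return v[np.argmax(obj_pos)]
    return -v[np.argmax(obj_neg)]

rng = np.random.default_rng(4)
draws = np.array([argmax_two_sided(1.0, 1.0, rng) for _ in range(2000)])
med = np.median(draws)   # symmetric law: sample median close to zero
```

The drift term $-\mu|v|/2$ dominates far from the origin, so the truncation at `half_width` has negligible effect on the draws.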

This concludes the proof. 


References

[1] Andrews, D.W.K., "Tests for Parameter Instability and Structural Change with Unknown Change Point", 1993, Econometrica, Vol.61, No.4, pp.821-856.

[2] Andrews, D.W.K. and W. Ploberger, “Optimal tests when a nuisance parameter is present only under the alternative”, 1994, Econometrica, Vol.62, No.6, pp.1383-1414.

[3] Angrist, J., V. Chernozhukov and I. Fernandez-Val, “Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure”, 2006, Econometrica 74, pp.539-563.

[4] Arcones, M. A., and B. Yu, “Central Limit Theorems for empirical and U-processes of Stationary Mixing sequences”, 1994, Journal of Theoretical Probability, Vol.7, No.1, pp.47-71.

[5] Bai, J., “Least Absolute Deviation Estimation of a Shift”, 1995, Econometric Theory, 11, pp.403-436.

[6] Bai, J., “Testing for Parameter Constancy in Linear Regressions: An Empirical Distribution function approach”, 1996, Econometrica, 64, 3, pp.597-622.

[7] Billingsley, P., "Convergence of Probability Measures", 1999, 2nd edition, Wiley.

[8] Bhattacharya, P. K. and P. J. Brockwell, "The minimum of an additive process with applications to signal estimation and storage theory", 1976, Probability Theory and Related Fields, 37, pp.51-75.

[9] Cai, Y. and J. Stander, “Quantile-Self Exciting Threshold time series models”, 2008, Journal of Time Series Analysis, 29, pp.187-202.

[10] Caner, M., “A note on the Least Absolute Deviation estimator of a threshold model”, 2002, Econometric Theory, 18, pp.800-814.

[11] Chan, K. S., "Consistency and Limiting distribution of the least squares estimator of a Threshold Autoregressive model", 1993, The Annals of Statistics, Vol.21, No.1, pp.520-533.

[12] Chen, J-E., “Estimating and Testing Quantile regression with structural changes”, 2008, Working Paper, Dept. of Economics, NYU.


[13] Davies, R. B., "Hypothesis testing when a nuisance parameter is present only under the alternative", 1977, Biometrika, Vol.64, pp.247-254.

[14] Davies, R. B., "Hypothesis testing when a nuisance parameter is present only under the alternative", 1987, Biometrika, Vol.74, pp.34-43.

[15] Doukhan, P., P. Massart and E. Rio, “Invariance principles for absolutely regular empirical processes”, 1995, Ann. Inst. Henri Poincare, Vol.31, No.2, pp.393-427.

[16] Dufour, J. M., “Some Impossibility Theorems in Econometrics with applications to Structural and Dynamic models”, 1997, Econometrica, 65, pp.1365-1389.

[17] Durlauf, S. N. and P. Johnson, "Multiple Regimes and Cross-Country Growth Behavior", 1995, Journal of Applied Econometrics, 10, pp.365-384.

[18] Franses, P. H. and D. van Dijk, “Non-linear time series models in empirical finance”, 2000, Cambridge University Press.

[19] Galvao, A.F., G. Monte-Rojas and J. Olmo, “Threshold Quantile Autoregressive Models”, 2011, Journal of Time Series Analysis, Vol.32, pp.253-267.

[20] Gleser, L. J. and J. T. Hwang, “The nonexistence of 100(1 − α)% confidence sets of finite expected diameter in errors-in-variables and related models”, 1987, The Annals of Statistics, Vol.15, No.4, pp.1351-1362.

[21] Gutenbrunner, C. and J. Jureckova, "Regression Rank Scores and Regression Quantiles", 1992, The Annals of Statistics, Vol.20, No.1, pp.305-330.

[22] Gyorfi, L., W. Hardle, P. Sarda and P. Vieu, “Nonparametric Curve Estimation from Time Series”, 1989, Springer.

[23] Hansen, B., “Inference when a nuisance parameter is not identified under the null hypothesis”, 1996, Econometrica, 64, 2, pp.413-430.

[24] Hansen, B., “Sample Splitting and Threshold Estimation”, 2000, Econometrica, 68, 3, pp.575-603.

[25] Hardle, W. and O. B. Linton, “Applied Nonparametric Methods”, 1994, Handbook of Econometrics, Vol.4, pp.2295-2339.


[26] Jureckova, J. and P. K. Sen, "Robust Statistical Procedures: Asymptotics and Interrelations", 1996, Wiley.

[27] Kato, K., "Asymptotics for argmin processes: Convexity arguments", 2009, Journal of Multivariate Analysis, 100, pp.1816-1829.

[28] Khan, M. S. and A. S. Senhadji, “Threshold effects in the relationship between inflation and growth”, 2001, IMF Staff Papers, 48 (1), pp.1-21.

[29] Kilian, L. and M. P. Taylor, "Why is it so difficult to beat the random walk forecast of exchange rates?", 2003, Journal of International Economics, 60(1), pp.85-107.

[30] Kim, J. and D. Pollard, “Cube Root Asymptotics”, 1990, Annals of Statistics, Vol.18, pp.191-219.

[31] Knight, K., “Limiting distributions for L1 regression estimators under general conditions”, 1998, Annals of Statistics, 26, pp.755-770.

[32] Koenker, R. and G. J. Bassett, “Regression quantiles”, 1978, Econometrica, Vol.46, pp.33-50.

[33] Koenker, R. and Q. Zhao, “Conditional Quantile estimation and inference for ARCH models”, 1996, Econometric Theory, Vol.12, pp.793-813.

[34] Koenker, R. and J. A. F. Machado, “Goodness of Fit and Related Inference Processes for Quantile Regression”, 1999, Journal of the American Statistical Association, Vol.94, No.448, pp.1296-1310.

[35] Koenker, R. and Z. Xiao, “Inference on the quantile regression process”, 2002, Econometrica, Vol.70, pp.1583-1612.

[36] Koenker, R., “Quantile Regression”, 2005, Econometric Society Monographs, Cambridge University Press.

[37] Kosorok, M. R., “Introduction to Empirical Processes and Semiparametric Inference”, 2008, Springer Verlag Press.

[38] Kuan, C. M. and K. Hornik, “The generalized fluctuation test: A unifying view”, 1995, Econometric Reviews, 14, pp.135-161.


[39] Kuan, C. M. and C. C. Hsu, “Change-point estimation of fractionally integrated processes”, 1998, Journal of Time Series Analysis, 19, pp.693-708.

[40] Kuan, C. M. and C-L. Chen, “Effects of National Health Insurance on Precautionary Saving: New Evidence from Taiwan”, 2009, Working Paper.

[41] Kuan, C. M., C. Michalopoulos and Z. Xiao, “The Multiple Quantile Regression on Quantile Ranges Model”, 2013, Working Paper.

[42] Lavielle, M. and E. Moulines, “Least-squares Estimation of an Unknown Number of Shifts in a Time Series”, 2000, Journal of Time Series Analysis, Vol.21, No.1, pp.33-59.

[43] Lee, S., M. H. Seo and Y. Shin, “Testing for Threshold Effects in Regression Models”, 2011, Journal of the American Statistical Association, Vol.106, No.493, pp.220-231.

[44] Pollard, D., “Convergence of Stochastic Processes”, 1984, Springer.

[45] Nelson, F. D. and N. E. Savin, “The danger of extrapolating asymptotic local power”, 1990, Econometrica, Vol.58, No.4, pp.977-981.

[46] Newey, W. K. and K. D. West, “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix”, 1987, Econometrica, Vol.55, No.4, pp.703-708.

[47] Pakes, A. and D. Pollard, “Simulation and the Asymptotics of Optimization Estimators”, 1989, Econometrica, Vol.57, No.5, pp.1027-1057.

[48] Picard, D., “Testing and Estimating Change-Points in time series”, 1985, Advances in Applied Probability, 17, pp.841-867.

[49] Pollard, D., “Asymptotics for Least Absolute Deviation regression estimators”, 1991, Econometric Theory, 7, pp.186-199.

[50] Potter, S. M., “A nonlinear approach to U.S. GNP”, 1995, Journal of Applied Econometrics, Vol.10, No.2, pp.109-125.

[51] Qu, Z., “Test for Structural Change in Regression Quantiles”, 2008, Journal of Econometrics, Vol.146, No.1, pp.170-184.

[52] Ruppert, D. and R. J. Carroll, “Trimmed least squares estimation in the linear model”, 1980, Journal of the American Statistical Association, Vol.75, pp.828-838.


[53] Seo, M. and O. Linton, “A Smoothed Least Squares Estimator for The Threshold Regression Model”, 2007, Journal of Econometrics, 141, pp.704-735.

[54] Su, L. and Z. Xiao, “Testing for parameter instability in quantile regression models”, 2008, Statistics and Probability Letters, Vol.78, pp.2768-2775.

[55] Su, L. and Z. Xiao, “Testing Structural Change in Conditional Distributions via Quantile Regressions”, 2009, Working Paper.

[56] Tong, H., “Threshold models in non-linear time series analysis”, 1983, Springer-Verlag.

[57] Tong, H., “Non-Linear Time Series”, 1990, Oxford University Press.

[58] van der Vaart, A. W. and J. A. Wellner, “Weak Convergence and Empirical Processes”, 1996, Springer-Verlag.

[59] van der Vaart, A. W., “Asymptotic Statistics”, 1998, Cambridge University Press.

[60] Weiss, A., “Estimating Nonlinear Dynamic Models using Least Absolute Error Estimates”, 1991, Econometric Theory, Vol.7, pp.46-68.
