Nonparametric Quantile Regression Estimation with Mixed Discrete and Continuous Data

Degui Li (University of York)∗   Qi Li (Texas A&M University)†   Zheng Li (North Carolina State University)‡

This version: September, 2018

Abstract

In this paper, we investigate the problem of nonparametrically estimating a conditional quantile function with mixed discrete and continuous covariates. A local linear smoothing technique combining both continuous and discrete kernel functions is introduced to estimate the conditional quantile function. We propose using a fully data-driven cross-validation (CV) approach to choose the bandwidths optimally. The paper fills a gap in the literature by deriving the asymptotic optimality theory of the CV selected smoothing parameters in the context of conditional quantile estimation. We also establish the asymptotic distribution and uniform consistency (with convergence rates) for the local linear conditional quantile estimators with the data-dependent optimal bandwidths. Simulations show that the proposed approach compares well with some existing methods. Finally, an empirical application with the data taken from the IMDb website is presented to analyze the relationship between box office revenues and online rating scores.

Keywords: Cross-validation, Discrete regressors, Local linear smoothing, Nonparametric estimation, Quantile regression.

JEL Classification: C13, C14, C35



∗ Department of Mathematics, University of York, York, YO10 5DD, UK. Email: [email protected].
† Corresponding author. Department of Economics, Texas A&M University, College Station, TX, 77843, USA. Email: [email protected].
‡ Department of Agricultural and Resource Economics, North Carolina State University, Raleigh, NC, 27695, USA. Email: [email protected].

1 Introduction

In recent years, there has been increasing interest in nonparametric estimation of regression relationships among variables, as the nonparametric approach allows the data to "speak for themselves" and thus has the ability to detect regression structures that may be difficult to uncover by traditional parametric modelling approaches. Various nonparametric methods have attracted much attention from statisticians and econometricians, partly due to the wide availability of large data sets for empirical applications; see, for example, Green and Silverman (1994), Wand and Jones (1995), Fan and Gijbels (1996), Pagan and Ullah (1999), Li and Racine (2007) and Horowitz (2009). One of the most commonly used nonparametric estimation methods is the local linear smoothing method, as it is well known that the local linear approach has advantages over the traditional Nadaraya-Watson kernel approach, such as higher asymptotic efficiency, design adaptation and automatic boundary correction. We refer to the book by Fan and Gijbels (1996) for a detailed account of this subject.

Most of the literature introduced above focuses on nonparametric estimation with continuous regressors. However, in practice it is not uncommon that some of the regressors are discrete (e.g., gender, race and religious belief). Although in principle one can use a frequency method that splits the whole sample into many cells to handle the discrete variables, in practice such a sample-splitting method quickly becomes infeasible when the number of discrete cells is large. Indeed, as pointed out by Li and Racine (2004), the naive splitting method may perform poorly when the number of subgroups is relatively large and the number of observations in some subgroups is small. To address this problem, they consider a nonparametric kernel-based method which smoothes both continuous and discrete covariates; the method works well in practice. Recent developments in this field include Racine and Li (2004), Hall, Racine and Li (2004), Li, Racine and Wooldridge (2009), Park, Simar and Zelenyuk (2015) and Li, Simar and Zelenyuk (2016), most of which investigate nonparametric estimation of the conditional mean regression function and the associated econometric applications.

It is well known that the conditional mean function may not be a good representative of the impact of the explanatory variables on the response variable. Hence, it is often of interest to model conditional quantiles when studying the regression relationship between a response variable and some explanatory variables. Since the seminal paper by Koenker and Bassett (1978), the quantile regression method has been widely used in many disciplines such as economics, finance, political science and other social science fields; quantile regression serves as a robust alternative to mean regression. Recent developments in parametric and nonparametric quantile estimation and inference include Yu and Jones (1998), Cai (2002), Yu and Lu (2004), Koenker and Xiao (2006), Cai and Xu (2008), Hallin, Lu and Yu (2009), Escanciano and Velasco (2010), Belloni and Chernozhukov (2011), Belloni, Chernozhukov and Fernández-Val (2011), Galvao (2011), Cai and Xiao (2012), Galvao, Lamarche and Lima (2013), Li, Lin and Racine (2013), Spokoiny, Wang and Härdle (2013), Escanciano and Goh (2014), Qu and Yoon (2015), Racine and Li (2017) and Zhu et al. (2018). Koenker (2005) and Koenker et al. (2017) give a comprehensive overview of various methodologies in quantile regression and their applications. In particular, the paper by Li, Lin and Racine (2013) is among the first to estimate the conditional cumulative distribution function (CDF) nonparametrically by smoothing both the discrete and continuous covariates, and then obtain the quantile regression function estimate by inverting the estimated conditional CDF at the desired quantiles. The optimal bandwidths are chosen when estimating the nonparametric CDF, which avoids direct bandwidth selection for the kernel-based quantile estimation.

In this paper, we propose a different nonparametric method to estimate the conditional quantile function by minimizing a local-linear-based "check function" defined in Sections 2 and 3. We start with the simple case containing only continuous regressors, and then consider the general case of mixed continuous and discrete regressors by constructing the local linear smoothing check function (objective function) with both continuous and discrete kernel functions involved. As the numerical performance of the local linear estimator is sensitive to the smoothing parameters, we further study the choice of smoothing parameters in the local linear quantile estimation, and propose a completely data-driven cross-validation (CV) approach to directly choose the optimal bandwidths. To the best of our knowledge, asymptotic theory on using a completely data-driven procedure to select smoothing parameters optimally for check-function-based conditional quantile regression estimation (with mixed continuous and discrete regressors) is not available in the literature. Thus, our result fills a gap in the literature by deriving the asymptotic optimality property of the data-driven CV selected smoothing parameters.

The check-function-based quantile regression estimation method proposed in this paper has at least three advantages over the inverse-CDF approach (Li, Lin and Racine (2013)). First, it is computationally much more efficient than the inverse-CDF approach in Li, Lin and Racine (2013): our check-function-based CV method requires $O(n^2)$ computations, whereas the inverse-CDF-based CV method involves $O(n^3)$ computations, where $n$ is the sample size. This is a significant improvement in the computational aspect. The second advantage of our method is that, besides the estimated conditional quantile function, we also obtain estimates of the derivative functions (of the conditional quantile function with respect to the continuous components) and their asymptotic theory, while it seems difficult to obtain derivative function estimation and to derive the related asymptotic theory if one uses the inverse-CDF method. The third advantage is that, since in practice the optimal smoothing parameters can be quite different for different $\tau$, our method provides optimal smoothing parameters for each specific quantile $\tau \in (0, 1)$.

In contrast, the inverse-CDF method gives the same smoothing parameters for all $\tau \in (0, 1)$ and does not have the flexibility of offering $\tau$-dependent optimal smoothing for conditional quantile function estimation.

Under some mild conditions, we prove the point-wise asymptotic normal distribution and uniform convergence rates for the developed local linear quantile estimators using the data-dependent bandwidths determined by the CV approach. We use simulations to illustrate the finite-sample behavior of the proposed method. We also compare our approach with two existing methods: the naive local linear quantile estimation without smoothing the discrete covariates and the nonparametric inverse-CDF method suggested by Li, Lin and Racine (2013). Finally, we apply the proposed nonparametric quantile regression estimation method to study the relationship between box office revenues and online rating scores using a dataset collected from the IMDb website.

The remaining parts of the paper are organised as follows. Section 2 considers the case of purely continuous covariates: it introduces the local linear quantile estimation procedure and the CV bandwidth selection approach, and gives the asymptotic optimality of the CV selected bandwidths. The methodology and theory are extended to the general case of mixed discrete and continuous covariates in Section 3. The asymptotic distribution and uniform consistency for the developed local linear quantile regression estimators with the data-dependent optimal bandwidths are given in Section 4. Section 5 reports a simulation study examining the finite-sample performance of the proposed approach. Section 6 provides an empirical application studying the relationship between box office revenues and online rating scores. Section 7 concludes. The technical assumptions and the proofs of the main results are given in the appendices.

2 Estimation with continuous regressors

In this section, we consider the case of continuous regressors in nonparametric quantile regression estimation, which has been extensively studied in the literature (c.f., Yu and Jones, 1998; Cai and Xu, 2008; Spokoiny, Wang and Härdle, 2013; Qu and Yoon, 2015; Racine and Li, 2017). The extension to the general case of mixed discrete and continuous regressors will be given in Section 3.

Suppose that $(Y_i, X_i^{\top})^{\top}$, $i = 1, \cdots, n$, are observations independently drawn from an identical distribution, where $Y_i \in \mathbb{R}$ is univariate, $X_i \in D_x$ is a $p$-dimensional continuous random vector, and the set $D_x = \{(x_1, \cdots, x_p)^{\top} : \underline{c}_k \le x_k \le \bar{c}_k,\ k = 1, \cdots, p\}$ is a bounded subset of $\mathbb{R}^p$, where $-\infty < \underline{c}_k < \bar{c}_k < +\infty$, $k = 1, \cdots, p$, are bounded constants. We next describe the nonparametric local linear method to estimate the conditional quantile function and its derivatives by minimizing the local-linear-based check function, and then introduce a completely data-driven CV method to choose the smoothing parameters involved.

For $x = (x_1, \cdots, x_p)^{\top} \in D_x$ and $y \in \mathbb{R}$, we denote $F(y|x)$ as the conditional CDF of the response variable $Y_i$ (evaluated at $y$) given the continuous random vector $X_i = x$. Let $Q_\tau(x_0)$, with $0 < \tau < 1$ and $x_0 = (x_{0,1}, \cdots, x_{0,p})^{\top} \in D_x$, be the conditional $\tau$-quantile regression function of $Y_i$ given $X_i = x_0$, which is defined as
$$Q_\tau(x_0) = \inf\left\{y \in \mathbb{R} : F(y|x_0) \ge \tau\right\}, \qquad (2.1)$$

or equivalently
$$Q_\tau(x_0) = \arg\min_{a \in \mathbb{R}} \mathrm{E}\left[\rho_\tau(Y_i - a) \,\big|\, X_i = x_0\right], \qquad (2.2)$$
where $\rho_\tau(\cdot)$ is the check function (or loss function) defined by $\rho_\tau(y) = y\left(\tau - I\{y \le 0\}\right)$.

In the general case of mixed regressors treated in Section 3, with $Z_i \in D_z$ denoting a $q$-dimensional vector of discrete covariates and $F(y|x_0, z_0)$ the conditional CDF of $Y_i$ given $X_i = x_0$ and $Z_i = z_0$, the conditional $\tau$-quantile function is defined analogously as
$$Q_\tau(x_0, z_0) = \inf\left\{y \in \mathbb{R} : F(y|x_0, z_0) \ge \tau\right\}, \qquad (3.1)$$
or equivalently
$$Q_\tau(x_0, z_0) = \arg\min_{a \in \mathbb{R}} \mathrm{E}\left[\rho_\tau(Y_i - a) \,\big|\, X_i = x_0, Z_i = z_0\right]. \qquad (3.2)$$
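The equivalence in (2.2) and (3.2) underlies everything that follows: the $\tau$-quantile is the minimizer of the expected check loss. The following minimal Python sketch (our own illustration, not code from the paper; all names are ours) verifies this numerically on a simulated sample.

```python
# Minimal numerical illustration (not from the paper) of the check function
# rho_tau(y) = y * (tau - 1{y <= 0}) and of the fact that the minimizer of the
# expected check loss is the tau-quantile, as in (2.2) and (3.2).
import numpy as np
from scipy.optimize import minimize_scalar

def rho(u, tau):
    """Quantile check (loss) function."""
    return u * (tau - (u <= 0))

rng = np.random.default_rng(0)
y = rng.standard_normal(100_000)           # a large sample from N(0, 1)
tau = 0.75

# Minimize the sample average of rho(y - a) over a.
res = minimize_scalar(lambda a: rho(y - a, tau).mean())
print(res.x)                               # approximately the 0.75-quantile
print(np.quantile(y, tau))                 # empirical quantile for comparison
```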

As in Section 2, we apply the local linear smoothing approach to estimate $Q_\tau(x_0, z_0)$ based on the definition given in (3.2). Due to the mixture of discrete and continuous data in the regressors, two types of kernel weights are required to construct the locally weighted loss function. For the continuous regressors, we use the kernel weight $K_h(X_i - x_0)$ defined in Section 2. For the discrete covariates, we use the following discrete kernel (with the convention that $0^0$ equals 1):
$$\Lambda_\lambda(Z_i, z_0) = \prod_{k=1}^{q} \lambda_k^{I\{Z_{i,k} \neq z_{0,k}\}},$$
where $\lambda_k \in [0, 1]$ is the bandwidth for the $k$-th discrete covariate, $Z_{i,k}$ is the $k$-th component of $Z_i$, and $I_A$ is an indicator function which takes the value 1 if the event $A$ holds true, and 0 otherwise. Recall that $h = (h_1, \cdots, h_p)^{\top}$ and let $\lambda = (\lambda_1, \cdots, \lambda_q)^{\top}$. The local linear estimates of $Q_\tau(x_0, z_0)$ and its derivatives (with respect to the continuous components) $Q^{\prime}_{\tau,k}(x_0, z_0)$, $k = 1, 2, \cdots, p$, are obtained by minimizing the weighted loss function
$$L_n(\alpha, \beta; x_0, z_0) = \frac{1}{n}\sum_{i=1}^{n} \rho_\tau\left(Y_i - \alpha - (X_i - x_0)^{\top}\beta\right) K_h(X_i - x_0)\,\Lambda_\lambda(Z_i, z_0) \qquad (3.3)$$

with respect to $\alpha$ and $\beta = (\beta_1, \cdots, \beta_p)^{\top}$. We denote the minimizers by
$$\hat{\alpha} \equiv \widehat{Q}_\tau(x_0, z_0), \qquad \hat{\beta}_k \equiv \widehat{Q}^{\prime}_{\tau,k}(x_0, z_0);$$
they are the local linear estimates of $Q_\tau(x_0, z_0)$ and $Q^{\prime}_{\tau,k}(x_0, z_0)$, $k = 1, 2, \cdots, p$, respectively.

The above check-function-based local linear conditional quantile estimator with mixed discrete and continuous covariates was previously considered in Li and Racine (2008). However, they did not provide an asymptotic analysis of selecting the optimal smoothing parameters by a data-driven method, which is the main task of the current paper. When the smoothing parameter vector in the discrete kernel is chosen as a vector of zeros, the above approach reduces to the traditional local linear quantile estimation method which splits the whole sample into several groups (sub-samples) according to the different values that the discrete covariates can assume; one may then directly apply the methodology and theory introduced in Section 2. However, as pointed out by Li and Racine (2004), such a naive sample-splitting method may increase the estimation variance. In particular, it is well known that if the sample size in a subgroup is too small, one cannot expect to obtain sensible results with the sample-splitting local linear quantile estimation method. The range of the discrete smoothing parameter is the unit interval, i.e., $0 \le \lambda_k \le 1$ for $k = 1, \cdots, q$.

We now generalize the CV bandwidth selection method to determine appropriate bandwidth vectors $h$ and $\lambda$. There has been increasing interest in the CV bandwidth selection approach for the case of mixed continuous and discrete regressors (c.f., Li and Racine, 2004; Racine and Li, 2004; Li, Simar and Zelenyuk, 2016). However, most of the existing literature focuses on bandwidth selection for kernel-based estimation in the context of conditional mean regression. Li, Lin and Racine (2013) study bandwidth selection in nonparametric quantile regression estimation, but their optimal bandwidths for both the continuous and discrete regressors are chosen when estimating the nonparametric CDF, so the chosen bandwidths are optimal for the CDF estimation rather than for the quantile regression estimation. We now extend the CV method in Section 2 to directly choose the optimal bandwidth vectors in nonparametric quantile regression estimation. Let $\widehat{Q}_{(-j)}(X_j, Z_j; h, \lambda) \equiv \widehat{Q}_{\tau,(-j)}(X_j, Z_j; h, \lambda)$ be the leave-one-out local linear estimate of $Q_\tau(X_j, Z_j)$ with bandwidths $h$ and $\lambda$, obtained by minimizing (3.3) with $(x_0, z_0)$ replaced by $(X_j, Z_j)$ and $\sum_{i=1}^{n}$ replaced by $\sum_{i=1, i \neq j}^{n}$. Similarly to (2.4), we define the following CV function:
$$\mathrm{CV}(h, \lambda) = \frac{1}{n}\sum_{j=1}^{n} \rho_\tau\left[Y_j - \widehat{Q}_{(-j)}(X_j, Z_j; h, \lambda)\right] M_2(X_j, Z_j), \qquad (3.4)$$
where $M_2(\cdot, \cdot)$ is a weight function that trims out observations near the boundary of the data support.
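To make the construction of (3.3)–(3.5) concrete, the following Python sketch (our own illustration under simplifying assumptions — a Gaussian product kernel instead of a compactly supported one, and brute-force leave-one-out loops; none of the names come from the paper) implements the mixed-kernel weights, the local linear check-function fit, and the CV criterion. It uses the fact that $w\,\rho_\tau(u) = \rho_\tau(w\,u)$ for $w \ge 0$, so each weighted fit can be computed with an off-the-shelf (unweighted) quantile regression routine.

```python
# Sketch (ours, hypothetical implementation; not the authors' code) of the
# local linear check-function estimator (3.3) with mixed kernels and of the
# leave-one-out CV criterion (3.4)-(3.5).
import numpy as np
import statsmodels.api as sm

def kernel_weights(X, Z, x0, z0, h, lam):
    """Product Gaussian kernel for continuous X and discrete kernel for Z."""
    Kc = np.exp(-0.5 * ((X - x0) / h) ** 2).prod(axis=1) / np.prod(h)
    Kd = np.prod(lam ** (Z != z0), axis=1)      # lam_k^{1{Z_ik != z_0k}}
    return Kc * Kd

def local_linear_quantile(X, Z, Y, x0, z0, tau, h, lam):
    """Minimize (3.3); returns (Q_hat, beta_hat) at (x0, z0)."""
    w = kernel_weights(X, Z, x0, z0, h, lam)
    keep = w > 1e-10
    D = np.column_stack([np.ones(keep.sum()), X[keep] - x0])   # [1, X_i - x0]
    wk = w[keep]
    fit = sm.QuantReg(wk * Y[keep], D * wk[:, None]).fit(q=tau)
    return fit.params[0], fit.params[1:]

def cv_criterion(X, Z, Y, tau, h, lam, trim=None):
    """Leave-one-out CV function (3.4); trim masks boundary observations."""
    n = len(Y)
    if trim is None:
        trim = np.ones(n, dtype=bool)
    loss = 0.0
    for j in range(n):
        keep = np.arange(n) != j
        q_j, _ = local_linear_quantile(X[keep], Z[keep], Y[keep],
                                       X[j], Z[j], tau, h, lam)
        u = Y[j] - q_j
        loss += (u * (tau - (u <= 0))) * trim[j]
    return loss / n

# The CV-optimal (h, lam) in (3.5) can then be found by minimizing
# cv_criterion over a grid of candidate bandwidths.
```

The $O(n^2)$ operation count mentioned in the Introduction corresponds to the $n$ leave-one-out fits per candidate bandwidth, each of which involves a kernel-weighted regression over the remaining $n-1$ observations.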

The optimal bandwidth vectors $\hat{h}$ and $\hat{\lambda}$ are chosen to minimize $\mathrm{CV}(h, \lambda)$, i.e.,
$$(\hat{h}, \hat{\lambda}) = \arg\min_{h, \lambda} \mathrm{CV}(h, \lambda) \qquad (3.5)$$
with $\hat{h} = (\hat{h}_1, \cdots, \hat{h}_p)^{\top}$ and $\hat{\lambda} = (\hat{\lambda}_1, \cdots, \hat{\lambda}_q)^{\top}$.

To derive the asymptotic optimality of $\hat{h}$ and $\hat{\lambda}$ defined in (3.5), we need some additional notation. Let $e_i = Y_i - Q_\tau(X_i, Z_i)$ and assume that $\mathrm{P}(e_i \le 0 \mid X_i = x, Z_i = z) = \tau$ for $x \in D_x$ and $z \in D_z$. Define $f_{xz}(\cdot, \cdot)$ as the joint probability density function of $X_i$ and $Z_i$ and $f_e(\cdot|x, z)$ as the conditional density function of $e_i$ given $X_i = x$ and $Z_i = z$. Let $Q^{\prime\prime}_{\tau,k}(x, z)$, $k = 1, 2, \cdots, p$, be the second-order derivative of $Q_\tau(\cdot, \cdot)$ with respect to the $k$-th continuous component evaluated at the point $(x, z)$. Define
$$b(X_i, Z_i; h, \lambda) = \frac{\mu_2}{2}\sum_{k=1}^{p} h_k^2\, Q^{\prime\prime}_{\tau,k}(X_i, Z_i) + \sum_{z_1 \in D_z}\sum_{k=1}^{q} I_k(z_1, Z_i)\,\lambda_k\, \frac{\bar{f}(X_i, z_1)\left[Q_\tau(X_i, z_1) - Q_\tau(X_i, Z_i)\right]}{\bar{f}(X_i, Z_i)}$$
and
$$\sigma^2(X_i, Z_i; h) = \frac{1}{nH}\cdot \frac{\tau(1-\tau)\nu_0}{\left[f_e(0|X_i, Z_i)\right]^2 f_{xz}(X_i, Z_i)},$$
where $\mu_j$ and $\nu_j$ are defined as in Section 2, $I_k(z_1, z_2) = I\{z_{1,k} \neq z_{2,k}\}\prod_{j=1, j\neq k}^{q} I\{z_{1,j} = z_{2,j}\}$, $\bar{f}(x, z) = f_e(0|x, z)\, f_{xz}(x, z)$, and $H = \prod_{k=1}^{p} h_k$. As can be seen from the proofs in Appendix B, $b(\cdot, \cdot; h, \lambda)$ and $\sigma^2(\cdot, \cdot; h)$ represent the asymptotic bias and variance of the local linear quantile estimator, respectively. The asymptotic variance term $\sigma^2(\cdot, \cdot; h)$ is similar to $\sigma^2(\cdot; h)$ defined in Section 2, whereas the asymptotic bias term $b(\cdot, \cdot; h, \lambda)$ is more complicated than $b(\cdot; h)$ defined in Section 2, as smoothing the discrete regressors also contributes to the estimation bias. The following proposition gives the asymptotic expansion of $\mathrm{CV}(h, \lambda)$, which can be seen as an extension of Theorem 2.1.

PROPOSITION 3.1. Suppose that Assumptions 1, 2, 3(ii), 4(ii) and 5(ii) in Appendix A are satisfied. Then, we have
$$\mathrm{CV}(h, \lambda) = \mathrm{CV}_1 + \frac{1}{2}\,\mathrm{E}\left\{\left[b^2(X_i, Z_i; h, \lambda) + \sigma^2(X_i, Z_i; h)\right] M_2(X_i, Z_i)\, f_e(0|X_i, Z_i)\right\} + \text{s.o.}, \qquad (3.6)$$
where
$$\mathrm{CV}_1 = \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(e_i)\, M_2(X_i, Z_i)$$
is unrelated to the smoothing parameters $h$ and $\lambda$, and "s.o." represents terms with smaller asymptotic probability orders than the second term on the right-hand side of (3.6).

Like (2.7) and (2.8), we define the following mean squared error of the local linear quantile regression estimation:
$$\mathrm{MSE}(h, \lambda) = \frac{1}{n}\sum_{i=1}^{n}\left[Q_\tau(X_i, Z_i) - \widehat{Q}_{(-i)}(X_i, Z_i; h, \lambda)\right]^2 M_2(X_i, Z_i)\, f_e(0|X_i, Z_i), \qquad (3.7)$$
and its leading term
$$\mathrm{MSE}_L(h, \lambda) = \mathrm{E}\left\{\left[b^2(X_i, Z_i; h, \lambda) + \sigma^2(X_i, Z_i; h)\right] M_2(X_i, Z_i)\, f_e(0|X_i, Z_i)\right\}. \qquad (3.8)$$

Let $a = (a_1, \cdots, a_p)^{\top}$ and $d = (d_1, \cdots, d_q)^{\top}$, where $a_k = h_k \cdot n^{1/(4+p)}$ and $d_k = \lambda_k \cdot n^{2/(4+p)}$. Then (3.8) can be written as $\mathrm{MSE}_L(h, \lambda) = n^{-4/(4+p)}\, \xi(a, d)$, where
$$\xi(a, d) = \mathrm{E}\left\{\left[b^2(X_i, Z_i; a, d) + \sigma^2(X_i, Z_i; a)\right] M_2(X_i, Z_i)\, f_e(0|X_i, Z_i)\right\}, \qquad (3.9)$$
with
$$b(X_i, Z_i; a, d) = \frac{\mu_2}{2}\sum_{k=1}^{p} a_k^2\, Q^{\prime\prime}_{\tau,k}(X_i, Z_i) + \sum_{z_1 \in D_z}\sum_{k=1}^{q} I_k(z_1, Z_i)\, d_k\, \frac{\bar{f}(X_i, z_1)\left[Q_\tau(X_i, z_1) - Q_\tau(X_i, Z_i)\right]}{\bar{f}(X_i, Z_i)},$$
$$\sigma^2(X_i, Z_i; a) = \frac{1}{a_1 \cdots a_p}\cdot \frac{\tau(1-\tau)\nu_0}{\left[f_e(0|X_i, Z_i)\right]^2 f_{xz}(X_i, Z_i)}.$$
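To see where the normalizations $a_k = h_k\, n^{1/(4+p)}$ and $d_k = \lambda_k\, n^{2/(4+p)}$ come from, it is helpful to balance the two terms in (3.8). The following order-of-magnitude calculation (ours, with all constants suppressed) is consistent with the discussion of Assumption 5 in Appendix A:
\[
b^2(X_i, Z_i; h, \lambda) \asymp \Big(\sum_{k=1}^{p} h_k^2 + \sum_{k=1}^{q} \lambda_k\Big)^2,
\qquad
\sigma^2(X_i, Z_i; h) \asymp \frac{1}{n\prod_{k=1}^{p} h_k}.
\]
Taking $h_k \asymp h$ and $\lambda_k \asymp \lambda$, the leading MSE is of order $h^4 + \lambda^2 + (n h^p)^{-1}$. Balancing $h^4 \asymp (n h^p)^{-1}$ gives $h \asymp n^{-1/(4+p)}$, and matching the discrete-kernel bias to the same order gives $\lambda \asymp h^2 \asymp n^{-2/(4+p)}$, so that $\mathrm{MSE}_L \asymp n^{-4/(4+p)}$, which is precisely the factor extracted in (3.9).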

We assume that there exist unique positive constants $a_k^0$, $k = 1, \cdots, p$, and non-negative constants $d_k^0$, $k = 1, \cdots, q$ (e.g., Li and Zhou (2005)), that minimize $\xi(a, d)$ defined in (3.9). The following theorem shows that the CV selected bandwidths $\hat{h}$ and $\hat{\lambda}$ are asymptotically optimal.

THEOREM 3.2. Suppose that the conditions of Proposition 3.1 are satisfied, let $\hat{h}$ and $\hat{\lambda}$ be the optimal bandwidth vectors that minimize the CV function defined in (3.4), and let $h_k^0 = a_k^0 \cdot n^{-1/(4+p)}$, $k = 1, \cdots, p$, and $\lambda_k^0 = d_k^0 \cdot n^{-2/(4+p)}$, $k = 1, \cdots, q$, be the smoothing parameters that minimize the asymptotic leading MSE term defined in (3.8). Then we have
$$\frac{\hat{h}_k - h_k^0}{h_k^0} = o_P(1), \quad k = 1, 2, \cdots, p, \qquad (3.10)$$
and
$$\frac{\hat{\lambda}_k - \lambda_k^0}{\lambda_k^0} = o_P(1), \quad k = 1, 2, \cdots, q. \qquad (3.11)$$

Theorem 3.2 above shows that the data-driven CV selected bandwidth vectors $\hat{h}$ and $\hat{\lambda}$ are asymptotically equivalent to the infeasible optimal ones $h^0 = (h_1^0, \cdots, h_p^0)^{\top}$ and $\lambda^0 = (\lambda_1^0, \cdots, \lambda_q^0)^{\top}$. It extends existing asymptotic optimality results on CV-based nonparametric estimation with mixed continuous and discrete regressors from the mean regression setting (c.f., Theorem 3.1 in Li and Racine, 2004) to the quantile regression setting. The convergence results (3.10) and (3.11) play a critical role in deriving the asymptotic normal distribution theory of the local linear quantile estimator using the data-dependent bandwidth vectors $\hat{h}$ and $\hat{\lambda}$.

4 Asymptotic theory with data-dependent bandwidths

In this section, we give the point-wise asymptotic normal distribution and uniform consistency for the local linear quantile estimators discussed in Sections 2 and 3, where the data-driven CV selected bandwidths are used. Let
$$S_\star(x_0) = \mathrm{diag}\left\{f_x(x_0) f_e^\star(0|x_0),\ \mu_2 f_x(x_0) f_e^\star(0|x_0) I_p\right\}$$
and
$$S(x_0, z_0) = \mathrm{diag}\left\{f_{xz}(x_0, z_0) f_e(0|x_0, z_0),\ \mu_2 f_{xz}(x_0, z_0) f_e(0|x_0, z_0) I_p\right\}$$
be two $(p+1)\times(p+1)$ diagonal matrices, where $I_p$ is a $p \times p$ identity matrix. In particular, if $\mu_0 = \mu_2 = 1$ (e.g., $K(\cdot)$ is a standard Gaussian kernel), we readily have $S_\star(x_0) = f_x(x_0) f_e^\star(0|x_0) I_{p+1}$ and $S(x_0, z_0) = f_{xz}(x_0, z_0) f_e(0|x_0, z_0) I_{p+1}$. Let
$$\Omega_\star(x_0) = \tau(1-\tau) f_x(x_0)\begin{bmatrix} \nu_0^p & 0_p^{\top} \\ 0_p & \nu_0^{p-1}\nu_2 I_p \end{bmatrix}
\quad\text{and}\quad
\Omega(x_0, z_0) = \tau(1-\tau) f_{xz}(x_0, z_0)\begin{bmatrix} \nu_0^p & 0_p^{\top} \\ 0_p & \nu_0^{p-1}\nu_2 I_p \end{bmatrix},$$
where $0_p$ is a $p$-dimensional column vector of zeros. As in Sections 2 and 3, we define the following asymptotic bias terms: $b(x_0; h) = \frac{\mu_2}{2}\sum_{k=1}^{p} h_k^2\, Q^{\prime\prime}_{\tau,k}(x_0)$ and
$$b(x_0, z_0; h, \lambda) = \frac{\mu_2}{2}\sum_{k=1}^{p} h_k^2\, Q^{\prime\prime}_{\tau,k}(x_0, z_0) + \sum_{z_1 \in D_z}\sum_{k=1}^{q} I_k(z_1, z_0)\,\lambda_k\, \frac{\bar{f}(x_0, z_1)\left[Q_\tau(x_0, z_1) - Q_\tau(x_0, z_0)\right]}{\bar{f}(x_0, z_0)}.$$

We next give the point-wise asymptotic normal distribution theory for the local linear quantile estimators: $\widetilde{Q}_\tau(x_0; \tilde{h})$ and $\widetilde{Q}^{\prime}_{\tau,k}(x_0; \tilde{h})$ (defined in Section 2 for the case of continuous regressors), and $\widehat{Q}_\tau(x_0, z_0; \hat{h}, \hat{\lambda})$ and $\widehat{Q}^{\prime}_{\tau,k}(x_0, z_0; \hat{h}, \hat{\lambda})$ (defined in Section 3 for the case of mixed discrete and continuous regressors), $k = 1, 2, \cdots, p$.

THEOREM 4.1. (i) Suppose that the conditions of Theorem 2.2 are satisfied. Then, we have
$$\left(n\widetilde{H}\right)^{1/2}
\begin{bmatrix}
\widetilde{Q}_\tau(x_0; \tilde{h}) - Q_\tau(x_0) - b(x_0; \tilde{h}) \\
\tilde{h}_1\widetilde{Q}^{\prime}_{\tau,1}(x_0; \tilde{h}) - \tilde{h}_1 Q^{\prime}_{\tau,1}(x_0) - b_1(x_0; \tilde{h}) \\
\vdots \\
\tilde{h}_p\widetilde{Q}^{\prime}_{\tau,p}(x_0; \tilde{h}) - \tilde{h}_p Q^{\prime}_{\tau,p}(x_0) - b_p(x_0; \tilde{h})
\end{bmatrix}
\stackrel{d}{\longrightarrow} \mathrm{N}\left[0_{p+1},\ \Lambda_\star(x_0)\right], \qquad (4.1)$$
where $\widetilde{H} = \prod_{k=1}^{p}\tilde{h}_k$, $b_k(x_0; \tilde{h}) = o_P\big(\sum_{k=1}^{p}\tilde{h}_k^2\big)$ so that $(n\widetilde{H})^{1/2}\, b_k(x_0; \tilde{h}) = o_P(1)$ for $k = 1, \ldots, p$, and
$$\Lambda_\star(x_0) = S_\star^{-1}(x_0)\,\Omega_\star(x_0)\,S_\star^{-1}(x_0) = \frac{\tau(1-\tau)}{\left[f_e^\star(0|x_0)\right]^2 f_x(x_0)}
\begin{pmatrix} \nu_0^p & 0_p^{\top} \\ 0_p & (\nu_0^{p-1}\nu_2/\mu_2^2)\, I_p \end{pmatrix}.$$

(ii) Suppose that the conditions of Theorem 3.2 are satisfied. Then, we have
$$\left(n\widehat{H}\right)^{1/2}
\begin{bmatrix}
\widehat{Q}_\tau(x_0, z_0; \hat{h}, \hat{\lambda}) - Q_\tau(x_0, z_0) - b(x_0, z_0; \hat{h}, \hat{\lambda}) \\
\hat{h}_1\widehat{Q}^{\prime}_{\tau,1}(x_0, z_0; \hat{h}, \hat{\lambda}) - \hat{h}_1 Q^{\prime}_{\tau,1}(x_0, z_0) - b_1(x_0, z_0; \hat{h}, \hat{\lambda}) \\
\vdots \\
\hat{h}_p\widehat{Q}^{\prime}_{\tau,p}(x_0, z_0; \hat{h}, \hat{\lambda}) - \hat{h}_p Q^{\prime}_{\tau,p}(x_0, z_0) - b_p(x_0, z_0; \hat{h}, \hat{\lambda})
\end{bmatrix}
\stackrel{d}{\longrightarrow} \mathrm{N}\left[0_{p+1},\ \Lambda(x_0, z_0)\right], \qquad (4.2)$$
where $\widehat{H} = \prod_{k=1}^{p}\hat{h}_k$, $b_k(x_0, z_0; \hat{h}, \hat{\lambda}) = o_P\big(\sum_{k=1}^{p}\hat{h}_k^2 + \sum_{k=1}^{q}\hat{\lambda}_k\big)$ so that $(n\widehat{H})^{1/2}\, b_k(x_0, z_0; \hat{h}, \hat{\lambda}) = o_P(1)$ for $k = 1, \ldots, p$, and
$$\Lambda(x_0, z_0) = S^{-1}(x_0, z_0)\,\Omega(x_0, z_0)\,S^{-1}(x_0, z_0) = \frac{\tau(1-\tau)}{\left[f_e(0|x_0, z_0)\right]^2 f_{xz}(x_0, z_0)}
\begin{pmatrix} \nu_0^p & 0_p^{\top} \\ 0_p & (\nu_0^{p-1}\nu_2/\mu_2^2)\, I_p \end{pmatrix}.$$

The normalization rates in (4.1) and (4.2) are random, as both $\widetilde{H}$ and $\widehat{H}$ are products of data-driven CV selected bandwidths. Theorem 4.1(ii) above can be seen as an extension of the corresponding results from the continuous-regressors case (c.f., Yu and Jones, 1998; Cai and Xu, 2008; Hallin, Lu and Yu, 2009) to the mixed continuous and discrete regressors case. It is easy to see that the discrete kernel in the local linear quantile estimation does not contribute to the asymptotic variances, but it does influence the form of the asymptotic bias; see, for example, the second term of $b(x_0, z_0; \hat{h}, \hat{\lambda})$. This finding is similar to that obtained by Li, Lin and Racine (2013). In addition, as in Li and Li (2010), although the data-dependent bandwidths $\tilde{h}$, $\hat{h}$ and $\hat{\lambda}$ are used, the limit distributions in (4.1) and (4.2) remain the same as those obtained using the deterministic optimal bandwidths.


In order to make use of the above limit distribution theory to conduct point-wise statistical inference on the quantile regression curves, we need to estimate the asymptotic bias and variance, both of which contain some unknown quantities. In general, there are two commonly-used ways to construct these estimates. The first is the plug-in method, which directly replaces the unknown quantities in the asymptotic bias and variance by appropriate estimated values. For example, the second-order derivatives of the quantile regression function, $Q^{\prime\prime}_{\tau,k}(x_0)$ and $Q^{\prime\prime}_{\tau,k}(x_0, z_0)$, can be estimated through local quadratic quantile regression (e.g., Cheng and Peng, 2002), and the density functions $f_x(x_0)$ and $f_{xz}(x_0, z_0)$ can be consistently estimated by the kernel density estimation method (e.g., Li and Racine, 2003). The second way is to use the bootstrap method to obtain estimates of the estimation bias and variance (e.g., Zhou, 2010). When the sample is of small or medium size, the bootstrap method is usually preferred to the plug-in method.

With Theorem 4.1(ii), we next discuss using the plug-in method, i.e., replacing unknown quantities by consistent estimators, to construct point-wise confidence intervals. From (4.2) we obtain
$$\left(n\widehat{H}\right)^{1/2}\left[\widehat{Q}_\tau(x_0, z_0; \hat{h}, \hat{\lambda}) - Q_\tau(x_0, z_0) - b(x_0, z_0; \hat{h}, \hat{\lambda})\right] \stackrel{d}{\longrightarrow} \mathrm{N}\left(0, V(x_0, z_0)\right), \qquad (4.3)$$
where $V(x_0, z_0) = \tau(1-\tau)\nu_0^p / \left\{\left[f_e(0|x_0, z_0)\right]^2 f_{xz}(x_0, z_0)\right\}$. Letting $\hat{b}(x_0, z_0; \hat{h}, \hat{\lambda})$ and $\widehat{V}(x_0, z_0)$ be estimators of $b(x_0, z_0; \hat{h}, \hat{\lambda})$ and $V(x_0, z_0)$, which will be defined in Appendix C, then for $\alpha \in (0, 1)$ the $(1-\alpha)$ confidence interval of $Q_\tau(x_0, z_0)$ is given by
$$\left[\widehat{Q}_\tau(x_0, z_0; \hat{h}, \hat{\lambda}) - \hat{b}(x_0, z_0; \hat{h}, \hat{\lambda}) - c_{1-\alpha/2}\sqrt{\frac{\widehat{V}(x_0, z_0)}{n\widehat{H}}},\ \ \widehat{Q}_\tau(x_0, z_0; \hat{h}, \hat{\lambda}) - \hat{b}(x_0, z_0; \hat{h}, \hat{\lambda}) + c_{1-\alpha/2}\sqrt{\frac{\widehat{V}(x_0, z_0)}{n\widehat{H}}}\right], \qquad (4.4)$$
where $c_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of a standard normal random variable.
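As a concrete illustration of (4.4), the short Python sketch below (ours; the plug-in estimates are placeholders standing in for the Appendix C estimators, and the example numbers are made up) computes the two endpoints of the point-wise confidence interval from a quantile estimate, a bias estimate, a variance estimate and the bandwidth product.

```python
# Hypothetical plug-in confidence interval, following (4.4).
# q_hat, b_hat, v_hat stand in for the estimators defined in Appendix C.
import numpy as np
from scipy.stats import norm

def pointwise_ci(q_hat, b_hat, v_hat, n, H_hat, alpha=0.05):
    """Return the (1 - alpha) confidence interval for Q_tau(x0, z0)."""
    c = norm.ppf(1 - alpha / 2)                 # c_{1-alpha/2}
    half_width = c * np.sqrt(v_hat / (n * H_hat))
    center = q_hat - b_hat                      # bias-corrected estimate
    return center - half_width, center + half_width

# Example with made-up numbers: n = 209, one continuous covariate.
print(pointwise_ci(q_hat=120.0, b_hat=3.5, v_hat=2500.0, n=209, H_hat=0.45))
```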

Letting $\mathcal{S} \subset D_x$, we next state the uniform convergence rates for $\widetilde{Q}_\tau(x; \tilde{h})$ and $\widehat{Q}_\tau(x, z; \hat{h}, \hat{\lambda})$ uniformly over $x \in \mathcal{S}$.

THEOREM 4.2. (i) Suppose that the conditions of Theorem 2.2 are satisfied. Then, we have
$$\sup_{x \in \mathcal{S}}\left|\widetilde{Q}_\tau(x; \tilde{h}) - Q_\tau(x)\right| = O_P\left(n^{-2/(4+p)}\log^{1/2} n\right). \qquad (4.5)$$

(ii) Suppose that the conditions of Theorem 3.2 are satisfied. Then, we have
$$\sup_{x \in \mathcal{S}}\left|\widehat{Q}_\tau(x, z; \hat{h}, \hat{\lambda}) - Q_\tau(x, z)\right| = O_P\left(n^{-2/(4+p)}\log^{1/2} n\right). \qquad (4.6)$$

The uniform convergence rates in (4.5) and (4.6) above are the same as the conventional uniform convergence rates for kernel-based estimators when the theoretically optimal bandwidths are used.

It is also of interest to further study the uniform distribution theory over $\tau$ and/or $x \in D_x$. For the case of purely continuous regressors, this problem has been explored in the existing literature. For example, the uniform convergence and distribution theory (over $x$ but with $\tau$ fixed) for kernel-based quantile estimation is studied by Härdle and Song (2010), and the weak convergence (uniformly over $\tau$ but with $x$ fixed) for local linear quantile estimation is considered by Qu and Yoon (2015). Their results can be further used to construct simultaneous confidence bands for the quantile regression functions, facilitating the relevant uniform inference. In Appendix D we extend Qu and Yoon (2015)'s uniform convergence result to the case of mixed continuous and discrete regressors using deterministic bandwidths. However, it is very challenging to replace the non-random optimal bandwidths by the CV selected random bandwidths in the weak convergence result (uniformly over $\tau$) developed in Appendix D. Note that the asymptotic optimality results for the CV selected bandwidths in Sections 2 and 3 are point-wise results, i.e., they hold true for a fixed value $\tau \in (0, 1)$. One way to establish a weak convergence result for the conditional quantile estimators (as indexed by $\tau$) with CV selected random bandwidths is to obtain uniform convergence results for the CV selected bandwidths, in the sense that $(\hat{h}_k(\tau) - h_k^0(\tau))/h_k^0(\tau) = o_P(1)$ for $k = 1, \ldots, p$, and $(\hat{\lambda}_k(\tau) - \lambda_k^0(\tau))/\lambda_k^0(\tau) = o_P(1)$ for $k = 1, \ldots, q$, uniformly in $\tau \in \mathcal{T}$, where $\mathcal{T} \subset (0, 1)$ is a compact subset of $(0, 1)$. This is a non-trivial task, and we leave this challenging problem as a future research topic.

5 Simulation

In this section, we use simulations to examine the finite-sample performance of the check-function-based conditional quantile estimator with the bandwidths chosen by the CV method. Furthermore, we compare our method with the traditional check-function-based quantile estimation that only smoothes the continuous covariate (and thus splits the sample into cells according to the different values of the discrete covariate), and with the nonparametric inverse-CDF estimation whose bandwidths for the nonparametric CDF estimation are chosen by the least-squares CV method suggested by Li, Lin and Racine (2013).

We design the data generating process to capture the patterns in the empirical data in Section 6. There are three major patterns: (1) the conditional quantile curves are nonlinear in the continuous covariate; (2) the distribution of the response variable is stretched as the value of the continuous covariate increases; and (3) the distribution of the response variable conditional on the continuous covariate is not symmetric. The following data generating process (DGP) serves our purpose:
$$\text{DGP1}: \quad Y_i = X_i^2 + Z_i + \sqrt{X_i}\, u_i, \qquad i = 1, \cdots, n,$$
where $X_i \sim \mathrm{Uniform}[0, 4]$ and $Z_i \in \{0, 1\}$ is a binary variable with $\mathrm{P}(Z_i = 0) = 0.7$ and $\mathrm{P}(Z_i = 1) = 0.3$. The distributions of both $X_i$ and $Z_i$ are similar to the distributions of the continuous and discrete covariates in the empirical application in Section 6. The error term $u_i$ follows an $F(10, 10)$ distribution centered at mean zero, which is asymmetric. The quantiles we consider in the simulation are $\tau = 0.10, 0.25, 0.50, 0.75, 0.90$. We examine three sample sizes: $n = 100$, $n = 200$ and $n = 400$. For each simulation setup, the number of replications is 1000.
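A minimal Python sketch of DGP1 (ours; all names are ours) is given below. Because $u_i$ is an $F(10,10)$ variable recentred to mean zero, the true conditional quantile used as the benchmark in the MSE comparisons is $Q_\tau(x, z) = x^2 + z + \sqrt{x}\,\big[F^{-1}_{10,10}(\tau) - \mathrm{E}(F_{10,10})\big]$.

```python
# Sketch (ours) of DGP1 and of the true conditional quantile it implies.
import numpy as np
from scipy.stats import f as f_dist

def simulate_dgp1(n, rng):
    x = rng.uniform(0.0, 4.0, size=n)                 # X_i ~ Uniform[0, 4]
    z = rng.binomial(1, 0.3, size=n)                  # P(Z_i = 1) = 0.3
    u = f_dist.rvs(10, 10, size=n, random_state=rng) - f_dist.mean(10, 10)
    y = x ** 2 + z + np.sqrt(x) * u
    return x, z, y

def true_quantile_dgp1(x, z, tau):
    """Q_tau(x, z) implied by DGP1 (u is centred F(10, 10))."""
    q_u = f_dist.ppf(tau, 10, 10) - f_dist.mean(10, 10)
    return x ** 2 + z + np.sqrt(x) * q_u

rng = np.random.default_rng(1)
x, z, y = simulate_dgp1(200, rng)
print(true_quantile_dgp1(np.array([1.0, 2.0]), np.array([0, 1]), 0.9))
```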

5.1 Performance in estimation

Table 1 reports the simulation results for the estimation mean (average) squared error (MSE). The first column specifies the method used in the quantile estimation, where "Check (non-smooth)" stands for the check-function-based method without smoothing the discrete covariate, "Check (smooth)" stands for our proposed check-function-based method which smoothes both the continuous and discrete covariates, and "Inverse-CDF" stands for the nonparametric inverse-CDF estimation. The next five columns specify the quantiles at which we estimate. The upper three rows of MSEs are for the sample size of 100, the middle three rows are for the sample size of 200, and the lower three rows are for the sample size of 400.

We observe that our proposed method, which smoothes both the continuous and discrete variables, performs better than the traditional check-function-based method which only smoothes the continuous variable. This result is similar to the finding in Hall, Li and Racine (2007), which is in the context of conditional mean function estimation. The main reason is that smoothing the discrete variable can borrow data from the neighbouring cells to significantly reduce the estimation variance while only introducing mild biases; as a result, the finite-sample mean squared errors can be reduced. Meanwhile, we can also see from Table 1 that our check-function-based estimator with CV-selected smoothing parameters generally performs better than the inverse-CDF method, particularly at the extreme quantiles. The only case where the inverse-CDF method performs better than the check-function-based method is τ = 0.50, where the difference in average MSEs is quite modest. For the other four quantiles, the check-function-based method outperforms the inverse-CDF method. At the 90% quantile, the average MSE of the inverse-CDF method is almost double that of the check-function-based method. We also observe a similar superior performance of our method at the 90% quantile in the empirical data example reported in Section 6 below. This is due to the fact that the tuning parameters for the inverse-CDF method are not quantile-specific: they lack the flexibility of selecting different smoothing parameters for different conditional quantile functions. In contrast, the tuning parameters for the proposed check-function-based method are adaptive to quantiles.

Table 1: Average MSEs in DGP1

                              τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
n = 100
  Check (smooth)                 0.235      0.203      0.286      0.679      1.807
  Check (non-smooth)             0.279      0.223      0.343      0.843      2.782
  Inverse-CDF                    0.423      0.210      0.285      0.740      2.994
n = 200
  Check (smooth)                 0.129      0.121      0.170      0.380      0.983
  Check (non-smooth)             0.154      0.138      0.197      0.476      1.494
  Inverse-CDF                    0.249      0.125      0.166      0.406      1.647
n = 400
  Check (smooth)                 0.071      0.067      0.097      0.225      0.594
  Check (non-smooth)             0.074      0.070      0.105      0.254      0.766
  Inverse-CDF                    0.135      0.069      0.095      0.238      0.927

To further investigate the behavior of the three conditional quantile estimation methods, we decompose the MSE into squared bias and variance. Tables 2 and 3 report the average squared bias and the average variance, respectively. Our main focus is on the comparison between the "Check (smooth)" and "Check (non-smooth)" methods. An important finding is that the bias of "Check (smooth)" is not always larger than that of "Check (non-smooth)". The reason is that, by setting the tuning parameter for the discrete covariate to 0, the "Check (non-smooth)" method faces a situation where the bias is small and the variance is large; to minimize the MSE, the CV method then tends to increase the tuning parameter for the continuous covariate so that the bias and variance are balanced. We report the average CV-selected tuning parameters for the continuous covariate in Table 4. We see that in all cases the average bandwidth of "Check (non-smooth)" is larger than that of "Check (smooth)". As a result, it is possible for the bias of "Check (non-smooth)" to be larger than that of "Check (smooth)". On the other hand, the "Check (smooth)" method has smaller variance than the "Check (non-smooth)" method in all cases.

In addition to its satisfactory performance with respect to MSE, our proposed check-function-based CV method has a huge advantage in computational time over the inverse-CDF-based CV method. Specifically, the proposed CV method requires computations of the order $O(n^2)$, while the inverse-CDF least-squares CV method requires $O(n^3)$ computations. This is due to the fact that the inverse-CDF least-squares CV searches for the optimal tuning parameters over all quantiles, while the check-function-based CV focuses on a specific quantile.

Table 2: Average squared bias in DGP1

                              τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
n = 100
  Check (smooth)                 0.047      0.029      0.070      0.197      0.450
  Check (non-smooth)             0.072      0.028      0.063      0.198      0.518
  Inverse-CDF                    0.267      0.054      0.057      0.148      0.314
n = 200
  Check (smooth)                 0.026      0.019      0.037      0.105      0.276
  Check (non-smooth)             0.044      0.023      0.037      0.108      0.310
  Inverse-CDF                    0.168      0.042      0.031      0.074      0.143
n = 400
  Check (smooth)                 0.013      0.010      0.021      0.058      0.172
  Check (non-smooth)             0.016      0.011      0.020      0.054      0.165
  Inverse-CDF                    0.089      0.020      0.017      0.038      0.069

Table 3: Average variance in DGP1

                              τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
n = 100
  Check (smooth)                 0.188      0.175      0.217      0.482      1.357
  Check (non-smooth)             0.207      0.195      0.280      0.646      2.264
  Inverse-CDF                    0.155      0.155      0.227      0.592      2.680
n = 200
  Check (smooth)                 0.103      0.103      0.133      0.275      0.707
  Check (non-smooth)             0.109      0.115      0.160      0.368      1.184
  Inverse-CDF                    0.081      0.084      0.135      0.332      1.504
n = 400
  Check (smooth)                 0.058      0.057      0.076      0.167      0.422
  Check (non-smooth)             0.058      0.060      0.085      0.200      0.601
  Inverse-CDF                    0.046      0.049      0.078      0.199      0.858

If one estimates a small number of quantiles (e.g., five quantiles: 0.10, 0.25, 0.50, 0.75 and 0.90) instead of the whole conditional distribution, this difference of one order ($O(n^2)$ compared with $O(n^3)$) can bring a huge computational efficiency gain, especially for large-sample applications.

5.2 Comparison with infeasible optimal bandwidths

To demonstrate the usefulness of Theorem 3.2, we compare the CV-selected bandwidths with the infeasible optimal bandwidths. Owing to the complicated form of the leading MSE term, it is difficult to obtain a closed-form expression for the infeasible optimal bandwidths. Instead, we estimate the optimal bandwidths by minimizing the infeasible MSE in (3.7), where $X_i$ and $Z_i$ are a random sample and $Q_\tau(X_i, Z_i)$ is the true quantile regression function.

Table 4: Average CV-selected tuning parameters in DGP1

                              τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
n = 100
  Check (smooth)                 0.127      0.146      0.196      0.185      0.216
  Check (non-smooth)             0.147      0.162      0.200      0.211      0.249
n = 200
  Check (smooth)                 0.094      0.119      0.164      0.158      0.188
  Check (non-smooth)             0.111      0.130      0.168      0.175      0.216
n = 400
  Check (smooth)                 0.078      0.100      0.133      0.135      0.160
  Check (non-smooth)             0.085      0.107      0.137      0.144      0.182

We repeat this process 1000 times and estimate the distribution of the minimizers. Figure 1 reports the distributions of the infeasible optimal and the CV-selected tuning parameters for the continuous covariate; the left, center and right panels correspond to n = 100, 200 and 400, respectively. We find that the two distributions are close in all three cases. Figure 2 reports the distributions of the infeasible optimal and the CV-selected tuning parameters for the discrete covariate, with the same pattern as in Figure 1. We observe that the CV selected smoothing parameters are quite close to the theoretical benchmark optimal values, supporting our theoretical results reported in Theorem 3.2.

[Figure 1 here: three density plots (panels for n = 100, 200, 400) comparing the infeasible optimal and the CV-selected tuning parameters for the continuous covariate; x-axis: continuous covariate tuning parameter, y-axis: density, legend: Infeasible vs. CV-Selected.]

Figure 1: Infeasible optimal and CV-selected tuning parameters for the continuous covariate


[Figure 2 here: three density plots (panels for n = 100, 200, 400) comparing the infeasible optimal and the CV-selected tuning parameters for the discrete covariate; x-axis: discrete covariate tuning parameter, y-axis: density, legend: Infeasible vs. CV-Selected.]

Figure 2: Infeasible optimal and CV-selected tuning parameters for the discrete covariate

5.3 The case of multinomial covariate

In this subsection, we further investigate the behavior of the check-function-based methods with and without smoothing the discrete covariate, and consider the situation where the discrete covariate takes more than two possible values. To serve our purpose, consider the following DGP:
$$\text{DGP2}: \quad Y_i = X_i^2 + \frac{1}{10} Z_i + \sqrt{X_i}\, u_i, \qquad i = 1, \cdots, n,$$
where $Z_i$ takes 10 possible values, $\{0, 1, \cdots, 9\}$, each with probability 0.1, and the other settings remain the same as those in DGP1. To keep the presentation concise, we only report the results for $\tau = 0.50$, as the results for the other quantiles do not show new patterns. Table 5 shows the average squared bias, variance and MSE for both the "Check (smooth)" and "Check (non-smooth)" methods; the left half shows the results for DGP2, while the right half shows the corresponding results for DGP1. The results for the "Check (smooth)" method are similar between DGP1 and DGP2, indicating that smoothing over the discrete covariate handles the multinomial variable very well. On the other hand, for the "Check (non-smooth)" method, both the bias and the variance for DGP2 are inflated compared with those for DGP1.


Table 5: Performance in estimation with multinomial covariate

                                        DGP2                              DGP1
                       Squared Bias  Variance    MSE     Squared Bias  Variance    MSE
n = 100
  Check (smooth)            0.087      0.194     0.282        0.070      0.217     0.286
  Check (non-smooth)        0.176      1.190     1.365        0.063      0.280     0.343
n = 200
  Check (smooth)            0.047      0.130     0.178        0.037      0.133     0.170
  Check (non-smooth)        0.117      0.660     0.777        0.037      0.160     0.197
n = 400
  Check (smooth)            0.033      0.072     0.105        0.021      0.076     0.097
  Check (non-smooth)        0.084      0.336     0.419        0.020      0.085     0.105

6 An empirical application

The conditional quantile regression plays an important role in empirical studies, especially in the circumstance where the conditional mean regression cannot well reflect the nonlinear relationship among the variables of interest. For example, in the studies of individual income, the mean estimate may be distorted by outliers, while the median is robust. In this section, we apply our method to study the relationship between word-of-mouth and box office revenue. As the internet penetrates our life, more and more online rating websites become prosperous. They cover movies, hotels, restaurants, etc., and help spread word-of-mouth at an unprecedented speed. The relationship between word-of-mouth and product sales is of great interest of researchers. Gilchrist and Sands (2016) document the social spillover effect in box office revenue; Duan, Gu and Whinston (2008) divide the effects of online reviews into persuasive effect and awareness effect, and show that the awareness effect is the major driver of box office revenue; Dellarocas, Zhang and Awad (2007) study the forecasting power of online reviews on future box office revenue. Our study aims to provide additional empirical evidence to this subject. We apply our proposed method to estimate the quantiles of box office revenue conditional on online rating score, and find clear upward trends in each examined quantile as online rating score increases.

6.1 Data description

Our dataset is obtained from the IMDb website1 . For each movie, we observe characteristics including gross revenue, average IMDb rating, release date, and Motion Picture Association of 1

www.imdb.com

20

America (MPAA) film ratings2 . We collect all movies that are listed on IMDb with release date between 2011 and 2013, and MPAA rating being either “PG” (Parental Guidance Suggested) or “PG-13” (Parents Strongly Cautioned). The number of movies in the sample is 209. Table 6 below reports the summary statistics of the variables that we will use in the following empirical analysis. The gross revenue variable has a similar pattern as individual income in that there are outliers with extremely high values. The conditional mean may not reflect the behavior of gross revenue very well, hence we favor using the conditional quantiles to examine the relationship between online rating scores and movie box office revenues. The average IMDb rating has a distribution that centers around 6 and 7. The probability density declines as the average rating deviate from the center. The discrete covariate MPAA takes two values: 0 for “PG-13” and 1 for “PG”. In our sample, there are 27% of the movies with “PG” rating, and 73% of the movies with “PG-13” rating. Table 6: Summary statistics for the empirical data set Mean Gross revenue (in million dollars) 92.90 Average IMDb rating 6.32 MPAA rating (“PG”=1, “PG-13”=0) 0.27

Std Min 95.45 6.00 0.98 3.4 0.44 0

Max 623.36 8.5 1

In the quantile regression model, the dependent variable y is the gross revenue. The continuous explanatory variable x is the average IMDb rating. The discrete explanatory variable z is the MPAA status. We estimate the quantiles (10%, 25%, 50%, 75% and 90%) of y conditional on x and z with our proposed method. We also compare the performance of our method with two competing methods: the check-function-based method without smoothing discrete covariate and the inverse-CDF method.

6.2

Empirical results

Figure 3 shows the estimated conditional quantiles of gross revenue conditional on the average IMDb rating (the continuous covariate) and MPAA status (the discrete covariate) as “PG”, using the proposed check-function-based method with smoothing discrete covariates. The five curves from top to bottom represent the 90%, 75%, 50%, 25% and 10% quantiles, respectively. The figure shows two patterns: (i) as the average IMDb rating increases, all five quantiles increases; and (ii) 2

The Motion Picture Association of America film rating system is used to rate a film’s suitability for certain audiences based on its content.

21

the distribution of the gross revenue is stretched, i.e., the 10% quantile increases slowly whereas the 90% quantile increases fast. Figure 4 depicts the estimated conditional quantiles by the check-function-based method without smoothing the discrete covariate. It displays somewhat similar patterns to Figure 3. However, the curves in Figure 4 are generally much more wiggly than the curves in Figure 3. For the 90% quantile, there are two big surges at around x = 6.2 and x = 7.3, which is likely caused by data sparsity. Also, for the 75% quantile, there is a rapid up-and-down between x = 7.7 and x = 8.0. This is again due to the data scarcity around that region. On the other hand, with the technique of smoothing the discrete variable, we may “borrow” data information from the other category to help stabilize the estimates, which can be seen at same regions in Figure 3. Figure 5 shows the estimated conditional quantiles by the inverse-CDF method. The curves in Figure 5 are quite similar to those in Figure 3, which is unsurprising since both estimators are consistent with the same convergence rate. It is worth mentioning again that our check-functionbased method has computational advantage against the inverse-CDF method.

450 90% Quantile 75% Quantile 50% Quantile 25% Quantile 10% Quantile

400

Gross Revenue

350 300 250 200 150 100 50 0

4

4.5

5

5.5

6

6.5

7

7.5

8

Average IMDb Rating

Figure 3: Estimated conditional quantiles of gross revenue using check-function-based method with smoothing discrete covariates To further investigate the differences among the three quantile regression estimation methods, we report all the CV selected tuning parameters in Table 7. The first column specifies the method used in the estimation. The next five columns specify the quantiles which we estimate at. The 22

450 90% Quantile 75% Quantile 50% Quantile 25% Quantile 10% Quantile

400

Gross Revenue

350 300 250 200 150 100 50 0

4

4.5

5

5.5

6

6.5

7

7.5

8

Average IMDb Rating

Figure 4: Estimated conditional quantiles of gross revenue using check-function-based method without smoothing discrete covariates 450 90% Quantile 75% Quantile 50% Quantile 25% Quantile 10% Quantile

400

Gross Revenue

350 300 250 200 150 100 50 0

4

4.5

5

5.5

6

6.5

7

7.5

8

Average IMDb Rating

Figure 5: Estimated conditional quantiles of gross revenue using inverse-CDF method with smoothing discrete covariates

23

Table 7: CV selected tuning parameters in the empirical study Method

τ = 0.10 τ = 0.25 τ = 0.50 τ = 0.75 τ = 0.90 smoothing parameter for the continuous covariate Check (smooth) 0.430 0.481 0.455 0.464 0.242 Check (non-smooth) 0.323 0.490 0.369 0.458 0.425 Inverse-CDF 0.454 (uniform in quantiles) smoothing parameter for the categorical covariate Check (smooth) 0.646 0.242 0.428 0.985 0.266 Check (non-smooth) 0 (uniform in quantiles) Inverse-CDF 0.948 (uniform in quantiles) upper and lower halves of the table correspond to the CV selected smoothing parameters for the continuous and discrete covariates, respectively. The inverse-CDF method has the same tuning parameters across all quantiles. The check-function-based method without smoothing the discrete covariate sets the smoothing parameter for the discrete covariate as 0. The tuning parameters for the continuous covariate are similar across the three methods except for the case of τ = 0.9, which are around 0.45. It means that all the three methods show similar “opinions” towards the smoothing on the continuous covariate. The tuning parameters for the discrete covariate for “Check (smooth)” varies from 0.242 to 0.985. At some quanitles, the crossvalidation considers the data points from the other category to have more similarity, while at other quantiles they have less similarity. For the inverse-CDF method, it chooses 0.948 as the tuning parameter for the discrete covariate, lacking the flexibility across quantiles. Note that at the 90% quantile, the “Check (smooth)” method selects a pair of tuning parameters that are much smaller than the pair chosen by the inverse-CDF method. One possible explanation to this result is that the 90th -percentile conditional quantile function is less smooth than lower quantile functions so that the corresponding optimal smoothing parameter h is smaller than those for lower quantile functions. In contrast, the inverse-CDF method does not have the flexibility of selecting different smoothing parameters for different quantile functions. To examine whether this explanation is plausible, we assess their out-of-sample prediction performance at 90% quantile. We randomly split the sample into estimation sample (90% of the sample size = 189) and validation sample (10% of the sample size = 20). We use the estimation sample to forecast the conditional 90% quantile in the validation sample. The out-of-sample prediction accuracy is measured by the average check-function value (over the validation sample). The out-of-sample average check function value is defined by i 1 X h out b out ρτ Yi − Qτ (Xout , Z ) , i i 20 i=1 20

24

with τ = 0.9,

b τ (·) is the where Yiout , Xout and Zout are the i-th data point in the validation sample, and Q i i estimated conditional 90% quantile function either by the check-function-based method or the inverse-CDF method using the estimation sample. We repeat the above process for 1000 times. From Table 8, the mean and standard deviation (over the 1000 replications) of the average check function value for check-function-based method are 18.53 and 6.27, and the counter parts for inverse-CDF method are 43.56 and 47.26. Thus, for estimating the 90-percentile conditional quantile function, our proposed method produces much smaller out-of-sample prediction error than the inverse CDF-method. This again shows the advantage of check-function-based method for being able to deploy quantile-specific tuning parameters. Table 8: Out-of-sample forecasting performance Mean Std

7

Check (smooth) 18.51 6.28

Check (non-smooth) 18.74 6.62

Inverse-CDF 43.39 47.10

Conclusion

In this paper, we study the nonlinear conditional quantile function estimation when the covariates include both continuous and discrete components. We combine the quantile check function and the local linear smoothing technique with the mixed continuous and discrete kernel function to directly estimate the conditional quantile function, making it substantially different from the method proposed in Li, Lin and Racine (2013) which first estimates the CDF nonparametrically and then inverts the estimated CDF to obtain the quantile estimation. One advantage of our method is that it selects quantile (τ) dependent optimal smoothing parameters, while the inverse check function based method does not have this flexibility. Another advantage of our method is that, in addition to the conditional quantile function estimation, we also obtain the estimation of derivatives (of the quantile regression function with respect to the continuous components), whereas it is difficult to obtain derivative function estimation and to derive the related asymptotic theory if one uses the inverse-CDF method to estimate conditional quantile function. The bandwidths involved in the proposed nonparametric quantile regression estimation are determined by a completely data-driven CV criterion. We derive the asymptotic optimality property of the CV selected bandwidths, and establish the point-wise asymptotic normal distribution theory and uniform consistency for the local linear quantile estimator using the data-dependent smoothing parameters determined by the CV criterion. Simulations are used to examine the finite-sample behavior of the proposed method. We find that our method has better 25

small-sample performance than the naive local linear quantile estimation without smoothing the discrete regressors and the nonparametric inverse-CDF method. In addition, the CV selected bandwidths are very close to the theoretically optimal bandwidths in finite samples. Furthermore, the proposed nonparametric quantile estimation method is used to study the relationship between box office revenues and online rating scores. Our empirical results suggest that as the average online rating score increases, the examined quantiles of box office revenue increases, and higher quantile functions increases faster than lower quantile functions.

Acknowledgements The authors would like to thank the co-editor, an Associate Editor and two anonymous reviewers for their insightful comments that greatly improve our paper.

Appendix for “Nonparametric Quantile Regression Estimation with Mixed Discrete and Continuous Data” A

Assumptions P

Throughout the Appendices A and B, we write an ∼ bn and an ∼ bn to denote that an = bn (1 + o(1)) and an = bn (1 + oP (1)), respectively; let an  bn denote that an = O(bn ) and bn = O(an ) hold jointly. Below we first give some regularity conditions which will be used to prove the main theoretical results of the paper. A SSUMPTION 1. The kernel function K(·) is a bounded, Lipschitz continuous and symmetric probability density function with a compact support [−1, 1]. A SSUMPTION 2. The sequence of {(Yi , Xi , Zi )} is composed of independent and identically distributed (i.i.d.) random vectors. |

|

A SSUMPTION 3. (i) The conditional density function of e?i ≡ Yi − Qτ (Xi ) for given Xi = x, f?e (·|x), exists and is continuous at point zero. Furthermore, f?e (0|·) is continuous and bounded away from infinity and zero uniformly over Dx . The conditional CDF of e?i for given Xi = x, F?e (·|x), is continuous with respect to x. In addition, the probability density function of Xi , fx (x), is bounded away from infinity and zero over Dx . 26

(ii) The conditional density function of ei ≡ Yi − Qτ (Xi , Zi ) for given Xi = x and Zi = z, fe (·|x, z), exists and is continuous at point zero. Furthermore, fe (0|x, z) is continuous with respect to x and bounded away from infinity and zero uniformly for x ∈ Dx and z ∈ Dz . The conditional CDF of ei for given Xi = x and Zi = z, Fe (·|x, z), is continuous with respect to x. The joint probability density function of Xi and Zi , fxz (·, ·), is bounded away from infinity and zero over Dx and Dz . A SSUMPTION 4. (i) The quantile regression function (with only continuous regressors) Qτ (·) is twice continuously differentiable on Dx . The weight function M1 (·) is continuous and bounded on Dx , and M1 (x) = 0 when x 6∈ Dx (h) = {(x1 , · · · , xp ) : ck +hk 6 xk 6 ck −hk , k = 1, · · · , p}. (ii) The quantile regression function (with mixed continuous and discrete regressors) Qτ (·, z) is twice continuously differentiable on Dx . The weight function M2 (·, z) is continuous and bounded on Dx , and M2 (x, z) = 0 when x 6∈ Dx (h) = {(x1 , · · · , xp ) : ck + hk 6 xk 6 ck − hk , k = 1, · · · , p}. A SSUMPTION 5. (i) The bandwidths in the local linear quantile estimation satisfy that as n → ∞, Q hk → 0 for k = 1, 2, · · · , p, and nH = n pk=1 hk → ∞. In addition,   !2 p p X X 1 1  h2k = o  . h2k ∨ 1/2 n k=1 nH k=1 (ii) Let hk → 0 for k = 1, 2, · · · , p, nH = n n → ∞. In addition, 1 n1/2

p X k=1

h2k +

q X

Qp



! λk

k=1

= o

k=1

(A.1)

hk → ∞, and λk → 0 for k = 1, 2, · · · q as

p X k=1

h2k +

q X k=1



!2 λk



1  . nH

(A.2)

Assumption 1 imposes some mild conditions on the continuous kernel function in the nonparametric kernel-based smoothing, and several commonly-used kernel functions such as the uniform kernel and the Epanechnikov kernel satisfy these conditions (c.f., Fan and Gijbels, 1996; Li and Racine, 2007). In Assumption 2, we impose the i.i.d. condition on the observations, which has been commonly used in the literature on nonparametric estimation with mixed discrete and continuous data (c.f., Li and Racine, 2004). The asymptotic results of this paper can be generalized to the general stationary and weakly dependent processes at the cost of more lengthy proofs. There is no moment condition on e?i (or ei ) to estimate the conditional quantile regression, indicating that 27

the error distribution is allowed to have heavy tails. Assumptions 3 and 4 give some smoothness conditions on the (conditional) density functions, the weight function and the quantile regression function, which are often imposed in studying asymptotic behavior of nonparametric kernel estimators. Assumption 5 imposes some standard restrictions on the smoothing parameters. The conditions (A.1) and (A.2) are crucial to guarantee that the second term on the right hand side of (2.6) and (3.6) are the asymptotic leading terms to determine the optimal bandwidth vectors. In some special case, these two conditions could be simplified or even removed. For example, if hk  n−δ , (A.1) is equivalent to that 1/(2p + 4) < δ < 1/4, which indicates that (A.1) is automatically satisfied. If, in addition, λk  n−2δ , we may show that (A.2) holds as well. It is easy to check that the cross-validation (CV) function is minimized at the order of (n−4/5 ) when hk  n−1/(4+p) λk  n−2/(4+p) , these smoothing parameter values satisfy (A.1) and (A.2). ‘

B

Proofs of the main results

In this appendix, we give the detailed proofs of the asymptotic results stated in Sections 2–4. Note that Proposition 2.1 and Theorem 2.2 can be seen as a special case of Proposition 3.1 and Theorem 3.2, respectively. To save space, we next omit the proofs of Proposition 2.1 and Theorem 2.2 as they are similar to (in fact simpler than) the proofs of Proposition 3.1 and Theorem 3.2 given below. In order to simplify the presentation, we first introduce some notation. Let √ nH [α − Qτ (x, z)] , √   vn,k (βk ; x, z) = nH hk βk − hk Q0τ,k (x, z) , "  # p X 1 Xj,k − xk ∆nj (α, β; x, z) = √ un (α; x, z) + vn,k (βk ; x, z) , h nH k k=1 un (α; x, z) =

and bj (x, z) = Qτ (Xj , Zj ) − Qτ (x, z) −

p X

Q0τ,k (x, z)(Xj,k − xk ),

(B.1)

k=1

where x = (x1 , · · · , xp ) ∈ Dx and z = (z1 , · · · , zq ) ∈ Dz . With the above notation, it is easy to see that | Yj − α − (Xj − x) β = ej − ∆nj (α, β; x, z) + bj (x, z). (B.2) |

|

b (−i) (Xi , Zi ) ≡ P ROOF OF P ROPOSITION 3.1. For notational simplicity, throughout the proof, we let Q b (−i) (Xi , Zi ; h, λ) and Q b (−i) (Xi , Zi ) − Qτ (Xi , Zi ). ζ(−i) (Xi , Zi ) = Q (B.3) 28

Note that  1X   b (−i) (Xi , Zi ) M2 (Xi , Zi ) ρτ ei + Qτ (Xi , Zi ) − Q CV(h, λ) = n i=1 n

 1X 1 X   ρτ (ei )M2 (Xi , Zi ) + ρτ ei − ζ(−i) (Xi , Zi ) − ρτ (ei ) M2 (Xi , Zi ) n i=1 n i=1 n

=

n

≡ CV1 + CV2 (h, λ).

(B.4)

It is easy to see that CV1 does not rely on the smoothing parameters h and λ, indicating that this term would not play any role in choosing the optimal bandwidth vectors. Thus, to complete the proof of Proposition 3.1, we only need to study CV2 (h, λ). Using the following identity result (e.g., Knight, 1998): Zv 





ρτ (u − v) − ρτ (u) = v I{u60} − τ +

 I{u6w} − I{u60} dw,

(B.5)

0

we have ρτ ei 



− ζ(−i) (Xi , Zi ) −

ρτ (ei )

Z ζ(−i) (Xi ,Zi ) = ζ(−i) (Xi , Zi ) I

{ei 60}



−τ + 0

 I{ei 6w} − I{ei 60} dw.

Define n  1X CV21 (h, λ) = ζ(−i) (Xi , Zi ) I{ei 60} − τ M2 (Xi , Zi ), n i=1 Z ζ(−i) (Xi ,Zi ) n  1X M2 (Xi , Zi ) I{ei 6w} − I{ei 60} dw. CV22 (h, λ) = n i=1 0

Then, we have CV2 (h, λ) = CV21 (h, λ) + CV22 (h, λ).

(B.6)

In order to derive the asymptotic orders for CV21 (h, λ) and CV22 (h, λ), we need to first show that ζ(−i) (Xi , Zi ) has the following asymptotic representation: P

ζ(−i) (Xi , Zi ) ∼ −s−1 0 (Xi , Zi )W(−i),0 (Xi , Zi ) uniformly over i, where s0 (x, z) = µ0 fxz (x, z)fe (0|x, z), 1X W(−i),0 (Xi , Zi ) = ηj (Xi , Zi )Kh (Xj − Xi )Λλ (Zj , Zi ), n j6=i n

29

(B.7)

and ηj (x, z) = I{ej 6−bj (x,z)} − τ. As in (3.3), we define 1X  |  ρτ Yj − α − (Xj − Xi ) β Kh (Xj − Xi )Λλ (Zj , Zi ) n j6=i n

L(−i) (α, β; Xi , Zi ) = and

 1X   ρτ ej + bj (Xi , Zi ) Kh (Xj − Xi )Λλ (Zj , Zi ) L(−i) (Xi , Zi ) = n j6=i n

which is unrelated to α and β. Using the equation (B.2), it is straightforward to show that minimizing the leave-one-out loss function L(−i) (α, β; Xi , Zi ) is equivalent to minimizing e L(−i) (α, β; Xi , Zi ) ≡ L(−i) (α, β; Xi , Zi ) − L(−i) (Xi , Zi ) n   1 X = ρτ ej + bj,i − ∆j,i − ρτ ej + bj,i Kh (Xj − Zi )Λλ (Zj , Zi ), n j6=i where bj,i ≡ bj (Xi , Zi ) and ∆j,i ≡ ∆nj (α, β; Xi , Zi ). With Knight (1998)’s identity result again, letting u = ej + bj,i and v = ∆j,i in (B.5), we can rewrite e L(−i) (α, β; Xi , Zi ) as 

 1 e L(−i) (α, β; Xi , Zi ) = Ψ(−i) (Xi , Zi ) + un (α; Xi , Zi ) √ W(−i),0 (Xi , Zi ) + nH   p X 1 W(−i),k (Xi , Zi ) , vn,k (βk ; Xi , Zi ) √ nH k=1 where un (α, ·, ·) and vn,k (βk , ·, ·), k = 1, · · · , p, are defined at the beginning of this appendix, 1X Ψ(−i) (Xi , Zi ) = Kh (Xj − Xi )Λλ (Zj , Zi ) n j6=i n

W(−i),0 (Xi , Zi ) =

1 n

n X

Z ∆j,i h i I{ej 6w−bj,i } − I{ej 6−bj,i } dw, 0

ηj (Xi , Zi )Kh (Xj − Xi )Λλ (Zj , Zi ),

j6=i

  n 1X Xj,k − Xi,k W(−i),k (Xi , Zi ) = ηj (Xi , Zi ) Kh (Xj − Xi )Λλ (Zj , Zi ), k = 1, · · · , p. n j6=i hk We next consider Ψ(−i) (Xi , Zi ). Letting Xn = (X1 , · · · , Xn ) and Zn = (Z1 , · · · , Zn ), by Assump-

30

tions 2 and 3 (i), we have that 1X ∼ Kh (Xj − Xi )Λλ (Zj , Zi ) n j6=i n

E Ψ(−i) (Xi , Zi )|Xn , Zn 



P

Z ∆j,i

  wfe − bj,i |Xj , Zj dw

0

1 X Kh (Xj − Xi )Λλ (Zj , Zi )fe (0|Xj , Zj )∆2j,i 2n j6=i n

P



(B.8)

uniformly over 1 6 i 6 n. Using the definitions of un (α, ·, ·) and vn,k (βk , ·, ·), k = 1, · · · , p, we further have 1X Kh (Xj − Xi )Λλ (Zj , Zi )fe (0|Xj , Zj )∆2j,i n j6=i n

= [un (α; Xi , Zi ), vn,1 (β1 ; Xi , Zi ), · · · vn,p (βp ; Xi , Zi )] S(−i) (Xi , Zi ) |

[un (α; Xi , Zi ), vn,1 (β1 ; Xi , Zi ), · · · vn,p (βp ; Xi , Zi )] , where

   (nH) · S(−i) (Xi , Zi ) =   

0,p 0,1 s0,0 (−i) (Xi , Zi ) s(−i) (Xi , Zi ) · · · s(−i) (Xi , Zi ) 1,p 1,1 s1,0 (−i) (Xi , Zi ) s(−i) (Xi , Zi ) · · · s(−i) (Xi , Zi ) .. .. .. .. . . . . p,0 p,1 p,p s(−i) (Xi , Zi ) s(−i) (Xi , Zi ) · · · s(−i) (Xi , Zi )

(B.9)

     

with 1X = n j6=i n

sk,l (−i) (Xi , Zi )



Xj,k − Xi,k hk

I{k>0} 

Xj,l − Xi,l hl

I{l>0}

Kh (Xj − Xi )Λλ (Zj , Zi )fe (0|Xj , Zj ).

With the uniform consistency result in Mack and Silverman (1982), we may show that 1 X Kh (Xj − x)Λλ (Zj , z)fe (0|Xj , Zj ) = fxz (x, z)fe (0|x, z) + oP (1), n−1 j6=i  n  1 X Xj,k − xk Kh (Xj − x)Λλ (Zj , z)fe (0|Xj , Zj ) = oP (1), k = 1, · · · , p, n−1 hk j6=i   n  X Xj,k − xk Xj,l − xl 1 Kh (Xj − x)Λλ (Zj , z)fe (0|Xj , Zj ) = oP (1), 1 6 k 6= l 6 p, n−1 hk hl j6=i  n  1 X Xj,k − xk 2 Kh (Xj − x)Λλ (Zj , z)fe (0|Xj , Zj ) = µ2 fxz (x, z)fe (0|x, z) + oP (1), k = 1, · · · , p, n−1 hk n

j6=i

31

uniformly for x ∈ Dx and z ∈ Dz , indicating that uniformly over 1 6 i 6 n P

(nH)S(−i) (Xi , Zi ) ∼ S (Xi , Zi ),

(B.10)

where S (x, z) is a (p + 1) × (p + 1) diagonal matrix defined as S (x, z) = diag {fxz (x, z)fe (0|x, z), µ2 fxz (x, z)fe (0|x, z) · Ip } . In view of (B.8)–(B.10), similarly to the proof of Theorem 3.1 in Kai, Li and Zou (2011), we have that P

(nH) · Ψ(−i) (Xi , Zi ) ∼

1 [un (α; Xi , Zi ), vn,1 (β1 ; Xi , Zi ), · · · vn,p (βp ; Xi , Zi )] S (Xi , Zi ) 2 | [un (α; Xi , Zi ), vn,1 (β1 ; Xi , Zi ), · · · vn,p (βp ; Xi , Zi )] (B.11)

uniformly over 1 6 i 6 n. Let W(−i) (Xi , Zi ) =

√  | nH W(−i),0 (Xi , Zi ), W(−i),1 (Xi , Zi ), · · · , W(−i),p (Xi , Zi ) .

For given Xi and Zi , we note that (nH) · e L(−i) (α, β; Xi , Zi ) − [un (α; Xi , Zi ), vn,1 (β1 ; Xi , Zi ), · · · vn,p (βp ; Xi , Zi )] W(−i) (Xi , Zi ) converges in probability to the probability limit of the right hand side of (B.11), which is a convex function. Then, by Pollard (1991)’s convexity lemma, we complete the proof of (B.7). ei = I{ei 60} − τ, by (B.7), we have that Letting η 1X CV21 (h, λ) = ζ(−i) (Xi , Zi )e ηi M(Xi , Zi ) n i=1 n

P



n n X 1 X 1X ei b(Xi , Zi ; h, λ)M2 (Xi , Zi ) − 2 ei η ej Kh (Xj − Xi ) Λλ (Zi , Zj ) η M2 (Xi , Zi ) η n i=1 n i=1 j6=i

≡ CV211 (h, λ) + CV212 (h, λ),

(B.12)

where b(·, ·, h, λ) is defined as in Section 3 and M2 (Xi , Zi ) = M2 (Xi , Zi )/s0 (Xi , Zi ). It is easy to show that CV212 (h, λ) = OP (n−1 H−1/2 ) = oP (1/(nH)) . (B.13)

32

By (A.2) in Assumption 5 (ii), we have X p

n−1/2

X



q

h2k + n−1/2

k=1

λk = o 

k=1

or −1/2

n

p X

h2k

h2k +

λk



k=1

λk = o (1/(nH)) .

k=1

k=1

Letting

q X

X

!2 

q

k=1

−1/2

+n

X p

p µ2 X 2 00 bc (Xi , Zi ; h) = h Q (Xi , Zi ) 2 k=1 k τ,k

and bd (Xi , Zi ; λ) =

q X X z1 ∈Dz

f (Xi , z1 ) [Qτ (Xi , z1 ) − Qτ (Xi , Zi )] , Ik (z1 , Zi )λk  f (Xi , Zi ) k=1

we can thus prove that 1X 1X ei M2 (Xi , Zi )bc (Xi , Zi ; h) + ei M2 (Xi , Zi )bd (Xi , Zi ; λ) η η CV211 (h, λ) = n i=1 n i=1 ! ! p q X X = OP n−1/2 h2k + OP n−1/2 λk n

n

k=1

 = oP 

p X

k=1

h2k +

k=1

q X



!2 λk

+ 1/(nH) .

(B.14)

k=1

Using (B.12)–(B.14), one can show that

CV21 (h, λ) = OP

n−1/2

p X k=1

h2k + n−1/2

q X

! λk + n−1 H−1/2

 = oP 

k=1

p X k=1

h2k +

q X

!2 λk

 + 1/(nH) .

k=1

(B.15) We finally consider CV22 (h, λ). It is easy to verify that   P CV22 (h, λ) ∼ E CV22 (h, λ) (Xn , Zn )

33

(B.16)

and 1X M2 (Xi , Zi )fe (0|Xi , Zi ) n i=1 n

 P E CV22 (h, λ) (Xn , Zn ) ∼ 

Z ζ(−i) (Xi ,Zi ) wdw 0

1 X 2 ζ (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ). 2n i=1 (−i) n

P



(B.17)

Furthermore, we can prove that 1X 2 1X 2 P ζ(−i) (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ) ∼ b (Xi , Zi ; h, λ)M2 (Xi , Zi )fe (0|Xi , Zi ) + n i=1 n i=1 n

n

1X 2 σ (Xi , Zi ; h)M2 (Xi , Zi )fe (0|Xi , Zi ), n i=1 n

(B.18)

where b(Xi , Zi ; h, λ) and σ2 (Xi , Zi ; h) are defined in Section 3. The detailed proof of (B.18) will be provided later on. By (B.16)–(B.18), we have that  1 X 2 CV22 (h, λ) ∼ b (Xi , Zi ; h, λ) + σ2 (Xi , Zi ; h) M2 (Xi , Zi )fe (0|Xi , Zi ). 2n i=1 n

P

(B.19)

2 Pp Pq 2 Noting that CV22 (h, λ) has the asymptotic order of + 1/(nH), by (B.15), k=1 hk + k=1 λk CV21 (h, λ) is asymptotically negligible (compared with CV22 (h, λ)). Then, we complete the proof of Proposition 3.1 by (B.4), (B.6), (B.15), (B.19) and the Law of Large Numbers.  ej = I{ej 60} − τ and rewrite W(−i),0 (Xi , Zi ) as P ROOF OF (B.18). Recall that η W(−i),0 (Xi , Zi ) =

1X 1X ej ] Kh (Xj − Xi )Λλ (Zj , z) + [ηj (Xi , Zi ) − η η˜ j Kh (Xj − Xi )Λλ (Zj , z) n j6=i n j6=i

e(−i) (Xi , Zi ) + e ≡ b v(−i) (Xi , Zi ). Let −1 e b(−i) (Xi , Zi ) = s−1 v(−i) (Xi , Zi ). 0 (Xi , Zi )b(−i) (Xi , Zi ) and v(−i) (Xi , Zi ) = s0 (Xi , Zi )e

34

(B.20)

Hence, by (B.7) and (B.20), we readily have that 1X 2 1X 2 P ζ(−i) (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ) ∼ b (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ) + n i=1 n i=1 (−i) n

n

2X b(−i) (Xi , Zi )v(−i) (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ) + n i=1 n

1X 2 v (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ). n i=1 (−i) n

(B.21)

We next consider the three terms on the right hand side of (B.21) separately. It is easy to see that the first term leads to the asymptotic bias term, i.e., 1X 2 P 1 X 2 b(−i) (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ) ∼ b (Xi , Zi ; h, λ)M2 (Xi , Zi )fe (0|Xi , Zi ). n i=1 n i=1 n

n

(B.22)

By the uniform consistency of the nonparametric kernel smoothing (c.f., Mack and Silverman, 1982), following the proof of (B.14) above and using (A.2) in Assumption 5 (ii), we have that 1X b(−i) (Xi , Zi )v(−i) (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ) n i=1 ! p q X X = OP n−1/2 h2k + n−1/2 λk n

k=1

 = oP 

X

X q

p

k=1

k=1

h2k +



!2

+ 1/(nH) .

λk

k=1

Hence, with (B.21)–(B.23), to complete the proof of (B.18), it is sufficient for us to show that 1X 2 P 1 X 2 v(−i) (Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ) ∼ σ (Xi , Zi ; h)M2 (Xi , Zi )fe (0|Xi , Zi ). n i=1 n i=1 n

n

35

(B.23)

Observe that n X

=

i=1 n X

M2 (Xi , Zi )fe (0|Xi , Zi ) M2 (Xi , Zi )fe (0|Xi , Zi )

n X n X

 ej Kh Xj − Xi Kh (Xk − Xi ) Λλ (Zj , Zi )Λλ (Zk , Zi )e η ηk

j6=i k6=i n X n X

i=1

j6=i k6=i,j

n X

n X

M2 (Xi , Zi )fe (0|Xi , Zi )

i=1

 ej Kh Xj − Xi Kh (Xk − Xi ) Λλ (Zj , Zi )Λλ (Zk , Zi )e η ηk +

 e2j K2h Xj − Xi Λ2λ (Zj , Zi ). η

(B.24)

j6=i

With the uniform consistency result of the nonparametric kernel smoothing and the Law of Large Numbers, we have that n n X 1 X  e2j K2h (Xj − Xi ) Λ2λ (Zj , Zi ) M (X , Z )f (0|X , Z ) η 2 i i e i i n3 i=1 j6=i " #   n n 1 X 2 1 X X − X j i e = η M2 (Xi , Zi )fe (0|Xi , Zi )K2 Λ2λ (Zj , Zi ) n2 H j=1 j nH i6=j h

ν0 X 2 e f(Xj , Zj )M2 (Xj , Zj )fe (0|Xj , Zj ) η n2 H j=1 j n

P



τ(1 − τ)ν0 1 X · f(Xi , Zi )M2 (Xi , Zi )fe (0|Xi , Zi ). nH n i=1 n

P



(B.25)

By some standard calculations, one can easily show that n n X n X  1 X  ej Kh Xj − Xi Kh (Xk − Xi ) Λλ (Zj , Zi )Λλ (Zk , Zi )e M (X , Z )f (0|X , Z ) η ηk 2 i i i i e 3 n i=1 j6=i k6=i,j    = OP n−3/2 H−1 + n−1 = oP (nH)−1 . (B.26)

The proof of (B.18) is completed by using (B.24)–(B.26) and noting that P

s0 (Xi , Zi ) ∼ fxz (Xi , Zi )fe (0|Xi , Zi ) uniformly over 1 6 i 6 n.



P ROOF OF T HEOREM 3.2. From the asymptotic representation of the CV function in Proposition 3.1, CV1 does not rely on the bandwidth vectors h and λ. The second term on the right hand side of (3.6) is the asymptotic leading term for the optimal bandwidth selection. Therefore, the CV

36

b and bλ satisfy that selected bandwidth vectors h b = h◦ + oP (h◦ ) and bλ = λ◦ + oP (λ◦ ), h where h◦ and λ◦ are the bandwidth vectors that minimize the second term on the right hand side of (3.6). Note that the latter equals to 12 · MSEL (h, λ), see the definition in (3.8). Under the assumption that there exist uniquely determined a0 > 0, k = 1, · · · , p, and dk > 0, k = 1, · · · , q, that minimize | | ξ (a, d) defined in (3.9), we must have that h◦ = h = h1 , · · · , hp and λ◦ = λ = λ1 , · · · , λq with hk = ak · n−1/(p+4) and λk = dk · n−2/(p+4) . The completes the proof of Theorem 3.2.  P ROOF OF T HEOREM 4.1. We only provide the detailed proof of (4.2) for the case of mixed discrete and continuous regressors as the proof of (4.1) is very similar (and in fact simpler). Let ∆0 (x0 , z0 ; h, λ) = (nH) and

1/2

h i b Qτ (x0 , z0 ; h, λ) − Qτ (x0 , z0 )



 b 0 (x0 , z0 ; h, λ) − h1 Q0 (x0 , z0 ) h1 Q τ,1 τ,1   1/2  . . , . . ∆1 (x0 , z0 ; h, λ) = (nH)  ..  0 0 b hp Q (x0 , z0 ; h, λ) − hp Q (x0 , z0 ) τ,p

τ,p

where we make the dependence of the local linear quantile estimators on the bandwidths h and λ explicit. We first derive the asymptotic normality for ∆0 (x0 , z0 ; h, λ) and ∆1 (x0 , z0 ; h, λ) when the bandwidths h and λ are chosen as the deterministic optimal bandwidths defined in Theorem 3.2, Q i.e., h = h , λ = λ and H = pk=1 hk . Following the arguments in the proof of Proposition 3.1, we have the following representation: "

∆0 (x0 , z0 ; h , λ ) ∆1 (x0 , z0 ; h , λ )

#

P

  ∼ −S−1  (x0 , z0 )Wn (x0 , z0 ; h , λ ),

(B.27) |

where Wn (x0 , z0 ; h , λ ) = [W0 (x0 , z0 ; h , λ ), W1 (x0 , z0 ; h , λ ), · · · , Wp (x0 , z0 ; h , λ )] with n H X ηi (x0 , z0 )Kh (Xi − x0 )Λλ (Zi , z0 ) W0 (x0 , z0 ; h , λ ) = √ nH i=1 



and   n H X Xi,k − x0 Wk (x0 , z0 ; h , λ ) = √ ηi (x0 , z0 ) Kh (Xi − x0 )Λλ (Zi , z0 ), k = 1, · · · , p. h0k nH i=1 



Thus, to establish the asymptotic distribution theory of ∆0 (x0 , z0 ; h , λ ) and ∆1 (x0 , z0 ; h , λ ), we 37

only need to derive the limiting distribution of Wn (x0 , z0 ; h , λ ). ei = I{ei 60} − τ. Let f Wn (x0 , z0 ; h , λ ) be defined as Wn (x0 , z0 ; h , λ ) with ηi (x0 , z0 ) replaced by η Then, we have Wn (x0 , z0 ; h , λ ) − E [Wn (x0 , z0 ; h , λ )] h i = f Wn (x0 , z0 ; h , λ ) − E f Wn (x0 , z0 ; h , λ ) + Wn (x0 , z0 ; h , λ ) − f Wn (x0 , z0 ; h , λ ) − h i     f E Wn (x0 , z0 ; h , λ ) − Wn (x0 , z0 ; h , λ ) . (B.28) Note that h i Var Wn (x0 , z0 ; h , λ ) − f Wn (x0 , z0 ; h , λ ) 

2 

    6 E Wn (x0 , z0 ; h , λ ) − f Wn (x0 , z0 ; h , λ )  

2

    f = E E Wn (x0 , z0 ; h , λ ) − Wn (x0 , z0 ; h , λ ) (Xn , Zn ) n h i H X 2 ei )2 (Xn , Zn ) E Kh (Xi − x0 )Λ2λ (Zi , z0 )E (ηi (x0 , z0 ) − η = O n i=1    ! n 1 X X − x i 0 E K2 Λ2λ (Zi , z0 ) = o = o(1), nH i=1 h

!

(B.29)

h i Wn (x0 , z0 ; h , λ ) − E f Wn (x0 , z0 ; h , λ ) where k · k denotes the Euclidean norm. This implies that f is the leading term of Wn (x0 , z0 ; h , λ ) − E [Wn (x0 , z0 ; h , λ )]. We next turn to the proof of h i d f fn (x0 , z0 ; h , λ ) −→ Wn (x0 , z0 ; h , λ ) − E W N [0p+1 , Ω (x0 , z0 )] ,

(B.30)

where Ω (x0 , z0 ) is defined in Section 4. By the Cram´er-Wold device (Billingsley, 1968) and the classical Central Limit Theorem for the i.i.d. random vectors, we can complete the proof of (B.30). In view of (B.28)–(B.30), we have that d

Wn (x0 , z0 ; h , λ ) − E [Wn (x0 , z0 ; h , λ )] −→ N [0p+1 , Ω (x0 , z0 )] . Meanwhile, note that when Xi and x0 are sufficiently close and Zi = z0 , we have 1 X 00 Qτ,k (x0 , z0 )(Xi,k − x0,k )2 ; 2 k=1 p

bi (x0 , z0 ) ≈

38

(B.31)

and when Zi = z1 6= z0 , we have bi (x0 , z0 ) ≈ Qτ (x0 , z1 ) − Qτ (x0 , z0 ). Then, by some elementary calculations, one can prove that 1     √ S−1  (x0 , z0 )E [Wn (x0 , z0 ; h , λ )] ∼ B(x0 , z0 ; h , λ ),  nH where

p X

" B(x0 , z0 ; h, λ) = b (x0 , z0 ; h, λ) , oP

k=1

h2k +

q X

#|

! λq

(B.32)

|

· 1p

,

k=1

1p is a p-dimensional column vector with all the elements being ones. By (B.31) and (B.32), we readily have that "

∆0 (x0 , z0 ; h , λ ) ∆1 (x0 , z0 ; h , λ )

# −

√ d nH B(x0 , z0 ; h , λ ) −→ N [0p+1 , Λ (x0 , z0 )] .

|

(B.33)

|

Recall that a = (a1 , · · · , ap ) and d = (d1 , · · · , dq ) with ak = hk ·n1/(4+p) and dk = λk ·n2/(4+p)  | | b b b bk · b = (b bp ) and d = d1 , · · · , dq with a bk = h as defined in Section 3. Similarly, let a a1 , · · · , a | | bk = bλk · n2/(4+p) , and let a = a , · · · , a and d = d , · · · , d with a and d n1/(4+p) and d p q k k 1 1 defined as in Theorem 3.2. Define " # ∆0 (x0 , z0 ; h, λ) ∆(x0 , z0 , a, d) = , B(x0 , z0 ; a, d) = B(x0 , z0 ; h, λ). ∆1 (x0 , z0 ; h, λ) when h = a · n−1/(4+p) and λ = d · n−2/(4+p) . Let Sa (C1 ) and Sd (C2 ) be two neighborhoods of a and d with radius C1 > 0 and C2 > 0, respectively. Similarly to the proof of (B.27), one can show that ∆(x0 , z0 , a, d) = −S−1 (B.34)  (x0 , z0 )Wn (x0 , z0 ; a, d) + oP (1) uniformly over a ∈ Sa (C1 ) and d ∈ Sd (C2 ), where Wn (x0 , z0 ; a, d) = Wn (x0 , z0 ; h, λ). Let   c Wn (x0 , z0 ; a, d) = Wn (x0 , z0 ; a, d) − E Wn (x0 , z0 ; a, d) . With (B.34), Theorem 3.2 and Theorem 3.1 in Li and Li (2010), we only have to show that  2  c c 0 0 E Wn (x0 , z0 ; a, d) − Wn (x0 , z0 ; a , d ) 6 C3 · ka − a0 k2 ,

39

(B.35)

where a, a0 ∈ Sa (C1 ), d, d0 ∈ Sd (C2 ) and C3 is a positive constant, and   √ S−1 nH · B(x0 , z0 ; a, d) = o(1)  (x0 , z0 )E Wn (x0 , z0 ; a, d) −

(B.36)

uniformly over a ∈ Sa (C1 ) and d ∈ Sd (C2 ). The proof of (B.36) is straightforward as its left hand wide is non-random and one can use the standard Taylor expansion of the quantile regression function to prove this result. We therefore only need to prove (B.35). Without loss of generality, we consider p = 1 and define  $i (x0 , z0 ; a, d) = ηi (x0 , z0 )K

Xi − x0 h

 Λλ (Zi , z0 )

with h = a · n−1/(4+p) and λ = d · n−2/(4+p) . Note that

0

0

2



E [$i (x0 , z0 ; a, d) − $i (x0 , z0 ; a , d )] !     2 Z q q X X  2  x − x0 x − x0 = E ηi (x0 , z0 )|Xi = x, Zi = z0 K −K dx + O λk + λ0k h h0 k=1 k=1 ! Z q q X X  2  2 = h E ηi (x0 , z0 )|Xi = x0 + hw, Zi = z0 [K (w) − K (hw/h0 )] dw + O λk + λ0k k=1 0

2

0 2

0 2

6 C4 h [1 − (h/h )] = C4 [a/(a ) ](a − a ) ,

k=1

(B.37)

where h0 = a0 · n−1/(4+p) and λ0 = d0 · n−2/(4+p) and C4 is a positive constant. Similarly, (B.37) also holds when we define     Xi − x0 Xi − x0 $i (x0 , z0 ; a, d) = ηi (x0 , z0 ) K Λλ (Zi , z0 ). h h Using (B.37) and following the argument in the proof of Example 4.1 in Li and Li (2010), we readily have (B.35). This completes the proof of Theorem 3.3.  P ROOF OF T HEOREM 4.2. As in the proof of Theorem 4.1, we only consider the case of mixed b be the discrete and continuous regressors. Without loss of generality, we let p = 1, h and h deterministic optimal and CV selected bandwidths, respectively. Define the set M (h, λ) as h − h | | < , M (h, λ) = (h, λ ) : h As in Theorem 3.2, we let a and d = d1 , · · · , dq

|

40



λk − λk λ < , k = 1, · · · , q . k satisfy h = a · n−1/5 and λ = d · n−2/5 and

define

 | | M (a, d) = (a, d ) : |a − a | < , |dk − dk | < , k = 1, · · · , q .

By Theorem 3.2, for any  > 0, we readily have that P



b bλ| h,

|

   | b| ∈ M (a, d) → 1, b, d ∈ M (h, λ) → 1 and P a

(B.38)

b are defined such that h b=a b · n−2/5 . b and d b · n−1/5 and bλ = d where a b τ (x, z; a, d) = Q b τ (x, z; h, λ), where h = an−1/5 , λ = dn−2/5 . For notational simplicity, we let Q With (B.38), it is sufficient to prove that P

b sup Q (x, z; a, d) − Q (x, z) > C5 · ιn τ τ

sup

! → 0,

(B.39)

| | (a,d ) ∈M (a,d) x∈S

where C5 is a sufficiently large positive constant and ιn = n−2/5 log1/2 n. Combining the proof of Example 2.1 in Li and Li (2010) and the arguments in the proof of (B.7), we readily have the following Bahadur-type representation: b τ (x, z; a, d) − Qτ (x, z) = −s−1 (x, z)wn (x, z; a, d)(1 + oP (1)), Q 0

(B.40)

uniformly over x ∈ S and (a, d ) ∈ M (a, d), where s0 (x, z) = µ0 fxz (x, z)fe (0|x, z), | |

1 X ηi (x, z)K wn (x, z; a, d) = nh i=1 n



Xi − x h



Λλ (Zi , z), h = an−1/5 , λ = dn−2/5 .

By Assumption 3 (ii), s0 (x, z) is strictly larger than a positive constant uniformly over x and z. Meanwhile, following the proof of (B.32), we may prove that sup | |

sup |E [wn (x, z; a, d)]| = O(n−2/5 ) = o(ιn ).

(B.41)

(a,d ) ∈M (a,d) x∈S

Hence, by (B.40) and (B.41), in order to prove (B.39), we only need to show that sup

sup |wn (x, z; a, d) − E [wn (x, z; a, d)]| = OP (ιn ).

(B.42)

| | (a,d ) ∈M (a,d) x∈S

Consider covering the set S by some disjoint intervals S1 , · · · , SL1 and covering the set M (a, d) | | by some disjoint sets M1 , · · · , ML2 . Denote the center points of Sl and Ml by xl and (al , dl ) , respectively. Letting the radius of Sl and Ml be of orders ιn n−2/5 and ιn n−1/5 , respectively, the

41

2/5 1/5 q+1 orders of L1 and L2 are ι−1 and (ι−1 ) , respectively. Note that n n n n

sup |wn (x, z; a, d) − E [wn (x, z; a, d)]|

sup

| | (a,d ) ∈M (a,d) x∈S

6

max

max |wn (xl1 , z; al2 , dl2 ) − E [wn (xl1 , z; al2 , dl2 )]| +

max

max

16l1 6L1 16l2 6L2

16l1 6L1 16l2 6L2

max

sup

max

16l1 6L1 16l2 6L2

sup |wn (x, z; a, d) − wn (xl1 , z; al2 , dl2 )| +

| | (a,d ) ∈Ml2 x∈Sl1

sup

sup |E [wn (x, z; a, d)] − E [wn (xl1 , z; al2 , dl2 )]| .

(B.43)

| | (a,d ) ∈Ml2 x∈Sl1

As the radius of Sl and Ml has the order of ιn n−2/5 and ιn n−1/5 , respectively, by the smoothness condition on K(·) in Assumption 1 and following the proof of (B.37), we may show that max

max

16l1 6L1 16l2 6L2

sup |wn (x, z; a, d) − wn (xl1 , z; al2 , dl2 )| = OP (ιn )

(B.44)

sup |E [wn (x, z; a, d)] − E [wn (xl1 , z; al2 , dl2 )]| = O(ιn ).

(B.45)

sup

| | (a,d ) ∈Ml2 x∈Sl1

and max

max

16l1 6L1 16l2 6L2

sup

| | (a,d ) ∈Ml2 x∈Sl1

By the Bonferroni inequality and Bernstein inequality for independent sequence (e.g. Lin and Bai, 2011), we can prove that  P 6

max

max |wn (xl1 , z; al2 , dl2 ) − E [wn (xl1 , z; al2 , dl2 )]| > C5 ιn



16l1 6L1 16l2 6L2

L1 X L2 X

P (|wn (xl1 , z; al2 , dl2 ) − E [wn (xl1 , z; al2 , dl2 )]| > C5 ιn )

l1 =1 l2 =1

6 O (L1 · L2 · exp {−C6 log n}) = o(1),

(B.46)

where C6 would be a sufficiently large positive constant if C5 is large enough. Hence, we have max

max |wn (xl1 , z; al2 , dl2 ) − E [wn (xl1 , z; al2 , dl2 )]| = OP (ιn ).

16l1 6L1 16l2 6L2

(B.47)

By (B.43)–(B.45) and (B.47), we can prove (B.42), completing the proof of (4.6) in Theorem 4.2. 

42

C

Estimators of b(x0, z0; h, λ) and V(x0, z0)

In this appendix, we discuss the construction of the estimators of b(x0 , z0 ; h, λ) and V(x0 , z0 ) for the case of mixed discrete and continuous regressors. The construction of the asymptotic bias and variance is similar for the case of purely continuous regressors. Following the discussion in Section 4, the estimator of b(x0 , z0 ; h, λ) can be obtained by replacing unknown functions by their consistent estimates as follows p q i X X fb (x0 , z1 ) h b µ2 X b 2 b 00 b b b b b Ik (z1 , z0 )λk b(x0 , z0 ; h, λ) = h Q (x0 , z0 ) + Qτ (x0 , z1 ) − Qτ (x0 , z0 ) , 2 k=1 k τ,k fb (x0 , z0 ) z ∈D k=1 z

1

where fb (x0 , z0 ) = fbe (0|x0 , z0 )fbxz (x0 , z0 ) with Pn fbe (0|x0 , z0 )

=

fbxz (x0 , z0 ) =

K (b e )K (X − x0 )Λλ (Zi , z0 ) i=1 Pnh0 i h i , i=1 Kh (Xi − x0 )Λλ (Zi , z0 ) n X

1 nH

Kh (Xi − x0 )Λλ (Zi , z0 ),

(C.1) (C.2)

i=1

b τ (Xi , Zi ; h, b bλ), the bandwidth h0 in (C.1) can be determined bi  /h0 ), b ei = Yi − Q Kh0 (b ei ) = h−1 0 K (e by the rule of thumb: h0 = seb n−1/(5+p) with seb being the sample standard deviation of {b ei }n i=1 , 00 th b b b Qτ,k (x0 , z0 ) is the k diagonal element of the p × p matrix Γ, and Γ is the matrix that, along with α and β, minimizes the following objective function: n X

  | | ρτ Yi − α − (Xi − x0 ) β − (1/2)(Xi − x0 ) Γ(Xi − x0 ) Kh (Xi − x0 )Λλ (Zi , z0 ).

i=1

The consistent estimator of V(x0 , z0 ) is given by τ(1 − τ)νp0 b V(x0 , z0 ) = h , i2  b b fe (0|x0 , z0 ) fxz (x0 , z0 ) where fbe (0|x0 , z0 ) and fbxz (x0 , z0 ) are defined in (C.1) and (C.2), respectively.

D

Weak convergence of quantile estimation uniformly over τ

In this appendix, we discuss extension of the weak convergence result developed in Qu and Yoon (2015) from the case of continuous regressors to the case of mixed continuous and discrete 43

regressors. As suggested by Qu and Yoon (2015), we first combine the local linear smoothing method introduced in Section 3 with the linear interpolation technique to obtain quantile regression estimates for all τ ∈ T ≡ [τ, τ], where 0 < τ < τ < 1. Specifically, consider a set of grid points (with equal distance) arranged in the increasing order: {τ0 , τ1 , · · · , τrn } with τ0 = τ and τrn = τ, where rn is a positive integer which may be divergent to infinity as n increases. For each given τk , 0 6 k 6 rn , we minimize the loss function in (3.3) with respect to α, and obtain the local linear b τ (x0 , z0 ; hτ , λτ ), where estimate of the τk -quantile regression function Qτk (x0 , z0 ), denoted by Q k k k we make the dependence of hτ and λτ on τ explicitly. Then, we apply the technique of linear b τ (x0 , z0 ; hτ , λτ ), i.e., interpolation between Q k k k b τ (x0 , z0 ; hτ , λτ ) + τ − τk · Q b τ (x0 , z0 ; hτ , λτ ) b  (x0 , z0 ) = τk+1 − τ · Q Q τ k k k k+1 k+1 k+1 τk+1 − τk τk+1 − τk

(D.1)

for τk 6 τ 6 τk+1 . To ensure monotonicity of the estimated quantile regression curves (with b  (x0 , z0 ) defined in (D.1), respect to τ), we may further apply the re-arrangement technique to Q τ see, for example, Chernozhukov, Fern´andez-Val and Galichon (2010). Let ei (τ) = Yi − Qτ (Xi , Zi ) and fe,τ (·|x0 , z0 ) be the conditional density function of ei (τ) given Xi = x0 and Zi = z0 . To generalize Qu and Yoon (2015)’s result to our setting, we need some additional smoothness condition on Qτ (x0 , z0 ) and fe,τ (·|x0 , z0 ) (with respect to τ) as well as some mild restriction on the bandwidths hτ and λτ . A SSUMPTION 6. (i) The conditional density of ei (τ) = 0 for given Xi = x0 and Z0 = z0 , fe,τ (0|x0 , z0 ) is Lipschitz continuous and bounded away from zero and infinity uniformly over T. (ii) The quantile regression function Qτ (x0 , z0 ), its first-order derivative (with respect to τ) and Q00τ (x0 , z0 ) are finite and Lipschitz continuous over T. A SSUMPTION 7. (i) There exist two functions of τ: a(τ) and d(τ) such that hτ = a(τ)n−1/(4+p) 1p and λτ = d(τ)n−2/(4+p) 1q , where 1r is an r-dimensional vector with each component being one. In addition, a(τ) and d(τ) are positive, bounded and Lipschitz continuous over T. (ii) The number rn satisfies that rn /n1/(p+4) → ∞. Combining Assumptions 1, 2, 3 (ii), 4 (ii), 6 and 7, we can easily verify Assumptions 1-5 in Qu and Yoon (2015). Following the proof of Theorem 1 in their paper, we may prove the following Bahadur representation uniformly over τ ∈ T: h i p b τ (x0 , z0 ; hτ , λτ ) − Qτ (x0 , z0 ) − b(x0 , z0 ; hτ , λτ ) = s−1 (x0 , z0 )wn,τ (x0 , z0 ; hτ , λτ ) + oP (1), nHτ Q τ (D.2)

44

Q where x0 is an interior point of Dx , Hτ = pj=1 hj,τ (with hj,τ being the j-th component of hτ ), b(x0 , z0 ; hτ , λτ ) is defined as in Section 3, sτ (x0 , z0 ) = µ0 fxz (x0 , z0 )fe,τ (0|x0 , z0 ), and wn,τ (x0 , z0 ; hτ , λτ ) = −(nHτ )

−1/2

n X

 ei,τ K η

i=1

Xi − x0 hτ

 ei,τ = I{ei (τ)60} − τ. Λλτ (Zi , z0 ), η

Note that the involvement of the discrete kernel in quantile regression estimation only affects the asymptotic bias term (c.f., Theorem 4.1). Using (D.2) and following the proof of Theorem 2 in Qu and Yoon (2015), we can prove the following weak convergence result. P ROPOSITION D.1. Suppose that Assumptions 1, 2, 3 (ii), 4 (ii), 6 and 7 are satisfied. Then, p

h i  b nHτ f1/2 (x , z )f (0|x , z ) Q (x , z ; h , λ ) − Q (x , z ) − b(x , z ; h , λ ) ⇒ G(τ), 0 0 e,τ 0 0 τ 0 0 τ τ τ 0 0 0 0 τ τ xz

(D.3)

where G(τ) is a continuous Gaussian process over τ ∈ T with mean zero and covariance function defined by Z E [G(τ1 )G(τ2 )] = (τ1 ∧ τ2 − τ1 τ2 ) [κ(τ1 )κ(τ2 )]

1/2

 K

w κ(τ1 )



 K

w κ(τ2

 dw, κ(τ) = a(τ)/a(0.5).

A natural question is whether the above weak convergence result is still valid if we replace b τ and bλτ . Note that Proposition D.1 holds when the hτ and λτ by the CV selected bandwidths h theoretically optimal bandwidths hτ and λτ are used. Following the arguments in the proof of Theorem 2.1 in Li and Li (2010), we may extend Proposition D.1 to the more general scenario, using the data-dependent CV selected bandwidths as long as

and

b j,τ − h h j,τ = oP (1), j = 1, · · · , p  hj,τ

(D.4)

bλj,τ − λ j,τ = oP (1), j = 1, · · · , q,  λj,τ

(D.5)

b j,τ and h are the j-th components of h b τ and h , respectively, and uniformly over τ ∈ T, where h τ j,τ similarly bλj,τ and λj,τ are the j-th components of bλτ and λτ , respectively. However, it is quite challenging to prove (D.4) and (D.5) using the techniques developed in the present paper. For example, we would have to prove the asymptotic expansion of the CV function in Proposition 3.1 uniformly over τ. We will leave this open question in our future research.

45

References Belloni, A. and Chernozhukov, V. (2011). l1 -penalized quantile regression in high-dimensional sparse models. Annals of Statistics, 39, 82–130. Belloni, A., Chernozhukov, V. and Fern´andez-Val, I. (2011). Conditional quantile processes based on series or many regressors. Working Paper available at https://arxiv.org/pdf/1105.6154v1.pdf. Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. Cai, Z. (2002). Regression quantile for time series. Econometric Theory, 18, 169–192. Cai, Z. and Xiao, Z. (2012). Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. Journal of Econometrics, 167, 413–425. Cai, Z. and Xu, X. (2008). Nonparametric quantile estimations for dynamic smooth coefficient models. Journal of the American Statistical Association, 103, 1596–1608. Cheng, M. and Peng, L. (2002). Regression modeling for nonparametric estimation and distribution and quantile functions. Statistica Sinica, 12, 1043–1060. Chernozhukov, V., Fern´andez-Val, I. and Galichon, A. (2010). Quantile and probability curves without crossing. Econometrica, 78, 1093–1125. Dellarocas, C., Zhang, X. and Awad, N. F. (2007). Exploring the value of online product reviews in forecasting sales: The case of motion pictures. Journal of Interactive marketing, 21(4), 23–45. Duan, W., Gu, B. and Whinston, A. B. (2008). Do online reviews matter? - An empirical investigation of panel data. Decision support systems, 45(4), 1007–1016. Escanciano, J. C. and Goh, C. (2014). Specification analysis of linear quantile models. Journal of Econometrics, 178, 495–507. Escanciano, J. C. and Velasco, C. (2010). Specification tests of parametric dynamic conditional quantiles. Journal of Econometrics, 159, 209–221. Fan J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall. Galvao, A. (2011) Quantile regression for dynamic panel data with fixed effects. Journal of Econometrics, 164, 142-157. Galvao, A., Lamarche, C. and Lima, L. (2013). Estimation of censored quantile regression for panel data with fixed effects. Journal of the American Statistical Association, 108, 1075-1089.

46

Gilchrist, D. S. and Sands, E. G. (2016). Something to talk about: Social spillovers in movie consumption. Journal of Political Economy, 124(5), 1339–1382. Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Monographs on Statistics and Applied Probability 58, Chapman and Hall. H¨ardle, W. and Song, S. (2010). Confidence bands in quantile regression. Econometric Theory, 26, 1180–1200. H¨ardle, W. and Vieu P. (1992). Kernel regression smoothing of time series. Journal of Time Series Analysis, 13, 209–224. Hall, P., Lahiri, S. N. and Polzehl, J. (1995). Bandwidth choice in nonparametric regression with both shortand long-range dependent errors. Annals of Statistics, 23, 1921–1936. Hall, P., Li, Q. and Racine, J. (2007). Nonparametric estimation of regression functions in the presence of irrelevant regressors. The Review of Economics and Statistics, 89, 784–789. Hall, P., Racine, J. and Li, Q. (2004). Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association, 99, 1015–1026. Hallin, M., Lu, Z. and Yu, K. (2009). Local linear spatial quantile regression. Bernoulli, 15, 659–686. Horowitz, J. (2009). Semiparametric and Nonparametric Methods in Econometrics. Springer Series in Statistics, Springer. Kai, B., Li, R. and Zou, H. (2001). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Annals of Statistics, 39, 305–332. Koenker, R. (2005). Quantile Regression. Cambridge University Press. Koenker, R. and Bassett, G. W. (1978). Regression quantiles. Econometrica, 46, 33–50. Koenker, R., Chernozhukov, V., He, X. and Peng L. (2017). Handbook of Quantile Regression. Chapman and Hall/CRC. Koenker, R. and Xiao, Z. (2006). Quantile autoregression. Journal of the American Statistical Association, 101, 980–990. Knight, K. (1998). Limiting distributions for L1 regression estimators under general conditions. Annals of Statistics, 26, 755–770. Leung, D. (2005). Cross-validation in nonparametric regression with outliers. Annals of Statistics, 33, 2291– 2310.

47

Li, D. and Li, Q. (2010). Nonparametric/semiparametric estimation and testing of econometric models with data dependent smoothing parameters. Journal of Econometrics, 157, 179–190. Li, D., Simar, L. and Zelenyuk, V. (2016). Generalized nonparametric smoothing with mixed discrete and continuous data. Computational Statistics and Data Analysis, 100, 424–444. Li, Q., Lin, J. and Racine, J. (2013). Optimal bandwidth selection for nonparametric conditional distribution and quantile functions. Journal of Business and Economic Statistics 31, 57–65. Li, Q. and Racine, J. (2003). Nonparametric estimation of distributions with categorical and continuous data. Journal of Multivariate Analysis, 86, 266–292. Li, Q. and Racine, J. (2004). Cross-validated local linear nonparametric regression. Statistica Sinica, 14, 485–512. Li, Q. and Racine, J. (2007). Nonparametric Econometrics: Theory and Practice. Princeton University Press. Li, Q. and Racine, J. (2008). Nonparametric estimation of conditional CDF and quantile functions with mixed categorical and continuous data. Journal of Business & Economic Statistics, 26, 423–434. Li, Q., Racine, J. and Wooldridge, J. M. (2009). Efficient estimation of average treatment effect with mixed categorical and continuous data. Journal of Business and Economic Statistics, 26, 423–434. Li, Q. and Zhou, R. (2005). The uniqueness of cross-validation selected smoothing parameters in kernel estimation of nonparametric models. Econometric Theory 21, 1017–1025 Lin, Z. and Bai, Z. (2011). Probability Inequalities. Springer. Mack, Y. P. and Silverman, B. W. (1982). Weak and strong uniform consistency for kernel regression estimates. Zeitschrift fur Wahrscheinlichskeittheorie und verwandte Gebiete, 61, 405–415. Pagan, A. and Ullah, A. (1999). Nonparametric Econometrics. Cambridge University Press. Park, B., Simar, L. and Zelenyuk, V. (2015). Categorical data in local maximum likelihood: theory and applications to productivity analysis. Journal of Productivity Analysis, 43, 199–214. Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186–199. Qu, Z. and Yoon, J. (2015). Nonparametric estimation and inference on conditional quantile processes. Journal of Econometrics 185, 1–19. Racine, J. and Li, Q. (2004). Nonparametric estimation of regression functions with both categorical and continuous data. Journal of Econometrics, 119, 99–130.

48

Racine, J. and Li, K. (2017). Nonparametric conditional quantile estimation: A locally weighted quantile kernel approach. Journal of Econometrics, 201, 72–94. Rice, J. (1984). Bandwidth choice for nonparametric regression. Annals of Statistics, 12,1215–1230. Spokoiny, V., Wang, W. and H¨ardle, W. (2013). Local quantile regression. Journal of Statistical Planning and Inference, 143, 1109–1129. Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall. Xia, Y. and Li, W. K. (2002). Asymptotic behavior of bandwidth selected by cross-validation method under dependence. Journal Multivariate Analysis, 83, 265–287. Yu, K. and Jones, M. C. (1998). Local linear quantile regression. Journal of the American Statistical Association, 93, 228–238. Yu, K. and Lu, Z. (2004). Local linear additive quantile regression. Scandinavian Journal of Statistics, 31, 333–346. Zhou, Z. (2010). Nonparametric inference of quantile curves for nonstationary time series. Annals of Statistics, 38, 2187–2217. Zhu, X., Wang, W., Wang, H. and H¨ardle, W. (2018). Network quantile autoregression. Forthcoming in Journal of Econometrics.

49