Estimation of a Nonparametric Censored Regression Model

Songnian Chen
Hong Kong University of Science and Technology

Shakeeb Khan
University of Rochester

July 2000 (Preliminary Draft)

Abstract

In this paper we consider identification and estimation of a censored nonparametric location-scale model. We first show that when the location function is strictly less than the (fixed) censoring point for all values in the support of the explanatory variables, the location function is not identified anywhere. In contrast, if the location function is greater than or equal to the censoring point with positive probability, then the location function is identified on the entire support, including the region where it lies below the censoring point. In the latter case we propose a simple estimation procedure based on combining conditional quantile estimators at three distinct quantiles. The new estimator is shown to converge at the optimal nonparametric rate with a limiting normal distribution. A small scale simulation study indicates that the proposed estimation procedure performs well in finite samples.

JEL Classification: C14, C23, C24.
Key Words: censored regression, nonparametric quantile regression, location-scale model.

*Corresponding author. Department of Economics, University of Rochester, Rochester, NY 14627; e-mail: [email protected]. We are grateful to B.E. Honore, A. Lewbel, O.B. Linton, and J.L. Powell for their helpful comments.

1 Introduction

The nonparametric location-scale model is usually of the form:

$$y_i = \mu(x_i) + \sigma(x_i)\epsilon_i \quad (1.1)$$

where $x_i$ is an observed $d$-dimensional random vector and $\epsilon_i$ is an unobserved random variable, distributed independently of $x_i$ and assumed to be centered around zero in some sense. The functions $\mu(\cdot)$ and $\sigma(\cdot)$ are unknown. This location-scale model has received a great deal of attention in the statistics and econometrics literature,¹ and existing nonparametric methods such as kernel, local polynomial, and series estimators can be used to estimate $\mu(\cdot)$ from a random sample of observations of the vector $(y_i, x_i')'$. In this paper, we consider extending the nonparametric location-scale model to accommodate censored data.² Censoring occurs in many types of economic data, either because of non-negativity constraints or top coding. To allow for censoring, we work within the latent dependent variable framework, as is typically done for parametric and semiparametric models. We thus consider a model of the form:

$$y_i^* = \mu(x_i) + \sigma(x_i)\epsilon_i \quad (1.2)$$
$$y_i = \max(y_i^*, 0) \quad (1.3)$$

where $y_i^*$ is a latent dependent variable, which is only observed if it exceeds the fixed censoring point, which we assume without loss of generality is 0. We consider identification and estimation of $\mu(x_i)$ after imposing the location restriction that the median of $\epsilon_i$ is 0. We emphasize that our results allow for identification of $\mu(x_i)$ on the entire support of $x_i$. This is in contrast to identifying and estimating $\mu(x_i)$ only in the region where it exceeds the censoring point, which could be easily done by extending Powell's (1984) CLAD estimator to a nonparametric setting.

¹See for example Fan and Gijbels (1996), Chapter 3, and Ruppert and Wand (1994).
²Semiparametric (fixed) censored regression models, where $\mu(x_i)$ is known up to a finite-dimensional parameter, have been studied extensively in the econometrics literature. Estimators of the finite-dimensional parameter have been proposed by Powell (1984, 1986a,b), Horowitz (1986, 1988), Moon (1989), Nawata (1990), Honore and Powell (1994), Buchinsky and Hahn (1998), Khan and Powell (1999), and Chen and Khan (1999a,b), under various restrictions on the error structure. The advantage of our nonparametric approach here is that economic theory rarely provides any guidance on functional forms in relationships between variables.


Our work is motivated by the fact that there are often situations where the econometrician is interested in estimating the location function in the region where it is less than the censoring point. For example, when the data set is heavily censored, $\mu(x_i)$ will be less than the censoring point for a large portion of the support of $x_i$, making estimation at these points necessary for meaningful inference regarding its shape. Another example would be estimating a demand curve in the presence of firm capacity constraints. A firm would be interested in estimating the demand curve for its product in the region where quantity demanded exceeds its production capacity, in order to determine whether increasing capacity would increase profits. Our approach is based on a structural relationship between the conditional median and upper quantiles which holds for observations where $\mu(x_i) \geq 0$. This relationship can be used to motivate an estimator for $\mu(x_i)$ in the region where it is negative. Our results are thus based on the condition

$$P_X(x_i : \mu(x_i) \geq 0) > 0 \quad (1.4)$$

where $P_X(\cdot)$ denotes the probability measure of the random variable $x_i$. Variations of censored nonparametric models have been studied elsewhere in the literature. Lewbel and Linton (1999) estimate a nonparametric censored regression model with a fixed censoring point that is based on a mean restriction on the disturbance term. As conditional mean restrictions are generally not sufficient for identification in censored regression models (see Powell (1994)), their approach requires much stronger conditions on the tail behavior of the random variables $\epsilon_i$ and $x_i$ than assumed here. Van Keilegom and Akritas (1999) consider a nonparametric censored regression model where the censoring variable is a random variable whose value is only observed for censored observations. However, their procedure cannot accommodate the fixed censoring case (which arises more often in economic applications), and it is also based on much stronger conditions on the tail behavior of the disturbance term. The paper is organized as follows. The next section explains the key identification condition, and motivates a way to estimate the function $\mu(\cdot)$ at each point in the support of $x_i$. Section 3 introduces the new estimation procedure and establishes the asymptotic properties of the estimator when the identification condition is satisfied. Section 4 considers extensions of the estimation procedure to estimate the distribution of the disturbance term and obtain dimensionality reduction. Section 5 explores the finite sample properties of the estimator through the results of a simulation study. Section 6 concludes by summarizing the results and

discussing extensions for future research. An appendix contains proofs of the theorems.

2 Identification of the Location Function

In this section we consider conditions necessary for identifying $\mu(\cdot)$ on $\mathcal{X}$, the support of $x_i$. Our identification results are based on the following assumptions:

I1 The regressor support $\mathcal{X}$ is a compact subset of $R^d$.

I2 The disturbance term $\epsilon_i$ is distributed independently of $x_i$, and has a density function with respect to Lebesgue measure that is positive and bounded on $R$.

I3 $\epsilon_i$ has median 0.

I4 The scale function $\sigma(\cdot)$ is continuous, strictly positive, and bounded on $\mathcal{X}$.

I5 The location function $\mu(\cdot)$ is continuous on $\mathcal{X}$.

Remark 2.1 The median restriction in Assumption I3 is different from the usual zero mean assumption imposed in the location-scale model. Censoring introduces a non-linearity which makes identification of $\mu(\cdot)$ impossible without further assumptions on $\epsilon_i$. In contrast, as medians are equivariant to monotonic transformations, identification of $\mu(\cdot)$ will still be possible. The equivariance property of medians was first exploited by Powell (1984) in estimating a semiparametric censored regression model.

The first result is that the location function is not identified anywhere on $\mathcal{X}$ if $\mu(\cdot) < 0$ everywhere on $\mathcal{X}$. Its proof is left to the appendix.

Theorem 2.1 (Necessity) Suppose Assumptions I1-I5 hold, and that $\max_{x \in \mathcal{X}} \mu(x) < 0$. Then there exists a function $\tilde{\mu}(\cdot)$ and a random variable $\tilde{\epsilon}_i$, where Assumptions I2-I5 still hold with $\tilde{\mu}(\cdot), \tilde{\epsilon}_i$ replacing $\mu(\cdot), \epsilon_i$ respectively, such that if we define $\tilde{y}_i = \max(\tilde{\mu}(x_i) + \sigma(x_i)\tilde{\epsilon}_i, 0)$, then

$$\mathcal{L}(y_i | x_i) = \mathcal{L}(\tilde{y}_i | x_i) \quad \forall x_i \in \mathcal{X}$$

where $\mathcal{L}(y_i | x_i)$ denotes the conditional distribution of $y_i$ given $x_i$.


Remark 2.2 The conclusion of the theorem is analogous to the full rank condition for the semiparametric censored regression model discussed in Powell (1984, 1986a). In that context, a necessary condition for identification of the parameter of interest $\beta_0$ was a sufficient number of observations in the support of $x_i$ satisfying the condition $x_i'\beta_0 \geq 0$.

Our next result establishes the sufficiency of (1.4) for identification of $\mu(\cdot)$ at every point in $\mathcal{X}$. The proof of the theorem suggests a natural estimator of $\mu(\cdot)$, so it is included in the main text.

Theorem 2.2 (Sufficiency) Suppose Assumptions I1-I5 hold, and condition (1.4) holds. Then $\mu(\cdot)$ is identified for all $x \in \mathcal{X}$.

Proof: Throughout, let $c_\alpha$ denote the $\alpha$-quantile of $\epsilon_i$ and $q_\alpha(x)$ the conditional $\alpha$-quantile of $y_i$ given $x_i = x$, so that $q_\alpha(x) = \max(\mu(x) + c_\alpha\sigma(x), 0)$. We show identification sequentially. We first show identification at all points where $\mu(\cdot)$ is nonnegative. We then show how identification of $\mu$ in this range of the support of $x_i$ can be used to identify $\mu$ where it is negative.

To show identification in the nonnegative region, we let $x_0$ be any point which satisfies $\mu(x_0) \geq 0$. Suppose first that $\mu(x_0) = 0$. We will show that $\tilde{\mu}(x_0) < 0$ or $\tilde{\mu}(x_0) > 0$ leads to a contradiction. If $\tilde{\mu}(x_0) = -\delta < 0$, let $\tilde{\sigma}(x_0)$ be a positive, finite number. We note by Assumption I2 that $c_\alpha$, when viewed as a function of $\alpha$, is continuous on $[0,1]$ and has bounded derivative on any compact subset of $(0,1)$. Thus if we let $\tilde{\epsilon}_i$ denote an alternative error term, it must follow that $\tilde{c}_{0.5} = 0$ and $0 < \tilde{c}_\alpha < \delta/\tilde{\sigma}(x_0)$ for $\alpha \in (0.5, 0.5 + \eta)$, where recall $\delta = -\tilde{\mu}(x_0)$ and $\eta$ is an arbitrarily small positive constant. Noting that $c_\alpha > 0$ for $\alpha \in (0.5, 0.5 + \eta)$, we have for $\alpha \in (0.5, 0.5 + \eta)$:

$$q_\alpha(x_0) = \max(\mu(x_0) + c_\alpha\sigma(x_0), 0) = \max(c_\alpha\sigma(x_0), 0) > 0 \quad (2.1)$$

Alternatively we have:

$$\tilde{q}_\alpha(x_0) = \max(\tilde{\mu}(x_0) + \tilde{\sigma}(x_0)\tilde{c}_\alpha, 0) \quad (2.2)$$
$$\leq \max(-\delta + \delta, 0) = 0 \quad (2.3)$$

Thus we have found quantiles where $q_\alpha(x_0) \neq \tilde{q}_\alpha(x_0)$, which shows that $\mu(x_0) = 0$ is distinguishable from negative alternatives. A similar argument can be used to show that it is distinguishable from positive alternatives, establishing its identification. It is even simpler to show that points where $\mu(\cdot) > 0$ are identified. If $\mu(x_0) > 0$ and $\tilde{\mu}(x_0) \neq \mu(x_0)$, then $q_{0.5}(x_0) = \mu(x_0)$ and $\tilde{q}_{0.5}(x_0) = \max(\tilde{\mu}(x_0), 0) \neq \mu(x_0)$.

We next show how to identify $\mu(x)$ when $\mu(x) < 0$, given that we have identified $\mu(x_0)$ for $\mu(x_0) \geq 0$. We first note that since $\mu(x)$ and $\sigma(x)$ are finite by Assumptions I5 and I4 respectively, there exist quantiles $\alpha_1 < \alpha_2 < 1$ such that:

$$\mu(x) + c_{\alpha_i}\sigma(x) > 0 \quad i = 1, 2$$

Thus we have the relationships:

$$q_{\alpha_1}(x) = \mu(x) + c_{\alpha_1}\sigma(x) \quad (2.4)$$
$$q_{\alpha_2}(x) = \mu(x) + c_{\alpha_2}\sigma(x) \quad (2.5)$$

which imply the following:

$$\Delta q(x) = \Delta c\,\sigma(x) \quad (2.6)$$
$$\bar{q}(x) = \mu(x) + \bar{c}\,\sigma(x) \quad (2.7)$$

where $\Delta q(x) = q_{\alpha_2}(x) - q_{\alpha_1}(x)$, $\bar{q}(x) = (q_{\alpha_2}(x) + q_{\alpha_1}(x))/2$, $\Delta c = c_{\alpha_2} - c_{\alpha_1}$, and $\bar{c} = (c_{\alpha_2} + c_{\alpha_1})/2$. Combining the two previous relationships, if we could identify the fraction $\bar{c}/\Delta c$, then we could identify $\mu(x)$ as:

$$\mu(x) = \bar{q}(x) - \frac{\bar{c}}{\Delta c}\Delta q(x) \quad (2.8)$$

We use identification of $\mu(x_0) \geq 0$ to identify $\bar{c}/\Delta c$ in the following manner. We combine the following values of the conditional quantile function evaluated at the three distinct quantiles $0.5, \alpha_1, \alpha_2$:

$$q_{0.5}(x_0) = \mu(x_0) \quad (2.9)$$
$$q_{\alpha_1}(x_0) = q_{0.5}(x_0) + c_{\alpha_1}\sigma(x_0) \quad (2.10)$$
$$q_{\alpha_2}(x_0) = q_{0.5}(x_0) + c_{\alpha_2}\sigma(x_0) \quad (2.11)$$

This enables us to identify $\bar{c}/\Delta c$ as

$$\frac{\bar{c}}{\Delta c} = \frac{\bar{q}(x_0) - q_{0.5}(x_0)}{\Delta q(x_0)} \quad (2.12)$$

which immediately translates into identification of $\mu(x)$ from the relationship:

$$\mu(x) = \bar{q}(x) - \frac{\bar{q}(x_0) - q_{0.5}(x_0)}{\Delta q(x_0)}\,\Delta q(x) \quad (2.13)$$


This completes the proof of the theorem.
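As a numerical illustration of the identification argument (not part of the original paper), the following sketch computes the exact conditional quantiles of the censored model for a simple design with standard normal errors and verifies that formula (2.13) recovers $\mu(x)$ at a point where $\mu(x) < 0$, using an anchor point $x_0$ with $\mu(x_0) > 0$; the particular functions and quantile levels are illustrative choices only.

```python
import numpy as np
from scipy.stats import norm

mu = lambda x: x                        # example location function
sigma = lambda x: np.exp(0.15 * x)      # example scale function

def q(alpha, x):
    """True conditional alpha-quantile of y = max(mu(x) + sigma(x)*eps, 0)."""
    return max(mu(x) + norm.ppf(alpha) * sigma(x), 0.0)

a1, a2 = 0.80, 0.90                     # quantiles with q_{a1}(x), q_{a2}(x) > 0
x, x0 = -0.5, 0.5                       # mu(x) < 0, while mu(x0) > 0

dq = lambda z: q(a2, z) - q(a1, z)               # Delta q(z)
qbar = lambda z: 0.5 * (q(a1, z) + q(a2, z))     # bar q(z)

# Formula (2.13): mu(x) = qbar(x) - [(qbar(x0) - q_0.5(x0)) / dq(x0)] * dq(x)
mu_x = qbar(x) - (qbar(x0) - q(0.5, x0)) / dq(x0) * dq(x)
print(round(mu_x, 6), mu(x))            # both equal -0.5
```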

Remark 2.3 Identification at all points first involves identification of a point where $\mu(x) \geq 0$. As is apparent from the proof, identification is much simpler for points where $\mu(x) > 0$, and we note that the argument for identification of a point where $\mu(x) = 0$ would be difficult to translate into an estimator. In the next section, where we propose an estimator for $\mu(\cdot)$ based on our identification results, we therefore assume $P_X(x_i : \mu(x_i) > 0) > 0$.

Remark 2.4 Identification of $\mu(\cdot)$ where it is negative involves identification of the quantiles of the homoskedastic component of the disturbance term. Thus an additional consequence of condition (1.4) being satisfied is that the quantiles of $\epsilon_i$ are identified for all $\alpha \geq \alpha_0 \equiv \inf\{\alpha : \sup_{x \in \mathcal{X}} q_\alpha(x) > 0\}$. This result can be used to estimate and construct hypothesis tests regarding the distribution of $\epsilon_i$, as is considered in Section 4. We also note that if the econometrician were to impose a distributional form on $\epsilon_i$, the (known) values of $c_{\alpha_1}, c_{\alpha_2}$ could be used in (2.8) to identify and estimate the location function, without requiring condition (1.4).

3 Estimation Procedure and Asymptotic Properties

3.1 Estimation Procedure

In this section we consider estimation of the function $\mu(\cdot)$. Our procedure is based on the identification results of the previous section, and involves nonparametric quantile regression at different quantiles and different points in the support of the regressors. Our asymptotic arguments are based on the local polynomial estimator for conditional quantile functions introduced in Chaudhuri (1991a,b). For expositional ease, we only describe this nonparametric estimator for a polynomial of degree 0, and refer readers to Chaudhuri (1991a,b), Chaudhuri et al. (1997), Chen and Khan (1999a,b), and Khan (1999) for the additional notation involved for polynomials of arbitrary degree. First, we assume the regressor vector $x_i$ can be partitioned as $(x_i^{(ds)}, x_i^{(c)})$, where the $d_{ds}$-dimensional vector $x_i^{(ds)}$ is discretely distributed, and the $d_c$-dimensional vector $x_i^{(c)}$ is continuously distributed.

We let $C_n(x_i)$ denote the cell of observation $x_i$ and let $h_n$ denote the sequence of bandwidths which govern the size of the cell. For some observation $x_j$, $j \neq i$, we let $x_j \in C_n(x_i)$ denote that $x_j^{(ds)} = x_i^{(ds)}$ and $x_j^{(c)}$ lies in the $d_c$-dimensional cube centered at $x_i^{(c)}$ with side length $2h_n$. Let $I[\cdot]$ be an indicator function, taking the value 1 if its argument is true, and 0 otherwise. Our estimator of the conditional $\alpha$th quantile function at a point $x_i$ for any $\alpha \in (0,1)$ involves $\alpha$-quantile regression (see Koenker and Bassett (1978)) on observations which lie in the defined cells of $x_i$. Specifically, let $\hat{\theta}$ minimize:

$$\sum_{j=1}^{n} I[x_j \in C_n(x_i)]\,\rho_\alpha(y_j - \theta) \quad (3.1)$$

where $\rho_\alpha(\lambda) \equiv \alpha|\lambda| + (2\alpha - 1)\lambda I[\lambda < 0]$. Our estimation procedure will be based on a random sample of $n$ observations of the vector $(y_i, x_i')'$ and involves applying the local polynomial estimator at three stages. Throughout our description, a hat will denote estimated values.
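As a concrete illustration (a minimal sketch, not the authors' code), minimizing (3.1) over a constant $\theta$ amounts to taking the empirical $\alpha$-quantile of the $y_j$ whose regressors fall in the cell $C_n(x_i)$; the function below assumes NumPy arrays and a single bandwidth for all continuous components.

```python
import numpy as np

def local_constant_quantile(y, x_c, x_ds, point_c, point_ds, alpha, h):
    """Degree-0 local quantile estimate at a point, in the spirit of (3.1).

    y: (n,) responses; x_c: (n, d_c) continuous regressors; x_ds: (n, d_ds)
    discrete regressors; point_c, point_ds: the evaluation point; alpha:
    quantile level; h: bandwidth (half side length of the cube).
    """
    # Cell C_n(x): matching discrete components, continuous ones within h.
    in_cell = np.all(np.abs(x_c - point_c) <= h, axis=1) & \
              np.all(x_ds == point_ds, axis=1)
    if not in_cell.any():
        return np.nan
    # The minimizer of the check-function sum over a constant is the empirical
    # alpha-quantile of the selected responses (up to np.quantile's
    # interpolation convention).
    return np.quantile(y[in_cell], alpha)
```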

1. Local Constant Estimation of the Conditional Median Function. In the first stage, we estimate the conditional median at each point in the sample, using a polynomial of degree 0. We let $h_{1n}$ denote the bandwidth sequence used in this stage. Following the terminology of Fan (1992), we refer to this as a local constant estimator, and denote the estimated values by $\hat{q}_{0.5}(x_i)$. Recalling that our identification result is based on observations for which the median function is positive, we assign weights to these estimated values using a weighting function, denoted by $w(\cdot)$. Essentially, $w(\cdot)$ assigns weight 0 to observations in the sample for which the estimated value of the median function is 0, and assigns positive weight to estimated values which are positive.

2. Weighted Average Estimation of the Disturbance Quantiles. In the second stage, the unknown quantiles $c_{\alpha_1}, c_{\alpha_2}$ are estimated (up to the scalar constant $\Delta c$) by a weighted average of local polynomial estimators of the quantile functions at the higher quantiles $\alpha_1, \alpha_2$. The estimator of these constants is based on (2.12). In this stage, we use a polynomial of degree $k$, and denote the second stage bandwidth sequence by $h_{2n}$. We let $\hat{c}_1, \hat{c}_2$ denote the estimators of the unknown constants $c_{\alpha_1}/\Delta c$, $c_{\alpha_2}/\Delta c$, and define them as:

$$\hat{c}_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} \tau(x_i)w(\hat{q}_{0.5}(x_i))\,\frac{\hat{q}_{\alpha_1}(x_i) - \hat{q}_{0.5}^{(p)}(x_i)}{\hat{q}_{\alpha_2}(x_i) - \hat{q}_{\alpha_1}(x_i)}}{\frac{1}{n}\sum_{i=1}^{n} \tau(x_i)w(\hat{q}_{0.5}(x_i))} \quad (3.2)$$

$$\hat{c}_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} \tau(x_i)w(\hat{q}_{0.5}(x_i))\,\frac{\hat{q}_{\alpha_2}(x_i) - \hat{q}_{0.5}^{(p)}(x_i)}{\hat{q}_{\alpha_2}(x_i) - \hat{q}_{\alpha_1}(x_i)}}{\frac{1}{n}\sum_{i=1}^{n} \tau(x_i)w(\hat{q}_{0.5}(x_i))} \quad (3.3)$$

where $\tau(x_i)$ is a trimming function whose support, denoted by $\mathcal{X}_\tau$, is a compact set which lies strictly in the interior of $\mathcal{X}$. The trimming function serves to eliminate "boundary effects" that arise in nonparametric estimation. We use the superscript $(p)$ to distinguish the estimator of the median function in this stage from that in the first stage.

3. Local Polynomial Estimation at the Point of Interest. The third stage is based on (2.13). Letting $x$ denote the point at which the function $\mu(\cdot)$ is to be estimated, we combine the local polynomial estimator, with polynomial order $k$ and bandwidth sequence $h_{3n}$, of the conditional quantile function at $x$ using quantiles $\alpha_1, \alpha_2$, with the estimator of the unknown disturbance quantiles, to yield the estimator of $\mu(x)$:

$$\hat{\mu}(x) = \hat{c}_2\hat{q}_{\alpha_1}(x) - \hat{c}_1\hat{q}_{\alpha_2}(x) \quad (3.4)$$

Remark 3.1 We note here that a different order polynomial is used in the first stage than in the other two stages. The reason for this is that even though the functions $\mu(\cdot), \sigma(\cdot)$ are assumed to be $k$-times differentiable, the quantile functions will not in general be smooth at the censoring point. Thus a local polynomial estimator may not be consistent when the quantile function is in a neighborhood of the censoring point. However, once points in the sample at which the median function exceeds the censoring point are "selected" in the first stage, the quantile functions at these points are sufficiently smooth for the local polynomial estimators to be used in the second and third stages.
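To fix ideas, here is a rough sketch (not the authors' implementation) of the three stages for a single continuous regressor. For simplicity it uses a degree-0 cell quantile everywhere, whereas the paper uses a degree-$k$ local polynomial in the second and third stages, and it replaces the smooth weight $w(\cdot)$ and the trimming function $\tau(\cdot)$ with crude indicator stand-ins; the cutoff `delta` and the trimming rule are illustrative assumptions.

```python
import numpy as np

def cell_quantile(y, x, x0, alpha, h):
    """Degree-0 cell quantile (stand-in for the local polynomial fit)."""
    in_cell = np.abs(x - x0) <= h
    return np.quantile(y[in_cell], alpha) if in_cell.any() else np.nan

def censored_np_estimate(y, x, point, a1, a2, h1, h2, h3, delta=0.05):
    # Stage 1: conditional median at each sample point (local constant).
    q50 = np.array([cell_quantile(y, x, xi, 0.5, h1) for xi in x])
    w = (q50 > delta).astype(float)                           # stand-in for w(.)
    tau = (np.abs(x) <= np.quantile(np.abs(x), 0.9)).astype(float)  # trimming

    # Stage 2: weighted averages estimating c_{a1}/Dc and c_{a2}/Dc, as in (3.2)-(3.3).
    qa1 = np.array([cell_quantile(y, x, xi, a1, h2) for xi in x])
    qa2 = np.array([cell_quantile(y, x, xi, a2, h2) for xi in x])
    q50p = np.array([cell_quantile(y, x, xi, 0.5, h2) for xi in x])
    sel = tau * w
    keep = (sel > 0) & (qa2 > qa1)        # guard against zero denominators
    c1_hat = np.sum(sel[keep] * (qa1[keep] - q50p[keep]) / (qa2[keep] - qa1[keep])) / np.sum(sel)
    c2_hat = np.sum(sel[keep] * (qa2[keep] - q50p[keep]) / (qa2[keep] - qa1[keep])) / np.sum(sel)

    # Stage 3: quantiles at the point of interest, combined as in (3.4).
    return c2_hat * cell_quantile(y, x, point, a1, h3) \
         - c1_hat * cell_quantile(y, x, point, a2, h3)
```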

3.2 Asymptotic Properties

In this section we establish the asymptotic properties of our estimation procedure. Our results are based on the following assumptions:

Assumption ID (Identification) The weighting function is positive with positive probability:

$$P_X(\tau(x_i)w(q_{0.5}(x_i)) > 0) > 0$$

and the $\alpha_1$ quantile at the point of interest is positive:

$$q_{\alpha_1}(x) > 0$$

Assumption RS (Random Sampling) The sequence of $(d+1)$-dimensional vectors $(y_i, x_i)$ is independent and identically distributed.

Assumption WF (Weighting Function Properties) The weighting function $w(\cdot) : R \to R^+$ has the following properties:

WF.1 $w(\cdot) \in [0,1]$ and is continuously differentiable with bounded derivative.

WF.2 $w \equiv 0$ if its argument is less than $\delta$, an arbitrarily small positive constant.

Assumption RD (Regressor Distribution) We let $f_{X^{(c)}|X^{(ds)}}(\cdot|x^{(ds)})$ denote the conditional density function of $x_i^{(c)}$ given $x_i^{(ds)} = x^{(ds)}$, and assume it is bounded away from 0 and infinity on $\mathcal{X}$. We let $f_{X^{(ds)}}(\cdot)$ denote the mass function of $x_i^{(ds)}$, and assume a finite number of mass points on $\mathcal{X}$. Also, we let $f_X(\cdot)$ denote $f_{X^{(c)}|X^{(ds)}}(\cdot|\cdot)f_{X^{(ds)}}(\cdot)$.

Assumption ED (Disturbance Density) The disturbance term $\epsilon_i$ is assumed to have a continuous distribution with density function that is bounded, positive, and continuous on $R$.

Assumption OS (Orders of Smoothness) For some $\gamma \in (0,1]$, and any real valued function $F$ of $x_i$, we adopt the notation $F \in C^\gamma(\mathcal{X})$ to mean there exists a positive constant $K < \infty$ such that $|F(x_1) - F(x_2)| \leq K\|x_1 - x_2\|^\gamma$ for all $x_1, x_2 \in \mathcal{X}$. With this notation, we assume the following smoothness conditions:

OS.1 $f_X(\cdot), \tau(\cdot) \in C^\gamma(\mathcal{X})$.

OS.2 $\mu(\cdot)$ and $\sigma(\cdot)$ are differentiable in $x_i^{(c)}$ of order $k$, with $k$th order derivatives in $C^\gamma(\mathcal{X})$. We let $p = k + \gamma$ denote the order of smoothness of these functions.

Assumption BC (Bandwidth Conditions) The bandwidths used in each of the three stages are assumed to satisfy the following conditions:

BC.1 $h_{1n}$ satisfies $\frac{\log n}{nh_{1n}^{d_c}} \to 0$ and $n^{\frac{p}{2p+d_c}}h_{1n}^2 \to 0$.

BC.2 $h_{2n}$ satisfies $\frac{\log n}{nh_{2n}^{d_c}} \to 0$ and $n^{\frac{p}{2p+d_c}}h_{2n}^p \to 0$.

BC.3 $h_{3n}$ is of the form $h_{3n} = \kappa_0 n^{-\frac{1}{2p+d_c}}$, where $\kappa_0$ is a positive constant.

Remark 3.3 The bandwidth sequences h1n,h2n ; h3n in Assumption BC are required to sat-

isfy di erent conditions. The conditions on h1n and h2n in Assumptions BC.1, BC.2, re ect \undersmoothing", implying that the bias of the nonparametric estimators used in the rst two stages converges to 0 at a faster rate than the standard deviation. In contrast, Assumption BC.3 imposes the optimal rate for h3n , so that the estimator of () will converge at the optimal nonparametric rate.

We now characterize the limiting distribution for the proposed estimator of (x), where x is assumed to lie in the interior of the support of xi. The following theorem establishes that the proposed estimator converges at the optimal nonparametric rate, and has a limiting non-centered normal distribution. The proof is left to the appendix.

Theorem 3.1 If Assumptions ID,RS,WF,RD,ED,OS,BC hold, then p

n p dc (^(x) ? (x)) ) N (B; V )

(3.5)

2 +

where

c2 V = (c)2f (q (x)jx)2 1 (1 ? 1 ) Y jX c2 + (c)2f (q (x)jx)2 2 (1 ? 2 ) Y jX ? (c)2f (q 2(cx )jcx )f (q (x)jx) 1(1 ? 2) Y jX Y jX 2

1

1

2

2

1

1

2

10

(3.6)

with fY jX () denoting the conditional density function of yi. The form of the limiting bias requires introducing new notation. For any quantile , we let ?  q  n x(c) + th3n; x(c); x(ds)

denote the kth order Taylor polynomial approximation of ?  q x(c) + th3n; x(ds)

where here t is a dc-dimensional vector of constants, and h3n is as de ned in Assumption BC.3. We de ne

B = nlim !1

q

nhd3nc



Z

?  ?  q x(c) + th3n; x(ds) ? q  n x(c) + th3n; x(c) ; x(ds) dt

? 1 1 dc ; 2 2

[ ]

The limting bias of the proposed estimator is of the form

B = c c B ? c c B 2

(3.7)

1

1

2

4 Extensions

In this section we informally consider two extensions of the estimation procedure. First, we propose an estimator of the distribution of the homoskedastic component of the disturbance term. Second, we consider the dimensionality reduction that can be obtained by imposing additive separability on the location function. A rigorous pursuit of these extensions is left to future work.

4.1 Estimating the Distribution of $\epsilon_i$

As mentioned in Section 2, the distribution of the random variable $\epsilon_i$ is identified for all quantiles exceeding $\alpha_0 \equiv \inf\{\alpha : \sup_{x\in\mathcal{X}} q_\alpha(x) > 0\}$. In this section we consider estimation of these quantiles, and the asymptotic properties of the estimator. Estimating the distribution of $\epsilon_i$ is of interest for two reasons. First, the econometrician may be interested in estimating the entire model, which would require estimators of $\sigma(x_i)$ and the distribution of $\epsilon_i$ as well as of $\mu(x_i)$. Second, the estimator can be used to construct tests of various parametric forms of the distribution of $\epsilon_i$, and the results of these tests could then be used to adopt a (local) likelihood approach to estimating the function $\mu(x_i)$. Before proceeding, we note that the distribution of $\epsilon_i$ is only identified up to scale, and we impose the scale normalization that $c_{0.75} - c_{0.25} \equiv 1$.³ We also assume without loss of generality that $\alpha_0 \leq 0.25$. To estimate $c_\alpha$ for any $\alpha \geq \alpha_0$, we let $\alpha^- = \min(\alpha, 0.5)$ and define our estimator as

$$\hat{c}_\alpha = \frac{\frac{1}{n}\sum_{i=1}^{n}\tau(x_i)w(\hat{q}_{\alpha^-}(x_i))\left(\hat{q}_\alpha(x_i) - \hat{q}_{0.5}^{(p)}(x_i)\right)}{\frac{1}{n}\sum_{i=1}^{n}\tau(x_i)w(\hat{q}_{\alpha^-}(x_i))\left(\hat{q}_{0.75}(x_i) - \hat{q}_{0.25}(x_i)\right)} \quad (4.8)$$

The proposed estimator, which involves averaging nonparametric estimators, will converge at the parametric ($\sqrt{n}$) rate and have a limiting normal distribution, as can be rigorously shown using arguments similar to those found in Chen and Khan (1999b).

³Alternatively, one could adopt the normalization $c_{0.75} - c_{0.25} \equiv 1.38$, which is the interquartile range of the standard normal distribution.

4.2 Dimensionality Reduction

One of the difficulties that arises when implementing nonparametric procedures in practice is their imprecision when the dimension of the regressors is high. This "curse of dimensionality" has been well documented in the econometrics and statistics literature; see for example Hardle and Linton (1994). One solution to this problem has been to impose additive separability on the regression function; see for example Breiman and Friedman (1985), Buja et al. (1989), Andrews and Whang (1990), Newey (1994), Linton and Nielsen (1995), and Horowitz (2000). We consider extending this restriction to the problem at hand by assuming the location function is of the form:

$$\mu(x_i) = \mu_0 + \mu_1(x_i^{(1)}) + \mu_2(x_i^{(2)}) + \cdots + \mu_d(x_i^{(d)})$$

where the superscripts denote components of the vector $x_i$ and $\mu_0$ denotes an unknown constant. One approach to estimating the individual functions separately would be to take "partial means" of the estimator discussed in the previous section, analogous to the approach suggested in Newey (1994) and Linton and Nielsen (1995) for the uncensored nonparametric regression model. To do so we first impose the following location normalizations to identify the separate functions:

$$E[\mu_l(x_i^{(l)})] = 0 \quad l = 1, 2, \ldots, d \quad (4.9)$$

Then we simply average the value of the previous estimator evaluated at the sample points in the following manner. If, for example, the function $\mu_1(\cdot)$ is to be estimated at some point $x^{(1)}$ that lies in the support of $x_i^{(1)}$, it could be estimated as:

$$\hat{\mu}_1(x^{(1)}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{\mu}(x^{(1)}, x_i^{(-1)}) - \hat{\mu}(x_i)\right) \quad (4.10)$$

where $x_i^{(-1)}$ denotes all the remaining components of $x_i$. By following arguments analogous to those used in Newey (1994), this estimator can be shown to converge at the one-dimensional nonparametric rate.
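A minimal sketch of the partial-mean step in (4.10) follows (illustrative only; `mu_hat` is assumed to be a fitted version of the Section 3 estimator that accepts a full regressor vector):

```python
import numpy as np

def partial_mean_component(mu_hat, x, x1_value):
    """Estimate mu_1 at x1_value as in (4.10), averaging over the sample."""
    terms = []
    for xi in x:                         # x is an (n, d) array of regressors
        xi_swapped = xi.copy()
        xi_swapped[0] = x1_value         # replace the first component only
        terms.append(mu_hat(xi_swapped) - mu_hat(xi))
    return float(np.mean(terms))
```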

5 Monte Carlo Results

In this section the finite sample properties of the proposed estimator are explored by way of a small scale simulation study. We simulated from designs of the form:

$$y_i = \max(\mu(x_i) + \sigma(x_i)\epsilon_i, 0)$$

where $x_i$ was a random variable distributed uniformly between $-1$ and 1, $\epsilon_i$ was distributed standard normal, and the scale function $\sigma(x_i)$ was set to $e^{0.15x_i}$. We considered four different functional forms for $\mu(x_i)$ in our study:

1. $\mu(x) = x$
2. $\mu(x) = x^2 - C_1$
3. $\mu(x) = 0.5\,x^3$
4. $\mu(x) = e^x - C_2$

where the constants $C_1, C_2$ were chosen so that the censoring level was 50%, as it was for the other two designs.
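A minimal data-generating sketch for these designs follows; the shift constants $C_1, C_2$ are computed numerically here so that the censoring probability equals 0.50, since the paper does not report their values.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = lambda x: np.exp(0.15 * x)

def censoring_level(mu, shift):
    # P(y* <= 0) = E[Phi(-(mu(x) - shift)/sigma(x))] with x ~ U[-1, 1].
    x = np.linspace(-1.0, 1.0, 2001)
    return np.mean(norm.cdf(-(mu(x) - shift) / sigma(x)))

# Shift constants chosen so that the censoring level is 50%.
C1 = brentq(lambda c: censoring_level(lambda x: x ** 2, c) - 0.5, -2.0, 2.0)
C2 = brentq(lambda c: censoring_level(np.exp, c) - 0.5, -2.0, 5.0)

designs = {
    "design 1": lambda x: x,
    "design 2": lambda x: x ** 2 - C1,
    "design 3": lambda x: 0.5 * x ** 3,
    "design 4": lambda x: np.exp(x) - C2,
}

def simulate(n, mu):
    x = rng.uniform(-1.0, 1.0, n)
    y_star = mu(x) + sigma(x) * rng.standard_normal(n)
    return x, np.maximum(y_star, 0.0)    # fixed censoring at zero
```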

We adopted the following data-driven method to select the quantile pair. For a given point $x$, we note that the estimator requires that $q_{\alpha_1}(x), q_{\alpha_2}(x)$ both be strictly positive for identification, requiring that the quantiles be sufficiently close to 1. On the other hand, efficiency concerns would suggest that the quantiles not be at the extreme, as the quantile regression estimator becomes imprecise. We thus let the probability of being censored, or the "propensity score" (see Rosenbaum and Rubin (1983)), govern the choice of quantiles for estimating the function $\mu(\cdot)$ at the point $x$. Letting $d_i$ denote an indicator function which takes the value 1 if an observation is uncensored, we note that

$$1 - E[d_i|x_i = x] = F_\epsilon\left(\frac{-\mu(x)}{\sigma(x)}\right)$$

where $F_\epsilon(\cdot)$ denotes the c.d.f. of $\epsilon_i$. Letting $\alpha^* = F_\epsilon\left(\frac{-\mu(x)}{\sigma(x)}\right)$, we note that

$$q_{\alpha^*}(x) = \max(\mu(x) + c_{\alpha^*}\sigma(x), 0) = \max\left(\mu(x) + \frac{-\mu(x)}{\sigma(x)}\sigma(x), 0\right) = 0$$

Thus if one knew the propensity score value, identification would require that $\alpha^*$ be a lower bound for the choice of quantile pair. The propensity score can be easily estimated using kernel methods, suggesting an estimator of $\alpha^*$:

$$\hat{\alpha}^* = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n} d_i\tilde{K}(x^{(c)} - x_i^{(c)})I[x^{(d)} = x_i^{(d)}]}{\frac{1}{n}\sum_{i=1}^{n} \tilde{K}(x^{(c)} - x_i^{(c)})I[x^{(d)} = x_i^{(d)}]}$$

where $\tilde{K}(\cdot) = \tilde{h}^{-d_c}K(\cdot/\tilde{h})$, $\tilde{h}$ is a bandwidth sequence, and $K(\cdot)$ is a kernel function. Our proposed choice of quantile pair takes into account this lower bound as well as the efficiency loss of estimating quantiles at the extreme. We set:

$$\alpha_1 = \frac{2\hat{\alpha}^* + 1}{3} \qquad \alpha_2 = \frac{2 + \hat{\alpha}^*}{3}$$

which divides the interval $[\hat{\alpha}^*, 1]$ into three equal parts. In implementing this procedure in the Monte Carlo study, the propensity scores were estimated using a normal kernel function and a bandwidth of $n^{-1/5}$.
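A sketch of this quantile-pair rule for a scalar continuous regressor follows (the Gaussian kernel matches the normal kernel used in the simulations, the default bandwidth is the paper's $n^{-1/5}$, and the kernel's normalizing constant is omitted since it cancels in the ratio):

```python
import numpy as np

def quantile_pair(y, x, x0, h=None):
    """Data-driven (alpha_1, alpha_2) at the point x0, following Section 5."""
    n = len(y)
    h = n ** (-1.0 / 5.0) if h is None else h
    d = (y > 0).astype(float)                     # uncensored indicator d_i
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)        # Gaussian kernel weights
    alpha_hat = 1.0 - np.sum(d * k) / np.sum(k)   # estimated censoring propensity
    a1 = (2.0 * alpha_hat + 1.0) / 3.0            # splits [alpha_hat, 1]
    a2 = (2.0 + alpha_hat) / 3.0                  #   into three equal parts
    return a1, a2
```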

For the quantile estimators, a local constant was fit in the first stage, using a bandwidth of $n^{-1/5}$, and a local linear estimator was used in the second and third stages, using a bandwidth of the form $\kappa n^{-1/5}$. The constant $\kappa$ was selected using the "rule of thumb" approach detailed on page 202 of Fan and Gijbels (1996). Figures 1-4 are based on sample sizes of $n = 100$ and $n = 400$, with 401 replications. The function $\mu(\cdot)$ was estimated at 100 equispaced points, and the figures plot the average value of the estimated function, denoted by $m(x)$, alongside the true function. Also reported (in parentheses) are the average mean squared errors (AMSE) for the estimator. As indicated by the figures, the results are largely as expected. For $n = 100$, the estimator performs very well at points where $\mu(x) \geq 0$, and is further from the truth the further $\mu(x)$, in its negative range, is from 0. The estimator performs much better for $n = 400$, where it is close to the true function value on the entire support. For both sample sizes, the estimator performs better in terms of AMSE for $\mu(x) = x^2 - C_1$ and $\mu(x) = e^x - C_2$. This is because the location function is negative with smaller probability than for the other two designs. While our results are very encouraging in general, we would expect poorer finite sample performance when more regressors are present, as the rate of convergence would be slower.

6 Conclusions

This paper has established conditions for nonparametric identification of the location function in a censored regression model. An estimation procedure was proposed and shown to have desirable asymptotic properties. The procedure is simple to implement, as it is based on various quantiles of the conditional distribution of the dependent variable, and can be computed by linear programming methods. The results in this paper suggest areas for future research. First, the asymptotic properties of the extensions discussed in Section 4 need to be formally derived. Furthermore, a more formal data-driven approach needs to be developed for selection of the quantiles used in the second and third stages, and the asymptotics of such an approach need to be derived.


References

[1] Andrews, D.W.K. and Y.-J. Whang (1990) "Additive and Interactive Regression Models: Circumvention of the Curse of Dimensionality", Econometric Theory, 6, 466-479.
[2] Buchinsky, M. and J. Hahn (1998) "An Alternative Estimator for the Censored Quantile Regression Model", Econometrica, 66, 653-672.
[3] Buja, A., Hastie, T., and R. Tibshirani (1989) "Linear Smoothers and Additive Models (with Discussion)", Annals of Statistics, 17, 454-455.
[4] Breiman, L. and J.H. Friedman (1985) "Estimating Optimal Transformations for Multiple Regression and Correlation (with Discussion)", Journal of the American Statistical Association, 80, 580-619.
[5] Chaudhuri, P. (1991a) "Nonparametric Quantile Regression", Annals of Statistics, 19, 760-777.
[6] Chaudhuri, P. (1991b) "Global Nonparametric Estimation of Conditional Quantiles and their Derivatives", Journal of Multivariate Analysis, 39, 246-269.
[7] Chaudhuri, P., K. Doksum, and A. Samarov (1997) "On Average Derivative Quantile Regression", Annals of Statistics, 25, 715-744.
[8] Chen, S. and S. Khan (1999a) "Semiparametric Estimation of a Partially Linear Censored Regression Model", forthcoming, Econometric Theory.
[9] Chen, S. and S. Khan (1999b) "Estimation of Censored Regression Models in the Presence of Nonparametric Multiplicative Heteroskedasticity", forthcoming, Journal of Econometrics.
[10] Fan, J. (1992) "Design-adaptive Nonparametric Regression", Journal of the American Statistical Association, 87, 998-1004.
[11] Fan, J. and I. Gijbels (1996) Local Polynomial Modelling and its Applications, New York: Chapman and Hall.
[12] Hardle, W. and O. Linton (1994) "Applied Nonparametric Methods", in Engle, R.F. and D. McFadden (eds.), Handbook of Econometrics, Vol. 4, Amsterdam: North-Holland.
[13] Honore, B.E. and J.L. Powell (1994) "Pairwise Difference Estimators of Censored and Truncated Regression Models", Journal of Econometrics, 64, 241-278.
[14] Horowitz, J.L. (1986) "A Distribution-Free Least Squares Estimator for Censored Linear Regression Models", Journal of Econometrics, 32, 59-84.
[15] Horowitz, J.L. (1988) "Semiparametric M-Estimation of Censored Linear Regression Models", Advances in Econometrics, 7, 45-83.
[16] Horowitz, J.L. (2000) "Nonparametric Estimation of a Generalized Additive Model with an Unknown Link Function", Econometrica, forthcoming.
[17] Khan, S. (1999) "Two Stage Rank Estimation of Quantile Index Models", forthcoming, Journal of Econometrics.
[18] Khan, S. and J.L. Powell (1999) "Two-Step Quantile Estimation of the Censored Regression Model", manuscript, University of Rochester.


[19] Koenker, R. and G.S. Bassett Jr. (1978) "Regression Quantiles", Econometrica, 46, 33-50.
[20] Lewbel, A. and O.B. Linton (1999) "Nonparametric Censored Regression", unpublished manuscript.
[21] Linton, O.B. and J.P. Nielsen (1995) "Estimating Structured Nonparametric Regression by the Kernel Method", Biometrika, 83, 529-540.
[22] Moon, C.-G. (1989) "A Monte Carlo Comparison of Semiparametric Tobit Estimators", Journal of Applied Econometrics, 4, 361-382.
[23] Nawata, K. (1990) "Robust Estimation Based on Group-Adjusted Data in Censored Regression Models", Journal of Econometrics, 43, 337-362.
[24] Newey, W.K. (1994) "Kernel Estimation of Partial Means and a General Variance Estimator", Econometric Theory, 10, 233-253.
[25] Powell, J.L. (1984) "Least Absolute Deviations Estimation for the Censored Regression Model", Journal of Econometrics, 25, 303-325.
[26] Powell, J.L. (1986a) "Censored Regression Quantiles", Journal of Econometrics, 32, 143-155.
[27] Powell, J.L. (1994) "Estimation of Semiparametric Models", in Engle, R.F. and D. McFadden (eds.), Handbook of Econometrics, Vol. 4, Amsterdam: North-Holland.
[28] Rosenbaum, P.R. and D.B. Rubin (1983) "The Central Role of the Propensity Score in Observational Studies for Causal Effects", Biometrika, 70, 41-55.
[29] Ruppert, D. and M.P. Wand (1994) "Multivariate Locally Weighted Least Squares Regression", Annals of Statistics, 22, 1346-1370.
[30] Van Keilegom, I. and M.G. Akritas (1999) "Transfer of Tail Information in Censored Regression Models", forthcoming, Annals of Statistics.

A Appendix

A.1 Proof of Theorem 3.1

In this section, we prove the limiting distribution results stated in the theorem. Throughout this section, we adopt new notation. Here we let $\tau_i, \sigma_i, w_i, \hat{w}_i, \hat{q}_{0i}, \hat{q}_{1i}, \hat{q}_{2i}, q_{0i}, q_{1i}, q_{2i}, \hat{q}_0, \hat{q}_1, \hat{q}_2, q_0, q_1, q_2, C_{ni}, C_n, N_{ni}$ denote $\tau(x_i), \sigma(x_i), w(q_{0.5}(x_i)), w(\hat{q}_{0.5}(x_i)), \hat{q}_{0.5}(x_i), \hat{q}_{\alpha_1}(x_i), \hat{q}_{\alpha_2}(x_i), q_{0.5}(x_i), q_{\alpha_1}(x_i), q_{\alpha_2}(x_i), \hat{q}_{0.5}(x), \hat{q}_{\alpha_1}(x), \hat{q}_{\alpha_2}(x), q_{0.5}(x), q_{\alpha_1}(x), q_{\alpha_2}(x), C_n(x_i), C_n(x), \sum_{j\neq i} I[x_j \in C_n(x_i)]$, respectively. Noting that the conditional median function is estimated in both the first and second stages, we let $\hat{q}_{0i}^{(p)}$ denote the second stage local polynomial estimator, to distinguish it from the first stage local constant estimator. Also, we let $\hat{\mu}, \mu$ denote $\hat{\mu}(x), \mu(x)$ respectively. For a matrix $A$ with elements $\{a_{ij}\}$, we let $\|A\|$ denote $\left(\sum_{i,j} a_{ij}^2\right)^{1/2}$. We note that since we aim to prove that the estimator converges at the optimal nonparametric rate of $O_p\left(n^{-\frac{p}{2p+d_c}}\right)$, we will use the term "asymptotically negligible" when referring to remainder terms which are $o_p\left(n^{-\frac{p}{2p+d_c}}\right)$. Our proof will rely heavily on three previously established properties of the nonparametric conditional quantile estimator used. The first is a uniform rate of convergence of the local constant estimator used in the first stage. The rate is uniform over regressor values for which the conditional median function is bounded away from the censoring point. We denote this set of regressor values as $\mathcal{X}_\delta \equiv \{x_i \in \mathcal{X} : q_{0i} \geq \delta\}$.

Lemma A.1 (From Chaudhuri et al. (1997), Lemma 4.3a) Under Assumptions RS, RD, ED, OS, BC.1,

$$\sup_{x_i \in \mathcal{X}_\delta} |\hat{q}_{0i} - q_{0i}| = o_p(1)$$

The second previously established property is an exponential bound for the local constant and local polynomial estimators for regressor values in a neighborhood of the censoring point:

Lemma A.2 (From Lemma 2 in Chen and Khan (1999b)) Let $\mathcal{X}_{\delta/2}^c$ denote the set $\{x_i \in \mathcal{X} : q_{0i} \leq \delta/2\}$ and let $A_n$ denote the event

$$\{\hat{q}_{0i} \geq \delta \text{ for some } x_i \in \mathcal{X}_{\delta/2}^c\}$$

Then under Assumptions RS, RD, ED, OS, BC.1, there exist constants $C_1, C_2$ such that

$$P(A_n) \leq C_1 e^{-C_2 nh_{1n}^{d_c}}$$

The third property of the conditional quantile estimator is the local Bahadur representation developed in Chaudhuri (1991a) and Chaudhuri et al. (1997).

Lemma A.3 (From Lemmas 4.1 and 4.2 in Chaudhuri et al. (1997)) Let $q_\alpha^{*}(x_i; x)$ denote the $k$th order Taylor polynomial approximation of $q_\alpha(x_i)$ for $x_i$ close to $x$. Under Assumptions RS, RD, ED, OS, and BC, for all $\alpha \geq 0.5$ and $x$ such that $q_{0.5}(x) \geq \delta$, we have the following linear representation for the local polynomial estimator used in the second and third stages:

$$\hat{q}_\alpha(x) - q_\alpha(x) = \frac{1}{nh_{(2,3)n}^{d_c} f_{Y,X}(q_\alpha(x), x)}\sum_{i=1}^{n}\left(I[y_i \leq q_\alpha^{*}(x_i; x)] - \alpha\right)I[x_i \in C_n(x)] + R_n(x) \quad (A.1)$$

where $h_{(2,3)n}$ denotes the bandwidth used in either the second or third stage, and the remainder term satisfies:

$$\sup_{x \in \mathcal{X}} R_n(x) = o_p\left(n^{-\frac{p}{2p+d_c}}\right)$$


The main step in the proof is to show that the differences between the constants $c_{\alpha_1}/\Delta c$, $c_{\alpha_2}/\Delta c$ and their estimators $\hat{c}_1, \hat{c}_2$ are asymptotically negligible. We only show this result for the first quantile, as the same arguments can be used for the second quantile. We let $c_1^*$ denote the constant $c_{\alpha_1}/\Delta c$, let $\psi_i$ denote $\frac{q_{1i} - q_{0i}}{q_{2i} - q_{1i}}$, and let $\hat{\psi}_i$ denote its estimated value, obtained by replacing the quantile functions with their local polynomial estimators. We adopt the convention $0/0 = 0$, and define

$$c_1^\dagger = \frac{\sum_{i=1}^{n}\tau_i\hat{w}_i\,c_1^*}{\sum_{i=1}^{n}\tau_i\hat{w}_i}$$

and we note that it can easily be shown that

$$P(c_1^\dagger \neq c_1^*) \to 0$$

by Assumptions ID, WF and Lemma A.1. Thus it will suffice to show that $\hat{c}_1 - c_1^\dagger$ is asymptotically negligible. This difference is of the form:

$$\hat{c}_1 - c_1^\dagger = \frac{\frac{1}{n}\sum_{i=1}^{n}\tau_i\hat{w}_i(\hat{\psi}_i - c_1^*)}{\frac{1}{n}\sum_{i=1}^{n}\tau_i\hat{w}_i} \quad (A.2)$$

The following lemma shows that the denominator of the above expression converges in probability to a positive constant.

Lemma A.4 Under Assumptions WF, ID, ED, OS, RD, BC.1,

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i\hat{w}_i \stackrel{p}{\to} E[\tau_i w_i] \quad (A.3)$$

Proof: A mean value expansion of $\hat{w}_i$ around $w_i$ yields:

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i + \frac{1}{n}\sum_{i=1}^{n}\tau_i w_i'(\hat{q}_{0i} - q_{0i})$$

where $\tau_i w_i'$ denotes the derivative of the weighting function evaluated at an intermediate value. We can decompose the summation involving this intermediate value as:

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i'(\hat{q}_{0i} - q_{0i})I[q_{0i} \geq \delta/2] + \frac{1}{n}\sum_{i=1}^{n}\tau_i w_i'(\hat{q}_{0i} - q_{0i})I[q_{0i} < \delta/2]$$

It follows from the bound on the derivative of the weighting function and Lemmas A.1, A.2 that each of these terms is $o_p(1)$. The law of large numbers implies that $\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i \stackrel{p}{\to} E[\tau_i w_i]$. $\square$

Thus it will suffice to show that the numerator term in (A.2) is $o_p\left(n^{-\frac{p}{2p+d_c}}\right)$. To do so, we take a mean value expansion of $\hat{w}_i$ around $w_i$, yielding the terms:

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i(\hat{\psi}_i - c_1^*) \quad (A.4)$$
$$+\ \frac{1}{n}\sum_{i=1}^{n}\tau_i w_i'(\hat{q}_{0i} - q_{0i})(\hat{\psi}_i - c_1^*) \quad (A.5)$$

where $\tau_i w_i'$ again denotes the derivative of the weighting function evaluated at an intermediate value. The following lemma establishes the asymptotic negligibility of (A.4).

Lemma A.5 Under Assumptions WF, RD, ED, OS, BC.1,

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i(\hat{\psi}_i - c_1^*) = o_p\left(n^{-\frac{p}{2p+d_c}}\right) \quad (A.6)$$

Proof: Note that $\tau_i w_i c_1^* = \tau_i w_i \psi_i$. We linearize $\hat{\psi}_i - \psi_i$ to yield:

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i(\hat{\psi}_i - \psi_i) = \frac{1}{n}\sum_{i=1}^{n}\tau_i w_i (q_{2i} - q_{1i})^{-1}\left(\hat{q}_{1i} - \hat{q}_{0i}^{(p)} - (q_{1i} - q_{0i})\right) \quad (A.7)$$
$$-\ \frac{1}{n}\sum_{i=1}^{n}\tau_i w_i \frac{q_{1i} - q_{0i}}{(q_{2i} - q_{1i})^2}\left((\hat{q}_{2i} - \hat{q}_{1i}) - (q_{2i} - q_{1i})\right) \quad (A.8)$$
$$+\ R_n \quad (A.9)$$

where

$$R_n = O_p\left(\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i\left(|\hat{q}_{2i} - q_{2i}|^2 + |\hat{q}_{1i} - q_{1i}|^2 + |\hat{q}_{0i}^{(p)} - q_{0i}|^2\right)\right)$$

It follows from Lemma 4.1 in Chaudhuri et al. (1997) that

$$R_n = O_p\left(\left(\sqrt{\frac{\log n}{nh_{2n}^{d_c}}} + h_{2n}^p\right)^2\right)$$

and is thus asymptotically negligible by Assumption BC.2. The expressions in (A.7) and (A.8) are sample averages of undersmoothed conditional quantile estimators. We will thus only show that

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i(\hat{q}_{1i} - q_{1i}) = o_p\left(n^{-\frac{p}{2p+d_c}}\right) \quad (A.10)$$

as similar arguments may be used for the other terms. (A.10) follows from the same arguments used in Lemma 2 of Chen and Khan (1999b). The only difference is that in that paper, the smoothness and bandwidth conditions implied the bias term was $o_p(n^{-1/2})$, whereas in this case, using Assumptions OS and BC.2, the bias term is $o_p\left(n^{-\frac{p}{2p+d_c}}\right)$. $\square$

The following lemma shows that (A.5) is also asymptotically negligible.

Lemma A.6 Under Assumptions WF, RD, ED, OS, BC.1, BC.2,

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i'(\hat{q}_{0i} - q_{0i})(\hat{\psi}_i - c_1^*) = o_p\left(n^{-\frac{p}{2p+d_c}}\right) \quad (A.11)$$

Proof: We multiply the left hand side of the above expression by $I[q_{0i} \geq \delta/2] + I[q_{0i} < \delta/2]$, to separate the terms where the median function is bounded away from 0 from the terms where it is not. Terms where $q_{0i} < \delta/2$ are asymptotically negligible by Lemma A.2, since $\tau_i w_i' > 0 \Rightarrow \hat{q}_{0i} \geq \delta$. For the terms where $q_{0i} \geq \delta/2$, note that $c_1^* = \psi_i$, and we can apply the uniform rates of convergence in Chaudhuri (1991a,b) and Chaudhuri et al. (1997) after linearizing the difference $\hat{\psi}_i - \psi_i$ as before. We note that the uniform rates for the local constant estimator and the local polynomial estimator are different, but it follows by Assumptions BC.1, BC.2, and OS that their product is asymptotically negligible. To make this argument precise, we note from the arguments used in Lemma 4.1 of Chaudhuri et al. (1997) that the uniform rate for the local constant estimator is

$$O_p\left(\sqrt{\frac{\log n}{nh_{1n}^{d_c}}} + h_{1n}^2\right)$$

and for the local polynomial estimator it is

$$O_p\left(\sqrt{\frac{\log n}{nh_{2n}^{d_c}}} + h_{2n}^p\right)$$

Letting $\|\cdot\|_\infty$ denote $\max_{1\leq i\leq n}|\cdot|$, we note that

$$\frac{1}{n}\sum_{i=1}^{n}\tau_i w_i' I[q_{0i} \geq \delta/2](\hat{q}_{0i} - q_{0i})(\hat{\psi}_i - \psi_i)$$

is of order

$$\|\hat{q}_{0i} - q_{0i}\|_\infty\|\hat{q}_{1i} - q_{1i}\|_\infty + \|\hat{q}_{0i} - q_{0i}\|_\infty\|\hat{q}_{2i} - q_{2i}\|_\infty + \|\hat{q}_{0i} - q_{0i}\|_\infty\|\hat{q}_{0i}^{(p)} - q_{0i}\|_\infty$$

which by the stated uniform rates is

$$O_p\left(\left(\sqrt{\frac{\log n}{nh_{1n}^{d_c}}} + h_{1n}^2\right)\left(\sqrt{\frac{\log n}{nh_{2n}^{d_c}}} + h_{2n}^p\right)\right)$$

which is $o_p\left(n^{-\frac{p}{2p+d_c}}\right)$ by Assumptions OS, BC.1, BC.2. $\square$

Combining all our results, we can now replace the estimated constants with their true values:

$$\hat{\mu}(x) - \mu(x) = \frac{c_{\alpha_2}}{\Delta c}(\hat{q}_1 - q_1) - \frac{c_{\alpha_1}}{\Delta c}(\hat{q}_2 - q_2) + o_p\left(n^{-\frac{p}{2p+d_c}}\right) \quad (A.12)$$

The limiting distribution of the estimator follows from (A.1). $\square$

[Figure 1: $\mu(x) = x$. Panels: $n=100$ (AMSE 0.385) and $n=400$ (AMSE 0.123); each panel plots the average estimate $m(x)$ against the true $\mu(x)$.]

[Figure 2: $\mu(x) = x^2 - C_1$. Panels: $n=100$ (AMSE 0.281) and $n=400$ (AMSE 0.060).]

[Figure 3: $\mu(x) = 0.5\,x^3$. Panels: $n=100$ (AMSE 0.290) and $n=400$ (AMSE 0.060).]

[Figure 4: $\mu(x) = e^x - C_2$. Panels: $n=100$ (AMSE 0.450) and $n=400$ (AMSE 0.096).]