
The Annals of Statistics 1995, Vol. 23, No. 6, 1921–1936

ON BANDWIDTH CHOICE IN NONPARAMETRIC REGRESSION WITH BOTH SHORT- AND LONG-RANGE DEPENDENT ERRORS

BY PETER HALL, SOUMENDRA NATH LAHIRI AND JÖRG POLZEHL

Australian National University, Iowa State University and Konrad-Zuse-Zentrum für Informationstechnik

We analyse methods based on the block bootstrap and leave-out cross-validation, for choosing the bandwidth in nonparametric regression when errors have an almost arbitrarily long range of dependence. A novel analytical device for modelling the dependence structure of errors is introduced. This allows a concise theoretical description of the way in which the range of dependence affects optimal bandwidth choice. It is shown that, provided block length or leave-out number, respectively, are chosen appropriately, both techniques produce first-order optimal bandwidths. Nevertheless, the block bootstrap has far better empirical properties, particularly under long-range dependence.

Received May 1994; revised May 1995. AMS 1991 subject classifications. Primary 62G07, 62G09; secondary 62M10. Key words and phrases. Bandwidth choice, block bootstrap, correlated errors, cross-validation, curve estimation, kernel estimator, local linear smoothing, long-range dependence, mean squared error, nonparametric regression, resampling, short-range dependence.

1. Introduction. In three seminal papers on nonparametric regression with short-range dependent data, Altman (1990), Chu and Marron (1991) and Hart (1991) addressed both the failure of cross-validation and the sort of remedy that might be appropriate to correct it. Chu and Marron considered a modified or "leave-k-out" form of cross-validation, and argued that, for processes exhibiting short-range dependence, this approach may produce asymptotically optimal performance if k is chosen to increase with sample size at an appropriate rate. On the other hand, the method of partitioned cross-validation was shown by Chu and Marron to be relatively unsuccessful in producing asymptotically optimal bandwidths.

In this note we take up the argument where it was left by Chu and Marron, and demonstrate that, even in the context of very-long-range dependent data, both modified cross-validation and a form of the block bootstrap produce asymptotically optimal bandwidths. We develop a simple asymptotic device that allows very long ranges of dependence to be modelled and analysed with relative ease. For example, it permits an elementary account of the way in which leave-out number (in cross-validation) or block size (for the block bootstrap) should depend on strength of dependence if first-order optimality of bandwidth choice is to be achieved. Thus, even in the context of short-range dependence and leave-k-out cross-validation we complement Chu and Marron's results by indicating the sort of leave-out numbers or block


sizes that are required. Furthermore, we address the block bootstrap approach to both local and global bandwidth choice.

These theoretical results are described in Section 2, for which the technical details are outlined in Section 4. Section 3 summarizes the conclusions of a simulation study. That work makes it clear that while leave-k-out cross-validation has first-order theoretical properties similar to those of the block bootstrap, its empirical performance is very poor under long-range dependence. This is due to a marked tendency for cross-validation to select a bandwidth that is almost identical to the smallest one producing a well-defined cross-validation criterion. The problem becomes more pronounced as the range of dependence increases, with the result that leave-k-out cross-validation could not really be considered to perform satisfactorily with a variety of ranges of dependence. The block bootstrap is much more satisfactory.

A leave-k-out cross-validation method was also considered by Hart and Vieu (1990), in the context of density estimation. Hart and Wehrly (1986) studied bandwidth selection when measurements are repeated; Härdle and Vieu (1992) addressed leave-one-out cross-validation with mixing errors; Chiu (1989), Diggle and Hutchinson (1989), Hermann, Gasser and Kneip (1992) and Kohn, Ansley and Wong (1992) discussed other aspects of bandwidth choice for dependent data; and Hart (1994) introduced the method of time series cross-validation, appropriate when the dependence structure may be modelled parametrically. Surveys of the literature on nonparametric regression under dependence may be found in Györfi, Härdle, Sarda and Vieu (1989) and Härdle [(1990), Chapter 7]. The analysis of bootstrap methods for approximating error in curve estimation with independent data was initiated by Taylor (1989) and Faraway and Jhun (1990). The block bootstrap for dependent data was developed by Hall (1985), Carlstein (1986) and Künsch (1989).

2. Main results.

2.1. Estimators and basic properties. As in Altman (1990), Chu and Marron (1991) and Hart (1991) we suppose that the observed data $\mathcal{X} = \{Y_i, 1 \le i \le n\}$ are generated by the model $Y_i = m(x_i) + \varepsilon_i$, where $x_i = (i + c)/n$ for a constant $c$, $m$ is a smooth function and $\{\varepsilon_i\}$ is a stationary sequence with zero mean. Let $w_i$ denote a weight function. We take

$$\hat m(x) = \sum_{i=1}^{n} w_i(x)\, Y_i \tag{2.1}$$

as our estimator of $m$. One candidate for $w_i$, producing the Nadaraya–Watson kernel estimator treated by Chu and Marron (1991), is

$$w_i(x) = K\{(x - x_i)/h\} \Big[ \sum_{j=1}^{n} K\{(x - x_j)/h\} \Big]^{-1}, \tag{2.2}$$


where $K$ is a kernel function and $h$ is a bandwidth. Another, the local linear regression smoother proposed by Fan (1993), is

$$w_i(x) = v_i(x) \Big\{ \sum_{j=1}^{n} v_j(x) + n^{-2} \Big\}^{-1}, \tag{2.3}$$

where

$$v_i(x) = K\Big\{\frac{x - x_i}{h}\Big\} \{ s_2 - (x - x_i)\, s_1 \} \quad \text{and} \quad s_k = \sum_{j=1}^{n} K\Big\{\frac{x - x_j}{h}\Big\} (x - x_j)^k$$

for $k = 1, 2$. Alternative choices of $w_i$ include those proposed by Gasser and Müller (1979) or a simpler version of the Nadaraya–Watson prescription in which the denominator in (2.2) is replaced by $nh$. Our results have straightforward analogues in these cases. The mean squared error (MSE) and mean integrated squared error (MISE) of $\hat m$ are given by

$$\mathrm{MSE}(x) = E\{\hat m(x) - m(x)\}^2, \qquad \mathrm{MISE} = \int_I E(\hat m - m)^2, \tag{2.4}$$

where $I \subseteq (0, 1)$. We model the dependence of the errors by taking $\{\varepsilon_i, 1 \le i \le n < \infty\}$ to be a triangular array, with the $n$th row having a joint distribution determined by defining $\varepsilon_i = Z(\lambda x_i)$, $1 \le i \le n$, where $Z$ is a stationary stochastic process in the continuum. We assume that $Z$ has zero mean and autocovariance $\gamma$, and take $\lambda = \lambda_n$ to be a sequence of positive numbers that would typically increase with $n$. Under this model, $E(\varepsilon_i \varepsilon_j) = \gamma\{\lambda(x_i - x_j)\}$, and $\lambda$ may be interpreted as a measure of the strength of dependence of the process $\{\varepsilon_i\}$, with larger values of $\lambda$ indicating weaker dependence. In particular, $\lambda = \infty$ corresponds to independence, and $\lambda/n \to \infty$ to asymptotic independence, in the sense that first-order asymptotic properties of $\hat m$ are identical to those under independence. We always assume that $\lambda \to \infty$ as $n \to \infty$. If $\lambda$ does not diverge, then the amount of statistical information contained in any given sequence $\{Y_i : x_i \in (a, b)\}$, for any $a < b$, does not generally increase with increasing $n$. (For example, consider the case where the process $Z$ is Gaussian.)

The classical description of dependence among errors in nonparametric regression arises when $\lambda \equiv n$, which results in so-called "time-series errors" with $E(\varepsilon_i \varepsilon_j) = \gamma(i - j)$. This is the context studied by Altman (1990), Chu and Marron (1991) and Hart (1991). Note particularly that under our model the sum of autocovariances, $s_n \equiv \sum_{i=1}^{n} E(\varepsilon_{i+1} \varepsilon_1)$, is not necessarily bounded. Indeed, if $\lambda/n \to 0$ and $\int \gamma \ne 0$, then $s_n$ is asymptotic to a constant multiple of $n/\lambda$ and so is unbounded, implying that the data exhibit long-range dependence.

As a prelude to describing asymptotic properties of mean squared error under our model, we assume that $m$ has two bounded, continuous derivatives on the interval $[0, 1]$; that $K$ satisfies the usual conditions of a second-order kernel (i.e., $\int y^i K(y)\, dy = 1$ if $i = 0$, $0$ if $i = 1$ and $2\kappa$, say, if $i = 2$) and is


compactly supported, Hölder continuous and of bounded variation, with $\int_{-x}^{\infty} K$ and $\int_{-\infty}^{x} K$ bounded away from zero for all $x \ge 0$; that $\gamma$ is integrable, ultimately monotone and satisfies $\int \gamma \ne 0$; that $h = h(n) \to 0$ and $\min(\lambda, n)\, h \to \infty$ as $n \to \infty$; and that $\lambda/n \to C$, where $0 \le C \le \infty$. (The condition that $K$ be compactly supported is imposed only for simplicity in technical arguments, and may be relaxed. In particular, our results are all valid if $K$ is the standard normal density.) Define $R(K) = \int K^2$ and $\Lambda = \min(n, \lambda)$, and let $b = m'' - 1$ if the weights are given by (2.2), $b = m''$ if they are given by (2.3).
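As a rough illustration of the set-up in Section 2.1, the following Python sketch builds the weights (2.2) and (2.3), the linear smoother (2.1) and one instance of the error model $\varepsilon_i = Z(\lambda x_i)$. Several ingredients are assumptions made only for the example and are not taken from the paper: the Epanechnikov kernel, the choice $c = -1/2$, the regression function $\sin(2\pi x)$, and an Ornstein-Uhlenbeck process for $Z$, whose autocovariance $\gamma(t) = e^{-|t|}$ is integrable and ultimately monotone with $\int \gamma = 2$. All function names are hypothetical.

```python
import math
import random


def epanechnikov(u):
    """Compactly supported second-order kernel (an illustrative choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def design(n, c=-0.5):
    """Design points x_i = (i + c) / n for i = 1, ..., n (c is an assumption)."""
    return [(i + c) / n for i in range(1, n + 1)]


def nw_weights(x, xs, h, kernel=epanechnikov):
    """Nadaraya-Watson weights, following (2.2)."""
    k = [kernel((x - xi) / h) for xi in xs]
    total = sum(k)
    return [ki / total for ki in k]


def local_linear_weights(x, xs, h, kernel=epanechnikov):
    """Local linear weights, following (2.3), including the n^{-2} term."""
    n = len(xs)
    k = [kernel((x - xi) / h) for xi in xs]
    s1 = sum(ki * (x - xi) for ki, xi in zip(k, xs))
    s2 = sum(ki * (x - xi) ** 2 for ki, xi in zip(k, xs))
    v = [ki * (s2 - (x - xi) * s1) for ki, xi in zip(k, xs)]
    denom = sum(v) + n ** -2
    return [vi / denom for vi in v]


def ou_errors(xs, lam, rng):
    """Errors eps_i = Z(lam * x_i), with Z a stationary Gaussian process
    having autocovariance gamma(t) = exp(-|t|) (assumed for illustration)."""
    eps = [rng.gauss(0.0, 1.0)]
    for prev, cur in zip(xs, xs[1:]):
        rho = math.exp(-lam * (cur - prev))  # correlation between neighbours
        innovation = math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        eps.append(rho * eps[-1] + innovation)
    return eps


def estimate(x, xs, ys, h, weights=local_linear_weights):
    """The linear smoother (2.1): sum over i of w_i(x) * Y_i."""
    return sum(w * y for w, y in zip(weights(x, xs, h), ys))


# Example: n = 200 observations with fairly strong dependence (lam << n).
rng = random.Random(0)
xs = design(200)
eps = ou_errors(xs, lam=50.0, rng=rng)
ys = [math.sin(2.0 * math.pi * x) + e for x, e in zip(xs, eps)]
m_hat = estimate(0.5, xs, ys, h=0.1)
```

Smaller values of `lam` relative to `n` make neighbouring errors more strongly correlated, which is the long-range regime $\lambda/n \to 0$ described above; taking `lam` equal to `n` gives the classical time-series setting.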

THEOREM 2.1. If $C = 0$ or $\infty$, then

$$E\{\hat m(x) - m(x)\}^2 = R(K) \Big\{ (nh)^{-1} \gamma(0) + (\lambda h)^{-1} \Big( \int \gamma \Big) \Big\} + h^4 \kappa^2 b(x)^2 + o\{(\Lambda h)^{-1} + h^4\} \tag{2.5}$$

uniformly in $x \in (\delta, 1 - \delta)$ for each $\delta > 0$, and

$$\int_I E\{\hat m(x) - m(x)\}^2\, dx = \int_I \Big[ R(K) \Big\{ (nh)^{-1} \gamma(0) + (\lambda h)^{-1} \Big( \int \gamma \Big) \Big\} + h^4 \kappa^2 b(x)^2 \Big]\, dx + o\{(\Lambda h)^{-1} + h^4\} \tag{2.6}$$

uniformly in measurable sets $I \subseteq (\delta, 1 - \delta)$. If $0 < C < \infty$, then

$$E\{\hat m(x) - m(x)\}^2 = R(K) \Big\{ (nh)^{-1} \gamma(0) + (\lambda h)^{-1} \sum_{i \ne 0} \gamma(Ci) \Big\} + h^4 \kappa^2 b(x)^2 + o\{(\Lambda h)^{-1} + h^4\} \tag{2.7}$$

uniformly in $x \in (\delta, 1 - \delta)$ for each $\delta > 0$, and

$$\int_I E\{\hat m(x) - m(x)\}^2\, dx = \int_I \Big[ R(K) \Big\{ (nh)^{-1} \gamma(0) + (\lambda h)^{-1} \sum_{i \ne 0} \gamma(Ci) \Big\} + h^4 \kappa^2 b(x)^2 \Big]\, dx + o\{(\Lambda h)^{-1} + h^4\} \tag{2.8}$$

uniformly in measurable sets $I \subseteq (\delta, 1 - \delta)$. If the weights at (2.3) are employed, then (2.6) and (2.8) are available uniformly in all $I \subseteq (0, 1)$.

Locally and globally optimal bandwidth choices are obtained by minimizing the right-hand sides of (2.5)–(2.8). In all cases the optimal bandwidth is asymptotic to a constant multiple of $\Lambda^{-1/5}$, giving a convergence rate of $\Lambda^{-4/5}$ in terms of mean squared error. In the special case when $C = 0$, a version of (2.5) has been proved by Hart (1987) for the Gasser–Müller estimator of $m$.
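To make the $\Lambda^{-1/5}$ rate concrete, the sketch below minimizes the leading terms of (2.5) in closed form. Writing $V = R(K)\{\gamma(0)/n + (\int\gamma)/\lambda\}$, the leading terms are $V/h + h^4\kappa^2 b^2$, and setting the derivative in $h$ to zero gives $h^5 = V/(4\kappa^2 b^2)$. The function names and the numerical inputs are hypothetical: the values $R(K) = 3/5$ and $\kappa = 1/10$ correspond to an Epanechnikov kernel, $\gamma(0) = 1$ and $\int\gamma = 2$ to $\gamma(t) = e^{-|t|}$, and $b = 20$ is an arbitrary stand-in for the curvature term at a fixed point $x$. For $0 < C < \infty$ the same algebra would apply with $\int\gamma$ replaced by the sum in (2.7).

```python
def amse(h, n, lam, gamma0, gamma_int, r_k, kappa, b):
    """Leading terms of (2.5), with the o(.) remainder dropped."""
    variance = r_k * (gamma0 / (n * h) + gamma_int / (lam * h))
    squared_bias = h ** 4 * kappa ** 2 * b ** 2
    return variance + squared_bias


def optimal_bandwidth(n, lam, gamma0, gamma_int, r_k, kappa, b):
    """Closed-form minimiser over h > 0 of the leading terms in amse."""
    v = r_k * (gamma0 / n + gamma_int / lam)
    return (v / (4.0 * kappa ** 2 * b ** 2)) ** 0.2


# Illustrative inputs only (see the assumptions stated above).
params = dict(n=1000, lam=50.0, gamma0=1.0, gamma_int=2.0,
              r_k=0.6, kappa=0.1, b=20.0)
h_opt = optimal_bandwidth(**params)
risk_at_h_opt = amse(h_opt, **params)
```

Because $V$ is of order $1/\Lambda$ with $\Lambda = \min(n, \lambda)$, multiplying both `n` and `lam` by 32 halves `h_opt`, which is the $\Lambda^{-1/5}$ scaling stated above.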


2.2. Bandwidth choice by block bootstrap. Let $\hat m_1$ and $\hat m_2$ denote two estimators of $m$ that are constructed according to the prescription at (2.1), but employing respective values $h_1$ and $h_2$ of the bandwidth $h$. In what follows, $\hat m_1$ will be used to compute centred residuals and $\hat m_2$ to generate bootstrap data. Put $\hat\varepsilon_{1i} = Y_i - \hat m_1(x_i)$, $\hat\varepsilon_{1\cdot} = N^{-1} \sum' \hat\varepsilon_{1i}$ and $\hat\varepsilon_i = \hat\varepsilon_{1i} - \hat\varepsilon_{1\cdot}$, where $\sum'$ denotes summation over all $N$ design points $x_i$ that lie within $I \subseteq (0, 1)$. These are the centred residuals. Shortly we shall define the bootstrap errors $\varepsilon_i^*$. In terms of those, let

$$Y_i^* = \hat m_2(x_i) + \varepsilon_i^*, \qquad \hat m^*(x) = \sum_{i=1}^{n} w_i(x)\, Y_i^*,$$

where $w_i$ is exactly as in (2.1). Our estimators of MSE and MISE are

$$\widehat{\mathrm{MSE}}(x) = E\{\hat m^*(x) - \hat m_2(x)\}^2$$