Estimation of the density and the regression function under mixing conditions

Preprint M12/98

Eckhard Liebscher
Technical University Ilmenau, Institute of Mathematics
D-98684 Ilmenau/Thür., Germany

Abstract

In this paper we derive rates of strong convergence for the kernel density estimator and for the Nadaraya-Watson estimator under the α-mixing condition and under the condition of absolute regularity. A combination of an inequality of Bernstein type (Rio 1995) and an exponential inequality (cf. Fuk/Nagaev 1971) is the crucial tool for the proofs. Moreover, we consider the application of the main statements to the situation of autoregression.

Key words: strong convergence, mixing conditions, kernel density estimators, Nadaraya-Watson estimator
AMS Subject Classification: 62G07

1. Introduction

In this paper we examine strong convergence of nonparametric estimators (the kernel density estimator and the Nadaraya-Watson estimator for the regression function) under the α-mixing condition as well as under the β-mixing one. For the case of independent samples, there is a large body of literature on density and regression function estimation; we refer to the excellent accounts [25] by Silverman, [7] by Eubank and [11] by Härdle. In the case of dependent data, the study of nonparametric estimators for mixing sequences is very popular. Under the α- or the β-mixing condition, the uniform strong convergence of the Nadaraya-Watson estimator and the convergence of its MISE were treated in the monographs [10] by Györfi et al. and [4] by Bosq, and in papers by Roussas ([23]) and by Ango Nze and Doukhan ([1], [2]). In the case of φ-mixing sequences, Collomb ([5]) and Györfi et al. ([10]) derived results concerning strong convergence of the Nadaraya-Watson estimator. In their monograph [8] (Chapter 6), Fan and Gijbels established a central limit theorem for the local polynomial estimator of the regression function for mixing sequences. The regression function of a fixed-design regression model can also be estimated by the Gasser-Müller estimator or the Priestley-Chao one; accounts of such estimators are due to Hart ([13]), Herrmann et al. ([14]), Wu and Chu ([28]), Roussas ([24]) and to the author ([17], [18]).

Estimation techniques for nonparametric regression are closely related to density estimation. In the case of stationary mixing sequences, kernel density estimators have been studied extensively in the literature. We refer to the papers [22] by Roussas, [26] by Tran, [20] by Mokkadem, [27] by Vieu, [3] by Ango Nze and Rios, [30] by Yu and the author's paper [16], in which α- and β-mixing sequences are considered. In the papers [5] by Collomb and [15] by Liebscher, the φ-mixing condition is assumed.

In Section 2, we provide convergence rates for kernel density estimators. These rates represent a slight improvement on the rates derived in the paper [3] by Ango Nze and Rios and in the author's paper [16]. In Section 3, we state the main results of this paper: we provide rates of uniform strong convergence for the Nadaraya-Watson estimator of the regression function under the α-mixing condition as well as under the β-mixing one. We extend the results [1] of Ango Nze and Doukhan and [4] of Bosq. The results of Section 3 are based on an inequality which is a combination of a Bernstein-type inequality (cf. [21], [16]) and an exponential inequality which is similar to an inequality proved by Fuk and Nagaev (cf. [9]). In contrast to other authors, we give not only sufficient conditions for the optimal convergence rate. The optimal convergence rate is the rate obtained by minimizing with respect to the bandwidth under the assumption of independent data. We also study situations in which the convergence rate is slow and the optimal rate cannot be achieved. In Section 4, we apply the results of Section 3 in order to obtain rates of strong convergence of the kernel estimator for the autoregression function. The result of Section 4 extends a statement by Masry and Tjøstheim (cf. [19]). The proofs of the results are carried out in Sections 5 to 8.

Let {Z_k; k = 1, 2, ...} be a sequence of random variables. F_l^m denotes the σ-algebra generated by Z_l, Z_{l+1}, ..., Z_m. The α-mixing coefficient of {Z_k} and the β-mixing one are defined by

$$\alpha_m := \sup_{k=1,2,\dots}\ \sup\left\{\,|P(A\cap B)-P(A)P(B)| : A\in\mathcal{F}_{m+k}^{\infty},\ B\in\mathcal{F}_1^{k}\,\right\},$$

$$\beta_m := \sup_{k=1,2,\dots}\ E\left(\sup\left\{\,|P(A\mid\mathcal{F}_1^{k})-P(A)| : A\in\mathcal{F}_{m+k}^{\infty}\,\right\}\right).$$

If α_m → 0 or β_m → 0 as m tends to infinity, we say that the sequence {Z_k} is α-mixing (strong mixing) or β-mixing (absolutely regular), respectively. The α-mixing condition is weaker than the β-mixing one, and the φ-mixing property implies the β-mixing one. There are several examples of mixing sequences (cf. [6]). It is known that, under mild additional assumptions, stationary ARMA-sequences are β-mixing with exponentially decreasing mixing coefficients.

2. Kernel density estimators

Let {X_t} be a stationary α-mixing sequence of random variables from R^s. We assume that X_t has the density f. The kernel estimator for f reads as follows:

$$\hat f_n(x) = n^{-1} b(n)^{-s} \sum_{k=1}^{n} K\big((x - X_k)\, b(n)^{-1}\big) \qquad (x \in \mathbb{R}^s) \tag{2.1}$$

where b(n) is the bandwidth satisfying

$$\lim_{n\to\infty} b(n) = 0, \qquad \lim_{n\to\infty} \frac{n\, b^{s}(n)}{\ln(n)} = +\infty. \tag{2.2}$$
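As an illustration (not part of the paper; the product Epanechnikov kernel, the sample, the dimension and the bandwidth constant below are our own choices), the estimator (2.1) can be evaluated directly once a kernel and a bandwidth obeying (2.2) are fixed:

```python
# Illustrative sketch only: kernel density estimator (2.1) with a product
# Epanechnikov kernel; sample, dimension and bandwidth constant are arbitrary choices.
import numpy as np

def product_epanechnikov(u):
    """Product Epanechnikov kernel on R^s; vanishes outside the unit cube."""
    inside = np.all(np.abs(u) <= 1.0, axis=-1)
    return np.where(inside, np.prod(0.75 * (1.0 - u ** 2), axis=-1), 0.0)

def kde(x, sample, b):
    """f_hat_n(x) = n^{-1} b^{-s} * sum_k K((x - X_k) b^{-1}),  x of shape (s,)."""
    n, s = sample.shape
    return product_epanechnikov((x - sample) / b).sum() / (n * b ** s)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))              # n = 500 observations in R^2
b_n = (np.log(500) / 500) ** (1.0 / 6.0)       # b(n) -> 0 while n b^s(n)/ln(n) -> infinity
print(kde(np.zeros(2), X, b_n))                # estimated density at the origin
```

The exponent 1/6 corresponds to the rate-optimal choice 1/(2p+s) of Corollary 2.2 below with p = 2 and s = 2.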

In this section we give strong convergence rates for sup_{x∈D} |f̂_n(x) − f(x)| where D ⊂ R^s is a compact set. Here we need the following assumptions on the kernel function K:

Condition K(p), p ≥ 1: Let ‖·‖ be the Euclidean norm on R^s.
$$\sup_{t\in\mathbb{R}^s}|K(t)| < +\infty, \qquad \int_{\mathbb{R}^s} K(t)\,dt = 1, \qquad K(t) = 0 \ \text{ for } \|t\| > 1,$$
$$\int_{\mathbb{R}^s} \prod_{j=1}^{l} z_{i_j}\, K(z_1,\dots,z_s)\, d(z_1,\dots,z_s) = 0 \qquad (i_1,\dots,i_l \in \{1,\dots,s\},\ l = 1,\dots,p-1). \ \square$$

Condition K(q): In the case q = 1, the kernel K is Lipschitz-continuous on R^s. In the case q ≥ 2, the partial derivatives of order q − 1 of K exist and are Lipschitz-continuous on R^s. □

Condition K̃(2): s = 1, K is Lipschitz-continuous on R, K′ exists on (−1, 1) and is Lipschitz-continuous there. □

Kernels fulfilling condition K̃(2) may have a discontinuous derivative at −1 or 1. The univariate Epanechnikov kernel satisfies condition K̃(2). The assumption that K vanishes on {t : ‖t‖ > 1} is only a technical one and may be weakened in some way. In the sequel we formulate an additional assumption on the distributions of {X_k}, where D_ε = {x : ∃y ∈ D : ‖x − y‖ < ε} with some ε > 0.

Condition J: For all integers j ≥ j_0, the joint density f_j of X_1 and X_{j+1} exists on R^s × R^s. Furthermore, for some δ > 0,
$$|f_j(x,y)| \le C_1 \quad \forall j \ge j_0,\ \forall x,y \in D_\varepsilon : \|x-y\| < \delta, \qquad f(x) \le C_2 \quad \forall x \in D_\varepsilon$$
with some positive constants C_1, C_2. □

Now we state a theorem on strong convergence of the density estimator. Here L : N → R is an increasing function such that Σ_{n=1}^∞ n^{-1} ln(n)^{-1} L(n)^{-1} < +∞. The function L(n) = (ln ln(n))^{1+δ} with δ > 0 is a possible choice.

Theorem 2.1. Suppose that α_k = O(k^{-r}) (r > 1) is fulfilled. Let conditions J, K(p) and K(q) be satisfied for some integers p, q ≥ 1. Furthermore, assume that f has bounded partial derivatives of order p on D_ε.

(i) Then
$$\sup_{x\in D}|\hat f_n(x) - f(x)| = O\Big(\sqrt{\ln(n)/(n\,b^{\sigma}(n))} + v_n + b^{p}(n)\Big) \quad a.s. \tag{2.3}$$
where v_n = (n^{1-r} b(n)^{-s(r+2)} \ln(n)^{r+1} L(n))^{1/\theta} → 0, θ = r + 1 + s/q, and σ := s for r ≥ 2, σ := 2s/r for r < 2. In the case s = 1, q = 2, one can use assumption K̃(2) instead of K(2).

(ii) If, in addition, q = 1 and β_k = O(k^{-r}) (r > 1), then the rate (2.3) holds true with v_n = (n^{1-r} \ln(n)^{r+1} L(n))^{1/\theta} b(n)^{-s} → 0, θ = r + 1.

The proof of part (ii) of this theorem is essentially due to Ango Nze and Rios ([3]) and is omitted. In the case q = 1, the rate of the previous theorem coincides with the rate of Theorem 4.2 of the author's paper [16], whereas the previous theorem improves that rate for q > 1. Now we provide conditions under which the density estimator attains the optimal convergence rate.

Corollary 2.2. Let the assumptions of Theorem 2.1(i) be fulfilled. Under either r > 3 + s/q + 3s/p, or β_k = O(k^{-r}) and r > 3 + 2s/p, we have the following optimized convergence rate (minimization w.r.t. b(n))
$$\sup_{x\in D}|\hat f_n(x) - f(x)| = O\big((\ln(n)/n)^{p/(2p+s)}\big) \quad a.s.$$
by choosing b(n) = const · (ln(n)/n)^{1/(2p+s)}.
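For orientation (our own illustration, not from the paper; the values of s, p and n are arbitrary), the bandwidth of Corollary 2.2 makes the stochastic term and the bias term of (2.3) exactly balance:

```python
# Illustrative numbers for Corollary 2.2: with b(n) = (ln n / n)^{1/(2p+s)} the
# terms sqrt(ln n / (n b^s)) and b^p coincide, both of order (ln n / n)^{p/(2p+s)}.
import numpy as np

s, p = 1, 2
for n in (10**3, 10**5, 10**7):
    b = (np.log(n) / n) ** (1.0 / (2 * p + s))
    print(f"n={n:>8}  b(n)={b:.4f}  sqrt-term={np.sqrt(np.log(n)/(n*b**s)):.5f}  b^p={b**p:.5f}")
```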

3. Nadaraya-Watson estimator

Let {(X_t, Y_t)} be a stationary α-mixing sequence of random variables from R^s × R. Define the regression function r(x) = E(Y_t | X_t = x) (t = 1, 2, ..., x ∈ R^s). The Nadaraya-Watson estimator for the regression function r is given by

$$\hat r_n(x) = \frac{\sum_{k=1}^{n} Y_k\, K\big((x - X_k)\, b(n)^{-1}\big)}{\sum_{k=1}^{n} K\big((x - X_k)\, b(n)^{-1}\big)} \qquad (x \in \mathbb{R}^s)$$

where b(n) is the bandwidth satisfying (2.2). In the sequel we give convergence rates for sup_{x∈D} |r̂_n(x) − r(x)| where D ⊂ R^s is a compact set. In the statements of this section, we use an assumption M on conditional moments of {Y_k} and a modification of condition J:

Condition J′: For all integers j ≥ j_0, the joint density f_j of X_1 and X_{j+1} exists on R^s × R^s. Furthermore, for some δ > 0,
$$|f_j(x,y)| \le C_1 \quad \forall j \ge j_0,\ \forall x,y \in D_\varepsilon : \|x-y\| < \delta, \qquad 0 < C_3 \le f(x) \le C_2 \quad \forall x \in D_\varepsilon$$
with positive constants C_1, C_2, C_3, D_ε as in Section 2. □

Condition M(γ), γ > 2:
$$E\big(|Y_i|^{\gamma} \mid X_i = x\big) \le C_4 \quad \forall x \in D_\varepsilon, \qquad \big|E\big(Y_i Y_j \mid X_i = x, X_j = y\big)\big| \le C_5 \quad \forall x,y \in D_\varepsilon : \|x-y\| < \delta$$
with constants δ, C_4, C_5 > 0. □
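As an illustration (not from the paper; the kernel, the sample and the regression function below are hypothetical choices), the estimator r̂_n introduced above is a ratio of kernel-weighted sums:

```python
# Illustrative sketch only: Nadaraya-Watson estimator r_hat_n for s = 1 with an
# Epanechnikov kernel; the data-generating model is a hypothetical example.
import numpy as np

def nadaraya_watson(x, X, Y, b, kernel):
    """r_hat_n(x) = sum_k Y_k K((x - X_k)/b) / sum_k K((x - X_k)/b)."""
    w = kernel((x - X) / b)
    return float((w * Y).sum() / w.sum())

K = lambda u: np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, 2000)
Y = np.sin(X) + 0.2 * rng.standard_normal(2000)       # true r(x) = sin(x)
b_n = (np.log(2000) / 2000) ** 0.2                    # bandwidth of Corollary 3.4, p = 2, s = 1
print(nadaraya_watson(0.5, X, Y, b_n), np.sin(0.5))   # estimate vs. true value
```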

Theorem 3.1. Suppose that α_k = O(k^{-r}) (r > 1). Let conditions J′, K(p), K(q) and M(γ) be satisfied for some γ > 2r/(r − 1) and some integers p, q ≥ 1. Furthermore, assume that f and r have bounded partial derivatives of order p on D_ε.

(i) Then
$$\sup_{x\in D}|\hat r_n(x) - r(x)| = O\Big(\sqrt{\ln(n)/(n\,b^{\sigma}(n))} + v_n + b^{p}(n)\Big) \quad a.s. \tag{3.1}$$
where v_n = n^{κ_1} b(n)^{κ_2} L^{κ_3}(n) ln^{κ_4}(n) → 0 as n → ∞ with
κ_1 := (γ(1 − r) + r + 1)/θ, κ_2 := −sγ(r + 2)/θ, κ_3 := (γ + r + 1)/θ, κ_4 := (γ + 1)(r + 1)/θ, θ := γ(rq + s + q)/q,
and σ := s for r ≥ 2(γ − 1)/(γ − 2), σ := 2s(γ − 1)/(r(γ − 2)) for r < 2(γ − 1)/(γ − 2). In the case s = 1, q = 2, one can use assumption K̃(2) instead of K(2).

(ii) If, in addition, |Y_i| < C_6 < +∞ a.s. with some constant C_6, then sup_{x∈D}|r̂_n(x) − r(x)| has the convergence rate (2.3) of Theorem 2.1(i).

The rate (3.1) is close to the rate (2.3) for γ → ∞. For α-mixing coefficients decaying to zero exponentially fast and for β-mixing sequences, we obtain faster convergence rates for r̂_n.

Theorem 3.2. Suppose that α_k = O(ρ^k) (0 < ρ < 1). Assume that conditions J′, K(p), K(1) and M(γ) are satisfied for some γ > 2 and some integer p ≥ 1. Let f and r have bounded partial derivatives of order p on D_ε. Then
$$\sup_{x\in D}|\hat r_n(x) - r(x)| = O\Big(\sqrt{\ln(n)/(n\,b^{s}(n))} + v_n + b^{p}(n)\Big) \quad a.s. \tag{3.2}$$
where v_n = n^{(1-\gamma)/\gamma} b(n)^{-s} L(n)^{1/\gamma} \ln(n)^{(2\gamma+1)/\gamma} → 0 as n → ∞.

Theorem 3.3. Assume that the assumptions of Theorem 3.1(i) are fulfilled for q = 1. Moreover, let β_k = O(k^{-r}) (r > 1). Then
$$\sup_{x\in D}|\hat r_n(x) - r(x)| = O\Big(\sqrt{\ln(n)/(n\,b^{\sigma}(n))} + v_n + b^{p}(n)\Big) \quad a.s.$$
where v_n = n^{(\gamma+r+1-r\gamma)/\theta} \ln(n)^{(2\gamma+r+1+\gamma r)/\theta} L(n)^{(\gamma+r+1)/\theta} b(n)^{-s} → 0 as n → ∞, θ = (r + 1)γ, σ as in Theorem 3.1(i).

The following corollary gives conditions under which the Nadaraya-Watson estimator attains the optimal convergence rate.

Corollary 3.4. Let the assumptions of Theorem 3.1(i) be fulfilled. Under either
r > 3 + s/q + 3s/p and γ > (2rp + rs + 2p + s)/(rp − 3p − 3s − sp/q), or
α_k = O(ρ^k) (0 < ρ < 1) and γ > 2 + s/p, or
β_k = O(k^{-r}), r > 3 + 2s/p and γ > (2rp + rs + 2p + s)/(rp − 3p − 2s),
we have the following optimized convergence rate
$$\sup_{x\in D}|\hat r_n(x) - r(x)| = O\big((\ln(n)/n)^{p/(2p+s)}\big) \quad a.s.$$
by choosing b(n) = const · (ln(n)/n)^{1/(2p+s)}.

4. Special case of autoregression

Let {ξ_t}_{t=1,2,...} be a strictly stationary sequence of real random variables which follows the autoregressive model
$$\xi_t = g(\xi_{t-1}, \dots, \xi_{t-s}) + Z_t \qquad (t = s+1, s+2, \dots) \tag{4.1}$$
where {Z_t} is a sequence of i.i.d. random variables which are independent of ξ_1, ..., ξ_s. We assume that E Z_t = 0 and that g : R^s → R is a measurable function. Here the Nadaraya-Watson estimator for the autoregression function g is given by
$$\hat g_n(x) = \frac{\sum_{k=1}^{n} \xi_{k+1}\, K\big((x - X_k)\, b(n)^{-1}\big)}{\sum_{k=1}^{n} K\big((x - X_k)\, b(n)^{-1}\big)} \qquad (x \in \mathbb{R}^s)$$
where X_t = (ξ_t, ..., ξ_{t−s+1})^T. Assume that X_t has the density f. Putting Y_k = ξ_{k+1}, ĝ_n is a special case of the Nadaraya-Watson estimator studied in Section 3.

Condition G: The function g is nonperiodic and bounded on compact sets. Z_t has a density function that is positive on R. Moreover,
$$g(y) = a^{T} y + o(\|y\|) \ \text{ as } \|y\| \to \infty \qquad (a, y \in \mathbb{R}^s),$$
A = (a, e_1, e_2, ..., e_{s−1}) ∈ R^{s×s}, e_i is the i-th unit vector of R^s, and the spectral radius of A is smaller than 1. □

Condition G is Assumption 3.2 of Masry and Tjøstheim ([19]) and ensures that the sequence {ξ_t} is β-mixing with mixing coefficients β_k = O(ρ^k) (0 < ρ < 1). This condition G may be replaced by other conditions implying β_k = O(ρ^k) (cf. [6]). The following Theorem 4.1 gives the rate of uniform strong convergence of ĝ_n on the compact set D ⊂ R^s, where D_ε is determined as in Section 2.

Theorem 4.1. Assume that f and g have bounded partial derivatives of order p ≥ 1 on D_ε. Assume that E|Z_t|^γ < +∞ for some γ > 2. Let the conditions G, J′, K(p) and K(1) be fulfilled for some integer p ≥ 1. Then sup_{x∈D}|ĝ_n(x) − g(x)| has the convergence rate (3.2).
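A small simulation sketch (our own illustration, not from the paper; the function g below is a hypothetical example whose linear tail has coefficient 0.6, so the spectral-radius requirement of condition G is met for s = 1) shows how ĝ_n recovers the autoregression function:

```python
# Illustrative sketch only: simulate model (4.1) with s = 1 and a hypothetical
# g(x) = 0.6 x + sin(x)  (so g(y) = a y + o(|y|) with |a| < 1), then estimate g by g_hat_n.
import numpy as np

rng = np.random.default_rng(2)
g = lambda x: 0.6 * x + np.sin(x)

n = 5000
xi = np.zeros(n + 1)
for t in range(1, n + 1):                  # xi_t = g(xi_{t-1}) + Z_t with Z_t ~ N(0,1)
    xi[t] = g(xi[t - 1]) + rng.standard_normal()

X, Y = xi[:-1], xi[1:]                     # X_k = xi_k, Y_k = xi_{k+1}
K = lambda u: np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
b_n = (np.log(n) / n) ** 0.2               # bandwidth as in Corollary 3.4 with p = 2, s = 1

def g_hat(x):
    w = K((x - X) / b_n)
    return float((w * Y).sum() / w.sum())

print(g_hat(1.0), g(1.0))                  # estimated vs. true autoregression function at x = 1
```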

Extending this result to heteroscedastic models is straightforward (cf. [19]). According to Corollary 3.4, sup_{x∈D}|ĝ_n(x) − g(x)| has the optimal convergence rate O((ln(n)/n)^{p/(2p+s)}) a.s. provided that γ > 2 + s/p.


5. A Bernstein-type inequality

Following the line of the proof of Theorem 4 in the paper [9] by Fuk and Nagaev, one easily proves the following theorem.

Theorem 5.1. Let ξ_1, ..., ξ_n be independent random variables with E ξ_i = 0. Then, for all ε, y > 0,
$$P\left\{\Big|\sum_{i=1}^{n}\xi_i\, I(|\xi_i|\le y)\Big| > \varepsilon\right\} \le 2\exp\left(-\varepsilon^{2}\Big(8\Big(\sum_{i=1}^{n} D^{2}\xi_i + \tfrac{1}{3}\varepsilon y\Big)\Big)^{-1}\right).$$
The main difference between Theorem 5.1 and the usual inequality of Bernstein type is that the mean of ξ_i I(|ξ_i| ≤ y) is not zero. In the sequel we apply Theorem 5.1 in order to get an inequality of Bernstein type which is similar to that proved by the author (cf. [16], Theorem 2.1).

Proposition 5.2. Let {Z_i}_{i∈N} be a stationary α-mixing sequence of real random variables with the mixing coefficients {α_j}. Assume that E Z_1 = 0 and E Z_1² < +∞. Then, for n, N ∈ N, 0 < N ≤ n/2, and for S, ε > 0,
$$P\left\{\Big|\sum_{i=1}^{n} Z_i\Big|\, I\Big(\max_{i=1\dots n}|Z_i|\le S\Big) > \varepsilon\right\} \le 4\exp\left(-\frac{\varepsilon^{2}}{16}\Big(nN^{-1}D_N + \tfrac{1}{3}\varepsilon S N\Big)^{-1}\right) + \sum_{i=1}^{n} P\{|Z_i| > S\} + 32\,\frac{S}{\varepsilon}\, n\, \alpha_N$$
where D_N = max_{j≤2N} D²( Σ_{i=1}^{j} Z_i ).

Proof. Let M, N, R be positive integers such that n = 2MN + R with 0 < R ≤ 2N. Define
$$N_i := N \ \text{ for } i < M-1, \quad N_{M-1} := 2N; \qquad R_i := N \ \text{ for } i < M-1, \quad R_{M-1} := R.$$

Let S > 0 be arbitrary but fixed. We decompose the sum T_n = Σ_{i=1}^{n} Z_i as follows:
$$T_n = T_{n1} + T_{n2}, \qquad T_{n1} := \sum_{i=0}^{M-1} V_i, \qquad T_{n2} := \sum_{i=0}^{M-1} V_i',$$
$$V_i := U_i\, I(|U_i|\le 2SN), \quad U_i := \sum_{j=1}^{N_i} Z_{2iN+j}, \qquad V_i' := U_i'\, I(|U_i'|\le 2SN), \quad U_i' := \sum_{j=1}^{R_i} Z_{2iN+N_i+j}$$
(i = 0, ..., M−1). For ε > 0, we have
$$P\left\{\Big|\sum_{i=1}^{n} Z_i\Big|\, I\Big(\max_i |Z_i|\le S\Big) > \varepsilon\right\} \le P\left\{|T_{n1}| > \frac{\varepsilon}{2}\right\} + P\left\{|T_{n2}| > \frac{\varepsilon}{2}\right\}. \tag{5.1}$$
Analogously to [21], we obtain that there are independent random variables V_0*, ..., V_{M−1}* such that, for i = 0, ..., M−1, V_i* has the same distribution as V_i and
$$E|V_i - V_i^{*}| \le 8 S N \alpha_N, \qquad \sum_{i=0}^{M-1} E|V_i - V_i^{*}| \le 4 S n \alpha_N.$$
Moreover, there are independent random variables U_0*, ..., U_{M−1}* such that, for i = 0, ..., M−1, U_i* has the same distribution as U_i. By Markov's inequality and Theorem 5.1,
$$P\{|T_{n1}| > \varepsilon/2\} \le P\left\{\Big|\sum_{i=0}^{M-1} U_i^{*}\, I(|U_i^{*}|\le 2SN)\Big| > \frac{\varepsilon}{4}\right\} + P\left\{\Big|\sum_{i=0}^{M-1} (V_i - V_i^{*})\Big| > \frac{\varepsilon}{4}\right\}$$
$$\le 2\exp\left(-\frac{\varepsilon^{2}}{16}\Big(\sum_{i=0}^{M-1} D^{2}U_i^{*} + \tfrac{1}{3}\varepsilon S N\Big)^{-1}\right) + \sum_{i=0}^{M-1} P\{|U_i^{*}| > 2SN\} + 16\,\frac{S}{\varepsilon}\, n\, \alpha_N$$
$$\le 2\exp\left(-\frac{\varepsilon^{2}}{16}\Big(nN^{-1}D_N + \tfrac{1}{3}\varepsilon S N\Big)^{-1}\right) + \sum_{i=1}^{n} P\{|Z_i| > S\} + 16\,\frac{S}{\varepsilon}\, n\, \alpha_N \tag{5.2}$$
for ε > 0. The same bound holds for P{|T_{n2}| > ε/2}. The proposition follows from (5.1) and (5.2). □
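To make the block decomposition of the proof above concrete, here is a small sketch (our own illustration; the values of n and N are arbitrary) listing the index sets behind the sums U_i and U_i′:

```python
# Illustrative sketch only: the alternating blocks used in the proof of Proposition 5.2.
# For n = 2MN + R with 0 < R <= 2N, block i contributes U_i (length N_i) to T_n1 and
# U_i' (length R_i) to T_n2; the last blocks have lengths 2N and R, respectively.
n, N = 23, 3
M = (n - 1) // (2 * N)                      # largest M with n = 2MN + R, 0 < R <= 2N
R = n - 2 * M * N
idx = list(range(1, n + 1))
U_blocks, U_prime_blocks = [], []
for i in range(M):
    Ni = N if i < M - 1 else 2 * N
    Ri = N if i < M - 1 else R
    start = 2 * i * N
    U_blocks.append(idx[start:start + Ni])
    U_prime_blocks.append(idx[start + Ni:start + Ni + Ri])
print(U_blocks)          # indices entering the U_i  (sum T_n1)
print(U_prime_blocks)    # indices entering the U_i' (sum T_n2)
```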

6. Proofs concerning density estimators

Let {X_k} be a stationary α-mixing sequence of random variables. Let {η_n} be a sequence of positive real numbers such that lim_{n→∞}(η_n/b(n)) = 0. Let D ⊂ R^s be a compact set. It can therefore be covered by s-dimensional cubes E_1, ..., E_ν having edges of length η_n and centres u_1, ..., u_ν, respectively, where ν ≤ C_1 η_n^{-s}. In this section we use positive constants C_1, ..., C_6 which do not depend on n, N, ε, x or k.

Lemma 6.1. Assume that α_k = O(k^{-r}) (r > 1) and
$$\sup_{t\in\mathbb{R}^s}|K(t)| < +\infty \quad\text{and}\quad K(t) = 0 \ \text{ for } \|t\| > 1. \tag{6.1}$$
Then we have
$$\max_{k=1\dots\nu}\big|\hat f_n(u_k) - E\hat f_n(u_k)\big| = O(a_n) \quad a.s. \tag{6.2}$$
where a_n = n^{-1/2} b(n)^{-σ/2} √(ln(n)) + n^{-1} b(n)^{-s} ln(n) N, N = N(n) ∈ N with N ≤ n/2 and η_n ≥ (n² L(n) N^{-1-r})^{1/s}, σ as in Theorem 2.1.

Proof. Applying the inequality of Proposition 5.2, we obtain that, for ε > 0, n, N ∈ N, N ≤ n/2,
$$P\left\{\max_{k=1\dots\nu}\big|\hat f_n(u_k) - E\hat f_n(u_k)\big| > \varepsilon\right\} \le \nu\,\sup_{x\in D} P\left\{\big|\hat f_n(x) - E\hat f_n(x)\big| > \varepsilon\right\}$$
$$\le C_2\,\nu\left(\exp\left(-C_3\,\varepsilon^{2}\big(n^{-1}b(n)^{-\sigma} + \varepsilon N n^{-1} b(n)^{-s}\big)^{-1}\right) + \ln(n)^{-1} n N^{-1-r}\right).$$
Since η_n ≥ (n² L(n) N^{-1-r})^{1/s}, we have
$$\sum_{n=1}^{\infty} \nu\, \ln(n)^{-1} n N^{-1-r} \le \mathrm{const}\,\sum_{n=1}^{\infty} \eta_n^{-s}\, \ln(n)^{-1} n N^{-1-r} < +\infty.$$
Now apply the Borel-Cantelli lemma in order to get the lemma. □

Lemma 6.2. Assume that the assumptions of Theorem 2.1(i) are satisfied. Then we have
$$\max_{k=1\dots\nu}\sup_{x\in E_k}\big|\hat f_n(x) - \hat f_n(u_k) - E\big(\hat f_n(x) - \hat f_n(u_k)\big)\big| = O\big(a_n + \eta_n^{q} b(n)^{-q}\big) \quad a.s. \tag{6.3}$$

Proof. We prove this lemma only for s = 1. The proof in the multivariate case is similar but more tedious. Observe that sup_{x∈E_k}|x − u_k| ≤ b(n) for n ≥ n_0 and, by Taylor expansion,
$$\left| K\big((x-X_i)b(n)^{-1}\big) - K\big((u_k-X_i)b(n)^{-1}\big) - \sum_{l=1}^{q-1}\frac{1}{l!} K^{(l)}\big((u_k-X_i)b(n)^{-1}\big)(x-u_k)^{l} b(n)^{-l}\right|$$
$$\le \frac{C_4}{(q-1)!}\, b(n)^{-q}\,|x-u_k|^{q}\, I\big(|X_i-u_k|\le 2 b(n)\big)$$
for all x ∈ E_k, k = 1, ..., ν, n ≥ n_0, where |K^{(q−1)}(u) − K^{(q−1)}(v)| ≤ C_4 |u − v|. A similar bound holds true with expectations. Hence
$$\sup_{x\in E_k}\big|\hat f_n(x) - E\hat f_n(x)\big| \le \big|\hat f_n(u_k) - E\hat f_n(u_k)\big| + C_5\sum_{l=1}^{q-1}\eta_n^{l} b(n)^{-l}\big|\tilde f_n^{(l)}(u_k) - E\tilde f_n^{(l)}(u_k)\big|$$
$$+\, C_6\,\eta_n^{q} b(n)^{-q}\Big(\big|\tilde f_n^{(q)}(u_k) - E\tilde f_n^{(q)}(u_k)\big| + 2\, E\tilde f_n^{(q)}(u_k)\Big)$$
(k = 1, ..., ν, n ≥ n_0) where
$$\tilde f_n^{(l)}(u) := n^{-1} b(n)^{-1}\sum_{i=1}^{n} K^{(l)}\big((u-X_i) b(n)^{-1}\big) \quad (u\in\mathbb{R},\ l = 1,\dots,q-1),$$
$$\tilde f_n^{(q)}(u) := n^{-1} b(n)^{-1}\sum_{i=1}^{n} I\big(|X_i-u|\le 2 b(n)\big) \quad (u\in\mathbb{R}).$$
The functions f̃_n^{(1)}(u), ..., f̃_n^{(q)}(u) are kernel estimators for multiples of f(u) with kernel functions satisfying (6.1). The inequality max_{k} E f̃_n^{(q)}(u_k) ≤ 4 C_2 completes the proof. □

One proves the following lemma in a similar way as in the previous proof.

Lemma 6.3. Assume that the assumptions of Theorem 2.1(i) with q = 2 and condition K̃(2) instead of K(2) are satisfied. Then (6.3) holds true.

Proof of Theorem 2.1(i): Let η_n := (n² L(n) N^{-1-r})^{1/s}. Lemmas 6.1 to 6.3 imply
$$\sup_{x\in D}\big|\hat f_n(x) - E\hat f_n(x)\big| = O\Big(n^{-1/2} b(n)^{-\sigma/2}\sqrt{\ln(n)} + n^{-1} b(n)^{-s}\ln(n) N + \eta_n^{q} b(n)^{-q}\Big) \quad a.s.$$
A minimization w.r.t. N leads to
$$\sup_{x\in D}\big|\hat f_n(x) - E\hat f_n(x)\big| = O\Big(n^{-1/2} b(n)^{-\sigma/2}\sqrt{\ln(n)} + \big(n^{q-qr} b(n)^{-sq(r+2)}\ln(n)^{q(r+1)} L^{q}(n)\big)^{1/(s+qr+q)}\Big) \quad a.s.$$
The well-known identity
$$\sup_{x\in D}\big|E\hat f_n(x) - f(x)\big| = O(b^{p}(n))$$
completes the proof. □

7. Proofs concerning regression estimators

Suppose that {(X_t, Y_t)} is a stationary α-mixing sequence of random variables. Let
$$W_{ni}(x) = n^{-1} b(n)^{-s}\, Y_i\, K\big((x - X_i)\, b(n)^{-1}\big) \qquad (x \in \mathbb{R}^s).$$
Let D ⊂ R^s be a compact set. For the proof of the main results, we need the following lemma which gives bounds for variances and moments of W_ni(x):

Lemma 7.1. Assume that the assumptions of Theorem 3.1(i) are satisfied. Then we have
$$\sup_{x\in D} D^{2}\left(\sum_{i=1}^{m} W_{ni}(x)\right) = O\big(m\, n^{-2} b(n)^{-\sigma}\big), \qquad \sigma := s\cdot\max\left(\frac{2(\gamma-1)}{r(\gamma-2)},\ 1\right),$$
and
$$\sup_{x\in D} E|W_{ni}(x)|^{\gamma} = O\big(n^{-\gamma} b(n)^{-\gamma s + s}\big).$$

Proof. We deduce
$$\sup_{x\in D} E\big|Y_i K\big((x-X_i) b(n)^{-1}\big)\big|^{\kappa} \le C_4 C_2 \sup_{x\in D}\int_{\mathbb{R}^s}\big|K\big((x-x_i) b(n)^{-1}\big)\big|^{\kappa}\, dx_i = O(b^{s}(n)) \tag{7.1}$$
for 1 ≤ κ ≤ γ. By means of the uniform boundedness of the densities f_j, we obtain
$$\sup_{x\in D}\big|E\big(Y_i K\big((x-X_i) b(n)^{-1}\big)\, Y_j K\big((x-X_j) b(n)^{-1}\big)\big)\big|$$
$$\le \sup_{x\in D}\int_{\mathbb{R}^s}\int_{\mathbb{R}^s}\big|E(Y_i Y_j \mid X_i = x_i, X_j = x_j)\big|\,\big|K\big((x-x_i) b(n)^{-1}\big)\big|\,\big|K\big((x-x_j) b(n)^{-1}\big)\big|\, f_{j-i}(x_i,x_j)\, dx_i\, dx_j$$
$$\le C_1 C_5\, b(n)^{2s}\int_{\mathbb{R}^s}\int_{\mathbb{R}^s}|K(x_i)|\,|K(x_j)|\, dx_i\, dx_j = O(b(n)^{2s}) \qquad\text{for } j-i\ge j_0.$$
Hence
$$\sup_{x\in D}\big|\mathrm{cov}\big(Y_i K\big((x-X_i) b(n)^{-1}\big),\ Y_j K\big((x-X_j) b(n)^{-1}\big)\big)\big| = O(b(n)^{2s}) \tag{7.2}$$
for j − i ≥ j_0. Applying Lemma 2.3 of [16], by (7.2) and (7.1), we obtain
$$\sup_{x\in D} D^{2}\left(\sum_{i=1}^{m} W_{ni}(x)\right) \le O\big(m\, n^{-2} b(n)^{-2s}\big)\left(b^{s}(n) + b(n)^{2s\left(1-\frac{\gamma-1}{r\gamma-2r}\right)}\right).$$
This inequality directly implies Lemma 7.1. □

By standard arguments (cf. [19]), we obtain Lemma 7.2, which is utilized in the proofs of the theorems of Section 3.

Lemma 7.2. Let {ξ_k} be a sequence of random variables with E|ξ_i|^γ ≤ C < +∞ for some γ ≥ 1. Then
$$\max_{i=1\dots n}|\xi_i| = o(T_n) \quad a.s. \quad\text{with } T_n := (n\,\ln(n)\, L(n))^{1/\gamma}.$$

Next we proceed to prove Theorems 3.1 and 3.2. Observe that
$$\hat r_n(x) - r(x) = \big(\hat m_n(x) - r(x) f(x)\big)\,\hat f_n(x)^{-1} - \big(\hat f_n(x) - f(x)\big)\, r(x)\,\hat f_n(x)^{-1} \tag{7.3}$$
(x ∈ R^s, f̂_n(x) as in (2.1)) and
$$|\hat m_n(x) - E\hat m_n(x)| = |\tilde m_n(x)|\, I\Big(\max_{i=1\dots n}|Y_i|\le T_n\Big) + |\tilde m_n(x)|\, I\Big(\max_{i=1\dots n}|Y_i| > T_n\Big) \qquad (x\in\mathbb{R}^s) \tag{7.4}$$
where T_n is as in Lemma 7.2, m̃_n(x) = m̂_n(x) − E m̂_n(x), m̂_n(x) = Σ_{i=1}^{n} W_ni(x). In this section we use positive constants C_7, ..., C_10 which do not depend on n, N, ε.

Lemma 7.3. Assume that the assumptions of Theorem 3.1(i) are satisfied. Then we have
$$\sup_{x\in D}|\hat m_n(x) - r(x) f(x)| = O\big(a_n + b^{p}(n)\big) \quad a.s.$$
where a_n = √(ln(n)) n^{-1/2} b(n)^{-σ/2} + v_n with v_n and σ as in Theorem 3.1.

Proof. Let {η_n} and {a_n} be sequences of positive real numbers such that lim_{n→∞} η_n/b(n) = 0. Let ν, E_1, ..., E_ν, u_1, ..., u_ν be determined as in the beginning of the previous section. Let T̃_n := sup_{t∈R^s}|K(t)| n^{-1} b(n)^{-s} (T_n + E|Y_1|). Note that
$$\max_{k=1\dots\nu}\sum_{i=1}^{n} P\big\{|\tilde W_{ni}(u_k)| > \tilde T_n\big\} \le \tilde T_n^{-\gamma}\max_{k=1\dots\nu}\sum_{i=1}^{n} E|\tilde W_{ni}(u_k)|^{\gamma} = O\big(b^{s}(n)\ln(n)^{-1}\big),$$
where W̃_ni(x) := W_ni(x) − E W_ni(x). An application of Proposition 5.2 leads to
$$P\left\{\max_{k=1\dots\nu}|\tilde m_n(u_k)|\, I\Big(\max_{i=1\dots n}|Y_i|\le T_n\Big) > \varepsilon a_n\right\} \le \nu\max_{k=1\dots\nu} P\left\{|\tilde m_n(u_k)| > \varepsilon a_n,\ \max_{i=1\dots n}|\tilde W_{ni}(u_k)|\le \tilde T_n\right\}$$
$$\le C_8\,\eta_n^{-s}\left(\exp\left(-C_7\,\varepsilon^{2} a_n^{2}\big(n^{-1} b(n)^{-\sigma} + \varepsilon a_n N \tilde T_n\big)^{-1}\right) + \varepsilon^{-1}\ln(n)^{-1} n N^{-1}\alpha_N\right) \tag{7.5}$$
for all ε > 0, n, N ∈ N, N ≤ n/2, a_n^{-1} N T̃_n ≤ ln(n)^{-1}. Assume that η_n := (n² L(n) N^{-r-1})^{1/s}. Therefore, the series
$$\sum_{n=1}^{\infty} P\left\{\max_{k=1\dots\nu}|\tilde m_n(u_k)|\, I\Big(\max_{i=1\dots n}|Y_i|\le T_n\Big) > \varepsilon a_n\right\} \tag{7.6}$$
converges for some large ε > 0 provided that
$$a_n^{2} \ge \ln(n)\, n^{-1} b(n)^{-\sigma}, \qquad a_n \ge T_n N n^{-1} b(n)^{-s}\ln(n).$$
Applying the Borel-Cantelli lemma, Lemma 7.2 and (7.4), we obtain
$$\max_{k=1\dots\nu}|\tilde m_n(u_k)| = O(a_n) \quad a.s., \qquad a_n := \sqrt{\ln(n)}\, n^{-1/2} b(n)^{-\sigma/2} + n^{(1-\gamma)/\gamma} N b(n)^{-s}\ln(n)^{(\gamma+1)/\gamma} L(n)^{1/\gamma}.$$
Analogously to Lemma 6.2, one proves that
$$\max_{k=1\dots\nu}\sup_{x\in E_k}\big|\hat m_n(x) - \hat m_n(u_k) - E\big(\hat m_n(x) - \hat m_n(u_k)\big)\big| = O\big(a_n + \eta_n^{q} b(n)^{-q}\big) \quad a.s.$$
Combining the latter two identities and minimizing w.r.t. N, we get sup_{x∈D}|m̃_n(x)| = O(a_n) a.s., a_n as above. From Härdle and Luckhaus ([12]), we take
$$\sup_{x\in D}|E\hat m_n(x) - r(x) f(x)| = O(b^{p}(n)) \tag{7.7}$$
which completes the proof. □

Lemma 7.4. Under the assumptions of Theorem 3.2, we have
$$\sup_{x\in D}|\hat m_n(x) - r(x) f(x)| = O\big(\tilde a_n + b^{p}(n)\big) \quad a.s.$$
where
$$\tilde a_n = \sqrt{\ln(n)}\, n^{-1/2} b(n)^{-s/2} + \big(n^{-\gamma+1} b(n)^{-\gamma s}\ln(n)^{2\gamma+1} L(n)\big)^{1/\gamma}.$$

Proof. This proof follows the line of the previous proof. Let N := ⌈(3+s)|\ln ρ|^{-1}\ln(n)⌉ and η_n := n^{-1}. In view of (7.5), the series (7.6) converges for some large ε > 0 provided that
$$a_n^{2} \ge \ln(n)\, n^{-1} b(n)^{-s}, \qquad a_n \ge T_n n^{-1} b(n)^{-s}\ln^{2}(n).$$
Consequently,
$$\sup_{x\in D}|\hat m_n(x) - E\hat m_n(x)| = O\big(\tilde a_n + \eta_n b(n)^{-1}\big) \quad a.s.$$
Together with (7.7), we obtain the lemma. □

Proof of Theorems 3.1, 3.2: Combining Theorem 2.1, Lemmas 7.3, 7.4 and (7.3), we get Theorem 3.1 and Theorem 3.2. □

At the end of this section, we prove Theorem 3.3 concerning β-mixing sequences.

Lemma 7.5. Let the assumptions of Theorem 3.3 be satisfied. Then
$$\sup_{x\in D}|\hat m_n(x) - r(x) f(x)| = O(a_n) \quad a.s.$$
where a_n = √(ln(n)) n^{-1/2} b(n)^{-σ/2} + v_n; v_n and σ as in Theorem 3.3.

Proof. Let M, N, R, N_i, R_i be as above. There exist independent random vectors X̃_0*, ..., X̃_{2M−1}* having the same distribution (and dimension) as X̃_0, ..., X̃_{2M−1}, respectively, where
$$\tilde X_i = (X_{iN+1}, Y_{iN+1}, \dots, X_{iN+N}, Y_{iN+N})^{T} \ \text{ for } i = 0,\dots,2M-3,$$
$$\tilde X_{2M-2} = (X_{2MN-2N+1}, Y_{2MN-2N+1}, \dots, X_{2MN}, Y_{2MN})^{T}, \qquad \tilde X_{2M-1} = (X_{2MN+1}, Y_{2MN+1}, \dots, X_n, Y_n)^{T}.$$
Let I_1 be the set of indices of the X_i's which are components of X̃_0, X̃_2, ..., X̃_{2M−2}, and I_2 = {1, ..., n}\I_1. Let {η_n} and {a_n} be sequences of positive real numbers such that lim_{n→∞} η_n/b(n) = 0. Determine ν, E_1, ..., E_ν, u_1, ..., u_ν as in the beginning of Section 6. We have
$$\max_{k=1\dots\nu}|\tilde m_n(u_k)| \le \max_{k=1\dots\nu}\Big|\sum_{i=0}^{M-1} U_{ik}\Big|\, I\Big(\max_{l\in I_1}|Y_l|\le T_n\Big) + \max_{k=1\dots\nu}\Big|\sum_{i=0}^{M-1} U_{ik}'\Big|\, I\Big(\max_{l\in I_2}|Y_l|\le T_n\Big) + \max_{k=1\dots\nu}|\tilde m_n(u_k)|\, I\Big(\max_{l=1\dots n}|Y_l| > T_n\Big)$$
$$= A_n + B_n + D_n, \ \text{ say,} \tag{7.8}$$
where
$$Z_{lk} := n^{-1} b(n)^{-s}\Big(Y_l K\big((u_k - X_l) b(n)^{-1}\big) - E\, Y_l K\big((u_k - X_l) b(n)^{-1}\big)\Big), \qquad U_{ik} := \sum_{j=1}^{N_i} Z_{2iN+j,\,k}, \quad U_{ik}' := \sum_{j=1}^{R_i} Z_{2iN+N_i+j,\,k}.$$
A multiple application of Lemma 1 in [29] leads to
$$P\{A_n > \varepsilon a_n\} \le P\left\{\max_{k=1\dots\nu}\Big|\sum_{i=0}^{M-1} U_{ik}^{*}\Big|\, I\Big(\max_{l\in I_1}|Y_l^{*}|\le T_n\Big) > \varepsilon a_n\right\} + 2M\beta_N$$
$$\le P\left\{\max_{k=1\dots\nu}\Big|\sum_{i=0}^{M-1} U_{ik}^{*}\Big| > \varepsilon a_n,\ \max_{i=0\dots M-1}|U_{ik}^{*}|\le 2\tilde T_n N\right\} + 2M\beta_N$$
$$\le P\left\{\max_{k=1\dots\nu}\Big|\sum_{i=0}^{M-1} U_{ik}^{*}\, I\big(|U_{ik}^{*}|\le 2\tilde T_n N\big)\Big| > \varepsilon a_n\right\} + 2M\beta_N$$
for all ε > 0, where X̃_i* = (X*_{iN+1}, Y*_{iN+1}, ...)^T, U*_{ik} := Σ_{j=1}^{N_i} Z*_{2iN+j,k},
$$Z_{lk}^{*} := n^{-1} b(n)^{-s}\Big(Y_l^{*} K\big((u_k - X_l^{*}) b(n)^{-1}\big) - E\, Y_l^{*} K\big((u_k - X_l^{*}) b(n)^{-1}\big)\Big),$$
T̃_n = n^{-1} b(n)^{-s} sup_{t∈R^s}|K(t)| (T_n + E|Y_1|). This technique is also used in the proof of Lemma 3.1 of [30]. Since U*_{0k}, ..., U*_{M−1,k} are independent, we can apply Theorem 5.1 and obtain
$$P\{A_n > \varepsilon a_n\} \le C_{10}\,\eta_n^{-s}\exp\left(-C_9\,\varepsilon^{2} a_n^{2}\big(n^{-1} b(n)^{-\sigma} + \varepsilon a_n N \tilde T_n\big)^{-1}\right) + \lceil n N^{-1}\rceil\,\beta_N \tag{7.9}$$
for all ε > 0. Choose η_n = n^{-1} and N := ⌈(n² ln(n) L(n))^{1/(r+1)}⌉ such that
$$\sum_{n=1}^{\infty}\lceil n N^{-1}\rceil\,\beta_N < +\infty.$$
Consequently, by (7.9), the series Σ_{n=1}^∞ P{A_n > εa_n} converges for some large ε > 0 provided that
$$a_n^{2} \ge \ln(n)\, n^{-1} b(n)^{-\sigma}, \qquad a_n \ge T_n N n^{-1} b(n)^{-s}\ln(n).$$
Now apply the Borel-Cantelli lemma to get A_n = O(a_n) a.s. with
$$a_n = \sqrt{\ln(n)}\, n^{-1/2} b(n)^{-\sigma/2} + T_n N n^{-1} b(n)^{-s}\ln(n).$$
The same convergence rate holds true for B_n. Hence, by (7.8),
$$\max_{k=1\dots\nu}|\tilde m_n(u_k)| = O(a_n) \quad a.s.$$
Analogously to Lemma 6.2, we obtain
$$\sup_{x\in D}|\tilde m_n(x)| = O\big(a_n + \eta_n b(n)^{-1}\big) = O(a_n) \quad a.s.$$
and therefore Lemma 7.5. □

Lemma 7.5 and (7.3). 2

8. Proofs in the case of autoregression Let fk gk=1;2::: be a strictly stationary sequence of real random variables satisfying the autoregressive model (4.1) and condition G . Let f k g be the sequence of -mixing coecients of fk g. Then k = O(k ) holds for some  2 (0; 1). 16

Lemma 8.1. Let g be bounded on D . Assume that

< +1 and the condition J 0 and K(p) are fullled for some > 2 and some integer p  1. Then sup D 2

m X

x2D

i=1

!

?

E j j j



?

Wni(x) = O mn?2b(n)?s and sup E jWni (x)j = O n? b(n)? s+s



x2D

where Wni(x) = n?1 b(n)?s i+1 K ((x ? Xi )b(n)?1 ) ; Xi = ( i ; : : :;  i?q+1 )T . Proof. Since E Zi K ((x ? Xi )b(n)?1 ) = 0 and E ( ZiK ((x ? Xi )b(n)?1) Zj K ((x ? Xj )b(n)?1)) = 0 for i 6= j , we deduce D2

m X i=1

!

Wni (x)  n?2b(n)?2s 2

where

An(x) = D 2

m X i=1

m X i=1

D 2 Zi K

?

(x ? Xi )b(n)

?

g(Xi )K (x ? Xi)b(n)

?1 

?1 

!

+ 2An(x)

(8.1)

!

:

Analogously to Lemma 7.1, we obtain sup An(x) = O (mbs(n)) :

(8.2)

x2D

Moreover

sup D ZiK x2D

2

?

 (x ? Xi)b(n)?1

 E Zi  C2 sup 2

=

x2D O(bs (n)):

Z

?



K 2 (x ? t)b(n)?1 dt R (8.3) s

Combining (8.1) to (8.3), we get the lemma. 2 Proof of Theorem 4.1: The proof follows the line of the proof of Theorem 3.1 but Lemma 8.1 is applied instead of Lemma 7.1. 2

9. Acknowledgement The author wishes to thank the anonymous referees for their valuable hints.

References [1] Ango Nze, P. and Doukhan, P. (1993) Functional estimation for mixing time series (in French), C.R. Acad. Sci. Paris 317, Série I, 405-408 17

[2] Ango Nze, P. and Doukhan, P. (1995) Functional estimation for time series, I: Quadratic convergence properties, Math. Methods Statist. 5, 404-423 [3] Ango Nze, P. and Rios, R. (1995) Density estimation in the L1-norm for mixing processes (in French), C.R. Acad. Sci. Paris, t. 320, Série I, 1259-1262 [4] Bosq, D. (1996) Nonparametric statistics for stochastic processes. Estimation and prediction, Lecture Notes in Statistics 110. New York, Springer-Verlag [5] Collomb, G. (1984) Propriétés de convergence presque complète du prédicteur à noyau, Zeitschrift für Wahrscheinlichkeitsth. und verw. Geb. 66, 441-460 [6] Doukhan, P. (1994) Mixing: properties and examples, Lecture Notes in Statistics 85, Springer-Verlag [7] Eubank, R.L. (1988) Spline smoothing and nonparametric regression, Marcel Dekker [8] Fan, J. and Gijbels, I. (1996) Local Polynomial Modelling and Its Applications, Chapman & Hall [9] Fuk, D.Kh. and Nagaev, S.N. (1971) Probability inequalities for sums of independent random variables, Theor. Probab. Appl. 16, 643-660 [10] Györ, L.; Härdle, W.; Sarda, P.; Vieu, P. (1989) Nonparametric curve estimation from time series, Lecture Notes in Statistics 60, Springer-Verlag [11] Härdle, W. (1990) Applied Nonparametric Regression, Cambridge University Press [12] Härdle, W. and Luckhaus, S. (1984) Uniform consistency of a class of regression estimators, Ann. Statist. 12, 612-623 [13] Hart, J.D. (1991) Kernel regression estimation with time series errors, J. R. Statist. Soc. B53, 173-188 [14] Herrmann, E.; Gasser, T.; Kneip, A. (1992) Choice of bandwidth for kernel regression when residuals are correlated, Biometrika 79, 783-795 [15] Liebscher, E. (1995) Strong convergence of sums of '-mixing random variables, Math. Methods Statist. 4, 216-229 [16] Liebscher, E. (1996) Strong convergence of sums of -mixing random variables with applications to density estimation, Stoch. Proc. Appl. 65, 69-80 18

[17] Liebscher, E. (1999a) Asymptotic normality of nonparametric estimators under -mixing condition, Statistics & Probability Letters 43, 243-250 [18] Liebscher, E. (1999b) Convergence of the Gasser-Müller estimator under mixing condition, Preprint M11/99, TU Ilmenau. [19] Masry, E. and Tjøstheim, D. (1995) Nonparametric estimation and identication of nonlinear ARCH time series, Econometric Theory 11, 258-289 [20] Mokkadem, A. (1990) Study of risks of kernel estimators, Teoriya veroyatnostei i eyo primeneniya 35, 531-538 [21] Rio, E. (1995) The functional law of the iterated logarithm for stationary strongly mixing sequences, Ann. Probab. 23, 1188-1203 [22] Roussas, G.G. (1988) Nonparametric estimation in mixing sequences of random variables, J. Statist. Plann. Inference 18, 135-149 [23] Roussas, G.G. (1990) Nonparametric regression estimation under mixing conditions, Stoch. Proc. Appl. 36, 107-116 [24] Roussas, G.G.; Tran, L.T.; Ioannides, D.A. (1992) Fixed design regression for time series: asymptotic normality, J. Multivariate Analysis 40, 262-291 [25] Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis, Chapman & Hall, London [26] Tran, L.T. (1990) Kernel density estimation under dependence, Statistics & Probability Letters 10, 193-201 [27] Vieu, P. (1991) Quadratic errors for nonparametric estimates under dependence, J. Multiv. Analysis 39, 324-347 [28] Wu, J.S. and Chu, C.K. (1994) Nonparametric estimation of a regression function with dependent observations, Stoch. Processes Appl. 50, 149-160 [29] Yoshihara, K. (1978) Probability inequalities for sums of absolutely regular processes and their applications, Z. Wahrscheinlichkeitstheorie u. verw. Gebiete 43, 319-329 [30] Yu, B. (1993) Density estimation in the L1 norm for dependent data with applications to the Gibbs sampler, Ann. Statist. 21, 711-735 19