ON RECURSIVE ESTIMATION FOR LOCALLY STATIONARY TIME VARYING AUTOREGRESSIVE PROCESSES

ERIC MOULINES⋆, PIERRE PRIOURET†, AND FRANÇOIS ROUEFF⋆

[email protected], [email protected], [email protected] ? GET/Telecom

Paris, CNRS LTCI 46 rue Barrault, 75634 Paris Cedex 13, France † Laboratoire

de Probabilit´es Universit´e Paris VI Tour 56 - 3`eme ´etage, 4, Place Jussieu, 75252 Paris cedex 05, France

Abstract. This paper focuses on recursive estimation of locally stationary autoregressive processes. The stability of the model is revisited and uniform results are provided when the time-varying autoregression parameters belong to appropriate smoothness classes. An adequate normalization of the correction term used in the recursive estimation procedure allows for very mild assumptions on the innovation distributions. The rate of convergence of the pointwise estimates is shown to be minimax in β-Lipschitz classes for 0 < β ≤ 1. For 1 < β ≤ 2, this property no longer holds; a bias reduction method is proposed for recovering the minimax rate. Finally, an asymptotic expansion of the estimation error is given, allowing both for an explicit asymptotic expansion of the mean-square error and for a central limit theorem.

1. Introduction

1.1. Locally stationary autoregressive models and recursive algorithms. Suppose that we have real-valued observations $(X_{1,n}, X_{2,n}, \dots, X_{n,n})$ from a time-varying autoregressive model (TVAR)
\[ X_{k,n} = \sum_{i=1}^{d} \theta_i((k-1)/n)\, X_{k-i,n} + \sigma(k/n)\, \epsilon_{k,n}, \qquad k = 1, \dots, n, \]

where
• $\theta = (\theta_1, \dots, \theta_d) : [0,1] \to \mathbb R^d$ is the local vector of autoregression coefficients,
• $\{\epsilon_{k,n}\}_{n \ge 1, 1 \le k \le n}$ is a triangular array of unit-variance strict white noise (referred to as the innovations), independent of the initial conditions $X_{1-i,n}$, $i = 1, \dots, d$,

Date: February 8, 2004.
AMS 2000 MSC. Primary 62M10, 62G08, 60J27. Secondary 62G20.
Key words: locally stationary processes, nonparametric estimation, recursive estimation, time-varying autoregressive model.


• $\sigma : [0,1] \to \mathbb R_+$ is the local innovation standard deviation.

This recurrence equation may be written more compactly as
\[ X_{k,n} = \boldsymbol\theta_{k-1,n}^T \mathbf X_{k-1,n} + \sigma_{k,n}\, \epsilon_{k,n}, \qquad k = 1, \dots, n, \]
where
\[ \mathbf X_{k,n} := [X_{k,n}\; X_{k-1,n} \cdots X_{k-d+1,n}]^T, \qquad \boldsymbol\theta_{k,n} := \theta(k/n) = [\theta_1(k/n)\; \theta_2(k/n) \cdots \theta_d(k/n)]^T \qquad\text{and}\qquad \sigma_{k,n} = \sigma(k/n). \]
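To fix ideas, the recursion above can be simulated directly. The following sketch (Python/NumPy) is ours, not the authors'; the coefficient curve, Gaussian innovations and zero initial conditions are arbitrary illustrative choices.

```python
import numpy as np

def simulate_tvar(n, theta, sigma, rng=None):
    """Simulate X_{1,n}, ..., X_{n,n} from the TVAR(d) recursion,
    with zero initial conditions X_{0,n} = ... = X_{1-d,n} = 0.

    theta : callable t -> array of shape (d,), local AR coefficients theta(t)
    sigma : callable t -> float, local innovation standard deviation
    """
    rng = np.random.default_rng(rng)
    d = len(theta(0.0))
    state = np.zeros(d)              # regression vector [X_{k-1}, ..., X_{k-d}]
    out = np.empty(n)
    for k in range(1, n + 1):
        xk = theta((k - 1) / n) @ state + sigma(k / n) * rng.standard_normal()
        out[k - 1] = xk
        state = np.concatenate(([xk], state[:-1]))  # shift in the new value
    return out

# Example: d = 2, coefficients varying slowly over [0, 1], kept well inside
# the stability region so the path stays bounded.
theta = lambda t: np.array([0.5 * np.cos(np.pi * t), -0.2])
sigma = lambda t: 1.0 + 0.5 * t
X = simulate_tvar(500, theta, sigma, rng=0)
```

The triangular-array structure of the model appears here through the rescaled time arguments $(k-1)/n$ and $k/n$: the whole coefficient curve is traversed once over the $n$ observations.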

TVAR models have been used for modeling data which show some sort of nonstationary behavior in Rao (1970), Hallin (1978), Grenier (1983). We consider the approach introduced in Dahlhaus (1997). In short, if θ and σ are smooth mappings, and if the initial condition corresponds to the stationary distribution of the (θ(0), σ(0))-AR model, then the TVAR model belongs to a larger class of models, namely the locally stationary processes. Here we slightly widen this notion by using more general smoothness assumptions and initial conditions.

The problem of estimating the mapping $t \mapsto (\theta(t), \sigma(t))$ is reminiscent of nonparametric curve estimation on a fixed design, a problem which has received considerable attention in the literature. However, the inhomogeneous dependence structure of the data raises some technical difficulties. A natural approach consists in using a stationary method on short overlapping segments of the time series (see Dahlhaus and Giraitis (1998)). An alternative approach, first investigated by Belitser (2000) for first-order TVAR models, consists in estimating the regression function recursively. More precisely, at a given time $t \in (0,1)$, only observations made before time $t$ are used in the definition of the estimator: $\hat\theta_n(t) = \hat\theta_n(t, X_{1,n}, \dots, X_{[nt],n})$, where $[\cdot]$ denotes the integer part. This approach is useful when the observations must be processed on line (see e.g. Ljung and Söderström (1983), Solo and Kong (1995), Kushner and Yin (1997)).

We focus in this contribution on the Normalized Least Mean Square (NLMS) algorithm, a specific example of a recursive identification algorithm, defined as follows:
(1) \[ \hat\theta_{0,n}(\mu) = 0, \qquad \hat\theta_{k+1,n}(\mu) = \hat\theta_{k,n}(\mu) + \mu\, \big( X_{k+1,n} - \hat\theta_{k,n}^T(\mu)\, \mathbf X_{k,n} \big)\, \frac{\mathbf X_{k,n}}{1 + \mu |\mathbf X_{k,n}|^2}, \qquad 0 \le k \le n-1, \]
where µ is referred to as the step-size and $|\cdot|$ denotes the Euclidean norm. At each iteration of the algorithm, the parameter estimate is updated by moving in the direction of the gradient of the instantaneous estimate $(X_{k+1,n} - \theta^T \mathbf X_{k,n})^2$ of the local mean square error $\mathrm E[(X_{k+1,n} - \theta^T \mathbf X_{k,n})^2]$. The normalization $(1 + \mu |\mathbf X_{k,n}|^2)^{-1}$ is a safeguard against large values of the norm of the regression vector and allows for very mild assumptions on the innovations (see (A1) below), compared with the LMS, which typically requires much stronger assumptions (see Priouret and Veretennikov (1995)). Extensions to more sophisticated iterative rules are currently under investigation. We define a pointwise estimate of $t \mapsto \theta(t)$ as a simple interpolation of $\hat\theta_{k,n}(\mu)$, $k = 1, \dots, n$, i.e.
(2) \[ \hat\theta_n(t; \mu) := \hat\theta_{[tn],n}(\mu), \qquad t \in [0,1],\; n \ge 1, \]
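For concreteness, recursion (1) together with the interpolation (2) can be implemented in a few lines. The sketch below (Python/NumPy, with our own naming) assumes zero initial conditions for the regression vector, as in the model above; it is an illustration, not the authors' implementation.

```python
import numpy as np

def nlms(X, d, mu):
    """NLMS recursion (1) on the observations X[0], ..., X[n-1]
    (interpreted as X_{1,n}, ..., X_{n,n}); returns the whole trajectory
    of estimates, so that theta_hat_n(t; mu) of (2) is path[int(t * n) - 1]."""
    n = len(X)
    theta = np.zeros(d)              # theta_hat_{0,n}(mu) = 0
    reg = np.zeros(d)                # regression vector X_{k,n}
    path = np.empty((n, d))
    for k in range(n):
        err = X[k] - theta @ reg     # prediction error X_{k+1,n} - theta^T X_{k,n}
        theta = theta + mu * err * reg / (1.0 + mu * reg @ reg)
        path[k] = theta
        reg = np.concatenate(([X[k]], reg[:-1]))  # shift in the new observation
    return path
```

On a stationary AR stretch the estimate settles near the true coefficient vector, with fluctuations of order $\sqrt\mu$, in line with the $\sqrt\mu$ term of the risk bound derived in Section 5.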


where $[x]$ denotes the integer part of $x$. Observe that, for all $t \in [0,1]$, $\hat\theta_n(t;\mu)$ only depends on $X_{0,n}$, $\{X_{l,n}, l = 1, \dots, [tn]\}$ and µ.

Remark 1. Let us mention that an iterative estimator of the local innovation variance may be obtained using the following iterative equation
\[ \hat\sigma^2_{k+1,n} = \hat\sigma^2_{k,n} + \mu \big( (X_{k+1,n} - \hat\theta_{k,n}^T(\mu)\, \mathbf X_{k,n})^2 - \hat\sigma^2_{k,n} \big). \]
This estimator can be studied using techniques similar to those presented in this paper, but it will not be treated here for the sake of brevity.

The convergence of the recursive estimator depends on three key properties: stability of the TVAR model, mixing, and persistence of excitation. The aim of this paper is to establish these properties and then to exploit them for studying the NLMS algorithm from a classical non-parametric function estimation point of view.

The paper is organized as follows. Section 2 is concerned with the stability properties of the model and their consequences upon mixing properties. In Section 3 a lower bound of the minimax rate is derived in appropriate smoothness classes. A general and independent result concerning the stability of a product of random non-negative symmetric matrices is established in Section 4. The stability of the NLMS algorithm is then obtained as an application of this result. A risk bound for the estimator $\hat\theta_n(t;\mu)$ is derived in Section 5. An asymptotic expansion of the error is then established and detailed in Section 6, in which a close look is given at the Mean Square Error (MSE). Two more important results are then obtained from this asymptotic expansion, namely a bias-corrected estimator (Section 7) and a Central Limit Theorem (CLT) (Section 8).

2. Exponential Stability of the TVAR model

2.1. Exponential stability of inhomogeneous linear control models. We preface the proof of the exponential stability of the TVAR by some simple results on linear control models whose proofs are omitted or postponed to Appendix A.
The space of $m_1 \times m_2$ matrices is endowed with the operator norm associated to the Euclidean norm, which we denote by
(3) \[ |A| := \sup_{x \in \mathbb R^{m_2},\, |x| = 1} |Ax|. \]

Observe that, for a column vector, its Euclidean norm coincides with its operator norm. We are thus authorized to use the same notation for these two norms. For any random variable (r.v.) $Z$ in a normed space $(\mathcal Z, |\cdot|)$, define $\|Z\|_p := (\mathrm E|Z|^p)^{1/p}$. Let us consider a sequence $\{Z_k, k \ge 0\}$ of $d \times 1$ random vectors satisfying the following inhomogeneous linear control equation
(4) \[ Z_k = A_k Z_{k-1} + B_k U_k, \qquad k \ge 1, \]
where $A = \{A_k, k \ge 1\}$, $B = \{B_k, k \ge 1\}$ are two deterministic sequences of respectively $d \times d$ and $d \times q$ matrices and $\{U_k, k \ge 1\}$ is a sequence of independent random $q \times 1$ vectors. The pair $(A, B)$ is said to be exponentially stable if there exist


constants $C > 0$ and $\rho \in (0,1)$ such that
(5) \[ \sup_{k \ge 0} \Big| \prod_{l=k+1}^{k+m} A_l \Big| \le C \rho^m \quad \text{for all } m > 0 \qquad\text{and}\qquad B^\star := \sup_{k \ge 1} |B_k| < \infty, \]
with the convention that $\prod_{l=k}^{k+m} A_l := A_{k+m} A_{k+m-1} \cdots A_k$. We have

Proposition 1. Let $p \in [1, \infty]$. Suppose that $(A, B)$ is exponentially stable. Then there exists a positive constant $M$ only depending on $C$, $\rho$ and $B^\star$ such that
\[ \|Z_k\|_p \le M \big( U^\star_p + \|Z_0\|_p \big), \qquad k \in \mathbb Z_+, \]
where $U^\star_p := \sup_{k \ge 1} \|U_k\|_p$.

Exponential stability (5) of the linear control model (4) implies exponential forgetting of the initial condition in the following setting. Let $(\mathsf E, |\cdot|_{\mathsf E})$ and $(\mathsf F, |\cdot|_{\mathsf F})$ be two normed spaces, let $m$ be a positive integer and $p$ a non-negative real number. We denote by $\mathrm{Li}(\mathsf E, m, \mathsf F; p)$ the class of mappings $\varphi : \mathsf E^m \to \mathsf F$ for which there exists $\lambda_1 \ge 0$ such that, for all $(x_1, \dots, x_m, y_1, \dots, y_m) \in \mathsf E^{2m}$,
(6) \[ |\varphi(x_1, \dots, x_m) - \varphi(y_1, \dots, y_m)|_{\mathsf F} \le \lambda_1 \big( |x_1 - y_1|_{\mathsf E} + \dots + |x_m - y_m|_{\mathsf E} \big) \big( 1 + |x_1|_{\mathsf E}^p + |y_1|_{\mathsf E}^p + \dots + |x_m|_{\mathsf E}^p + |y_m|_{\mathsf E}^p \big). \]
This implies that there exists $\lambda_2 \ge 0$ such that, for all $(x_1, \dots, x_m) \in \mathsf E^m$,
(7) \[ |\varphi(x_1, \dots, x_m)|_{\mathsf F} \le \lambda_2 \big( 1 + |x_1|_{\mathsf E}^{p+1} + \dots + |x_m|_{\mathsf E}^{p+1} \big). \]
Denote by $|\varphi|_{\mathrm{Li}(p)}$ the smallest λ satisfying (6) and (7). $\mathrm{Li}(\mathsf E, m, \mathsf F; p)$ is called the class of p-weighted Lipschitz mappings and $|\varphi|_{\mathrm{Li}(p)}$ the p-weighted Lipschitz norm of φ. We now state the exponential forgetting property. In the sequel, for any integrable r.v. $Z$, $\mathrm E^{\mathcal F}[Z]$ denotes the expectation of $Z$ conditionally on the σ-field $\mathcal F$.

Proposition 2. Let $p \ge 0$. Assume that $U^\star_{p+1}$ is finite and that (5) is satisfied for some $C > 0$ and $\rho < 1$. Let $\varphi \in \mathrm{Li}(\mathbb R^d, m, \mathbb R, p)$, where $m$ is a positive integer. For all $k \ge 0$, denote $\mathcal F_k := \sigma(Z_0, U_l, 1 \le l \le k)$. Then there exist constants $C_1$ and $C_2$ (depending only on $C$, $\rho$, $U^\star_{p+1}$, $B^\star$ and $p$) such that, for all $0 \le l \le k$, all $r \ge 0$ and all $0 = j_1 < \dots < j_m$,
(8) \[ \big| \mathrm E^{\mathcal F_k}[\Phi_{k+r}] \big| \le C_1\, m\, |\varphi|_{\mathrm{Li}(p)} \big( 1 + \rho^{r(p+1)} |Z_k|^{p+1} \big), \]
(9) \[ \big| \mathrm E^{\mathcal F_k}[\Phi_{k+r}] - \mathrm E^{\mathcal F_l}[\Phi_{k+r}] \big| \le C_2\, m\, \rho^r\, |\varphi|_{\mathrm{Li}(p)} \big( 1 + |Z_k|^{p+1} + \mathrm E^{\mathcal F_l}[|Z_k|^{p+1}] \big), \]
(10) \[ \big| \mathrm E^{\mathcal F_k}[\Phi_{k+r}] - \mathrm E[\Phi_{k+r}] \big| \le C_2\, m\, \rho^r\, |\varphi|_{\mathrm{Li}(p)} \big( 1 + |Z_k|^{p+1} + \mathrm E[|Z_k|^{p+1}] \big), \]
where $\Phi_j = \varphi(Z_{j+j_1}, \dots, Z_{j+j_m})$ for all $j \ge 0$.


2.2. TVAR model: definitions and elementary properties. We now consider the TVAR model. For all $t \in [0,1]$ and for all $\vartheta = (\vartheta_1, \dots, \vartheta_d) : [0,1] \to \mathbb R^d$, $\Theta(t, \vartheta)$ is the companion matrix defined by
(11) \[ \Theta(t, \vartheta) = \begin{pmatrix} \vartheta_1(t) & \vartheta_2(t) & \cdots & \cdots & \vartheta_d(t) \\ 1 & 0 & \cdots & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ & & \ddots & \ddots & \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}. \]
For any $A \in \mathcal M_d$, where $\mathcal M_d$ is the set of $d \times d$ complex matrices, denote by $|\lambda|_{\max}(A)$ its spectral radius, that is $|\lambda|_{\max}(A) := \max\{|\lambda|, \lambda \in \mathrm{sp}(A)\}$, where $\mathrm{sp}(A)$ is the set of eigenvalues of $A$. We have

Lemma 3. Let $0 < \rho_0 < \rho$ and $L$ a positive constant. Then there exists $C > 0$ such that, for all $A \in \mathcal M_d$ with $|A| \le L$ and $|\lambda|_{\max}(A) \le \rho_0$, and for all $k \in \mathbb Z_+$, $|A^k| \le C \rho^k$.

Proof. We apply (Dunford and Schwartz, 1958, Theorem VII.1.10). Let $\mathcal C = \{z \in \mathbb C : |z| = \rho\}$. Then, for any $A$ such that $|\lambda|_{\max}(A) \le \rho_0$ and for all $k \in \mathbb Z_+$,
\[ A^k = \frac{1}{2\pi i} \int_{\mathcal C} \lambda^k (\lambda I - A)^{-1}\, d\lambda. \]
Hence, we have
\[ |A^k| \le \frac{\rho^k}{2\pi} \int_{\mathcal C} |(\lambda I - A)^{-1}|\, d\lambda. \]

Let $G := \{A \in \mathcal M_d : |A| \le L, |\lambda|_{\max}(A) \le \rho_0\}$. $G$ is a compact set and $(\lambda, A) \mapsto |(\lambda I - A)^{-1}|$ is continuous over $\mathcal C \times G$, so that it is uniformly bounded. Applying this in the last equation, we get the result. □

For $\vartheta = (\vartheta_1, \dots, \vartheta_d) : [0,1] \to \mathbb R^d$, define the time-varying characteristic polynomial $\vartheta(z; t) := 1 - \sum_{j=1}^{d} \vartheta_j(t) z^j$. For $\rho > 0$, define
(12) \[ S(\rho) := \big\{ \vartheta : [0,1] \to \mathbb R^d,\; \vartheta(z; t) \ne 0 \text{ for all } |z| < \rho^{-1} \text{ and for all } t \in [0,1] \big\}. \]
For any $\beta \in (0,1]$, denote the β-Lipschitz semi-norm of a mapping $f : [0,1] \to \mathbb R^l$ by
\[ |f|_{\Lambda,\beta} = \sup_{t \ne s} \frac{|f(t) - f(s)|}{|t-s|^\beta}, \]
and define, for $0 < L < \infty$, the β-Lipschitz ball
(13) \[ \Lambda_l(\beta, L) := \Big\{ f : [0,1] \to \mathbb R^l,\; |f|_{\Lambda,\beta} \le L,\; \sup_{t \in [0,1]} |f(t)| \le L \Big\}. \]
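Membership in $S(\rho)$ can be checked numerically through the companion matrix (11): as noted in the proof of Proposition 4 below, $\vartheta \in S(\rho)$ exactly when the spectral radius of $\Theta(t, \vartheta)$ stays at most $\rho$ for every $t$, and along such a curve products of companion matrices decay geometrically. A small sketch (Python/NumPy; the coefficient curve is an arbitrary stable example of ours):

```python
import numpy as np

def companion(coeffs):
    """Companion matrix Theta(t, theta) of (11) for (theta_1(t), ..., theta_d(t))."""
    d = len(coeffs)
    M = np.zeros((d, d))
    M[0, :] = coeffs
    if d > 1:
        M[1:, :-1] = np.eye(d - 1)  # ones on the subdiagonal
    return M

def spectral_radius(A):
    return max(abs(np.linalg.eigvals(A)))

# A curve staying in S(rho) for rho ~ 0.45: the eigenvalue moduli stay below
# rho at every t, so the products beta_n(k, i; theta) of Proposition 4 below
# decay geometrically in the number of factors.
theta = lambda t: np.array([0.6 * np.cos(np.pi * t), -0.08])
rhos = [spectral_radius(companion(theta(t))) for t in np.linspace(0.0, 1.0, 21)]

n = 100
P = np.eye(2)
for l in range(1, n + 1):
    P = companion(theta(l / n)) @ P      # the product beta_n(n, 0; theta)
```

Each individual factor has operator norm larger than 1 (companion matrices are non-normal), so the geometric decay of the product is genuinely a statement about the whole curve, not about any single factor.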


Proposition 4. Let $\beta \in (0,1]$, $L > 0$ and $0 < \rho < \tau < 1$. Then there exists a constant $M > 0$ such that, for all $\vartheta \in \Lambda_d(\beta, L) \cap S(\rho)$ and for all $0 \le k < k+m \le n$,
(14) \[ \Big| \prod_{l=k+1}^{k+m} \Theta(l/n, \vartheta) \Big| \le M \tau^m. \]

Proof. First note that
\[ \Theta^\star = \sup_{\vartheta \in \Lambda \cap S}\; \sup_{t \in [0,1]} |\Theta(t, \vartheta)| < \infty, \]

where, within this proof section, Λ and S denote $\Lambda_d(\beta, L)$ and $S(\rho)$ for convenience. For any square matrices $A_1, \dots, A_r$, we have
(15) \[ \prod_{k=1}^{r} A_k = A_1^r + (A_r - A_1) A_1^{r-1} + A_r (A_{r-1} - A_1) A_1^{r-2} + \cdots + \prod_{k=3}^{r} A_k\, (A_2 - A_1) A_1. \]

By applying this decomposition to
(16) \[ \beta_n(k, i; \vartheta) := \prod_{j=i+1}^{k} \Theta(j/n, \vartheta), \quad 0 \le i < k \le n, \qquad \beta_n(i, i; \vartheta) = I, \quad 0 \le i \le n, \]
we have, for all $0 \le k < k+q \le n$,
\[ |\beta_n(k+q, k; \vartheta)| \le |\Theta((k+1)/n, \vartheta)^q| + q\, \Theta^{\star\, q-1} \max_{1 \le j \le q} |\vartheta((k+j)/n) - \vartheta((k+1)/n)|. \]

Let us set $\tilde\rho \in (\rho, \tau)$. It follows from (11) that $\vartheta \in S \Leftrightarrow \forall t \in [0,1],\; |\lambda|_{\max}(\Theta(t, \vartheta)) \le \rho$. Since $\Theta^\star < \infty$, by Lemma 3, we obtain, for some $M > 0$,
\[ \sup_{\vartheta \in \Lambda \cap S}\; \sup_{t \in [0,1]} |\Theta^q(t, \vartheta)| \le M \tilde\rho^{\,q}. \]

Let $q$ be the smallest integer such that $M \tilde\rho^{\,q} \le \tau^q/2$. Observe that, for all $n \ge 1$,
\[ \sup_{\vartheta \in \Lambda}\; \max_{1 \le j \le q}\; \max_{0 \le k \le n-q} |\vartheta((k+j)/n) - \vartheta((k+1)/n)| \le L (q/n)^\beta. \]
Let $N$ be the smallest integer such that $q\, \Theta^{\star\, q-1} L (q/N)^\beta \le \tau^q/2$. Hence, for all $n \ge N$,
\[ \sup_{\vartheta \in \Lambda \cap S}\; \max_{0 \le k \le n-q} |\beta_n(k+q, k; \vartheta)| \le \tau^q. \]

Write $m = sq + t$, $0 \le t < q$. For all $n \ge N$, $l \in \{1, \dots, n-m\}$, and for all $\vartheta \in \Lambda \cap S$,
\[ |\beta_n(l+m, l-1; \vartheta)| \le \Theta^{\star\, t} \prod_{i=0}^{s-1} |\beta_n(l+(i+1)q, l+iq; \vartheta)| \le (1 + \Theta^\star/\tau)^q\, \tau^m. \]
The proof follows. □

We formally define a TVAR(d) process as follows.


Definition 5. Let $\theta : [0,1] \to \mathbb R^d$ and $\sigma : [0,1] \to \mathbb R_+$. Let $\{\epsilon_{k,n}, 1 \le k \le n\}$ be a triangular array of independent r.v. such that $\mathrm E[\epsilon_{k,n}] = 0$ and $\mathrm E[\epsilon_{k,n}^2] = 1$ for all $1 \le k \le n$. We say that $\{X_{k,n}, 1 \le k \le n\}$ is a TVAR(d) process with initial conditions $\{X_{0,n}, n \ge 1\}$ if $\{X_{0,n}, n \ge 1\}$ and $\{\epsilon_{k,n}, 1 \le k \le n\}$ are independent and if
(17) \[ \mathbf X_{k+1,n} = \Theta(k/n, \theta)\, \mathbf X_{k,n} + \boldsymbol\sigma_{k+1,n}\, \epsilon_{k+1,n}, \qquad 0 \le k \le n-1, \]
where $\mathbf X_{k,n} = [X_{k,n} \cdots X_{k-d+1,n}]^T$ and $\boldsymbol\sigma_{k,n} = [\sigma(k/n)\; 0 \cdots 0]^T$.

We will denote by $\mathrm P_{\theta,\sigma}$ the probability distribution of $\{X_{0,n}, X_{k,n}, 1 \le k \le n\}$ and by $\mathrm E_{\theta,\sigma}$ the associated expectation. Moreover, for any r.v. $Z \in (\mathcal Z, |\cdot|)$ which is measurable w.r.t. the σ-field generated by $\{X_{0,n}, X_{k,n}, 1 \le k \le n\}$, we denote $\|Z\|_{\theta,\sigma,p} := (\mathrm E_{\theta,\sigma}[|Z|^p])^{1/p}$. Consider the following assumption.

(A1) There exists $q \ge 1$ such that $\sup_{n \ge 1} \|X_{0,n}\|_q$ and $\epsilon^\star_q := \sup_{k,n\,:\,1 \le k \le n} \|\epsilon_{k,n}\|_q$ are both finite.

For $\beta \in (0,1]$, $L > 0$, $0 < \rho < 1$, and $0 < \sigma_- \le \sigma_+ < \infty$, let us denote
(18) \[ C(\beta, L, \rho, \sigma_-, \sigma_+) := \big\{ (\theta, \sigma) : \theta \in \Lambda_d(\beta, L) \cap S(\rho),\; \sigma : [0,1] \to [\sigma_-, \sigma_+] \big\}. \]
We will write $C$ in short when no confusion is possible. From Proposition 4 and Proposition 1, we get that, under (A1),
(19) \[ \sup_{(\theta,\sigma) \in C}\; \sup_{0 \le k \le n} \|X_{k,n}\|_{\theta,\sigma,q} < \infty. \]
Eq. (14) and (19) are referred to as uniform exponential stability and uniform $L^q$-boundedness, respectively. The bound (19) may be extended to conditional moments as follows.

Eq. (14) and (19) are referred to as uniform exponential stability and uniform Lq boundedness respectively. The bound (19) may be extended to conditional moments as follows. Proposition 6. Assume (A1) with q ≥ 1 and let p ∈ [1, q]. Let β ∈ (0, 1], L > 0, 0 < ρ < τ < 1 and 0 < σ− ≤ σ+ . Then, there exists a constant M such that, for all (θ, σ) ∈ C(β, L, ρ, σ− , σ+ ) and for all 1 ≤ k ≤ l ≤ n, Fk,n (20) Eθ,σ [|Xl,n |p ] ≤ M (1 + τ (l−k) |Xk,n |p ), where Fk,n = σ (X0,n , Xj,n , 1 ≤ j ≤ k). Proof. Eq. (14) is satisfied by Proposition 1. Then under (A1), we apply Proposition 2 with φ(x) = |x|p , since φ ∈ Li(Rd , 1, R; p − 1). Eq. (20) follows from (8).  3. Minimax lower bound For all β > 1 the β-Lipschitz balls are generalized as follows. Let k ∈ Z+ and α ∈ (0, 1] be uniquely defined by β = k + α. Then we define n o (21) Λl (β, L) := f : [0, 1] → Rl , |f (k) |Λ,α ≤ L, |f (0)| ≤ L . This corresponds to β-H¨older balls when β ∈ / Z+ . Consider the following assumption.


(A2) For all $n$ and for all $k$, $1 \le k \le n$, the distribution of $\epsilon_{k,n}$ is absolutely continuous w.r.t. the Lebesgue measure with probability density function (p.d.f.) $p_{k,n}$. In addition, for all $1 \le k \le n$, $p_{k,n}$ is absolutely continuous and its derivative $\dot p_{k,n}$ satisfies
\[ \mathrm E\Big[ \frac{\dot p_{k,n}}{p_{k,n}}(\epsilon_{k,n}) \Big] = 0 \qquad\text{and}\qquad I := \sup_{1 \le k \le n} \mathrm E\Big[ \Big( \frac{\dot p_{k,n}}{p_{k,n}}(\epsilon_{k,n}) \Big)^2 \Big] < \infty. \]
We have the following result.

Theorem 7. Assume (A1) with $q = 2$ and (A2). Let $\beta > 0$, $L > 0$, $\rho \in (0,1)$, $0 < \sigma_- \le \sigma_+$. Then there exists $\alpha > 0$ such that, for all $n \ge 1$, for all $t \in [0,1]$, and for any estimator $\hat\delta_n := \hat\delta_n(X_{0,n}, X_{1,n}, \dots, X_{n,n}) \in \mathbb R^d$,
(22) \[ \inf_{|u|=1}\; \sup_{(\theta,\sigma) \in C} u^T\, \mathrm{MSE}_{\theta,\sigma}(\hat\delta_n, t)\, u \ge \alpha\, n^{-\frac{2\beta}{1+2\beta}}, \]
where $C := C(\beta, L, \rho, \sigma_-, \sigma_+)$ and $\mathrm{MSE}_{\theta,\sigma}(\hat\delta_n, t) := \mathrm E_{\theta,\sigma}\big[ (\hat\delta_n - \theta(t))(\hat\delta_n - \theta(t))^T \big]$ is the MSE.

Proof. By writing $\bar\delta_n := u^T \hat\delta_n$, (22) simply means that, for any estimator $\bar\delta_n(X_{0,n}, X_{1,n}, \dots, X_{n,n}) \in \mathbb R$ and for all $u = (u_1, \dots, u_d) \in \mathbb R^d$ such that $|u| = 1$,
\[ \sup_{(\theta,\sigma) \in C} \mathrm E_{\theta,\sigma}\Big[ \big( \bar\delta_n - u^T \theta(t) \big)^2 \Big] \ge \alpha\, n^{-\frac{2\beta}{1+2\beta}}. \]

Write $\beta = \lfloor\beta\rfloor + \tilde\beta$, where $\lfloor\beta\rfloor$ is the largest integer strictly smaller than β. Let $\varphi : \mathbb R \to \mathbb R$ be a $C^\infty$ symmetric function, decreasing on $\mathbb R_+$, such that $\varphi(0) = 1$, $\varphi(u) = 0$ for all $|u| \ge 1$ and
(23) \[ \sup_{x,y \in \mathbb R,\, x \ne y} \frac{\big| \varphi^{(\lfloor\beta\rfloor)}(x) - \varphi^{(\lfloor\beta\rfloor)}(y) \big|}{|x-y|^{\tilde\beta}} \le 1. \]
Let $\lambda : \mathbb R \to \mathbb R_+$ be a $C^1$ p.d.f. with respect to the Lebesgue measure, vanishing outside the segment $[-1, +1]$ and such that
(24) \[ \int_{-1}^{1} \Big( \frac{\lambda'(x)}{\lambda(x)} \Big)^2 \lambda(x)\, dx < \infty. \]
Let $(v_n)$ and $(w_n)$ be two non-decreasing sequences of positive numbers, to be specified later, such that
(25) \[ \lim_{n \to \infty} \big( v_n^{-1} + w_n^{-1} + n^{-1} w_n \big) = 0 \qquad\text{and}\qquad \sup_{n \ge 0} v_n^{-1} w_n^\beta \le L. \]
Let $t \in [0,1]$ and $u \in \mathbb R^d$ such that $|u| = 1$. Define $\varphi_{n,t} : [0,1] \to \mathbb R^d$, $s \mapsto \varphi_{n,t}(s) := \varphi((s-t)w_n)\, u$, and let $\sigma : [0,1] \to \mathbb R_+$, $s \mapsto \sigma(s) = \sigma_+$. For $\eta \in \mathbb R$ and $n \ge 0$, define the one-dimensional parametric family $\{\mathrm P_{\eta\varphi_{n,t},\sigma}, \eta \in \mathbb R\}$. In other words, for a given $n$ and a given λ, we consider the model $\{X_{0,n}, X_{1,n}, \dots, X_{n,n}\}$ defined by initial conditions $X_{0,n}$, innovations $\{\epsilon_{k,n}, 1 \le k \le n\}$ and by the iterative relation
\[ X_{k+1,n} = \eta\, \varphi((k/n - t)w_n)\, u^T \mathbf X_{k,n} + \sigma_+\, \epsilon_{k+1,n}, \qquad 0 \le k \le n-1. \]


From (A2), it thus follows that, for all $0 \le k \le n-1$, $x \mapsto p_{k+1,n}\big( (x - \eta\varphi((k/n - t)w_n)\, u^T y)/\sigma_+ \big)/\sigma_+$ is the conditional density of $X_{k+1,n}$ given $\mathbf X_{k,n} = y$ and parameter η. Since the distribution of $X_{0,n}$ does not depend on η, and using the conditional densities above to compute the joint density of $\{X_{0,n}, X_{1,n}, \dots, X_{n,n}\}$, the Fisher information of this model reads
\[ I_n(\eta) = \sigma_+^{-2}\, \mathrm E_{\eta\varphi_{n,t},\sigma}\Bigg[ \Bigg( \sum_{k=1}^{n} \varphi((k/n - t)w_n)\, u^T \mathbf X_{k-1,n}\, \frac{\dot p_{k,n}}{p_{k,n}}(\epsilon_{k,n}) \Bigg)^2 \Bigg]. \]
Now, under (A2), the summand in this equation is a martingale increment sequence whose variances are bounded by $I\, (\varphi((k/n - t)w_n))^2\, \mathrm E_{\eta\varphi_{n,t},\sigma}[(u^T \mathbf X_{k-1,n})^2]$. Since $|u| = 1$, $(u^T \mathbf X)^2 \le |\mathbf X|^2$, and we finally obtain
(26) \[ I_n(\eta) \le \sigma_+^{-2}\, I \sum_{k=1}^{n} \big( \varphi((k/n - t)w_n) \big)^2\, \mathrm E_{\eta\varphi_{n,t},\sigma}\big[ |\mathbf X_{k-1,n}|^2 \big]. \]

From (23) and (25), for all $\eta \in [-v_n^{-1}, v_n^{-1}]$,
\[ |\eta \varphi_{n,t}|_{\Lambda,\beta} \le v_n^{-1} w_n^{\lfloor\beta\rfloor} \sup_{0 \le s < s' \le 1} \frac{\big| \varphi^{(\lfloor\beta\rfloor)}((s'-t)w_n) - \varphi^{(\lfloor\beta\rfloor)}((s-t)w_n) \big|}{|s'-s|^{\tilde\beta}} \le v_n^{-1} w_n^{\beta} \le L. \]

Remark 2. The lower bound (22) remains valid for $t > 0$ and $n$ sufficiently large without further assumptions. This clearly follows from the proof: since the distribution of $X_{0,n}$ only depends on $(\theta(0), \sigma(0))$ and since, for $t > 0$ and $n$ sufficiently large, $(\eta\varphi_{n,t}(0), \sigma(0)) = (0, \sigma_+)$ does not depend on η, the computation of $I_n(\eta)$ applies and the proof goes on similarly.

4. Exponential stability of the recursive algorithm

4.1. Error decomposition. When studying recursive algorithms of the form (1), it is convenient to rewrite the original recursion in terms of the error defined by $\delta_{k,n} := \hat\theta_{k,n}(\mu) - \boldsymbol\theta_{k,n}$, $0 \le k \le n$. Let us denote, for all $\nu \ge 0$ and for all $x \in \mathbb R^d$,
(28) \[ L_\nu(x) := \frac{x}{1 + \nu |x|^2} \qquad\text{and}\qquad F_\nu(x) := L_\nu(x)\, x^T. \]
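The role of the normalization in (28) can be made concrete: $F_\nu(x) = xx^T/(1 + \nu|x|^2)$ is positive semi-definite with operator norm $|x|^2/(1 + \nu|x|^2) < 1/\nu$, so every factor $I - \mu F_\mu(x)$ appearing in the error recursions below has operator norm at most 1, whatever the size of the regression vector. A quick numerical check of these two facts (Python/NumPy sketch, ours):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, d = 0.1, 5
for _ in range(100):
    # regression vectors of wildly different magnitudes
    x = rng.standard_normal(d) * rng.uniform(0.1, 100.0)
    F = np.outer(x, x) / (1.0 + mu * (x @ x))        # F_mu(x) = L_mu(x) x^T
    # operator norm of F_mu(x) is |x|^2 / (1 + mu |x|^2) < 1 / mu
    assert abs(np.linalg.norm(F, 2) - (x @ x) / (1.0 + mu * (x @ x))) < 1e-8
    # hence the factors of the error recursion are non-expansive
    assert np.linalg.norm(np.eye(d) - mu * F, 2) <= 1.0 + 1e-12
```

This non-expansiveness is exactly condition (c) of Theorem 8 below, and it is what makes the very mild moment assumption (A1) sufficient.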

The tracking error process $\{\delta_{k,n}, 0 \le k \le n\}$ obeys the following sequence of linear stochastic difference equations. For all $0 \le k < n$,
(29) \[ \delta_{k,n} = \delta u_{k,n} + \delta v_{k,n} + \delta w_{k,n}, \]
(30) \[ \delta u_{k+1,n} := (I - \mu F_\mu(\mathbf X_{k,n}))\, \delta u_{k,n}, \qquad \delta u_{0,n} = -\theta(0), \]
(31) \[ \delta v_{k+1,n} := (I - \mu F_\mu(\mathbf X_{k,n}))\, \delta v_{k,n} + \mu\, L_\mu(\mathbf X_{k,n})\, \sigma_{k+1,n}\, \epsilon_{k+1,n}, \qquad \delta v_{0,n} = 0, \]
(32) \[ \delta w_{k+1,n} := (I - \mu F_\mu(\mathbf X_{k,n}))\, \delta w_{k,n} + (\boldsymbol\theta_{k,n} - \boldsymbol\theta_{k+1,n}), \qquad \delta w_{0,n} = 0. \]
$\{\delta u_{k,n}\}$ takes into account the way the successive estimates of the regression coefficients forget the initial error. Parallel to classical non-parametric regression estimation, $\delta w_{k,n}$ plays the role of a bias term due to the variation of θ (this term cancels when $t \mapsto \theta(t)$ is constant), while $\delta v_{k,n}$ is a stochastic disturbance. The transient term simply writes, for all $0 \le k < n$,
(33) \[ \delta u_{k+1,n} = \Psi_n(k, -1; \mu)\, \delta u_{0,n}, \]


where, for all $\nu \ge 0$ and for all $-1 \le j < k \le n$,
(34) \[ \Psi_n(j, j; \nu) := I \qquad\text{and}\qquad \Psi_n(k, j; \nu) := \prod_{l=j+1}^{k} \big( I - \nu F_\nu(\mathbf X_{l,n}) \big). \]

Let us finally define the following increment processes, for all $0 \le k < n$,
(35) \[ \xi w_{k,n} := \boldsymbol\theta_{k,n} - \boldsymbol\theta_{k+1,n} \qquad\text{and}\qquad \xi v_{k,n} := \mu\, L_\mu(\mathbf X_{k,n})\, \sigma_{k+1,n}\, \epsilon_{k+1,n}. \]

According to these definitions, $\{\delta v_{k,n}\}_{1 \le k \le n}$ and $\{\delta w_{k,n}\}_{1 \le k \le n}$ obey a generic sequence of inhomogeneous stochastic recurrence equations of the form
(36) \[ \delta\!\star_{k+1,n} = (I - \mu F_\mu(\mathbf X_{k,n}))\, \delta\!\star_{k,n} + \xi\!\star_{k,n}, \quad\text{that is,}\quad \delta\!\star_{k+1,n} = \sum_{i=0}^{k} \Psi_n(k, i; \mu)\, \xi\!\star_{i,n}, \qquad 0 \le k < n. \]

In view of (33) and (36), it is clear that the stability of the products of random matrices $\Psi_n(k, i; \mu)$ plays an important role in the limiting behavior of the estimation error. Properties of products of random matrices are of interest in a wide range of fields. Seminal results concerning this problem have been obtained in Guo (1994). However, they are rather difficult to apply in the context of inhomogeneous Markov chains with explicit uniform bounds. In the following section we revisit such results by new technical means, in a way that makes their application to the TVAR model completely transparent.

4.2. Stability of a product of random matrices. We denote by $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ the maximum and minimum eigenvalues of the matrix $A$, respectively. We let $\mathcal M_d^+$ denote the space of $d \times d$ positive semi-definite symmetric matrices.

Theorem 8. Let $(\Omega, \mathcal F, \mathrm P, \{\mathcal F_n : n \in \mathbb Z_+\})$ be a filtered space. Let $\{\varphi_k, k \ge 0\}$ and $\{V_k, k \ge 0\}$ be non-negative adapted processes such that $V_k \ge 1$ for all $k \ge 1$. For any $\mu \ge 0$, let $A(\mu) := \{A_k(\mu), k \ge 0\}$ be an adapted $\mathcal M_d^+$-valued process. Let $r$ be a strictly positive integer, and let $R_1$ and $\mu_1$ be strictly positive constants. Assume that
(a) there exist $\lambda < 1$ and $B \ge 1$ such that, for all $k \in \mathbb Z_+$,
\[ \mathrm E^{\mathcal F_k}[V_{k+r}] \le \lambda V_k\, \mathbb I(\varphi_k > R_1) + B V_k\, \mathbb I(\varphi_k \le R_1); \]
(b) there exists $\alpha_1 > 0$ such that, for all $k \in \mathbb Z_+$ and for all $\mu \in [0, \mu_1]$,
\[ \mathrm E^{\mathcal F_k}\Big[ \lambda_{\min}\Big( \sum_{l=k+1}^{k+r} A_l(\mu) \Big) \Big] \ge \alpha_1\, \mathbb I(\varphi_k \le R_1); \]
(c) for all $k \in \mathbb Z_+$ and for all $\mu \in [0, \mu_1]$, $|\mu A_k(\mu)| \le 1$;
(d) there exist $s > 1$ and $C_1 > 0$ such that, for all $k \in \mathbb Z_+$ and for all $\mu \in [0, \mu_1]$,
\[ \mathbb I(\varphi_k \le R_1) \sum_{l=k+1}^{k+r} \mathrm E^{\mathcal F_k}\big[ |A_l(\mu)|^s \big] \le C_1. \]


Then, for any $p \ge 1$, there exist $C_0 > 0$, $\delta_0 > 0$ and $\mu_0 > 0$, only depending on the constants above, such that, for all $\mu \in [0, \mu_0]$ and for all integers $n \ge 1$,
\[ \mathrm E^{\mathcal F_0}\Bigg[ \Big| \prod_{i=1}^{n} (I - \mu A_i(\mu)) \Big|^p \Bigg] \le C_0\, e^{-\delta_0 \mu n}\, V_0. \]

For convenience, we write $\mathrm E_k$ for $\mathrm E^{\mathcal F_k}[\cdot]$, $\mathrm P_k$ for $\mathrm P^{\mathcal F_k}$ and $A_k$ for $A_k(\mu)$. The assumptions of Theorem 8 hold true throughout this section. The proof of this result is prefaced by two technical lemmas, valid under (a)–(d). In the sequel $\mu_0$ and δ are constants possibly depending only on $r$, $R_1$, $\mu_1$, $\alpha_1$, $C_1$, λ, $B$ and $s$.

Lemma 9. For any $a \ge 1$, there exist $\delta > 0$ and $\mu_0 > 0$ such that, for all $k \in \mathbb Z_+$ and for all $\mu \in [0, \mu_0]$,
(37) \[ \mathbb I(\varphi_k \le R_1)\, \mathrm E_k\Bigg[ \Big| \prod_{i=k+1}^{k+r} (I - \mu A_i) \Big|^a \Bigg] \le e^{-\delta\mu}. \]

Proof. Under (c), we have $|I - \mu A_k| \le 1$ for all $k \in \mathbb Z_+$ and for all $\mu \in [0, \mu_1]$ (see Lemma 32), so that we may assume $a = 1$ without loss of generality. We write
(38) \[ \prod_{l=k+1}^{k+r} (I - \mu A_l) = I - \mu D_k + S_k, \]
where
\[ D_k := \sum_{l=k+1}^{k+r} A_l \qquad\text{and}\qquad S_k := \sum_{j=2}^{r} (-1)^j \mu^j \sum_{1 \le i_1 < \dots < i_j \le r} A_{k+i_j} \cdots A_{k+i_1}. \]
By the Markov inequality,
\[ \mathrm P_k\Big( \max_{k < l \le k+r} |A_l| > \mu^{-\beta} \Big) \le \mu^{s\beta} \sum_{l=k+1}^{k+r} \mathrm E_k\big[ |A_l|^s \big], \]
which, together with (d), implies (39).


For all $j = 2, \dots, r$ and all ordered j-tuples $1 \le i_1 < \dots < i_j \le r$, using (c), we have, for all $\mu \in [0, \mu_1]$, $\mu^j |A_{k+i_1} \cdots A_{k+i_j}| \le \mu^2 |A_{k+i_1} A_{k+i_2}|$. Hence, for some constant $M_1$ only depending on $r$, $\mathrm E_k[|S_k|] \le M_1 \mu^2 \sup_{1 \le i < j \le r} \mathrm E_k[|A_{k+i} A_{k+j}|]$.

Lemma 10. There exist $\alpha_0 > 0$ and $\gamma > 0$ such that, for all $n \ge 0$,
\[ \mathrm E_0\big[ e^{-\alpha_0 N_n} \big] \le e^{-\gamma n \alpha_0}\, V_0. \]
Proof. Observe that, using (a), for all $k \in \mathbb Z_+$,
(41) \[ \mathrm E_{kr}\big[ V_{(k+1)r} \big] \le V_{kr} \big[ \lambda\, \mathbb I(\varphi_{kr} > R_1) + B\, \mathbb I(\varphi_{kr} \le R_1) \big] = V_{kr}\, \lambda^{\mathbb I(\varphi_{kr} > R_1)} B^{\mathbb I(\varphi_{kr} \le R_1)}. \]
Let $\{W_{kr}, k \in \mathbb Z_+\}$ be the process defined by
\[ W_0 := V_0 \qquad\text{and}\qquad W_{kr} := \Big( \frac{1}{\lambda} \Big)^{k} \Big( \frac{\lambda}{B} \Big)^{N_{(k-1)r}} V_{kr}, \quad k \ge 1. \]
From the previous inequality, since $N_{kr}$ is $\mathcal F_{kr}$-measurable and using (41), we have, for all $k \in \mathbb Z_+$,
\[ \mathrm E_{kr}\big[ W_{(k+1)r} \big] = \Big( \frac{1}{\lambda} \Big)^{k+1} \Big( \frac{\lambda}{B} \Big)^{N_{kr}} \mathrm E_{kr}\big[ V_{(k+1)r} \big] \le \Big( \frac{1}{\lambda} \Big)^{k} \Big( \frac{\lambda}{B} \Big)^{N_{kr} - \mathbb I(\varphi_{kr} \le R_1)} V_{kr} = W_{kr}. \]
Hence, by induction, we get that $\mathrm E_0[W_{kr}] \le W_0 = V_0$. Since $V_k \ge 1$ for all $k$, we have, for all $k \ge 0$,
\[ \mathrm E_0\Bigg[ \Big( \frac{\lambda}{B} \Big)^{N_{kr}} \Bigg] \le \lambda^{k+1}\, \mathrm E_0\big[ W_{(k+1)r} \big] \le \lambda^{k+1}\, V_0. \]
The proof follows. □



Proof of Theorem 8. From (c), for all $k \in \mathbb Z_+$ and for all $\mu \in [0, \mu_1]$,
(42) \[ \Big| \prod_{l=kr+1}^{(k+1)r} (I - \mu A_l) \Big|^p \le e^{-(\delta/2)\mu\, \mathbb I(\varphi_{kr} \le R_1)} \Bigg\{ \Big| \prod_{l=kr+1}^{(k+1)r} (I - \mu A_l) \Big|^p \Big( e^{(\delta/2)\mu}\, \mathbb I(\varphi_{kr} \le R_1) + \mathbb I(\varphi_{kr} > R_1) \Big) \Bigg\}, \]

‡ Note that Lemma 33 is derived for real-valued r.v. Its application to matrices relies on the following elementary fact: for any matrices $A$ and $B$, $|AB| \le M \sum_{i,j,k} |A_{i,j} B_{j,k}|$, where $M$ only depends on the matrix sizes and, of course, $|A_{i,j}| \le |A|$ for all $(i,j)$ since $|\cdot|$ denotes the operator norm. This argument will be used repeatedly in the sequel.


where δ is defined in Lemma 9. Let n = mr+t, where m ∈ Z+ and t = 0, 1, . . . , r−1. Eq. (42) and the Cauchy-Schwarz inequality show that p n Y 1/2 1/2 E0 (I − µAl ) ≤ π1 π2 , l=1

where h i π1 = E0 e−δµNn ,  2p   m−1  Y  (k+1)r Y δµ . e I(φ ≤ R ) + I(φ > R ) π2 = E0  (I − µA ) 1 1 kr kr l   k=0

l=kr+1

Let U0 = 1 and define recursively Uk+1 for k = 0, 1, . . . ,  2p   (k+1)r  Y Uk+1 := (I − µAl ) eδ µ I(φkr ≤ R1 ) + I(φkr > R1 ) Uk .   l=kr+1 Applying Lemma 9 with a = 2p, we obtain that (Uk , Fkr ) is a super-martingale. Consequently π2 ≤ 1. Lemma 10 and Jensen’s inequality show that, for all µ ∈ [0, α0 /δ], δµ/α0 ≤ e−γδµn V0 , E0 [e−δµNn ] ≤ E0 [e−α0 Nn ] which concludes the proof.



4.3. Application to the TVAR model. We now apply Theorem 8 to the product $\Psi_n(k, j; \mu)$ defined in (34) for $0 \le j \le k < n$ and $\mu \ge 0$. To this end we need the following preliminary result, referred to as uniform persistence of excitation in the control theory literature.

Proposition 11. Assume (A1) with $q \ge 4$. Let $\beta \in (0,1]$, $L > 0$, $\rho \in (0,1)$ and $0 < \sigma_- \le \sigma_+$. For all $R_1, C_1 > 0$, there exists $r_0 \ge 1$ such that, for all $(\theta, \sigma) \in C(\beta, L, \rho, \sigma_-, \sigma_+)$, for all $r \ge r_0$, for all $n \ge r$, for all $k = 0, \dots, n-r$ and for all $\nu \in (0, 1]$,
(43) \[ \mathrm E_{\theta,\sigma}^{\mathcal F_{k,n}}\Big[ \lambda_{\min}\Big( \sum_{l=k+1}^{k+r} F_\nu(\mathbf X_{l,n}) \Big) \Big] \ge C_1\, \mathbb I(|\mathbf X_{k,n}| \le R_1), \]
where $F_\nu$ is defined in (28). In addition, there exist constants $\delta > 0$ and $\mu_0 > 0$ such that, for all $(\theta, \sigma) \in C(\beta, L, \rho, \sigma_-, \sigma_+)$, for all $d \le k \le n$ and for all $\nu \in [0, \mu_0]$,
(44) \[ \big| I - \nu\, \mathrm E_{\theta,\sigma}[F_\nu(\mathbf X_{k,n})] \big| \le 1 - \delta\nu. \]

Remark 3. From now on, for convenience, we let the same $M$, the same $\mu_0$ and the same δ denote different positive constants depending neither on the time indices $i, j, k, n, \dots$, the step-size µ, nor on the parameter $(\theta, \sigma)$.

Proof. For any symmetric matrix $A$, we have $|\lambda_{\min}(A)| \le |\lambda|_{\max}(A) = |A|$ (recall that $|\cdot|$ denotes the operator norm) and $\lambda_{\min}(A) = \inf_{|x|=1} x^T A x$. From the last assertion, it follows that, for any symmetric matrix $B$ having the same size as $A$,


$\lambda_{\min}(A + B) \ge \lambda_{\min}(A) + \lambda_{\min}(B)$. Applying these elementary facts, we get, for all $0 \le k < k+r \le n$,
\[ \lambda_{\min}\Big( \sum_{j=k+1}^{k+r} F_\nu(\mathbf X_{j,n}) \Big) \ge \lambda_{\min}\Big( \sum_{j=k+1}^{k+r} \mathrm E_{\theta,\sigma}^{\mathcal F_{k,n}}[F_\nu(\mathbf X_{j,n})] \Big) + \lambda_{\min}\Big( \sum_{j=k+1}^{k+r} \big( F_\nu(\mathbf X_{j,n}) - \mathrm E_{\theta,\sigma}^{\mathcal F_{k,n}}[F_\nu(\mathbf X_{j,n})] \big) \Big) \]
\[ \ge \lambda_{\min}\Big( \sum_{j=k+1}^{k+r} \mathrm E_{\theta,\sigma}^{\mathcal F_{k,n}}[F_\nu(\mathbf X_{j,n})] \Big) - \Big| \sum_{j=k+1}^{k+r} \big( F_\nu(\mathbf X_{j,n}) - \mathrm E_{\theta,\sigma}^{\mathcal F_{k,n}}[F_\nu(\mathbf X_{j,n})] \big) \Big|. \]
From its definition in (28), $F_\nu(x) \in \mathcal M_d^+$ for all $x \in \mathbb R^d$ and for all $\nu \ge 0$, and
(45) \[ \sup_{\nu \ge 0} |F_\nu|_{\mathrm{Li}(1)} < \infty. \]
Applying Lemma 30 and Lemma 31, we obtain that there exist $C, M > 0$ such that, for all $(\theta, \sigma) \in C$ and for all $0 \le k < k+r \le n$,
\[ \mathrm E_{\theta,\sigma}^{\mathcal F_{k,n}}\Big[ \lambda_{\min}\Big( \sum_{j=k+1}^{k+r} F_\nu(\mathbf X_{j,n}) \Big) \Big] \ge C\, (1 + |\mathbf X_{k,n}|^4)^{-1} (r - d) - M \sqrt r\, (1 + |\mathbf X_{k,n}|^4)^{1/2}. \]
Now pick two positive numbers $R_1$ and $C_1$. If $|\mathbf X_{k,n}| > R_1$, Eq. (43) is clearly satisfied for all $0 \le k < k+r \le n$. If $|\mathbf X_{k,n}| \le R_1$, the last equation implies that, for all $0 \le k < k+r \le n$,
\[ \mathrm E_{\theta,\sigma}^{\mathcal F_{k,n}}\Big[ \lambda_{\min}\Big( \sum_{j=k+1}^{k+r} F_\nu(\mathbf X_{j,n}) \Big) \Big] \ge C\, \frac{r - d}{1 + R_1^4} - M \sqrt r\, (1 + R_1^4)^{1/2}. \]
We may thus find $r_0$ such that the RHS of this inequality is larger than or equal to $C_1$ for all $r \ge r_0$. This achieves the proof of (43).

From the uniform $L^2$-boundedness and (45), we get that there exists $M$ such that, for all $(\theta, \sigma) \in C$, for all $\nu \in [0, 1]$ and for all $d \le k \le n$, $|\mathrm E_{\theta,\sigma}[F_\nu(\mathbf X_{k,n})]| \le M$. Thus, using Lemma 32, for all $\nu \in [0, 1/M]$, for all $d \le k \le n$, and for all $(\theta, \sigma) \in C$,
\[ \big| I - \nu\, \mathrm E_{\theta,\sigma}[F_\nu(\mathbf X_{k,n})] \big| = 1 - \nu\, \lambda_{\min}\big( \mathrm E_{\theta,\sigma}[F_\nu(\mathbf X_{k,n})] \big), \]
and the proof of (44) follows from Lemma 30. □



Theorem 12. Assume (A1) with $q \ge 4$. Let $\beta \in (0,1]$, $L > 0$, $0 < \rho < 1$, and $0 < \sigma_- \le \sigma_+$. Then, for all $p \ge 1$, there exist constants $M, \delta > 0$ and $\mu_0 > 0$ such that, for all $0 \le j \le k \le n$, for all $\nu \in [0, \mu_0]$ and for all $(\theta, \sigma) \in C(\beta, L, \rho, \sigma_-, \sigma_+)$,
(46) \[ \| \Psi_n(k, j; \nu) \|_{\theta,\sigma,p} \le M (1 - \delta\nu)^{k-j}. \]
Proof. We apply Theorem 8 to the sequence $\{(A_l = F_\nu(\mathbf X_{j+1+l,n}), \mathcal F_{j+1+l,n}), l = 0, \dots, k-j-1\}$. We thus have to show that conditions (a)–(d) of Theorem 8 hold with constants $r$, $R_1$, $\mu_1$, $\alpha_1$, $C_1$, λ, $B$ and $s$ which depend neither on $(\theta, \sigma) \in C$ nor on $j, k, n$.


Condition (c). One easily checks that, for all $\nu \ge 0$ and for all $x \in \mathbb R^d$, $|F_\nu(x)| = |x|^2/(1 + \nu|x|^2)$, which yields (c).

Condition (b). We set $\varphi_l := |\mathbf X_{j+l,n}|$. Proposition 11 yields (b) by simply choosing $r$ larger than or equal to $r_0$ for a given choice of $R_1$ and $C_1$ which will be made later.

Condition (a). Let $\tau \in (\rho, 1)$. From Proposition 6, there exists $M$ such that, for all $0 \le i < i+r \le n$ and for all $(\theta, \sigma) \in C$, $\mathrm E_{\theta,\sigma}^{\mathcal F_{i,n}}[|\mathbf X_{i+r,n}|] \le M(1 + \tau^r |\mathbf X_{i,n}|)$; thus, for any $R_1 > 0$,
\[ \mathrm E_{\theta,\sigma}^{\mathcal F_{i,n}}\big[ 1 + |\mathbf X_{i+r,n}| \big] \le M\tau^r |\mathbf X_{i,n}| + M + 1 \le \Big( M\tau^r + \frac{M+1}{R_1}\, \mathbb I(|\mathbf X_{i,n}| > R_1) \Big) (1 + |\mathbf X_{i,n}|) + (M+1)\, \mathbb I(|\mathbf X_{i,n}| \le R_1). \]
Choose $r_1$ and $R_1$ so that $M\tau^{r_1} + (M+1)/R_1 < 1$, and let $C_1 > 0$ to be chosen later. Now we set $V_l := 1 + |\mathbf X_{j+l,n}|$, so that, from the last equation, condition (a) is satisfied with $\lambda := M\tau^r + (M+1)/R_1 < 1$ and $B = 1 + M$, where $r$ is any integer larger than or equal to $r_1$.

Choose r1 and R1 so that M τ r1 + (M + 1)/R1 < 1 and let C1 > 0 to be chosen later. Now we set Vl := 1 + |Xj+l,n | so that, from the last equation, condition (a) is satisfied with λ := M τ r + (M + 1)/R1 < 1 and B = 1 + M , where r is any integer larger than or equal to r1 . Condition (d). From Lemma 34, supν≥0 | |Fν |q/2 |Li(q−1) < ∞. From Proposition 6 there exists M > 0 such that, for all ν ≥ 0, for all 0 ≤ i < i + l ≤ n and for all (θ, σ) ∈ C, i h Fi,n |Fν (Xi+l,n )|q/2 ≤ M (1 + τ ql ||Xi,n |q ). Eθ,σ Hence (d) is obtained with s = q/2 and C1 = M (1 + R1q ). Finally, we obtain, for some poisitive constants r, C0 , δ0 and µ0 , for all ν ∈ (0, µ0 ], for all (θ, σ) ∈ C and for all 0 ≤ j < k ≤ n such that n − j ≥ r := r0 ∨ r1 , h i Fj+1,n kΨn (k, j; ν)kpθ,σ,p = E Eθ,σ [Ψn (k, j; ν)] ≤ C0 e−δ0 ν n (1 + kXj+1,n kθ,σ,p ). The uniform boundedness (19) then yields (46) when n − j ≥ r. The restriction n − j ≥ r above is an edge effect because (a), (b) and (d) would not make sense for n − j < r. Now recall that (c) above implies |Ψn (k, j; ν)| ≤ 1. The result for n − j < r (implying k − j < r) follows by enlarging M adequately.  5. Risk bound of the NLMS estimator In this section, we derive risk bound for the recursive estimator. We will then deduce an appropriate choice of the step-size parameter depending on the smoothness parameter β. Theorem 13. Assume (A1) with q ≥ 4 and let p ∈ [1, q/3). Let β ∈ (0, 1], L > 0, 0 < ρ < 1, and 0 < σ− ≤ σ+ . Then, there exists M, δ > 0 and µ0 > 0 such that, for all µ ∈ (0, µ0 ], for all n ≥ 1, for all t ∈ (0, 1] and for all (θ, σ) ∈ C(β, L, ρ, σ− , σ+ ),   bn (t; µ) − θ(t)kp,θ,σ ≤ M |θ(0)| (1 − δ µ)tn + √µ + (nµ)−β . (47) kθ Proof. Similarly to (45), one easily shows that (48)

sup |Lν |Li(0) < ∞, ν≥0


where $L_\nu$ is defined in (28). Together with (45), from the $L^q$-boundedness of $\{X_{k,n}, 1 \le k \le n\}$, we have
(49) \[ F^\star_{q/2} := \sup_{(\theta,\sigma) \in C}\; \sup_{\nu \ge 0}\; \sup_{0 \le k \le n} \| F_\nu(\mathbf X_{k,n}) \|_{\theta,\sigma,q/2} < \infty, \]
(50) \[ L^\star_q := \sup_{(\theta,\sigma) \in C}\; \sup_{\nu \ge 0}\; \sup_{0 \le k \le n} \| L_\nu(\mathbf X_{k,n}) \|_{\theta,\sigma,q} < \infty. \]

Applying (33) and (46), for all u ≥ 1, there exists M > 0 such that

(51)    ‖δ^u_{k,n}‖_{θ,σ,u} ≤ M (1 − δµ)^k |θ(0)|,    ∀ 1 ≤ k ≤ n, ∀ µ ∈ (0,µ₀], ∀ (θ,σ) ∈ C.

Define Ξ⋆_n(k,k) = 0 and

    Ξ⋆_n(k,j) := Σ_{i=j}^{k−1} ξ⋆_{i,n},    0 ≤ j < k ≤ n.

For all 1 ≤ j ≤ k ≤ n, we have Ψ_n(k−1,j;µ) − Ψ_n(k−1,j−1;µ) = µ Ψ_n(k−1,j;µ) F_µ(X_{j−1,n}). By integration by parts, for all 1 ≤ k ≤ n and for all µ ∈ (0,µ₀], (36) reads

(52)    δ⋆_{k,n} = Ψ_n(k−1,0;µ) Ξ⋆_n(k,0) + µ Σ_{j=1}^{k−1} Ψ_n(k−1,j;µ) F_µ(X_{j−1,n}) Ξ⋆_n(k,j).

By applying (46) and (49) and the Hölder inequality, we get that, for all u ∈ (0, q/2), there exists M > 0 such that, for all (θ,σ) ∈ C and for all 0 ≤ j ≤ k ≤ n,

(53)    ‖Ψ_n(k,j;µ) F_µ(X_{j−1,n})‖_{θ,σ,u} ≤ M (1 − δµ)^{k−j+1}.

We now consider the two terms δ^w_{k,n} and δ^v_{k,n} separately. We have, for all 0 ≤ j < k ≤ n,

(54)    Ξ^w_n(k,j) := Σ_{i=j}^{k−1} ξ^w_{i,n} = θ_{j,n} − θ_{k,n},

and thus, for all 0 ≤ j < k ≤ n, |Ξ^w_n(k,j)| ≤ |θ|_{Λ,β} n^{−β} (k−j)^β. Plugging (54), (46) and (53) into (52), we thus get that there exist M > 0, δ > 0 and µ₀ > 0 such that, for all (θ,σ) ∈ C, for all 1 ≤ k ≤ n and for all µ ∈ (0,µ₀],

    ‖δ^w_{k,n}‖_{θ,σ,u} ≤ M ( (1 − δµ)^k (k/n)^β + µ Σ_{j=0}^{k−1} (1 − δµ)^{k−j} ((k−j)/n)^β ).

Hence, from Lemma 35 and choosing u = p, we obtain

(55)    ‖δ^w_{k,n}‖_{θ,σ,p} ≤ M |θ|_{Λ,β} (µn)^{−β}    for all 1 ≤ k ≤ n, µ ∈ (0,µ₀], (θ,σ) ∈ C.

We finally bound δ^v. From the Burkhölder inequality (see (Hall and Heyde, 1980, Theorem 2.12)) and (50), we get, for all (θ,σ) ∈ C, for all ν ≥ 0 and for all 1 ≤ j ≤ k ≤ n,

(56)    ‖ Σ_{i=j}^{k} σ_{i,n} L_ν(X_{i,n}) ε_{i+1,n} ‖_{θ,σ,q} ≤ M (k−j+1)^{1/2}.

By Lemma 35, using (52), (56) and (53) and choosing u such that 1/p = 1/u + 1/q, we get

(57)    ‖δ^v_{k,n}‖_{θ,σ,p} ≤ M √µ    for all 1 ≤ k ≤ n, µ ∈ (0,µ₀], (θ,σ) ∈ C.

Eq. (47) then follows from (29), (51), (55) and (57), and Theorem 13 is proved.



Corollary 14. For all η ∈ (0,1) and for all α > 0, there exists M > 0 such that, for all (θ,σ) ∈ C(β, L, ρ, σ₋, σ₊),

    sup_{t∈[η,1]} ‖θ̂_n(t; α n^{−2β/(1+2β)}) − θ(t)‖_{p,θ,σ} ≤ M n^{−β/(1+2β)}.

Theorem 7 shows that the L^p error rate in Corollary 14 is minimax for p ≥ 2 within the class C(β, L, ρ, σ₋, σ₊) if β ∈ (0,1].
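As an illustration of the rate in Corollary 14, the following sketch runs a standard normalized-LMS recursion on a simulated TVAR(1) path with step size µ = α n^{−2β/(1+2β)} (α = 1, β = 1). The paper's exact estimator uses the normalized correction term defined in (28), which is not reproduced here; the recursion below is a common NLMS stand-in, and the model functions `theta`, `sigma` and all numerical values are hypothetical choices, not taken from the text.

```python
import numpy as np

def simulate_tvar1(n, theta, sigma, rng):
    """Simulate X_{k,n} = theta((k-1)/n) X_{k-1,n} + sigma(k/n) eps_{k,n} for d = 1."""
    x = np.zeros(n + 1)
    for k in range(1, n + 1):
        x[k] = theta((k - 1) / n) * x[k - 1] + sigma(k / n) * rng.standard_normal()
    return x

def nlms_tvar1(x, mu):
    """Normalized-LMS style recursive estimate of the time-varying AR(1) coefficient."""
    est = np.zeros(len(x))
    th = 0.0
    for k in range(len(x) - 1):
        err = x[k + 1] - th * x[k]
        # normalized correction keeps the update bounded under mild moment assumptions
        th += mu * x[k] * err / (1.0 + mu * x[k] ** 2)
        est[k + 1] = th
    return est

rng = np.random.default_rng(0)
n = 100_000
theta = lambda t: 0.3 + 0.4 * t        # beta-Lipschitz with beta = 1, |theta| < 1
sigma = lambda t: 1.0
mu = n ** (-2.0 / 3.0)                 # alpha * n^(-2*beta/(1+2*beta)), alpha = 1, beta = 1
est = nlms_tvar1(simulate_tvar1(n, theta, sigma, rng), mu)
print(est[int(0.8 * n)], theta(0.8))   # pointwise estimate vs true coefficient
```

The step size balances the tracking bias (nµ)^{−β} against the stochastic fluctuation √µ, which is the trade-off behind (47).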

6. Perturbation expansion of the error

6.1. General approximation results. Observe that Theorem 13 only provides a bound on the risk. The approach developed in this section relies upon a perturbation technique (see Aguech et al. (2000)). Decompose the LHS of (36) as δ⋆_{k,n} = J⋆^{(0)}_{k,n} + H⋆^{(0)}_{k,n}, with

    J⋆^{(0)}_{k+1,n} = (I − µ E_{θ,σ}[F_µ(X_{k,n})]) J⋆^{(0)}_{k,n} + ξ⋆_{k,n},    J⋆^{(0)}_{0,n} = 0,

    H⋆^{(0)}_{k+1,n} = (I − µ F_µ(X_{k,n})) H⋆^{(0)}_{k,n} + µ (E_{θ,σ}[F_µ(X_{k,n})] − F_µ(X_{k,n})) J⋆^{(0)}_{k,n},    H⋆^{(0)}_{0,n} = 0.

Hence J⋆^{(0)}_{k,n} satisfies a deterministic inhomogeneous first-order linear difference equation, whose solution reads

(58)    J⋆^{(0)}_{k+1,n} = Σ_{i=0}^{k} ψ_n(k,i;µ,θ,σ) ξ⋆_{i,n},

where, for all ν ≥ 0, for all 0 ≤ i < k ≤ n and for all (θ,σ),

(59)    ψ_n(i,i;ν,θ,σ) = I    and    ψ_n(k,i;ν,θ,σ) = Π_{j=i+1}^{k} (I − ν E_{θ,σ}[F_ν(X_{j,n})]).

For ease of notation, when there is no possible confusion, we write F_{k,n} := F_µ(X_{k,n}), F̄_{k,n} := F_{k,n} − E_{θ,σ}[F_{k,n}], E instead of E_{θ,σ}, Ψ_n(k,i) instead of Ψ_n(k,i;µ), ψ_n(k,i) instead of ψ_n(k,i;µ,θ,σ), and so on. This decomposition of the error process {δ⋆_{k,n}, 1 ≤ k ≤ n} can be extended to any approximation order s > 0 as follows:

    δ⋆_{k,n} = J⋆^{(0)}_{k,n} + J⋆^{(1)}_{k,n} + ⋯ + J⋆^{(s)}_{k,n} + H⋆^{(s)}_{k,n},

where the processes J⋆^{(r)}_{k,n}, r ∈ {0,…,s}, and H⋆^{(s)}_{k,n} are respectively defined by

(60)    J⋆^{(0)}_{k+1,n} = (I − µ E[F_{k,n}]) J⋆^{(0)}_{k,n} + ξ⋆_{k,n},    J⋆^{(0)}_{0,n} = 0,
    ⋮
(61)    J⋆^{(r)}_{k+1,n} = (I − µ E[F_{k,n}]) J⋆^{(r)}_{k,n} + µ F̄_{k,n} J⋆^{(r−1)}_{k,n},    J⋆^{(r)}_{l,n} = 0, 0 ≤ l < r,
    ⋮
(62)    H⋆^{(s)}_{k+1,n} = (I − µ F_{k,n}) H⋆^{(s)}_{k,n} + µ F̄_{k,n} J⋆^{(s)}_{k,n},    H⋆^{(s)}_{l,n} = 0, l = 0,…,s.

The processes J⋆^{(r)}_{k,n} depend linearly on ξ⋆_{k,n} and polynomially on the fluctuation F̄_{k,n}. We now show that J^{w(0)} and J^{v(0)}, respectively defined by setting ξ⋆ = ξ^w and ξ⋆ = ξ^v in (58), are the main terms in the errors δ^w and δ^v defined by (36).

Proposition 15. Assume (A1) with q > 4 and let p ∈ [1, q/4). Let β ∈ (0,1], L > 0, 0 < ρ < 1, and 0 < σ₋ ≤ σ₊. Then, there exist constants M and µ₀ > 0 such that, for all (θ,σ) ∈ C(β, L, ρ, σ₋, σ₊), for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

(63)    |J^{w(0)}_{k,n}| ≤ M (nµ)^{−β},

(64)    ‖δ^w_{k,n} − J^{w(0)}_{k,n}‖_{p,θ,σ} ≤ M √µ (nµ)^{−β}.

Proof. From Proposition 11, there exist δ > 0, µ₀ > 0 and M > 0 such that, for all 0 ≤ i ≤ k ≤ n, for all ν ∈ [0,µ₀] and for all (θ,σ) ∈ C,

(65)    |ψ_n(k,i;ν,θ,σ)| ≤ M (1 − δν)^{k−i}.

Note that ψ_n(k,i−1) − ψ_n(k,i) = −µ ψ_n(k,i) E[F_{i,n}]. As in (52), write, for all 1 ≤ k ≤ n,

(66)    J^{w(0)}_{k,n} = ψ_n(k−1,0) Ξ^w_n(k,0) + µ Σ_{j=1}^{k−1} ψ_n(k−1,j) E[F_{j−1,n}] Ξ^w_n(k,j).

Using (54), (49) and Lemma 35 shows (63). From (60)–(62), δ^w − J^{w(0)} = H^{w(0)} = J^{w(1)} + H^{w(1)}, where, for all 1 ≤ k ≤ n,

(67)    J^{w(1)}_{k,n} = µ Σ_{j=0}^{k−1} ψ_n(k−1,j) F̄_{j,n} J^{w(0)}_{j,n},

(68)    H^{w(1)}_{k,n} = µ Σ_{j=0}^{k−1} Ψ_n(k−1,j) F̄_{j,n} J^{w(1)}_{j,n}.

Set φ_j(x) = ψ_n(k−1,j) F_µ(x) J^{w(0)}_{j,n}, j = 0,…,k−1. Note that, from (45), (65) and (63), for all µ ∈ (0,µ₀], for all 0 ≤ j < k ≤ n and for all (θ,σ) ∈ C,

    |φ_j|_{Li(1)} ≤ |ψ_n(k−1,j)| |J^{w(0)}_{j,n}| |F_µ|_{Li(1)} ≤ M (1 − δµ)^{k−j} (µn)^{−β}.


By applying Proposition 28 componentwise, we get, for all µ ∈ (0,µ₀], for all 1 ≤ k ≤ n and for all (θ,σ) ∈ C,

    ‖ Σ_{j=0}^{k−1} ψ_n(k−1,j) F̄_{j,n} J^{w(0)}_{j,n} ‖_{q/2} ≤ M (µn)^{−β} ( Σ_{j=0}^{k−1} (1 − δµ)^{k−1−j} )^{1/2}.

Hence, for all µ ∈ (0,µ₀], for all 1 ≤ k ≤ n and for all (θ,σ) ∈ C,

(69)    ‖ J^{w(1)}_{k,n} ‖_{q/3} ≤ M √µ (µn)^{−β}.

Let u be such that 2/q + 2/q + 1/u = 1/p. Thus, by Theorem 12 and (49), for all µ ∈ (0,µ₀], for all 1 ≤ k ≤ n and for all (θ,σ) ∈ C,

(70)    ‖ H^{w(1)}_{k,n} ‖_p ≤ µ Σ_{j=1}^{k−1} ‖Ψ_n(k−1,j)‖_u ‖F̄_{j,n}‖_{q/2} ‖J^{w(1)}_{j,n}‖_{q/2} ≤ M µ^{3/2} (µn)^{−β} Σ_{j=1}^{k−1} (1 − δµ)^{k−j} ≤ M √µ (µn)^{−β}.

□

Proposition 16. Assume (A1) with q ≥ 7 and let p ∈ [1, 2q/11). Let β ∈ (0,1], L > 0, 0 < ρ < 1, and 0 < σ₋ ≤ σ₊. Then, there exist constants M and µ₀ such that, for all (θ,σ) ∈ C(β, L, ρ, σ₋, σ₊), for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

(71)    ‖J^{v(0)}_{k,n}(θ,σ)‖_{q,θ,σ} ≤ M √µ,

(72)    ‖δ^v_{k,n} − J^{v(0)}_{k,n}(θ,σ)‖_{p,θ,σ} ≤ M µ.

Proof. The Burkhölder inequality (see (Hall and Heyde, 1980, Theorem 2.12)) shows that, for all 1 ≤ k ≤ n, for all µ ∈ (0,µ₀] and for all (θ,σ) ∈ C,

(73)    ‖J^{v(0)}_{k,n}‖_q ≤ M µ σ₊ L⋆_q ( Σ_{j=0}^{k−1} |ψ_n(k−1,j)|² )^{1/2} ≤ M µ ( Σ_{j=0}^{k−1} (1 − δµ)^{2(k−1−j)} )^{1/2},

and (71) follows from (50) and (65). We now bound

    J^{v(1)}_{k,n} = µ Σ_{j=0}^{k−1} ψ_n(k−1,j) F̄_{j,n} J^{v(0)}_{j,n},    1 ≤ k ≤ n.

Let us pick 2 ≤ k ≤ n. By plugging (58) with ξ⋆ = ξ^v into the last display, we obtain

(74)    J^{v(1)}_{k,n} = µ² Σ_{0≤i<j≤k−1} φ_j(X_{j,n}) γ_{i,j}(X_{i,n}) σ_{i+1,n} ε_{i+1,n} …

… there exist δ > 0 and M > 0 such that, for all (θ,σ) ∈ C⋆,

(83)    δ ≤ inf_{|z|=1} inf_{t∈[0,1]} σ²(t)/|θ(z;t)|² ≤ λ_min(Σ(t;θ,σ)) ≤ λ_max(Σ(t;θ,σ)) ≤ sup_{|z|=1} sup_{t∈[0,1]} σ²(t)/|θ(z;t)|² ≤ M.

Eq. (80) then follows from Lemma 32. Similarly, there exists M < ∞ such that, for all (θ,σ) ∈ C⋆ and all 0 ≤ s ≤ t ≤ 1,

(84)    | Σ(t;θ,σ) − Σ(s;θ,σ) | ≤ M (t−s)^β.

By Lemma 17, we get, for all 1 ≤ k ≤ l ≤ n and for all (θ,σ) ∈ C⋆,

(85)    | Σ(l/n;θ,σ) − E[X_{k,n} X^T_{k,n}] | ≤ | Σ(l/n;θ,σ) − Σ(k/n;θ,σ) | + | Σ(k/n;θ,σ) − E[X_{k,n} X^T_{k,n}] | ≤ M ( τ^k + n^{−β} (l−k+1)^β ).

This is (81) and (82) with ν = 0. One easily shows that, for all x ∈ R^d and for all ν ≥ 0, |F_ν(x) − xx^T| ≤ ν |x|⁴ and |L_ν L^T_ν(x)| ≤ 2ν |x|⁴. Since q ≥ 4, we deduce (81) and (82) for ν > 0 from (85) and the uniform L⁴-boundedness. □


Let ρ ∈ (0,1), let θ ∈ S(ρ) and let σ : [0,1] → R₊. Define the following sequence of recurrence equations applying to some increment process {ξ⋆_{k,n}, 0 ≤ k ≤ n}:

(86)    J̃⋆_{k+1,n}(θ,σ) := Σ_{j=0}^{k} (I − µ Σ((k+1)/n, θ, σ))^{k−j} ξ⋆_{j,n},    0 ≤ k ≤ n.

We now show that J^{w(0)} and J^{v(0)} may be approximated by J̃^w and J̃^v, respectively defined by setting ξ⋆ = ξ^w and ξ⋆ = ξ^v in (86), and then compute asymptotic equivalents of J̃^w and of the variance of J̃^v.

Proposition 19. Assume (A1) with q ≥ 4. Let β ∈ (0,1], L > 0, ρ < 1, and 0 < σ₋ ≤ σ₊. Then, there exist constants M > 0 and µ₀ > 0 such that, for all (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊), for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

(87)    | J^{w(0)}_{k,n}(θ,σ) − J̃^w_{k,n}(θ,σ) | ≤ M (µn)^{−β} ( (µn)^{−β} + µ ),

(88)    ‖ J^{v(0)}_{k,n}(θ,σ) − J̃^v_{k,n}(θ,σ) ‖_{q,θ,σ} ≤ M √µ ( (µn)^{−β} + µ ).

Proof. Let ∆_n(k,j;ν,θ,σ) := ψ_n(k,j;ν,θ,σ) − (I − ν Σ(k/n,θ,σ))^{k−1−j} for all 0 ≤ j ≤ k ≤ n, for all ν ≥ 0 and for all (θ,σ). Eq. (60) and (86) yield

(89)    J⋆^{(0)}_{k,n} − J̃⋆_{k,n} = Σ_{j=0}^{k−1} ∆_n(k−1,j) ξ⋆_{j,n},    1 ≤ k ≤ n.

From (79), we get, for all 0 ≤ j < k ≤ n, for all µ ≥ 0 and for all (θ,σ) ∈ C⋆,

    ∆_n(k−1,j) = µ Σ_{i=j}^{k−1} ψ_n(k−1,i+1) ( E[F_{i,n}] − Σ(k/n) ) (I − µ Σ(k/n))^{i−j−1}.

Using (65), (80) and (81), there exist δ > 0, µ₀ > 0 and M > 0 such that, for all 1 ≤ j < k ≤ n, for all µ ∈ (0,µ₀] and for all (θ,σ) ∈ C⋆,

(90)    |∆_n(k−1,j)| ≤ M µ (1 − δµ)^{k−1−j} ( τ^j + n^{−β}(k−j)^{β+1} + µ(k−j) ).

We further write, for all 1 ≤ j < k ≤ n,

    ∆_n(k−1,j) − ∆_n(k−1,j−1) = µ ∆_n(k−1,j) E[F_{j−1,n}] + µ (I − µ Σ(k/n))^{k−j−1} ( E[F_{j−1,n}] − Σ(k/n) ).

Applying (90), (80) and (81), and observing that F⋆₁ is finite, we get that there exist δ > 0, µ₀ > 0, τ ∈ (ρ,1) and M > 0 such that, for all 1 ≤ j < k ≤ n, for all µ ∈ (0,µ₀] and for all (θ,σ) ∈ C⋆,

(91)    |∆_n(k−1,j) − ∆_n(k−1,j−1)| ≤ M µ (1 − δµ)^{k−j−1} ( n^{−β}(k−j)^β (µ(k−j) + 1) + µ (µ(k−j) + 1) + τ^j ).

Integrating (89) by parts gives, for all 1 ≤ k ≤ n,

    J⋆^{(0)}_{k,n} − J̃⋆_{k,n} = ∆_n(k−1,0) Ξ⋆_n(k,0) + Σ_{j=1}^{k−1} ( ∆_n(k−1,j) − ∆_n(k−1,j−1) ) Ξ⋆_n(k,j).

Using (54) and (56) to bound |Ξ^w_n| and ‖Ξ^v_n‖_q respectively, together with (90), (91) and Lemma 35, there exist δ > 0, µ₀ > 0 and M such that, for all 1 ≤ k ≤ n, for all µ ∈ (0,µ₀] and for all (θ,σ) ∈ C,

    | J^{w(0)}_{k,n} − J̃^w_{k,n} | ≤ M ( (µn)^{−β}((µn)^{−β} + µ) + µ Σ_{j=0}^{k−1} (1 − δµ)^j τ^{k−j} ((j+1)/n)^β ),

    ‖ J^{v(0)}_{k,n} − J̃^v_{k,n} ‖_q ≤ M ( √µ ((µn)^{−β} + µ) + µ² Σ_{j=0}^{k−1} (1 − δµ)^j τ^{k−j} √(j+1) ).

Using Lemma 35, we have, for any α ≥ 0,

    Σ_{j=0}^{k−1} (1 − δµ)^j τ^{k−j} (j+1)^α ≤ (1 − τ)^{−1} sup_{j∈Z₊} (1 − δµ)^j (j+1)^α ≤ C (1 − τ)^{−1} (δµ)^{−α}.

We thus obtain (87) and (88).



Proposition 20. Let β ∈ (0,1], L > 0, ρ < 1, and 0 < σ₋ ≤ σ₊, and let (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊). Let t ∈ (0,1] and assume that there exist θ_{t,β} ∈ R^d, L′ > 0 and β′ > β such that, for all u ∈ [0,t],

(92)    | θ(u) − θ(t) − θ_{t,β} (t−u)^β | ≤ L′ (t−u)^{β′}.

Then, for all µ ∈ (0,µ₀] and for all n ≥ 1,

(93)    | J̃^w_{[tn],n} − β Γ(β) (µn)^{−β} Σ^{−β}(t) θ_{t,β} | ≤ M ( (µn)^{−β′} + (µn)^{−β} ( n^{−β} + (1 − δµ)^{tn} ) ),

where Γ is the Gamma function and where M and µ₀ are positive constants only depending on β, L, ρ, σ₋, σ₊, L′ and β′.

Proof. Let us write

    J̃^w_{[tn],n} − J_n = Σ_{j=0}^{[tn]−1} (I − µ Σ([tn]/n))^{[tn]−1−j} [ ξ^w_{j,n} − n^{−β} ( ([tn]−j)^β − ([tn]−1−j)^β ) θ_{t,β} ],

where, within this proof, we denote

    J_n := n^{−β} Σ_{j=0}^{[tn]−1} (I − µ Σ([tn]/n))^{[tn]−1−j} ( ([tn]−j)^β − ([tn]−1−j)^β ) θ_{t,β}.

Using (92), the partial sums of the terms between brackets satisfy, for all 0 ≤ j ≤ [tn]−1,

    | Σ_{i=j}^{[tn]−1} [ ξ^w_{i,n} − n^{−β} ( ([tn]−i)^β − ([tn]−1−i)^β ) θ_{t,β} ] | = | θ(j/n) − θ([tn]/n) − n^{−β} ([tn]−j)^β θ_{t,β} | ≤ L′ n^{−β′} ([tn]−j)^{β′}.

Integration by parts with this bound and (80), and then Lemma 35, give that there exists a constant M such that, for all µ ∈ (0,µ₀] and for all n ≥ 1,

    | J̃^w_{[tn],n} − J_n | ≤ M (µn)^{−β′}.

Now, from (80) and using Lemma 35, we have, for all µ ∈ (0,µ₀] and for all n ≥ 1,

    | J_n − n^{−β} S_β(I − µ Σ([tn]/n)) θ_{t,β} | ≤ M n^{−β} (1 − δµ)^{[tn]} Σ_{l≥1} (1 − δµ)^l l^{β−1} ≤ M (1 − δµ)^{[tn]} (µn)^{−β},

where S_β(A) := Σ_{i=0}^{∞} A^i ( (i+1)^β − i^β ). Using that A ↦ S_β(A) is a power series with unit radius of convergence, (80) and (84), from the mean value theorem and Lemma 35, we have, for all µ ∈ (0,µ₀] and all n ≥ 1,

    | S_β(I − µ Σ([tn]/n)) − S_β(I − µ Σ(t)) | ≤ M µ n^{−β} Σ_{i≥1} (1 − δµ)^i i^β ≤ M (µn)^{−β}.

Collecting the last three inequalities together with Lemma 35, we obtain (93).


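Proposition 20 can be checked numerically in the simplest deterministic setting: d = 1, a constant Σ, and a linear drift θ(u) = a + bu, so that (92) holds exactly with β = 1 and θ_{t,1} = −b. Then (86) reduces to a plain geometric sum, which should match β Γ(β) (µn)^{−β} Σ^{−β} θ_{t,β} up to a geometrically small remainder. A sketch (all numerical values are arbitrary hypothetical choices):

```python
import numpy as np

# d = 1, Sigma constant, theta(u) = a + b*u: (92) holds with beta = 1, theta_{t,1} = -b.
n, mu, t = 10_000, 0.01, 0.8
b, Sigma = 0.4, 2.0                      # hypothetical drift slope and covariance value
k = int(t * n)
xi_w = -b / n                            # xi^w_j = theta(j/n) - theta((j+1)/n), constant here
weights = (1.0 - mu * Sigma) ** np.arange(k)
J_w = xi_w * weights.sum()               # eq. (86) with xi^* = xi^w
predicted = -b / (mu * n * Sigma)        # beta*Gamma(beta)*(mu n)^(-1)*Sigma^(-1)*theta_{t,1}
print(J_w, predicted)
```

The two quantities differ only by a factor (1 − µΣ)^k, which is negligible here, illustrating the bias term captured by (93).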

Remark 6. We will see in the proof of Theorem 23 that, for β = 1, (92) can be obtained from a Taylor expansion. However, this assumption remains valid in much more general situations, for instance if θ behaves as a sum of non-integer power laws to the left of the point t, say a square root plus a linear term.

Proposition 21. Assume (A1) with q ≥ 4. Let β ∈ (0,1], L > 0, ρ < 1, and 0 < σ₋ ≤ σ₊. Then, there exist constants M > 0 and µ₀ > 0 such that, for all (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊), for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

(94)    | E_{θ,σ}[ J̃^v_{k,n} J̃^{vT}_{k,n} ] − µ (σ²(k/n)/2) I | ≤ M µ ( µ + (µn)^{−β} + (1 − δµ)^k ).

Proof. Since {ξ^v_{i,n}, i ≥ 0} is a martingale increment sequence, we may write, for any 1 ≤ k ≤ n,

    E[ J̃^v_{k,n} J̃^{vT}_{k,n} ] = µ² Σ_{j=0}^{k−1} (I − µ Σ(k/n))^{k−1−j} σ²_{j,n} E[ L_µ L^T_µ(X_{j,n}) ] (I − µ Σ(k/n))^{k−1−j} = µ σ²_{k,n} G_{k,n} + R_{k,n},

where, for all 1 ≤ k ≤ n,

    G_{k,n} := µ Σ_{j=−∞}^{k−1} (I − µ Σ(k/n))^{k−1−j} Σ(k/n) (I − µ Σ(k/n))^{k−1−j},

    |R_{k,n}| ≤ M µ² Σ_{j=0}^{k−1} (1 − δµ)^{2(k−1−j)} ( τ^j + n^{−β}(k−j)^β + µ ) ≤ M µ ( µ + (µn)^{−β} ),


and where, in the last bound, we have used (80), (82), σ ∈ Λ(β, L) and Lemma 35. From (80) and (83), we have, for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

(95)    | Σ_{j=−∞}^{−1} (I − µ Σ(k/n))^{k−1−j} Σ(k/n) (I − µ Σ(k/n))^{k−1−j} | ≤ M µ^{−1} (1 − δµ)^k.

From the previous bounds, we obtain, for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

(96)    | E[ J̃^v_{k,n} J̃^{vT}_{k,n} ] − µ σ²_{k,n} G_{k,n} | ≤ M µ ( µ + (µn)^{−β} + (1 − δµ)^k ).

Now, by definition of G_{k,n}, we have

    (I − µ Σ(k/n)) G_{k,n} (I − µ Σ(k/n)) + µ Σ(k/n) = G_{k,n},    1 ≤ k ≤ n,

which simplifies to

    Σ(k/n) (I − 2G_{k,n}) + (I − 2G_{k,n}) Σ(k/n) = 2µ Σ(k/n) G_{k,n} Σ(k/n),    1 ≤ k ≤ n.

From (95), sup_{1≤k≤n} |G_{k,n}| < ∞ uniformly over µ ∈ (0,µ₀]. We thus have, for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n, |Σ(k/n)(I − 2G_{k,n}) + (I − 2G_{k,n})Σ(k/n)| ≤ M µ. For any d×d matrix C and positive definite matrix Γ, the equation ΓB + BΓ = C has a unique solution B, linear in C and continuous in Γ over the set of positive definite matrices (see (Horn and Johnson, 1991, Corollary 4.4.10)). Hence we obtain, for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n, |I − 2G_{k,n}(µ)| ≤ M µ, which, together with (96), gives (94). □

6.3. Mean square error approximation. Collecting all the approximation results above, we obtain the following computable approximation of the pointwise MSE for θ̂_n. This result can serve as a basis for comparing the performance of θ̂_n with other estimators achieving the same rate of convergence.

Theorem 22. Assume (A1) with q > 11. Let β ∈ (0,1], L > 0, ρ < 1, and 0 < σ₋ ≤ σ₊, and let (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊). Let t ∈ (0,1], θ_{t,β} ∈ R^d, L′ > 0 and β′ > β be such that (92) holds for all u ∈ [0,t]. Then there exist M > 0 and µ₀ > 0 such that, for all µ ∈ (0,µ₀] and for all n ≥ 1,

(97)    | MSE_{θ,σ}( θ̂_n(t;µ) − β Γ(β) (µn)^{−β} Σ^{−β}(t,θ,σ) θ_{t,β} ) − µ (σ²(t)/2) I | ≤ M ( √µ ( √µ + (µn)^{−β/2} ) + (µn)^{−β} ( (µn)^{−β} + (µn)^{β−β′} + √µ ) ).

Proof. We use the decomposition δ = δ^u + (δ^w − J^{w(0)}) + (δ^v − J^{v(0)}) + (J^{w(0)} − J̃^w) + (J^{v(0)} − J̃^v) + J̃^w + J̃^v. Let η ∈ (0,1). Applying (51), (64), (72), (87) and (88), there exists M > 0 such that, for all µ ∈ (0,µ₀] and for all n ≥ 1,

(98)    ‖ θ̂_n(t;µ) − θ(t) − J̃^w_{[tn],n} − J̃^v_{[tn],n} ‖_{2,θ,σ} ≤ M ( √µ (µn)^{−β} + (µn)^{−2β} + µ ).

We thus obtain (97) from (98), (93) and the β-Lipschitz smoothness of σ.




7. Bias correction: the case β ∈ (1, 2]

Observe that, in Theorem 13 and its corollary, we are limited to β ≤ 1. It turns out that, for β > 1, θ̂_n(t; α n^{−1/(1+2β)}) does not achieve the minimax rate anymore. In this section, we show how to derive a recursive estimator which achieves the minimax rate for β ∈ (1,2]. This estimator will be deduced from θ̂_n thanks to an explicit approximation of its asymptotic error (see Theorem 23). Recall that C is defined in (18) and C⋆ in (77). We thus have

    C(β, L, ρ, σ₋, σ₊) ⊇ C⋆(β, L, ρ, σ₋, σ₊) ⊇ C(β, L, ρ, σ₊, σ₊),

where the second inclusion is obtained by noting that the last set of parameters implies that σ is a constant function in the corresponding class. This inclusion shows that Theorem 7 is still valid when C is replaced by C⋆. The object of this section is to build a recursive estimator which achieves the minimax rate in C⋆(β, L, ρ, σ₋, σ₊) for β ∈ (1,2]. This estimator will be obtained thanks to the following result.

Theorem 23. Assume (A1) with q > 4 and let p ∈ [1, q/4). Let β ∈ (1,2], L > 0, ρ ∈ (0,1), and 0 < σ₋ ≤ σ₊. Then, for all η ∈ (0,1), there exist M > 0 and µ₀ > 0 such that, for all (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊), for all t ∈ [η,1], for all n ≥ 1 and for all µ ∈ (0,µ₀],

(99)    ‖ θ̂_n(t;µ) − θ(t) + (µn)^{−1} Σ^{−1}(t,θ,σ) θ̇(t) ‖_{p,θ,σ} ≤ M ( √µ + (µn)^{−β} + (µn)^{−2} ).

Proof. We use the decomposition δ = δ^u + δ^v + (δ^w − J^{w(0)}) + (J^{w(0)} − J̃^w) + J̃^w. Observe that there exists C > 0 such that Λ(β, L) ⊆ Λ(1, CL). Hence we may apply (51), (57), (64) and (87), so that there exist M and µ₀ such that, for all µ ∈ (0,µ₀],

(100)    sup_{(θ,σ)∈C⋆} sup_{t∈[η,1]} ‖ θ̂_n(t;µ) − θ(t) − J̃^w_{[tn],n} ‖_{p,θ,σ} ≤ M ( √µ + (µn)^{−2} ).

Now, since β > 1, for all (θ,σ) ∈ C⋆ and for all t ∈ (0,1], we may apply the Taylor expansion θ(u) = θ(t) + θ̇(v)(u−t) with v ∈ [u,t], which yields |θ(u) − θ(t) + θ̇(t)(t−u)| ≤ |θ̇(v) − θ̇(t)| |u−t| ≤ L |t−u|^{β−1} |u−t| = L |t−u|^β. Hence (92) holds (with β = 1 and β′ equal to the actual β) at every point t > 0, and we may apply Proposition 20 to compute J̃^w. Eq. (99) is then easily obtained. □

Choosing (θ,σ) ∈ C⋆ such that θ̇ does not vanish, Theorem 23 shows that, for β > 1, θ̂_n is not minimax rate optimal (see Theorem 7). Here, applying a technique inspired by the so-called Romberg method in numerical analysis (see Baranger (1991)), we propose a recursive estimator which achieves the minimax rate for β ∈ (1,2]. This estimator is obtained by combining the recursive estimators θ̂(t, ·) associated with two different step-sizes. More precisely, let

    θ̃_n(t; µ, γ) := (1 − γ)^{−1} ( θ̂_n(t;µ) − γ θ̂_n(t;γµ) ),

where γ ∈ (0,1). We obtain the following result.

Theorem 24. Assume (A1) with q > 4 and let p ∈ [1, q/4). Let β ∈ (1,2], L > 0, ρ < 1, and 0 < σ₋ ≤ σ₊. For all η ∈ (0,1) and for all γ ∈ (0,1), there exist M > 0 and µ₀ > 0 such that, for all (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊), for all n ≥ 1 and for all µ ∈ (0,µ₀],

(101)    sup_{t∈[η,1]} ‖ θ̃_n(t; µ, γ) − θ(t) ‖_{p,θ,σ} ≤ M ( √µ + (µn)^{−β} + (µn)^{−2} ).

Proof. Let η ∈ (0,1), γ ∈ (0,1) and t ∈ [η,1]. One easily checks that

    θ̃_n(t; µ, γ) − θ(t) = (1 − γ)^{−1} [ ( θ̂_n(t;µ) − θ(t) + (µn)^{−1} Σ^{−1}(t,θ,σ) θ̇(t) ) − γ ( θ̂_n(t;γµ) − θ(t) + (γµn)^{−1} Σ^{−1}(t,θ,σ) θ̇(t) ) ].

The Minkowski inequality and Theorem 23 then give the result.



Corollary 25. For all η ∈ (0,1) and for all α > 0, there exists M > 0 such that, for all (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊),

    sup_{t∈[η,1]} ‖ θ̃_n(t; α n^{−2β/(1+2β)}, γ) − θ(t) ‖_{p,θ,σ} ≤ M n^{−β/(1+2β)}.
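The Romberg-type combination removes the first-order bias exactly whenever the error carries a leading term proportional to (µn)^{−1}, as in Theorem 23. The following sketch demonstrates the cancellation on a hypothetical bias model (no simulation; the function `biased_estimate` and all values are illustrative assumptions, not the paper's estimator):

```python
def biased_estimate(theta_true, c, mu, n):
    """Hypothetical estimator value carrying the leading bias term -c/(mu*n) of Theorem 23."""
    return theta_true - c / (mu * n)

def romberg_combine(est_mu, est_gamma_mu, gamma):
    """theta~_n(t; mu, gamma) = (est(mu) - gamma * est(gamma*mu)) / (1 - gamma)."""
    return (est_mu - gamma * est_gamma_mu) / (1.0 - gamma)

theta_true, c, n, mu, gamma = 0.5, 0.3, 10_000, 0.01, 0.5
e1 = biased_estimate(theta_true, c, mu, n)
e2 = biased_estimate(theta_true, c, gamma * mu, n)
combined = romberg_combine(e1, e2, gamma)
print(e1 - theta_true, combined - theta_true)  # first-order bias before vs after combination
```

Since the bias of θ̂_n(t;γµ) is 1/γ times that of θ̂_n(t;µ), the weighted difference cancels it while leaving the higher-order terms in (101).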

8. Central limit theorem

Theorem 26. Assume (A1) with q > 11. Let β ∈ (0,1], L > 0, ρ < 1, and 0 < σ₋ ≤ σ₊, and let (θ,σ) ∈ C⋆(β, L, ρ, σ₋, σ₊). Let (µ_n) be a positive sequence satisfying

(102)    lim_{n→∞} ( µ_n + µ_n^{−(1+2β)} n^{−2β} ) = 0.

Then, for all m ≥ 1 and for all (t₁,…,t_m) ∈ (0,1]^m,

(103)    √2 µ_n^{−1/2} [ (θ̂_n^T(t₁;µ_n) − θ^T(t₁))/σ(t₁), …, (θ̂_n^T(t_m;µ_n) − θ^T(t_m))/σ(t_m) ]^T →^L N(0, I).

Proof. Observing that {ξ^v_{k,n}, F_{k+1,n}, 1 ≤ k ≤ n} is a martingale increment sequence, we apply a martingale CLT. Let Z_{j,k,n} := µ^{−1/2} (I − µ Σ(k/n))^{k−1−j} ξ^v_{j,n} ∈ R^d for all 0 ≤ j < k ≤ n. We have, for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

    E^{F_{j,n}}[ Z_{j,k,n} Z^T_{j,k,n} ] = µ (I − µ Σ(k/n))^{k−1−j} σ²_{j,n} L_µ(X_{j,n}) L^T_µ(X_{j,n}) (I − µ Σ(k/n))^{k−1−j}.

From Lemma 34 and (48), sup_{ν≥0} |L_ν(·)L^T_ν(·)|_{Li(1)} < ∞. Hence Proposition 28, (A1) and (80) give that, for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

    ‖ Σ_{j=0}^{k−1} E^{F_{j,n}}[ Z_{j,k,n} Z^T_{j,k,n} ] − µ^{−1} E[ J̃^v_{k,n} J̃^{vT}_{k,n} ] ‖_{q/2} ≤ M √µ.

Hence, from (94) and since q > 2, for all u ∈ (0,1],

(104)    lim_{n→∞} sup_{k: |u−k/n| ≤ (µn)^{−1}} ‖ Σ_{j=0}^{k−1} E^{F_{j,n}}[ Z_{j,k,n} Z^T_{j,k,n} ] − (σ²(u)/2) I ‖₁ = 0.


Let η > 0. From the Markov inequality and using (A1) and (80), we have, for all µ ∈ (0,µ₀] and for all 1 ≤ k ≤ n,

    Σ_{j=0}^{k−1} E^{F_{j,n}}[ |Z_{j,k,n}|² 1(|Z_{j,k,n}| ≥ η) ] ≤ M η^{2−q} µ^{q/2} Σ_{j=0}^{k−1} (1 − δµ)^{k−1−j} |L_µ(X_{j,n})|^q.

Since sup_{1≤j≤n} E|L_µ(X_{j,n})|^q < ∞ (see (50)) and q > 2, we get

(105)    lim_{µ→0} sup_{k,n: 1≤k≤n} ‖ Σ_{j=0}^{k−1} E^{F_{j,n}}[ |Z_{j,k,n}|² 1(|Z_{j,k,n}| ≥ η) ] ‖₁ = 0.

Using (104) and (105), we may thus apply (Hall and Heyde, 1980, Corollary 3.1) and obtain, for all u ∈ (0,1] and for (µ_n) satisfying (102),

(106)    µ_n^{−1/2} J̃^v_{[un],n} = Σ_{j=0}^{[un]−1} Z_{j,[un],n} →^L N( 0, (σ²(u)/2) I ).

Let us now define, for all 0 ≤ k−m ≤ k ≤ n,

    J̃^v_{k,m,n} := Σ_{j=k−m}^{k−1} (I − µ Σ(k/n))^{k−1−j} ξ^v_{j,n}.

Now observe that, from (80), for all 1 ≤ k−m ≤ k ≤ n and for all φ ∈ Li(R^d, 1, R; 0),

    | φ(µ^{−1/2} J̃^v_{k,n}) − φ(µ^{−1/2} J̃^v_{k,m,n}) | ≤ M |φ|_{Li(0)} µ^{−1/2} (1 − δµ)^m | Σ_{j=0}^{k−m−1} (I − µ Σ(k/n))^{k−1−m−j} ξ^v_{j,n} |.

The Burkhölder inequality (see (Hall and Heyde, 1980, Theorem 2.12)) then shows that, for all µ ∈ (0,µ₀], for all 1 ≤ k−m ≤ k ≤ n and for all φ ∈ Li(R^d, 1, R; 0),

(107)    ‖ φ(µ^{−1/2} J̃^v_{k,n}) − φ(µ^{−1/2} J̃^v_{k,m,n}) ‖₂ ≤ M |φ|_{Li(0)} (1 − δµ)^m.

For all 0 ≤ k−m ≤ k ≤ n, we may write J̃^v_{k,m,n} = µ Υ_{k,m,µ}(X_{k−m,n}, …, X_{k,n}), where, for all (x₀,…,x_m) ∈ (R^d)^{m+1},

    Υ_{k,m,µ}(x₀,…,x_m) := Σ_{j=0}^{m−1} (I − µ Σ(k/n))^{k−1−j} L_µ(x_j) ( [1 0 ⋯ 0] x_{j+1} − θ^T_{j,n} x_j ).

Using Lemma 34, (48) and (80), Υ_{k,m,µ} ∈ Li(R^d, m+1, R^d; 1) and there exist M and µ₀ such that, for all µ ∈ [0,µ₀] and for all 0 ≤ k−m ≤ k ≤ n, |Υ_{k,m,µ}|_{Li(1)} ≤ M. Thus, for all 0 ≤ k−m ≤ k ≤ n and for all φ ∈ Li(R^d, 1, R; 0), φ(µ^{−1/2} J̃^v_{k,m,n}) = φ ∘ (µ^{1/2} Υ_{k,m,µ})(X_{k−m,n}, …, X_{k,n}). From Lemma 34, φ ∘ (µ^{1/2} Υ_{k,m,µ}) ∈ Li(R^d, m+1, R; 1) and there exist M and µ₀ > 0 such that, for all 0 ≤ k−m ≤ k ≤ n and for all µ ∈ [0,µ₀], |φ ∘ (µ^{1/2} Υ_{k,m,µ})|_{Li(1)} ≤ M |φ|_{Li(0)}. From Proposition 2 and the uniform


exponential and L² stability, we thus get, for all µ ∈ (0,µ₀], for all 1 ≤ k₁ ≤ k₂ ≤ n and for all 0 ≤ m < k₂ − k₁,

    ‖ E^{F_{k₁,n}}[ φ(µ^{−1/2} J̃^v_{k₂,m,n}) ] − E[ φ(µ^{−1/2} J̃^v_{k₂,m,n}) ] ‖₁ ≤ M |φ|_{Li(0)} m τ^{k₂−m−k₁}.

Let now ψ be a bounded mapping of Li(R^d, 1, R; 0). We conclude from this bound and (107) that, for all 1 ≤ k₁ < k₂ ≤ n and for all 1 ≤ m < k₂ − k₁,

    | cov( ψ(µ^{−1/2} J̃^v_{k₁,n}), φ(µ^{−1/2} J̃^v_{k₂,n}) ) | ≤ sup_x |ψ(x)| ‖ E^{F_{k₁,n}}[ φ(µ^{−1/2} J̃^v_{k₂,n}) ] − E[ φ(µ^{−1/2} J̃^v_{k₂,n}) ] ‖₁ ≤ M sup_x |ψ(x)| |φ|_{Li(0)} ( (1 − δµ)^m + m τ^{k₂−m−k₁} ).

Set k₁ = [t₁n], k₂ = [t₂n] and m = [(k₂−k₁)/2]. Then there exist M, δ, µ₀ > 0 such that, for all n ≥ 1,

    | cov( ψ(µ_n^{−1/2} J̃^v_{[t₁n],n}), φ(µ_n^{−1/2} J̃^v_{[t₂n],n}) ) | ≤ M sup_x |ψ(x)| |φ|_{Li(0)} ( e^{−δ(nµ_n)(t₂−t₁)} + n(t₂−t₁) τ^{n(t₂−t₁)/2} ) → 0    as n → ∞,

implying that J̃^v_{[t₁n],n} and J̃^v_{[t₂n],n} are asymptotically independent. Eq. (103) follows by using (98) and (106). □

Appendix A. Proof of Proposition 2

Define H_{k,r} : R^d → R by, for all x ∈ R^d,

    H_{k,r}(x) := E[ φ( h_{k,r+j₁}(x, U_{k+1}, …, U_{k+r+j₁}), …, h_{k,r+j_m}(x, U_{k+1}, …, U_{k+r+j_m}) ) ],

where, for all i, j ∈ Z₊ and for all (x, u₁,…,u_j) ∈ R^d × R^{qj},

    h_{i,j}(x, u₁,…,u_j) = α(i+j, i) x + Σ_{k=1}^{j} α(i+j, i+k) B_{i+k} u_k,

with α(l,l) = I and α(l+m, l) := Π_{k=l+1}^{l+m} A_k, l ≥ 0, m ≥ 1. We thus have, for all k, r ≥ 0,

(108)    E^{F_k}[Φ_{k+r}] = H_{k,r}(Z_k).

Using (5), observe that, for all i, j ∈ Z₊ and for all (x, y, u₁,…,u_j) ∈ R^{2d} × R^{qj},

    | h_{i,j}(x, u₁,…,u_j) − h_{i,j}(y, u₁,…,u_j) | ≤ C ρ^j |x−y|,

    | h_{i,j}(x, u₁,…,u_j) | ≤ C ( ρ^j |x| + Σ_{k=1}^{j} ρ^{j−k} B⋆ |u_k| ).


Using these bounds along with the Minkowski inequality, the definition of H_{k,r} above and the assumptions on φ, we easily obtain, for all (x,y) ∈ R^{2d},

    | H_{k,r}(x) − H_{k,r}(y) | ≤ c₁ |φ|_{Li(p)} ( Σ_{i=1}^{m} ρ^{r+j_i} ) |x−y| ( 1 + Σ_{i=1}^{m} ρ^{(r+j_i)p} (|x|^p + |y|^p) + 2 Σ_{i=1}^{m} ( Σ_{k=1}^{r+j_i} ρ^{r+j_i−k} B⋆ U⋆_p )^p ),

    | H_{k,r}(x) | ≤ c₁ |φ|_{Li(p)} ( 1 + Σ_{i=1}^{m} ρ^{(r+j_i)(p+1)} |x|^{p+1} + Σ_{i=1}^{m} ( Σ_{k=1}^{r+j_i} ρ^{r+j_i−k} B⋆ U⋆_{p+1} )^{p+1} ),

where c₁ is a constant only depending on C, ρ and p. Observing that Σ_{k=1}^{j} ρ^{j−k} ≤ 1/(1−ρ) and Σ_{i=1}^{m} ρ^{j_i α} ≤ 1/(1−ρ^α) for all j ≥ 1, for all 0 = j₁ < … < j_m and for all α > 0, we obtain

(109)    | H_{k,r}(x) | ≤ C₁ |φ|_{Li(p)} m ( 1 + ρ^{r(p+1)} |x|^{p+1} ),

(110)    | H_{k,r}(x) − H_{k,r}(y) | ≤ C₁ m ρ^r |φ|_{Li(p)} |x−y| ( 1 + ρ^{rp} (|x|^p + |y|^p) ),

where C₁ is a constant only depending on B⋆, U⋆_{p+1}, C, ρ and p. Eq. (8) follows from (108) and (109). Observe now that, for any probability measure ζ on R^d,

    | H_{k,r}(x) − ∫ H_{k,r}(y) ζ(dy) | ≤ ∫ | H_{k,r}(x) − H_{k,r}(y) | ζ(dy) ≤ C₁ m ρ^r ∫ |x−y| (1 + |x|^p + |y|^p) ζ(dy),

where we used (110). The proof of (9)–(10) follows from (108), from |x−y| ≤ |x| + |y|, and by choosing ζ respectively equal to the conditional distribution of Z_k given (Z₀, U₁, …, U_l) and to the distribution of Z_k. □

Appendix B. Burkhölder inequalities applied to the TVAR model

Throughout this section we let β ∈ (0,1], L > 0, 0 < ρ < 1, 0 < σ₋ ≤ σ₊, and we set C := C(β, L, ρ, σ₋, σ₊). We further let τ ∈ (ρ, 1). The following lemma is adapted from (Dedecker and Doukhan, 2002, Proposition 4).

Lemma 27. Let (Ω, F, P, {F_n; n ∈ Z₊}) be a filtered space. Let p ≥ 2 and p₁, p₂ ∈ [1, ∞] be such that p₁^{−1} + p₂^{−1} = 2p^{−1}, and let {Z_n; n ∈ Z₊} be an adapted sequence such that E[Z_n] = 0 and ‖Z_n‖_p < ∞ for all n ∈ Z₊. Then

(111)    ‖ Σ_{i=1}^{n} Z_i ‖_p ≤ ( 2p Σ_{i=1}^{n} ‖Z_i‖_{p₁} ‖ Σ_{j=i}^{n} E^{F_i}[Z_j] ‖_{p₂} )^{1/2}.

Proposition 28. Assume (A1) with q ≥ 2 and let p ≥ 0 be such that 2(p+1) ≤ q. Then there exists M > 0 such that, for all (θ,σ) ∈ C, for all 1 ≤ s ≤ t ≤ n and for all sequences {φ_i}_{s≤i≤t} in Li(R^d, 1, R; p),

(112)    ‖ Σ_{i=s}^{t} ( φ_i(X_{i,n}) − E_{θ,σ}[φ_i(X_{i,n})] ) ‖_{q/(p+1),θ,σ} ≤ M ( sup_{i∈{s,…,t}} |φ_i|_{Li(p)} Σ_{i=s}^{t} |φ_i|_{Li(p)} )^{1/2}.

Proof. Let us apply Lemma 27 with Z_i = φ_i(X_{i,n}) − E_{θ,σ}[φ_i(X_{i,n})] and p₁ = p₂ = q/(p+1). From the L^q stability, we see that ‖Z_i‖_{q/(p+1),θ,σ} ≤ M |φ_i|_{Li(p)}. It now remains to bound ‖E_{θ,σ}^{F_{i,n}}[Z_k]‖_{q/(p+1),θ,σ}. Using the exponential stability and the L^q stability, Proposition 2 shows that, for all s ≤ i ≤ k ≤ t and for all (θ,σ) ∈ C, ‖E_{θ,σ}^{F_{i,n}}[Z_k]‖_{q/(p+1),θ,σ} ≤ M τ^{k−i} |φ_k|_{Li(p)}. The proof follows. □

,θ,σ

Another application of Lemma 27 is the following result. Proposition 29. Assume that q ≥ 5 and let p, r ≥ 0 such that u := 2(p + r) + 5 ≤ q. There exists a constant M such that, for all (θ, σ) ∈ C, for all 1 ≤ s ≤ t ≤ n and for all sequences {γi,j }s≤i 0 such that, for all φ ∈ Li(G, 1, F; p1 ) and ψ ∈ Li(E, m, G; p2 ), 1 +1 |φ ◦ ψ|Li(p1 p2 +p1 +p2 ) ≤ C |φ|Li(p1 ) (1 + |ψ|pLi(p ). 2)

(2) Let (G, | · |G ) be a normed algebra. For any p1 , p2 ≥ 0 and any integers m1 , m2 ≥ 1, there exists C > 0 such that, for all φ ∈ Li(E, m1 , G; p1 ) and ψ ∈ Li(F, m2 , G; p2 ), |φ ψ|Li(p1 +p2 +1) ≤ C |φ|Li(p1 ) |ψ|Li(p2 ) .


Lemma 35. • Let β ≥ 0 and µ ∈ (0,1). Then, there exist constants C₁, C₂ only depending on β such that

    sup_{t>0} t^β (1−µ)^t ≤ C₁ µ^{−β},    Σ_{s=1}^{∞} (1−µ)^s s^β ≤ C₂ µ^{−(1+β)},

with the convention 0⁰ = 1.

• Let A be a positive definite matrix and β > 0. Then, as µ ↓ 0,

(128)    Σ_{s=0}^{∞} (I − µA)^s ( (s+1)^β − s^β ) = β Γ(β) (µA)^{−β} (1 + O(µ)),

where Γ is the Gamma function.

Proof. The result is trivial for β = 0, so we assume β > 0. A straightforward computation shows that sup_{t≥0} t^β (1−µ)^t is attained at t = t₀ := −β/log(1−µ). Since log(1−µ) is bounded above by −µ, the first bound follows. The second bound is obtained by bounding (1−µ)^s s^β by this supremum for s < t₀ + 1 and by bounding the remainder of the sum (whose terms are decreasing) by

    ∫₀^∞ (1−µ)^s s^β ds = (−log(1−µ))^{−(β+1)} ∫₀^∞ e^{−s} s^β ds,

which easily yields the second bound. Finally, let δ be an eigenvalue of A and write, for all β > 0 and for all µ ∈ (0,1),

    S := Σ_{s=0}^{∞} (1−µδ)^s ( (s+1)^β − s^β ) = β Σ_{s=0}^{∞} (1−µδ)^s ∫_s^{s+1} t^{β−1} dt.

The proof of (128) follows by noting that

    (1−µδ) S ≤ β ∫₀^∞ (1−µδ)^t t^{β−1} dt = β Γ(β) (−log(1−µδ))^{−β} ≤ S. □
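The equivalent (128) is easy to check numerically in the scalar case, where A reduces to an eigenvalue δ. The following sketch compares a truncated version of the series S with β Γ(β) (µδ)^{−β} for a small step size (the parameter values are arbitrary test choices):

```python
import math

def S_beta(mu, delta, beta, terms=20_000):
    """Truncated sum S = sum_s (1 - mu*delta)^s ((s+1)^beta - s^beta)."""
    q = 1.0 - mu * delta
    total, power = 0.0, 1.0
    for s in range(terms):
        total += power * ((s + 1) ** beta - s ** beta)
        power *= q
    return total

mu, delta, beta = 0.01, 1.0, 0.5          # mu*delta small so the O(mu) correction is tiny
series = S_beta(mu, delta, beta)
equivalent = beta * math.gamma(beta) * (mu * delta) ** (-beta)
print(series, equivalent)
```

For µδ = 0.01 the two quantities agree to within about one percent, consistent with the (1 + O(µ)) factor in (128).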

References

Aguech, R., Moulines, E. and Priouret, P. (2000). On a perturbation approach for the analysis of stochastic tracking algorithms. SIAM Journal on Control and Optimization 39 872–899.

Baranger, J. (1991). Analyse numérique. Hermann.

Belitser, E. (2000). Recursive estimation of a drifted autoregressive parameter. The Annals of Statistics 28 860–870.

Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. 2nd ed. Springer Series in Statistics, Springer-Verlag, New York.

Dahlhaus, R. (1996). On the Kullback-Leibler information divergence of locally stationary processes. Stochastic Process. Appl. 62 139–168.

Dahlhaus, R. (1997). Fitting time series models to non-stationary processes. Annals of Statistics 25 1–37.

Dahlhaus, R. and Giraitis, L. (1998). On the optimal segment length for parameter estimates for locally stationary processes. J. Time Ser. Anal. 19 629–655.

Dedecker, J. and Doukhan, P. (2002). A new covariance inequality and applications. Preprint.

Dunford, N. and Schwartz, J. T. (1958). Linear Operators, vol. I. Wiley, New York.

Grenier, Y. (1983). Time-dependent ARMA modeling of nonstationary signals. IEEE Transactions on ASSP 31 899–911.

Guo, L. (1994). Stability of recursive stochastic tracking algorithms. SIAM J. on Control and Optimization 32 1195–1225.

Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Application. Academic Press Inc., New York.

Hallin, M. (1978). Mixed autoregressive-moving average multivariate processes with time-dependent coefficients. J. Multivariate Anal. 8 567–572.

Horn, R. A. and Johnson, C. R. (1991). Topics in Matrix Analysis. Cambridge University Press.

Kailath, T. (1980). Linear Systems. Prentice Hall.

Kushner, H. and Yin, G. (1997). Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York.

Ljung, L. and Söderström, T. (1983). Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA.

Priouret, P. and Veretennikov, A. Y. (1995). A remark on the stability of the L.M.S. tracking algorithm. Stochastic Analysis and its Applications 16 118–128.

Rao, T. S. (1970). The fitting of non-stationary time-series models with time-dependent parameters. J. Roy. Statist. Soc. Ser. B 32 312–322.

Solo, V. and Kong, X. (1995). Adaptive Signal Processing Algorithms: Stability and Performance. Prentice Hall, New Jersey.