Relative-Error Prediction in Nonparametric Functional Statistics: Theory and Practice

J. Demongeot(a), A. Hamie(a), A. Laksaci(b), and M. Rachdi(c,1)

(a) Univ. Grenoble-Alpes, Laboratoire AGIM FRE 3405 CNRS, Faculté de Médecine de Grenoble, Université J. Fourier, 38700 La Tronche, France. e-mails: [email protected] and [email protected]
(b) Laboratoire de Statistique et Processus Stochastiques, Université Djillali Liabès, BP. 89, Sidi Bel-Abbès 22000, Algeria. e-mail: [email protected]
(c) Univ. Grenoble-Alpes, Laboratoire AGIM FRE 3405 CNRS, UPMF, UFR SHS, BP. 47, 38040 Grenoble Cedex 09, France. e-mails: [email protected] or [email protected]

Abstract

In this paper, an alternative kernel estimator of the regression operator of a scalar response variable $Y$ given a random variable $X$ taking values in a semi-metric space is considered. The constructed estimator is based on the minimization of the mean squared relative error. This technique is useful for analyzing data with positive responses, such as stock prices or lifetimes, which are particularly common in economic/financial or biomedical studies. Least squares and least absolute deviation are among the most widely used criteria in statistical estimation for regression models. However, in many practical applications, for example when treating stock price data, the size of the relative error, rather than that of the error itself, is the central concern of practitioners. This paper therefore offers an alternative to traditional estimation methods by considering the minimization of the relative error for operatorial regression models. We prove the strong and uniform consistency of the constructed estimator. Moreover, the mean squared convergence rate is given and the asymptotic normality of the proposed estimator is proved. Note that all these asymptotic results are established under mild conditions and are closely related to the concentration properties of the small-ball probability measure of the underlying explanatory variable. Finally, supportive evidence is provided by

1. Corresponding author

Preprint submitted to the journal JMVA-SI: High and Infinite Dimensional Statistics. February 15, 2015

simulation studies and an application to some economic data.

Keywords: Mean squared relative error, Nonparametric estimation, Functional data, Regression operator, Asymptotic normality, Small ball property, Stock price, Economic data

1. Introduction

Let $\mathcal{F}$ be a semi-metric space equipped with a semi-metric $d$. We consider $n$ pairs of independent random variables $(X_i, Y_i)$, $i = 1, \ldots, n$, drawn from the pair $(X, Y)$ which is valued in $\mathcal{F} \times \mathbb{R}$. In this paper, we aim to study the link between the response variable $Y$ and the explanatory variable $X$. This link is an important subject in nonparametric functional statistics, and there are several ways to model it. A common nonparametric model of this relationship is based on the following consideration:
$$Y = r(X) + \varepsilon, \qquad (1)$$
where $r$ is the regression operator defined from $\mathcal{F}$ to $\mathbb{R}$ and $\varepsilon$ is a random error variable. Recall that the operator $r$ is usually estimated by minimizing the expected squared loss function:
$$E\left[(Y - r(X))^2 \,\big|\, X\right].$$
However, this loss function, viewed as a measure of prediction performance, may be unsuitable in some situations. Indeed, using least squares regression amounts to treating all variables in the study as having equal weight. Thus, the presence of outliers can lead to irrelevant results. So, in this paper, we circumvent the limitations of this classical regression by estimating the operator $r$ through the minimization of the following mean squared relative error (MSRE):
$$E\left[\left(\frac{Y - r(X)}{Y}\right)^2 \,\Big|\, X\right] \quad \text{for } Y > 0. \qquad (2)$$
This criterion is clearly a more meaningful measure of prediction performance than the least squares error, in particular when the range of predicted values is large. Moreover, the solution of (2) can be explicitly expressed by

the ratio of the first two conditional inverse moments of $Y$ given $X$. Thus, we estimate the regression operator $r$ which minimizes the MSRE by:
$$\widetilde{r}(x) = \frac{\sum_{i=1}^{n} Y_i^{-1} K(h^{-1} d(x, X_i))}{\sum_{i=1}^{n} Y_i^{-2} K(h^{-1} d(x, X_i))}, \qquad (3)$$
where $K$ is a kernel and $h = h_{K,n}$ is a sequence of positive real numbers. Although the MSRE is frequently used as a measure of performance in practice, especially in time series forecasting, the theoretical properties of this regression analysis have not been widely studied in the statistical literature, and even less so in the functional framework. Indeed, the first use of the MSRE as a loss function in estimation was given by Narula and Wellington (1977). Since this work, criteria based on minimizing the sum of absolute relative errors (ARE) and the sum of squared relative errors (SRE) have been proposed in different areas (cf., for instance, Shen et al. (1985) and Khoshgoftaar et al. (1992) for some models in software engineering, Chatfield (2007) for some examples in medicine, or Chen et al. (2010) for some financial applications). Notice that most of the recent results on this estimation method are devoted to linear or multiplicative regression models (cf. Yang and Ye (2013) for recent advances and references). The nonparametric treatment of these models has not yet been fully explored. As far as we know, only the paper by Jones et al. (2008) has paid attention to nonparametric prediction via relative error regression. They studied the asymptotic properties of an estimator minimizing the sum of squared relative errors through two estimation methods (the kernel method and the local linear approach). Then, the main purpose of this paper is to study this regression model when the explanatory variable is of a functional kind. Recall that the observation of functional variables has become usual, due to the progress of computational tools which offer the opportunity to observe phenomena through real-time monitoring.
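For readers who wish to experiment with (3), the estimator is simple to implement. The following is a minimal sketch in Python; the function names, the $L^2$-type semi-metric and the quadratic kernel are our own illustrative choices, not prescriptions from the paper:

```python
import numpy as np

def rel_error_estimate(x, X, Y, h, d):
    """Sketch of the relative-error kernel estimator (3): the ratio of
    the two inverse-moment kernel sums. The quadratic kernel
    K(u) = 0.75 (1 - u^2) on [0, 1) is one admissible choice."""
    u = np.array([d(x, Xi) for Xi in X]) / h
    K = np.where(u < 1.0, 0.75 * (1.0 - u ** 2), 0.0)  # weights K(h^-1 d(x, X_i))
    Y = np.asarray(Y, dtype=float)
    return np.sum(K / Y) / np.sum(K / Y ** 2)  # sum Y_i^-1 K_i / sum Y_i^-2 K_i

# Toy usage: curves discretized on a 100-point grid, L2-type semi-metric.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
X = [a * np.sin(4 * (b - t)) + b
     for a, b in zip(rng.normal(5, 2, 60), rng.normal(0, 0.1, 60))]
Y = np.array([np.mean(1.0 / (1.0 + np.abs(x))) for x in X]) + 1.0  # positive responses
d = lambda u, v: np.sqrt(np.mean((u - v) ** 2))
h = np.median([d(X[0], Xi) for Xi in X[1:]])
print(rel_error_estimate(X[0], X, Y, h, d))
```

Since the estimate can be rewritten as a weighted mean of the $Y_i$ with nonnegative weights $K_i Y_i^{-2}$, it always lies between the smallest and largest observed response.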
Such continuous data (or functional data: curves, surfaces, etc.) may occur in different fields of applied science, for example in biomechanics (e.g., human movements), in chemometrics (e.g., spectrometric curves), in econometrics (e.g., the stock market index) or in medicine (e.g., electrocardiograms/electroencephalograms). The modelization of functional variables has become more and more popular since the publication of the monograph by Ramsay and Silverman (1997) on functional data analysis. For an overview of this topic, we refer readers to the monographs

by Ramsay and Silverman (2005), Bosq (2000), Ferraty and Vieu (2006) and Ferraty and Romain (2011). Recall that ordinary least squares (OLS) regression, based on the minimization of the sum of squared errors, is one of the most popular models in nonparametric functional data analysis. The first results on this topic were obtained by Ferraty and Vieu (2000), who established the almost-complete(2) (a.co.) pointwise consistency of the kernel estimator of the regression operator when the observations are independent and identically distributed (i.i.d.). The convergence in $L^p$-norm of this model was stated by Niang and Rhomari (2003), while Delsol (2007) gave its exact asymptotic expression for $L^p$-errors. Asymptotic normality results for the same estimator, in the strong mixing case, were established by Masry (2005). Recently, Ferraty et al. (2010) gave the uniform version of the almost-complete convergence rate in the i.i.d. case, and Delsol et al. (2008) treated the $L^p$-consistency of a family of M-estimators of the regression operator with a functional regressor under both (1) i.i.d. and (2) strong mixing conditions. Other alternative estimation methods, based on the robust approach or on local linear smoothing, have recently been considered in functional statistics. For instance, motivated by its superiority over the classical kernel method, the local linear smoothing technique has been considered in the functional data setting by many authors (cf., for instance, Barrientos et al. (2010), Baíllo and Grané (2009), El Methni and Rachdi (2011) and Demongeot et al. (2011, 2012, 2013, 2014) and references therein). The main aim of this paper is then to study, under some general conditions, the asymptotic properties of an alternative functional kernel estimator of the regression operator.
More precisely, we prove the almost-complete convergence, the uniform almost-complete convergence, the mean square convergence and the asymptotic normality of the estimator given by (3). All these asymptotic properties are obtained with a precise convergence rate. Moreover, as for all nonparametric models in functional statistics, the topological structure of the functional space of the regressor is exploited through

(2) Let $(z_n)_{n \in \mathbb{N}}$ be a sequence of real r.v.'s. We say that $z_n$ converges almost-completely (a.co.) toward zero if, and only if, $\forall \epsilon > 0$, $\sum_{n=1}^{\infty} P(|z_n| > \epsilon) < \infty$. Moreover, we say that the rate of almost-complete convergence of $z_n$ to zero is of order $u_n$ (with $u_n \to 0$), and we write $z_n = O_{a.co.}(u_n)$, if, and only if, $\exists \epsilon > 0$ such that $\sum_{n=1}^{\infty} P(|z_n| > \epsilon u_n) < \infty$. This kind of convergence implies both almost-sure convergence and convergence in probability.


the concentration properties of the probability measure of the functional variable. We recall that, in functional statistics, uniform convergence is not a direct extension of pointwise convergence: it requires additional techniques and conditions. Its convergence rate is expressed, here, in terms of Kolmogorov's entropy, which is also related to the topological structure of the functional space of the regressor. On the other hand, in order to show the easy implementation of the constructed estimator in practice, we give the exact expression involved in the leading terms of the quadratic error. The latter is useful for the practical determination of the smoothing parameter $h$ and the semi-metric $d$ of the functional space. Then, we establish the asymptotic law, i.e., the asymptotic normality, of $\widetilde{r}$, which leads to interesting perspectives from a practical point of view, such as hypothesis testing and the determination of confidence intervals. Finally, it should be noticed that our assumptions and results unify the finite- and infinite-dimensional cases of the covariates, which permits us to overcome the curse of dimensionality.

The outline of this paper is as follows: we construct the estimator in Section 2 and study its pointwise consistency in Section 3. The uniform almost-complete convergence is treated in Section 4. Section 5 is dedicated to the mean square error of our estimator, while its asymptotic normality is studied in Section 6. Section 7 presents a simulation study and an application to real data. Finally, the proofs of the auxiliary results are given in the Appendix.

2. Preliminaries

In this section, we construct the regression estimator allowing the best mean squared relative error prediction. To this aim, assume that the first two conditional inverse moments of $Y$ given $X$, that is,
$$g_\gamma(x) := E(Y^{-\gamma} \,|\, X = x) \quad \text{for } \gamma = 1, 2,$$
exist and are finite almost surely (a.s.). Then, one can easily show (cf. Park and Stefanski, 1998) that the best mean squared relative error predictor of $Y$ given $X$ is:
$$\widetilde{g}(X) = \frac{E(Y^{-1} \,|\, X)}{E(Y^{-2} \,|\, X)} = \frac{g_1(X)}{g_2(X)}, \quad \text{a.s.} \qquad (4)$$
In order to show this result, let $g(X)$ be any predictor of $Y$. Since
$$E\left[\frac{Y - r(X)}{Y^2} \,\Big|\, X\right] = 0, \quad \text{a.s.,} \qquad (5)$$
then, by some algebra, the conditional MSRE of $g(X)$ as a predictor of $Y$ is given by:
$$E\left[\left(\frac{Y - g(X)}{Y}\right)^2 \,\Big|\, X\right] = \frac{\mathrm{var}(Y^{-1} \,|\, X)}{E(Y^{-2} \,|\, X)} + \left(g(X) - \widetilde{g}(X)\right)^2 E(Y^{-2} \,|\, X) := \mathrm{MSRE}_1 + \mathrm{MSRE}_2. \qquad (6)$$

Notice that the term $\mathrm{MSRE}_1$ does not depend on $g(X)$, and $\mathrm{MSRE}_2$ is minimized when $g(X) = \widetilde{g}(X)$ almost surely. Consequently, $\widetilde{g}(X)$ is the best MSRE predictor, and (6) becomes:
$$E\left[\left(\frac{Y - \widetilde{g}(X)}{Y}\right)^2 \,\Big|\, X\right] = \frac{\mathrm{var}(Y^{-1} \,|\, X)}{E(Y^{-2} \,|\, X)}.$$

Remark 1. Remark that $\widetilde{g}(X)$ may be written as follows:
$$\widetilde{g}(X) = \frac{E(Y^{-1} \,|\, X)}{\mathrm{var}(Y^{-1} \,|\, X) + \left(E(Y^{-1} \,|\, X)\right)^2}. \qquad (7)$$
Then $\widetilde{g}(X)$ is a function of the conditional mean and variance of $Y^{-1}$ given $X$. This motivates the use of mean and variance modelling methods to fit models to the mean and variance of $Y^{-1}$ as functions of $X$ (cf. Carroll and Ruppert, 1988).

From Remark 1, one can construct kernel estimators of $E(Y^{-\gamma} \,|\, X)$ for $\gamma = 1, 2$, which may be plugged into the right-hand side of (7), leading to the kernel estimator given by (3).

3. The strong consistency

The main purpose of this section is to study the almost-complete convergence of $\widetilde{r}(x)$ toward $r(x)$. To do that, we fix a point $x$ in $\mathcal{F}$, and we denote by $N_x$ a neighborhood of this point. Hereafter, when no confusion is possible, we denote by $C$ and $C'$ strictly positive generic constants, and we set
$$K_i = K(h^{-1} d(x, X_i)) \quad \text{for } i = 1, \ldots, n,$$
where $K$ is a kernel function and $h := h_{n,K}$ is a sequence of positive numbers decreasing toward 0, satisfying some supplementary hypotheses. Moreover, we denote the ball centered at $x$ with radius $s$ by $B(x, s) = \{z \in \mathcal{F} : d(z, x) < s\}$. In what follows, we will need the following assumptions:

(H1) $\mathbb{P}(X \in B(x, s)) =: \phi_x(s) > 0$ for all $s > 0$, and $\lim_{s \to 0} \phi_x(s) = 0$.

(H2) For all $(x_1, x_2) \in N_x^2$, we have: $|g_\gamma(x_1) - g_\gamma(x_2)| \leq C\, d^{k_\gamma}(x_1, x_2)$ for some $k_\gamma > 0$.

(H3) The kernel $K$ is a measurable function supported within $(0, 1)$ and satisfying: $0 < C \leq K(\cdot) \leq C' < \infty$.

(H4) The small ball probability satisfies: $\dfrac{n \phi_x(h)}{\log n} \to \infty$ as $n \to +\infty$.

(H5) The inverse moments of the response variable satisfy: for all $m \geq 2$, $E[Y^{-m} \,|\, X = x] < C < \infty$.

It must be noticed that our conditions are very usual in the context of nonparametric functional statistics. Assumption (H1) is the classical concentration property of the probability measure of the functional variable $X$; it quantifies the contribution of the topological structure of $\mathcal{F}$ to the convergence rate. While the regularity condition (H2) permits us to evaluate the bias term of the estimator (3), assumptions (H3), (H4) and (H5) are technical conditions imposed to keep the proofs of our results brief.

Theorem 1. Under assumptions (H1)-(H5), we have:
$$|\widetilde{r}(x) - r(x)| = O(h^{k_1}) + O(h^{k_2}) + O_{a.co.}\left(\sqrt{\frac{\log n}{n \phi_x(h)}}\right). \qquad (8)$$

Proof of Theorem 1. The estimator $\widetilde{r}(x)$ may be written in the following form:
$$\widetilde{r}(x) = \frac{\widetilde{g}_1(x)}{\widetilde{g}_2(x)},$$
where
$$\widetilde{g}_\gamma(x) = \frac{1}{n E[K(h^{-1} d(x, X_1))]} \sum_{i=1}^{n} Y_i^{-\gamma} K(h^{-1} d(x, X_i)) \quad \text{for } \gamma = 1, 2.$$
On the other hand, we have the following decomposition:
$$\widetilde{r}(x) - r(x) = \frac{1}{\widetilde{g}_2(x)} \left[\widetilde{g}_1(x) - g_1(x)\right] + \frac{r(x)}{\widetilde{g}_2(x)} \left[g_2(x) - \widetilde{g}_2(x)\right]. \qquad (9)$$
Therefore, Theorem 1 is a consequence of the following intermediate results (cf. Lemmas 1 and 2).

Lemma 1. Under hypotheses (H1) and (H3)-(H5), we have, for $\gamma = 1, 2$:
$$\left|\widetilde{g}_\gamma(x) - E[\widetilde{g}_\gamma(x)]\right| = O_{a.co.}\left(\sqrt{\frac{\log n}{n \phi_x(h)}}\right).$$

Lemma 2. Under hypotheses (H1)-(H4), we have, for $\gamma = 1, 2$:
$$\left|E[\widetilde{g}_\gamma(x)] - g_\gamma(x)\right| = O(h^{k_\gamma}).$$

Corollary 1. Under the hypotheses of Theorem 1, we obtain:
$$\sum_{n=1}^{\infty} \mathbb{P}\left(\widetilde{g}_2(x) < \frac{g_2(x)}{2}\right) < \infty.$$

4. The uniform consistency

In this section, we focus on the uniform almost-complete convergence of the estimator $\widetilde{r}$, given by (3), over a fixed subset $S_{\mathcal{F}}$ of $\mathcal{F}$. For this, we denote by $\psi_{S_{\mathcal{F}}}(\cdot)$ the Kolmogorov $\varepsilon$-entropy of $S_{\mathcal{F}}$, and we reformulate the previous conditions (H1)-(H5) as follows:

(U1) For all $x \in S_{\mathcal{F}}$ and $s > 0$: $0 < C \phi_x(s) \leq \mathbb{P}(X \in B(x, s)) \leq C' \phi_x(s) < \infty$.

(U2) There exists $\eta > 0$ such that, for all $x, x' \in S_{\mathcal{F}}^{\eta}$, $|g_\gamma(x) - g_\gamma(x')| \leq C\, d^{k_\gamma}(x, x')$, where $S_{\mathcal{F}}^{\eta} = \{x \in \mathcal{F} : \text{there exists } x' \in S_{\mathcal{F}} \text{ such that } d(x, x') \leq \eta\}$.

(U3) The kernel $K$ is a bounded and Lipschitz function on its support $(0, 1)$.

(U4) The functions $\phi_x$ and $\psi_{S_{\mathcal{F}}}$ are such that:

(U4a) there exists $\eta_0 > 0$ such that, for all $\eta < \eta_0$, $\phi'_x(\eta) < C$, where $\phi'_x$ denotes the first derivative of $\phi_x$;

(U4b) for $n$ large enough:
$$\frac{(\log n)^2}{n \phi_x(h)} < \psi_{S_{\mathcal{F}}}\left(\frac{\log n}{n}\right) < \frac{n \phi_x(h)}{\log n};$$

(U4c) the Kolmogorov $\varepsilon$-entropy of $S_{\mathcal{F}}$ satisfies:
$$\sum_{n=1}^{\infty} \exp\left\{(1 - \beta)\, \psi_{S_{\mathcal{F}}}\left(\frac{\log n}{n}\right)\right\} < \infty \quad \text{for some } \beta > 1.$$

(U5) For any $m \geq 2$, $E[|Y|^{-m} \,|\, X = x] < C < \infty$ for all $x \in S_{\mathcal{F}}$, and $\inf_{x \in S_{\mathcal{F}}} g_2(x) \geq C' > 0$.

Notice that conditions (U1)-(U3) and (U5) are simple uniform versions of assumptions (H1)-(H3) and (H5), while assumption (U4) controls the entropy of $S_{\mathcal{F}}$, which is closely linked to the semi-metric $d$. Similarly to the concentration property, this additional argument also controls the contribution of the topological structure of $\mathcal{F}$ to the uniform convergence rate.

Theorem 2. Under hypotheses (U1)-(U5), we have:
$$\sup_{x \in S_{\mathcal{F}}} |\widetilde{r}(x) - r(x)| = O(h^{k_1}) + O(h^{k_2}) + O_{a.co.}\left(\sqrt{\frac{\psi_{S_{\mathcal{F}}}(\log n / n)}{n \phi_x(h)}}\right). \qquad (10)$$

Proof of Theorem 2. The proof of this theorem is based on the decomposition given by (9) and on the following intermediate results (cf. Lemmas 3 and 4 below).

Lemma 3. Under assumptions (U1) and (U3)-(U5), we have, for $\gamma = 1, 2$:
$$\sup_{x \in S_{\mathcal{F}}} \left|E[\widetilde{g}_\gamma(x)] - g_\gamma(x)\right| = O(h^{k_\gamma}).$$

Lemma 4. Under hypotheses (U1)-(U4), we have, for $\gamma = 1, 2$:
$$\sup_{x \in S_{\mathcal{F}}} \left|\widetilde{g}_\gamma(x) - E[\widetilde{g}_\gamma(x)]\right| = O_{a.co.}\left(\sqrt{\frac{\psi_{S_{\mathcal{F}}}(\log n / n)}{n \phi_x(h)}}\right).$$

Corollary 2. Under the hypotheses of Lemma 4, there exists $\delta > 0$ such that:
$$\sum_{n=1}^{\infty} \mathbb{P}\left(\inf_{x \in S_{\mathcal{F}}} \widetilde{g}_2(x) < \delta\right) < \infty.$$

5. The mean squared consistency

It is well known that the main feature of convergence in mean square is that, unlike all other consistency modes, the mean squared error can easily be quantified empirically. This feature is useful in numerous functional statistical methodologies, in particular in prediction problems, for the bandwidth choice or for the semi-metric choice. Our main interest in this section is to give the exact expression involved in the leading terms of the quadratic error (cf., for instance, Demongeot et al. (2013) and Rachdi et al. (2014) and references therein). To do that, we replace assumptions (H1), (H3) and (H4) respectively by the following hypotheses:

(M1) The concentration property (H1) holds. Moreover, there exists a function $\chi_x(\cdot)$ such that, for all $s \in [0, 1]$:
$$\lim_{r \to 0} \frac{\phi_x(sr)}{\phi_x(r)} = \chi_x(s).$$

(M2) For $\gamma \in \{1, 2\}$, the functions $\Psi_\gamma(\cdot) = E\left[g_\gamma(X) - g_\gamma(x) \,\big|\, d(x, X) = \cdot\,\right]$ are differentiable at 0.

(M3) The kernel function $K$ satisfies (H3) and is differentiable on $(0, 1)$, where its first derivative $K'$ satisfies: $-\infty < C < K'(\cdot) < C' < 0$.

(M4) The small ball probability satisfies: $n \phi_x(h) \to \infty$.

(M5) For $m \in \{1, 2, 3, 4\}$, the functions $E[Y^{-m} \,|\, X = \cdot\,]$ are continuous in a neighborhood of $x$.

Similarly to the previous asymptotic properties, the mean squared consistency is obtained under very standard conditions. These are simple adaptations of those used in Ferraty et al. (2007). We recall that condition (M1) is fulfilled by several small ball probability functions; we quote, for instance, the following cases (which can be found in Ferraty et al., 2007):

(i) $\phi_x(h) = C_x h^\gamma$ for some $\gamma > 0$, with $\chi_x(u) = u^\gamma$;

(ii) $\phi_x(h) = C_x h^\gamma \exp(-C h^{-p})$ for some $\gamma > 0$ and $p > 0$, with $\chi_x(u) = \delta_1(u)$, where $\delta_a(\cdot)$ denotes the Dirac function at $a$;

(iii) $\phi_x(h) = C_x |\ln h|^{-1}$, with $\chi_x(u) = \mathbb{1}_{]0,1]}(u)$, where $\mathbb{1}_A$ denotes the indicator function of the set $A$.

Notice that assumption (M2) is a regularity condition which characterizes the functional space of our model and is needed to make the bias term explicit. Moreover, hypotheses (M3)-(M5) are technical conditions, similar to those considered in Ferraty et al. (2007) in the regression case.

Theorem 3. Under assumptions (M1)-(M5), we have:

$$E\left[\widetilde{r}(x) - r(x)\right]^2 = B_n^2(x)\, h^2 + \frac{\sigma^2(x)}{n \phi_x(h)} + o(h^2) + o\left(\frac{1}{n \phi_x(h)}\right), \qquad (11)$$
where
$$B_n(x) = \frac{\left(\Psi'_1(0) - r(x)\, \Psi'_2(0)\right) \beta_0}{\beta_1\, g_2(x)}$$
and
$$\sigma^2(x) = \frac{\left(g_2(x) - 2 r(x)\, E[Y^{-3} \,|\, X = x] + r^2(x)\, E[Y^{-4} \,|\, X = x]\right) \beta_2}{g_2^2(x)\, \beta_1^2},$$
with
$$\beta_0 = K(1) - \int_0^1 (s K(s))'\, \chi_x(s)\, ds \quad \text{and} \quad \beta_j = K^j(1) - \int_0^1 (K^j)'(s)\, \chi_x(s)\, ds \quad \text{for } j = 1, 2.$$
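Before turning to the proof, note how (11) can guide the bandwidth choice in practice. In the fractal-type case $\phi_x(h) = C_x h^\gamma$ of example (i), balancing the squared bias against the variance (our own computation, not stated in the source, and treating $B_n(x)$ and $\sigma^2(x)$ as constants in $h$) gives:

```latex
% Minimize the leading terms of (11) in h, with \phi_x(h) = C_x h^\gamma:
\frac{d}{dh}\left[ B_n^2(x)\, h^2 + \frac{\sigma^2(x)}{n\, C_x\, h^{\gamma}} \right]
  = 2 B_n^2(x)\, h - \frac{\gamma\, \sigma^2(x)}{n\, C_x\, h^{\gamma+1}} = 0
\quad\Longrightarrow\quad
h_{\mathrm{opt}} = \left( \frac{\gamma\, \sigma^2(x)}{2\, B_n^2(x)\, n\, C_x} \right)^{1/(\gamma+2)}
  \sim C\, n^{-1/(\gamma+2)} .
```

With this rate, both leading terms of (11) are of order $n^{-2/(\gamma+2)}$, the usual nonparametric bias-variance trade-off when the small-ball probability behaves like that of a $\gamma$-dimensional covariate.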

Proof of Theorem 3. By standard arguments, we show that:
$$E[\widetilde{r}(x)] = \frac{E[\widetilde{g}_1(x)]}{E[\widetilde{g}_2(x)]} + O\left(\frac{1}{n \phi_x(h)}\right)$$
and
$$\mathrm{var}[\widetilde{r}(x)] = \frac{\mathrm{var}[\widetilde{g}_1(x)]}{\left(E[\widetilde{g}_2(x)]\right)^2} - 2\, \frac{E[\widetilde{g}_1(x)]\, \mathrm{cov}(\widetilde{g}_1(x), \widetilde{g}_2(x))}{\left(E[\widetilde{g}_2(x)]\right)^3} + \frac{\mathrm{var}[\widetilde{g}_2(x)]\, \left(E[\widetilde{g}_1(x)]\right)^2}{\left(E[\widetilde{g}_2(x)]\right)^4} + o\left(\frac{1}{n \phi_x(h)}\right).$$
Consequently, the proof of Theorem 3 can be deduced from the following intermediate results (cf. Lemmas 5 and 6).

Lemma 5. Under the hypotheses of Theorem 3, we have, for $\gamma = 1, 2$:
$$E[\widetilde{g}_\gamma(x)] = g_\gamma(x) + \Psi'_\gamma(0)\, \frac{\beta_0}{\beta_1}\, h + o(h).$$

Lemma 6. Under the hypotheses of Theorem 3, we have, for $\gamma = 1, 2$:
$$\mathrm{var}[\widetilde{g}_\gamma(x)] = E[Y^{-2\gamma} \,|\, X = x]\, \frac{\beta_2}{\beta_1^2\, n \phi_x(h)} + o\left(\frac{1}{n \phi_x(h)}\right),$$
and
$$\mathrm{cov}(\widetilde{g}_1(x), \widetilde{g}_2(x)) = E[Y^{-3} \,|\, X = x]\, \frac{\beta_2}{\beta_1^2\, n \phi_x(h)} + o\left(\frac{1}{n \phi_x(h)}\right).$$

6. The asymptotic normality

This section is devoted to establishing the asymptotic normality of $\widetilde{r}(x)$. For this, we keep conditions (M1)-(M5) (cf. Section 5) and establish the following theorem:

Theorem 4. Assume that (M1)-(M5) hold. Then, for any $x \in \mathcal{F}$, we have:
$$\left(\frac{n \phi_x(h)}{\sigma^2(x)}\right)^{1/2} \left(\widetilde{r}(x) - r(x) - B_n(x) - o(h)\right) \xrightarrow{D} \mathcal{N}(0, 1) \quad \text{as } n \to \infty,$$
where $\xrightarrow{D}$ denotes convergence in distribution.

Proof of Theorem 4. We write:
$$\widetilde{r}(x) - r(x) = \frac{1}{\widetilde{g}_2(x)} \left[D_n + A_n \left(E[\widetilde{g}_2(x)] - \widetilde{g}_2(x)\right)\right] + A_n,$$
where
$$A_n = \frac{1}{E[\widetilde{g}_2(x)]\, g_2(x)} \left[E[\widetilde{g}_1(x)]\, g_2(x) - E[\widetilde{g}_2(x)]\, g_1(x)\right]$$
and
$$D_n = \frac{1}{g_2(x)} \left[\left(\widetilde{g}_1(x) - E[\widetilde{g}_1(x)]\right) g_2(x) + \left(E[\widetilde{g}_2(x)] - \widetilde{g}_2(x)\right) g_1(x)\right].$$
Since $A_n = B_n(x) + o(h)$, then:
$$\widetilde{r}(x) - r(x) - B_n(x) - o(h) = \frac{1}{\widetilde{g}_2(x)} \left[D_n + A_n \left(E[\widetilde{g}_2(x)] - \widetilde{g}_2(x)\right)\right]. \qquad (12)$$
Therefore, Theorem 4 is a consequence of the following results (cf. Lemmas 7 and 8).

Lemma 7. Under the hypotheses of Theorem 4, we obtain:
$$\left(\frac{n \phi_x(h)}{g_2^2(x)\, \sigma^2(x)}\right)^{1/2} \left[\left(\widetilde{g}_1(x) - E[\widetilde{g}_1(x)]\right) g_2(x) + \left(E[\widetilde{g}_2(x)] - \widetilde{g}_2(x)\right) g_1(x)\right] \xrightarrow{D} \mathcal{N}(0, 1).$$

Lemma 8. Under the hypotheses of Theorem 4, we obtain:
$$\widetilde{g}_2(x) \to g_2(x), \quad \text{in probability,}$$
and
$$\left(\frac{n \phi_x(h)}{g_2^2(x)\, \sigma^2(x)}\right)^{1/2} A_n \left(E[\widetilde{g}_2(x)] - \widetilde{g}_2(x)\right) \to 0, \quad \text{in probability.}$$

7. A simulation study

In this section, we conduct a simulation study and an application of our method to some real economic data, in order to highlight the pertinence and the superiority of the regression estimator $\widetilde{r}(x)$ over the following conventional classical kernel estimator:
$$\widehat{r}(x) = \frac{\sum_{i=1}^{n} Y_i\, K(h^{-1} d(x, X_i))}{\sum_{i=1}^{n} K(h^{-1} d(x, X_i))}.$$
We also examine the asymptotic normality of the relative regression model on the same simulated data.

7.1. Accuracy of the relative method on simulated data

The main purposes of this section are (1) to illustrate our model by showing its easy implementation via a Monte Carlo study, and (2) to test the sensitivity of our procedure to the presence of outliers. To these aims, we consider the following explanatory curves:
$$X_i(t) = a_i \sin(4(b_i - t)) + b_i + \eta_{i,t} \quad \text{for all } t \in [0, 1) \text{ and } i = 1, 2, \ldots, 300,$$
where $a_i \sim \mathcal{N}(5, 2)$, $b_i \sim \mathcal{N}(0, 0.1)$ and $\eta_{i,t} \sim \mathcal{N}(0, 0.2)$. Moreover, the curves $X_i$ are discretized on the same grid generated from 100 equispaced measurements in $[0, 1]$ (cf. Fig. 1). On the other hand, we define the regression operator $r$ by:
$$r(x) = \int_0^1 \frac{dt}{1 + |x(t)|}.$$
The scalar response $Y$ is then defined by $Y = r(X) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, 1)$. To carry out our goals, we compare the finite-sample behaviours of $\widetilde{r}$ and $\widehat{r}$ in both cases: (1) data without outliers, and (2) data affected by some outliers. In this application example, we use the MAD-Median rule (cf. Wilcox, 2005) for detecting outliers. Recall that this method declares $Y_i$ an outlier if:
$$\frac{|Y_i - M|}{\mathrm{MAD} / 0.6745} > C,$$
where $M$ is the sample median and MAD is the median absolute deviation given by:
$$\mathrm{MAD} = \mathrm{median}(|Y_1 - M|, |Y_2 - M|, \ldots, |Y_n - M|),$$


Figure 1: The functional regressors

and $C$ is taken to be $\sqrt{\chi^2_{0.975}}$: the square root of the 0.975 quantile of a chi-squared distribution with one degree of freedom. Applied to our sample data, the MAD-Median method identifies 23 outliers. For the first comparison, we remove all the detected outlier observations from the original data. Thus, our first example is carried out on homogeneous data, i.e., data that contain no outlier observations. Further, we randomly split these data into two samples: the learning sample, containing the first 200 observations, and the testing sample, containing 50 observations. Then, the performance of both estimators is described by the mean squared prediction error:
$$\mathrm{MSE} = \frac{1}{50} \sum_{(X_i, Y_i) \in \text{testing sample}} \left(Y_i - \widehat{\theta}(X_i)\right)^2$$
and the relative mean squared prediction error:
$$\mathrm{RMSE} = \frac{1}{50\, \mathrm{var}(Y)} \sum_{(X_i, Y_i) \in \text{testing sample}} \left(Y_i - \widehat{\theta}(X_i)\right)^2,$$
where $\widehat{\theta}$ stands for either regression estimator, $\widetilde{r}$ or $\widehat{r}$.
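The MAD-Median rule described above is straightforward to implement. Here is a sketch in Python (the function name is ours; the constant $2.2414 \approx \sqrt{\chi^2_{0.975}}$ follows Wilcox, 2005):

```python
import numpy as np

def mad_median_outliers(y, c=2.2414):
    """Flag Y_i as an outlier when |Y_i - M| / (MAD / 0.6745) > c,
    where M is the sample median, MAD the median absolute deviation,
    and c is the square root of the 0.975 quantile of a chi-squared
    distribution with one degree of freedom."""
    y = np.asarray(y, dtype=float)
    m = np.median(y)
    mad = np.median(np.abs(y - m))
    return np.abs(y - m) / (mad / 0.6745) > c

# Example: one gross outlier among twenty regular values.
y = np.append(np.arange(1.0, 21.0), 1000.0)
print(np.flatnonzero(mad_median_outliers(y)))   # -> [20]
```

The factor $0.6745$ rescales the MAD so that it is consistent for the standard deviation under normality, which is what makes the chi-squared cut-off meaningful.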

Method   MSE    RMSE
CKE      2.05   0.0263
REE      2.09   0.0268

Table 1: Values of the MSE and RMSE for the simulated data

As in all smoothing methods, the choice of the smoothing parameter plays a crucial role in the computational issues. In this illustration, we use the cross-validation procedure described in Rachdi and Vieu (2007), for which the bandwidth $h$ is chosen via the following rule:
$$h_{opt} = \arg\min_h CV(h) = \arg\min_h \sum_{j=1}^{n} \left(Y_j - \widehat{\theta}^{(-j)}(X_j)\right)^2,$$
where $\widehat{\theta}^{(-j)}$ is the leave-one-curve-out estimator of $\widetilde{r}$ or $\widehat{r}$. We point out that the asymptotic optimality of this procedure, in the classical case, has been studied in Rachdi and Vieu (2005, 2007) and Benhenni et al. (2007), while for the relative case there is no data-driven rule which permits selecting the bandwidth parameter automatically and optimally, even in the non-functional case; this subject is an important prospect of this work. Now, regarding the shape of the curves $X_i$ (cf. Fig. 1), it is clear that the PCA-type semi-metric (cf. Ferraty and Vieu (2006), Benhenni et al. (2007)) is well adapted to this data set. It should also be noticed that the best results in terms of prediction are obtained for $q = 4$ (the number of components in the PCA-type semi-metric). Finally, we point out that both estimators ($\widetilde{r}$ and $\widehat{r}$) use the quadratic kernel. The obtained results are shown in Fig. 2. It is clear that there is no meaningful difference between the two estimation methods: the Classical Kernel Estimator (CKE) and the Relative Error Estimator (REE) (cf. Table 1). Next, we compare the performance of both models in the presence of outliers. To this aim, we introduce artificial outliers by multiplying some values of $Y$ in the learning sample by 10. Both estimators are obtained with the same selection method for the smoothing parameter, the same semi-metric and the same kernel as before. Finally, we report the obtained results in Table 2 (resp. Table 3), where we show the values of the MSE (resp. RMSE) according to the number of introduced artificial outliers.
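The leave-one-curve-out rule above can be sketched as follows (Python; the vectorised implementation, the quadratic kernel and the grid of candidate bandwidths are our own assumptions):

```python
import numpy as np

def loo_cv_bandwidth(dists, Y, h_grid):
    """Choose h minimising CV(h) = sum_j (Y_j - pred_{-j}(X_j))^2 for the
    relative-error estimator, from a precomputed semi-metric matrix
    dists[i, j] = d(X_i, X_j)."""
    Y = np.asarray(Y, dtype=float)
    best_h, best_cv = None, np.inf
    for h in h_grid:
        u = dists / h
        K = np.where(u < 1.0, 0.75 * (1.0 - u ** 2), 0.0)  # quadratic kernel
        np.fill_diagonal(K, 0.0)        # leave the j-th observation out
        den = K @ (1.0 / Y ** 2)
        if np.any(den <= 0):            # h too small: some point has no neighbour
            continue
        pred = (K @ (1.0 / Y)) / den    # leave-one-out predictions
        cv = np.sum((Y - pred) ** 2)
        if cv < best_cv:
            best_h, best_cv = h, cv
    return best_h

# Toy usage with scalar covariates standing in for curves.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 80)
Y = 1.0 + x + 0.05 * rng.normal(size=80)          # positive responses
dists = np.abs(x[:, None] - x[None, :])
print(loo_cv_bandwidth(dists, Y, [0.02, 0.05, 0.1, 0.3]))
```

Candidate bandwidths that leave some observation with no neighbour are discarded, which mimics the practical constraint that every curve must receive positive kernel mass.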

Figure 2: The prediction results (predicted responses vs. observed responses), obtained (on the left) by LCV with respect to the MSE and (on the right) by LCV with respect to the RMSE

Number of outliers   50      40       30       20      10      0
CKE                  3412    2202.0   1178.0   557.9   138.8   3.047
REE                  5.035   4.62     4.086    3.945   3.549   3.56

Table 2: Values of the MSE according to the number of introduced artificial outliers (first line)

Recall that, in the first case, both estimators are equivalent. However, when there are outliers (cf. Tables 2 and 3), the relative regression estimator performs better than the classical kernel method; indeed, the classical kernel

17

Number of outliers   50        40        30        20       10        0
CKE                  21.080    13.96     7.379     3.379    0.8862    0.0186
REE                  0.03104   0.02843   0.02559   0.0238   0.02265   0.02177

Table 3: Values of the RMSE according to the number of introduced artificial outliers (first line)

method is very sensitive to the presence of outliers. The MSE and RMSE values for the classical kernel method increase substantially with the number of outliers, whereas these errors remain very low for the relative error estimator. The third illustration concerns the asymptotic normality of $\widetilde{r}$. Precisely, our aim is to test the normal distribution of the quantity:
$$\left(\frac{n \phi_x(h)}{\sigma^2(x)}\right)^{1/2} \left(\widetilde{r}(x) - r(x) - B_n(x) - o(h)\right).$$
Clearly, the estimation of the normalized deviation $\left(n \phi_x(h) / \sigma^2(x)\right)^{1/2}$ and of the bias term $B_n(x)$ are the main challenges in the practical implementation of this asymptotic property. For both terms, we compute the quantities $g_3(x)$ and $g_4(x)$ in the same way as $g_1(x)$ and $g_2(x)$, and we estimate $\beta_1$ and $\beta_2$ empirically by:
$$\widehat{\beta}_1 = \frac{1}{n \phi_x(h)} \sum_{i=1}^{n} K_i \quad \text{and} \quad \widehat{\beta}_2 = \frac{1}{n \phi_x(h)} \sum_{i=1}^{n} K_i^2.$$
These estimates are justified by the fact that:
$$\frac{1}{\phi_x(h)}\, E[K_1^j] \to \beta_j \quad \text{for } j = 1, 2.$$
Thus, the practical estimator of the normalized deviation is:
$$\left(\frac{\left(\sum_{i=1}^{n} K_i\right)^2 \widetilde{g}_2^{\,2}(x)}{\left(\sum_{i=1}^{n} K_i^2\right) \left(\widetilde{g}_2(x) - 2 \widetilde{r}(x)\, \widetilde{g}_3(x) + \widetilde{r}^2(x)\, \widetilde{g}_4(x)\right)}\right)^{1/2}.$$
We point out that the function $\phi_x(\cdot)$ does not intervene in the computation of the normalized deviation. Concerning the bias term, we have to estimate

the parameters $\Psi'_1(0)$, $r(x)$, $\Psi'_2(0)$, $\beta_0$, $\beta_1$ and $g_2(x)$ appearing in formula (11). For this, we estimate the quantity $\beta_0$ by:
$$\widehat{\beta}_0 = \frac{1}{n \phi_x(h)} \sum_{i=1}^{n} d(x, X_i)\, K_i,$$
and the real functions $\Psi_i$, for $i = 1, 2$, can be viewed as real regression functions with response variables $(g_i(X) - g_i(x))$ and regressors $d(X, x)$. Once again, $\phi_x(\cdot)$ does not intervene in the computation of the bias term. For the sake of brevity, we neglect this bias term and use the same choices of the kernel $K$, the bandwidth parameter $h$ and the semi-metric $d$ as in the previous illustrations. In order to conduct a Monte Carlo study of the asymptotic normality, we fix one curve, say $X_0$, from the previous data. Then, we draw $m$ independent $n$-samples of the same data and we compute, for each sample, the quantity:
$$\left(\frac{\left(\sum_{i=1}^{n} K_i\right)^2 \widetilde{g}_2^{\,2}(X_0)}{\left(\sum_{i=1}^{n} K_i^2\right) \left(\widetilde{g}_2(X_0) - 2 \widetilde{r}(X_0)\, \widetilde{g}_3(X_0) + \widetilde{r}^2(X_0)\, \widetilde{g}_4(X_0)\right)}\right)^{1/2} \left(\widetilde{r}(X_0) - r(X_0)\right).$$
Finally, the Kolmogorov-Smirnov test, for $n = m = 100$, gives a p-value of 0.84. This permits us to conclude that our asymptotic law behaves well with respect to the standard normal law. This conclusion is confirmed by comparing its QQ-plot against a standard normal distribution (cf. Fig. 3).

Figure 3: The QQ-plot of the sample quantiles vs. the theoretical quantiles
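The plug-in normalised deviation used above can be computed directly from the kernel weights. The following Python sketch (the function name and the empirical normalisation of the $\widetilde{g}_\gamma$ by $\sum_i K_i$ are our assumptions; the bias term is neglected, as in the text) returns the estimator and the normalising factor, from which a naive asymptotic 95% confidence interval follows:

```python
import numpy as np

def normalized_deviation(K, Y):
    """Plug-in estimate of (n phi_x(h) / sigma^2(x))^{1/2}; note that
    phi_x cancels out, so only the kernel weights K_i and the
    responses Y_i are needed."""
    K = np.asarray(K, dtype=float)
    Y = np.asarray(Y, dtype=float)
    S1, S2 = K.sum(), (K ** 2).sum()
    g = {m: (K * Y ** (-m)).sum() / S1 for m in (1, 2, 3, 4)}  # g~_1,...,g~_4
    r = g[1] / g[2]                                            # estimator r~(x)
    s2 = S1 ** 2 * g[2] ** 2 / (S2 * (g[2] - 2 * r * g[3] + r ** 2 * g[4]))
    return r, np.sqrt(s2)

# Naive 95% confidence interval for r(x), neglecting the bias term.
K = np.array([0.7, 0.6, 0.5, 0.3, 0.2])
Y = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
r, s = normalized_deviation(K, Y)
print(r - 1.96 / s, r + 1.96 / s)
```

The denominator $\widetilde{g}_2 - 2\widetilde{r}\widetilde{g}_3 + \widetilde{r}^2\widetilde{g}_4$ is a kernel-weighted average of $Y_i^{-2}(1 - \widetilde{r}/Y_i)^2$, so the variance estimate is automatically nonnegative.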

Figure 4: Some curves of the monthly energy prices

7.2. Application on a real dataset

We focus now on comparing the performance of the relative error estimator (REE) with that of the classical kernel estimator (CKE) on an economic dataset. The considered dataset concerns inflation data in 34 countries. In this example, we aim to analyze the relationship between energy and food prices. Specifically, we want to predict the annual mean of the food consumer prices given the annual curve of the energy prices. For this purpose, we consider a dataset(3) which comes from the Organization for Economic Cooperation and Development (OECD). We precise that we use the dataset associated with the 2010-index for four separate years (1990, 1998, 2006, 2012) in the 34 countries. So, the functional predictor $X$ is the monthly energy prices in a given country during one year among the above four years. The functional predictors $X_i$, for $i = 1, \ldots, 136$, are plotted in Fig. 4. As with many economic datasets, the response variable $Y$, which corresponds to the annual mean of the food consumer prices in the same country and the same year, is affected by the presence of some outliers (cf. Fig. 5). Notice that the routine ODM in the R package OutlierDM detected 36 outliers in the response variable $Y$. In order to give a fair comparison between both models, we use the same rule to select the parameters involved in the definition of the estimators. More precisely, we use, for both models, the local cross-validation method on the number of nearest neighbors to select the smoothing parameter $h$. On the

(3) Available on the website http://stats.oecd.org/

Figure 5: The response variable (on the left) and the outlier detection (on the right): the dashed line corresponds to the third quantile and the continuous line corresponds to the upper bound

other hand, the choice of the semi-metric is closely linked to the shape of the curves Xi. For this example, we use the semi-metric based on the first q eigenfunctions of the empirical covariance operator associated with the q greatest eigenvalues (cf. Ferraty and Vieu (2006) for more discussion). We point out that we carried out our study for several values of the parameter q and observed slightly better performance for q = 5. Elsewhere, the kernel is chosen to be a quadratic function, as in the previous examples. In this comparison study, we randomly split our data into two subsets 100 times. Each time, we drew a learning sample of 100 observations and a test sample of 36 observations. The estimators are computed from the learning sample, and the 36 remaining curves are used as the test sample. Finally, we check the performance of both models by computing and comparing the MSE and RMSE errors as defined in the previous section. We have plotted these errors in Fig. 6, where it appears clearly that the RRE performs substantially better than the KRE. Moreover, the superiority of this model is even more pronounced when the RMSE error is considered.
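The comparison scheme described above can be sketched as follows. This is a minimal illustration on synthetic curves, not the OECD data: the discretized L2 norm stands in for the semi-metric, and the number of neighbors k, the grid, and the data-generating model (including the artificial outliers) are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_semimetric(A, B):
    # pairwise discretized-L2 distances between curves (rows)
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).mean(axis=2))

def quad_kernel(u):
    return np.where((u >= 0) & (u <= 1), 1 - u ** 2, 0.0)

def predict(Xl, Yl, Xt, k=20, relative=True):
    # k-nearest-neighbour bandwidth: h is the distance to the k-th neighbour
    D = l2_semimetric(Xt, Xl)
    h = np.sort(D, axis=1)[:, k - 1:k]          # local bandwidth per test curve
    W = quad_kernel(D / h)
    if relative:                                 # RRE: E[Y^-1|X] / E[Y^-2|X]
        return (W @ (1 / Yl)) / (W @ (1 / Yl ** 2))
    return (W @ Yl) / W.sum(axis=1)              # classical kernel regression

# toy functional sample: curves on a grid, positive response with outliers
grid = np.linspace(0, 1, 50)
n = 136
a = rng.uniform(1, 3, n)
X = a[:, None] * np.sin(2 * np.pi * grid)[None, :] + rng.normal(0, .1, (n, 50))
Y = np.exp(a / 2) * rng.lognormal(0, .2, n)
Y[rng.choice(n, 8, replace=False)] *= 10        # a few artificial outliers

idx = rng.permutation(n); learn, test = idx[:100], idx[100:]
y_rre = predict(X[learn], Y[learn], X[test], relative=True)
y_kre = predict(X[learn], Y[learn], X[test], relative=False)

mse  = lambda yh: np.mean((Y[test] - yh) ** 2)
rmse = lambda yh: np.mean(((Y[test] - yh) / Y[test]) ** 2)   # relative MSE
print("MSE :", mse(y_rre), mse(y_kre))
print("RMSE:", rmse(y_rre), rmse(y_kre))
```

Repeating the split 100 times and box-plotting the two error criteria would mimic the comparison of Fig. 6; note that RMSE here denotes the mean squared relative error used in the paper, not a root mean squared error.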


 

Figure 6: Box plots of the RMSE (left) and of the MSE (right): in each graph, the left column corresponds to the relative technique of estimation and the right column to the classical kernel method
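The projection-based semi-metric used in this application can be sketched as follows (a minimal version for curves discretized on a common grid; the eigendecomposition of the discretized empirical covariance matrix plays the role of the functional PCA, and the toy Brownian-like curves are an assumption):

```python
import numpy as np

def pca_semimetric(Xlearn, q=5):
    # eigendecomposition of the (discretized) empirical covariance operator
    Xc = Xlearn - Xlearn.mean(axis=0)
    _, vecs = np.linalg.eigh(Xc.T @ Xc / len(Xlearn))
    V = vecs[:, ::-1][:, :q]                 # q leading eigenfunctions
    def d(A, B):
        # distance between the first q projection scores of two curve samples
        SA, SB = A @ V, B @ V
        return np.sqrt(((SA[:, None, :] - SB[None, :, :]) ** 2).sum(axis=2))
    return d

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 50)).cumsum(axis=1)   # rough Brownian-like curves
d = pca_semimetric(X, q=5)
D = d(X[:10], X)
print(D.shape)  # (10, 100)
```

The returned function `d` can be plugged into a kernel predictor in place of the plain L2 semi-metric; q = 5 corresponds to the value retained in the study above.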

8. Appendix

Proof of Lemma 1. Define, for $\gamma = 1, 2$:
$$\Gamma_{i,\gamma} = \frac{1}{\mathbb{E}[K_1]}\left(K_i Y_i^{-\gamma} - \mathbb{E}\left[K_i Y_i^{-\gamma}\right]\right).$$
Then:
$$\widetilde{g}_\gamma(x) - \mathbb{E}[\widetilde{g}_\gamma(x)] = \sum_{i=1}^{n} \Gamma_{i,\gamma}.$$
The proof of this lemma is based on the exponential inequality given in Corollary A.8.ii in Ferraty and Vieu (2006), which requires the evaluation of the quantity $\mathbb{E}\left[|\Gamma_{i,\gamma}|^m\right]$. Firstly, we write, for all $j \le m$:
$$\mathbb{E}\left[Y_1^{-j\gamma} K_1^j\right] = \mathbb{E}\left[K_1^j\, \mathbb{E}\left[Y_1^{-j\gamma} \mid X_1\right]\right] = C\,\mathbb{E}\left[K_1^j\right] \le C\,\phi_x(h),$$
which implies that:
$$\frac{1}{\mathbb{E}^j[K_1]}\, \mathbb{E}\left[Y_1^{-j\gamma} K_1^j\right] = O\left(\phi_x(h)^{-j+1}\right), \qquad (13)$$
and
$$\frac{1}{\mathbb{E}[K_1]}\, \mathbb{E}\left[Y_1^{-\gamma} K_1\right] \le C.$$
Next, by Newton's binomial expansion, we obtain:
$$\mathbb{E}\left[|\Gamma_{i,\gamma}|^m\right] \le C \sum_{j=0}^{m} \frac{1}{(\mathbb{E}[K_1])^j}\, \mathbb{E}\left[\left|Y_1^{-j\gamma} K_1^j\right|\right] \le C \max_{j=0,\ldots,m} \phi_x^{-j+1}(h) \le C\, \phi_x^{-m+1}(h).$$
It follows that:
$$\mathbb{E}\left[|\Gamma_{i,\gamma}|^m\right] = O\left(\phi_x^{-m+1}(h)\right). \qquad (14)$$
Thus, we apply the mentioned exponential inequality with $a = \phi_x^{-1/2}(h)$ to get, for all $\eta > 0$ and for $\gamma = 1, 2$:
$$\mathbb{P}\left(\left|\widetilde{g}_\gamma(x) - \mathbb{E}[\widetilde{g}_\gamma(x)]\right| > \eta \sqrt{\frac{\log n}{n\,\phi_x(h)}}\right) \le C' n^{-C\eta^2}.$$
Finally, an appropriate choice of $\eta$ permits us to deduce that:
$$\sum_{n} \mathbb{P}\left(\left|\widetilde{g}_\gamma(x) - \mathbb{E}[\widetilde{g}_\gamma(x)]\right| > \eta \sqrt{\frac{\log n}{n\,\phi_x(h)}}\right) < \infty.$$

Proof of Lemma 2. Since $(X_1, Y_1), \ldots, (X_n, Y_n)$ are identically distributed, we have:
$$\left|\mathbb{E}[\widetilde{g}_\gamma(x)] - g_\gamma(x)\right| \le \frac{1}{\mathbb{E}[K_1]}\, \mathbb{E}\left[K_1\, \mathbf{1}_{B(x,h)}(X_1)\, \left|g_\gamma(X_1) - g_\gamma(x)\right|\right], \qquad (15)$$
since $g_\gamma(X_1) = \mathbb{E}\left[Y_1^{-\gamma} \mid X = X_1\right]$, where $\mathbf{1}_A$ denotes the indicator function of the set $A$. Then, by the Hölder hypothesis (H2), we get:
$$\mathbf{1}_{B(x,h)}(X_1)\, \left|g_\gamma(X_1) - g_\gamma(x)\right| \le C h^{k_\gamma}.$$
Thus,
$$\left|\mathbb{E}[\widetilde{g}_\gamma(x)] - g_\gamma(x)\right| \le C h^{k_\gamma}.$$
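The almost-complete rate in Lemma 1 can be probed numerically. The following toy sketch uses a real-valued covariate, for which the small-ball probability $\phi_x(h)$ is of order $h$, and an ad hoc normalized kernel average standing in for $\widetilde{g}_\gamma$; the data-generating model, bandwidth schedule, and kernel are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

def g_tilde(X, Y, x0, h, gamma=1):
    # normalized kernel average of Y^-gamma around x0 (quadratic kernel)
    u = np.abs(X - x0) / h
    w = np.where(u <= 1, 1 - u ** 2, 0.0)
    return np.mean(w * Y ** (-gamma)) / np.mean(w)

sds, rates = [], []
for n in (500, 5000, 50000):
    h = n ** -0.2
    reps = [g_tilde(rng.uniform(0, 1, n), 1 + rng.uniform(0, 1, n), 0.5, h)
            for _ in range(200)]
    sds.append(np.std(reps))                       # size of the fluctuation
    rates.append(np.sqrt(np.log(n) / (n * h)))     # theoretical upper rate
print([round(s / r, 3) for s, r in zip(sds, rates)])  # bounded ratios
```

The observed fluctuations shrink with $n$ while staying below the theoretical rate $\sqrt{\log n / (n\,\phi_x(h))}$, consistent with the lemma's conclusion (up to constants).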

Proof of Corollary 1. It is easy to remark that:
$$|\widetilde{g}_2(x)| \le \frac{g_2(x)}{2} \quad \text{implies that} \quad |g_2(x) - \widetilde{g}_2(x)| \ge \frac{g_2(x)}{2}.$$
So,
$$\mathbb{P}\left(|\widetilde{g}_2(x)| \le \frac{g_2(x)}{2}\right) \le \mathbb{P}\left(|g_2(x) - \widetilde{g}_2(x)| > \frac{g_2(x)}{2}\right).$$
Consequently:
$$\sum_{n=1}^{\infty} \mathbb{P}\left(|\widetilde{g}_2(x)| < \frac{g_2(x)}{2}\right) < \infty.$$

Proof of Lemma 3. Let $x_1, \ldots, x_N$ be a finite set of points in $\mathcal{F}$ such that:
$$S_{\mathcal{F}} \subset \bigcup_{k=1}^{N} B(x_k, \varepsilon) \quad \text{with} \quad \varepsilon = \frac{\log n}{n}.$$
For all $x \in S_{\mathcal{F}}$, we set $k(x) = \arg\min_{k \in \{1, \ldots, N(S_{\mathcal{F}})\}} d(x, x_k)$ and $K_i(x) = K(h^{-1} d(x, X_i))$. Then, from the decomposition:
$$\sup_{x \in S_{\mathcal{F}}} \left|\widetilde{g}_\gamma(x) - \mathbb{E}[\widetilde{g}_\gamma(x)]\right| \le \underbrace{\sup_{x \in S_{\mathcal{F}}} \left|\widetilde{g}_\gamma(x) - \widetilde{g}_\gamma(x_{k(x)})\right|}_{F_1} + \underbrace{\sup_{x \in S_{\mathcal{F}}} \left|\widetilde{g}_\gamma(x_{k(x)}) - \mathbb{E}[\widetilde{g}_\gamma(x_{k(x)})]\right|}_{F_2} + \underbrace{\sup_{x \in S_{\mathcal{F}}} \left|\mathbb{E}[\widetilde{g}_\gamma(x_{k(x)})] - \mathbb{E}[\widetilde{g}_\gamma(x)]\right|}_{F_3},$$
we obtain the expected result once we show that $F_i \to 0$ for $i = 1, 2, 3$.

• For the term $F_1$, a direct consequence of assumption (H3) is that:
$$C\,\phi_x(h) \le \mathbb{E}[K_1(x)] \le C'\,\phi_x(h).$$
Therefore,
$$F_1 \le \sup_{x \in S_{\mathcal{F}}} \left|\frac{1}{\mathbb{E}[K_1(x)]}\,\frac{1}{n}\sum_{i=1}^{n} K_i(x) Y_i^{-\gamma} - \frac{1}{\mathbb{E}[K_1(x_{k(x)})]}\,\frac{1}{n}\sum_{i=1}^{n} K_i(x_{k(x)}) Y_i^{-\gamma}\right|$$
$$\le \frac{C}{\phi_x(h)} \sup_{x \in S_{\mathcal{F}}} \frac{1}{n}\sum_{i=1}^{n} \left|K_i(x) - K_i(x_{k(x)})\right| Y_i^{-\gamma}\, \mathbf{1}_{B(x,h) \cup B(x_{k(x)},h)}(X_i)$$
$$\le C \sup_{x \in S_{\mathcal{F}}} \left(F_{11} + F_{12} + F_{13}\right),$$
with
$$F_{11} = \frac{1}{n\,\phi_x(h)}\sum_{i=1}^{n} \left|K_i(x) - K_i(x_{k(x)})\right| Y_i^{-\gamma}\, \mathbf{1}_{B(x,h) \cap B(x_{k(x)},h)}(X_i),$$
$$F_{12} = \frac{1}{n\,\phi_x(h)}\sum_{i=1}^{n} K_i(x)\, Y_i^{-\gamma}\, \mathbf{1}_{B(x,h) \cap \overline{B}(x_{k(x)},h)}(X_i),$$
$$F_{13} = \frac{1}{n\,\phi_x(h)}\sum_{i=1}^{n} K_i(x_{k(x)})\, Y_i^{-\gamma}\, \mathbf{1}_{\overline{B}(x,h) \cap B(x_{k(x)},h)}(X_i),$$
where $\overline{A}$ denotes the complement of the set $A$. Concerning the term $F_{11}$, we use the fact that the kernel $K$ is Lipschitz on $(0, 1)$ and write:
$$F_{11} \le \sup_{x \in S_{\mathcal{F}}} \frac{C}{n}\sum_{i=1}^{n} Z_{i,\gamma} \quad \text{with} \quad Z_{i,\gamma} = \frac{\varepsilon}{h\,\phi_x(h)}\, \mathbf{1}_{B(x,h) \cap B(x_{k(x)},h)}(X_i)\, Y_i^{-\gamma}.$$
While for the two last terms, $F_{12}$ and $F_{13}$, since the kernel $K$ is bounded, we write:
$$F_{12} \le \frac{C}{n}\sum_{i=1}^{n} W_{i,\gamma} \quad \text{with} \quad W_{i,\gamma} = \frac{1}{\phi_x(h)}\, Y_i^{-\gamma}\, \mathbf{1}_{B(x,h) \cap \overline{B}(x_{k(x)},h)}(X_i)$$
and
$$F_{13} \le \frac{C}{n}\sum_{i=1}^{n} V_{i,\gamma} \quad \text{with} \quad V_{i,\gamma} = \frac{1}{\phi_x(h)}\, Y_i^{-\gamma}\, \mathbf{1}_{\overline{B}(x,h) \cap B(x_{k(x)},h)}(X_i).$$
Thus, it suffices to use the same arguments as in the proof of Lemma 1, with $\Gamma_{i,\gamma}$ replaced by $Z_{i,\gamma}$, $W_{i,\gamma}$ and $V_{i,\gamma}$. In this case, we apply the inequality of Corollary A.8.ii in Ferraty and Vieu (2006) with $a^2 = \varepsilon / (h\,\phi_x(h))$ to get:
$$F_{11} = O_{a.co.}\left(\sqrt{\frac{\varepsilon \log n}{n\,h\,\phi_x(h)}}\right),$$
$$F_{12} = O\left(\frac{\varepsilon}{\phi_x(h)}\right) + O_{a.co.}\left(\sqrt{\frac{\varepsilon \log n}{n\,\phi_x(h)^2}}\right)$$
and
$$F_{13} = O\left(\frac{\varepsilon}{\phi_x(h)}\right) + O_{a.co.}\left(\sqrt{\frac{\varepsilon \log n}{n\,\phi_x(h)^2}}\right).$$
Then, the combination of conditions (U4a) and (U4b) allows us to simplify the convergence rate and to get:
$$F_1 = O_{a.co.}\left(\sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right).$$

• Similarly, one can state the same rate of convergence for $F_3$:
$$F_3 = O_{a.co.}\left(\sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right).$$

• Now, to evaluate the term $F_2$, we write, for all $\eta > 0$:
$$\mathbb{P}\left(F_2 > \eta \sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right) = \mathbb{P}\left(\max_{k \in \{1, \ldots, N\}} \left|\widetilde{g}_\gamma(x_k) - \mathbb{E}[\widetilde{g}_\gamma(x_k)]\right| > \eta \sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right)$$
$$\le N \max_{k \in \{1, \ldots, N\}} \mathbb{P}\left(\left|\widetilde{g}_\gamma(x_k) - \mathbb{E}[\widetilde{g}_\gamma(x_k)]\right| > \eta \sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right).$$
Once again, we apply the exponential inequality given by Corollary A.8.ii in Ferraty and Vieu (2006) to:
$$\Delta_{i,\gamma} = \frac{1}{\mathbb{E}[K_1(x_k)]}\left(K_i(x_k) Y_i^{-\gamma} - \mathbb{E}\left[K_i(x_k) Y_i^{-\gamma}\right]\right).$$
Since $\mathbb{E}\left[|\Delta_{i,\gamma}|^m\right] = O\left(\phi_x(h)^{-m+1}\right)$, we can take $a^2 = 1/\phi_x(h)$. Hence, for all $\eta > 0$:
$$\mathbb{P}\left(\left|\widetilde{g}_\gamma(x_k) - \mathbb{E}[\widetilde{g}_\gamma(x_k)]\right| > \eta \sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right) = \mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n} \Delta_{i,\gamma}\right| > \eta \sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right) \le 2 \exp\left(-C\eta^2\, \psi_{S_{\mathcal{F}}}(\varepsilon)\right).$$
By using the fact that $\psi_{S_{\mathcal{F}}}(\varepsilon) = \log N$ and by choosing $\eta$ such that $C\eta^2 = \beta$, we obtain:
$$N \max_{k \in \{1, \ldots, N\}} \mathbb{P}\left(\left|\widetilde{g}_\gamma(x_k) - \mathbb{E}[\widetilde{g}_\gamma(x_k)]\right| > \eta \sqrt{\frac{\psi_{S_{\mathcal{F}}}(\varepsilon)}{n\,\phi_x(h)}}\right) \le C N^{1-\beta}, \qquad (16)$$
which completes the proof.

Proof of Lemma 4. The proof of this lemma is very similar to that of Lemma 2, where we have shown that:
$$\left|\mathbb{E}[\widetilde{g}_\gamma(x)] - g_\gamma(x)\right| \le \frac{1}{\mathbb{E}[K_1(x)]}\, \mathbb{E}\left[K_1(x)\, \left|g_\gamma(X_1) - g_\gamma(x)\right|\right].$$
Consequently, a combination of hypotheses (U1) and (U2) gives:
$$\forall x \in S_{\mathcal{F}}, \quad \left|\mathbb{E}[\widetilde{g}_\gamma(x)] - g_\gamma(x)\right| \le C\, \frac{1}{\mathbb{E}[K_1(x)]}\, \mathbb{E}\left[K_1(x)\, \mathbf{1}_{B(x,h)}(X_1)\, d^{k_\gamma}(X_1, x)\right] \le C h^{k_\gamma}.$$
This last inequality yields the proof, since $C$ does not depend on $x$.

Proof of Corollary 2. It is easy to remark that:
$$\inf_{x \in S_{\mathcal{F}}} |\widetilde{g}_2(x)| \le \frac{g_2(x)}{2} \quad \text{implies that there exists } x \in S_{\mathcal{F}} \text{ such that } g_2(x) - \widetilde{g}_2(x) \ge \frac{g_2(x)}{2},$$
which implies that
$$\sup_{x \in S_{\mathcal{F}}} |g_2(x) - \widetilde{g}_2(x)| \ge \frac{g_2(x)}{2}.$$
We deduce, from Lemma 3, that:
$$\mathbb{P}\left(\inf_{x \in S_{\mathcal{F}}} |\widetilde{g}_2(x)| \le \frac{g_2(x)}{2}\right) \le \mathbb{P}\left(\sup_{x \in S_{\mathcal{F}}} |g_2(x) - \widetilde{g}_2(x)| > \frac{g_2(x)}{2}\right).$$
Consequently:
$$\sum_{n=1}^{\infty} \mathbb{P}\left(\inf_{x \in S_{\mathcal{F}}} |\widetilde{g}_2(x)| < \frac{g_2(x)}{2}\right) < \infty.$$

Proof of Lemma 5. By the stationarity property, we write, for $\gamma = 1, 2$:
$$\mathbb{E}[\widetilde{g}_\gamma(x)] = \frac{1}{\mathbb{E}[K_1]}\, \mathbb{E}\left[K_1\, \mathbb{E}\left[Y_1^{-\gamma} \mid X_1\right]\right].$$
Now, by the same arguments as those used by Ferraty et al. (2007) for the regression operator, we show that:
$$\mathbb{E}\left[K_1\, \mathbb{E}\left[Y_1^{-\gamma} \mid X_1\right]\right] = g_\gamma(x)\,\mathbb{E}[K_1] + \mathbb{E}\left[K_1\, \mathbb{E}\left[g_\gamma(X_1) - g_\gamma(x) \mid d(X_1, x)\right]\right] = g_\gamma(x)\,\mathbb{E}[K_1] + \mathbb{E}\left[K_1\, \Psi_\gamma(d(X_1, x))\right].$$
Therefore, according to the definition of $\Psi_\gamma$, we have:
$$\mathbb{E}[\widetilde{g}_\gamma(x)] = g_\gamma(x) + \frac{1}{\mathbb{E}[K_1]}\, \mathbb{E}\left[K_1\, \Psi_\gamma(d(X_1, x))\right].$$
Since $\Psi_\gamma(0) = 0$ for $\gamma \in \{1, 2\}$, we obtain:
$$\mathbb{E}\left[K_1\, \Psi_\gamma(d(X_1, x))\right] = \Psi'_\gamma(0)\, \mathbb{E}\left[d(X_1, x)\, K_1\right] + o\left(\mathbb{E}\left[d(X_1, x)\, K_1\right]\right).$$
By some simple algebra, we get, under hypothesis (M1), that:
$$\mathbb{E}\left[K_1\, d(X_1, x)\right] = h\,\phi_x(h)\left(K(1) - \int_0^1 (u K(u))'\, \chi_x(u)\, du\right) + o(h\,\phi_x(h)), \qquad (17)$$
and
$$\mathbb{E}[K_1] = \phi_x(h)\left(K(1) - \int_0^1 K'(u)\, \chi_x(u)\, du\right) + o(\phi_x(h)). \qquad (18)$$
It follows that:
$$\mathbb{E}[\widetilde{g}_\gamma(x)] = g_\gamma(x) + h\,\Psi'_\gamma(0)\, \frac{K(1) - \int_0^1 (u K(u))'\, \chi_x(u)\, du}{K(1) - \int_0^1 K'(u)\, \chi_x(u)\, du} + o(h).$$

Proof of Lemma 6. Similarly to the previous lemma's proof, we have, for $\gamma \in \{1, 2\}$:
$$\mathrm{var}[\widetilde{g}_\gamma(x)] = \frac{1}{(n\,\mathbb{E}[K_1])^2} \sum_{i=1}^{n} \mathrm{var}\left[K_i Y_i^{-\gamma}\right] = \frac{1}{n\,(\mathbb{E}[K_1])^2}\, \mathrm{var}\left[K_1 Y_1^{-\gamma}\right].$$
By conditioning on the random variable $X$ and by using hypotheses (M1) and (M4), we get:
$$\mathbb{E}\left[K_1^2 Y_1^{-2\gamma}\right] = \phi_x(h)\left(\mathbb{E}\left[Y^{-2\gamma} \mid X = x\right]\left(K^2(1) - \int_0^1 (K^2(s))'\, \chi_x(s)\, ds\right) + o(1)\right)$$
and
$$\mathbb{E}\left[K_1 Y_1^{-\gamma}\right] = O(\phi_x(h)). \qquad (19)$$
Thus:
$$\mathrm{var}\left[K_1 Y_1^{-\gamma}\right] = \phi_x(h)\, \mathbb{E}\left[Y^{-2\gamma} \mid X = x\right]\left(K^2(1) - \int_0^1 (K^2(s))'\, \chi_x(s)\, ds\right) + O\left(\phi_x^2(h)\right). \qquad (20)$$
We can then write:
$$\mathrm{var}[\widetilde{g}_\gamma(x)] = \frac{\mathbb{E}\left[Y^{-2\gamma} \mid X = x\right]\left(K^2(1) - \int_0^1 (K^2(s))'\, \chi_x(s)\, ds\right)}{n\,\phi_x(h)\left(K(1) - \int_0^1 K'(s)\, \chi_x(s)\, ds\right)^2} + o\left(\frac{1}{n\,\phi_x(h)}\right).$$
Concerning the covariance term, we follow the same steps as for the variance one, i.e.,
$$\mathrm{cov}(\widetilde{g}_1(x), \widetilde{g}_2(x)) = \frac{1}{n\,(\mathbb{E}[K_1])^2}\, \mathrm{cov}\left(K_1 Y_1^{-2},\, K_1 Y_1^{-1}\right),$$
where
$$\mathrm{cov}\left(K_1 Y_1^{-2},\, K_1 Y_1^{-1}\right) = \mathbb{E}\left[K_1^2 Y_1^{-3}\right] - \mathbb{E}\left[K_1 Y_1^{-2}\right]\, \mathbb{E}\left[K_1 Y_1^{-1}\right].$$
Since the first term is the leading one in this quantity, then:
$$\mathrm{cov}(\widetilde{g}_1(x), \widetilde{g}_2(x)) = \frac{\mathbb{E}\left[Y^{-3} \mid X = x\right]\left(K^2(1) - \int_0^1 (K^2(s))'\, \chi_x(s)\, ds\right)}{n\,\phi_x(h)\left(K(1) - \int_0^1 K'(s)\, \chi_x(s)\, ds\right)^2} + o\left(\frac{1}{n\,\phi_x(h)}\right).$$

Proof of Lemma 7. Let:
$$S_n = \sum_{i=1}^{n} \left(L_i(x) - \mathbb{E}[L_i(x)]\right),$$
where
$$L_i(x) := \frac{\sqrt{n\,\phi_x(h)}}{n\,\mathbb{E}[K_1]}\, K_i\left(g_1(x) Y_i^{-2} - g_2(x) Y_i^{-1}\right). \qquad (21)$$
Obviously, we have:
$$\sqrt{n\,\phi_x(h)}\, \sigma^{-1}\left((\widetilde{g}_2(x) - \mathbb{E}[\widetilde{g}_2(x)])\, g_1(x) - (\widetilde{g}_1(x) - \mathbb{E}[\widetilde{g}_1(x)])\, g_2(x)\right) = \frac{S_n}{\sigma}.$$
Thus, to achieve this lemma's proof, it suffices to show the asymptotic normality of $S_n$. The latter is reached by applying the Lyapunov central limit theorem to the $L_i(x)$, i.e., it suffices to show, for some $\delta > 0$, that:
$$\frac{\displaystyle\sum_{i=1}^{n} \mathbb{E}\left[\left|L_i(x) - \mathbb{E}[L_i(x)]\right|^{2+\delta}\right]}{\left(\mathrm{var}\left(\displaystyle\sum_{i=1}^{n} L_i(x)\right)\right)^{(2+\delta)/2}} \to 0. \qquad (22)$$
Clearly,
$$\mathrm{var}\left(\sum_{i=1}^{n} L_i(x)\right) = n\,\phi_x(h)\, \mathrm{var}\left[g_1(x)\,\widetilde{g}_2(x) - g_2(x)\,\widetilde{g}_1(x)\right]$$
$$= n\,\phi_x(h)\left(\mathrm{var}[\widetilde{g}_1(x)]\, g_2^2(x) + \mathrm{var}[\widetilde{g}_2(x)]\, g_1^2(x) - 2\, g_1(x)\, g_2(x)\, \mathrm{cov}(\widetilde{g}_1(x), \widetilde{g}_2(x))\right)$$
$$= \frac{\beta_2}{\beta_1^2}\left(g_2^3(x) - 2\, g_2(x)\, g_1(x)\, \mathbb{E}\left[Y^{-3} \mid X = x\right] + g_1^2(x)\, \mathbb{E}\left[Y^{-4} \mid X = x\right]\right) + o(1).$$
Hence,
$$\mathrm{var}\left(\sum_{i=1}^{n} L_i(x)\right) = \sigma^2 + o(1).$$
Therefore, to complete the proof of this lemma, it is enough to show that the numerator of (22) converges to 0. For this, we use the $C_r$-inequality (cf. Loève (1963), page 155) to show that:
$$\sum_{i=1}^{n} \mathbb{E}\left[\left|L_i(x) - \mathbb{E}[L_i(x)]\right|^{2+\delta}\right] \le C \sum_{i=1}^{n} \mathbb{E}\left[|L_i(x)|^{2+\delta}\right] + C' \sum_{i=1}^{n} \left|\mathbb{E}[L_i(x)]\right|^{2+\delta}. \qquad (23)$$
Recall that, for all $j > 0$, $\mathbb{E}\left[K_1^j\right] = O(\phi_x(h))$; then, because of hypothesis (H5), we have:
$$\sum_{i=1}^{n} \mathbb{E}\left[|L_i(x)|^{2+\delta}\right] = n^{-\delta/2}\, (\phi_x(h))^{-1-\delta/2}\, \mathbb{E}\left[K_1^{2+\delta}\left|g_1(x) Y_i^{-2} - g_2(x) Y_i^{-1}\right|^{2+\delta}\right]$$
$$\le n^{-\delta/2}\, (\phi_x(h))^{-1-\delta/2}\, \mathbb{E}\left[K_1^{2+\delta}\left(2^{1+\delta}\, g_1^{2+\delta}(x)\, \mathbb{E}\left[|Y_i|^{-2(\delta+2)} \mid X\right] + 2^{1+\delta}\, g_2^{2+\delta}(x)\, \mathbb{E}\left[|Y_i|^{-(\delta+2)} \mid X\right]\right)\right]$$
$$\le C\, (n\,\phi_x(h))^{-\delta/2}\, \mathbb{E}\left[K_1^{2+\delta}\right] / \phi_x(h) \to 0.$$
Similarly, the second term of (23) is evaluated as follows:
$$\sum_{i=1}^{n} \left|\mathbb{E}[L_i(x)]\right|^{2+\delta} \le n^{-\delta/2}\, (\phi_x(h))^{-(2+\delta)/2}\, \left|\mathbb{E}\left[K_1\left(g_1(x) Y_i^{-2} - g_2(x) Y_i^{-1}\right)\right]\right|^{2+\delta}$$
$$\le C\, n^{-\delta/2}\, (\phi_x(h))^{-(2+\delta)/2}\, \left(\mathbb{E}[K_1]\right)^{2+\delta} \le C\, n^{-\delta/2}\, (\phi_x(h))^{1+\delta/2} \to 0,$$
which completes the proof.

Proof of Lemma 8. For the first limit, we have, by the results of Lemmas 5 and 6, that:
$$\mathbb{E}\left[\widetilde{g}_2(x) - g_2(x)\right] \to 0 \quad \text{and} \quad \mathrm{var}\left[\widetilde{g}_2(x)\right] \to 0.$$
Hence, $\widetilde{g}_2(x) - g_2(x) \to 0$ in probability. Next, for the last needed convergence, we obtain in the same manner:
$$\mathbb{E}\left[\left(\frac{n\,\phi_x(h)}{g_1(x)^2\, \sigma^2}\right)^{1/2} A_n\, \left(\widetilde{g}_2(x) - \mathbb{E}[\widetilde{g}_2(x)]\right)\right] = 0$$
and
$$\mathrm{var}\left[\left(\frac{n\,\phi_x(h)}{g_1(x)^2\, \sigma^2}\right)^{1/2} A_n\, \left(\widetilde{g}_2(x) - \mathbb{E}[\widetilde{g}_2(x)]\right)\right] = O(A_n^2) = O(h^2) \to 0.$$
It follows that:
$$\left(\frac{n\,\phi_x(h)}{g_1(x)^2\, \sigma^2}\right)^{1/2} A_n\, \left(\widetilde{g}_2(x) - \mathbb{E}[\widetilde{g}_2(x)]\right) \to 0, \quad \text{in probability}.$$

References

[1] Barrientos-Marin, J., Ferraty, F. and Vieu, P. (2010). Locally modelled regression and functional data. J. Nonparametr. Stat., 22, 617–632.
[2] Baíllo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. J. Multivariate Anal., 100, 102–111.
[3] Benhenni, K., Ferraty, F., Rachdi, M. and Vieu, P. (2007). Local smoothing regression with functional data. Comput. Stat., 22(3), 353–369.
[4] Bosq, D. (2000). Linear Processes in Function Spaces. Theory and Applications. Lecture Notes in Statistics, 149. Springer-Verlag.
[5] Carroll, R.J. and Ruppert, D. (1988). Transformation and Weighting in Regression. CRC Press.
[6] Chatfield, C. (2007). The joys of consulting. Significance, 4, 33–36.
[7] Chen, K., Guo, S., Lin, Y. and Ying, Z. (2010). Least absolute relative error estimation. J. Amer. Statist. Assoc., 105, 1104–1112.
[8] Crambes, C., Delsol, L. and Laksaci, A. (2008). Robust nonparametric estimation for functional data. J. Nonparametr. Stat., 20, 573–598.
[9] Dabo-Niang, S. and Rhomari, N. (2003). Estimation non paramétrique de la régression avec variable explicative dans un espace métrique. C. R. Math. Acad. Sci. Paris, 336, 75–80.
[10] Delsol, L. (2007). Régression non-paramétrique fonctionnelle : expressions asymptotiques des moments. Annales de l'ISUP, 51(3), 43–67.
[11] Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2011). A fast functional locally modeled conditional density and mode for functional time-series. In Recent Advances in Functional Data Analysis and Related Topics, Contributions to Statistics, 85–90. Physica-Verlag/Springer. DOI: 10.1007/978-3-7908-2736-1_13.
[12] Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2013). Functional data: local linear estimation of the conditional density and its application. Statistics, 47(1), 26–44.
[13] Demongeot, J., Laksaci, A., Rachdi, M. and Rahmani, S. (2014). On the local linear modelization of the conditional distribution for functional data. Sankhyā A: The Indian Journal of Statistics, 76(2), 328–355.
[14] El Methni, M. and Rachdi, M. (2010). Local weighted average estimation of the regression operator for functional data. Commun. Statist. Theory Methods, 40, 3141–3153.
[15] Ferraty, F. and Vieu, P. (2000). Dimension fractale et estimation de la régression dans des espaces vectoriels semi-normés. C. R. Math. Acad. Sci. Paris, 330, 403–406.
[16] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis. Theory and Practice. Springer-Verlag.
[17] Ferraty, F., Mas, A. and Vieu, P. (2007). Nonparametric regression on functional data: inference and practical aspects. Aust. N.Z. J. Stat., 49, 267–286.
[18] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. J. Statist. Plann. Inference, 140, 335–352.
[19] Ferraty, F. and Romain, Y. (2011). The Oxford Handbook of Functional Data Analysis. Oxford University Press.
[20] Khoshgoftaar, T.M., Bhattacharyya, B.B. and Richardson, G.D. (1992). Predicting software errors, during development, using nonlinear regression models: a comparative study. IEEE Trans. Reliab., 41, 390–395.
[21] Loève, M. (1963). Probability Theory. 3rd edition. The University Series in Higher Mathematics. D. Van Nostrand Company.
[22] Masry, E. (2005). Nonparametric regression estimation for dependent functional data: asymptotic normality. Stochastic Process. Appl., 115, 155–177.
[23] Narula, S.C. and Wellington, J.F. (1977). Prediction, linear regression and the minimum sum of relative errors. Technometrics, 19, 185–190.
[24] Park, H. and Stefanski, L.A. (1998). Relative-error prediction. Statist. Probab. Lett., 40(3), 227–236.
[25] Rachdi, M., Laksaci, A., Demongeot, J., Abdel, A. and Madani, F. (2014). Theoretical and practical aspects on the quadratic error in the local linear estimation of the conditional density for functional data. Comput. Statist. Data Anal., 73, 53–68.
[26] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. J. Statist. Plann. Inference, 137(9), 2784–2801.
[27] Ramsay, J. and Silverman, B. (2005). Functional Data Analysis. 2nd ed. Springer-Verlag.
[28] Shen, V.Y., Yu, T. and Thebaut, S.M. (1985). Identifying error-prone software: an empirical study. IEEE Trans. Software Eng., 11, 317–324.
[29] Wilcox, R. (2005). Introduction to Robust Estimation and Hypothesis Testing. Academic Press.
[30] Yang, Y. and Ye, F. (2013). General relative error criterion and M-estimation. Front. Math. China, 8, 695–715.