UNIVERSIDADE DE SANTIAGO DE COMPOSTELA DEPARTAMENTO DE ESTATÍSTICA E INVESTIGACIÓN OPERATIVA

Presmoothed estimation with left truncated and right censored data

M. A. Jácome, M. C. Iglesias-Pérez

Report 07-02

Reports in Statistics and Operations Research

M.A. Jácome (a,*), M.C. Iglesias-Pérez (b,1)

(a) Facultad de Ciencias, Universidade da Coruña, A Coruña, 15071, Spain
(b) E.U. Ing. Tecn. Forestal, Universidade de Vigo, Pontevedra, 36005, Spain

Abstract

We propose a new method to estimate the cumulative hazard function and the corresponding distribution function of survival times under randomly left truncated and right censored (LTRC) observations. For the density function, a new kernel-type estimator is also presented, obtained by the convolution of a kernel with this new estimator of the distribution function. The new methodology is based on presmoothing ideas, that is, on the nonparametric estimation of the conditional expectation $m$ of the censoring indicator. Asymptotic properties, including an almost sure representation, strong consistency, asymptotic normality and mean integrated squared error expressions, are given. It is shown that the presmoothed modification leads to a gain in terms of asymptotic mean squared error. The practical performance of these estimators and their efficiency with respect to the classical ones are illustrated in a simulation study. Finally, they are used to analyze lifetimes in a real data example.

Key words: Almost sure representation, Kernel-type density estimator, LTRC data, Presmoothing, Survival analysis

1 Introduction

When analyzing times of duration (e.g. in medical follow-up or in engineering life testing studies), one may not be able to observe the variable of interest, referred to hereafter as the lifetime. Among the many problems related to "loss of information", left truncation and right censoring are common. Left truncation arises when a subject is not included in the study because its lifetime origin precedes the starting time of the study and the subject dies before this moment. Right censoring appears when the lifetime is only partially observed because a different event occurs before the end of the lifetime.

* Corresponding author. Tel.: +34-981167000, ext. 2119; fax: +34-981167065. Email address: [email protected] (M.A. Jácome).
1 Research supported in part by MCyT Grants MTM2005-00429 for the first author and MTM2005-01274 (ERDF support included) for the second one.

More specifically, let $(Y, T, C)$ denote a random vector where $Y$ is the lifetime with distribution function (df) $F$, $T$ is the random left truncation variable with df $L$ and $C$ is the random right censoring time with df $G$. In the random left truncation and right censoring (LTRC) model, one observes $(T, Z, \delta)$ only if $T \le Z$, where $Z = \min(Y, C)$ and $\delta = 1\{Y \le C\}$ indicates whether the observation is censored ($\delta = 0$) or not ($\delta = 1$). When $Z < T$ nothing is observed (truncation occurs). Moreover, $H$ denotes the distribution function of $Z$, and $a_H = \inf\{y : H(y) > 0\}$ and $b_H = \sup\{y : H(y) < 1\}$ are its left and right support endpoints, respectively. The same notation will be used for the support endpoints of any distribution function. Finally, it is assumed that $a_L \le a_H$, that $Y$ is independent of $(T, C)$ and that $\alpha = P(T \le Z) > 0$. Under these assumptions, it can easily be shown that the cumulative hazard function (chf) can be expressed as follows:

$$\Lambda(y) = \int_{a_H}^{y} \frac{dH_1^*(u)}{C(u)}, \tag{1}$$

where $H_1^*(y) = P(Z \le y, \delta = 1 \mid T \le Z)$ and

$$C(y) = P(T \le y \le Z \mid T \le Z). \tag{2}$$

Let $(T_i, Z_i, \delta_i)$, $i = 1, 2, \dots, n$, be an iid sample from $(T, Z, \delta)$ which one observes (i.e., $Z_i \ge T_i$). Replacing in (1) the functions $H_1^*$ and $C$ with their empirical estimators,

$$H_{1n}^*(y) = \frac{1}{n}\sum_{i=1}^{n} 1\{Z_i \le y\}\,\delta_i \quad \text{and} \quad C_n(y) = \frac{1}{n}\sum_{i=1}^{n} 1\{T_i \le y \le Z_i\}, \tag{3}$$

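and considering the one-to-one relation between the survival function and the chf $\Lambda$:

$$1 - F(y) = \exp(-\Lambda^c(y)) \prod_{u \le y} \left(1 - \Delta\Lambda(u)\right), \tag{4}$$

where $\Lambda^c$ is the continuous part of $\Lambda$ and $\Delta\Lambda(u) = \Lambda(u) - \Lambda(u-)$, the classical estimators of $\Lambda$ and $F$ (see Tsai et al (1987)) are:

$$\Lambda_n^{TJW}(y) = \sum_{i: Z_i \le y} \frac{\delta_i}{nC_n(Z_i)} \quad \text{and} \quad F_n^{TJW}(y) = 1 - \prod_{i: Z_i \le y} \left(1 - \frac{\delta_i}{nC_n(Z_i)}\right). \tag{5}$$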

Note that the product-limit (PL) estimator $F_n^{TJW}$ in (5) reduces to the Kaplan-Meier (KM) estimator (see Kaplan and Meier (1958)) when there is no truncation, and to the Lynden-Bell (1971) estimator when there is no censoring. For LTRC data, the properties of $F_n^{TJW}$ were studied by Tsai et al (1987), Gijbels and Wang (1993), Zhou (1996) and Zhou and Yip (1999), among others. Specifically, the importance of accounting for truncation effects when estimating the distribution function $F$ was emphasized by Tsai et al (1987); they illustrated, with an example, that the KM estimator of $F$, obtained by ignoring the truncation effects, underestimates $F$ considerably. Gijbels and Wang (1993) decomposed the PL estimator $F_n^{TJW}$ as a mean of iid random variables plus a negligible remainder term of the order $O(n^{-1}\log n)$ almost surely, uniformly over compact intervals, when $a_L < a_H$. Zhou (1996) derived a strong approximation for $F_n^{TJW}$ when $a_L = a_H$, and Zhou and Yip (1999) obtained the improved order $O(n^{-1}\log\log n)$, in the case $a_L < a_H$, and for the critical case $a_L = a_H$ under the condition (Int1) below. These approximations have been used to derive asymptotic properties of $F_n^{TJW}$ (and $\Lambda_n^{TJW}$) and have also been applied to density and hazard function estimation.

In order to motivate presmoothing ideas, note that the cumulative hazard function given in (1) can also be expressed as follows:

$$\Lambda(y) = \int_{a_H}^{y} \frac{m(u)}{C(u)}\, dH^*(u), \tag{6}$$

where

$$H^*(y) = P(Z \le y \mid T \le Z) \tag{7}$$

is the conditional df of the observed lifetime, and $m(y) = P(\delta = 1 \mid Z = y, T \le Z)$ is the conditional probability of uncensoring. The importance of $m$ has been pointed out, without truncation, in Stute and Wang (1993) or Dikta (1998), among others. Note that, since $m$ is in fact the conditional expectation $m(y) = E(\delta \mid Z = y, T \le Z)$, this regression function can be estimated using the observations $(T_i, Z_i, \delta_i)$, $i = 1, \dots, n$. Let us consider the nonparametric estimator of $m$ proposed by Nadaraya (1964) and Watson (1964):

$$m_b(y) = \frac{\frac{1}{n}\sum_{i=1}^{n} K_{b_n}(y - Z_i)\,\delta_i}{\frac{1}{n}\sum_{i=1}^{n} K_{b_n}(y - Z_i)}, \tag{8}$$

with $K_{b_n}(\cdot) = b_n^{-1} K(\cdot/b_n)$ the rescaled kernel and bandwidth sequence $b_n \downarrow 0$. The functions $H^*$ and $C$, appearing in the expression (6), can be estimated empirically by $H_n^*(y) = n^{-1}\sum_{i=1}^{n} 1\{Z_i \le y\}$ and $C_n$ in (3), respectively. Hence, the intuitive idea that leads to the presmoothed estimators consists of replacing in (6) the function $m$ with the Nadaraya-Watson smoother $m_b$ and the other functions with their standard empirical counterparts:

$$\Lambda_b^P(y) = \sum_{i: Z_i \le y} \frac{m_b(Z_i)}{nC_n(Z_i)} \quad \text{and} \quad F_b^P(y) = 1 - \prod_{i: Z_i \le y} \left(1 - \frac{m_b(Z_i)}{nC_n(Z_i)}\right). \tag{9}$$

Note that the presmoothed estimators (9) are constructed similarly to (5), just replacing $\delta_i$ with $m_b(Z_i)$. This clearly shows that presmoothing is very useful with missing censoring indicators, since if some of the $\delta_i$'s are not available, the PL estimator $F_n^{TJW}$ cannot be computed.

Remark 1 When the presmoothing bandwidth $b_n$ is close to zero, then $m_b(Z_i) \to \delta_i$, and therefore the presmoothed estimators tend to the classical ones in (5):

$$\Lambda_b^P(y) \to \Lambda_n^{TJW}(y) \quad \text{and} \quad F_b^P(y) \to F_n^{TJW}(y).$$
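To make the construction of (5) and (9) concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes NumPy, no ties among the $Z_i$, and a biweight kernel; the function names `biweight`, `nw_uncensoring` and `ltrc_estimators` are hypothetical and introduced here only for illustration.

```python
import numpy as np

def biweight(u):
    """Biweight kernel: a symmetric density supported on [-1, 1] (an assumed choice)."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nw_uncensoring(z, delta, b, kernel=biweight):
    """Nadaraya-Watson estimate m_b(Z_i) of P(delta = 1 | Z = Z_i), as in (8)."""
    w = kernel((z[:, None] - z[None, :]) / b)   # w[i, j] = K((Z_i - Z_j) / b)
    return (w @ delta) / w.sum(axis=1)          # row sums are > 0 because K(0) > 0

def ltrc_estimators(t, z, delta, b=None):
    """Return (order, Lambda, F) evaluated at the sorted Z_i.

    With b=None the classical estimators (5) are returned; otherwise
    delta_i is replaced by m_b(Z_i), which gives the presmoothed estimators (9).
    """
    t, z, delta = (np.asarray(a, dtype=float) for a in (t, z, delta))
    # n * C_n(Z_i) = #{j : T_j <= Z_i <= Z_j}, the size of the risk set at Z_i
    n_c = ((t[None, :] <= z[:, None]) & (z[:, None] <= z[None, :])).sum(axis=1)
    weights = delta if b is None else nw_uncensoring(z, delta, b)
    jumps = weights / n_c                       # hazard increment at each Z_i
    order = np.argsort(z)
    cum_hazard = np.cumsum(jumps[order])        # sum form of (5) / (9)
    dist = 1.0 - np.cumprod(1.0 - jumps[order]) # product-limit form of (5) / (9)
    return order, cum_hazard, dist
```

If the bandwidth is so small that no other observation lies within distance $b$ of any $Z_i$, the weights $m_b(Z_i)$ equal $\delta_i$ and the sketch returns the classical estimators, in line with Remark 1.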

Remark 2 When there is no truncation, $\Lambda_b^P$ and $F_b^P$ reduce to the presmoothed estimators studied in Cao et al (2005).

The parametric version of this idea can be found in Sun and Zhu (2000), who proposed working with a parametric estimator of $m$. In that work, the so-called semiparametric estimator of $F$ is shown to be at least as efficient as the classical PL estimator $F_n^{TJW}$ if the parametric model taken for $m$ is the correct one. Presmoothing has the advantage that there is no need for a parametric candidate for $m$.

Without truncation, presmoothed estimators of the distribution, density and hazard functions have been proved to be more efficient than their classical counterparts for a suitable choice of the bandwidth $b_n$ (see Cao and Jácome (2004) or Cao et al (2005), among others). In that context, the presmoothed estimator of $F$ has, unlike the KM estimator, a jump at every observation regardless of its status (censored or not), which provides more information on the local behavior of $F$. Besides, a better mean squared error performance than that of the KM estimator has also been proved.

These good features of the presmoothed estimator of the df $F$ remain when the data are not only right censored but also left truncated. In many applications involving follow-up studies, the lifetime is subject to left truncation in addition to the usual right censoring and, as mentioned before, it is crucial to have good estimators that take the truncation effects into account. A first aim of the present paper is to extend presmoothing ideas to estimate $F$ and $\Lambda$ when the observations may be not only right censored but also left truncated, and to study the efficiency of these new estimators with respect to the classical ones.

As a second topic, in this paper we also study nonparametric presmoothed estimation of the density function with truncated and censored data, so we assume that $F$ is absolutely continuous with density function $f$, and $h$ denotes the density function of $Z$. Among the nonparametric methods, the most popular in the literature is the kernel-type estimator introduced by Parzen (1962) and Rosenblatt (1956). With complete data, it is defined as the convolution of a kernel $K$ (which is usually a density function) with the empirical estimator, $F_n$, of the distribution function $F$:

$$\hat f_s(y) = \int K_s(y - u)\, dF_n(u) = \frac{1}{n}\sum_{i=1}^{n} K_s(y - Z_i), \tag{10}$$

where $K_s(\cdot) = s^{-1} K(\cdot/s)$ is the kernel function $K$ rescaled according to the bandwidth sequence $s \equiv s_n \downarrow 0$.

In situations where incomplete data appear, $F_n$ is replaced in (10) by an appropriate estimator of $F$. For LTRC data, the classical kernel-type density estimator is obtained by the convolution of $K$ with the well-known product-limit estimator (PLE) of $F$ in (5):

$$f_s^{TJW}(y) = \int K_s(y - u)\, dF_n^{TJW}(u) = \sum_{i=1}^{n} K_s(y - Z_i)\left[F_n^{TJW}(Z_i) - F_n^{TJW}(Z_i^-)\right], \tag{11}$$

which is a kernel-type density estimator in which the classical $1/n$ weights in (10) are replaced by the PLE weights. The presmoothed kernel-type density estimator is defined as the convolution of the kernel $K$ and the presmoothed df estimator given in (9):

$$f_{s,b}^P(y) = \int K_s(y - u)\, dF_b^P(u). \tag{12}$$

As mentioned before, the presmoothed estimator $F_b^P$ has a jump at every observation regardless of its status (censored or not). This fact, which implies that $F_b^P$ provides more information on the local behavior of $F$, produces a more efficient estimation of the density function (see Cao and Jácome (2004) for the untruncated case). Another aspect to highlight is that two different smoothing parameters are used in (12): the presmoothing bandwidth $b_n$, used to compute the NW estimator of $m$ in (8), and the smoothing parameter $s_n$, involved in the convolution procedure. When the presmoothing bandwidth is close to zero ($b_n \simeq 0$), then $m_b(Z_i) \to \delta_i$, and therefore the presmoothed estimator $f_{s,b}^P(y)$ tends to the classical one given in (11), to the kernel density estimator studied by Lo et al (1989) for right censored data, and to the one analyzed by Arcones and Giné (1995) under left truncation. On the other hand ($b_n > 0$), if there is no truncation, $f_{s,b}^P(y)$ reduces to the presmoothed density estimator studied in Cao and Jácome (2004).
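The convolution in (12) is a weighted kernel sum, with weights given by the jumps of $F_b^P$ at the observations. The following Python sketch illustrates this under stated assumptions: NumPy, no ties among the $Z_i$, and a biweight kernel used both for presmoothing and for the convolution. The names `biweight` and `presmoothed_density` are hypothetical and serve only as an illustration.

```python
import numpy as np

def biweight(u):
    """Biweight kernel: a symmetric density supported on [-1, 1] (an assumed choice)."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def presmoothed_density(y, t, z, delta, s, b=None, kernel=biweight):
    """Evaluate (12) on the grid y; with b=None this is the classical estimator (11)."""
    y, t, z, delta = (np.asarray(a, dtype=float) for a in (y, t, z, delta))
    order = np.argsort(z)
    t, z, delta = t[order], z[order], delta[order]
    # risk-set sizes n * C_n(Z_i) = #{j : T_j <= Z_i <= Z_j}
    n_c = ((t[None, :] <= z[:, None]) & (z[:, None] <= z[None, :])).sum(axis=1)
    if b is None:
        m = delta                                   # no presmoothing
    else:
        w = kernel((z[:, None] - z[None, :]) / b)
        m = (w @ delta) / w.sum(axis=1)             # Nadaraya-Watson weights (8)
    q = m / n_c                                     # hazard jump at Z_i
    surv_before = np.concatenate(([1.0], np.cumprod(1.0 - q)[:-1]))
    jumps = surv_before * q                         # F(Z_i) - F(Z_i-)
    k = kernel((y[:, None] - z[None, :]) / s) / s   # K_s(y - Z_i)
    return k @ jumps
```

With complete data (no truncation, all $\delta_i = 1$, `b=None`) the jumps are all equal to $1/n$, so the sketch reproduces the Parzen-Rosenblatt estimator (10).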

The paper is organized as follows: Section 2 contains an iid almost sure representation, consistency, asymptotic bias, variance and limit distribution for the presmoothed chf estimator, and Sections 3 and 4 contain the corresponding results for the distribution and density function estimators, respectively. In Section 5 the asymptotic expansion of the variances shows a gain in second order efficiency over the classical estimators. This results in a better mean squared error performance, which makes clear that presmoothing is beneficial. The optimal bandwidths are analyzed in Sections 6 and 7. A simulation study is carried out in Section 8 to compare the behavior of the presmoothed and classical estimators with small samples, and the presmoothed estimators are applied to a real data example in Section 9. Finally, Section 10 contains the proofs of the main results.

2 The presmoothed estimator of the cumulative hazard function

Before deriving the main results, we need to introduce some conditions and notation. Throughout this paper we shall assume the above mentioned standard conditions in LTRC models: $\alpha = P(T \le Z) > 0$, the random variables $Y$ and $(T, C)$ are independent, and $a_L \le a_H$. Fix any $\tau < b_H$. The assumptions on the kernel $K$, the density functions $h^* = (H^*)'$ and $f$, and the conditional probability of uncensoring $m$ are the following:

(K) The kernel $K$ is a symmetric differentiable density function, of bounded variation, with compact support $[-1, 1]$.
(h1) $H$ and $H^*$ are continuous functions, with $h^*$ twice continuously differentiable at $y \in [a_H, \tau]$.
(h2) $h^*(y) \ge \mu > 0$ for some $\mu > 0$ at $y \in [a_H, \tau]$.
(C1) $C$ is twice continuously differentiable at $y \in [a_H, \tau]$.
(C2) $C(y) \ge \varepsilon > 0$ for some $\varepsilon > 0$ at $y \in [a_H, \tau]$.
(m1) The function $m$ is twice continuously differentiable at $y \in [a_H, \tau]$.
(m2) The function $m$ is three times continuously differentiable at $y \in [a_H, \tau]$.
(f1) The density function $f$ is twice continuously differentiable at $y \in [a_H, \tau]$.
(f2) The density function $f$ is four times continuously differentiable at $y \in [a_H, \tau]$.
(Int1) $\int_{a_H}^{\tau} \frac{dH_1^*(v)}{C^3(v)} < \infty$.
(Int2) $\int_{a_H}^{\tau} \frac{dv}{C^2(v)\, h^*(v)} < \infty$.
(Int3) $\int_{a_H}^{\tau} \frac{dH^*(v)}{C(v)} < \infty$.
(I) The variables $T$ and $Z$ are independent.

Condition (K) is usual when constructing nonparametric kernel-type estimators of the regression function, such as the NW estimator $m_b$. Assumptions (h1), (C1), (m1), (m2), (f1) and (f2) are standard regularity conditions, needed to apply Taylor expansions. The density function $h^*$ must be bounded away from zero (assumption (h2)) to control the error rates of the NW estimator. Conditions (Int1)-(Int2), involving the function $C$, are required for the a.s. asymptotic representation of the presmoothed distribution estimator. Assumption (C2), that $C$ is bounded away from zero, is needed to obtain the asymptotic normality result. Note that assumption (C2) implies conditions (Int1)-(Int3). Condition (I) simplifies the derivation of the bias and variance of the presmoothed density estimator.

The assumptions on the bandwidths $s_n$ and $b_n$ are the following:

(b1) $n^{1-\varepsilon} b_n \to \infty$ for some $\varepsilon > 0$ and $\sum b_n^{\lambda} < \infty$ for some $\lambda > 0$. Moreover, $b_n \ln\ln n \to 0$ as $n \to \infty$.
(b2) $n s_n^{-1} b_n^{8} \to 0$ and $n s_n b_n^{2} (\ln n)^{-2} \to \infty$ as $n \to \infty$.

The first assumption in (b1) establishes the assumptions for Lemma 1 and Theorem B in Mack and Silverman (1982), and the second one is needed for the almost sure iid representation of the presmoothed distribution estimator. Condition (b2) is required for the asymptotic normality result. If the bandwidths are assumed to be $s_n = c_s n^{-\alpha} + o(n^{-\alpha})$ and $b_n = c_b n^{-\beta} + o(n^{-\beta})$ for some $c_s, c_b, \alpha, \beta > 0$, then an example of constants $\alpha$ and $\beta$ for bandwidth sequences satisfying conditions (b1) and (b2) is $\alpha < \frac{3}{5}$ and $\frac{1}{8}(\alpha + 1) < \beta < \frac{1}{2}(1 - \alpha)$. In particular, if $s_n = c_s n^{-1/5} + o(n^{-1/5})$ for some $c_s > 0$, then the presmoothing bandwidth should be of the form $b_n = c_b n^{-\beta} + o(n^{-\beta})$ with $\frac{3}{20} < \beta < \frac{2}{5}$.
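The admissible range of $\beta$ follows from comparing exponents of $n$. The short Python sketch below makes that arithmetic explicit. It assumes that (b2) reads $n s_n^{-1} b_n^{8} \to 0$ and $n s_n b_n^{2} (\ln n)^{-2} \to \infty$, which is the reading consistent with the bounds $\frac{1}{8}(\alpha + 1) < \beta < \frac{1}{2}(1 - \alpha)$ stated above; the function name `beta_range` is hypothetical.

```python
from fractions import Fraction

def beta_range(alpha):
    """Open interval of beta allowed by (b2) when s_n ~ n^(-alpha) and b_n ~ n^(-beta).

    n * s_n^(-1) * b_n^8 -> 0 requires 1 + alpha - 8*beta < 0;
    n * s_n * b_n^2 * (ln n)^(-2) -> infinity requires 1 - alpha - 2*beta > 0.
    """
    alpha = Fraction(alpha)
    lower = (alpha + 1) / 8
    upper = (1 - alpha) / 2
    if not (alpha > 0 and lower < upper):
        # lower < upper is equivalent to alpha < 3/5
        raise ValueError("no admissible beta: need 0 < alpha < 3/5")
    return lower, upper

# Example: the usual density bandwidth rate alpha = 1/5
lower, upper = beta_range(Fraction(1, 5))
print(lower, upper)
```

Passing `alpha` as a `Fraction` (or a string such as `"1/5"`) keeps the bounds exact; any $\beta$ in the returned interval is also below 1, so the first part of (b1) holds as well.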

Theorem 1 Under conditions (K), (h1), (h2), (m1), (b1), (Int1) and (Int2),

$$\Lambda_b^P(y) - \Lambda(y) = \bar{\Lambda}_b^P(y) - \Lambda(y) + S_n(y),$$

where $\bar{\Lambda}_b^P(y) = \Lambda(y) + \frac{1}{n}\sum_{i=1}^{n} \eta_n^P(y, Z_i, \delta_i, T_i)$ with

$$\eta_n^P(y, Z_i, \delta_i, T_i) = g_1^P(y, Z_i) - g_2^P(y, Z_i, T_i) + g_3^P(y, Z_i, \delta_i), \tag{13}$$

and

$$g_1^P(y, Z_i) = \frac{m(Z_i)}{C(Z_i)}\, 1\{Z_i \le y\}, \tag{14}$$

$$g_2^P(y, Z_i, T_i) = \int_{a_H}^{y} \frac{m(v)}{C^2(v)}\, 1\{T_i \le v \le Z_i\}\, dH^*(v), \tag{15}$$

$$g_3^P(y, Z_i, \delta_i) = \int_{a_H}^{y} K_{b_n}(v - Z_i)\, \frac{\delta_i - m(v)}{C(v)}\, dv, \tag{16}$$

and

$$\sup_{a_H \le y \le \tau} |S_n(y)| = O\left(\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)^{2}\right) \quad \text{a.s.}$$

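Theorem 2 Under conditions (K), (h1), (h2), (m1), (b1), (Int1) and (Int3),

$$\sup_{a_H \le y \le \tau} \left|\Lambda_b^P(y) - \Lambda(y)\right| = O\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right) \quad \text{a.s.} \tag{17}$$

Therefore, if $b_n \to 0$ and $n(\ln n)^{-1} b_n \to \infty$ as $n \to \infty$,

$$\sup_{a_H \le y \le \tau} |\Lambda_b^P(y) - \Lambda(y)| \to 0 \quad \text{with probability one.}$$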

The asymptotic normality of $\Lambda_b^P - \Lambda$ is an easy consequence of the representation in Theorem 1. For that result, we need a closed formula for the asymptotic expressions of the bias and variance of the estimator.

Proposition 3 Under conditions (I), (K), (h1), (m2), (C1) and (C2),

$$E\left[\bar{\Lambda}_b^P(y) - \Lambda(y)\right] = d_K\, \alpha(y)\, b_n^2 + o(b_n^2),$$

$$\mathrm{Var}\left(\bar{\Lambda}_b^P(y) - \Lambda(y)\right) = \frac{1}{n}\left(\int_{a_H}^{y} \frac{dH_1^*(u)}{C^2(u)} - 2 b_n e_K \left(q_1(y) + q_1(a_H)\right) + O(b_n^2)\right),$$

where

$$d_K = \int u^2 K(u)\, du \quad \text{and} \quad e_K = \int u K(u) \int_{-\infty}^{u} K(v)\, dv\, du, \tag{18}$$

$$\alpha(y) = \int_{a_H}^{y} \frac{\frac{1}{2} m''(v)\, h^*(v) + m'(v)\, h^{*\prime}(v)}{C(v)}\, dv, \tag{19}$$

$$q_1(y) = \frac{m(y)\,(1 - m(y))\, h^*(y)}{C^2(y)}. \tag{20}$$
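The kernel constants $d_K$ and $e_K$ in (18) depend only on $K$ and can be approximated numerically. The sketch below does so with simple Riemann sums, assuming NumPy and a kernel supported on $[-1, 1]$ as in condition (K); the biweight kernel and the names `biweight` and `kernel_constants` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def biweight(u):
    """Biweight kernel: a symmetric density supported on [-1, 1] (an assumed choice)."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def kernel_constants(kernel, num=20001):
    """Approximate d_K and e_K of (18) by Riemann sums on [-1, 1]."""
    u = np.linspace(-1.0, 1.0, num)
    du = u[1] - u[0]
    k = kernel(u)
    cdf = np.cumsum(k) * du            # int_{-1}^{u} K(v) dv, since K vanishes below -1
    d_k = np.sum(u**2 * k) * du        # d_K = int u^2 K(u) du
    e_k = np.sum(u * k * cdf) * du     # e_K = int u K(u) (int_{-inf}^{u} K(v) dv) du
    return d_k, e_k

d_k, e_k = kernel_constants(biweight)
print(d_k, e_k)
```

For any symmetric density kernel, $\int_{-\infty}^{u} K(v)\,dv - \frac{1}{2}$ has the sign of $u$, so $e_K > 0$; together with $q_1 \ge 0$ this is why the term $-2 b_n e_K (q_1(y) + q_1(a_H))$ can only reduce the variance in Proposition 3.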

The next theorem presents the asymptotic normality of $n^{1/2}(\Lambda_b^P - \Lambda)$. It follows from Theorem 1 and Proposition 3. The proof is similar to that of Corollary 3 in Iglesias-Pérez and González-Manteiga (1999), so it is omitted.

Theorem 4 Under the assumptions (I), (K), (h1), (h2), (m2), (b1), (b2), (C1) and (C2), for any $y < \tau$:

a) If $n b_n^4 \to 0$, then $n^{1/2}\left(\Lambda_b^P(y) - \Lambda(y)\right) \xrightarrow{d} N(0, \sigma_\Lambda^2(y))$.

b) If $n b_n^4 \to B^4$, then $n^{1/2}\left(\Lambda_b^P(y) - \Lambda(y)\right) \xrightarrow{d} N(b_\Lambda(y), \sigma_\Lambda^2(y))$,

where $b_\Lambda(y) = B^2 \alpha(y)\, d_K$ and $\sigma_\Lambda^2(y) = \gamma(y)$ with

$$\gamma(y) = \int_{a_H}^{y} \frac{m(v)\, h^*(v)}{C^2(v)}\, dv, \tag{21}$$

and $d_K$ and $\alpha$ given in (18) and (19), respectively.

3 The presmoothed estimator of the distribution function

To obtain the asymptotic properties of the presmoothed estimator of the df, $F_b^P$, we rely on the relation (4) and the results for $\Lambda_b^P$ in Section 2. Therefore, the almost sure asymptotic representation of $F_b^P$ is based on that of $\Lambda_b^P$ given in Theorem 1.

Theorem 5 Under the conditions of Theorem 1,

$$F_b^P(y) - F(y) = \bar{F}_b^P(y) - F(y) + R_n(y),$$

where $\bar{F}_b^P(y) = F(y) + \frac{1}{n}\sum_{i=1}^{n} (1 - F(y))\, \eta_n^P(y, Z_i, \delta_i, T_i)$ with $\eta_n^P$ given in (13), and

$$\sup_{a_H \le y \le \tau} |R_n(y)| = O\left(\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)^{2}\right) \quad \text{a.s.}$$

The rate of the uniform consistency of the presmoothed estimator of $F$ is given in the following theorem. The proof is analogous to that of Theorem 2.

Theorem 6 Under the conditions of Theorem 2,

$$\sup_{a_H \le y \le \tau} \left|F_b^P(y) - F(y)\right| = O\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right) \quad \text{a.s.}$$

Therefore, if $b_n \to 0$ and $n(\ln n)^{-1} b_n \to \infty$ as $n \to \infty$,

$$\sup_{a_H \le y \le \tau} |F_b^P(y) - F(y)| \to 0 \quad \text{with probability one.}$$

The next proposition gives the asymptotic expression of the bias and variance of the iid representation $\bar{F}_b^P$. Using Theorem 5 and Proposition 3, the proof is straightforward.

Proposition 7 Under the conditions of Proposition 3,

$$E\left[\bar{F}_b^P(y) - F(y)\right] = d_K\, \alpha(y)\, (1 - F(y))\, b_n^2 + o(b_n^2),$$

$$\mathrm{Var}\left(\bar{F}_b^P(y) - F(y)\right) = \frac{1}{n}(1 - F(y))^2 \left(\int_{a_H}^{y} \frac{dH_1^*(u)}{C^2(u)} - 2 b_n e_K \left(q_1(y) + q_1(a_H)\right) + O(b_n^2)\right),$$

with $d_K$, $e_K$, $\alpha$ and $q_1$ given in (18)-(20).

Note that, when the presmoothing bandwidth $b_n$ is close to zero, the expressions of the bias and variance of $\Lambda_b^P$ and $F_b^P$, given in Propositions 3 and 7 respectively, reduce to those of the classical estimators $\Lambda_n^{TJW}$ and $F_n^{TJW}$.

The asymptotic normality of $n^{1/2}(F_b^P - F)$ follows from Theorem 5 and the asymptotic expressions for the bias and variance of $\bar{F}_b^P$ in Proposition 7.

Theorem 8 Under the assumptions of Theorem 4, for any $y < \tau$:

a) If $n b_n^4 \to 0$, then $n^{1/2}\left(F_b^P(y) - F(y)\right) \xrightarrow{d} N(0, \sigma_F^2(y))$.

b) If $n b_n^4 \to B^4$, then $n^{1/2}\left(F_b^P(y) - F(y)\right) \xrightarrow{d} N(b_F(y), \sigma_F^2(y))$,

where $b_F(y) = B^2 (1 - F(y))\, \alpha(y)\, d_K$, $\sigma_F^2(y) = (1 - F(y))^2 \gamma(y)$, and $d_K$, $\alpha$ and $\gamma$ are given in (18), (19) and (21), respectively.

4 The presmoothed estimator of the density function

The strong representation of the presmoothed distribution estimator is a key tool in the proof of the following results. The next theorem is an extension of Theorem 1 in Jácome and Cao (2007) to the LTRC model. It shows that, up to a remainder term, $f_{s,b}^P - f$ can be expressed as a sum of iid variables. This result is an application of the strong representation of the presmoothed distribution estimator given in Theorem 5. The strategy of the proof is similar to that of Jácome and Cao (2007). Without presmoothing, it was already employed by Lo et al (1989) with censored data and by Gijbels and Wang (1993) under the LTRC model, so the details are omitted.

Theorem 9 Assume conditions (K), (h1), (h2), (m1), (b1), (Int1) and (Int2). Then the presmoothed density estimator admits the following representation:

$$f_{s,b}^P(y) = f(y) + \beta_n^P(y) + \sigma_n^P(y) + e_n^P(y), \tag{22}$$

where

$$\beta_n^P(y) = \int f(y - s_n v)\, K(v)\, dv - f(y) \tag{23}$$

is essentially the bias,

$$\sigma_n^P(y) = \frac{1}{n s_n} \sum_{i=1}^{n} \int \xi_{i,n}^P(y - s_n v)\, K'(v)\, dv, \tag{24}$$

with $\xi_{i,n}^P(y) = (1 - F(y))\, \eta_n^P(y, Z_i, \delta_i, T_i)$ and $\eta_n^P$ given in (13), is the stochastic component of $f_{s,b}^P$, and the remainder term satisfies

$$\sup_{a_H \le y \le \tau} \left|e_n^P(y)\right| = O\left(\frac{1}{s_n}\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)^{2}\right) \quad \text{a.s.} \tag{25}$$

It should be observed that the bias part $\beta_n^P$ is not random, and the variance part $\sigma_n^P$ is a sum of iid random variables. This is a very useful result for obtaining asymptotic properties of the presmoothed density estimator, since it allows one to handle a sum of iid variables instead of the complicated structure of $f_{s,b}^P$. Without presmoothing, this representation reduces to the one by Gijbels and Wang (1993) and Zhou and Yip (1999). The strong consistency of $f_{s,b}^P$ now follows from Theorem 9.

Theorem 10 Assume conditions (K), (h1), (h2), (m1), (b1), (f1), (Int1) and (Int3). Then

$$\sup_{a_H \le y \le \tau} \left|f_{s,b}^P(y) - f(y)\right| = O\left(\frac{1}{s_n}\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)\right) \quad \text{a.s.}$$

Consider the following function and constant depending on the kernel $K$:

$$A_K(L) = \iiint K(u)\, K(v)\, K(w)\, K(u + L(v - w))\, du\, dv\, dw, \qquad c_K = \int K^2(v)\, dv = A_K(0),$$

and the function

$$Q_2(y) = \int_{a_H}^{y} \frac{m(v)\, h^*(v)}{C^2(v)}\, dv.$$

Recall that $\bar{f}_{s,b}^P(y) = f(y) + \beta_n^P(y) + \sigma_n^P(y)$ is the iid representation of $f_{s,b}^P$. The next proposition gives an asymptotic expansion of the bias and variance of $\bar{f}_{s,b}^P$. Several cases will be distinguished according to the rate at which the smoothing parameters tend to zero.

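Proposition 11 Assume conditions (K), (h1), (C1), (C2), (m2), (f2), (I). If the bandwidths satisfy $s_n \to 0$, $b_n \to 0$ and $n s_n \to \infty$, $n b_n \to \infty$, then the bias and variance of $\bar{f}_{s,b}^P$ admit the following representation:

$$\mathrm{Bias}\left(\bar{f}_{s,b}^P\right) = \frac{1}{2} f''(y)\, d_K s_n^2 + \left[(1 - F(y))\, \alpha(y)\right]' d_K b_n^2 + O(b_n^4) + O(s_n^4), \tag{26}$$

and, depending on the ratio $b_n/s_n$:

(a) If $b_n/s_n \to 0$, then

$$\mathrm{Var}\left(\bar{f}_{s,b}^P\right) = \frac{\tau_1(y)}{n s_n}\, c_K - \frac{\tau_2(y)}{n} - 2\, \frac{b_n}{n}\, f^2(y) \left(q_1(y) + q_1(a_H)\right) e_K + O\left(\frac{s_n}{n}\right). \tag{27}$$

(b) If $b_n/s_n = L_n \to L > 0$, then

$$\mathrm{Var}\left(\bar{f}_{s,b}^P\right) = \frac{\tau_1(y)}{n s_n} \left[(1 - m(y))\, A_K(L) + m(y)\, c_K\right] - \frac{\tau_2(y)}{n} + O\left(\frac{s_n}{n}\right). \tag{28}$$

(c) If $b_n/s_n \to \infty$, then

$$\mathrm{Var}\left(\bar{f}_{s,b}^P\right) = c_K \tau_1(y) \left(\frac{1}{n b_n}(1 - m(y)) + \frac{1}{n s_n}\, m(y)\right) - \frac{1}{n}\, \tau_2(y) + O\left(\frac{b_n}{n}\right), \tag{29}$$

where

$$\tau_1(y) = \left(\frac{1 - F(y)}{C(y)}\right)^{2} m(y)\, h^*(y) \quad \text{and} \quad \tau_2(y) = f(y) \left[(1 - F(y))\, Q_2(y)\right]'.$$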

The next theorem presents the asymptotic normality of $\sqrt{n s_n}\,(f_{s,b}^P(y) - f(y))$. It follows from Theorem 9 and the asymptotic expressions for the bias and variance of $\bar{f}_{s,b}^P$ in Proposition 11. The proof is similar to that of Corollary 3 in Iglesias-Pérez and González-Manteiga (1999) in the LTRC model, and that of Cao and Jácome (2004) without truncation, so it is omitted.

Theorem 12 Assume conditions (K), (h1), (h2), (C1), (C2), (m2), (f2), (b1), (b2), (I). Then

$$\sqrt{n s_n}\left(f_{s,b}^P(y) - f(y)\right) \xrightarrow{d} N\left(a(y), \sigma^2(y)\right),$$

where

$$a(y) = \begin{cases} 0 & \text{if } n s_n^5 \to 0, \\ \frac{1}{2} C^{5/2} f''(y)\, d_K & \text{if } n s_n^5 \to C^5 \text{ and } n b_n^5 \to 0, \\ \frac{1}{2} C^{5/2} f''(y)\, d_K + C^{1/2} B^2 \left[(1 - F(y))\, \alpha(y)\right]' d_K & \text{if } n s_n^5 \to C^5 \text{ and } n b_n^5 \to B^5, \end{cases}$$

and

$$\sigma^2(y) = \begin{cases} \tau_1(y)\, c_K & \text{if } b_n/s_n \to 0, \\ \tau_1(y) \left[(1 - m(y))\, A_K(L) + m(y)\, c_K\right] & \text{if } b_n/s_n \to L > 0, \\ \tau_1(y)\, m(y)\, c_K & \text{if } b_n/s_n \to \infty. \end{cases}$$

5 Beneficial effect of presmoothing

It is easy to show, under a suitable choice of the bandwidth $b_n$, the efficiency of the presmoothed estimators $\Lambda_b^P$ and $F_b^P$ with respect to $\Lambda_n^{TJW}$ and $F_n^{TJW}$. In fact, the asymptotic expressions of the mean squared error (AMSE) of $\bar{\Lambda}_b^P$ and $\bar{F}_b^P$ show a gain in second order efficiency over the classical estimators. We just illustrate this for $\Lambda_b^P$, since for $F_b^P$ the ideas are similar.

Note that the mean squared error (MSE) of $\bar{\Lambda}_b^P$ is

$$\mathrm{MSE}(\bar{\Lambda}_b^P) = \mathrm{Bias}^2(\bar{\Lambda}_b^P) + \mathrm{Var}(\bar{\Lambda}_b^P). \tag{30}$$

From the asymptotic expressions of the bias and variance of $\bar{\Lambda}_b^P$ given in Proposition 3, we have that

$$\mathrm{AMSE}(\bar{\Lambda}_b^P(y)) = d_K^2\, \alpha^2(y)\, b_n^4 + \frac{1}{n}\gamma(y) - 2\,\frac{b_n}{n}\, e_K \left(q_1(y) + q_1(a_H)\right). \tag{31}$$

Hence, the asymptotically optimal presmoothing bandwidth is

$$b_{n,AMSE}(y) = \arg\min_{b > 0} \mathrm{AMSE}(\bar{\Lambda}_b^P(y)) = \left(\frac{e_K \left(q_1(y) + q_1(a_H)\right)}{2 d_K^2\, \alpha^2(y)}\right)^{1/3} n^{-1/3},$$

and, therefore, the expression of $\mathrm{AMSE}(\bar{\Lambda}_b^P)$ with $b_{n,AMSE}$ becomes

$$\mathrm{AMSE}_{OPT}(\bar{\Lambda}_b^P(y)) = \frac{1}{n}\gamma(y) - \frac{3}{2^{4/3}} \left(\frac{e_K^4 \left(q_1(y) + q_1(a_H)\right)^4}{d_K^2\, \alpha^2(y)}\right)^{1/3} n^{-4/3}.$$

On the other hand, for the classical estimator of the cumulative hazard function we have:

$$\mathrm{AMSE}(\Lambda_n^{TJW}(y)) = \frac{1}{n}\gamma(y) + O(n^{-3/2}).$$

This expression comes from the iid representation of $\Lambda_n^{TJW} - \Lambda$ together with the order of the moments of the remainder term (see, for example, Gijbels and Wang (1993)). This makes clear the second order efficiency of $\bar{\Lambda}_b^P$ with respect to $\Lambda_n^{TJW}$.
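The closed-form minimizer of the AMSE and the resulting optimal value can be checked numerically. The Python sketch below encodes (31), the optimal bandwidth and the optimal AMSE as functions of pointwise quantities supplied by the user; it assumes NumPy, and the function names `amse`, `b_amse` and `amse_opt` as well as any numerical inputs are hypothetical, chosen only for illustration.

```python
import numpy as np

def amse(b, n, d_k, e_k, alpha, gamma, q):
    """AMSE (31), with alpha = alpha(y), gamma = gamma(y), q = q1(y) + q1(a_H)."""
    return d_k**2 * alpha**2 * b**4 + gamma / n - 2.0 * b * e_k * q / n

def b_amse(n, d_k, e_k, alpha, q):
    """Closed-form minimizer of (31) over b > 0 (requires e_k * q > 0, alpha != 0)."""
    return (e_k * q / (2.0 * d_k**2 * alpha**2)) ** (1.0 / 3.0) * n ** (-1.0 / 3.0)

def amse_opt(n, d_k, e_k, alpha, gamma, q):
    """Value of (31) at the optimal bandwidth."""
    gain = 3.0 / 2.0 ** (4.0 / 3.0) * (e_k**4 * q**4 / (d_k**2 * alpha**2)) ** (1.0 / 3.0)
    return gamma / n - gain * n ** (-4.0 / 3.0)

# Illustrative (made-up) inputs; compare the closed form with a grid search
n, d_k, e_k, alpha, gamma, q = 200, 1.0 / 7.0, 0.108, 0.5, 1.0, 0.3
grid = np.linspace(1e-3, 2.0, 20000)
b_grid = grid[np.argmin(amse(grid, n, d_k, e_k, alpha, gamma, q))]
print(b_grid, b_amse(n, d_k, e_k, alpha, q))
```

Setting the derivative $4 d_K^2 \alpha^2 b^3 - 2 e_K (q_1(y) + q_1(a_H))/n$ to zero gives the closed form; the second term of `amse_opt` is the second order gain over the classical leading term $\gamma(y)/n$.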

6 Optimal bandwidth for the presmoothed distribution and chf estimators

One of the most important aspects of nonparametric estimation is the choice of the smoothing parameter or bandwidth. For the sake of clarity, let us denote by $b_\Lambda$ the bandwidth for the presmoothed estimator of the chf, $\Lambda_b^P$, and by $b_F$ the bandwidth for $F_b^P$. The bandwidth is frequently chosen by minimizing a measure of the distance between the estimator and the true function, such as the mean integrated squared error (MISE):

$$\mathrm{MISE}_\Lambda(b_n) = E\left[\int \left(\bar{\Lambda}_b^P(y) - \Lambda(y)\right)^2 \omega(y)\, dy\right]. \tag{32}$$

The $\mathrm{MISE}_\Lambda$ is obtained by integrating over the entire line the MSE expression in (30), from which one can easily see the tradeoff between bias and variance. In the definition of MISE a non-negative weight function $\omega$ was introduced. The role of $\omega$ is to eliminate endpoint effects. Using the expression (31) it is easy to obtain the following asymptotic expression of the mean integrated squared error:

$$\mathrm{AMISE}_\Lambda(b_n) = b_n^4\, d_K^2 \int_0^\infty \alpha^2(y)\, \omega(y)\, dy + \frac{1}{n}\int_0^\infty \gamma(y)\, \omega(y)\, dy - 2\,\frac{b_n}{n}\, e_K \int_0^\infty \left(q_1(y) + q_1(a_H)\right) \omega(y)\, dy. \tag{33}$$

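Hence, the optimal bandwidth in the sense of AMISE is the one minimizing (33), the so-called AMISE bandwidth:

$$b_{AMISE,\Lambda} = \arg\min_{b_n > 0} \mathrm{AMISE}_\Lambda(b_n) = \left(\frac{e_K Q}{2 d_K^2 A}\right)^{1/3} n^{-1/3},$$

where

$$Q = \int_0^\infty \left(q_1(y) + q_1(a_H)\right) \omega(y)\, dy \quad \text{and} \quad A = \int_0^\infty \alpha^2(y)\, \omega(y)\, dy,$$

and $d_K$, $e_K$ are given in (18).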

Analogously, from the asymptotic expressions of the bias and variance of $\bar{F}_b^P$ given in Proposition 7, one can easily infer the expression of the optimal bandwidth for the presmoothed estimator of the distribution function:

$$b_{AMISE,F} = \left(\frac{e_K Q}{2 d_K^2 A}\right)^{1/3} n^{-1/3}.$$

Whereas the asymptotic representation of MISE provides considerable insight into the effect of the bandwidth $b_n$ on the bias and variance of the estimator, it has the drawback that the minimizer depends on the unknown quantities $Q$ and $A$.

For data that are only censored, the choice of the bandwidth for the presmoothed chf and df estimators was studied by Cao et al (2005). They proposed a "plug-in" bandwidth selector, obtained by plugging into the expression of the AMISE bandwidth some estimates of the corresponding $Q$ and $A$. The authors considered the estimates $\hat{Q}$ and $\hat{A}$ obtained by replacing in $\alpha$ and $q_1$ the unknown functions $m$, $m'$, $m''$ with the Nadaraya-Watson estimator of $m$ and its first and second derivatives, and $h$ and $h'$ with the Parzen-Rosenblatt kernel estimator and its first derivative. Hence, the practical implementation of this bandwidth selector required the choice of some pilot bandwidths. The criterion for choosing the pilot bandwidths properly was also analyzed by the authors.

Following the same ideas, a plug-in bandwidth selector for $b_{AMISE,\Lambda}$ and $b_{AMISE,F}$ can be constructed as in Cao et al (2005). A deeper analysis of this selector would be a rather complicated task, so we do not pursue this issue further. However, some ideas on the practical implementation of this bandwidth selector are given in Section 8. The simulation results indicate that a plug-in bandwidth selector gives promising outcomes.

7 Optimal bandwidths for the presmoothed density estimator

When presmoothing, the choice of the smoothing parameters is crucial, since the efficiency of the presmoothed density estimator with respect to the classical one depends on the limit behavior of the ratio $b_n/s_n$ (see Jácome and Cao (2007) for details when presmoothing in the untruncated setting).

The mean integrated squared error (MISE), a deterministic distance between the true density and some estimate, is probably the measure most commonly used in practice:

$$\mathrm{MISE}(s, b) = E\left[\int \left(\bar{f}_{s,b}^P - f\right)^2 \omega\right] = \int \mathrm{Bias}^2\left(\bar{f}_{s,b}^P\right) \omega + \int \mathrm{Var}\left(\bar{f}_{s,b}^P\right) \omega. \tag{34}$$

Both the upper and lower tails of the distribution of the lifetime $Y$ are affected under the LTRC model. For that reason, a non-negative weighting function, $\omega$, is introduced to eliminate endpoint effects. For that function, the following assumption is required:

($\omega$) $\omega$ is compactly supported and $C(y) \ge \varepsilon > 0$ and $h^*(y) \ge \mu > 0$ for some $\varepsilon, \mu > 0$ and every $y$ in the support of $\omega$.

The asymptotic expression of the MISE, say AMISE, that will be given in (36), can be easily obtained just assuming condition ($\omega$) and considering the asymptotic expansion of the bias and variance of $\bar{f}_{s,b}^P$ in (26) and (27)-(29), respectively. Observe that ($\omega$) implies (C2) and (h2) in Proposition 11.

Note that, from (6), the conditional probability $m$ can also be expressed as follows:

$$m(y) = \frac{f(y)\, C(y)}{(1 - F(y))\, h^*(y)}.$$

This makes it easier to check that, without presmoothing ($b_n = 0$), the AMISE expression when $b_n/s_n \to 0$ extends Theorem 2.1 in Sánchez-Sellero (1999) in the LTRC model, Theorem 3.2 in Lo et al (1989) without truncation and Theorem 1 in Cao (1993) for complete data.

The bandwidths minimizing (34) are called the MISE bandwidths. Analogously, the bandwidths minimizing the asymptotic expression of the MISE are the so-called AMISE bandwidths, denoted by $s_{AMISE}$ and $b_{AMISE}$.

To reduce notation, we use $\int f''^2 \omega = \int f''(v)^2\, \omega(v)\, dv$ and so on. The expression of the AMISE bandwidths depends on the limit of the ratio $b_n/s_n$. We will only consider the case when $b_n/s_n \to L_0 \ge 0$, since when $b_n/s_n \to \infty$ the minimizers of AMISE are both of order $n^{-1/5}$, which contradicts the assumption $b_n/s_n \to \infty$. For the sake of readability, define the following functions:

$$c_1(L) = d_K^2 \int \left(\frac{1}{2} f'' + L^2 \left[(1 - F)\, \alpha\right]'\right)^2 \omega,$$

$$c_2(L) = \int \left(\frac{1 - F}{C}\right)^2 h^*\, m \left[(1 - m)\, A_K(L) + m\, c_K\right] \omega, \tag{35}$$

$$c_0(L) = \left(\frac{c_2(L)}{4 c_1(L)}\right)^{1/5}.$$

When $b_n/s_n = L_n \to L_0 \ge 0$, consider the reparametrization $b_n = L_n s_n$. The AMISE function can be written in terms of $c_1$ and $c_2$ as follows:

$$\mathrm{AMISE}(s_n, L_n) = s_n^4\, c_1(L_n) + \frac{1}{n s_n}\, c_2(L_n). \tag{36}$$

Minimizing (36) in $s_n$ gives an expression of $s_{AMISE}$ as a function of $L_n$, namely $c_0(L_n)\, n^{-1/5}$. Replacing that AMISE bandwidth in (36), a new minimization in $L_n$ gives an approximation of $L_0$. Therefore, the limiting constant $L_0$ can be obtained as follows:

$$L_0 = \arg\min_{L \ge 0} \left[c_1(L)\, c_2^4(L)\right]^{1/5} = \arg\min_{L \ge 0} \frac{4^{4/5}}{5}\, n^{4/5}\, \mathrm{AMISE}\left(c_0(L)\, n^{-1/5}, L\right). \tag{37}$$

This results in the following AMISE bandwidths:

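Theorem 13 Assume conditions (K), (h1), (C1), (m2), (f1), (I) and ($\omega$). If the bandwidths satisfy $s_n \to 0$, $b_n \to 0$ and $n s_n \to \infty$, $n b_n \to \infty$, then the AMISE bandwidths are:

(a) If $b_{AMISE}/s_{AMISE} \to L_0 > 0$,

$$s_{AMISE} = c_0(L_0)\, n^{-1/5} \quad \text{and} \quad b_{AMISE} = L_0\, c_0(L_0)\, n^{-1/5}. \tag{38}$$

(b) If $b_{AMISE}/s_{AMISE} \to L_0 = 0$,

$$s_{AMISE} = c_0(0)\, n^{-1/5} \quad \text{and} \quad b_{AMISE} = b_0\, n^{-3/5}, \tag{39}$$

with

$$b_0 = \left(\frac{\int f''^2 \omega}{c_K\, d_K^3 \int \left(\frac{1 - F}{C}\right)^2 m\, h^*\, \omega}\right)^{2/5} \frac{e_K \int f^2 \left(q_1 + q_1(a_H)\right) \omega}{\int f'' \left[(1 - F)\, \alpha\right]' \omega}.$$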

The expression of $b_0$ is derived by minimizing the AMISE function with an extra term in the integrated variance (27).

Remark 3 Consider the constant $L_0$ defined in (37). When $L_0 > 0$, the efficiency of the presmoothed estimator with respect to the classical one, $f_s^{TJW}$, is of first order, since in such a case:

\[ \min_{s,b>0} AMISE\left( \bar f^P_{s,b} \right) = \frac{5}{4^{4/5}}\, n^{-4/5} \left[ c_1(L_0)\, c_2^4(L_0) \right]^{1/5} < \frac{5}{4^{4/5}}\, n^{-4/5} \left[ c_1(0)\, c_2^4(0) \right]^{1/5} = \min_{s>0} AMISE\left( f_s^{TJW} \right). \]

If $L_0 = 0$, we are back in case (b), and the efficiency is just of third order.
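The relation between (36), (37) and the bandwidths in (38) can be checked numerically. The following is only a minimal sketch: the values of $c_1$ and $c_2$ and the toy functions `c1_toy` and `c2_toy` are hypothetical placeholders, not quantities computed from any of the models in this paper. It evaluates the AMISE in (36), its closed-form minimizer $c_0 n^{-1/5}$, the minimum value $5 \cdot 4^{-4/5} [c_1 c_2^4]^{1/5} n^{-4/5}$, and a grid search for $L_0$.

```python
def amise(s, n, c1, c2):
    """AMISE in (36) for a fixed ratio L: s^4 * c1 + c2 / (n * s)."""
    return s ** 4 * c1 + c2 / (n * s)


def s_amise(n, c1, c2):
    """Closed-form minimizer of (36) in s: c0 * n^(-1/5), c0 = (c2 / (4 c1))^(1/5)."""
    return (c2 / (4.0 * c1)) ** 0.2 * n ** -0.2


def min_amise(n, c1, c2):
    """Value of (36) at its minimizer: 5 / 4^(4/5) * (c1 * c2^4)^(1/5) * n^(-4/5)."""
    return 5.0 / 4.0 ** 0.8 * (c1 * c2 ** 4) ** 0.2 * n ** -0.8


def select_L0(c1_fn, c2_fn, grid):
    """Grid search for L0 = argmin_L c1(L) * c2(L)^4, as in (37)."""
    return min(grid, key=lambda L: c1_fn(L) * c2_fn(L) ** 4)


if __name__ == "__main__":
    n = 100
    # Hypothetical shapes for c1(L) and c2(L); in practice both are integrals
    # that have to be estimated from the data.
    c1_toy = lambda L: 1.0 - 0.8 * L ** 2 + 0.5 * L ** 4
    c2_toy = lambda L: 1.0 + 0.1 * L
    grid = [i / 100 for i in range(201)]
    L0 = select_L0(c1_toy, c2_toy, grid)
    s = s_amise(n, c1_toy(L0), c2_toy(L0))
    print("L0 =", L0, " s_AMISE =", s, " b_AMISE =", L0 * s)
```

A grid search is used here only for simplicity; any one-dimensional optimizer over $L \ge 0$ would serve the same purpose.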

The expressions (38) and (39) of the AMISE bandwidths suggest a plug-in bandwidth selector for $s$ and $b$. It is defined by replacing the integrals in $c_1$, $c_2$ and $b_0$ by estimates of them. The parameter $L_0$, which connects both AMISE bandwidths, can be obtained by minimizing the estimate of $c_1(L)\, c_2^4(L)$. When $L_0 = 0$, $s_{AMISE}$ in (39) coincides with the optimal AMISE bandwidth $s$ for the kernel-type density estimator with no presmoothing, $f_s^{TJW}$. A plug-in selector for that bandwidth, together with the optimal expression of the pilot bandwidth needed to give an estimate of $\int f''^2$, were given by Sánchez-Sellero et al. (1999). Note that, when $L_0 = 0$, the function $c_2$ can be estimated without smoothing, since its empirical estimate is root-n consistent (see Theorem 2.2 in Sánchez-Sellero et al. (1999)), that is,

\[ \int \left( \frac{1 - F_n^{TJW}}{C_n} \right)^2 \omega\, dH_{1n}^* - \int \left( \frac{1-F}{C} \right)^2 h^* m\, \omega = O_P\left( n^{-1/2} \right) \]

with $H_1^*(y) = P(Z \le y, \delta = 1 \mid T \le Z)$ and $H_{1n}^*(y) = n^{-1} \sum_{i=1}^n 1\{Z_i \le y, \delta_i = 1\}$.

Following the same ideas, when $L_0 > 0$, several pilot bandwidths are needed to estimate the integrals appearing in $c_1$ and $c_2$. The choice of those optimal pilot bandwidths can be carried out as in Sánchez-Sellero et al. (1999), but we do not pursue this issue further. This bandwidth selector is expected to give quite good results in practice, as the small simulation in the next section suggests.

8

Simulations

For illustration, a small simulation study has been carried out to compare the performance of the new estimators with the classical ones. Since the key idea in presmoothing is the nonparametric estimation of the function $m$, we have considered two different models, according to the shape of $m$. The first model is the one in Jácome and Iglesias-Pérez (2007), in which both the lifetime and censoring variables follow Weibull distributions, $Y \sim Weibull(0.4, 4)$ and $C \sim Weibull(0.3, 3)$. For the truncation distribution, an exponential with mean 1 has been chosen. This results in a censoring proportion of 29.5%, and the conditional probability of uncensoring is $m(y) = 1.264\, (1.264 + y^{-1})^{-1}$ (see Figure 1). The second model is that considered by Uzunogullari and Wang (1992). The

Fig. 1. Density function f (solid lines) and conditional probability m (dashed lines) for Model 1 (thin lines) and Model 2 (thick lines).

density function of interest is given by

\[ f(y) = \left( (y-1)^2 + 1 \right) \exp\left( -\frac{(y-1)^3}{3} - y - \frac{1}{3} \right) \quad \text{for } y \ge 0, \]

and both $C$ and $T$ are simulated from exponential distributions with means 4 and 0.1 respectively. Now, the censoring proportion is 14.8%, and the conditional probability of uncensoring is $m(y) = \left( (y-1)^2 + 1 \right) \left( (y-1)^2 + 1.25 \right)^{-1}$, a nearly constant function (see Figure 1).

In order to avoid boundary effects, the weight function $\omega$ discards 5% of the distribution in the upper tail for both models. Since in Model 2 there are high values of the density function $f$ in a neighborhood of zero, in that model the weight function also discards 25% of the distribution in the lower tail. The function $\omega$ has been chosen uniform in these supports. A total of m = 500 samples of size n = 50, 100 and 200 have been simulated. The Gaussian kernel was used wherever kernel-type estimation was required.

8.1 Efficiency of the presmoothed estimators

To compare the presmoothed estimators of the distribution and cumulative hazard functions with the classical ones, let us consider the mean integrated squared errors (MISE) for the df estimators $F_b^P$ and $F_n^{TJW}$ as follows:

Fig. 2. Relative efficiency, in terms of MISE, of the presmoothed estimator $F_b^P$ with respect to the PL estimator $F_n^{TJW}$ for Models 1-2 and sample sizes n = 50, 100 and 200. [Two panels, MODEL 1 and MODEL 2; horizontal axis: presmoothing bandwidth $b_n$; vertical axis: relative efficiency $RE_F$.]

Fig. 3. Relative efficiency, in terms of MISE, of the presmoothed estimator $\Lambda_b^P$ with respect to the classical estimator $\Lambda_n$ for Models 1-2 and sample sizes n = 50, 100 and 200. [Two panels, MODEL 1 and MODEL 2; horizontal axis: presmoothing bandwidth $b_n$; vertical axis: relative efficiency $RE_\Lambda$.]

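\[ MISE_{F^P}(b_n) = E\left[ \int \left( F_b^P - F \right)^2 \omega \right] \quad \text{and} \quad MISE_{F^{TJW}} = E\left[ \int \left( F_n^{TJW} - F \right)^2 \omega \right], \]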

and denote the MISE of the estimators of the cumulative hazard function by $MISE_{\Lambda^P}$ and $MISE_{\Lambda^{TJW}}$ respectively. Now, the relative efficiency of the presmoothed estimators with respect to the classical ones, depending on the presmoothing bandwidth $b_n$, is defined as follows:

\[ RE_F(b_n) = \frac{MISE_{F^P}(b_n)}{MISE_{F^{TJW}}} \quad \text{and} \quad RE_\Lambda(b_n) = \frac{MISE_{\Lambda^P}(b_n)}{MISE_{\Lambda^{TJW}}}. \tag{40} \]

Values of $RE(b_n)$ lower than 1 mean that the presmoothed estimator constructed with bandwidth $b_n$ is more efficient, in terms of MISE, than the classical one. In fact, the further below 1 the value is, the more efficient presmoothing is.
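The Monte Carlo approximation of the relative efficiencies in (40) can be sketched as follows. This is only an illustration under stated assumptions: the function names `weighted_ise` and `relative_efficiency` are hypothetical, the integral is approximated by the trapezoidal rule on a user-supplied grid, and the ISE values in the usage part are made-up numbers, not results from this study.

```python
from statistics import fmean


def weighted_ise(est, target, omega, grid):
    """Trapezoidal approximation of the weighted ISE: int (est - target)^2 * omega."""
    vals = [(est(y) - target(y)) ** 2 * omega(y) for y in grid]
    return sum(
        0.5 * (vals[i] + vals[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def relative_efficiency(ise_presmoothed, ise_classical):
    """Monte Carlo version of (40).

    ise_presmoothed: dict mapping each bandwidth b to the list of ISE values
                     of the presmoothed estimator over the simulated samples.
    ise_classical:   list of ISE values of the classical estimator.
    Returns a dict mapping b to the ratio of the two approximated MISEs.
    """
    mise_classical = fmean(ise_classical)
    return {b: fmean(v) / mise_classical for b, v in ise_presmoothed.items()}


if __name__ == "__main__":
    # Made-up ISE values for three samples and two bandwidths (illustration only).
    classical = [0.010, 0.012, 0.008]
    presmoothed = {0.5: [0.009, 0.010, 0.008], 2.0: [0.011, 0.013, 0.012]}
    for b, re in relative_efficiency(presmoothed, classical).items():
        print("b =", b, " RE =", round(re, 3))
```

Ratios below 1 in the returned dictionary correspond to bandwidths for which presmoothing is more efficient than the classical estimator.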

Fig. 4. Relative efficiency $RE_{MISE}$ in a grid of bandwidths (s, b) for Model 1 (left panel) and Model 2 (right panel). [Contour plots; horizontal axis: bandwidth s; vertical axis: bandwidth b.]

We have approximated the relative efficiencies in (40) by Monte Carlo using a total of m = 500 samples of sizes n = 50, 100 and 200, in a wide range of possible bandwidths $b_n$. These functions are plotted in Figure 2 for the distribution function and Figure 3 for the cumulative hazard function. Note that, for values of $b_n$ close to zero, the relative efficiency tends to 1, since in such a case the presmoothed estimators tend to the classical ones. As shown in Figures 2 and 3, the presmoothed estimators are more efficient than their classical counterparts for almost any bandwidth $b_n$ when n is small, although this gain in efficiency decreases as n grows. In any case, there is a wide range of bandwidths $b_n$ for which presmoothing yields better estimates regardless of the sample size n.

In order to assess the importance of a suitable choice of the bandwidths $s$ and $b$ for the presmoothed estimator of the density function, the MISE function was approximated in a grid of values (s, b) by the sample mean of the integrated squared error (ISE) calculated over m = 500 samples of size n = 50. The optimal MISE bandwidth of the density estimator $f_s^{TJW}$ was obtained by minimizing its corresponding MISE function. Finally, the relative efficiency of the presmoothed density estimator has been considered as follows:

\[ RE_{MISE}(s, b) = \frac{MISE\left( f^P_{s,b} \right)}{\min_{s>0} MISE\left( f_s^{TJW} \right)}. \tag{41} \]

Values of $RE_{MISE}$ (see Figure 4) lower than 1 mean that presmoothing with the corresponding bandwidths $s_n$ and $b_n$ is more efficient than estimating the density function with the classical estimator using its optimal bandwidth $s^{TJW}_{MISE}$. It is worth mentioning that, in both models, there is a wide range of bandwidths for which presmoothing gives more efficient estimates. Note that, for Model 2, the choice of the bandwidth $b_n$ seems not to be important. The reason is that the function $m$ is almost constant, and therefore any bandwidth $b_n$ yields good estimates of $m$, and hence good presmoothed estimates of $f$.

8.2 A bandwidth selector for the distribution function

As pointed out, presmoothed estimators are more efficient than the classical ones for a wide interval of values for $b_n$. Nevertheless, it is essential for the practical use of these estimators to use a proper bandwidth. Since the choice of the pilot bandwidths required for the plug-in selector proposed in Section 5 is out of the scope of this paper, we show the performance of the presmoothed estimators $\Lambda_b^P$ and $F_b^P$ with a slightly simplified plug-in bandwidth. It consists in estimating the unknown functions $m$, $m'$, $m''$ with a parametric estimator of the regression function $m$ and its first and second derivatives, and $h^*$ and $h^{*\prime}$ with a reference density and its first derivative. This makes it possible to avoid the problem of the pilot bandwidths.

If it is assumed that $m$ belongs to a parametric family, then $m(y) \equiv m(y, \theta_0)$, where $m(\cdot, \cdot)$ is a known continuous function and $\theta_0 = (\theta_{01}, \ldots, \theta_{0k})^T \in \Theta$ is an unknown parameter. For the estimation of the parameter $\theta_0$ we have used a maximum likelihood approach. Possible candidates for $m$ can be found in Cox and Snell (1989) and Dikta (1998). We have considered the logistic regression model, since it has become, in many fields, the standard method of analysis when the outcome variable is dichotomous. For $h^*$, the reference density was the Gaussian. So, from now on, we will consider the following plug-in bandwidth selector for both $b_\Lambda$ and $b_F$:

\[ \hat b = \left( \frac{e_K\, \hat Q}{2 d_K^2\, \hat A} \right)^{1/3} n^{-1/3}, \tag{42} \]

where

\[ \hat Q = \frac{1}{n} \sum_{i=1}^n \left( C_n(Z_i) + \frac{1}{n} \right)^{-1} \hat m(Z_i) \left( 1 - \hat m(Z_i) \right) \omega(Z_i), \]

\[ \hat A = \int_0^\infty \hat\alpha^2(y)\, \omega(y)\, dy \quad \text{with} \quad \hat\alpha(y) = \int_{a_H}^y \frac{\frac{1}{2} \hat m''(v)\, \hat h^*(v) + \hat m'(v)\, \hat h^{*\prime}(v)}{C_n(v)}\, dv. \]
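The pieces of (42) that do not involve numerical integration can be sketched as follows. This is a minimal illustration, not the implementation used in the simulations: the function names are hypothetical, $C_n(y)$ is taken to be the empirical proportion of observations with $T_i \le y \le Z_i$, the fitted function `m_hat` and the weight `omega` are supplied by the user, and the kernel constants $e_K$, $d_K$ and the value of $\hat A$ are passed as plain numbers because their definitions lie outside this section.

```python
def C_n(y, T, Z):
    """Empirical C_n(y): proportion of observations with T_i <= y <= Z_i."""
    return sum(1 for t, z in zip(T, Z) if t <= y <= z) / len(Z)


def Q_hat(T, Z, m_hat, omega):
    """Q-hat in (42): mean of m(1 - m) * omega / (C_n + 1/n) over the Z_i."""
    n = len(Z)
    total = 0.0
    for z in Z:
        mz = m_hat(z)
        total += mz * (1.0 - mz) * omega(z) / (C_n(z, T, Z) + 1.0 / n)
    return total / n


def plug_in_bandwidth(n, e_K, d_K, Q, A):
    """Bandwidth (42): (e_K * Q / (2 * d_K^2 * A))^(1/3) * n^(-1/3)."""
    return (e_K * Q / (2.0 * d_K ** 2 * A)) ** (1.0 / 3.0) * n ** (-1.0 / 3.0)


if __name__ == "__main__":
    # Tiny artificial sample (illustration only): truncation times T, observed times Z.
    T = [0.0, 0.0, 0.0, 0.0]
    Z = [1.0, 2.0, 3.0, 4.0]
    Q = Q_hat(T, Z, m_hat=lambda y: 0.5, omega=lambda y: 1.0)
    # e_K, d_K and A below are placeholder numbers, not values from the paper.
    print(plug_in_bandwidth(n=len(Z), e_K=0.28, d_K=1.0, Q=Q, A=0.1))
```

In practice `m_hat` would be the fitted logistic regression curve and `A` the result of integrating $\hat\alpha^2 \omega$ numerically.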

Table 1. Mean integrated squared error (MISE) of the PL estimator $F_n^{TJW}$, the semiparametric estimator $F_n^{SZ}$ and the presmoothed estimator $F_b^P$, for Models 1 and 2, and sample sizes n = 50 and n = 100. The presmoothed estimator has been computed with a plug-in bandwidth selector.

                      MISE F_n^TJW       MISE F_n^SZ        MISE F_b^P
Model 1   n = 50      9.49798 × 10^-3    8.62079 × 10^-3    8.147552 × 10^-3
          n = 100     4.74977 × 10^-3    4.31943 × 10^-3    4.22462 × 10^-3
Model 2   n = 50      1.13625 × 10^-2    1.06319 × 10^-2    1.03884 × 10^-2
          n = 100     6.33151 × 10^-3    6.06153 × 10^-3    5.95351 × 10^-3

Table 2. Mean integrated squared error (MISE) of the classical estimator $\Lambda_n^{TJW}$, the semiparametric estimator $\Lambda_n^{SZ}$ and the presmoothed estimator $\Lambda_b^P$, for Models 1-2, and sample sizes n = 50 and n = 100. The presmoothed estimator has been computed with a plug-in bandwidth selector.

                      MISE Λ_n^TJW    MISE Λ_n^SZ    MISE Λ_b^P
Model 1   n = 50      0.59043         0.58223        0.56943
          n = 100     0.53530         0.53007        0.52138
Model 2   n = 50      0.24941         0.23605        0.23296
          n = 100     0.13528         0.13004        0.12559

In order to investigate the practical performance of this plug-in bandwidth selector, we approximated the MISE of the presmoothed estimators $\Lambda_b^P$ and $F_b^P$ using 500 samples of size n = 50 and n = 100 drawn from Models 1-2, and the MISE of the classical estimators $\Lambda_n^{TJW}$ and $F_n^{TJW}$. For comparison, the semiparametric estimators proposed by Sun and Zhu (2000) have also been considered. To compute $\Lambda_n^{SZ}$ and $F_n^{SZ}$, the logistic regression was chosen as the parametric estimator of $m$.

Fig. 5. Mean squared error of the PL estimator $F_n^{TJW}$, the semiparametric $F_n^{SZ}$ with a logistic model for m, and the presmoothed estimator $F_b^P$ with a plug-in bandwidth selector for Models 1-2 and sample size n = 50. [Two panels, MODEL 1 and MODEL 2; curves: Product-limit, Semiparametric, Presmoothed.]

Table 1 shows the MISE of the PL estimator $F_n^{TJW}$, the semiparametric estimator $F_n^{SZ}$ and the presmoothed estimator $F_b^P$ with the plug-in bandwidth selector (42). It should be observed that the presmoothed estimator behaves better than any of the others for every model and sample size. In fact, it is, in terms of MISE, about 11.1-14.2% and 6.0-8.6% more efficient than $F_n^{TJW}$ for Models 1 and 2 respectively, and 2.2-5.5% and 1.8-2.2% more efficient than $F_n^{SZ}$. Note that the amount of improvement depends on the sample size (it is lower when n increases). Table 2 contains similar results for the cumulative hazard function.
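The efficiency gains quoted above are simply one minus the ratio of the MISE values in Table 1. The short sketch below recomputes them from the tabulated figures; the dictionary layout and the function name `gain` are illustrative choices, and the printed percentages should come out close to the ranges given in the text, up to rounding.

```python
# MISE values copied from Table 1: (TJW, SZ, presmoothed) for each model and n.
TABLE_1 = {
    ("Model 1", 50): (9.49798e-3, 8.62079e-3, 8.147552e-3),
    ("Model 1", 100): (4.74977e-3, 4.31943e-3, 4.22462e-3),
    ("Model 2", 50): (1.13625e-2, 1.06319e-2, 1.03884e-2),
    ("Model 2", 100): (6.33151e-3, 6.06153e-3, 5.95351e-3),
}


def gain(mise_reference, mise_presmoothed):
    """Relative reduction in MISE achieved by the presmoothed estimator."""
    return 1.0 - mise_presmoothed / mise_reference


if __name__ == "__main__":
    for (model, n), (tjw, sz, pre) in TABLE_1.items():
        print(
            model, "n =", n,
            " gain vs TJW: {:.1%}".format(gain(tjw, pre)),
            " gain vs SZ: {:.1%}".format(gain(sz, pre)),
        )
```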

Finally, Figure 5 presents the mean squared error (MSE) of the three df estimators, approximated on a grid of points by Monte Carlo using 500 samples of size n = 50. The pointwise performance of $F_b^P$ relative to the PL and semiparametric estimators is quite acceptable. Specifically, the presmoothed estimator has a lower MSE than $F_n^{TJW}$ in the complete interval of computation. The improvement with respect to $F_n^{SZ}$ is clear for inner points, while it vanishes at the boundary of the interval. In summary, the good performance of the presmoothed estimators suggests that presmoothing is a competitive method that outperforms the classical estimators in this context.

8.3 A bandwidth selector for the presmoothed density estimator

A plug-in bandwidth selector for $f^P_{s,b}$ is expected to give quite acceptable results in practice. The classical kernel density estimator $f_s^{TJW}$ with the plug-in bandwidth selector $\hat s^{TJW}$ in Sánchez-Sellero et al. (1999) has been considered for comparison.

The plug-in bandwidth selector for $f^P_{s,b}$ can be obtained, as explained in Section 7, by replacing the integrals in $c_1$, $c_2$ and $b_0$ by estimates of them. The function $c_2$ can be estimated empirically, as in Sánchez-Sellero et al. (1999). But, for $c_1$, the functions $f''$ and $[(1-F)\alpha]'$ have to be approximated, which implies that several pilot bandwidths are required. For $f''$, consider the pilot bandwidth $g$ given in Theorem 2.3 in Sánchez-Sellero et al. (1999), since it is the one minimizing the mean squared error of the curvature of the true density (integrated squared second derivative). Given that the choice of a pilot bandwidth for the function $[(1-F)\alpha]'$ is out of the scope of this paper, in this simulation study we have estimated $1 - F$ by means of the presmoothed estimator of the survival function, $1 - F^P_{g_1}$, with a bandwidth $g_1$ obtained by a cross-validation procedure. With respect to $\alpha$, we have considered

\[ \alpha_n(y) = \int_{a_H}^y \frac{\frac{1}{2} \hat m''(v)\, \hat h^*(v) + \hat m'(v)\, \hat h^{*\prime}(v)}{C_n(v)}\, dv, \]

where $\hat m$ and $\hat h^*$ are parametric estimates of $m$ and $h^*$, respectively, and $C_n(y)$ is the empirical estimator of the function $C$ in (3). In particular, let $\hat m$ be the maximum likelihood polynomial regression estimator of degree two, and $\hat h^*$ the density estimator under a normal distribution. Finally, the integral $\int f^2 \left( q_1 + q_1(a_H) \right) \omega$ in $b_0$ has been approximated by

\[ \frac{1}{n} \sum_{i=1}^n \frac{\hat f^2(Z_i) \left( 1 - \hat m(Z_i) \right)}{C_n(Z_i)}\, \delta_i\, \omega(Z_i) \]

with $\hat f$ the presmoothed density estimator with bandwidths $\hat s^{TJW}$ and $g_1$.

Table 3. Integrated Squared Error (ISE) of the classical and presmoothed density estimators with the corresponding plug-in bandwidth selector.

                     MODEL 1                      MODEL 2
                     Classical    Presmoothed     Classical    Presmoothed
Mean                 0.0184       0.0174          0.0269       0.0262
Percentile 25        0.0084       0.0081          0.0154       0.0148
Percentile 50        0.0150       0.0139          0.0241       0.0232
Percentile 75        0.0236       0.0234          0.0341       0.0322

To show the behavior of this bandwidth selector for $f^P_{s,b}$, we have computed the ISE of the density estimators $f^P_{s,b}$ and $f_s^{TJW}$ for a total of m = 500 samples of size n = 50 with the corresponding plug-in bandwidths. The results can be seen in Table 3. The most remarkable fact is that, although the above-mentioned plug-in bandwidth selector for $f^P_{s,b}$ can be improved, the ISE of the presmoothed estimator is lower than the ISE of the classical density estimator for most samples, that is, presmoothing using this plug-in bandwidth selector is more efficient than using the classical density estimator. These results illustrate the advantage of the presmoothing procedure, and suggest considering the presmoothed density estimator with a plug-in bandwidth selector as a competitive method when estimating the density function with data subject to left truncation and right censoring.

9

Real data analysis

The new density estimator for LTRC data introduced in this paper has been applied to a real-life example where both censoring and truncation are present. The problem, which has been analyzed in several studies (see Andersen et al. [1], p. 14, among others), concerns the mortality of the population in Fyn county (Denmark) suffering from insulin-dependent diabetes mellitus. The variable of interest, $Y$, is the lifetime of the patient, in this case, the time from diagnosis until the patient dies.

The data set corresponds to 1499 patients, 783 males and 716 females, reported on July 1st, 1973. It was obtained by recording all insulin prescriptions in the National Health Service files for five months (including the date above), and the survival status of each patient was assessed by January 1st, 1982. For this reason, we can only observe data of patients alive at the start of the study, and

Fig. 6. Classical (dashed lines) and presmoothed (solid lines) estimates of the density function of the survival time (in years) for male (thick lines) and female (thin lines) patients.

the information regarding the patients not alive or lost before that time is not available. This gives rise to left truncation, which produces, if it is not taken into account, a biased estimation of the density function. On the other hand, the lifetime may be subject to right censoring, mainly because at the end of the study there may be patients still alive. Hence, with the above-mentioned notation, the censoring variable $C$ is the time from the diagnosis to the end of follow-up, $Z$ is the observed survival time and $\delta$ indicates the survival status. The global censoring proportion is 67.24%: 3 patients because of loss to follow-up, and 1005 because they were alive at the end of the study. The truncation variable is $T$, the time from the diagnosis of the diabetes to the entry into the study.

Using a Gaussian kernel, we have computed the presmoothed density estimator, presented in Section 4, for the lifetime of male and female patients separately. The reason is that gender has a strong influence on the survival time (males have a higher mortality than females, see Iglesias-Pérez (2003)). Both density estimates are shown in Figure 6. The solid line shows the presmoothed density estimator $f^P_{s,b}$ with the plug-in bandwidth selector introduced in Section 7, and the dashed line the classical estimator, $f_s^{TJW}$, with the plug-in bandwidth selector proposed by Sánchez-Sellero et al. (1999).

Both estimates of the density function, regardless of gender, are similar. The reason is that the presmoothing bandwidth is quite small ($\hat b^P_{Male} = 0.6212$ and $\hat b^P_{Female} = 0.1614$), due, in part, to the large sample size. When the bandwidth $b_n$ is close to zero, the presmoothed estimator $f^P_{s,b}$ reduces to the classical estimator $f_s^{TJW}$.

Nevertheless, the effect of presmoothing becomes clear on closer inspection, mainly in the right tail, where the presmoothed estimator takes larger values. The product-limit estimator $F_n^{TJW}$, on which $f_s^{TJW}$ is based, does not take value 1 at the right tail if the last observation is censored. The presmoothed estimator $F_b^P$ is also a step function, but with jumps located at any of the observations, not only at the uncensored ones. This makes the presmoothed estimator $F_b^P$ take values closer to 1 at the right tail, which has important implications when estimating the density function. This is the reason why $f^P_{s,b}$ describes the tail behavior better.
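The difference in tail behavior can be illustrated with a small sketch. It is written under explicit assumptions rather than taken from the paper's software: the presmoothed estimator is assumed to have the product form $1 - F_b^P(y) = \prod_{Z_i \le y} \left( 1 - m_b(Z_i) / (n C_n(Z_i)) \right)$, with $m_b$ a Nadaraya-Watson estimate of $m$ using a Gaussian kernel, the TJW estimator is obtained by replacing $m_b(Z_i)$ with $\delta_i$, there are no ties, and every observation satisfies $T_i \le Z_i$. All function names are hypothetical.

```python
import math


def risk_set_size(y, T, Z):
    """n * C_n(y): number of observations with T_j <= y <= Z_j."""
    return sum(1 for t, z in zip(T, Z) if t <= y <= z)


def nadaraya_watson(y, Z, delta, b):
    """Gaussian-kernel Nadaraya-Watson estimate of m(y) = P(delta = 1 | Z = y)."""
    weights = [math.exp(-0.5 * ((y - z) / b) ** 2) for z in Z]
    return sum(w * d for w, d in zip(weights, delta)) / sum(weights)


def product_limit(T, Z, delta, b=None):
    """Distribution function estimate evaluated at the ordered observed times.

    b=None uses the censoring indicators (TJW product-limit estimator);
    b>0 replaces them with the presmoothed values m_b(Z_i).
    """
    order = sorted(range(len(Z)), key=lambda i: Z[i])
    surv, times, F = 1.0, [], []
    for i in order:
        weight = delta[i] if b is None else nadaraya_watson(Z[i], Z, delta, b)
        surv *= 1.0 - weight / risk_set_size(Z[i], T, Z)
        times.append(Z[i])
        F.append(1.0 - surv)
    return times, F


if __name__ == "__main__":
    # Artificial sample whose largest observation is censored (illustration only).
    T = [0.0, 0.0, 0.0, 0.0, 0.0]
    Z = [1.0, 2.0, 3.0, 4.0, 5.0]
    delta = [1, 1, 1, 1, 0]
    print("TJW:        ", product_limit(T, Z, delta)[1])
    print("Presmoothed:", product_limit(T, Z, delta, b=1.0)[1])
```

In this artificial sample the TJW estimate stays flat at the last (censored) observation, whereas the presmoothed estimate still jumps there, which is the mechanism described in the paragraph above.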

10

Appendix. Proof of the main results

Proof of Theorem 1 The term $\Lambda_b^P(y) - \Lambda(y)$ can be decomposed as follows:

\[ \Lambda_b^P(y) - \Lambda(y) = P_1(y) - P_2(y) + P_3(y) + R_1(y) + R_2(y) + R_3(y), \tag{43} \]

where the summands that represent the main part of the iid representation are

\[
\begin{aligned}
P_1(y) &= \int_{a_H}^y \frac{m(v)}{C(v)}\, d\left( H_n^*(v) - H^*(v) \right), \\
P_2(y) &= \int_{a_H}^y \frac{m(v)}{C^2(v)} \left( C_n(v) - C(v) \right) dH^*(v), \\
P_3(y) &= \int_{a_H}^y \frac{m_b(v) - m(v)}{C(v)}\, dH_n^*(v).
\end{aligned}
\]

The other terms will be proved to be negligible:

\[
\begin{aligned}
R_1(y) &= \int_{a_H}^y \frac{m(v)}{C^2(v)} \left( C(v) - C_n(v) \right) d\left( H_n^* - H^* \right)(v), \\
R_2(y) &= \int_{a_H}^y m(v) \left( C_n(v) - C(v) \right) \left( \frac{1}{C^2(v)} - \frac{1}{C_n(v)\, C(v)} \right) dH_n^*(v), \\
R_3(y) &= \int_{a_H}^y \left( m_b(v) - m(v) \right) \left( \frac{dH_n^*(v)}{C_n(v)} - \frac{dH_n^*(v)}{C(v)} \right).
\end{aligned}
\]

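For $R_1$, U-statistics theory gives, under condition (Int1),

\[ \sup_{a_H \le y \le \tau} |R_1(y)| = O\left( \frac{(\ln n)^{1/2+\varepsilon}}{n} \right) \text{ a.s. for some } \varepsilon > 0. \tag{44} \]

The function $C$ in (2) can be decomposed as a difference of two df, hence:

\[ \sup_{a_H \le y \le b_H} |C_n(y) - C(y)| = O\left( n^{-1/2} (\ln \ln n)^{1/2} \right) \text{ a.s.} \tag{45} \]

Applying (45), Lemma A.1 in Zhou and Yip (1999) and the SLLN, under condition (Int1) the term $R_2$ satisfies

\[ \sup_{a_H \le y \le \tau} |R_2(y)| = O\left( \frac{\ln \ln n}{n} \ln n \right) \text{ a.s.} \tag{46} \]

With respect to $R_3$, applying Theorem B in Mack and Silverman (1982), Lemma A.1 in Zhou and Yip (1999) and (45), we have

\[ \sup_{a_H \le y \le \tau} |R_3(y)| = O\left( \left( b_n^2 + \left( \frac{\ln n}{n} \right)^{1/2} \right) \left( \frac{\ln \ln n}{n} \right)^{1/2} \ln n \right) \text{ a.s.} \tag{47} \]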

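For the functions $P_1$, $P_2$ and $P_3$, it is easy to show that

\[ P_1(y) - P_2(y) = \frac{1}{n} \sum_{i=1}^n \left( g_1^P(y, Z_i) - g_2^P(y, Z_i, T_i) \right), \]

\[ P_3(y) = \frac{1}{n} \sum_{i=1}^n g_3^P(y, Z_i, \delta_i) + R_4(y), \]

with $g_1^P$, $g_2^P$ and $g_3^P$ given in (14)-(16), and

\[
\begin{aligned}
R_4(y) &= \int_{a_H}^y \frac{1}{n} \sum_{j=1}^n K_{b_n}(v - Z_j)\, \frac{\delta_j - m(v)}{C(v)\, h^*(v)}\, d\left( H_n^* - H^* \right)(v) \\
&\quad - \int_{a_H}^y \frac{\left( m_b(v) - m(v) \right) \left( h_n^*(v) - h^*(v) \right)}{C(v)\, h^*(v)}\, dH_n^*(v) = R_{41}(y) + R_{42}(y).
\end{aligned}
\]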

For the second term in $R_4$, under conditions (K), (h1), (h2), (m1), (b1) and (Int2), applying Lemma 1 and Theorem B in Mack and Silverman (1982) and the SLLN, we have

\[ \sup_{a_H \le y \le \tau} |R_{42}(y)| = O\left( \left( b_n^2 + \left( \frac{\ln n}{n} \right)^{1/2} \right)^2 \right) \text{ a.s.} \tag{48} \]

In order to handle the first term in $R_4$, we use U-statistics theory and arguments similar to those in the proof of Theorem 3.4 in Jácome and Cao (2007). Hence, it can be shown that

ln n (ln ln n)1/2 b2 + O sup |R41 (y)| = O n n n3/2 bn aH ≤y≤τ

!1/2 +O

ln ln n 1/2

nbn

!

a.s. (49)

Collecting (44) and (46)-(49), under condition (b1) it follows that

\[ \sup_{a_H \le y \le \tau} |R_1(y) + R_2(y) + R_3(y) + R_4(y)| = O\left( \left( b_n^2 + \left( \frac{\ln n}{n b_n} \right)^{1/2} \right)^2 \right) \text{ a.s.} \]

This concludes the proof.

Proof of Theorem 2 Consider the decomposition (43). Condition (Int1) and the law of the iterated logarithm give

\[ \sup_{a_H \le y \le \tau} |P_1 + P_2| = O\left( \left( \frac{\ln \ln n}{n} \right)^{1/2} \right) \text{ a.s.} \]

For $P_3$, we apply Theorem B in Mack and Silverman (1982) and, under condition (Int3), the SLLN:

\[ \sup_{a_H \le y \le \tau} |P_3| = O\left( b_n^2 + \left( \frac{\ln n}{n b_n} \right)^{1/2} \right) \text{ a.s.} \]

Both rates, together with (44), (46) and (47), prove the theorem.

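Proof of Proposition 3 The expression of the bias comes from

\[ E\left[ g_1^P(y, Z) \right] = E\left[ g_2^P(y, Z, T) \right] = \int_{a_H}^y \frac{m(v)}{C(v)}\, dH^*(v), \tag{50} \]

\[ E\left[ g_3^P(y, Z, \delta) \right] = \alpha(y)\, d_K\, b_n^2 + o\left( b_n^2 \right). \tag{51} \]

For the variance, straightforward calculations give

\[
\begin{aligned}
Var\left[ g_1^P(y, Z) \right] &= \int_{a_H}^y \frac{m^2(u)}{C^2(u)}\, dH^*(u) - \left( \int_{a_H}^y \frac{m(v)}{C(v)}\, dH^*(v) \right)^2, \\
Var\left[ g_3^P(y, Z, \delta) \right] &= \int_{a_H}^y q_1(v)\, dv - 2 b_n e_K \left( q_1(y) + q_1(a_H) \right) + O\left( b_n^2 \right), \\
E\left[ g_2^2(y, Z, T) \right] &= 2 E\left( g_1(y, Z)\, g_2(y, Z, T) \right), \\
Cov\left( g_1(y, Z), g_3(y, Z, \delta) \right) &= O\left( b_n^2 \right), \\
Cov\left( g_2(y, Z, T), g_3(y, Z, \delta) \right) &= O\left( b_n^2 \right).
\end{aligned}
\]

This completes the proof.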

The iid representation of $F_b^P - F$, given in Theorem 5, is based on that of $\Lambda_b^P - \Lambda$ through an exponential transformation. For that purpose, a slight change in $F_b^P$ must be carried out. The next lemma proves that this change is negligible.

Lemma Consider

\[ 1 - \tilde F_b^P(y) = \prod_{i: Z_i \le y} \left( 1 - \frac{m_b(Z_i)}{n C_n(Z_i) + 1} \right). \]

Under the condition $\int_{a_H}^\tau \frac{dH_1^*(v)}{C^2(v)} < \infty$, then

\[ \sup_{a_H \le y \le \tau} \left| F_b^P(y) - \tilde F_b^P(y) \right| = O\left( n^{-1} (\ln n)^2 \right). \tag{52} \]

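Proof. The absolute value can be bounded as follows:

\[
\begin{aligned}
\left| F_b^P(y) - \tilde F_b^P(y) \right| &\le \sum_{i: Z_i \le y} \frac{m_b(Z_i)}{\left( n C_n(Z_i) + 1 \right) n C_n(Z_i)} \le \sum_{i: Z_i \le y} \frac{m_b(Z_i)}{n^2 C_n^2(Z_i)} \\
&\le \frac{1}{n} \left( \max_{a_H \le Z_i \le y} \frac{C(Z_i)}{C_n(Z_i)} \right)^2 \int_{a_H}^y \frac{m_b(v)}{C^2(v)}\, dH_n^*(v).
\end{aligned}
\]

The SLLN and Lemma A.1 in Zhou and Yip (1999) prove the lemma.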

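Proof of Theorem 4 Consider the following decomposition:

\[
\begin{aligned}
\tilde F_b^P(y) - F(y) &= (1 - F(y)) \left( \Lambda_b^P(y) - \Lambda(y) \right) - \frac{1}{2} \left( \Lambda_b^P(y) - \Lambda(y) \right)^2 e^{-\Lambda_{n1}(y)} \\
&\quad - \left[ \ln\left( 1 - \tilde F_b^P(y) \right) + \Lambda_b^P(y) \right] e^{-\Lambda_{n2}(y)},
\end{aligned}
\tag{53}
\]

with

\[ \min\left\{ \Lambda_b^P(y), \Lambda(y) \right\} \le \Lambda_{n1}(y) \le \max\left\{ \Lambda_b^P(y), \Lambda(y) \right\}, \]

\[ \min\left\{ \Lambda_b^P(y), -\ln\left( 1 - \tilde F_b^P(y) \right) \right\} \le \Lambda_{n2}(y) \le \max\left\{ \Lambda_b^P(y), -\ln\left( 1 - \tilde F_b^P(y) \right) \right\}. \]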

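The first term on the right-hand side of (53) yields the dominant part of the iid decomposition of $F_b^P(y) - F(y)$. Taking into account (52), it suffices to show that the last two terms in (53) are negligible. The terms $\exp[-\Lambda_{n1}(y)]$ and $\exp[-\Lambda_{n2}(y)]$ are bounded. So, using (17), we have for the second term in (53):

\[ \sup_{a_H \le y \le \tau} \frac{1}{2} \left( \Lambda_b^P(y) - \Lambda(y) \right)^2 \exp\left[ -\Lambda_{n1}(y) \right] = O\left( (n b_n)^{-1} (\ln n)^3 \right) \text{ a.s.} \tag{54} \]

Applying a Taylor expansion, the third term in (53) satisfies

\[ \left| \ln\left( 1 - \tilde F_b^P(y) \right) + \Lambda_b^P(y) \right| \le \frac{1}{n} \sup_{i: Z_{(i)} \le y} \left( \frac{C\left( Z_{(i)} \right)}{C_n\left( Z_{(i)} \right)} \right)^2 \int_{a_H}^y \frac{m_b(v)}{C^2(v)}\, dH_n^*(v). \]

By Lemma A.1 in Zhou and Yip (1999) and the SLLN, we have:

\[ \sup_{a_H \le y \le \tau} \left| \ln\left( 1 - \tilde F_b^P(y) \right) + \Lambda_b^P(y) \right| = O\left( n^{-1} (\ln n)^2 \right). \tag{55} \]

The proof is finished by collecting (54) and (55).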

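Proof of Theorem 10 Using integration by parts, we have

\[
\begin{aligned}
f^P_{s,b}(y) &= \frac{1}{s_n} \int K\left( \frac{y - v}{s_n} \right) d\left( F_b^P - F \right)(v) + \frac{1}{s_n} \int K\left( \frac{y - v}{s_n} \right) dF(v) \\
&= \frac{1}{s_n} \int \left[ F_b^P(y - s_n v) - F(y - s_n v) \right] K'(v)\, dv + f(y) + \beta_n^P(y),
\end{aligned}
\]

and therefore

\[ \sup_{a_H \le y \le \tau} \left| f^P_{s,b}(y) - f(y) \right| \le \frac{1}{s_n} \sup_{a_H \le y \le \tau} \left| F_b^P(y) - F(y) \right| \int |K'(v)|\, dv + \sup_{a_H \le y \le \tau} \left| \beta_n^P(y) \right|. \]

The rate follows from $\sup_{a_H \le y \le \tau} |\beta_n^P(y)| = O(b_n^2)$ and Theorem 6 in Jácome and Iglesias-Pérez (2007), which states

\[ \sup_{a_H \le y \le \tau} \left| F_b^P(y) - F(y) \right| = O\left( b_n^2 + \left( \frac{\ln n}{n b_n} \right)^{1/2} \right) \text{ a.s.} \]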

Proof of Proposition 11 Consider $\bar f^P_{s,b}(y) = f(y) + \beta_n^P(y) + \sigma_n^P(y)$, with $\beta_n^P$ and $\sigma_n^P$ given in (23) and (24), the iid representation of $f^P_{s,b}$. The derivation of the bias is straightforward, taking into account (50) and (51). For the variance, we proceed as follows:

\[
\begin{aligned}
Var\left( \bar f^P_{s,b}(y) \right) = Var\Bigg( &-\frac{1}{n} \sum_{i=1}^n \omega_1(y, Z_i) + \frac{1}{n} \sum_{i=1}^n \omega_2(y, Z_i, T_i) - \frac{1}{n} \sum_{i=1}^n \omega_3(y, Z_i, \delta_i) \\
&+ \frac{1}{n} \sum_{i=1}^n \rho_1(y, Z_i) - \frac{1}{n} \sum_{i=1}^n \rho_2(y, Z_i, T_i) + \frac{1}{n} \sum_{i=1}^n \rho_3(y, Z_i, \delta_i) \Bigg)
\end{aligned}
\tag{56}
\]

with, for j = 1, 2, 3,

\[ \omega_j(y, \cdot) = \int K_s(y - v)\, f(v)\, g_j(v, \cdot)\, dv, \qquad \rho_j(y, \cdot) = \int K_s(y - v)\, (1 - F(v))\, dg_j(v, \cdot). \]

For the asymptotic expression of the variance of $\bar f^P_{s,b}$, we have used the following lemmas.

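Lemma With the LTRC notation,

\[ E\left[ 1\{T \le x \le Z\}\, 1\{T \le y \le Z\} \mid T \le Z \right] = \alpha^{-1} \left( 1 - H\left( (x \vee y)^- \right) \right) L(x \wedge y), \]

where $x \vee y = \max(x, y)$ and $x \wedge y = \min(x, y)$.

Lemma With the LTRC notation, under assumption (I),

\[ E\left[ \frac{m(Z)}{C(Z)}\, 1\{T \le x \le Z\}\, 1\{Z \le y\} \,\Big|\, T \le Z \right] = \alpha^{-1} L(x)\, 1\{x \le y\} \int_x^y \frac{m(u)}{C(u)}\, h(u)\, du. \]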

Proof. First, we derive the following relation:

\[ P\left[ T \le t, Z \le z \mid T \le Z \right] = \alpha^{-1} P\left[ T \le t, Z \le z, T \le Z \right] \tag{57} \]

\[ = \alpha^{-1} \int_0^t \int_0^z 1\{t' \le z'\}\, dF_{T,Z}(t', z'), \tag{58} \]

where $F_{T,Z}(t, z)$ denotes the df of $(T, Z)$. So, the expectation is

\[ E\left[ \frac{m(Z)}{C(Z)}\, 1\{T \le x \le Z\}\, 1\{Z \le y\} \,\Big|\, T \le Z \right] = \alpha^{-1} E\left[ \frac{m(Z)}{C(Z)}\, 1\{T \le x\}\, 1\{x \le Z \le y\} \right] \]

and, under assumption (I), we obtain that

\[ E\left[ \frac{m(Z)}{C(Z)}\, 1\{T \le x\}\, 1\{x \le Z \le y\} \right] = L(x)\, 1\{x \le y\} \int_x^y \frac{m(u)}{C(u)}\, h(u)\, du. \]

Lemma With the LTRC notation, under assumption (I),

\[ E\left[ K_s(y - Z)\, (1 - F(Z))\, \frac{m(Z)}{C(Z)}\, 1\{T \le x \le Z\} \,\Big|\, T \le Z \right] = \alpha^{-1} L(x) \int K_s(y - u)\, (1 - F(u))\, \frac{m(u)}{C(u)}\, h(u)\, 1\{x \le u\}\, du. \]

Proof. It is immediate by applying (58) and assumption (I).

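Lemma With the LTRC notation, under assumption (I),

\[ E\left[ K_{b_n}(y - Z)\, (\delta - m(y))\, 1\{T \le x \le Z\} \mid T \le Z \right] = \alpha^{-1} b_n L(x)\, m'(y)\, h(y) \int 1\{w \le (y - x)/b_n\}\, w K(w)\, dw + O\left( b_n^2 \right). \]

Proof. Using the function

\[ m(z, t) = E\left[ \delta \mid T \le Z, Z = z, T = t \right] = \frac{P\left[ \delta = 1, T \le Z \mid Z = z, T = t \right]}{P\left[ T \le Z \mid Z = z, T = t \right]} = 1\{t \le z\}\, P\left[ \delta = 1 \mid Z = z, T = t \right] \overset{(I)}{=} 1\{t \le z\}\, P\left[ \delta = 1 \mid Z = z \right] \]

and

\[ m(z) = P\left[ \delta = 1 \mid T \le Z, Z = z \right] = \frac{P\left[ \delta = 1, T \le Z \mid Z = z \right]}{P\left[ T \le Z \mid Z = z \right]} \overset{(I)}{=} \frac{L(z)\, P\left[ \delta = 1 \mid Z = z \right]}{L(z)}, \]

we obtain, under assumption (I), that $m(z, t) = 1\{t \le z\}\, m(z)$. This latter expression, together with (58), assumption (I) and a Taylor expansion, allows us to write

\[
\begin{aligned}
E&\left[ K_{b_n}(y - Z)\, (\delta - m(y))\, 1\{T \le x \le Z\} \mid T \le Z \right] \\
&= E\left[ K_{b_n}(y - Z)\, (m(Z, T) - m(y))\, 1\{T \le x \le Z\} \mid T \le Z \right] \\
&= \alpha^{-1} \int K_{b_n}(y - u) \int (m(u, t) - m(y))\, 1\{t \le x \le u\}\, dL(t)\, h(u)\, du \\
&= \alpha^{-1} \int K_{b_n}(y - u) \int \left[ m(u)\, 1\{t \le u\} - m(y) \right] 1\{t \le x \le u\}\, dL(t)\, h(u)\, du \\
&= -\alpha^{-1} L(x) \int K(w)\, (m(y - b_n w) - m(y))\, h(y - b_n w)\, 1\{x \le y - b_n w\}\, dw \\
&= b_n \alpha^{-1} L(x)\, m'(y)\, h(y) \int w K(w)\, 1\{w \le (y - x)/b_n\}\, dw + O\left( b_n^2 \right),
\end{aligned}
\]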

which concludes the proof. As a consequence of the previous lemmas, together with some changes of variable, Taylor expansions and applications of assumptions such as the symmetry of the kernel K, standard although tedious and long-winded calculations yield the asymptotic expressions of the variances and covariances of the functions ωj and ρj for j = 1, 2, 3: Z

Z

2 y m m2 ∗ 2 ∗ h − f (y) V ar (ω1 (y, Z)) = f (y) h + O (sn ) , aH C aH C 2 ! Z " 2 # Z y y m mh∗ Z v mh∗ 2 −1 ∗ L − V ar (ω2 (y, Z, T )) = f (y) 2α h + O (sn ) , aH C 2 aH C aH C 2

2

V ar (ω3 (y, Z, δ)) = f (y)

Z

y

y

aH

q1 − 2bn f 2 (y) (q1 (y) + q1 (aH )) eK

m2 (y) ∗ 1 h (y) cK V ar (ρ1 (y, Z)) = (1 − F (y))2 2 sn C (y) m2 (y) ∗ 2 h (y) + O (sn ) , − (1 − F (y))2 2 C (y) !2

1 − F (y) ∗ h (y) m (y) V ar (ρ2 (y, Z, T )) = C (y) 34

!

L (y) (1 − H (y)) − 1 + O (sn ) , C 2 (y)

1 (1 − F (y))2 q1 (y) AK (L) + O (sn ) + o (bn ) s n if bn /sn → L ≥ 0

V ar (ρ3 (y, Z, δ)) = 1 (1 − F (y))2 q1 (y) cK + O (bn ) bn if bn /sn → ∞

For the covariances, we have

2

Cov (ω1 (y, Z) , ω2 (y, Z, T )) = f (y) 2

Z

y

aH

−f (y)

! m Z v mh∗ h L C aH C 2

Z

y

aH

mh∗ dv C

!2

Cov (ω1 (y, Z) , ρ1 (y, Z)) = f (y) (1 − F (y)) h∗ (y)

+ O (sn ) , m (y) C (y)

!

Z y m ∗ m (y) − h + O (sn ) , × 2C (y) aH C m (y) ∗ Cov (ω1 (y, Z) , ρ2 (y, Z, T )) = −f (y) (1 − F (y)) h (y) C (y) Z y mh∗ + O (sn ) , × aH C m (y) ∗ h (y) Cov (ω2 (y, Z, T ) , ρ1 (y, Z)) = f (y) (1 − F (y)) C (y) ! Z y Z y mh∗ mh∗ −1 × α + O (sn ) , L− aH C 2 aH C m (y) h∗ (y) Cov (ω2 (y, Z, T ) ρ2 (y, Z, T )) = f (y) (1 − F (y)) C (y) ! Z y Z y ∗ ∗ 1 − H (y) mh m L− × α−1 h∗ + O (sn ) , C (y) aH C 2 aH C 1 f (y) (1 − F (y)) q1 (y) + O (sn ) + o (bn ) 2

if bn /sn → L ≥ 0

Cov (ω3 (y, Z, δ) , ρ3 (y, Z, δ)) = and Cov (ρ1 (y, Z) , ρ2 (y, Z, T )) =

1 f (y) (1 − F (y)) q1 (y) + O (bn ) 2 if bn /sn → ∞ !2

1 − F (y) m (y) h∗ (y) C (y) 35

!

1 L (y) − 1 +O (sn ) . 2 C (y)

The remaining covariances involved in (56) are of order $o(s_n) + o(b_n)$. Finally, with standard calculations the variance $Var\left( \bar f^P_{s,b}(y) \right)$ is obtained. This concludes the proof.

11

Acknowledgement

We are very grateful to Professor Per Kragh Andersen (Department of Biostatistics, University of Copenhagen) for providing the Fyn diabetes data, which were collected by Dr. Anders Green. We also acknowledge the financial support of Grant MTM2005-00429 (FEDER funding included) of the Spanish Ministerio de Educación y Ciencia and XUGA Grant PGIDT03PXIC10505PN for the first author, and Grant MTM2005-01274 (FEDER funding included) of the Spanish Ministerio de Educación y Ciencia for the second one.

References

[1] P.K. Andersen, O. Borgan, R.D. Gill, N. Keiding, Statistical Models Based on Counting Processes, Springer-Verlag, New York, 1993.
[2] M.A. Arcones, E. Giné, On the law of the iterated logarithm for canonical U-statistics and processes, Stochastic Process. Appl. 58 (1995) 217-245.
[3] R. Cao, Bootstrapping the mean integrated squared error, J. Multivariate Anal. 45 (1993) 137-160.
[4] R. Cao, M.A. Jácome, Presmoothed kernel density estimator for censored data, J. Nonparametr. Stat. 16 (2004) 289-309.
[5] R. Cao, I. López-de-Ullibarri, J. Janssen, N. Veraverbeke, Presmoothed Kaplan-Meier and Nelson-Aalen estimators, J. Nonparametr. Stat. 17 (2005) 31-56.
[6] D.R. Cox, E.L. Snell, Analysis of Binary Data, 2nd Edition, Chapman and Hall, London, 1989.
[7] G. Dikta, On semiparametric random censorship models, J. Statist. Plann. Inference 66 (1998) 253-279.
[8] I. Gijbels, J.L. Wang, Strong representations of the survival function for truncated and censored data with applications, J. Multivariate Anal. 47 (1993) 210-229.
[9] M.C. Iglesias-Pérez, W. González-Manteiga, Strong representation of a generalized product-limit estimator for truncated and censored data with some applications, J. Nonparametr. Stat. 10 (1999) 213-244.
[10] M.C. Iglesias-Pérez, Estimación de la función de distribución condicional en presencia de censura y truncamiento: Una aplicación al estudio de la mortalidad en pacientes diabéticos, Estadística Española 45 (2003) 275-301.
[11a] M.A. Jácome, R. Cao, Almost sure asymptotic representation for the presmoothed distribution and density estimators for censored data. To appear in Statistics (2007).
[11b] M.A. Jácome, R. Cao, Bandwidth selection for the presmoothed density estimator with censored data. Submitted, available at http://www.udc.es /dep/mate/Dpto Matematicas/Investigacion/ie publicacion/Jacome Cao BSPDE.pdf (2007).
[12] M.A. Jácome, M.C. Iglesias-Pérez, Presmoothed estimation with left truncated and right censored data. Submitted, available at http://www.udc.es /dep/mate/Dpto Matematicas/Investigacion/ie publicacion/PresmLTRC.pdf (2007).
[13] E.L. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, J. Amer. Statist. Assoc. 53 (1958) 457-481.
[14] S.H. Lo, Y.P. Mack, J.L. Wang, Density and hazard rate estimation for censored data via strong representation of the Kaplan-Meier estimator, Probab. Theory Related Fields 80 (1989) 461-473.
[15] D. Lynden-Bell, A method of allowing for known observational selection in small samples applied to 3CR quasars, Monthly Notices Roy. Astronom. Soc. 155 (1971) 95-118.
[16] Y.P. Mack, B.M. Silverman, Weak and strong uniform consistency of kernel regression estimates, Z. Wahrsch. Verw. Gebiete 61 (1982) 405-415.
[17] E.A. Nadaraya, On estimating regression, Theory Probab. Appl. 10 (1964) 186-190.
[18] E. Parzen, On estimation of a probability density function and mode, Ann. Math. Statist. 33 (1962) 1065-1076.
[19] M. Rosenblatt, Remarks on some nonparametric estimates of a density function, Ann. Math. Statist. 27 (1956) 832-837.
[20] C. Sánchez-Sellero, W. González-Manteiga, R. Cao, Bandwidth selection in density estimation with truncated and censored data, Ann. Inst. Statist. Math. 51 (1999) 51-70.
[21] W. Stute, J.L. Wang, A strong law under random censorship, Ann. Statist. 21 (1993) 1591-1607.
[22] L. Sun, L. Zhu, A semiparametric model for truncated and censored data, Statist. Probab. Lett. 48 (2000) 217-227.
[23] W.Y. Tsai, N.P. Jewell, M.C. Wang, A note on the product limit estimator under right censoring and left truncation, Biometrika 74 (1987) 883-886.
[24] U. Uzunogullari, J.L. Wang, A comparison of hazard rate estimators for left truncated and right censored data, Biometrika 79 (1992) 297-310.
[25] G.S. Watson, Smooth regression analysis, Sankhyā Series A 26 (1964) 359-372.
[26] Y. Zhou, A note on the TJW product-limit estimator for truncated and censored data, Statist. Probab. Lett. 26 (1996) 381-387.
[27] Y. Zhou, P. Yip, A strong representation of the product-limit estimator for left truncated and right censored data, J. Multivariate Anal. 69 (1999) 261-280.


Reports in Statistics and Operations Research

2004
04-01 Goodness of fit test for linear regression models with missing response data. González Manteiga, W., Pérez González, A. Canadian Journal of Statistics (to appear).
04-02 Boosting for Real and Functional Samples. An Application to an Environmental Problem. B. M. Fernández de Castro and W. González Manteiga.
04-03 Nonparametric classification of time series: Application to the bank share prices in Spanish stock market. Juan M. Vilar, José A. Vilar and Sonia Pértega.
04-04 Boosting and Neural Networks for Prediction of Heteroskedastic Time Series. J. M. Matías, M. Febrero, W. González Manteiga and J. C. Reboredo.
04-05 Partially Linear Regression Models with Farima-Garch Errors. An Application to the Forward Exchange Market. G. Aneiros Pérez, W. González Manteiga and J. C. Reboredo Nogueira.
04-06 A Flexible Method to Measure Synchrony in Neuronal Firing. C. Faes, H. Geys, G. Molenberghs, M. Aerts, C. Cadarso-Suárez, C. Acuña and M. Cano.
04-07 Testing for factor-by-curve interactions in generalized additive models: an application to neuronal activity in the prefrontal cortex during a discrimination task. J. Roca-Pardiñas, C. Cadarso-Suárez, V. Nacher and C. Acuña.
04-08 Bootstrap Estimation of the Mean Squared Error of an EBLUP in Mixed Linear Models for Small Areas. W. González Manteiga, M. J. Lombardía, I. Molina, D. Morales and L. Santamaría.
04-09 Set estimation under convexity type assumptions. A. Rodríguez Casal.

2005
05-01 SiZer Map for Evaluating a Bootstrap Local Bandwidth Selector in Nonparametric Additive Models. M. D. Martínez-Miranda, R. Raya-Miranda, W. González-Manteiga and A. González-Carmona.
05-02 The Role of Commitment in Repeated Games. I. García Jurado, Julio González Díaz.
05-03 Project Games. A. Estévez Fernández, P. Borm, H. Hamers.
05-04 Semiparametric Inference in Generalized Mixed Effects Models. M. J. Lombardía, S. Sperlich.

2006
06-01 A unifying model for contests: effort-prize games. J. González Díaz.
06-02 The Harsanyi paradox and the "right to talk" in bargaining among coalitions. J. J. Vidal Puga.
06-03 A functional analysis of NOx levels: location and scale estimation and outlier detection. M. Febrero, P. Galeano, W. González-Manteiga.
06-04 Comparing spatial dependence structures. R. M. Crujeiras, R. Fernández-Casal, W. González-Manteiga.
06-05 On the spectral simulation of spatial dependence structures. R. M. Crujeiras, R. Fernández-Casal.
06-06 An L2-test for comparing spatial spectral densities. R. M. Crujeiras, R. Fernández-Casal, W. González-Manteiga.

2007
07-01 Goodness-of-fit tests for the spatial spectral density. R. M. Crujeiras, R. Fernández-Casal, W. González-Manteiga.
07-02 Presmoothed estimation with left truncated and right censored data. M. A. Jácome, M. C. Iglesias-Pérez.

Previous issues (2001 – 2003): http://eio.usc.es/pub/reports.html


Presmoothed estimation with left truncated and right censored data

M.A. Jácome a,∗, M.C. Iglesias-Pérez b,1

a Facultad de Ciencias, Universidade da Coruña, A Coruña, 15071, Spain

b E.U. Ing. Tecn. Forestal, Universidade de Vigo, Pontevedra, 36005, Spain

Abstract

We propose a new method to estimate the cumulative hazard function and the corresponding distribution function of survival times under randomly left truncated and right censored (LTRC) observations. For the density function, a new kernel-type estimator is also presented, obtained by the convolution of a kernel with this new estimator of the distribution function. The new methodology is based on presmoothing ideas, that is, on a preliminary estimation of the conditional expectation m of the censoring indicator. Asymptotic properties, including an almost sure representation, strong consistency, asymptotic normality and mean integrated squared error expressions, are given. It is shown that the presmoothed modification leads to a gain in terms of asymptotic mean squared error. The practical performance of these estimators and their efficiency with respect to the classical ones are illustrated in a simulation study. Finally, they are used to analyze the lifetime in a real data example.

Key words: Almost sure representation, Kernel-type density estimator, LTRC data, Presmoothing, Survival analysis

1 Introduction

∗ Corresponding author. Tel.: +34-981167000, ext. 2119; fax: +34-981167065. Email address: [email protected] (M.A. Jácome).
1 Research supported in part by MCyT Grants MTM2005-00429 for the first author and MTM2005-01274 (ERDF support included) for the second one.

When analyzing times of duration (e.g. in medical follow-up or in engineering life testing studies), one may not be able to observe the variable of interest, referred to hereafter as the lifetime. Among the problems related to "loss of information", left truncation and right censoring are common. Left truncation arises when a subject is not included in the study because its lifetime origin precedes the starting time of the study and the subject dies before this moment. Right censoring appears when the lifetime is only partially observed because a different event occurs before the end of the lifetime.

More specifically, let $(Y, T, C)$ denote a random vector where $Y$ is the lifetime with distribution function (df) $F$, $T$ is the random left truncation variable with df $L$ and $C$ is the random right censoring time with df $G$. So, in the random left truncation and right censoring (LTRC) model, one observes $(T, Z, \delta)$ if $T \le Z$, where $Z = \min(Y, C)$ and $\delta = \mathbf{1}\{Y \le C\}$ indicates whether the observation is censored ($\delta = 0$) or not ($\delta = 1$). When $Z < T$ nothing is observed (truncation occurs). Moreover, $H$ denotes the distribution function of $Z$, and $a_H = \inf\{y : H(y) > 0\}$ and $b_H = \sup\{y : H(y) < 1\}$ are the left and right support endpoints, respectively. The same notation will be used for the support endpoints of any distribution function. Finally, it is assumed that $a_L \le a_H$, that $Y$ is independent of $(T, C)$ and that $\alpha = P(T \le Z) > 0$.

Under these assumptions, it can easily be shown that the cumulative hazard function (chf) can be expressed as follows:

\[
\Lambda(y) = \int_{a_H}^{y} \frac{dH_1^*(u)}{C(u)}, \tag{1}
\]

where $H_1^*(y) = P(Z \le y, \delta = 1 \mid T \le Z)$ and

\[
C(y) = P(T \le y \le Z \mid T \le Z). \tag{2}
\]

Let $(T_i, Z_i, \delta_i)$, $i = 1, 2, \ldots, n$, be an iid sample from $(T, Z, \delta)$ which one observes (i.e., $Z_i \ge T_i$). Replacing in (1) the functions $H_1^*$ and $C$ with their empirical estimators,

\[
H_{1n}^*(y) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{Z_i \le y\}\,\delta_i
\quad\text{and}\quad
C_n(y) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{T_i \le y \le Z_i\}, \tag{3}
\]

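and considering the one-to-one relation between the survival function and the chf $\Lambda$:

\[
1 - F(y) = \exp(-\Lambda_c(y)) \prod_{u \le y} \bigl(1 - \Delta\Lambda(u)\bigr), \tag{4}
\]

where $\Lambda_c$ is the continuous part of $\Lambda$ and $\Delta\Lambda(u) = \Lambda(u) - \Lambda(u^-)$, the classical estimators of $\Lambda$ and $F$ (see Tsai et al (1987)) are:

\[
\Lambda_n^{TJW}(y) = \sum_{i: Z_i \le y} \frac{\delta_i}{n C_n(Z_i)}
\quad\text{and}\quad
F_n^{TJW}(y) = 1 - \prod_{i: Z_i \le y} \left(1 - \frac{\delta_i}{n C_n(Z_i)}\right). \tag{5}
\]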
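As an illustration only, the classical estimators in (5) can be sketched in a few lines of Python. The function name `tjw_estimators` and its list-based interface are our own choices for this sketch and are not part of the original report; the code evaluates (5) literally, with $nC_n(Z_i)$ computed as the number of observations satisfying $T_j \le Z_i \le Z_j$.

```python
def tjw_estimators(T, Z, delta):
    """Evaluate the TJW estimators in (5) at the ordered observed times.

    T, Z, delta are equal-length lists of truncation times, observed times
    and censoring indicators (hypothetical interface chosen for this sketch).
    Returns (times, chf, df): sorted Z values with the estimated cumulative
    hazard and distribution function at each of them.
    """
    n = len(Z)
    order = sorted(range(n), key=lambda i: Z[i])
    times, chf, df = [], [], []
    cum_hazard, survival = 0.0, 1.0
    for i in order:
        # n * C_n(Z_i): number of observations with T_j <= Z_i <= Z_j.
        # It is at least 1 because T_i <= Z_i holds for every observed triple.
        at_risk = sum(1 for j in range(n) if T[j] <= Z[i] <= Z[j])
        jump = delta[i] / at_risk
        cum_hazard += jump          # sum in (5)
        survival *= 1.0 - jump      # product in (5)
        times.append(Z[i])
        chf.append(cum_hazard)
        df.append(1.0 - survival)
    return times, chf, df
```

With all truncation times equal to zero the output coincides with the Kaplan-Meier estimator, in line with the remark that follows (5).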

Note that the product-limit (PL) estimator $F_n^{TJW}$ in (5) reduces to the Kaplan-Meier (KM) estimator (see Kaplan and Meier (1958)) when there is no truncation, and to the Lynden-Bell (1971) estimator when there is no censoring. For LTRC data, the properties of $F_n^{TJW}$ were studied by Tsai et al (1987), Gijbels and Wang (1993), Zhou (1996) and Zhou and Yip (1999), among others. Specifically, the importance of accounting for truncation effects when estimating the distribution function $F$ was emphasized by Tsai et al (1987); they illustrated, with an example, that the KM estimator of $F$, obtained by ignoring the truncation effects, underestimates $F$ considerably. Gijbels and Wang (1993) decomposed the PL estimator $F_n^{TJW}$ as a mean of iid random variables plus a negligible remainder term of order $O(n^{-1}\log n)$ almost surely, uniformly over compact intervals, when $a_L < a_H$. Zhou (1996) derived a strong approximation for $F_n^{TJW}$ when $a_L = a_H$, and Zhou and Yip (1999) obtained the improved order $O(n^{-1}\log\log n)$ in the case $a_L < a_H$, and for the critical case $a_L = a_H$ under the condition (Int1) below. These approximations have been used to derive asymptotic properties of $F_n^{TJW}$ (and $\Lambda_n^{TJW}$) and have also been applied to density and hazard function estimation.

In order to motivate presmoothing ideas, note that the cumulative hazard function given in (1) can also be expressed as follows:

\[
\Lambda(y) = \int_{a_H}^{y} \frac{m(u)}{C(u)}\, dH^*(u), \tag{6}
\]

where

\[
H^*(y) = P(Z \le y \mid T \le Z) \tag{7}
\]

is the conditional df of the observed lifetime, and $m(y) = P(\delta = 1 \mid Z = y, T \le Z)$ is the conditional probability of uncensoring. The importance of $m$ has been pointed out, without truncation, in Stute and Wang (1993) or Dikta (1998), among others. Note that, since $m$ is in fact the conditional expectation $m(y) = E(\delta \mid Z = y, T \le Z)$, this regression function can be estimated using the observations $(T_i, Z_i, \delta_i)$, $i = 1, \ldots, n$. Let us consider the nonparametric estimator of $m$ proposed by Nadaraya (1964) and Watson (1964):

\[
m_b(y) = \frac{\frac{1}{n}\sum_{i=1}^{n} K_{b_n}(y - Z_i)\,\delta_i}{\frac{1}{n}\sum_{i=1}^{n} K_{b_n}(y - Z_i)}, \tag{8}
\]

with $K_{b_n}(\cdot) = \frac{1}{b_n} K(\cdot / b_n)$ the rescaled kernel and bandwidth sequence $b_n \downarrow 0$. The functions $H^*$ and $C$, appearing in the expression (6), can be estimated empirically by $H_n^*(y) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{Z_i \le y\}$ and $C_n$ in (3), respectively. Hence, the intuitively appealing idea that leads to the presmoothed estimators consists of replacing in (6) the function $m$ with the Nadaraya-Watson smoother $m_b$ and the other functions with their standard empirical counterparts:

\[
\Lambda_b^P(y) = \sum_{i: Z_i \le y} \frac{m_b(Z_i)}{n C_n(Z_i)}
\quad\text{and}\quad
F_b^P(y) = 1 - \prod_{i: Z_i \le y} \left(1 - \frac{m_b(Z_i)}{n C_n(Z_i)}\right). \tag{9}
\]
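A minimal sketch of (8) and (9) in Python is given below, under stated assumptions: the biweight kernel is our choice (any kernel satisfying the usual conditions could be used), and the names `biweight`, `nw_estimate` and `presmoothed_estimators` are hypothetical helpers introduced only for this illustration. The only change with respect to the classical estimators in (5) is that $\delta_i$ is replaced by $m_b(Z_i)$.

```python
def biweight(u):
    """Biweight kernel: a differentiable density supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def nw_estimate(y, Z, delta, b, kernel=biweight):
    """Nadaraya-Watson estimate m_b(y) in (8); the 1/(n b) factors cancel."""
    weights = [kernel((y - z) / b) for z in Z]
    return sum(w * d for w, d in zip(weights, delta)) / sum(weights)


def presmoothed_estimators(T, Z, delta, b, kernel=biweight):
    """Evaluate the presmoothed estimators in (9) at the ordered observed times."""
    n = len(Z)
    order = sorted(range(n), key=lambda i: Z[i])
    times, chf, df = [], [], []
    cum_hazard, survival = 0.0, 1.0
    for i in order:
        # n * C_n(Z_i): number of observations with T_j <= Z_i <= Z_j.
        at_risk = sum(1 for j in range(n) if T[j] <= Z[i] <= Z[j])
        # delta_i in (5) is replaced by the smoothed value m_b(Z_i).
        jump = nw_estimate(Z[i], Z, delta, b, kernel) / at_risk
        cum_hazard += jump
        survival *= 1.0 - jump
        times.append(Z[i])
        chf.append(cum_hazard)
        df.append(1.0 - survival)
    return times, chf, df
```

In this sketch `nw_estimate` is only evaluated at the observed points $Z_i$, where the denominator is positive because the kernel is positive at zero.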

Note that the presmoothed estimators (9) are constructed similarly to (5), just replacing $\delta_i$ with $m_b(Z_i)$. This clearly shows that presmoothing is very useful with missing censoring indicators, since if some of the $\delta_i$'s are not available, one simply cannot compute the PL estimator $F_n^{TJW}$.

Remark 1 When the presmoothing bandwidth $b_n$ is close to zero, then $m_b(Z_i) \to \delta_i$, and therefore the presmoothed estimators tend to the classical ones in (5):

\[
\Lambda_b^P(y) \to \Lambda_n^{TJW}(y)
\quad\text{and}\quad
F_b^P(y) \to F_n^{TJW}(y).
\]

Remark 2 When there is no truncation, $\Lambda_b^P$ and $F_b^P$ reduce to the presmoothed estimators studied in Cao et al (2005).

The parametric version of this idea can be found in Sun and Zhu (2000). They proposed working with a parametric estimator of $m$. In that work, the so-called semiparametric estimator of $F$ is shown to be at least as efficient as the classical PL estimator $F_n^{TJW}$ if the parametric model taken for $m$ is the correct one. Presmoothing has the advantage that there is no need for a parametric candidate for $m$.

Without truncation, presmoothed estimators of the distribution, density and hazard functions have been proved to be more efficient than their classical counterparts for a suitable choice of the bandwidth $b_n$ (see Cao and Jácome (2004) or Cao et al (2005), among others). In that context, the presmoothed estimator of $F$ has, unlike the KM estimator, a jump at each of the observations regardless of its status (censored or not), which provides more information on the local behavior of $F$. Besides, a better mean squared error performance than that of the KM estimator has also been proved. These good features of the presmoothed estimator of the df $F$ remain in the presence not only of right censoring but also of left truncation.

In many applications involving follow-up studies, the lifetime is subject to left truncation in addition to the usual right censoring and, as mentioned before, it is crucial to have good estimators available that take the truncation effects into account. A first aim of the present paper is to extend presmoothing ideas to estimate $F$ and $\Lambda$ when the observations may be not only right censored but also left truncated, and to study the efficiency of these new estimators with respect to the classical ones.

As a second topic, in this paper we also study nonparametric presmoothed estimation of the density function with truncated and censored data, so we assume that $F$ is absolutely continuous with density function $f$, and $h$ denotes the density function of $Z$. Among the nonparametric methods, the most popular in the literature is the kernel-type estimator introduced by Parzen (1962) and Rosenblatt (1956). With complete data, it is defined as the convolution of a kernel $K$ (which is usually a density function) with the empirical estimator, $F_n$, of the distribution function $F$,

\[
\hat f_s(y) = \int K_s(y - u)\, dF_n(u) = \frac{1}{n}\sum_{i=1}^{n} K_s(y - Z_i), \tag{10}
\]

where $K_s(\cdot) = s^{-1} K(\cdot / s)$ is the kernel function $K$ rescaled according to the bandwidth sequence $s \equiv s_n \downarrow 0$.

In situations where incomplete data appear, $F_n$ is replaced in (10) by an appropriate estimator of $F$. For LTRC data, the classical kernel-type density estimator is obtained by the convolution of $K$ with the well-known product-limit estimator (PLE) of $F$ in (5):

\[
f_s^{TJW}(y) = \int K_s(y - u)\, dF_n^{TJW}(u)
= \sum_{i=1}^{n} K_s(y - Z_i)\left[F_n^{TJW}(Z_i) - F_n^{TJW}(Z_i^-)\right], \tag{11}
\]

which is a kernel-type density estimator in which the classical $1/n$ weights in (10) are replaced by the PLE weights. The presmoothed kernel-type density estimator is defined as the convolution of the kernel $K$ and the presmoothed df estimator given in (9):

\[
f_{s,b}^P(y) = \int K_s(y - u)\, dF_b^P(u). \tag{12}
\]
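Since $F_b^P$ is a step function, the convolution in (12) is a weighted kernel sum whose weights are the jumps of $F_b^P$ at the observations. The following Python sketch illustrates this under our own assumptions: a biweight kernel is used for both the presmoothing and the smoothing step, and `presmoothed_density` is a hypothetical name introduced only for this example.

```python
def biweight(u):
    """Biweight kernel: a differentiable density supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def presmoothed_density(y, T, Z, delta, s, b, kernel=biweight):
    """Evaluate the presmoothed kernel density estimator (12) at the point y."""
    n = len(Z)
    order = sorted(range(n), key=lambda i: Z[i])
    survival, estimate = 1.0, 0.0
    for i in order:
        # Nadaraya-Watson estimate m_b(Z_i) from (8); common factors cancel.
        weights = [kernel((Z[i] - Z[j]) / b) for j in range(n)]
        m_b = sum(w * d for w, d in zip(weights, delta)) / sum(weights)
        # n * C_n(Z_i): number of observations with T_j <= Z_i <= Z_j.
        at_risk = sum(1 for j in range(n) if T[j] <= Z[i] <= Z[j])
        # Jump of F_b^P at Z_i: mass removed from the running survival product.
        jump = survival * m_b / at_risk
        survival -= jump
        # Rescaled kernel K_s(y - Z_i) weighted by the jump.
        estimate += kernel((y - Z[i]) / s) / s * jump
    return estimate
```

With complete data (no truncation and all $\delta_i = 1$) every jump equals $1/n$, so the sketch reduces to the Parzen-Rosenblatt estimator (10).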

As mentioned before, the presmoothed estimator $F_b^P$ has a jump at each of the observations regardless of its status (censored or not). This fact, which implies that $F_b^P$ provides more information on the local behavior of $F$, produces a more efficient estimation of the density function (see Cao and Jácome (2004) for the untruncated case). Another aspect to highlight is that two different smoothing parameters are used in (12): the presmoothing bandwidth $b_n$, used to compute the NW estimator of $m$ in (8), and the smoothing parameter $s_n$, involved in the convolution procedure. When the presmoothing bandwidth is close to zero ($b_n \simeq 0$), then $m_b(Z_i) \to \delta_i$, and therefore the presmoothed estimator $f_{s,b}^P(y)$ tends to the classical one given in (11), which in turn reduces to the kernel density estimator studied by Lo et al (1989) for right censored data and to the one analyzed by Arcones and Giné (1995) under left truncation. On the other hand, if $b_n > 0$ and there is no truncation, $f_{s,b}^P(y)$ reduces to the presmoothed density estimator studied in Cao and Jácome (2004).

The paper is organized as follows. Section 2 contains an iid almost sure representation, the consistency, the asymptotic bias and variance, and the limit distribution of the presmoothed chf estimator; Sections 3 and 4 contain the corresponding results for the distribution and density function estimators, respectively. In Section 5 the asymptotic expansion of the variances shows a gain in second order efficiency over the classical estimators. This results in a better mean squared error performance, which makes clear that presmoothing is beneficial. The optimal bandwidths are analyzed in Sections 6 and 7. A simulation study is carried out in Section 8 to compare the behavior of the presmoothed and classical estimators with small samples, and the presmoothed estimators are applied to a real data example in Section 9. Finally, Section 10 contains the proofs of the main results.

2 The presmoothed estimator of the cumulative hazard function

Before deriving the main results, we need to introduce some conditions and notation. Throughout this paper we shall assume the above-mentioned standard conditions in LTRC models: $\alpha = P(T \le Z) > 0$, the random variables $Y$ and $(T, C)$ are independent and $a_L \le a_H$. Fix any $\tau < b_H$. The assumptions on the kernel $K$, the density functions $h^* = (H^*)'$ and $f$, and the conditional probability of uncensoring $m$ are the following:

(K) The kernel $K$ is a symmetric differentiable density function, of bounded variation, with compact support $[-1, 1]$.
(h1) $H$ and $H^*$ are continuous functions, with $h^*$ twice continuously differentiable at $y \in [a_H, \tau]$.
(h2) $h^*(y) \ge \epsilon > 0$ for some $\epsilon > 0$ at $y \in [a_H, \tau]$.
(C1) $C$ is twice continuously differentiable at $y \in [a_H, \tau]$.
(C2) $C(y) \ge \varepsilon > 0$ for some $\varepsilon > 0$ at $y \in [a_H, \tau]$.
(m1) The function $m$ is twice continuously differentiable at $y \in [a_H, \tau]$.
(m2) The function $m$ is three times continuously differentiable at $y \in [a_H, \tau]$.
(f1) The density function $f$ is twice continuously differentiable at $y \in [a_H, \tau]$.
(f2) The density function $f$ is four times continuously differentiable at $y \in [a_H, \tau]$.
(Int1) $\displaystyle\int_{a_H}^{\tau} \frac{dH_1^*(v)}{C^3(v)} < \infty$.
(Int2) $\displaystyle\int_{a_H}^{\tau} \frac{dv}{C^2(v)\, h^*(v)} < \infty$.
(Int3) $\displaystyle\int_{a_H}^{\tau} \frac{dH^*(v)}{C(v)} < \infty$.
(I) The variables $T$ and $Z$ are independent.

Condition (K) is usual when constructing nonparametric kernel-type estimators of the regression function, such as the NW estimator $m_b$. Assumptions (h1), (C1), (m1), (m2), (f1) and (f2) are standard regularity conditions, needed to apply Taylor expansions. The density function $h^*$ must be bounded away from zero (assumption (h2)) to control the error rates of the NW estimator. Conditions (Int1)-(Int2), involving the function $C$, are required for the a.s. asymptotic representation of the presmoothed distribution estimator. Assumption (C2), which requires the function $C$ to be bounded away from zero, is needed to obtain the asymptotic normality result. Note that assumption (C2) implies conditions (Int1)-(Int3). Condition (I) makes the derivation of the bias and variance of the presmoothed density estimator easier.

The assumptions on the bandwidths $s_n$ and $b_n$ are the following:

(b1) $n^{1-\varepsilon} b_n \to \infty$ for some $\varepsilon > 0$ and $\sum b_n^{\lambda} < \infty$ for some $\lambda > 0$. Moreover, $b_n \ln\ln n \to 0$ as $n \to \infty$.
(b2) $n s_n^{-1} b_n^{8} \to 0$ and $n s_n b_n^{2} (\ln n)^{-2} \to \infty$ as $n \to \infty$.

The first assumption in (b1) establishes the assumptions for Lemma 1 and Theorem B in Mack and Silverman (1982), and the second one is needed for the almost sure iid representation of the presmoothed distribution estimator. Condition (b2) is required for the asymptotic normality result. If the bandwidths are assumed to be $s_n = c_s n^{-\alpha} + o(n^{-\alpha})$ and $b_n = c_b n^{-\beta} + o(n^{-\beta})$ for some $c_s, c_b, \alpha, \beta > 0$, then an example of constants $\alpha$ and $\beta$ for bandwidth sequences satisfying conditions (b1) and (b2) is $\alpha < \frac{3}{5}$ and $\frac{1}{8}(\alpha + 1) < \beta < \frac{1}{2}(1 - \alpha)$. In particular, if $s_n = c_s n^{-1/5} + o(n^{-1/5})$ for some $c_s > 0$, then the presmoothing bandwidth should be of the form $b_n = c_b n^{-\beta} + o(n^{-\beta})$ with $\frac{3}{20} < \beta < \frac{2}{5}$.
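The admissible exponents can be checked with exact arithmetic. The sketch below is only an illustration under an assumption of ours: condition (b2) is read as $n s_n^{-1} b_n^{8} \to 0$ and $n s_n b_n^{2} (\ln n)^{-2} \to \infty$, which for $s_n \sim n^{-\alpha}$ and $b_n \sim n^{-\beta}$ amounts to $1 + \alpha - 8\beta < 0$ and $1 - \alpha - 2\beta > 0$. The function names are hypothetical.

```python
from fractions import Fraction


def beta_range(alpha):
    """Open interval of beta allowed for s_n ~ n^(-alpha), or None if empty.

    Encodes the range stated in the text: alpha < 3/5 and
    (alpha + 1)/8 < beta < (1 - alpha)/2.
    """
    alpha = Fraction(alpha)
    if not (0 < alpha < Fraction(3, 5)):
        return None
    return (alpha + 1) / 8, (1 - alpha) / 2


def satisfies_b2(alpha, beta):
    """Check the two exponent inequalities implied by our reading of (b2)."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    first = 1 + alpha - 8 * beta < 0    # n * s_n^(-1) * b_n^8 -> 0
    second = 1 - alpha - 2 * beta > 0   # n * s_n * b_n^2 * (ln n)^(-2) -> infinity
    return alpha > 0 and beta > 0 and first and second
```

For instance, `beta_range(Fraction(1, 5))` gives the interval from 3/20 to 2/5 quoted above for $s_n$ of order $n^{-1/5}$. Exact fractions (or integer strings such as `"1/5"`) should be passed rather than floats, which would introduce rounding at the boundaries.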

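Theorem 1 Under conditions (K), (h1), (h2), (m1), (b1), (Int1) and (Int2),

\[
\Lambda_b^P(y) - \Lambda(y) = \overline{\Lambda}_b^P(y) - \Lambda(y) + S_n(y),
\]

where $\overline{\Lambda}_b^P(y) = \Lambda(y) + \frac{1}{n}\sum_{i=1}^{n} \eta_n^P(y, Z_i, \delta_i, T_i)$ with

\[
\eta_n^P(y, Z_i, \delta_i, T_i) = g_1^P(y, Z_i) - g_2^P(y, Z_i, T_i) + g_3^P(y, Z_i, \delta_i), \tag{13}
\]

and

\[
g_1^P(y, Z_i) = \frac{m(Z_i)}{C(Z_i)}\, \mathbf{1}\{Z_i \le y\}, \tag{14}
\]
\[
g_2^P(y, Z_i, T_i) = \int_{a_H}^{y} \frac{m(v)}{C^2(v)}\, \mathbf{1}\{T_i \le v \le Z_i\}\, dH^*(v), \tag{15}
\]
\[
g_3^P(y, Z_i, \delta_i) = \int_{a_H}^{y} K_{b_n}(v - Z_i)\, \frac{\delta_i - m(v)}{C(v)}\, dv, \tag{16}
\]

and

\[
\sup_{a_H \le y \le \tau} |S_n(y)| = O\!\left(\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)^{2}\right) \quad \text{a.s.}
\]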

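Theorem 2 Under conditions (K), (h1), (h2), (m1), (b1), (Int1) and (Int3),

\[
\sup_{a_H \le y \le \tau} \left|\Lambda_b^P(y) - \Lambda(y)\right| = O\!\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right) \quad \text{a.s.} \tag{17}
\]

Therefore, if $b_n \to 0$ and $n(\ln n)^{-1} b_n \to \infty$ as $n \to \infty$,

\[
\sup_{a_H \le y \le \tau} \left|\Lambda_b^P(y) - \Lambda(y)\right| \to 0 \quad \text{with probability one.}
\]

The asymptotic normality of $\Lambda_b^P - \Lambda$ is an easy consequence of the representation in Theorem 1. For that result, we need a closed formula for the asymptotic expressions of the bias and variance of the estimator.

Proposition 3 Under conditions (I), (K), (h1), (m2), (C1) and (C2),

\[
E\!\left[\overline{\Lambda}_b^P(y) - \Lambda(y)\right] = d_K\, \alpha(y)\, b_n^2 + o(b_n^2),
\]
\[
Var\!\left[\overline{\Lambda}_b^P(y) - \Lambda(y)\right] = \frac{1}{n}\left(\int_{a_H}^{y} \frac{dH_1^*(u)}{C^2(u)} - 2 b_n e_K \bigl(q_1(y) + q_1(a_H)\bigr) + O(b_n^2)\right),
\]

where

\[
d_K = \int u^2 K(u)\, du \quad\text{and}\quad e_K = \int u K(u) \int_{-\infty}^{u} K(v)\, dv\, du, \tag{18}
\]
\[
\alpha(y) = \int_{a_H}^{y} \frac{\frac{1}{2} m''(v)\, h^*(v) + m'(v)\, h^{*\prime}(v)}{C(v)}\, dv, \tag{19}
\]
\[
q_1(y) = \frac{m(y)\,(1 - m(y))\, h^*(y)}{C^2(y)}. \tag{20}
\]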
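The kernel constants in (18) are easy to evaluate numerically. As a hedged illustration, the following Python sketch approximates $d_K$ and $e_K$ by a midpoint rule for the biweight kernel, which is our own choice of a kernel compatible with (K); the helper `kernel_constants` is hypothetical. For this kernel the exact values are $d_K = 1/7$ and $e_K = 25/231$.

```python
def biweight(u):
    """Biweight kernel: a differentiable density supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def kernel_constants(kernel, lo=-1.0, hi=1.0, steps=20000):
    """Midpoint-rule approximation of d_K and e_K in (18) on [lo, hi]."""
    h = (hi - lo) / steps
    d_K = e_K = cdf = 0.0
    for j in range(steps):
        u = lo + (j + 0.5) * h
        mass = kernel(u) * h
        # Approximate the inner integral of K from lo up to the midpoint u.
        cdf_mid = cdf + 0.5 * mass
        d_K += u * u * mass
        e_K += u * cdf_mid * mass
        cdf += mass
    return d_K, e_K
```

Because $K$ is symmetric, $e_K$ is positive, which is what makes the term $-2 b_n e_K (q_1(y) + q_1(a_H))$ in the variance a reduction.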

The next theorem presents the asymptotic normality of $n^{1/2}(\Lambda_b^P - \Lambda)$. It follows from Theorem 1 and Proposition 3. The proof is similar to that of Corollary 3 in Iglesias-Pérez and González-Manteiga (1999), so it is omitted.

Theorem 4 Under the assumptions (I), (K), (h1), (h2), (m2), (b1), (b2), (C1) and (C2), for any $y < \tau$:

a) If $n b_n^4 \to 0$, then $n^{1/2}\left(\Lambda_b^P(y) - \Lambda(y)\right) \xrightarrow{d} N(0, \sigma_\Lambda^2(y))$.

b) If $n b_n^4 \to B^4$, then $n^{1/2}\left(\Lambda_b^P(y) - \Lambda(y)\right) \xrightarrow{d} N(b_\Lambda(y), \sigma_\Lambda^2(y))$,

where $b_\Lambda(y) = B^2 \alpha(y)\, d_K$ and $\sigma_\Lambda^2(y) = \gamma(y)$, with

\[
\gamma(y) = \int_{a_H}^{y} \frac{m(v)\, h^*(v)}{C^2(v)}\, dv, \tag{21}
\]

and $d_K$ and $\alpha$ given in (18) and (19), respectively.

3 The presmoothed estimator of the distribution function

To obtain the asymptotic properties of the presmoothed estimator of the df, $F_b^P$, we rely on the relation (4) and the results for $\Lambda_b^P$ in Section 2. Therefore, the almost sure asymptotic representation of $F_b^P$ is based on that of $\Lambda_b^P$ given in Theorem 1.

Theorem 5 Under the conditions of Theorem 1,

\[
F_b^P(y) - F(y) = \overline{F}_b^P(y) - F(y) + R_n(y),
\]

where $\overline{F}_b^P(y) = F(y) + \frac{1}{n}\sum_{i=1}^{n} (1 - F(y))\, \eta_n^P(y, Z_i, \delta_i, T_i)$ with $\eta_n^P$ given in (13), and

\[
\sup_{a_H \le y \le \tau} |R_n(y)| = O\!\left(\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)^{2}\right) \quad \text{a.s.}
\]

The rate of the uniform consistency of the presmoothed estimator of $F$ is given in the following theorem. The proof is analogous to that of Theorem 2.

Theorem 6 Under the conditions of Theorem 2,

\[
\sup_{a_H \le y \le \tau} \left|F_b^P(y) - F(y)\right| = O\!\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right) \quad \text{a.s.}
\]

Therefore, if $b_n \to 0$ and $n(\ln n)^{-1} b_n \to \infty$ as $n \to \infty$,

\[
\sup_{a_H \le y \le \tau} \left|F_b^P(y) - F(y)\right| \to 0 \quad \text{with probability one.}
\]

The next proposition gives the asymptotic expression of the bias and variance of the iid representation $\overline{F}_b^P$. Using Theorem 5 and Proposition 3, the proof is straightforward.

Proposition 7 Under the conditions of Proposition 3,

\[
E\!\left[\overline{F}_b^P(y) - F(y)\right] = d_K\, \alpha(y)\, (1 - F(y))\, b_n^2 + o(b_n^2),
\]
\[
Var\!\left[\overline{F}_b^P(y) - F(y)\right] = (1 - F(y))^2\, \frac{1}{n}\left(\int_{a_H}^{y} \frac{dH_1^*(u)}{C^2(u)} - 2 b_n e_K \bigl(q_1(y) + q_1(a_H)\bigr) + O(b_n^2)\right),
\]

with $d_K$, $e_K$, $\alpha$ and $q_1$ given in (18)-(20).

Note that, when the presmoothing bandwidth $b_n$ is close to zero, the expressions of the bias and variance of $\Lambda_b^P$ and $F_b^P$, given in Propositions 3 and 7 respectively, reduce to those of the classical estimators $\Lambda_n^{TJW}$ and $F_n^{TJW}$.

The asymptotic normality of $n^{1/2}(F_b^P - F)$ follows from Theorem 5 and the asymptotic expressions for the bias and variance of $\overline{F}_b^P$ in Proposition 7.

Theorem 8 Under the assumptions of Theorem 4, for any $y < \tau$:

a) If $n b_n^4 \to 0$, then $n^{1/2}\left(F_b^P(y) - F(y)\right) \xrightarrow{d} N(0, \sigma_F^2(y))$.

b) If $n b_n^4 \to B^4$, then $n^{1/2}\left(F_b^P(y) - F(y)\right) \xrightarrow{d} N(b_F(y), \sigma_F^2(y))$,

where $b_F(y) = B^2 (1 - F(y))\, \alpha(y)\, d_K$, $\sigma_F^2(y) = (1 - F(y))^2 \gamma(y)$, and $d_K$, $\alpha$ and $\gamma$ are given in (18), (19) and (21), respectively.

4 The presmoothed estimator of the density function

The strong representation of the presmoothed distribution estimator is a key tool in the proof of the following results. The next theorem is an extension of Theorem 1 in Jácome and Cao (2007) to the LTRC model. It shows that, up to a remainder term, $f_{s,b}^P - f$ can be expressed as a sum of iid variables. This result is an application of the strong representation of the presmoothed distribution estimator given in Theorem 5. The strategy of the proof is similar to that of Jácome and Cao (2007). Without presmoothing, it was already employed by Lo et al (1989) with censored data and by Gijbels and Wang (1993) under the LTRC model, so the details are omitted.

Theorem 9 Assume conditions (K), (h1), (h2), (m1), (b1), (Int1) and (Int2). Then the presmoothed density estimator admits the following representation:

\[
f_{s,b}^P(y) = f(y) + \beta_n^P(y) + \sigma_n^P(y) + e_n^P(y), \tag{22}
\]

where

\[
\beta_n^P(y) = \int f(y - s_n v)\, K(v)\, dv - f(y) \tag{23}
\]

is essentially the bias,

\[
\sigma_n^P(y) = \frac{1}{n s_n} \sum_{i=1}^{n} \int \xi_{i,n}^P(y - s_n v)\, K'(v)\, dv, \tag{24}
\]

with $\xi_{i,n}^P(y) = (1 - F(y))\, \eta_n^P(y, Z_i, \delta_i, T_i)$ and $\eta_n^P$ given in (13), is the stochastic component of $f_{s,b}^P$, and the remainder term satisfies

\[
\sup_{a_H \le y \le \tau} \left|e_n^P(y)\right| = O\!\left(\frac{1}{s_n}\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)^{2}\right) \quad \text{a.s.} \tag{25}
\]

It should be observed that the bias part $\beta_n^P$ is not random, and the variance part $\sigma_n^P$ is a sum of iid random variables. This is a very useful result for obtaining asymptotic properties of the presmoothed density estimator, since it enables one to handle a sum of iid variables instead of the complicated structure of $f_{s,b}^P$. Without presmoothing, this representation reduces to the one by Gijbels and Wang (1993) and Zhou and Yip (1999). The strong consistency of $f_{s,b}^P$ now follows from Theorem 9.

Theorem 10 Assume conditions (K), (h1), (h2), (m1), (b1), (f1), (Int1) and (Int3). Then

\[
\sup_{a_H \le y \le \tau} \left|f_{s,b}^P(y) - f(y)\right| = O\!\left(\frac{1}{s_n}\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right)\right) \quad \text{a.s.}
\]

Consider the following function and constant depending on the kernel $K$:

\[
A_K(L) = \iiint K(u)\, K(v)\, K(w)\, K(u + L(v - w))\, du\, dv\, dw,
\qquad
c_K = \int K^2(v)\, dv = A_K(0),
\]

and the function

\[
Q_2(y) = \int_{a_H}^{y} \frac{m(v)\, h^*(v)}{C^2(v)}\, dv.
\]

Recall that $\overline{f}_{s,b}^P(y) = f(y) + \beta_n^P(y) + \sigma_n^P(y)$ is the iid representation of $f_{s,b}^P$. The next proposition gives an asymptotic expansion of the bias and variance of $\overline{f}_{s,b}^P$. Several cases will be distinguished according to the rate at which the smoothing parameters tend to zero.

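Proposition 11 Assume conditions (K), (h1), (C1), (C2), (m2), (f2) and (I). If the bandwidths satisfy $s_n \to 0$, $b_n \to 0$, $n s_n \to \infty$ and $n b_n \to \infty$, then the bias and variance of $\overline{f}_{s,b}^P$ admit the following representation:

\[
Bias\!\left(\overline{f}_{s,b}^P(y)\right) = \frac{1}{2} f''(y)\, d_K s_n^2 + \bigl[(1 - F(y))\, \alpha(y)\bigr]' d_K b_n^2 + O(b_n^4) + O(s_n^4), \tag{26}
\]

and, depending on the ratio $b_n / s_n$:

(a) If $b_n / s_n \to 0$, then

\[
Var\!\left(\overline{f}_{s,b}^P(y)\right) = \frac{\tau_1(y)}{n s_n}\, c_K - \frac{\tau_2(y)}{n} - 2\,\frac{b_n}{n}\, f^2(y)\, \bigl(q_1(y) + q_1(a_H)\bigr)\, e_K + O\!\left(\frac{s_n}{n}\right). \tag{27}
\]

(b) If $b_n / s_n = L_n \to L > 0$, then

\[
Var\!\left(\overline{f}_{s,b}^P(y)\right) = \frac{\tau_1(y)}{n s_n}\, \bigl[(1 - m(y))\, A_K(L) + m(y)\, c_K\bigr] - \frac{\tau_2(y)}{n} + O\!\left(\frac{s_n}{n}\right). \tag{28}
\]

(c) If $b_n / s_n \to \infty$, then

\[
Var\!\left(\overline{f}_{s,b}^P(y)\right) = c_K\, \tau_1(y) \left(\frac{1}{n b_n}\, (1 - m(y)) + \frac{1}{n s_n}\, m(y)\right) - \frac{1}{n}\, \tau_2(y) + O\!\left(\frac{b_n}{n}\right), \tag{29}
\]

where

\[
\tau_1(y) = \left(\frac{1 - F(y)}{C(y)}\right)^{2} m(y)\, h^*(y)
\quad\text{and}\quad
\tau_2(y) = f(y)\, \bigl[(1 - F(y))\, Q_2(y)\bigr]'.
\]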

The next theorem presents the asymptotic normality of $\sqrt{n s_n}\,(f_{s,b}^P(y) - f(y))$. It follows from Theorem 9 and the asymptotic expressions for the bias and variance of $\overline{f}_{s,b}^P$ in Proposition 11. The proof is similar to that of Corollary 3 in Iglesias-Pérez and González-Manteiga (1999) in the LTRC model, and to that of Cao and Jácome (2004) without truncation, so it is omitted.

Theorem 12 Assume conditions (K), (h1), (h2), (C1), (C2), (m2), (f2), (b1), (b2) and (I). Then

\[
\sqrt{n s_n}\, \bigl(f_{s,b}^P(y) - f(y)\bigr) \xrightarrow{d} N\bigl(a(y), \sigma^2(y)\bigr),
\]

where

\[
a(y) =
\begin{cases}
0 & \text{if } n s_n^5 \to 0, \\
\frac{1}{2} C^{5/2} f''(y)\, d_K & \text{if } n s_n^5 \to C^5 \text{ and } n b_n^5 \to 0, \\
\frac{1}{2} C^{5/2} f''(y)\, d_K + C^{1/2} B^2 \bigl[(1 - F(y))\, \alpha(y)\bigr]' d_K & \text{if } n s_n^5 \to C^5 \text{ and } n b_n^5 \to B^5,
\end{cases}
\]

and

\[
\sigma^2(y) =
\begin{cases}
\tau_1(y)\, c_K & \text{if } b_n / s_n \to 0, \\
\tau_1(y)\, \bigl[(1 - m(y))\, A_K(L) + m(y)\, c_K\bigr] & \text{if } b_n / s_n \to L > 0, \\
\tau_1(y)\, m(y)\, c_K & \text{if } b_n / s_n \to \infty.
\end{cases}
\]

5 Beneficial effect of presmoothing

It is easy to show, under a suitable choice of the bandwidth $b_n$, the efficiency of the presmoothed estimators $\Lambda_b^P$ and $F_b^P$ with respect to $\Lambda_n^{TJW}$ and $F_n^{TJW}$. In fact, the asymptotic expressions of the mean squared error (AMSE) of $\overline{\Lambda}_b^P$ and $\overline{F}_b^P$ show a gain in second order efficiency over the classical estimators. We just illustrate this for $\Lambda_b^P$, since for $F_b^P$ the ideas are similar.

Note that the mean squared error (MSE) of $\overline{\Lambda}_b^P$ is

\[
MSE(\overline{\Lambda}_b^P) = Bias^2(\overline{\Lambda}_b^P) + Var(\overline{\Lambda}_b^P). \tag{30}
\]

From the asymptotic expressions of the bias and variance of $\overline{\Lambda}_b^P$ given in Proposition 3, we have that

\[
AMSE(\overline{\Lambda}_b^P(y)) = d_K^2\, \alpha^2(y)\, b_n^4 + \frac{1}{n}\, \gamma(y) - 2\, \frac{b_n}{n}\, e_K \bigl(q_1(y) + q_1(a_H)\bigr). \tag{31}
\]

Hence, the asymptotically optimal presmoothing bandwidth is

\[
b_{n,AMSE}(y) = \arg\min_{b > 0} AMSE(\overline{\Lambda}_b^P(y)) = \left(\frac{e_K \bigl(q_1(y) + q_1(a_H)\bigr)}{2\, d_K^2\, \alpha^2(y)}\right)^{1/3} n^{-1/3},
\]

and, therefore, the expression of $AMSE(\overline{\Lambda}_b^P)$ with $b_{n,AMSE}$ becomes

\[
AMSE_{OPT}(\overline{\Lambda}_b^P(y)) = \frac{1}{n}\, \gamma(y) - \frac{3}{2^{4/3}} \left(\frac{e_K^4 \bigl(q_1(y) + q_1(a_H)\bigr)^4}{d_K^2\, \alpha^2(y)}\right)^{1/3} n^{-4/3}.
\]

On the other hand, for the classical estimator of the cumulative hazard function we have:

\[
AMSE(\Lambda_n^{TJW}(y)) = \frac{1}{n}\, \gamma(y) + O(n^{-3/2}).
\]

This expression comes from the iid representation of $\Lambda_n^{TJW} - \Lambda$ together with the order of the moments of the remainder term (see, for example, Gijbels and Wang (1993)). This makes clear the second order efficiency of $\overline{\Lambda}_b^P$ with respect to $\Lambda_n^{TJW}$.
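The minimization that leads from (31) to the optimal bandwidth can be verified numerically. The Python sketch below is only an illustration: the function names are hypothetical, the argument `q` stands for the sum $q_1(y) + q_1(a_H)$, and the numerical values used in any call are arbitrary inputs rather than quantities taken from the paper.

```python
def amse(b, n, gamma, alpha, q, d_K, e_K):
    """AMSE in (31) for bandwidth b; q stands for q_1(y) + q_1(a_H)."""
    return d_K ** 2 * alpha ** 2 * b ** 4 + gamma / n - 2.0 * b * e_K * q / n


def optimal_bandwidth(n, alpha, q, d_K, e_K):
    """Closed-form minimizer of (31) with respect to b."""
    return (e_K * q / (2.0 * d_K ** 2 * alpha ** 2)) ** (1.0 / 3.0) * n ** (-1.0 / 3.0)


def amse_at_optimum(n, gamma, alpha, q, d_K, e_K):
    """Value of (31) at the optimal bandwidth."""
    gain = (e_K ** 4 * q ** 4 / (d_K ** 2 * alpha ** 2)) ** (1.0 / 3.0)
    return gamma / n - 3.0 / 2.0 ** (4.0 / 3.0) * gain * n ** (-4.0 / 3.0)
```

Setting the derivative of (31) to zero gives $4 d_K^2 \alpha^2(y) b^3 = 2 e_K (q_1(y) + q_1(a_H)) / n$, which is the closed form coded in `optimal_bandwidth`; the value returned by `amse_at_optimum` is always below the first-order term $\gamma(y)/n$ whenever $e_K (q_1(y) + q_1(a_H)) > 0$.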

6 Optimal bandwidth for the presmoothed distribution and chf estimators

One of the most important aspects of nonparametric estimation is the choice of the smoothing parameter or bandwidth. For the sake of clarity, let us denote by $b_\Lambda$ the bandwidth for the presmoothed estimator of the chf, $\Lambda_b^P$, and by $b_F$ the bandwidth for $F_b^P$. The bandwidth is frequently chosen by minimizing a measure of the distance between the estimator and the true function, such as the mean integrated squared error (MISE):

\[
MISE_\Lambda(b_n) = E\left[\int \left(\overline{\Lambda}_b^P(y) - \Lambda(y)\right)^2 \omega(y)\, dy\right]. \tag{32}
\]

The $MISE_\Lambda$ is obtained by integrating the MSE expression in (30) over the entire line, from which one can easily see the tradeoff of bias versus variance. In the definition of MISE a non-negative weight function $\omega$ was introduced. The role of $\omega$ is to eliminate endpoint effects. Using the expression (31), it is easy to obtain the following asymptotic expression of the mean integrated squared error:

\[
AMISE_\Lambda(b_n) = b_n^4\, d_K^2 \int_0^\infty \alpha^2(y)\, \omega(y)\, dy + \frac{1}{n} \int_0^\infty \gamma(y)\, \omega(y)\, dy - 2\, \frac{b_n}{n}\, e_K \int_0^\infty \bigl(q_1(y) + q_1(a_H)\bigr)\, \omega(y)\, dy. \tag{33}
\]

Hence, the optimal bandwidth in the sense of AMISE is the one minimizing (33), the so-called AMISE bandwidth:

\[
b_{AMISE,\Lambda} = \arg\min_{b_n > 0} AMISE_\Lambda(b_n) = \left(\frac{e_K\, Q}{2\, d_K^2\, A}\right)^{1/3} n^{-1/3},
\]

where

\[
Q = \int_0^\infty \bigl(q_1(y) + q_1(a_H)\bigr)\, \omega(y)\, dy
\quad\text{and}\quad
A = \int_0^\infty \alpha^2(y)\, \omega(y)\, dy,
\]

and $d_K$, $e_K$ are given in (18).

Analogously, from the asymptotic expressions of the bias and variance of $\overline{F}_b^P$ given in Proposition 7, one can easily infer the expression of the optimal bandwidth for the presmoothed estimator of the distribution function:

\[
b_{AMISE,F} = \left(\frac{e_K\, Q}{2\, d_K^2\, A}\right)^{1/3} n^{-1/3}.
\]

Whereas the asymptotic representation of the MISE provides considerable insight into the effect of the bandwidth $b_n$ on the bias and variance of the estimator, it has the drawback that the minimizer depends on the unknown quantities $Q$ and $A$. For data that are only censored, the choice of the bandwidth for the presmoothed chf and df estimators was studied by Cao et al (2005). They proposed a "plug-in" bandwidth selector, which was obtained by plugging into the expression of the AMISE bandwidth some estimates of the corresponding $Q$ and $A$. The authors considered the estimates $\hat Q$ and $\hat A$ obtained by replacing in $\alpha$ and $q_1$ the unknown functions $m$, $m'$, $m''$ with the Nadaraya-Watson estimator of $m$ and its first and second derivatives, and $h$ and $h'$ with the Parzen-Rosenblatt kernel estimator and its first derivative. Hence, the practical implementation of this bandwidth selector required the choice of some pilot bandwidths. The criterion to choose the pilot bandwidths properly was also analyzed by the authors.

Following the same ideas, a plug-in bandwidth selector for $b_{AMISE,\Lambda}$ and $b_{AMISE,F}$ can be constructed as in Cao et al (2005). A deeper analysis of this would be a rather complicated task, so we do not pursue this issue further. However, some ideas on the practical implementation of this bandwidth selector are given in Section 8. The simulation results indicate that a plug-in bandwidth selector gives promising outcomes.

7 Optimal bandwidths for the presmoothed density estimator

When presmoothing, the choice of the smoothing parameters is crucial, since the efficiency of the presmoothed density estimator with respect to the classical one depends on the limit behavior of the ratio $b_n / s_n$ (see Jácome and Cao (2007) for details when presmoothing in the untruncated setting).

The mean integrated squared error (MISE), a deterministic distance between the true density and some estimate, is probably the measure most commonly used in practice:

\[
MISE(s, b) = E\left[\int \left(\overline{f}_{s,b}^P - f\right)^2 \omega\right] = \int Bias^2\!\left(\overline{f}_{s,b}^P\right) \omega + \int Var\!\left(\overline{f}_{s,b}^P\right) \omega. \tag{34}
\]

Both the upper and lower tails of the distribution of the lifetime Y are affected under the LTRC model. For that reason, a non-negative weight function, $\omega$, is introduced to eliminate endpoint effects. For that function, the following assumption is required:

($\omega$) $\omega$ is compactly supported, and $C(y) \ge \varepsilon > 0$ and $h^*(y) \ge \varepsilon' > 0$ for some $\varepsilon, \varepsilon' > 0$ and every y in the support of $\omega$.

The asymptotic expression of the MISE, say AMISE, which will be given in (36), can be easily obtained by assuming condition ($\omega$) and considering the asymptotic expansions of the bias and variance of $\bar f^P_{s,b}$ in (26) and (27)-(29), respectively. Observe that ($\omega$) implies (C2) and (h2) in Proposition 11. Note that, from (6), the conditional probability m can also be expressed as follows:

$$m(y) = \frac{f(y)\,C(y)}{(1 - F(y))\,h^*(y)}.$$

This makes it easier to check that, without presmoothing ($b_n = 0$), the AMISE expression when $b_n/s_n \to 0$ extends Theorem 2.1 in Sánchez-Sellero (1999) in the LTRC model, Theorem 3.2 in Lo et al (1989) without truncation and Theorem 1 in Cao (1993) for complete data. The bandwidths minimizing (34) are called the MISE bandwidths. Analogously, the bandwidths minimizing the AMISE are the so-called AMISE bandwidths, denoted by $s_{AMISE}$ and $b_{AMISE}$.

To reduce notation, we write $\int f''^2 \omega = \int f''(v)^2\, \omega(v)\, dv$, and so on. The expression of the AMISE bandwidths depends on the limit of the ratio $b_n/s_n$. We will only consider the case $b_n/s_n \to L_0 \ge 0$, since when $b_n/s_n \to \infty$ the minimizers of the AMISE are both of order $n^{-1/5}$, which contradicts the above-mentioned condition. For the sake of readability, define the following functions:

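$$c_1(L) = d_K^2 \int \left(\frac{1}{2} f'' + L^2 \left[(1-F)\,\alpha\right]'\right)^2 \omega,$$

$$c_2(L) = \int \left(\frac{1-F}{C}\right)^2 h^*\, m \left[(1-m)\,A_K(L) + m\, c_K\right] \omega, \tag{35}$$

$$c_0(L) = \left(\frac{c_2(L)}{4\, c_1(L)}\right)^{1/5}.$$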

When $b_n/s_n = L_n \to L_0 \ge 0$, consider the reparametrization $b_n = L_n s_n$. The AMISE function can be written in terms of $c_1$ and $c_2$ as follows:

$$AMISE(s_n, L_n) = s_n^4\, c_1(L_n) + \frac{1}{n s_n}\, c_2(L_n). \tag{36}$$
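For a fixed value L, the minimization of (36) in s is elementary. The following worked step, written in the notation of (35), is a sketch of where the function $c_0(L)$ and the constant appearing in (37) come from.

```latex
\frac{\partial}{\partial s}\left[ s^4 c_1(L) + \frac{c_2(L)}{n s} \right]
  = 4 s^3 c_1(L) - \frac{c_2(L)}{n s^2} = 0
  \iff s^5 = \frac{c_2(L)}{4\, c_1(L)\, n}
  \iff s = c_0(L)\, n^{-1/5},

AMISE\left(c_0(L)\, n^{-1/5}, L\right)
  = n^{-4/5} \left[ c_0^4(L)\, c_1(L) + \frac{c_2(L)}{c_0(L)} \right]
  = n^{-4/5} \left[ c_1(L)\, c_2^4(L) \right]^{1/5} \left( 4^{-4/5} + 4^{1/5} \right)
  = \frac{5}{4^{4/5}}\, n^{-4/5} \left[ c_1(L)\, c_2^4(L) \right]^{1/5}.
```

Since the factor $5 \cdot 4^{-4/5}\, n^{-4/5}$ does not depend on L, minimizing the profiled AMISE over L is equivalent to minimizing $c_1(L)\, c_2^4(L)$.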

Minimizing (36) in $s_n$ gives an expression of $s_{AMISE}$ as a function of $L_n$. Now, replacing that AMISE bandwidth in (36), a new minimization in $L_n$ gives an approximation of $L_0$. Therefore, the limiting constant $L_0$ can be obtained as follows:

$$L_0 = \arg\min_{L \ge 0}\; n^{4/5}\, AMISE\left(c_0(L)\, n^{-1/5}, L\right) = \arg\min_{L \ge 0}\; \frac{5}{4^{4/5}} \left[c_1(L)\, c_2^4(L)\right]^{1/5}. \tag{37}$$

This results in the following AMISE bandwidths:

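Theorem 13 Assume conditions (K), (h1), (C1), (m2), (f1), (I) and ($\omega$). If the bandwidths satisfy $s_n \to 0$, $b_n \to 0$, $n s_n \to \infty$ and $n b_n \to \infty$, then the AMISE bandwidths are as follows.

(a) If $b_{AMISE}/s_{AMISE} \to L_0 > 0$,

$$s_{AMISE} = c_0(L_0)\, n^{-1/5} \quad\text{and}\quad b_{AMISE} = L_0\, c_0(L_0)\, n^{-1/5}. \tag{38}$$

(b) If $b_{AMISE}/s_{AMISE} \to L_0 = 0$,

$$s_{AMISE} = c_0(0)\, n^{-1/5} \quad\text{and}\quad b_{AMISE} = b_0\, n^{-3/5} \tag{39}$$

with

$$b_0 = \left[\frac{\int f''^2 \omega}{c_K\, d_K^3 \int \left(\frac{1-F}{C}\right)^2 m\, h^*\, \omega}\right]^{2/5} \frac{e_K \int f^2 \left(q_1 + q_1(a_H)\right) \omega}{\int f'' \left[(1-F)\,\alpha\right]' \omega}.$$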

The expression of $b_0$ is derived by minimizing the AMISE function with an extra term in the integrated variance (27).

Remark 3 Consider the constant $L_0$ defined in (37). When $L_0 > 0$, the efficiency of the presmoothed estimator with respect to the classical one, $f_s^{TJW}$, is of first order, since in such a case:

$$\min_{s,b>0} AMISE\left(\bar f^P_{s,b}\right) = \frac{5}{4^{4/5}}\, n^{-4/5} \left[c_1(L_0)\, c_2^4(L_0)\right]^{1/5} < \frac{5}{4^{4/5}}\, n^{-4/5} \left[c_1(0)\, c_2^4(0)\right]^{1/5} = \min_{s>0} AMISE\left(f_s^{TJW}\right).$$

If $L_0 = 0$, we are back in case (b), and the efficiency is just of third order.
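The two-step minimization behind (37) and (38) can be sketched numerically. The snippet below is only an illustration under stated assumptions: `c1` and `c2` are hypothetical user-supplied callables standing for (estimates of) the functions in (35), the search over L is a plain grid search, and case (b) of Theorem 13 is not handled (if the minimizer is L = 0 the returned b is simply 0, whereas (39) prescribes $b_0 n^{-3/5}$).

```python
import numpy as np

def amise_bandwidths(c1, c2, n, L_grid):
    """Grid-search sketch of (37)-(38).

    c1, c2 : callables L -> positive float (hypothetical stand-ins for (35))
    n      : sample size
    L_grid : candidate values of the ratio L = b/s (non-negative)
    Returns (L0, s, b, amise_value).
    """
    L_grid = np.asarray(L_grid, dtype=float)
    c1v = np.array([c1(L) for L in L_grid])
    c2v = np.array([c2(L) for L in L_grid])
    # Criterion of (37), up to the constant factor 5 / 4**(4/5)
    profile = (c1v * c2v ** 4) ** 0.2
    k = int(np.argmin(profile))
    L0 = float(L_grid[k])
    c0 = (c2v[k] / (4.0 * c1v[k])) ** 0.2      # c_0(L0) from (35)
    s = c0 * n ** (-0.2)                       # s_AMISE in (38)
    b = L0 * s                                 # b_AMISE in (38), case L0 > 0
    amise = s ** 4 * c1v[k] + c2v[k] / (n * s) # value of (36) at the minimizer
    return L0, float(s), float(b), float(amise)
```

For instance, with the toy choices `c1 = lambda L: 1 + (L - 1) ** 2` and `c2 = lambda L: 1.0`, the grid search returns a ratio close to 1 and an AMISE value equal to $5 \cdot 4^{-4/5} n^{-4/5}$.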

The expressions (38) and (39) of the AMISE bandwidths suggest a plug-in bandwidth selector for s and b. It is defined by replacing the integrals in $c_1$, $c_2$ and $b_0$ with estimates of them. The parameter $L_0$, which connects both AMISE bandwidths, can be obtained by minimizing the estimate of $c_1(L)\, c_2^4(L)$. When $L_0 = 0$, $s_{AMISE}$ in (39) coincides with the optimal AMISE bandwidth s for the kernel-type density estimator with no presmoothing, $f_s^{TJW}$. A plug-in selector for that bandwidth, together with the optimal expression of the pilot bandwidth needed to estimate $\int f''^2$, was given by Sánchez-Sellero et al (1999). Note that, when $L_0 = 0$, the function $c_2$ can be estimated without smoothing, since its empirical estimate is root-n consistent (see Theorem 2.2 in Sánchez-Sellero et al (1999)), that is,

$$\int \left(\frac{1 - F_n^{TJW}}{C_n}\right)^2 \omega\, dH_{1n}^* - \int \left(\frac{1-F}{C}\right)^2 h^*\, m\, \omega = O_P\left(n^{-1/2}\right)$$

with $H_1^*(y) = P(Z \le y, \delta = 1 \mid T \le Z)$ and $H_{1n}^*(y) = n^{-1} \sum_{i=1}^n 1\{Z_i \le y, \delta_i = 1\}$.

Following the same ideas, when $L_0 > 0$, several pilot bandwidths are needed to estimate the integrals appearing in $c_1$ and $c_2$. The choice of those optimal pilot bandwidths can be carried out as in Sánchez-Sellero et al (1999), but we do not pursue this issue further. This bandwidth selector is expected to give quite good results in practice, as the small simulation in the next section suggests.

8 Simulations

For illustration, a small simulation study has been carried out to compare the performance of the new estimators with the classical ones. Since the key idea in presmoothing is the nonparametric estimation of the function m, we have considered two different models, according to the shape of m. The first model is the one in Jácome and Iglesias-Pérez (2007), in which both the lifetime and censoring variables follow Weibull distributions, $Y \in Weibull(0.4, 4)$ and $C \in Weibull(0.3, 3)$. For the truncation distribution, an exponential with mean 1 has been chosen. This results in a censoring proportion of 29.5%, and the conditional probability of uncensoring is $m(y) = 1.264\left(1.264 + y^{-1}\right)^{-1}$ (see Figure 1).

Fig. 1. Density function f (solid lines) and conditional probability m (dashed lines) for Model 1 (thin lines) and Model 2 (thick lines).

The second model is that considered by Uzunogullari and Wang (1992). The density function of interest is given by

$$f(y) = \left((y-1)^2 + 1\right) \exp\left(-\frac{(y-1)^3}{3} - y - \frac{1}{3}\right) \quad \text{for } y \ge 0,$$

and both C and T are simulated from exponential distributions with means 4 and 0.1, respectively. Now, the censoring proportion is 14.8%, and the conditional probability of uncensoring is $m(y) = \left((y-1)^2 + 1\right)\left((y-1)^2 + 1.25\right)^{-1}$, a nearly constant function (see Figure 1).

In order to avoid boundary effects, the weight function $\omega$ discards 5% of the distribution in the upper tail for both models. Since in Model 2 there are high values of the density function f in a neighborhood of zero, in that model the weight function also discards 25% of the distribution in the lower tail. The function $\omega$ has been chosen uniform on these supports. A total of m = 500 samples of sizes n = 50, 100 and 200 have been simulated. A Gaussian kernel was used wherever kernel-type estimation was required.

8.1 Efficiency of the presmoothed estimators

To compare the presmoothed estimators of the distribution and cumulative hazard functions with the classical ones, let us consider the mean integrated squared errors (MISE) of the df estimators $F_b^P$ and $F_n^{TJW}$, defined as follows:

Fig. 2. Relative efficiency, in terms of MISE, of the presmoothed estimator $F_b^P$ with respect to the PL estimator $F_n^{TJW}$ for Models 1-2 and sample sizes n = 50, 100 and 200. [Panels: MODEL 1 and MODEL 2; horizontal axis: presmoothing bandwidth $b_n$; vertical axis: relative efficiency $RE_F$.]

Fig. 3. Relative efficiency, in terms of MISE, of the presmoothed estimator $\Lambda_b^P$ with respect to the classical estimator $\Lambda_n$ for Models 1-2 and sample sizes n = 50, 100 and 200. [Panels: MODEL 1 and MODEL 2; horizontal axis: presmoothing bandwidth $b_n$; vertical axis: relative efficiency $RE_\Lambda$.]

$$MISE_{F^P}(b_n) = E \int \left(F_b^P - F\right)^2 \omega \quad\text{and}\quad MISE_{F^{TJW}} = E \int \left(F_n^{TJW} - F\right)^2 \omega,$$

and denote the MISE of the estimators of the cumulative hazard function by $MISE_{\Lambda^P}$ and $MISE_{\Lambda^{TJW}}$, respectively. Now, the relative efficiency of the presmoothed estimators with respect to the classical ones, depending on the presmoothing bandwidth $b_n$, is defined as follows:

$$RE_F(b_n) = \frac{MISE_{F^P}(b_n)}{MISE_{F^{TJW}}} \quad\text{and}\quad RE_\Lambda(b_n) = \frac{MISE_{\Lambda^P}(b_n)}{MISE_{\Lambda^{TJW}}}. \tag{40}$$

Values of $RE(b_n)$ lower than 1 mean that the presmoothed estimator constructed with bandwidth $b_n$ is more efficient, in terms of MISE, than the classical one. The further below 1 the value is, the more efficient presmoothing is.
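The ratios in (40) are population quantities; in a simulation they are typically replaced by averages of integrated squared errors over the simulated samples. The following sketch shows one way to do this. The function names are hypothetical, and the trapezoidal rule on a fixed grid is an assumption of this example rather than a description of the numerical integration actually used in the study.

```python
import numpy as np

def weighted_ise(estimate, truth, weight, grid):
    """Trapezoidal approximation of the integral of (estimate - truth)^2 * weight.

    All four arguments are arrays evaluated on the same increasing grid.
    """
    integrand = (np.asarray(estimate) - np.asarray(truth)) ** 2 * np.asarray(weight)
    grid = np.asarray(grid, dtype=float)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))

def relative_efficiency(ise_presmoothed, ise_classical):
    """Monte Carlo version of (40): ratio of the two average ISE values."""
    return float(np.mean(ise_presmoothed) / np.mean(ise_classical))
```

In use, `weighted_ise` would be called once per simulated sample and per estimator, and the two resulting vectors of ISE values passed to `relative_efficiency`; a returned value below 1 favors the presmoothed estimator.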

Fig. 4. Relative efficiency $RE_{MISE}$ in a grid of bandwidths (s, b) for Model 1 (left panel) and Model 2 (right panel). [Horizontal axis: bandwidth s; vertical axis: bandwidth b.]

We have approximated the relative efficiencies in (40) by Monte Carlo, using a total of m = 500 samples of sizes n = 50, 100 and 200, over a wide range of possible bandwidths $b_n$. These functions are plotted in Figure 2 for the distribution function and in Figure 3 for the cumulative hazard function. Note that, for values of $b_n$ close to zero, the relative efficiency tends to 1, since in such a case the presmoothed estimators tend to the classical ones. As shown in Figures 2 and 3, the presmoothed estimators are more efficient than their classical counterparts for almost any bandwidth $b_n$ when n is small, although this gain decreases as n grows. In any case, there is a wide range of bandwidths $b_n$ for which presmoothing yields better estimates regardless of the sample size n.

In order to assess the importance of a suitable choice of the bandwidths s and b for the presmoothed estimator of the density function, the MISE function was approximated on a grid of values (s, b) by the sample mean of the integrated squared error (ISE) calculated over m = 500 samples of size n = 50. The optimal MISE bandwidth of the density estimator $f_s^{TJW}$ was obtained by minimizing its corresponding MISE function. Finally, the relative efficiency of the presmoothed density estimator has been defined as follows:

$$RE_{MISE}(s, b) = \frac{MISE\left(f^P_{s,b}\right)}{\min_{s>0} MISE\left(f_s^{TJW}\right)}. \tag{41}$$

Values of $RE_{MISE}$ (see Figure 4) lower than 1 mean that presmoothing with the corresponding bandwidths $s_n$ and $b_n$ is more efficient than estimating the density function with the classical estimator using its optimal bandwidth $s^{TJW}_{MISE}$. It is worth mentioning that, in both models, there is a wide range of bandwidths for which presmoothing gives more efficient estimates. Note that, for Model 2, the choice of the bandwidth $b_n$ seems not to be important. The reason is that the function m is almost constant, and therefore any bandwidth $b_n$ yields good estimates of m, and hence good presmoothed estimates of f.
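The sampling scheme of Model 1, described at the beginning of this section, can be sketched as follows. Two points are assumptions of this example and are not stated in the text: the Weibull parametrization, read here as survival functions $\exp(-(0.4y)^4)$ for Y and $\exp(-(0.3c)^3)$ for C (under this reading the ratio of hazards is $4 \cdot 0.4^4 / (3 \cdot 0.3^3) \approx 1.264$ times y, which reproduces the stated m), and the mutual independence of Y, C and T. Left truncation is implemented by rejection, keeping a triple only when $T \le Z$.

```python
import numpy as np

def simulate_ltrc_model1(n, seed=0):
    """Draw n observed triples (T, Z, delta) under Model 1 (assumed parametrization)."""
    rng = np.random.default_rng(seed)
    T, Z, delta = [], [], []
    while len(Z) < n:
        y = rng.weibull(4.0) / 0.4    # lifetime, survival exp(-(0.4 y)^4)
        c = rng.weibull(3.0) / 0.3    # censoring, survival exp(-(0.3 c)^3)
        t = rng.exponential(1.0)      # truncation, exponential with mean 1
        z = min(y, c)                 # observed time
        if t <= z:                    # left truncation: keep only if T <= Z
            T.append(t)
            Z.append(z)
            delta.append(int(y <= c)) # 1 = uncensored, 0 = censored
    return np.array(T), np.array(Z), np.array(delta)

def m_model1(y):
    """Conditional probability of uncensoring for Model 1, as given in the text."""
    return 1.264 / (1.264 + 1.0 / y)
```

A call such as `T, Z, delta = simulate_ltrc_model1(50)` gives one sample of the smallest size used in the study; the proportion of zeros in `delta` can then be compared with the censoring proportion reported for Model 1.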

8.2 A bandwidth selector for the distribution function

As pointed out, presmoothed estimators are more efficient than the classical ones for a wide interval of values of $b_n$. Nevertheless, it is essential for the practical use of these estimators to use a proper bandwidth. Since the choice of the pilot bandwidths required for the plug-in selector proposed in Section 5 is beyond the scope of this paper, we show the performance of the presmoothed estimators $\Lambda_b^P$ and $F_b^P$ with a slightly simplified plug-in bandwidth. It consists in estimating the unknown functions $m$, $m'$, $m''$ with a parametric estimator of the regression function m and its first and second derivatives, and $h^*$ and $h^{*\prime}$ with a reference density and its first derivative. This makes it possible to avoid the problem of the pilot bandwidths.

If it is assumed that m belongs to a parametric family, then $m(y) \equiv m(y, \theta_0)$, where $m(\cdot, \cdot)$ is a known continuous function and $\theta_0 = (\theta_{01}, \ldots, \theta_{0k})^T \in \Theta$ is an unknown parameter. For the estimation of the parameter $\theta_0$ we have used a maximum likelihood approach. Possible candidates for m can be found in Cox and Snell (1989) and Dikta (1998). We have considered the logistic regression model, since it has become, in many fields, the standard method of analysis when the outcome variable is dichotomous. For $h^*$, the reference density was the Gaussian. So, from now on, we will consider the following plug-in bandwidth selector for both $b_\Lambda$ and $b_F$:

$$\hat b = \left(\frac{e_K\, \hat Q}{2\, d_K^2\, \hat A}\right)^{1/3} n^{-1/3}, \tag{42}$$

where

$$\hat Q = \frac{1}{n} \sum_{i=1}^n \left(C_n(Z_i) + \frac{1}{n}\right)^{-1} \hat m(Z_i)\left(1 - \hat m(Z_i)\right) \omega(Z_i),$$

$$\hat A = \int_0^\infty \hat\alpha^2(y)\, \omega(y)\, dy \quad\text{with}\quad \hat\alpha(y) = \int_{a_H}^y \frac{\frac{1}{2}\, \hat m''(v)\, \hat h^*(v) + \hat m'(v)\, \hat h^{*\prime}(v)}{C_n(v)}\, dv.$$
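A possible implementation of the simplified selector (42) is sketched below; it should be read as an illustration under several assumptions, not as the code used in the study. The assumptions are: a two-parameter logistic model $m(y) = 1/(1 + e^{-(\theta_0 + \theta_1 y)})$ fitted by Newton's method; a Gaussian reference density with the sample mean and standard deviation of the $Z_i$; a uniform weight $\omega$ on `[lo, hi]`; the lower limit $a_H$ replaced by the smallest observed $Z_i$; $C_n(v)$ in the denominator of $\hat\alpha$ replaced by $C_n(v) + 1/n$ to avoid division by zero; and Gaussian-kernel constants $d_K = 1$ and $e_K = 1/(2\sqrt{\pi})$, which presume that $d_K = \int v^2 K$ and $e_K = \int u K(u) \mathbb{K}(u)\,du$ with $\mathbb{K}$ the distribution function of K (definitions given earlier in the paper and not restated here). All function names are hypothetical.

```python
import numpy as np

def fit_logistic(z, delta, iters=25):
    """ML fit of m(y) = 1 / (1 + exp(-(t0 + t1*y))) by Newton's method."""
    X = np.column_stack([np.ones_like(z), z])
    theta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ theta, -30.0, 30.0)))
        # Negative Hessian of the log-likelihood, with a tiny ridge for stability
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(2)
        theta = theta + np.linalg.solve(hess, X.T @ (delta - p))
    return theta

def plugin_bandwidth(T, Z, delta, lo, hi, e_K=0.5 / np.sqrt(np.pi), d_K=1.0, n_grid=400):
    """Sketch of the plug-in bandwidth (42) with logistic m and Gaussian h*."""
    n = len(Z)
    t0, t1 = fit_logistic(Z, delta)
    m = lambda y: 1.0 / (1.0 + np.exp(-(t0 + t1 * y)))
    dm = lambda y: t1 * m(y) * (1.0 - m(y))                              # m'
    d2m = lambda y: t1 ** 2 * m(y) * (1.0 - m(y)) * (1.0 - 2.0 * m(y))  # m''
    mu, sd = Z.mean(), Z.std(ddof=1)
    h = lambda y: np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    dh = lambda y: -(y - mu) / sd ** 2 * h(y)                            # h*'
    # C_n(y) = n^{-1} #{i : T_i <= y <= Z_i}
    C_n = lambda y: np.mean((T[None, :] <= y[:, None]) & (y[:, None] <= Z[None, :]), axis=1)
    w = lambda y: ((y >= lo) & (y <= hi)).astype(float)

    Q_hat = np.mean(m(Z) * (1.0 - m(Z)) * w(Z) / (C_n(Z) + 1.0 / n))

    grid = np.linspace(Z.min(), hi, n_grid)
    integrand = (0.5 * d2m(grid) * h(grid) + dm(grid) * dh(grid)) / (C_n(grid) + 1.0 / n)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)
    alpha = np.concatenate([[0.0], np.cumsum(steps)])    # cumulative trapezoid
    a2w = alpha ** 2 * w(grid)
    A_hat = float(np.sum(0.5 * (a2w[1:] + a2w[:-1]) * np.diff(grid)))

    return (e_K * Q_hat / (2.0 * d_K ** 2 * A_hat)) ** (1.0 / 3.0) * n ** (-1.0 / 3.0)
```

The parametric fit removes every pilot bandwidth from the selector, which is the design choice described in the text; the price is that $\hat A$ becomes very small, and the selected bandwidth very large, whenever the fitted m is nearly constant.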

Table 1. Mean integrated squared error (MISE) of the PL estimator $F_n^{TJW}$, the semiparametric estimator $F_n^{SZ}$ and the presmoothed estimator $F_b^P$, for Models 1 and 2, and sample sizes n = 50 and n = 100. The presmoothed estimator has been computed with a plug-in bandwidth selector.

                     MISE F_n^TJW       MISE F_n^SZ        MISE F_b^P
Model 1   n = 50     9.49798 × 10^-3    8.62079 × 10^-3    8.147552 × 10^-3
          n = 100    4.74977 × 10^-3    4.31943 × 10^-3    4.22462 × 10^-3
Model 2   n = 50     1.13625 × 10^-2    1.06319 × 10^-2    1.03884 × 10^-2
          n = 100    6.33151 × 10^-3    6.06153 × 10^-3    5.95351 × 10^-3

Table 2. Mean integrated squared error (MISE) of the classical estimator $\Lambda_n^{TJW}$, the semiparametric estimator $\Lambda_n^{SZ}$ and the presmoothed estimator $\Lambda_b^P$, for Models 1-2, and sample sizes n = 50 and n = 100. The presmoothed estimator has been computed with a plug-in bandwidth selector.

                     MISE Λ_n^TJW    MISE Λ_n^SZ    MISE Λ_b^P
Model 1   n = 50     0.59043         0.58223        0.56943
          n = 100    0.53530         0.53007        0.52138
Model 2   n = 50     0.24941         0.23605        0.23296
          n = 100    0.13528         0.13004        0.12559

In order to investigate the practical performance of this plug-in bandwidth selector, we approximated the MISE of the presmoothed estimators $\Lambda_b^P$ and $F_b^P$ using 500 samples of size n = 50 and n = 100 drawn from Models 1-2, and the MISE of the classical estimators $\Lambda_n^{TJW}$ and $F_n^{TJW}$. For comparison, the semiparametric estimators proposed by Sun and Zhu (2000) have also been considered. To compute $\Lambda_n^{SZ}$ and $F_n^{SZ}$, logistic regression was chosen as the parametric estimator of m.

Fig. 5. Mean squared error of the PL estimator $F_n^{TJW}$, the semiparametric $F_n^{SZ}$ with a logistic model for m, and the presmoothed estimator $F_b^P$ with a plug-in bandwidth selector for Models 1-2 and sample size n = 50. [Panels: MODEL 1 and MODEL 2; curves: Product-limit, Semiparametric, Presmoothed.]

Table 1 shows the MISE of the PL estimator $F_n^{TJW}$, the semiparametric estimator $F_n^{SZ}$ and the presmoothed estimator $F_b^P$ with the plug-in bandwidth selector (42). It should be observed that the presmoothed estimator behaves better than any of the others for every model and sample size. In fact, it is, in terms of MISE, about 11.1-14.2% and 6.0-8.6% more efficient than $F_n^{TJW}$ for Models 1 and 2 respectively, and 2.2-5.5% and 1.8-2.2% more efficient than $F_n^{SZ}$. Note that the amount of improvement depends on the sample size (it is lower when n increases). Table 2 contains similar results for the cumulative hazard function.
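The percentages quoted for the comparison with $F_n^{TJW}$ can be recomputed directly from Table 1 as the relative reduction in MISE, $100\,(MISE_{ref} - MISE_{P}) / MISE_{ref}$. The short script below does this arithmetic; the dictionary layout and the function name `gain` are just conveniences of this example.

```python
# MISE values copied from Table 1: (TJW, SZ, presmoothed)
mise_table1 = {
    ("Model 1", 50): (9.49798e-3, 8.62079e-3, 8.147552e-3),
    ("Model 1", 100): (4.74977e-3, 4.31943e-3, 4.22462e-3),
    ("Model 2", 50): (1.13625e-2, 1.06319e-2, 1.03884e-2),
    ("Model 2", 100): (6.33151e-3, 6.06153e-3, 5.95351e-3),
}

def gain(reference, presmoothed):
    """Relative MISE reduction of the presmoothed estimator, in percent."""
    return 100.0 * (reference - presmoothed) / reference

for (model, n), (tjw, sz, pre) in mise_table1.items():
    print(f"{model}, n = {n}: {gain(tjw, pre):.1f}% vs TJW, {gain(sz, pre):.1f}% vs SZ")
```

The same function applied to the rows of Table 2 gives the corresponding figures for the cumulative hazard function.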

Finally, Figure 5 presents the mean squared error (MSE) of the three df estimators, approximated on a grid of points by Monte Carlo using 500 samples of size n = 50. The pointwise performance of $F_b^P$ is quite acceptable in comparison with the PL and semiparametric estimators. Specifically, the presmoothed estimator has a lower MSE than $F_n^{TJW}$ over the complete interval of computation. The improvement with respect to $F_n^{SZ}$ is clear for inner points, while it vanishes at the boundary of the interval. In summary, the good performance of the presmoothed estimators suggests that presmoothing is a competitive method that outperforms the classical estimators in this context.

8.3 A bandwidth selector for the presmoothed density estimator

A plug-in bandwidth selector for $f^P_{s,b}$ is expected to give quite acceptable results in practice. The classical kernel density estimator $f_s^{TJW}$ with the plug-in bandwidth selector $\hat s^{TJW}$ in Sánchez-Sellero et al (1999) has been considered for comparison.

The plug-in bandwidth selector for $f^P_{s,b}$ can be obtained, as explained in Section 7, by replacing the integrals in $c_1$, $c_2$ and $b_0$ with estimates of them. The function $c_2$ can be estimated empirically, as in Sánchez-Sellero et al (1999). But, for $c_1$, the functions $f''$ and $[(1-F)\alpha]'$ have to be approximated, which implies that several pilot bandwidths are required. For $f''$, consider the pilot bandwidth g given in Theorem 2.3 in Sánchez-Sellero et al (1999), since it is the one minimizing the mean squared error of the curvature of the true density (integrated squared second derivative). Given that the choice of a pilot bandwidth for the function $[(1-F)\alpha]'$ is beyond the scope of this paper, in this simulation study we have estimated $1 - F$ by means of the presmoothed estimator of the survival function, $1 - F^P_{g_1}$, with a bandwidth $g_1$ obtained by a cross-validation procedure. With respect to $\alpha$, we have considered

$$\alpha_n(y) = \int_{a_H}^y \frac{\frac{1}{2}\, \hat m''(v)\, \hat h^*(v) + \hat m'(v)\, \hat h^{*\prime}(v)}{C_n(v)}\, dv,$$

where $\hat m$ and $\hat h$ are parametric estimates of m and h, respectively, and $C_n(y)$ is the empirical estimator of the function C in (3). In particular, let $\hat m$ be the maximum likelihood polynomial regression estimator of degree two, and $\hat h$ the density estimator under a normal distribution. Finally, the integral $\int f^2 (q_1 + q_1(a_H))\, \omega$ in $b_0$ has been approximated by

$$\frac{1}{n} \sum_{i=1}^n \frac{\hat f^2(Z_i)\left(1 - \hat m(Z_i)\right)}{C_n(Z_i)}\, \delta_i\, \omega(Z_i)$$

with $\hat f$ the presmoothed density estimator with bandwidths $\hat s^{TJW}$ and $g_1$.

Table 3. Integrated Squared Error (ISE) of the classical and presmoothed density estimators with the corresponding plug-in bandwidth selector.

                    MODEL 1                     MODEL 2
                    Classical   Presmoothed     Classical   Presmoothed
Mean                0.0184      0.0174          0.0269      0.0262
Percentile 25       0.0084      0.0081          0.0154      0.0148
Percentile 50       0.0150      0.0139          0.0241      0.0232
Percentile 75       0.0236      0.0234          0.0341      0.0322

With the aim of exhibiting the good behavior of this bandwidth selector for $f^P_{s,b}$, we have computed the ISE of the density estimators $f^P_{s,b}$ and $f_s^{TJW}$ for a total of m = 500 samples of size n = 50 with the corresponding plug-in bandwidths. The results can be seen in Table 3. The most remarkable fact is that, although the above-mentioned plug-in bandwidth selector for $f^P_{s,b}$ can be improved, the ISE of the presmoothed estimator is lower than the ISE of the classical density estimator for most samples; that is, presmoothing using this plug-in bandwidth selector is more efficient than using the classical density estimator. The above results should suffice to demonstrate the advantage of the presmoothing procedure, and suggest considering the presmoothed density estimator with a plug-in bandwidth selector as a competitive method when estimating the density function with data subject to left truncation and right censoring.

9 Real data analysis

The new density estimator for LTRC data introduced in this paper has been applied to a real-life example where both censoring and truncation are present. The problem, which has been analyzed in several studies (see Andersen et al [1], p. 14, among others), concerns the mortality of the population of Fyn county (Denmark) suffering from insulin-dependent diabetes mellitus. The variable of interest, Y, is the lifetime of the patient, in this case the time from diagnosis until the patient dies.

The data set corresponds to 1499 patients, 783 males and 716 females, reported on July 1st, 1973. It was obtained by recording all insulin prescriptions in the National Health Service files for five months (including the date above), and the survival status of each patient was assessed by January 1st, 1982. For this reason, we can only observe data of patients alive at the start of the study, and the information regarding the patients not alive or lost before that time is not available. This gives rise to left truncation, which produces, if it is not taken into account, a biased estimation of the density function. On the other hand, the lifetime may be subject to right censoring, mainly because at the end of the study there may be patients still alive. Hence, with the above-mentioned notation, the censoring variable C is the time from diagnosis to the end of follow-up, Z is the observed survival time and $\delta$ indicates the survival status. The global censoring proportion is 67.24%: 3 patients because of loss to follow-up, and 1005 because they were alive at the end of the study. The truncation variable is T, the time from the diagnosis of diabetes to entry into the study.

Using a Gaussian kernel, we have computed the presmoothed density estimator, presented in Section 4, for the lifetime of male and female patients separately. The reason is that gender has a strong influence on the survival time (males have a higher mortality than females, see Iglesias-Pérez (2003)). Both density estimates are shown in Figure 6. The solid lines show the presmoothed density estimator $f^P_{s,b}$ with the plug-in bandwidth selector introduced in Section 7, and the dashed lines the classical estimator, $f_s^{TJW}$, with the plug-in bandwidth selector proposed by Sánchez-Sellero et al (1999).

Fig. 6. Classical (dashed lines) and presmoothed (solid lines) estimates of the density function of the survival time (in years) for male (thick lines) and female (thin lines) patients.

Both estimates of the density function, regardless of gender, are similar. The reason is that the presmoothing bandwidth is quite small ($\hat b^P_{Male} = 0.6212$ and $\hat b^P_{Female} = 0.1614$), due, in part, to the large sample size. When the bandwidth $b_n$ is close to zero, the presmoothed estimator $f^P_{s,b}$ reduces to the classical estimator $f_s^{TJW}$. Nevertheless, the effect of presmoothing becomes clear on closer inspection, mainly in the right tail, where the presmoothed estimator takes larger values. The product-limit estimator $F_n^{TJW}$, on which $f_s^{TJW}$ is based, does not take the value 1 at the right tail if the last observation is censored. The presmoothed estimator $F_b^P$ is also a step function, but with jumps located at all of the observations, not only at the uncensored ones. This makes the presmoothed estimator $F_b^P$ take values closer to 1 at the right tail, which has important implications when estimating the density function. This is the reason why $f^P_{s,b}$ describes the tail behavior better.
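The difference in jump locations between the two distribution function estimators can be illustrated with a small sketch. It assumes the product forms $F_n^{TJW}(y) = 1 - \prod_{Z_i \le y} (1 - \delta_i / (n C_n(Z_i)))$ and $F_b^P(y) = 1 - \prod_{Z_i \le y} (1 - m_b(Z_i) / (n C_n(Z_i)))$, the latter inferred from the lemma in the Appendix, with $m_b$ a Nadaraya-Watson estimator; the Gaussian kernel and all function names are choices of this example.

```python
import numpy as np

def risk_set_fraction(T, Z, y):
    """C_n(y) = n^{-1} * #{i : T_i <= y <= Z_i}, evaluated at each point of y."""
    y = np.atleast_1d(y)
    return np.mean((T[None, :] <= y[:, None]) & (y[:, None] <= Z[None, :]), axis=1)

def nadaraya_watson(Z, delta, y, b):
    """Gaussian-kernel regression of delta on Z with bandwidth b, evaluated at y."""
    y = np.atleast_1d(y)
    K = np.exp(-0.5 * ((y[:, None] - Z[None, :]) / b) ** 2)
    return (K @ delta) / K.sum(axis=1)

def product_limit(T, Z, weights, y):
    """1 - prod_{Z_i <= y} (1 - weights_i / (n C_n(Z_i))).

    weights = delta gives the TJW estimator; weights = m_b(Z_i) the presmoothed one.
    """
    n = len(Z)
    factors = 1.0 - weights / (n * risk_set_fraction(T, Z, Z))
    return np.array([1.0 - np.prod(factors[Z <= v]) for v in np.atleast_1d(y)])
```

With `weights = delta` the factor attached to a censored observation equals 1, so the estimator is flat there; with `weights = nadaraya_watson(Z, delta, Z, b)` every observation with a positive smoothed weight produces a jump, including a censored largest observation.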

10 Appendix. Proof of the main results

Proof of Theorem 1 The term $\Lambda_b^P(y) - \Lambda(y)$ can be decomposed as follows:

$$\Lambda_b^P(y) - \Lambda(y) = P_1(y) - P_2(y) + P_3(y) + R_1(y) + R_2(y) + R_3(y), \tag{43}$$

where the summands that represent the main part of the iid representation are

$$P_1(y) = \int_{a_H}^y \frac{m(v)}{C(v)}\, d\left(H_n^*(v) - H^*(v)\right),$$

$$P_2(y) = \int_{a_H}^y \frac{m(v)\left(C_n(v) - C(v)\right)}{C^2(v)}\, dH^*(v),$$

$$P_3(y) = \int_{a_H}^y \frac{m_b(v) - m(v)}{C(v)}\, dH_n^*(v).$$

The other terms will be proved to be negligible:

$$R_1(y) = \int_{a_H}^y \frac{m(v)\left(C(v) - C_n(v)\right)}{C^2(v)}\, d\left(H_n^* - H^*\right)(v),$$

$$R_2(y) = \int_{a_H}^y m(v)\left(C_n(v) - C(v)\right)\left(\frac{1}{C^2(v)} - \frac{1}{C_n(v)\, C(v)}\right) dH_n^*(v),$$

$$R_3(y) = \int_{a_H}^y \left(m_b(v) - m(v)\right)\left(\frac{dH_n^*(v)}{C_n(v)} - \frac{dH_n^*(v)}{C(v)}\right).$$
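The decomposition (43) is an algebraic identity, and it may help to see it checked term by term. The sketch below assumes the representations $\Lambda_b^P(y) = \int_{a_H}^y m_b / C_n\, dH_n^*$ and $\Lambda(y) = \int_{a_H}^y m / C\, dH^*$, which are the forms consistent with the six terms above; all integrals run over $[a_H, y]$ and the argument v is omitted.

```latex
% Step 1: split off the presmoothing error m_b - m
\int \frac{m_b}{C_n}\, dH_n^*
  = \int \frac{m}{C_n}\, dH_n^*
  + \underbrace{\int \frac{m_b - m}{C}\, dH_n^*}_{P_3}
  + \underbrace{\int (m_b - m)\left(\frac{1}{C_n} - \frac{1}{C}\right) dH_n^*}_{R_3}

% Step 2: compare the remaining term with \Lambda
\int \frac{m}{C_n}\, dH_n^* - \int \frac{m}{C}\, dH^*
  = \underbrace{\int \frac{m}{C}\, d(H_n^* - H^*)}_{P_1}
  + \int \frac{m\,(C - C_n)}{C_n\, C}\, dH_n^*

% Step 3: the last integral equals -P_2 + R_1 + R_2
-P_2 + R_1 + R_2
  = -\int \frac{m\,(C_n - C)}{C^2}\, dH_n^*
    + \int m\,(C_n - C)\left(\frac{1}{C^2} - \frac{1}{C_n\, C}\right) dH_n^*
  = \int \frac{m\,(C - C_n)}{C_n\, C}\, dH_n^*
```

Adding the three steps gives $\Lambda_b^P - \Lambda = P_1 - P_2 + P_3 + R_1 + R_2 + R_3$.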

For $R_1$, U-statistics theory gives, under condition (Int1),

$$\sup_{a_H \le y \le \tau} |R_1(y)| = O\left(\frac{(\ln n)^{1/2+\varepsilon}}{n}\right) \text{ a.s. for some } \varepsilon > 0. \tag{44}$$

The function C in (2) can be decomposed as a difference of two df, hence:

$$\sup_{a_H \le y \le b_H} |C_n(y) - C(y)| = O\left(n^{-1/2} (\ln\ln n)^{1/2}\right) \text{ a.s.} \tag{45}$$

Applying (45), Lemma A.1 in Zhou and Yip (1999) and the SLLN, under condition (Int1) the term $R_2$ is

$$\sup_{a_H \le y \le \tau} |R_2(y)| = O\left(\frac{\ln\ln n}{n}\, \ln n\right) \text{ a.s.} \tag{46}$$

With respect to $R_3$, applying Theorem B in Mack and Silverman (1982), Lemma A.1 in Zhou and Yip (1999) and (45), we have

$$\sup_{a_H \le y \le \tau} |R_3(y)| = O\left(\left[b_n^2 + \left(\frac{\ln n}{n}\right)^{1/2}\right] \left(\frac{\ln\ln n}{n}\right)^{1/2} \ln n\right) \text{ a.s.} \tag{47}$$

For the functions $P_1$, $P_2$ and $P_3$, it is easy to show that

$$P_1(y) - P_2(y) = \frac{1}{n} \sum_{i=1}^n \left(g_1^P(y, Z_i) - g_2^P(y, Z_i, T_i)\right),$$

$$P_3(y) = \frac{1}{n} \sum_{i=1}^n g_3^P(y, Z_i, \delta_i) + R_4(y),$$

with $g_1^P$, $g_2^P$ and $g_3^P$ given in (14)-(16), and

$$R_4(y) = \int_{a_H}^y \frac{1}{n} \sum_{j=1}^n K_{b_n}(v - Z_j)\, \frac{\delta_j - m(v)}{C(v)\, h^*(v)}\, d\left(H_n^* - H^*\right)(v) - \int_{a_H}^y \frac{\left(m_b(v) - m(v)\right)\left(h_n^*(v) - h^*(v)\right)}{C(v)\, h^*(v)}\, dH_n^*(v) = R_{41}(y) + R_{42}(y).$$

For the second term in $R_4$, under conditions (K), (h1), (h2), (m1), (b1) and (Int2), applying Lemma 1 and Theorem B in Mack and Silverman (1982) and the SLLN, we have

$$\sup_{a_H \le y \le \tau} |R_{42}(y)| = O\left(\left[b_n^2 + \left(\frac{\ln n}{n}\right)^{1/2}\right]^2\right) \text{ a.s.} \tag{48}$$

In order to handle the first term in $R_4$, we use U-statistics theory and arguments similar to those in the proof of Theorem 3.4 in Jácome and Cao (2007). Hence, it can be shown that

ln n (ln ln n)1/2 b2 + O sup |R41 (y)| = O n n n3/2 bn aH ≤y≤τ

!1/2 +O

ln ln n 1/2

nbn

!

a.s. (49)

Collecting (44) and (46)-(49), under condition (b1) it follows that

$$\sup_{a_H \le y \le \tau} |R_1(y) + R_2(y) + R_3(y) + R_4(y)| = O\left(\left[b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right]^2\right) \text{ a.s.}$$

This concludes the proof.

Proof of Theorem 2 Consider the decomposition (43). Condition (Int1) and the law of the iterated logarithm give

$$\sup_{a_H \le y \le \tau} |P_1 + P_2| = O\left(\left(\frac{\ln\ln n}{n}\right)^{1/2}\right) \text{ a.s.}$$

For $P_3$, we apply Theorem B in Mack and Silverman (1982) and, under condition (Int3), the SLLN:

$$\sup_{a_H \le y \le \tau} |P_3| = O\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right) \text{ a.s.}$$

Both rates, together with (44), (46) and (47), prove the theorem.

Proof of Proposition 3 The expression of the bias comes from

$$E\left[g_1^P(y, Z)\right] = E\left[g_2^P(y, Z, T)\right] = \int_{a_H}^y \frac{m(v)}{C(v)}\, dH^*(v), \tag{50}$$

$$E\left[g_3^P(y, Z, \delta)\right] = \alpha(y)\, d_K\, b_n^2 + o\left(b_n^2\right). \tag{51}$$

For the variance, straightforward calculations give

$$Var\left[g_1^P(y, Z)\right] = \int_{a_H}^y \frac{m^2(u)}{C^2(u)}\, dH^*(u) - \left(\int_{a_H}^y \frac{m(v)}{C(v)}\, dH^*(v)\right)^2,$$

$$Var\left[g_3^P(y, Z, \delta)\right] = \int_{a_H}^y q_1(v)\, dv - 2 b_n e_K \left(q_1(y) + q_1(a_H)\right) + O\left(b_n^2\right),$$

$$E\left[g_2^2(y, Z, T)\right] = 2 E\left(g_1(y, Z)\, g_2(y, Z, T)\right),$$

$$Cov\left(g_1(y, Z), g_3(y, Z, \delta)\right) = O\left(b_n^2\right), \qquad Cov\left(g_2(y, Z, T), g_3(y, Z, \delta)\right) = O\left(b_n^2\right).$$

This completes the proof.

The iid representation of $F_b^P - F$, given in Theorem 5, is based on that of $\Lambda_b^P - \Lambda$ through an exponential transformation. For that purpose, a slight change in $F_b^P$ must be carried out. The next lemma proves that this change is negligible.

Lemma Consider

$$1 - \tilde F_b^P(y) = \prod_{i: Z_i \le y} \left(1 - \frac{m_b(Z_i)}{n C_n(Z_i) + 1}\right).$$

Under the condition $\int_{a_H}^\tau \frac{dH_1^*(v)}{C^2(v)} < \infty$,

$$\sup_{a_H \le y \le \tau} \left|F_b^P(y) - \tilde F_b^P(y)\right| = O\left(n^{-1} (\ln n)^2\right). \tag{52}$$

Proof. The absolute value can be bounded as follows:

$$\left|F_b^P(y) - \tilde F_b^P(y)\right| \le \sum_{i: Z_i \le y} \frac{m_b(Z_i)}{\left(n C_n(Z_i) + 1\right) n C_n(Z_i)} \le \sum_{i: Z_i \le y} \frac{m_b(Z_i)}{n^2 C_n^2(Z_i)} \le \frac{1}{n} \left(\max_{a_H \le Z_i \le y} \frac{C(Z_i)}{C_n(Z_i)}\right)^2 \int_{a_H}^y \frac{m_b(v)}{C^2(v)}\, dH_n^*(v).$$

The SLLN and Lemma A.1 in Zhou and Yip (1999) prove the lemma.

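Proof of Theorem 4 Consider the following decomposition:

$$\tilde F_b^P(y) - F(y) = (1 - F(y))\left(\Lambda_b^P(y) - \Lambda(y)\right) - \frac{1}{2}\left(\Lambda_b^P(y) - \Lambda(y)\right)^2 e^{-\Lambda_{n1}(y)} - \left[\ln\left(1 - \tilde F_b^P(y)\right) + \Lambda_b^P(y)\right] e^{-\Lambda_{n2}(y)}, \tag{53}$$

with

$$\min\left(\Lambda_b^P(y), \Lambda(y)\right) \le \Lambda_{n1}(y) \le \max\left(\Lambda_b^P(y), \Lambda(y)\right),$$

$$\min\left(\Lambda_b^P(y), -\ln\left(1 - \tilde F_b^P(y)\right)\right) \le \Lambda_{n2}(y) \le \max\left(\Lambda_b^P(y), -\ln\left(1 - \tilde F_b^P(y)\right)\right).$$

The first term on the right-hand side of (53) yields the dominant part of the iid decomposition of $F_b^P(y) - F(y)$. Taking into account (52), it suffices to show that the last two terms in (53) are negligible. The terms $\exp[-\Lambda_{n1}(y)]$ and $\exp[-\Lambda_{n2}(y)]$ are bounded. So, using (17), we have for the second term in (53):

$$\sup_{a_H \le y \le \tau} \frac{1}{2}\left(\Lambda_b^P(y) - \Lambda(y)\right)^2 \exp\left[-\Lambda_{n1}(y)\right] = O\left((n b_n)^{-1} (\ln n)^3\right) \text{ a.s.} \tag{54}$$

Applying a Taylor expansion, the third term in (53) satisfies

$$\left|\ln\left(1 - \tilde F_b^P(y)\right) + \Lambda_b^P(y)\right| \le \frac{1}{n} \sup_{i: Z_{(i)} \le y} \left(\frac{C(Z_{(i)})}{C_n(Z_{(i)})}\right)^2 \int_{a_H}^y \frac{m_b(v)}{C^2(v)}\, dH_n^*(v).$$

By Lemma A.1 in Zhou and Yip (1999) and the SLLN, we have:

$$\sup_{a_H \le y \le \tau} \left|\ln\left(1 - \tilde F_b^P(y)\right) + \Lambda_b^P(y)\right| = O\left(n^{-1} (\ln n)^2\right). \tag{55}$$

Collecting (54) and (55) completes the proof.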

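Proof of Theorem 10 Using integration by parts, we have

$$f^P_{s,b}(y) = \frac{1}{s_n} \int K\left(\frac{y - v}{s_n}\right) d\left(F_b^P - F\right)(v) + \frac{1}{s_n} \int K\left(\frac{y - v}{s_n}\right) dF(v) = \frac{1}{s_n} \int \left[F_b^P(y - s_n v) - F(y - s_n v)\right] K'(v)\, dv + f(y) + \beta_n^P(y),$$

and therefore

$$\sup_{a_H \le y \le \tau} \left|f^P_{s,b}(y) - f(y)\right| \le \frac{1}{s_n} \sup_{a_H \le y \le \tau} \left|F_b^P(y) - F(y)\right| \int |K'(v)|\, dv + \sup_{a_H \le y \le \tau} \left|\beta_n^P(y)\right|.$$

The rate follows from $\sup_{a_H \le y \le \tau} |\beta_n^P(y)| = O(b_n^2)$ and Theorem 6 in Jácome and Iglesias-Pérez (2007), which states that

$$\sup_{a_H \le y \le \tau} \left|F_b^P(y) - F(y)\right| = O\left(b_n^2 + \left(\frac{\ln n}{n b_n}\right)^{1/2}\right) \text{ a.s.}$$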

Proof of Proposition 11 Consider $\bar f^P_{s,b}(y) = f(y) + \beta_n^P(y) + \sigma_n^P(y)$, with $\beta_n^P$ and $\sigma_n^P$ given in (23) and (24), the iid representation of $f^P_{s,b}$. The derivation of the bias is straightforward, taking into account (50) and (51). For the variance, we proceed as follows:

$$Var\left(\bar f^P_{s,b}(y)\right) = Var\left(-\frac{1}{n} \sum_{i=1}^n \omega_1(y, Z_i) + \frac{1}{n} \sum_{i=1}^n \omega_2(y, Z_i, T_i) - \frac{1}{n} \sum_{i=1}^n \omega_3(y, Z_i, \delta_i) + \frac{1}{n} \sum_{i=1}^n \rho_1(y, Z_i) - \frac{1}{n} \sum_{i=1}^n \rho_2(y, Z_i, T_i) + \frac{1}{n} \sum_{i=1}^n \rho_3(y, Z_i, \delta_i)\right) \tag{56}$$

with, for j = 1, 2, 3,

$$\omega_j(y, \cdot) = \int K_s(y - v)\, f(v)\, g_j(v, \cdot)\, dv, \qquad \rho_j(y, \cdot) = \int K_s(y - v)\, (1 - F(v))\, dg_j(v, \cdot).$$

For the asymptotic expression of the variance of $\bar f^P_{s,b}$, we have used the following lemmas.

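Lemma With the LTRC notation,

$$E\left[1\{T \le x \le Z\}\, 1\{T \le y \le Z\} \mid T \le Z\right] = \alpha^{-1} \left(1 - H\left((x \vee y)^-\right)\right) L(x \wedge y),$$

where $x \vee y = \max(x, y)$ and $x \wedge y = \min(x, y)$.

Lemma With the LTRC notation, under assumption (I),

$$E\left[\frac{m(Z)}{C(Z)}\, 1\{T \le x \le Z\}\, 1\{Z \le y\} \,\Big|\, T \le Z\right] = \alpha^{-1} L(x)\, 1\{x \le y\} \int_x^y \frac{m(u)}{C(u)}\, h(u)\, du.$$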

Proof. First, we derive the following relation:

$$P\left[T \le t, Z \le z \mid T \le Z\right] = \alpha^{-1} P\left[T \le t, Z \le z, T \le Z\right] \tag{57}$$

$$= \alpha^{-1} \int_0^t \int_0^z 1\{t' \le z'\}\, dF_{T,Z}(t', z'), \tag{58}$$

where $F_{T,Z}(t, z)$ denotes the df of (T, Z). So, the expectation is

$$E\left[\frac{m(Z)}{C(Z)}\, 1\{T \le x \le Z\}\, 1\{Z \le y\} \,\Big|\, T \le Z\right] = \alpha^{-1} E\left[\frac{m(Z)}{C(Z)}\, 1\{T \le x\}\, 1\{x \le Z \le y\}\right]$$

and, under assumption (I), we obtain that

$$E\left[\frac{m(Z)}{C(Z)}\, 1\{T \le x\}\, 1\{x \le Z \le y\}\right] = L(x)\, 1\{x \le y\} \int_x^y \frac{m(u)}{C(u)}\, h(u)\, du.$$

Lemma With the LTRC notation, under assumption (I),

$$E\left[K_s(y - Z)\, (1 - F(Z))\, \frac{m(Z)}{C(Z)}\, 1\{T \le x \le Z\} \,\Big|\, T \le Z\right] = \alpha^{-1} L(x) \int K_s(y - u)\, (1 - F(u))\, \frac{m(u)}{C(u)}\, h(u)\, 1\{x \le u\}\, du.$$

Proof. It is immediate by applying (58) and assumption (I).

Lemma With the LTRC notation, under assumption (I),

$$E\left[K_{b_n}(y - Z)\, (\delta - m(y))\, 1\{T \le x \le Z\} \mid T \le Z\right] = \alpha^{-1} b_n L(x)\, m'(y)\, h(y) \int 1\{w \le (y - x)/b_n\}\, w K(w)\, dw + O\left(b_n^2\right).$$

Proof. Using the function

$$m(z, t) = E\left[\delta \mid T \le Z, Z = z, T = t\right] = \frac{P\left[\delta = 1, T \le Z \mid Z = z, T = t\right]}{P\left[T \le Z \mid Z = z, T = t\right]} = 1\{t \le z\}\, P\left[\delta = 1 \mid Z = z, T = t\right] \overset{(I)}{=} 1\{t \le z\}\, P\left[\delta = 1 \mid Z = z\right]$$

and

$$m(z) = P\left[\delta = 1 \mid T \le Z, Z = z\right] = \frac{P\left[\delta = 1, T \le Z \mid Z = z\right]}{P\left[T \le Z \mid Z = z\right]} \overset{(I)}{=} \frac{L(z)\, P\left[\delta = 1 \mid Z = z\right]}{L(z)},$$

we obtain, under assumption (I), that $m(z, t) = 1\{t \le z\}\, m(z)$. This latter expression, together with (58), assumption (I) and a Taylor expansion, allows us to write

$$E\left[K_{b_n}(y - Z)\, (\delta - m(y))\, 1\{T \le x \le Z\} \mid T \le Z\right] = E\left[K_{b_n}(y - Z)\, (m(Z, T) - m(y))\, 1\{T \le x \le Z\} \mid T \le Z\right]$$

$$= \alpha^{-1} \int\!\!\int K_{b_n}(y - u)\, (m(u, t) - m(y))\, 1\{t \le x \le u\}\, dL(t)\, h(u)\, du$$

$$= \alpha^{-1} \int\!\!\int K_{b_n}(y - u)\, \left[m(u)\, 1\{t \le u\} - m(y)\right] 1\{t \le x \le u\}\, dL(t)\, h(u)\, du$$

$$= -\alpha^{-1} L(x) \int K(w)\, (m(y - b_n w) - m(y))\, h(y - b_n w)\, 1\{x \le y - b_n w\}\, dw$$

$$= b_n \alpha^{-1} L(x)\, m'(y)\, h(y) \int w K(w)\, 1\{w \le (y - x)/b_n\}\, dw + O\left(b_n^2\right),$$

which concludes the proof.

which concludes the proof. As a consequence of the previous lemmas, together with some changes of variable, Taylor expansions and applications of assumptions such as the symmetry of the kernel K, standard although tedious and long-winded calculations yield the asymptotic expressions of the variances and covariances of the functions ωj and ρj for j = 1, 2, 3: Z

Z

2 y m m2 ∗ 2 ∗ h − f (y) V ar (ω1 (y, Z)) = f (y) h + O (sn ) , aH C aH C 2 ! Z " 2 # Z y y m mh∗ Z v mh∗ 2 −1 ∗ L − V ar (ω2 (y, Z, T )) = f (y) 2α h + O (sn ) , aH C 2 aH C aH C 2

2

V ar (ω3 (y, Z, δ)) = f (y)

Z

y

y

aH

q1 − 2bn f 2 (y) (q1 (y) + q1 (aH )) eK

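\[
\mathrm{Var}(\rho_1(y,Z)) = \frac{1}{s_n}\,(1-F(y))^2\,\frac{m^2(y)}{C^2(y)}\,h^*(y)\,c_K - (1-F(y))^2\,\frac{m^2(y)}{C^2(y)}\,h^*(y)^2 + O(s_n),
\]

\[
\mathrm{Var}(\rho_2(y,Z,T)) = \left(\frac{1-F(y)}{C(y)}\,h^*(y)\,m(y)\right)^2\left(\frac{L(y)\,(1-H(y))}{C^2(y)} - 1\right) + O(s_n),
\]

\[
\mathrm{Var}(\rho_3(y,Z,\delta)) =
\begin{cases}
\dfrac{1}{s_n}\,(1-F(y))^2\,q_1(y)\,A_K(L) + O(s_n) + o(b_n) & \text{if } b_n/s_n \to L \ge 0, \\[2ex]
\dfrac{1}{b_n}\,(1-F(y))^2\,q_1(y)\,c_K + O(b_n) & \text{if } b_n/s_n \to \infty .
\end{cases}
\]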

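For the covariances, we have

\[
\mathrm{Cov}(\omega_1(y,Z),\omega_2(y,Z,T)) = f^2(y)\int_{a_H}^{y}\frac{m}{C}\,h\left(\int_{a_H}^{v}\frac{m h^*}{C^2}\,L\right)dv - f^2(y)\left(\int_{a_H}^{y}\frac{m h^*}{C}\right)^2 + O(s_n),
\]

\[
\mathrm{Cov}(\omega_1(y,Z),\rho_1(y,Z)) = f(y)\,(1-F(y))\,h^*(y)\,\frac{m(y)}{C(y)}\left(\frac{m(y)}{2C(y)} - \int_{a_H}^{y}\frac{m}{C}\,h^*\right) + O(s_n),
\]

\[
\mathrm{Cov}(\omega_1(y,Z),\rho_2(y,Z,T)) = -f(y)\,(1-F(y))\,\frac{m(y)}{C(y)}\,h^*(y)\int_{a_H}^{y}\frac{m h^*}{C} + O(s_n),
\]

\[
\mathrm{Cov}(\omega_2(y,Z,T),\rho_1(y,Z)) = f(y)\,(1-F(y))\,\frac{m(y)}{C(y)}\,h^*(y)\left(\alpha^{-1}\int_{a_H}^{y}\frac{m h^*}{C^2}\,L - \int_{a_H}^{y}\frac{m h^*}{C}\right) + O(s_n),
\]

\[
\mathrm{Cov}(\omega_2(y,Z,T),\rho_2(y,Z,T)) = f(y)\,(1-F(y))\,\frac{m(y)\,h^*(y)}{C(y)}\left(\alpha^{-1}\,\frac{1-H(y)}{C(y)}\int_{a_H}^{y}\frac{m h^*}{C^2}\,L - \int_{a_H}^{y}\frac{m}{C}\,h^*\right) + O(s_n),
\]

\[
\mathrm{Cov}(\omega_3(y,Z,\delta),\rho_3(y,Z,\delta)) =
\begin{cases}
\dfrac{1}{2}\,f(y)\,(1-F(y))\,q_1(y) + O(s_n) + o(b_n) & \text{if } b_n/s_n \to L \ge 0, \\[2ex]
\dfrac{1}{2}\,f(y)\,(1-F(y))\,q_1(y) + O(b_n) & \text{if } b_n/s_n \to \infty,
\end{cases}
\]

and

\[
\mathrm{Cov}(\rho_1(y,Z),\rho_2(y,Z,T)) = \left(\frac{1-F(y)}{C(y)}\,m(y)\,h^*(y)\right)^2\left(\frac{1}{2}\,\frac{L(y)}{C(y)} - 1\right) + O(s_n).
\]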

The remaining covariances involved in (56) are of order $o(s_n) + o(b_n)$. Finally, standard calculations yield the variance $\mathrm{Var}\big(f^P_{s,b}(y)\big)$. This concludes the proof.
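The truncated kernel moment $\int \omega K(\omega)\,1\{\omega \le (y-x)/b_n\}\,d\omega$ in the lemma on $E[K_{b_n}(y-Z)(\delta - m(y))1\{T \le x \le Z\} \mid T \le Z]$ can be explored numerically. The following is a minimal sketch, not part of the original derivation. It assumes an Epanechnikov kernel supported on [-1, 1] (an illustrative choice; the paper only requires a symmetric K), and the function names are hypothetical. With $c = (y-x)/b_n$, the moment vanishes by symmetry when $c \ge 1$, so for this kernel the first-order term only contributes when y lies within one bandwidth of x.

```python
def epanechnikov(w):
    """Epanechnikov kernel on [-1, 1]; an assumed, illustrative choice of K."""
    return 0.75 * (1.0 - w * w) if abs(w) <= 1.0 else 0.0


def truncated_first_moment(c, kernel=epanechnikov, n=20000):
    """Midpoint-rule approximation of the integral of w * K(w) over {w <= c},
    for a kernel supported on [-1, 1]."""
    lo, hi = -1.0, min(c, 1.0)
    if hi <= lo:
        return 0.0  # the truncation removes the whole support
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        w = lo + (i + 0.5) * step
        total += w * kernel(w)
    return total * step


def closed_form(c):
    """Exact value for the Epanechnikov kernel: -(3/16) * (1 - c^2)^2,
    with c clipped to [-1, 1]."""
    c = max(-1.0, min(c, 1.0))
    return -3.0 / 16.0 * (1.0 - c * c) ** 2


if __name__ == "__main__":
    # c plays the role of (y - x) / b_n in the lemma.
    for c in (-0.5, 0.0, 0.5, 1.0, 2.0):
        print(c, truncated_first_moment(c), closed_form(c))
```

For this kernel the moment is non-positive for every c and is largest in absolute value at c = 0, that is, when y = x.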

11 Acknowledgement

We are very grateful to Professor Per Kragh Andersen (Department of Biostatistics, University of Copenhagen) for providing the Fyn diabetes data, which were collected by Dr. Anders Green. We also acknowledge the financial support of Grant MTM2005-00429 (FEDER funding included) of the Spanish Ministerio de Educación y Ciencia and XUGA Grant PGIDT03PXIC10505PN for the first author, and of Grant MTM2005-01274 (FEDER funding included) of the Spanish Ministerio de Educación y Ciencia for the second author.

References

[1] P.K. Andersen, O. Borgan, R.D. Gill, N. Keiding, Statistical Models Based on Counting Processes. Springer-Verlag, New York, 1993.

[2] M.A. Arcones, E. Giné, On the law of the iterated logarithm for canonical U-statistics and processes, Stochastic Process. Appl. 58, (1995) 217-245.

[3] R. Cao, Bootstrapping the mean integrated squared error, J. Multivar. Anal. 45, (1993) 137-160.

[4] R. Cao, M.A. Jácome, Presmoothed kernel density estimator for censored data, J. Nonparametr. Stat. 16, (2004) 289-309.

[5] R. Cao, I. López-de-Ullibarri, J. Janssen, N. Veraverbeke, Presmoothed Kaplan-Meier and Nelson-Aalen estimators, J. Nonparametr. Stat. 17, (2005) 31-56.

[6] D.R. Cox, E.J. Snell, Analysis of Binary Data. 2nd Edition, Chapman and Hall, London, 1989.

[7] G. Dikta, On semiparametric random censorship models, J. Statist. Plann. Inference 66, (1998) 253-279.

[8] I. Gijbels, J.L. Wang, Strong representations of the survival function for truncated and censored data with applications, J. Multivariate Anal. 47, (1993) 210-229.

[9] M.C. Iglesias-Pérez, W. González-Manteiga, Strong representation of a generalized product-limit estimator for truncated and censored data with some applications, J. Nonparametr. Stat. 10, (1999) 213-244.

[10] M.C. Iglesias-Pérez, Estimación de la función de distribución condicional en presencia de censura y truncamiento: Una aplicación al estudio de la mortalidad en pacientes diabéticos, Estadística Española 45, (2003) 275-301.

[11a] M.A. Jácome, R. Cao, Almost sure asymptotic representation for the presmoothed distribution and density estimators for censored data. To appear in Statistics (2007).

[11b] M.A. Jácome, R. Cao, Bandwidth selection for the presmoothed density estimator with censored data. Submitted, available at http://www.udc.es /dep/mate/Dpto Matematicas/Investigacion/ie publicacion/Jacome Cao BSPDE.pdf (2007).

[12] M.A. Jácome, M.C. Iglesias-Pérez, Presmoothed estimation with left truncated and right censored data. Submitted, available at http://www.udc.es /dep/mate/Dpto Matematicas/Investigacion/ie publicacion/PresmLTRC.pdf (2007).

[13] E.L. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, J. Amer. Statist. Assoc. 53, (1958) 457-481.

[14] S.H. Lo, Y.P. Mack, J.L. Wang, Density and hazard rate estimation for censored data via strong representation of the Kaplan-Meier estimator, Probab. Theory Related Fields 80, (1989) 461-473.

[15] D. Lynden-Bell, A method of allowing for known observational selection in small samples applied to 3CR quasars, Monthly Notices Roy. Astronom. Soc. 155, (1971) 95-118.

[16] Y.P. Mack, B.M. Silverman, Weak and strong uniform consistency of kernel regression estimates, Z. Wahrsch. Verw. Gebiete 61, (1982) 405-415.

[17] E.A. Nadaraya, On estimating regression, Theory Probab. Appl. 9, (1964) 141-142.

[18] E. Parzen, On estimation of a probability density function and mode, Ann. Math. Statist. 33, (1962) 1065-1076.

[19] M. Rosenblatt, Remarks on some nonparametric estimates of a density function, Ann. Math. Statist. 27, (1956) 832-837.

[20] C. Sánchez-Sellero, W. González-Manteiga, R. Cao, Bandwidth selection in density estimation with truncated and censored data, Ann. Inst. Statist. Math. 51, (1999) 51-70.

[21] W. Stute, J.L. Wang, A strong law under random censorship, Ann. Statist. 21, (1993) 1591-1607.

[22] L. Sun, L. Zhu, A semiparametric model for truncated and censored data, Statist. Probab. Lett. 48, (2000) 217-227.

[23] W.Y. Tsai, N.P. Jewell, M.C. Wang, A note on the product limit estimator under right censoring and left truncation, Biometrika 74, (1987) 883-886.

[24] U. Uzunogullari, J.L. Wang, A comparison of hazard rate estimators for left truncated and right censored data, Biometrika 79, (1992) 297-310.

[25] G.S. Watson, Smooth regression analysis, Sankhya Series A 26, (1964) 359-372.

[26] Y. Zhou, A note on the TJW product-limit estimator for truncated and censored data, Statist. Probab. Lett. 26, (1996) 381-387.

[27] Y. Zhou, P. Yip, A strong representation of the product-limit estimator for left truncated and right censored data, J. Multivariate Anal. 69, (1999) 261-280.

