Communicated by Herbert Robbins, May 14, 1984. ABSTRACT. The paper ..... gale {IZn  &SnI:nl, s n s n2c} andusing Lemma 3 we have. E{U' c,[Nc s< n2J} ...
Proc. Nati. Acad. Sci. USA Vol. 81, pp. 59065909, September 1984 Statistics
Riskefficient estimation of the mean exponential survival time under random censoring (censored data/exponential distribution/sequential estimation)
JOSEPH C. GARDINERt
AND
V.
SUSARLAt
tMichigan State University, East Lansing, MI 48824; and tState University of New York, Binghamton, NY 13901
Communicated by Herbert Robbins, May 14, 1984
Since 6 is unknown, the minimum risk R° and optimal sample size no are also unknown, and thus we are led to consider a sequential scheme for estimating 6. In this article, we propose one such procedure in which the sample size is given by a random stopping number N(c), and the associated risk R * ELN(c) satisfies R*/R° * 0; that is, our scheme is asymptotically riskefficient. Sequential procedures analogous to the one outlined here have been considered, in the absence of censoring, by several researchers, beginning with the pioneering work of Robbins (1) for the estimation of the mean of a normal population. This was later extended by Starr (2) and Starr and Woodroofe (3). Sequential point estimation of the exponential mean is considered by Starr and Woodroofe (4). Gardiner and Susarla (5) made the first examination of the problem of sequential estimation of the mean survival time in the presence of censoring when both the underlying survival time and censoring distributions are unspecified. When censoring is absent, similar procedures for estimation of functionals of an unspecified distribution were discussed by Sen and Ghosh (6) and by Ghosh and Mukhopadhayay (7), with some extensions and refinements by Chow et al. (8, 9). Throughout the rest of the paper, the terms involving the random variable [E," 1 6i = 01 are left out without any further indication, because P[l;,=1 6i = 0] goes to zero at an exponential rate and all our scale factors will be like n~a with a > 0.
The paper proposes a sequential estimator 0 ABSTRACT of the parameter 0 of an exponential distribution when the data are censored. It is shown that 0 is asymptotically risk efficient when the loss is measured by the squared error of estimation of 0 plus a linear function of the number of observations and that 0 is asymptotically normal as the cost per observation goes to zero.
=
1. Introduction
In longitudinal investigations, the estimation of the mean survival time is of basic importance. This is usually based on data gathered from a sample of n identical units in a clinical trial or life test. However, it is often the case in survival studies that the lifetimes are not all observable, due to random withdrawals and consequent loss to followup. Accordingly, for each survival time X we envisage a competing censoring time Y, but the only datum available is (Z, 6) where Z = min(X, Y) and 6 = 1 whenever X < Y, and 8 = 0 otherwise. We assume X to be independent of Y. Suppose {(Z,, Si):1 c i s n} is a random sample of size n, the Xi having a common exponential survival function F(t) = exp(t/0), t > 0, with 0(>0) unknown, and the censoring times Y, having a common (unknown) survival function G. We consider the sequence of estimators {6n} of 6 given by n
if
I
2i , 1
2. The sequential procedure
i=1
OnI~~~
0~
=
n
if I 6s = 0. i=l
Let m(.1) be an initial sample size. Motivated by Eq. 1.4 we define the stopping number Nc [N(c)] by
[1.1]
6,, is seen to converge a.s. to 6 as n * oo.
NC
The loss incurred in estimating 6 by on is taken to be
Ln(a, c)
=
a(6On  6)2 + cn
[1.2]
=
min{n 2
m:n
' (c)
(an +
nV)}
[2.1]
where y > 0 is a constant and dn is an appropriate estimator of 0.2(6) 02/E6 = 0N(J;7FG)1. Notice that by the central limit theorem we have from Eq. 1.1 =
where a and c are positive constants. With the definition of Eq. 1.1 we can show that
E(On6)2
= (62/E8)n1 + o(n1),
n1/2(O,,
[1.3]
whence the "risk" is
R,(a, c)
=
EL,(a, c)
=
(a 02/E6)n1 + cn +

2cno and no

(a62/cE6)"2.
[2.2]
An appropriate estimator of a2(0) based on {(Zi, 6,):l c i n} is 2,, where
o(n'1).
an =
To minimize this risk (with respect to n), we must take a sample of size no with subsequent minimum risk R' where (for small c)
R°
a) o N[O, a2(6)].
[1.4] I
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
[2.3]
with the bar denoting the sample mean and [A] denoting the indicator of the set A. Observe that for each fixed 6, on converges a.s. to 0.2(0) as n  oo. Our proposed sequential scheme will take N(c) observations and estimate 6 by ON(c), with resulting risk R*
5906
RIZn n={ZIS[5[A 0],
0,
5907
Now
.2
BR*/2 l as c 4 0.
bn(9n  E8),
2 = an(Z EZ) 

[3.5]
where
Furthermore, THEOREM 2. As mptotic normality of ON(c) As c 4 0, for each fixed 6, {VN(c) (ON(c)  6)/6N(c)} converges in distribution to a standard normal random variable. The proofs of these theorems are outlined in the next section. We begin with a demonstration of the expansion (Eq. 1.3) and a lemma concerning the rate of growth of N(c) as c 4 0.
=
(Zn
+
EZ)6n3[&n # 0]
and
bn = (EZ)2(E6)V3[2 +8,nE8 + (E82]83[8n Since an as n a
3. Proofs
00,
E(V k[V > 4]) < K(np) k,
6 = UnVn

[3.1]
where Un = (Zn  068n); Vn = An [8n
0] 01
[3.2]
By using Lemma I and the MarcinkiewiczZygmundHolder inequality, we have Lemma 2. LEMMA 2. For any p 2 1, EIUnI2P  0(nP) and EIVnjP _
0(1).
Before proceeding further we point out that Eq. 1.3 follows from the equality On  6 = (E8), U {1  Vn(86,  E where U, and Vn are defined by Eq. 3.2. The next lemma gives a convergence rate (as c 4 0) for the stopping number Nc of Eq. 2.1. First, note that Nc 2 (a//2(1+ ) a.s., so that Nc T 00 a.s. as c 4 0. In the sequel, [x] denotes the greatest integer x and kj, k2, ... denote constants independent of c, but possibly depending on G and 6. LEMMA 3. Let 0 < e < 1 be arbitrary. Then, as c 4 0
P{Nc ' [no(1

P{Nc > [ng(l
e)]} = 0(c), and +
E)]}
=
0(c).
[3.3]
Proof: Write b = (a/c)"2 and nc = [b"2('+')] n2c = [nC(  E)]. On the set [Nc s n2C], n 2 ban for some n E {nic,..... n2C}. Therefore, n.=nlc
p U
(c5~ 
where o2 = 02(6)
=
=[max njc¢rnsn2c
b2n22

21
2
Zn EZI
>
+ P max
77]
njc:snstqc
I nE86
>
7R] [3.6]
for some 7 > 0. Each of the two terms in Eq. 3.6 is handled in the same fashion. Observe that {IZ,  EZI:nic n < n2c} is a reverse submartingale to which the Kolmogorov maximal inequality applies. Thus,
k2EIZnl
P[max Zn  EZI 2 71]7
EZI4+4Y  O(C),
nlc,5n2cc
where k, is a constant depending only on y. Combining Eqs. 3.4 through 3.6 yields the first statement of Lemma 3. The second is proven in much the same way. We may utilize the expansion in Eq.' 3.5 to obtain the uniform integrability of {NA/nc0 < c < co} for some cO sufficiently small'; from this it will follow that E{Nc/nt}  1 as c 4 0. To this end, note from Eq. 2.1 that 0.N(c)/0. < Ncnc c ON(c)1/0 + [N(c) 
Then, since Nc T
oo a.s. (as c
i]f V1
+
(bor)l.
4 0), we obtain Nc/no* 1 a.s.
Furthermore, (Nc/nj2 c
8{[4N(C)_1/a]2
+
or2[N(c) ]2i
+
(h)2}.
Thus, in order to show that E[supc.c (N /nNc2] mdA/02  I) for sufficiently large m. However, from Eq. 3.5 and the convergences an ao, bn . bo a.s. it suffices to establish the finiteness of E(supnlmlZn  EZI) and E(suPnm13n E81). The second is trivial, while the first is bounded by 4 times EIZm ' EZJ2, since {fZZn  EZI:n m} is a reverse submartingale. Proof of Theorem 2: Recall Eqs. 3.1 and 3.2. From the ,¶ 62E8) for each 6 > 0. central limit theorem n"12Un A0N Also, Vn  (E)' a.s. Hence, n 2(6,  6)XN(0, 2). We have noted that &2,, a02 a.s. (as n * oo) and Nc oo asc 4 0. Hence, by Eqs. 3.8, 3.13, and 3.15 below, we get
2)]c
V/NIY (O'N(c)
02/E6,
{c p n=nP U [1I2_ c
02
n3cl}

[3.13]
= 0,
where n3, = [nG(1 + E)]. Therefore, in order to verify Eq. 3.7 we are left with proving that
lim(cn"c'E{(6N(C) )2[n2c 40~
~
~
~
However, from Eq. 1.3 from the inequality E{(ON(C)s E112{( N(C)
~
0, with nGn = 1=1 Gi, needs to be imposed.
Proc. Nadl. Acad Sci USA 81 (1984)
Statistics: Gardiner and Susarla A final comment concerns the damping factor n introduced in the definition (Eq. 2,1) of the stopping random variable N,. Without this factor, the sequential estimator ON may not be asymptotically risk efficient, as the following argumdnt suggests. Define
N,,
=
min{n 2
m:n >
(a)1a"}
[4.1]
where an is defined by Eq. 2.3. With these definitions, it can be seen that if the censoring distribution G has a positive mass p at zero, then the risk R *1 of the sequential procedure oNr, is at least 02pm and R *,1/R * oo as c 0, and hence N,1 is asymptotically risk inefficient. Thuss, the factor n tin the definition of Nc appears to be important.
5909
This research was supported in part by the National Institutes of Health Grant 1RO1GM28030 and by the Office of Naval Research under Contract N0001479C0522 (to J.C.G.); and by the National Institutes of Health Grant lR01GM28405 (to V.S.). 1. Robbins, H. (1959) Probability and Statistics, H. Cramer Volume (Almquist & Wiksell, Stockholm, Swedeh), pp. 235245. 2. Starr, N. (1966) Ann. Math. Stat. 37, pp. 11731185. 3. Starr, N. & Woodroofe (1969) Proc. Natl. Acad. Sci. USA 61, 285288. 4. Starr, N. & Woodroofe (1972) Ann. Math. Stat. 43, 11471154. 5. Gardiner, J. & Susarla, V. (1983) Commun. Stat. Sequential Analysis 2, 201223. 6. Sen, P. K. & Ghosh, M. (1981) Sankhya Ind. J. Stat. Ser. A, 43, 331343. 7. Ghosh, M. & Mukhopadhayay (1978) Commun. Stat. Part A 8, 637652. 8. Chow, Y. S. & Yu, K. F. (1981) Ann. Stat. 9, 184189. 9. Chow, Y. S. & Martinsek, A. T. (1982) Ann. Stat. 10, 909914.