Efficient estimation for additive hazards regression ...



SCIENCE CHINA Mathematics

. ARTICLES .

April 2012 Vol. 55 No. 4: 763–774 doi: 10.1007/s11425-012-4381-3

Efficient estimation for additive hazards regression with bivariate current status data

TONG XingWei1,∗, HU Tao2 & SUN JianGuo3,4

1 School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China;
2 School of Mathematical Sciences, Capital Normal University, Beijing 100048, China;
3 School of Mathematics, Jilin University, Changchun 130012, China;
4 Department of Statistics, University of Missouri, Columbia, MO 65211, USA

Received January 5, 2010; accepted April 1, 2011; published online March 12, 2012

Abstract This paper discusses efficient estimation for the additive hazards regression model when only bivariate current status data are available. Current status data occur in many fields including demographical studies and tumorigenicity experiments (Keiding, 1991; Sun, 2006) and several approaches have been proposed for the additive hazards model with univariate current status data (Lin et al., 1998; Martinussen and Scheike, 2002). For bivariate data, in addition to facing the same problems as those with univariate data, one needs to deal with the association or correlation between two related failure time variables of interest. For this, we employ the copula model and an efficient estimation procedure is developed for inference. Simulation studies are performed to evaluate the proposed estimates and suggest that the approach works well in practical situations. An illustrative example is provided.

Keywords bivariate current status data, copula model, counting processes, efficient estimation, joint survival function

MSC(2010) 62N01, 62F12

Citation: Tong X W, Hu T, Sun J G. Efficient estimation for additive hazards regression with bivariate current status data. Sci China Math, 2012, 55(4): 763–774, doi: 10.1007/s11425-012-4381-3

1 Introduction

This paper discusses efficient estimation for the additive hazards regression model when only bivariate current status data are available. The additive hazards model is one of the most commonly used regression models in failure time data analysis, and methods have been developed for its inference when univariate failure time data are available [14, 15, 24]. Current status data arise in many fields, including demographic studies, epidemiology and tumorigenicity experiments [13, 20]. In these situations, the failure time of interest is not observed exactly but is known only to be smaller or larger than an observation or monitoring time. Bivariate current status data occur if there exist two related failure times of interest and only current status data are available for one or both failure times.

A number of authors have considered regression analysis of univariate current status data. For example, Keiding [13] provided several examples of univariate current status data and some general discussion about the analysis. Huang [10] considered the use of the proportional hazards model and studied the efficient estimation approach, while Huang and Rossini [11] investigated the use of the proportional odds model for the analysis and developed a sieve estimation approach. Lin et al. [15] and Martinussen and Scheike [17] discussed the analysis using the additive hazards model and proposed an estimating equation approach and an efficient estimation approach, respectively. Jewell et al. [12] considered the estimation of univariate cumulative distribution functions.

Several methods have been proposed for the analysis of bivariate current status data. In this case, in addition to the difficulties that one has to face in dealing with univariate data, another major issue is the association or correlation between the two failure times of interest. Among others, Wang and Ding [22] and Ding and Wang [4] investigated the estimation of the association parameter and the independence of the two survival variables, respectively. Dunson and Dinse [5] provided some Bayesian approaches for these problems. Wang et al. [23] used the copula model with the marginal proportional hazards model.

The remainder of the paper is organized as follows. We begin in Section 2 by introducing some notation and describing models for the failure times of interest. We will assume that the two failure times follow the additive hazards model marginally and that their joint survival function can be described by a copula model. In Section 3, we derive the efficient score function and the information bound for the regression parameters, and Section 4 presents efficient estimation procedures for these parameters. Simulation results obtained for assessing the proposed estimation approach are given in Section 5 and indicate that the approach works well in practical situations. Section 6 gives an illustrative example and Section 7 contains some concluding remarks.

∗ Corresponding author
© Science China Press and Springer-Verlag Berlin Heidelberg 2012

2 Notation and models

Consider a survival study and let $T_1$ and $T_2$ denote two related failure times of interest and $X_1(t)$ and $X_2(t)$ denote $p$-dimensional vectors of covariates that may depend on time and affect $T_1$ and $T_2$, respectively. Also let $\lambda_k(t)$ and $S_k(t)$ denote the marginal hazard and survival functions of $T_k$, $k = 1, 2$, and $S(t_1, t_2) = P(T_1 > t_1, T_2 > t_2)$ the joint survival function of $T_1$ and $T_2$. In the following, we will assume that the marginal hazard function $\lambda_k(t)$ can be described by the additive hazards model [14, 15]. More specifically, we consider the situation where $\lambda_k(t)$ has the form

$$\lambda_k(t) = \lambda_{0k}(t) + X_k(t)'\beta \tag{1}$$

given $X_k$, where $\lambda_{0k}(t)$ is an unknown baseline hazard function and $\beta$ denotes the vector of regression parameters, $k = 1, 2$. Note that the model above assumes, without loss of generality, that the covariate effects are the same for the two failure times. If they are different, one can easily define a common $\beta$ through the introduction of extra type-specific covariates [7, 8].

Define $\Lambda_{0k}(t) = \int_0^t \lambda_{0k}(s)\,ds$, the cumulative baseline hazard function of $T_k$, and $Z_k(t) = \int_0^t X_k(s)\,ds$, $k = 1, 2$. Then the cumulative hazard functions are $\Lambda_k(t) = \Lambda_{0k}(t) + Z_k(t)'\beta$ and we have

$$S_k(t) = \exp\{-\Lambda_{0k}(t) - Z_k(t)'\beta\}. \tag{2}$$

For the joint distribution of $T_1$ and $T_2$, we assume that it follows a copula model given by

$$S(t_1, t_2) = C_\alpha(S_1(t_1), S_2(t_2)), \tag{3}$$

where $C_\alpha$ is a genuine survival function on the unit square and $\alpha \in R$ is a global association parameter. The copula family has gained considerable attention in modeling bivariate failure times in recent years because of its desirable features (see [3, 6, 9, 18, 22]). Different copula functions $C_\alpha(u, v)$ can be selected for different considerations. For example, one commonly used model is the Clayton model given by $C_\alpha(u, v) = (u^{1-\alpha} + v^{1-\alpha} - 1)^{1/(1-\alpha)}$. A more general class of copula models is the Archimedean copula family given by $C_\alpha(u, v) = \phi_\alpha^{-1}(\phi_\alpha(u) + \phi_\alpha(v))$, where $\phi_\alpha$ is a decreasing convex function defined on $[0, 1]$ with $\phi_\alpha(1) = 0$. In this paper, we use copula models with a finite number of parameters, say one. Another attractive feature of the copula model is that the dependence structure is modeled separately from the marginal distributions. Specifically, the parameter $\alpha$ measures the global association and is related to Kendall's $\tau$ through $\tau = 4\int_0^1\int_0^1 C_\alpha(u, v)\,dC_\alpha(u, v) - 1$.
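The relation between $\alpha$ and Kendall's $\tau$ can be checked numerically for the Clayton model. The following is a minimal sketch, not part of the original paper: it assumes $\alpha > 1$ (positive association, so that no truncation of the copula at zero is needed), and the function names are hypothetical. It approximates $4\int\int C_\alpha\,dC_\alpha - 1$ by a midpoint rule and compares the result with the closed form $(\alpha - 1)/(\alpha + 1)$ quoted for the Clayton family in Section 5.

```python
import math


def clayton(u: float, v: float, alpha: float) -> float:
    """Clayton copula C_alpha(u, v) in the parameterisation used above.

    Assumption of this sketch: alpha > 1 and 0 < u, v <= 1, so the
    bracketed term is always at least 1 and no truncation is needed.
    """
    s = u ** (1.0 - alpha) + v ** (1.0 - alpha) - 1.0
    return s ** (1.0 / (1.0 - alpha))


def clayton_density(u: float, v: float, alpha: float) -> float:
    """Mixed partial derivative of C_alpha with respect to u and v."""
    s = u ** (1.0 - alpha) + v ** (1.0 - alpha) - 1.0
    return alpha * (u * v) ** (-alpha) * s ** (1.0 / (1.0 - alpha) - 2.0)


def kendall_tau_numeric(alpha: float, grid: int = 200) -> float:
    """Midpoint-rule approximation of 4 * integral(C dC) - 1 on the unit square."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            total += clayton(u, v, alpha) * clayton_density(u, v, alpha)
    return 4.0 * total * h * h - 1.0


def kendall_tau_closed_form(alpha: float) -> float:
    """Closed form for the Clayton model: tau = (alpha - 1) / (alpha + 1)."""
    return (alpha - 1.0) / (alpha + 1.0)
```

For example, `kendall_tau_numeric(2.0)` should be close to `kendall_tau_closed_form(2.0)`, which equals 1/3, the positive-association setting used in the simulation study.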

3 The efficient score function and information bound

In this section, we will derive the efficient score function and the information bound for the parameter $\theta = (\beta', \alpha)'$ (see [2]) based on bivariate current status data. Specifically, suppose that for the failure times $T_1$ and $T_2$ there exists an observation or monitoring time $C$ and one only observes $\{C, X_1, X_2, \delta_1 = I(T_1 \ge C), \delta_2 = I(T_2 \ge C)\}$, where $C \in (0, M_0]$. Here $M_0$ is a constant such that $P(T_1 \ge M_0) > 0$ and $P(T_2 \ge M_0) > 0$, and for real data $M_0$ can be taken to be the largest observation time. That is, we only observe whether the events represented by $T_1$ and $T_2$ have occurred before or after the monitoring time $C$. In general, one could use two separate censoring times $C_1$ and $C_2$. But as Wang and Ding [22] noted, if $T_1$ and $T_2$ represent events from the same individual, then $C_1 = C_2 = C$ is likely to be the case in most applications. Furthermore, $C$ may be a discrete or a continuous variable.

In the following, we will derive the efficient score function and the information bound for $\theta$. It will be assumed that $C$ is independent of $T_1$ and $T_2$ given the covariates. Define the counting processes

$$N_{00}(t) = (1 - \delta_1)(1 - \delta_2) I(C \le t), \quad N_{10}(t) = \delta_1 (1 - \delta_2) I(C \le t),$$

$$N_{01}(t) = (1 - \delta_1)\delta_2 I(C \le t), \quad N_{11}(t) = \delta_1 \delta_2 I(C \le t).$$

Then it can be easily shown that the intensity function of $N_{jm}$ has the form $Y(t)\lambda_c(t)S_{jm}(\theta, t)$ (see [15]), $j = 0, 1$ and $m = 0, 1$. Here $Y(t) = I(C \ge t)$, $\lambda_c(t)$ denotes the hazard function of $C$, and

$$\begin{aligned}
S_{00}(\theta, t) &= P(T_1 \le t, T_2 \le t) = 1 - S_1(t) - S_2(t) + C_\alpha(S_1(t), S_2(t)),\\
S_{10}(\theta, t) &= P(T_1 \ge t, T_2 \le t) = S_1(t) - C_\alpha(S_1(t), S_2(t)),\\
S_{01}(\theta, t) &= P(T_1 < t, T_2 > t) = S_2(t) - C_\alpha(S_1(t), S_2(t)),\\
S_{11}(\theta, t) &= P(T_1 > t, T_2 > t) = C_\alpha(S_1(t), S_2(t)).
\end{aligned} \tag{4}$$

Furthermore, the log-likelihood contribution can be written as

$$l(\theta, \Lambda_{01}, \Lambda_{02}) = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \log\{S_{jm}(\theta, t)\}\,dN_{jm}(t). \tag{5}$$
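To make the four cell probabilities in (4) and the log-likelihood contribution (5) concrete, here is a small sketch for a single subject. It is an illustration under stated assumptions rather than the authors' code: the covariates are time-independent scalars (so $Z_k(t) = X_k t$), the copula is the Clayton model with $\alpha > 1$, and the function and argument names are hypothetical. The index $j = 1$ (or $m = 1$) marks that $T_1$ (or $T_2$) exceeds the monitoring time, matching $S_{11} = C_\alpha(S_1, S_2)$ in (4).

```python
import math
from typing import Callable, Dict, Tuple


def clayton(u: float, v: float, alpha: float) -> float:
    """Clayton copula; this sketch assumes alpha > 1 and 0 < u, v <= 1."""
    s = u ** (1.0 - alpha) + v ** (1.0 - alpha) - 1.0
    return s ** (1.0 / (1.0 - alpha))


def cell_probabilities(
    t: float,
    x1: float,
    x2: float,
    beta: float,
    alpha: float,
    cum_base1: Callable[[float], float],
    cum_base2: Callable[[float], float],
) -> Dict[Tuple[int, int], float]:
    """Return S_jm(theta, t) of (4), keyed by (j, m)."""
    # Marginal survival functions from (2) with Z_k(t) = x_k * t.
    s1 = math.exp(-cum_base1(t) - x1 * beta * t)
    s2 = math.exp(-cum_base2(t) - x2 * beta * t)
    c = clayton(s1, s2, alpha)
    return {
        (0, 0): 1.0 - s1 - s2 + c,  # both events have occurred by t
        (1, 0): s1 - c,             # only T2 has occurred by t
        (0, 1): s2 - c,             # only T1 has occurred by t
        (1, 1): c,                  # neither event has occurred by t
    }


def loglik_contribution(
    c_time: float,
    j: int,
    m: int,
    x1: float,
    x2: float,
    beta: float,
    alpha: float,
    cum_base1: Callable[[float], float],
    cum_base2: Callable[[float], float],
) -> float:
    """Log-likelihood contribution (5) of one subject observed in cell (j, m) at c_time."""
    probs = cell_probabilities(c_time, x1, x2, beta, alpha, cum_base1, cum_base2)
    return math.log(probs[(j, m)])
```

With the baselines of the simulation study, $\Lambda_{01}(t) = t$ and $\Lambda_{02}(t) = t^2$, a call such as `cell_probabilities(0.5, 1, 1, 0.5, 2.0, lambda t: t, lambda t: t * t)` returns four probabilities that sum to one.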

To derive the efficient score function, define $D_u = \partial C_\alpha(u, v)/\partial u|_{u=S_1, v=S_2}$, $D_v = \partial C_\alpha(u, v)/\partial v|_{u=S_1, v=S_2}$ and $D_\alpha = \partial C_\alpha(S_1, S_2)/\partial\alpha$. Also for $j = 0, 1$ and $m = 0, 1$, define the scalars

$$a_{jm1} = -(-1)^{1-j}[1 - m + (-1)^{1-m} D_u] S_1, \quad a_{jm2} = -(-1)^{1-m}[1 - j + (-1)^{1-j} D_v] S_2,$$

$$a_{jm} = a_{jm1} + a_{jm2}, \quad B_{jm} = a_{jm}^{-1}(-1)^{m+j} D_\alpha,$$

the vectors $Z_{jm1} = (Z_1', B_{jm})'$, $Z_{jm2} = (Z_2', B_{jm})'$, $A_{jm} = (a_{jm1}, a_{jm2})'$, the matrix $Z_{jm} = (Z_{jm1}, Z_{jm2})$ and $dM_{jm}(t) = dN_{jm}(t) - Y(t)\lambda_c(t)S_{jm}(t)\,dt$, which is a martingale. It is easy to verify that

$$S_{jm}(\theta, t) = (1 - j)(1 - m) + (-1)^{1-j}(1 - m)S_1 + (-1)^{1-m}(1 - j)S_2 + (-1)^{m+j}C_\alpha(S_1, S_2),$$

$$\partial S_k(t)/\partial\beta = -S_k Z_k, \quad \partial C_\alpha(S_1, S_2)/\partial\beta = -D_u S_1 Z_1 - D_v S_2 Z_2,$$

$$\partial S_{jm}/\partial\theta = a_{jm1} Z_{jm1} + a_{jm2} Z_{jm2}.$$

We show in Appendix A.1 that the efficient score function and the information bound for $\theta$ have the forms

$$\dot l_\theta^* = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \frac{(Z_{jm} - HG^{-1})A_{jm}}{S_{jm}}\,dM_{jm}(t)$$

and

$$I(\theta) = E\,\dot l_\theta^{*\otimes 2} = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} E\{((Z_{jm} - HG^{-1})A_{jm})^{\otimes 2} S_{jm}^{-1} Y \lambda_c\}\,dt, \tag{6}$$

respectively, where $a^{\otimes 2} = aa'$ for a vector $a$,

$$G = \sum_{j,m} E(A_{jm}A_{jm}' S_{jm}^{-1} Y \lambda_c), \quad H = \sum_{j,m} E(Z_{jm}A_{jm}A_{jm}' S_{jm}^{-1} Y \lambda_c).$$

By noting that $\sum_{j=0}^{1}\sum_{m=0}^{1} A_{jm} = 0$ and $\sum_{j=0}^{1}\sum_{m=0}^{1} Z_{jm}A_{jm} = 0$, one can rewrite $\dot l_\theta^*$ as

$$\dot l_\theta^* = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \frac{(Z_{jm} - HG^{-1})A_{jm}}{S_{jm}}\,dN_{jm}(t). \tag{7}$$

In the next section, we will apply $\dot l_\theta^*$ to an estimation procedure for $\theta$.
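The scalars $a_{jm1}$ and $a_{jm2}$ are the derivatives of $S_{jm}$ with respect to the cumulative hazards of $T_1$ and $T_2$, which is why they sum to zero over the four cells. The sketch below checks both facts numerically for the Clayton model. It is only an illustration: it assumes $\alpha > 1$, the helper names are hypothetical, and the finite-difference check is an addition of this sketch rather than a step taken in the paper.

```python
import math
from typing import Tuple


def clayton(u: float, v: float, alpha: float) -> float:
    """Clayton copula; this sketch assumes alpha > 1 and 0 < u, v <= 1."""
    s = u ** (1.0 - alpha) + v ** (1.0 - alpha) - 1.0
    return s ** (1.0 / (1.0 - alpha))


def clayton_du(u: float, v: float, alpha: float) -> float:
    """Partial derivative of the Clayton copula with respect to its first argument."""
    s = u ** (1.0 - alpha) + v ** (1.0 - alpha) - 1.0
    return s ** (1.0 / (1.0 - alpha) - 1.0) * u ** (-alpha)


def s_jm(j: int, m: int, s1: float, s2: float, alpha: float) -> float:
    """General expression for S_jm in terms of S_1, S_2 and the copula."""
    c = clayton(s1, s2, alpha)
    return (
        (1 - j) * (1 - m)
        + (-1) ** (1 - j) * (1 - m) * s1
        + (-1) ** (1 - m) * (1 - j) * s2
        + (-1) ** (m + j) * c
    )


def a_jm(j: int, m: int, s1: float, s2: float, alpha: float) -> Tuple[float, float]:
    """Return (a_jm1, a_jm2) as defined in Section 3."""
    d_u = clayton_du(s1, s2, alpha)
    d_v = clayton_du(s2, s1, alpha)  # the Clayton copula is symmetric in (u, v)
    a1 = -((-1) ** (1 - j)) * (1 - m + (-1) ** (1 - m) * d_u) * s1
    a2 = -((-1) ** (1 - m)) * (1 - j + (-1) ** (1 - j) * d_v) * s2
    return a1, a2


def fd_a1(j: int, m: int, lam1: float, lam2: float, alpha: float, h: float = 1e-6) -> float:
    """Central finite difference of S_jm in the cumulative hazard of T_1."""
    s2 = math.exp(-lam2)
    up = s_jm(j, m, math.exp(-(lam1 + h)), s2, alpha)
    down = s_jm(j, m, math.exp(-(lam1 - h)), s2, alpha)
    return (up - down) / (2.0 * h)
```

For cumulative hazards such as 0.4 and 0.7, `fd_a1(j, m, 0.4, 0.7, 2.0)` should agree with the first component of `a_jm(j, m, math.exp(-0.4), math.exp(-0.7), 2.0)` for every cell, and each component summed over the four cells should vanish.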

4 Estimation of parameters

Consider a failure time study that consists of $n$ independent subjects and in which each subject gives rise to two related failure times of interest. As before, suppose that only current status data are available and given by $\{(C_i, X_{1i}, X_{2i}, \delta_{1i}, \delta_{2i})\}_{i=1}^n$, $n$ i.i.d. copies of $(C, X_1, X_2, \delta_1, \delta_2)$. Let $A_{jmi}$, $Z_{jmi}$, $Y_i$, $S_{jmi}$ and $N_{jmi}$ be $A_{jm}$, $Z_{jm}$, $Y$, $S_{jm}$ and $N_{jm}$ defined in the previous section but with respect to subject $i$. To estimate $\theta$, based on (7), it is natural to apply the empirical version of the efficient score function $\dot l_\theta^*$, multiplied by $\sqrt n$, given by

$$U(\theta, \Lambda_{01}, \Lambda_{02}) = \frac{1}{\sqrt n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \frac{(Z_{jmi} - H_n G_n^{-1})A_{jmi}}{S_{jmi}}\,dN_{jmi}(t)$$

if $\Lambda_{01}$ and $\Lambda_{02}$ are known. Here

$$G_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,m} A_{jmi}A_{jmi}' S_{jmi}^{-1} Y_i \lambda_{ci}$$

and

$$H_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,m} Z_{jmi}A_{jmi}A_{jmi}' S_{jmi}^{-1} Y_i \lambda_{ci}$$

are the empirical estimators of the functions $G$ and $H$, respectively, where $\lambda_{ci}$ denotes the hazard function of $C_i$.

In practice, of course, $\Lambda_{01}$ and $\Lambda_{02}$ are unknown. For this, we will assume that there exist consistent estimates $\hat\Lambda_{01}$ and $\hat\Lambda_{02}$ with a convergence rate faster than $n^{1/4}$, that is, $\sup_t|\hat\Lambda_{0k}(t) - \Lambda_{0k}(t)| = o_p(n^{-1/4})$. Some comments on this will be given below. For estimation of $\theta$, we consider two situations.

First, suppose that the monitoring times $C_i$'s are independent of covariates. That is, $\lambda_{ci}$ is the same for all subjects. In this case, one can define the estimate $\hat\theta_1$ of $\theta$ as the solution to the estimated empirical efficient score equation $U_1(\theta) = U(\theta, \hat\Lambda_{01}, \hat\Lambda_{02}) = 0$, where

$$U_1(\theta) = \frac{1}{\sqrt n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \frac{(Z_{jmi}^* - H_n^* G_n^{*-1})A_{jmi}^*}{S_{jmi}^*}\,dN_{jmi}(t)$$

and $Z_{jmi}^*$, $H_n^*$, $G_n^*$, $A_{jmi}^*$ and $S_{jmi}^*$ are $Z_{jmi}$, $H_n$, $G_n$, $A_{jmi}$ and $S_{jmi}$, respectively, with $\Lambda_{01}$ and $\Lambda_{02}$ replaced by their estimates. Note that in this situation, the hazard function $\lambda_{ci}$ cancels out in $U(\theta, \Lambda_{01}, \Lambda_{02})$, since it enters both $H_n$ and $G_n$ as a common factor. In Appendix A.2, we prove that as $n \to \infty$,

$$U(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02}) = U(\theta_0, \Lambda_{01}, \Lambda_{02}) + o_p(1) \tag{8}$$

and $U(\theta_0, \Lambda_{01}, \Lambda_{02})$ converges in distribution to a normal random vector with mean zero and covariance matrix $I(\theta_0)$, where $\theta_0$ denotes the true value of $\theta$. It follows that as $n \to \infty$, $\sqrt n(\hat\theta_1 - \theta_0) \to N(0, I^{-1}(\theta_0))$ in distribution, and a consistent estimate of $I(\theta_0)$ is given by

$$\hat I(\hat\theta_1) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \left\{\frac{(\hat Z_{jmi} - \hat H_n \hat G_n^{-1})\hat A_{jmi}}{\hat S_{jmi}}\right\}^{\otimes 2} dN_{jmi}(t), \tag{9}$$

where $\hat Z_{jmi}$, $\hat H_n$, $\hat G_n$, $\hat A_{jmi}$ and $\hat S_{jmi}$ are $Z_{jmi}$, $H_n$, $G_n$, $A_{jmi}$ and $S_{jmi}$, respectively, with $\Lambda_{01}$ and $\Lambda_{02}$ replaced by their estimates and $\theta$ replaced by $\hat\theta_1$. The derivation of this consistent estimate is given in Appendix A.3.

Now we consider the situation where the $C_i$'s may depend on covariates. That is, $\lambda_{ci}$ may depend on $X_i$. In this case, $U(\theta, \Lambda_{01}, \Lambda_{02})$ depends on $\lambda_{ci}$ through $H_n$ and $G_n$. For this, we assume that the censoring time $C_i$ follows the proportional hazards model with the hazard function $\lambda_{ci}$ given by $\lambda_{ci}(t) = \lambda_{c0}(t)\exp(X_i'\gamma)$, where $\lambda_{c0}(t)$ is a baseline hazard function. Note that on the $C_i$'s, one has right-censored data or complete data and thus can easily estimate $\gamma$ by the partial likelihood estimate $\hat\gamma$ (see [1]). Under the model given above, we have

$$G_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,m} A_{jmi}A_{jmi}' S_{jmi}^{-1} Y_i \exp(X_i'\gamma)\lambda_{c0}$$

and

$$H_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,m} Z_{jmi}A_{jmi}A_{jmi}' S_{jmi}^{-1} Y_i \exp(X_i'\gamma)\lambda_{c0}.$$

For the estimation of $\theta$, as before, it is natural to use the solution to the estimating equation $U_2(\theta) = U(\theta, \hat\Lambda_{01}, \hat\Lambda_{02}, \hat\gamma) = 0$, where

$$U_2(\theta) = \frac{1}{\sqrt n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \frac{(Z_{jmi}^* - H_n^* G_n^{*-1})A_{jmi}^*}{S_{jmi}^*}\,dN_{jmi}(t).$$

Here, as before, $Z_{jmi}^*$, $H_n^*$, $G_n^*$, $A_{jmi}^*$ and $S_{jmi}^*$ are $Z_{jmi}$, $H_n$, $G_n$, $A_{jmi}$ and $S_{jmi}$ with $\Lambda_{01}$, $\Lambda_{02}$ and $\gamma$ replaced by their estimates. In this case, the term $\lambda_{c0}$ also cancels out since it appears in both $H_n$ and $G_n$. Let $\hat\theta_2$ denote the solution defined above. Then it can be shown that, like $\hat\theta_1$, $\hat\theta_2$ is consistent and $n^{1/2}(\hat\theta_2 - \theta_0)$ converges in distribution to a normal random vector with mean zero and covariance matrix $I^{-1}(\theta_0)$, where $I(\theta_0)$ can be consistently estimated by

$$\hat I_2(\hat\theta_2) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} \left\{\frac{(\hat Z_{jmi} - \hat H_n \hat G_n^{-1})\hat A_{jmi}}{\hat S_{jmi}}\right\}^{\otimes 2} dN_{jmi}(t),$$

where again $\hat Z_{jmi}$, $\hat H_n$, $\hat G_n$, $\hat A_{jmi}$ and $\hat S_{jmi}$ are $Z_{jmi}$, $H_n$, $G_n$, $A_{jmi}$ and $S_{jmi}$, respectively, with $\Lambda_{01}$, $\Lambda_{02}$ and $\gamma$ replaced by their estimates and $\theta$ replaced by $\hat\theta_2$.

To implement the estimation procedures proposed above, one needs a consistent estimate $\hat\Lambda_{0k}$ of $\Lambda_{0k}$ such that $\sup_t|\hat\Lambda_{0k} - \Lambda_{0k}| = o_p(n^{-1/4})$. For this, a simple approach is to base the estimation on the marginal data $\{(C_i, \delta_{ki}, X_{ki})\}_{i=1}^n$. For example, when $C$ is discrete, it has been shown that the resulting maximum likelihood estimate of the marginal cumulative hazard function has the $n^{1/2}$ convergence rate [21]. In the following, we will take the commonly used sieve estimation approach. Among others, Huang and Rossini [11] discussed sieve estimation for the proportional odds model with univariate current status data and showed that the resulting estimate of the nonparametric component has the convergence rate $n^{1/3}$. By using arguments similar to those in [11] or directly verifying the conditions of Theorem 2 in [19], one can show that the sieve estimate of $\Lambda_{0k}$ has the same convergence rate.
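The limiting results above translate into standard errors in the usual way: the variance of the estimate is approximated by the inverse of the estimated information matrix divided by $n$. The sketch below illustrates this step for a scalar covariate, so that $\theta = (\beta, \alpha)'$ and the information matrix is $2 \times 2$. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def invert_2x2(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Invert a 2 x 2 matrix given as nested sequences."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("information matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def wald_summary(
    theta_hat: Sequence[float],
    info_hat: Sequence[Sequence[float]],
    n: int,
    z: float = 1.959964,
) -> List[Tuple[float, float, float, float]]:
    """Return (estimate, standard error, lower, upper) for each component.

    Uses the approximation Var(theta_hat) = I^{-1} / n and a normal
    critical value z (the default corresponds to a 95% interval).
    """
    cov = invert_2x2(info_hat)
    rows = []
    for k, estimate in enumerate(theta_hat):
        se = math.sqrt(cov[k][k] / n)
        rows.append((estimate, se, estimate - z * se, estimate + z * se))
    return rows
```

For instance, `wald_summary([0.5, 2.0], [[4.0, 0.0], [0.0, 1.0]], 100)` gives standard errors 0.05 and 0.1 for the two components under these made-up inputs.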

5 Simulation studies

This section reports some results from the simulation studies carried out to examine the performance of the proposed estimation procedures in practical situations. In the study, it was assumed that $X_1 = X_2 = X$ with $X$ generated from the Bernoulli distribution with success probability 0.5. For the survival times $T_1$ and $T_2$, we took $\lambda_{01}(t) = 1$ and $\lambda_{02}(t) = 2t$ and used the Clayton family to define the joint survival function. That is, we assumed that the joint survival function of $T_1$ and $T_2$ has the form $S(t_1, t_2) = \{S_1^{1-\alpha}(t_1) + S_2^{1-\alpha}(t_2) - 1\}^{1/(1-\alpha)}$. This gives $S_1(t) = \exp(-t - X\beta t)$, $S_2(t) = \exp(-t^2 - X\beta t)$ and $\tau = (\alpha - 1)/(\alpha + 1)$. To generate the monitoring time $C$, for the case where $C$ is independent of the covariate, it was assumed that $C$ follows the uniform distribution $U(0, 1)$. For the case where $C$ may depend on $X$, $C$ was generated from the exponential distribution with $\lambda_c(t) = 0.5 + 0.5X$. Since Kendall's $\tau$ has a more direct interpretation as a measure of association than $\alpha$, we focus on the estimation of $\tau$ below instead of $\alpha$.

Table 1 presents the simulation results on estimation of $\beta$ and $\tau$ for the situation where $C$ is independent of $X$, with the true values $\beta_0 = -0.5$, 0 or 0.5 and $\alpha_0 = 0.5$ or 2, which correspond to negative correlation $\tau_0 = -1/3$ or positive correlation $\tau_0 = 1/3$, respectively. The table includes the estimated bias (Bias) of the proposed estimates, given by the averages of the estimates minus the true values, the sample standard deviations (SSD) of the estimates, the averages of the estimated standard errors (ESE), and the empirical 95% coverage probabilities (CP). These results indicate that the proposed estimates are approximately unbiased and that the estimated standard errors are close to the sample standard deviations. Also, the empirical coverage probabilities are close to the nominal level and, as expected, the results improve as the sample size increases.

For the situation where $C$ is correlated with $X$, the results are similar to those given in Table 1. For example, with $\beta_0 = 0$, $\alpha_0 = 0.5$ and $n = 100$, we obtained (Bias, SSD, ESE, CP) = (0.042, 0.378, 0.381, 0.943) and (0.011, 0.162, 0.159, 0.934) for $\hat\beta$ and $\hat\tau$, respectively. With $\beta_0 = 0$, $\alpha_0 = 0.5$ and $n = 200$, the corresponding results are (Bias, SSD, ESE, CP) = (0.008, 0.244, 0.249, 0.956) and (−0.006, 0.118, 0.111, 0.942), respectively. We also examined some other copula models and obtained similar results.
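The data-generating mechanism described above can be sketched in a few lines. This is an illustrative sketch and not the authors' simulation code. It assumes $\alpha > 1$ and $\beta \ge 0$, so that the Clayton copula needs no truncation and both marginal survival functions are monotone, it uses the standard conditional-distribution method to sample from the Clayton copula, and it records an indicator equal to 1 when the event has not yet occurred by the monitoring time. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_clayton_pair(alpha: float, rng: random.Random) -> Tuple[float, float]:
    """Draw (U, V) from the Clayton copula via the conditional method (alpha > 1)."""
    theta = alpha - 1.0  # usual Clayton parameter, since 1 - alpha = -theta
    u = 1.0 - rng.random()  # uniform on (0, 1]
    w = 1.0 - rng.random()
    inner = u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0
    return u, inner ** (-1.0 / theta)


def simulate_current_status(
    n: int, beta: float, alpha: float, seed: int = 0
) -> List[Tuple[float, int, int, int]]:
    """Return n records (C, X, d1, d2) with C ~ U(0, 1) independent of X."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        u, v = sample_clayton_pair(alpha, rng)
        # Invert S1(t) = exp(-t - x*beta*t).
        t1 = -math.log(u) / (1.0 + x * beta)
        # Invert S2(t) = exp(-t^2 - x*beta*t): positive root of
        # t^2 + x*beta*t + log(v) = 0 (assumes beta >= 0).
        b = x * beta
        t2 = 0.5 * (-b + math.sqrt(b * b - 4.0 * math.log(v)))
        c = 1.0 - rng.random()  # monitoring time in (0, 1]
        # d_k = 1 if the k-th event has not occurred by the monitoring time.
        data.append((c, x, int(t1 >= c), int(t2 >= c)))
    return data


def empirical_kendall_tau(pairs: Sequence[Tuple[float, float]]) -> float:
    """Sample version of Kendall's tau (concordant minus discordant pairs)."""
    n = len(pairs)
    score = 0
    for i in range(n):
        ui, vi = pairs[i]
        for j in range(i + 1, n):
            uj, vj = pairs[j]
            prod = (ui - uj) * (vi - vj)
            if prod > 0:
                score += 1
            elif prod < 0:
                score -= 1
    return 2.0 * score / (n * (n - 1))
```

As a sanity check on the sampler, the empirical Kendall's $\tau$ of a large sample from `sample_clayton_pair(2.0, rng)` should be near $(\alpha - 1)/(\alpha + 1) = 1/3$. The negative-association setting $\alpha_0 = 0.5$ and negative values of $\beta_0$ would need extra care and are deliberately left out of this sketch.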

Table 1 Simulation results when the censoring variable is independent of covariate

True                    Sample size = 100                       Sample size = 200
(β, τ)′          Bias      SSD      ESE      CP          Bias      SSD      ESE      CP
β = −0.5      −0.0221   0.2922   0.2630   0.932       −0.0145   0.1884   0.1833   0.947
τ = −1/3       0.0253   0.1385   0.1151   0.941        0.0155   0.1062   0.1047   0.936
β = 0          0.0136   0.2866   0.2730   0.959        0.0142   0.1791   0.1724   0.941
τ = −1/3       0.0269   0.1397   0.1382   0.942        0.0204   0.1072   0.1046   0.938
β = 0.5        0.0173   0.2869   0.2834   0.948        0.0184   0.1953   0.1985   0.949
τ = −1/3       0.0106   0.1104   0.1211   0.943        0.0098   0.0856   0.0884   0.961
β = −0.5       0.0103   0.3312   0.3075   0.934        0.0636   0.2483   0.2446   0.945
τ = 1/3       −0.0107   0.1482   0.1435   0.953        0.0079   0.1094   0.1064   0.952
β = 0          0.02610  0.3315   0.3095   0.933       −0.0094   0.2155   0.2077   0.943
τ = 1/3       −0.0187   0.1528   0.1556   0.943       −0.0064   0.1194   0.1123   0.953
β = 0.5        0.0136   0.3705   0.3590   0.951        0.0142   0.2641   0.2663   0.936
τ = 1/3       −0.0178   0.1224   0.1146   0.947       −0.0116   0.0887   0.0826   0.943

6 An illustrative example

In this section, we illustrate the proposed methodology using a set of bivariate current status data arising from a 2-year rodent carcinogenicity study of chloroprene conducted by the National Toxicology Program. The original study consists of a control group and three dose groups of chloroprene at concentrations of 12.8, 32, and 80 ppm. In each group, 50 male and 50 female rats were exposed to chloroprene by inhalation 6 hours per day, 5 days per week for up to 2 years. The occurrence of different types of tumors at various sites was determined through a pathologic examination at the time of animal death. Many of the animals died prior to 2 years due to natural causes or moribund sacrifice. All surviving animals were sacrificed at the end of 2 years. For the analysis here, we focus on the data from male rats in the control and high-dose groups with respect to adrenal and lung tumors. Dunson and Dinse [5] provided more details about the experiment.

Let $T_1$ and $T_2$ denote the ages at onset of adrenal and lung tumors, respectively, and $C$ the death time of the animal. Also let $X$ denote the group indicator equal to 1 for the animals in the high-dose group and 0 for those in the control group. In both groups, many animals survived to the end of the study, and thus we have right-censored data for the death time $C$. Assume that $T_1$ and $T_2$ follow the joint survival function specified by models (1) or (2) and (3) and that $C$ follows the proportional hazards model. First, the partial likelihood approach gave $\hat\gamma = 0.7588$ with an estimated standard error of 0.2676, giving a p-value of 0.0046 for testing $\gamma = 0$. This implies that high-dose chloroprene significantly increased the death rate and that the estimation procedure based on $U_2$ should be used for estimating $\beta$.

To estimate the dose effect, we considered both the Clayton and Gumbel models for the joint survival function with different baseline hazard functions for $T_1$ and $T_2$ [5]. We also considered two situations with respect to treatment effects, and the obtained estimates are shown in Table 2. First, we assumed that chloroprene had the same effect on the two tumors. The proposed estimation procedure gave $\hat\beta > 0$ with p-values very close to zero under both copula models. These results suggest that chloroprene had a significant effect in increasing the incidence rates of adrenal and lung tumors. For the association between the two types of tumors, it can be seen that $\hat\tau$ is negative with p-value < 0.02. This indicates that the incidence rates of the two types of tumors were significantly negatively correlated. For the situation where chloroprene was assumed to have different effects on the two tumors, the results gave similar conclusions.

Table 2 Application results

Clayton copula model
                    Parameter    Estimate    Standard error    p-value
Same effect         β             0.08047    0.01281           3.3465e−010
                    τ            −0.29820    0.12090           0.0136
Different effect    β1            0.10510    0.02363           8.6780e−006
                    β2            0.08224    0.01975           3.1265e−005
                    τ            −0.29940    0.12380           0.0156

Gumbel copula model
                    Parameter    Estimate    Standard error    p-value
Same effect         β             0.09120    0.01348           1.3278e−011
                    τ            −0.31420    0.12290           0.0106
Different effect    β1            0.09060    0.02185           3.3767e−005
                    β2            0.08352    0.02136           9.2253e−005
                    τ            −0.31070    0.13270           0.0192

To give a graphical idea about the chloroprene effect on tumor incidence and also to check model (1) or (2), we obtained the nonparametric maximum likelihood estimates of the survival functions of $T_1$ and $T_2$ based on the univariate current status data from the animals in the control and dose groups separately [20]. The corresponding estimates under model (1) or (2) based on the univariate current status data were also obtained, and all estimates are presented in Figure 1 (for adrenal tumors) and Figure 2 (for lung tumors). These figures suggest that the additive hazards model (1) or (2) seems to be reasonable and that chloroprene increased the tumor incidence rates.

Figure 1 Comparison of the estimated survival function in the control group and the high dose group for the adrenal tumors under the independent monitoring situation. (Panel: adrenal tumors; survival function plotted against months, 16 to 24; curves: NPMLE of control, NPMLE of high dose, proposed method.)

Figure 2 Comparison of the estimated survival function in the control group and the high dose group for the lung tumors under the independent monitoring situation. (Panel: lung tumors; survival function plotted against months, 16 to 24; curves: NPMLE of control, NPMLE of high dose, proposed method.)
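The p-values reported in this section are consistent with two-sided Wald tests based on the ratio of an estimate to its standard error. The paper does not spell out the test, so the following sketch is an assumption about how such values can be reproduced; the function name is hypothetical, and the inputs in the usage note are the estimates and standard errors quoted in the text and in Table 2.

```python
import math


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: parameter = 0 under a normal approximation.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, `wald_p_value(0.7588, 0.2676)` is approximately 0.0046, matching the value quoted for $\hat\gamma$, and `wald_p_value(-0.29820, 0.12090)` is approximately 0.0136, matching the entry for $\tau$ under the Clayton model with a common effect.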

7 Concluding remarks

This paper considered bivariate current status data where the marginal hazard functions of the two failure times of interest are additive and their joint survival function follows a copula structure. Instead of the proposed two-step approach, one could directly maximize the log-likelihood function (5) to obtain the maximum likelihood estimates of $(\theta_0, \Lambda_{01}, \Lambda_{02})$, for example by the sieve method [11]. But the maximization could be very difficult, because it must be performed over a high-dimensional space whose dimension grows to infinity as $n$ tends to infinity, and the computation is much more complex. In this paper, we viewed the two baseline cumulative hazard functions as nuisance parameters and presented an efficient estimation procedure. In the proposed approach, a copula model was applied to model the joint survival function of the two correlated failure times of interest. The proposed estimates of the regression parameters are consistent and asymptotically normal. The approach allows the monitoring variable to depend on covariates, and the simulation study suggests that it works well in practical situations. The advantages of this method are twofold: (i) One does not need to estimate $\theta_0$ and $\Lambda_{0k}$ ($k = 1, 2$) simultaneously, which could be very complex; the estimates of the nuisance functions only need to be consistent with a convergence rate faster than $n^{1/4}$. (ii) The proposed estimators are also efficient.

In this paper, we used the additive hazards model as the marginal hazard function of $T_k$ rather than the proportional hazards model [23], since the additive model describes a different aspect of the association between the failure time and covariates and is more plausible in many applications [14, 15].


One direction for future research is the situation where there exist more than two correlated failure times and only current status data are available for each of the failure variables. In this case, one faces the inference problem for the additive hazards model with multivariate current status data. One possible approach is to specify a model for the joint survival function of the correlated failure times and to develop an efficient estimation procedure as in the previous sections. Another direction for future research is the development of methods for assessing the models employed in the proposed methodology. For the marginal model (1) or (2), one could apply the graphical tool used in Section 6, but checking model (3) is generally quite difficult.

In the preceding discussion, it was assumed that there exists one common monitoring time $C$. It is worth mentioning that in some situations, one may face different monitoring times for $T_1$ and $T_2$. The direct extension of the proposed estimation procedure to this case is very difficult and a new estimation procedure needs to be developed.

Acknowledgements The authors wish to thank the editor and three referees for their critical and constructive comments that greatly improved the paper. This work was partly supported by National Natural Science Foundation of China (Grant No. 10971015, 11131002), Key Project of Chinese Ministry of Education (Grant No. 309007) and the Fundamental Research Funds for the Central Universities.

References

1 Andersen P K, Gill R D. Cox's regression model for counting processes: A large sample study. Ann Statist, 1982, 10: 1100–1120
2 Bickel P, Klaassen C, Ritov Y, et al. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press, 1993
3 Clayton D G. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 1978, 65: 141–151
4 Ding A A, Wang W. Testing independence for bivariate current status data. J Amer Statist Assoc, 2004, 99: 145–155
5 Dunson D B, Dinse G E. Bayesian models for multivariate current status data with informative censoring. Biometrics, 2002, 58: 79–88
6 Genest C, Rivest L P. Statistical inference procedures for bivariate Archimedean copulas. J Amer Statist Assoc, 1993, 88: 1034–1043
7 Goggins W B, Finkelstein D M. A proportional hazards model for multivariate interval-censored failure time data. Biometrics, 2000, 56: 940–943
8 Guo S W, Lin D Y. Regression analysis of multivariate grouped survival data. Biometrics, 1994, 50: 632–639
9 Hougaard P. Analysis of Multivariate Survival Data. New York: Springer, 2000
10 Huang J. Efficient estimation for the proportional hazards model with interval censoring. Ann Statist, 1996, 24: 540–568
11 Huang J, Rossini A J. Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Amer Statist Assoc, 1997, 92: 960–967
12 Jewell N P, van der Laan M, Lei X. Bivariate current status data with univariate monitoring times. Biometrika, 2005, 92: 847–862
13 Keiding N. Age-specific incidence and prevalence: A statistical perspective. J Roy Statist Soc A, 1991, 154: 371–412
14 Lin D Y, Ying Z L. Semiparametric analysis of the additive risk model. Biometrika, 1994, 81: 61–71
15 Lin D Y, Oakes D, Ying Z. Additive hazards regression for current status data. Biometrika, 1998, 85: 289–298
16 Lin D Y, Ying Z L. Semiparametric and nonparametric regression analysis of longitudinal data. J Amer Statist Assoc, 2001, 96: 103–126
17 Martinussen T, Scheike T H. Efficient estimation in additive hazards regression with current status data. Biometrika, 2002, 89: 649–658
18 Oakes D. Bivariate survival models induced by frailties. J Amer Statist Assoc, 1989, 84: 487–493
19 Shen X, Wong W H. Convergent rates of sieve estimates. Ann Statist, 1994, 22: 580–615
20 Sun J. The Statistical Analysis of Interval-Censored Failure Time Data. New York: Springer, 2006
21 Tong X, Chen M H, Sun J. Regression analysis of multivariate interval-censored failure time data with application to tumorigenicity experiments. Biometrical J, 2008, 50: 364–374
22 Wang W, Ding A A. On assessing the association for bivariate current status data. Biometrika, 2000, 87: 879–893
23 Wang L, Sun J, Tong X. Efficient estimation for the proportional hazards model with bivariate current status data. Lifetime Data Anal, 2008, 14: 134–153
24 Zeng D L, Cai J W, Shen Y. Semiparametric additive model for interval-censored data. Statistica Sinica, 2006, 16: 287–302

Tong X W et al. Sci China Math, April 2012, Vol. 55, No. 4

Appendix

A.1 Derivation of the efficient score and the information bound

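Using the notation defined in the previous sections and taking the derivatives of the log-likelihood $l(\theta, \Lambda_{01}, \Lambda_{02})$ with respect to $\theta$, one can easily obtain the score function for $\theta$ as

$$\dot{l}_\theta = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{Z_{jm1}a_{jm1} + Z_{jm2}a_{jm2}}{S_{jm}}\,dN_{jm}(t).$$

Note that $\sum_{j=0}^{1}\sum_{m=0}^{1} S_{jm}(t) = 1$ and $\partial S_{jm}/\partial\theta = \sum_{w=1}^{2} Z_{jmw}a_{jmw}$, yielding that

$$\sum_{j=0}^{1}\sum_{m=0}^{1}\sum_{w=1}^{2} Z_{jmw}a_{jmw} = 0$$

and

$$\dot{l}_\theta = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{Z_{jm1}a_{jm1} + Z_{jm2}a_{jm2}}{S_{jm}}\,dN_{jm}(t) = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{Z_{jm1}a_{jm1} + Z_{jm2}a_{jm2}}{S_{jm}}\,dM_{jm}(t).$$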

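Suppose that $\Lambda_{0k}$ is a function of a parameter $\eta$ and set $\partial\Lambda_{0k}/\partial\eta = b_k$, $k = 1, 2$. Then the score function for $\Lambda_{0k}$ has the form

$$\dot{l}_{\Lambda_{01},\Lambda_{02}}(b_1, b_2) = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{a_{jm1}b_1 + a_{jm2}b_2}{S_{jm}}\,dN_{jm}(t) = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{a_{jm1}b_1 + a_{jm2}b_2}{S_{jm}}\,dM_{jm}(t).$$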

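To derive the efficient score function for $\theta$, by definition, we need to find functions $b_1^*$ and $b_2^*$ such that for any $b_1$ and $b_2$, we have $\dot{l}_\theta^* \perp \dot{l}_{\Lambda_{01},\Lambda_{02}}(b_1, b_2)$, where

$$\dot{l}_\theta^* = \dot{l}_\theta - \dot{l}_{\Lambda_{01},\Lambda_{02}}(b_1^*, b_2^*) = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{a_{jm1}(Z_{jm1} - b_1^*) + a_{jm2}(Z_{jm2} - b_2^*)}{S_{jm}}\,dM_{jm}(t).$$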

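That is, $E[\{\dot{l}_\theta - \dot{l}_{\Lambda_{01},\Lambda_{02}}(b_1^*, b_2^*)\}\,\dot{l}_{\Lambda_{01},\Lambda_{02}}(b_1, b_2)] = 0$, or

$$\sum_{j=0}^{1}\sum_{m=0}^{1} E\left[\int_0^{M_0}\frac{a_{jm1}(Z_{jm1} - b_1^*) + a_{jm2}(Z_{jm2} - b_2^*)}{S_{jm}}\,dM_{jm}(t)\int_0^{M_0}\frac{a_{jm1}b_1 + a_{jm2}b_2}{S_{jm}}\,dM_{jm}(t)\right]$$

$$= \sum_{j=0}^{1}\sum_{m=0}^{1} E\left[\int_0^{M_0}\frac{a_{jm1}(Z_{jm1} - b_1^*) + a_{jm2}(Z_{jm2} - b_2^*)}{S_{jm}}\,(a_{jm1}b_1 + a_{jm2}b_2)\,Y\lambda_c\,dt\right] = 0.$$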

This yields that

$$\sum_{j=0}^{1}\sum_{m=0}^{1} E\left[\frac{a_{jm1}(Z_{jm1} - b_1^*) + a_{jm2}(Z_{jm2} - b_2^*)}{S_{jm}}\,a_{jm1}Y\lambda_c\right] = 0 \tag{A.1}$$


and

$$\sum_{j=0}^{1}\sum_{m=0}^{1} E\left[\frac{a_{jm1}(Z_{jm1} - b_1^*) + a_{jm2}(Z_{jm2} - b_2^*)}{S_{jm}}\,a_{jm2}Y\lambda_c\right] = 0. \tag{A.2}$$
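Equations (A.1) and (A.2) can be read as a single matrix equation, which makes the form of the solution easier to see. The following is only a sketch under assumptions: we write $Z_{jm} = (Z_{jm1}, Z_{jm2})$, $A_{jm} = (a_{jm1}, a_{jm2})'$ and $b^* = (b_1^*, b_2^*)$, and we assume that $H$ and $G$ are the weighted moments indicated by the underbraces. This is our reading of the notation introduced in the main text, not a definition restated in this appendix.

```latex
\begin{align*}
\sum_{j=0}^{1}\sum_{m=0}^{1}
  E\!\left[\frac{(Z_{jm}-b^{*})A_{jm}A_{jm}'}{S_{jm}}\,Y\lambda_c\right] &= 0
  && \text{(columns are (A.1) and (A.2))}\\
\underbrace{\sum_{j,m}E\!\left[\frac{Z_{jm}A_{jm}A_{jm}'}{S_{jm}}\,Y\lambda_c\right]}_{H\ \text{(assumed)}}
  &= b^{*}\,
\underbrace{\sum_{j,m}E\!\left[\frac{A_{jm}A_{jm}'}{S_{jm}}\,Y\lambda_c\right]}_{G\ \text{(assumed)}}
  && \text{($b^{*}$ is non-random at each $t$)}\\
b^{*} &= HG^{-1}
  && \text{(provided $G$ is invertible)}
\end{align*}
```

Under this reading, $G$ is a symmetric $2 \times 2$ matrix at each time point, so the system reduces to a two-dimensional linear solve regardless of the dimension of $\theta$.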

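Solving the equations above, we obtain that $(b_1^*, b_2^*) = HG^{-1}$. Thus the efficient score for $\theta$ has the form

$$\dot{l}_\theta^* = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jm} - HG^{-1})A_{jm}}{S_{jm}}\,dM_{jm}(t)$$

and the information for $\theta$ is

$$I(\theta) = E\,\dot{l}_\theta^{*\otimes 2} = E\left[\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jm} - HG^{-1})A_{jm}}{S_{jm}}\,dM_{jm}(t)\right]^{\otimes 2} = \sum_{j=0}^{1}\sum_{m=0}^{1} E\int_0^{M_0}\big((Z_{jm} - HG^{-1})A_{jm}\big)^{\otimes 2} S_{jm}^{-1}\,Y\lambda_c\,dt.$$

A.2 Proof of (8)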

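Define

$$U_0(\theta_0, \Lambda_{01}, \Lambda_{02}) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi} - HG^{-1})A_{jmi}}{S_{jmi}}\,dN_{jmi}(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi} - HG^{-1})A_{jmi}}{S_{jmi}}\,dM_{jmi}(t).$$

Then it is easy to show that

$$U(\theta_0, \Lambda_{01}, \Lambda_{02}) = U_0(\theta_0, \Lambda_{01}, \Lambda_{02}) + o_p(1) \tag{A.3}$$

by noting that $H_n$ and $G_n$ are the empirical versions of $H$ and $G$. Note that we can rewrite $U(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02})$ as

$$U(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02}) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi}^* - H_n^* G_n^{*-1})A_{jmi}^*}{S_{jmi}^*}\,dM_{jmi}(t) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi}^* - H_n^* G_n^{*-1})A_{jmi}^*}{S_{jmi}^*}\,S_{jmi}Y_i\lambda_c\,dt$$

$$= V_{1n}(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02}) + V_{2n}(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02}). \tag{A.4}$$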

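For the first term, let $h_{jmi}(\Lambda_{01}, \Lambda_{02}) = (Z_{jmi} - HG^{-1})A_{jmi}/S_{jmi}$,

$$h_{jmiu} = \frac{\partial h_{jmi}(u, v)}{\partial u}\Big|_{u=\Lambda_{01},\,v=\Lambda_{02}}, \qquad h_{jmiv} = \frac{\partial h_{jmi}(u, v)}{\partial v}\Big|_{u=\Lambda_{01},\,v=\Lambda_{02}}.$$

Applying the Taylor series expansion of $V_{1n}(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02})$, the boundedness of the second derivatives of $h_{jmi}$ and $\sup_t |\hat\Lambda_{0k}(t) - \Lambda_{0k}(t)| = o_p(n^{-1/4})$ for $k = 1, 2$, we have that

$$V_{1n}(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02}) = U_0 + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j,m}\int_0^{M_0}\big[h_{jmiu}(\hat\Lambda_{01} - \Lambda_{01}) + h_{jmiv}(\hat\Lambda_{02} - \Lambda_{02})\big]\,dM_{jmi}(t) + o_p(1).$$

Then applying Lemma A.1 of [16], we have

$$V_{1n}(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02}) = U_0(\theta_0, \Lambda_{01}, \Lambda_{02}) + o_p(1). \tag{A.5}$$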


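Again using the Taylor series expansion of $V_{2n}(\theta_0, \hat\Lambda_{01}, \hat\Lambda_{02})$ and noting that $\sum_{j,m} Z_{jmi}^* A_{jmi}^* = \partial \sum_{j,m} S_{jmi}^*/\partial\theta = 0$ and $\sum_{j,m} A_{jmi}^* = 0$, we obtain

$$V_{2n} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0} h_{jmi}\big[a_{jm1i}(\hat\Lambda_{01} - \Lambda_{01}) + a_{jm2i}(\hat\Lambda_{02} - \Lambda_{02})\big]\,Y_i\lambda_c\,dt + o_p(1)$$

$$= \int_0^{M_0}\sum_{k=1}^{2}(\hat\Lambda_{0k} - \Lambda_{0k})\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1} h_{jmi}\,a_{jmki}\,Y_i\lambda_c\,dt + o_p(1).$$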

It then follows immediately from (A.1) and (A.2) that $V_{2n} = o_p(1)$. This, together with (A.3)–(A.5), completes the proof.

A.3 Derivation of the asymptotic covariance matrix of $\hat\theta$

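Define $\tilde\theta$ to be the solution to $U_0(\theta, \Lambda_{01}, \Lambda_{02}) = 0$ when $\Lambda_{01}$ and $\Lambda_{02}$ are known. Then it can be easily shown that

$$\sqrt{n}(\tilde\theta - \theta_0) = -\phi^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} e_i(\theta_0) + o_p(1),$$

where

$$e_i(\theta) = \sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi} - HG^{-1})A_{jmi}(\theta)}{S_{jmi}}\,dN_{jmi} \quad\text{and}\quad \phi = E\,\partial e_i/\partial\theta\,|_{\theta=\theta_0}.$$

It is clear that $\mathrm{cov}(e_i(\theta_0)) = I(\theta_0)$, and therefore $\sqrt{n}(\tilde\theta - \theta_0)$ converges to $N(0, \phi^{-1}I(\theta_0)\phi^{-1})$ in distribution. Then from (A.3) and (8), to prove (9), one only needs to show that $\phi = -I(\theta_0)$ or $n^{-1}\sum_{i=1}^{n}\partial e_i/\partial\theta\,|_{\theta=\theta_0} \to -I(\theta_0)$ in probability.

For this, note that $\sum_{j,m}(Z_{jmi} - HG^{-1})A_{jmi} = 0$. This yields that $-\frac{1}{n}\sum_{i=1}^{n}\partial e_i/\partial\theta\,|_{\theta=\theta_0} = V_{1n} + V_{2n} + V_{3n}$, where

$$V_{1n} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{\partial[(Z_{jmi} - HG^{-1})A_{jmi}]}{\partial\theta}\Big|_{\theta=\theta_0} S_{jmi}^{-1}\,dN_{jmi}(t) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j,m}\int_0^{M_0}\frac{\partial[(Z_{jmi} - HG^{-1})A_{jmi}]}{\partial\theta}\Big|_{\theta=\theta_0} S_{jmi}^{-1}\,dM_{jmi}(t),$$

$$V_{2n} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi} - HG^{-1})A_{jmi}}{S_{jmi}^{2}}\,[Z_{jmi}A_{jmi}]'\,dM_{jmi}(t),$$

$$V_{3n} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi} - HG^{-1})A_{jmi}}{S_{jmi}}\,[Z_{jmi}A_{jmi}]'\,Y_i\lambda_c\,dt.$$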

It is easy to see that the first two terms $V_{1n}$ and $V_{2n}$ above converge to zero in probability by the martingale central limit theorem. The third term can be written as

$$V_{3n} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\left\{\frac{(Z_{jmi} - HG^{-1})A_{jmi}}{S_{jmi}}\right\}^{\otimes 2} S_{jmi}Y_i\lambda_c\,dt + \frac{1}{n}\sum_{i=1}^{n}\sum_{j=0}^{1}\sum_{m=0}^{1}\int_0^{M_0}\frac{(Z_{jmi} - HG^{-1})A_{jmi}}{S_{jmi}}\,A_{jmi}'G^{-1}H'\,Y_i\lambda_c\,dt.$$

It is apparent that the first term in $V_{3n}$ converges in probability to the information bound $I(\theta_0)$, while the second term converges in probability to $\int_0^{M_0}(H - HG^{-1}G)G^{-1}H'\,dt = 0$. Therefore, $V_{3n}$ converges to $I(\theta_0)$ in probability, which completes the proof.
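The splitting of $V_{3n}$ into an information term and a vanishing cross term can be checked numerically at a single time point. The sketch below is only an illustration under assumptions: the arrays `Z`, `A` and `w` are simulated stand-ins rather than data or quantities from the paper, the weight `w` plays the role of $Y\lambda_c/S_{jm}$, and `G` and `H` are taken to be the sample versions of $\sum E[A A' Y\lambda_c/S]$ and $\sum E[Z A A' Y\lambda_c/S]$, which is our reading of the notation. It uses only the Python standard library.

```python
import random

random.seed(0)
n, p = 400, 2  # hypothetical number of terms and covariate dimension

# Hypothetical stand-ins at one fixed time point (not the paper's data):
#   Z[r] is p x 2 with columns (Z_jm1, Z_jm2); A[r] = (a_jm1, a_jm2);
#   w[r] > 0 plays the role of Y * lambda_c / S_jm.
Z = [[[random.gauss(0, 1) for _ in range(2)] for _ in range(p)] for _ in range(n)]
A = [[random.gauss(0, 1) for _ in range(2)] for _ in range(n)]
w = [random.uniform(0.5, 2.0) for _ in range(n)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# G = mean of w A A' (2 x 2); ZA[r] = Z[r] A[r] (length p); H = mean of w Z A A' (p x 2).
G = [[sum(w[r] * A[r][a] * A[r][b] for r in range(n)) / n for b in range(2)]
     for a in range(2)]
ZA = [[sum(Z[r][k][a] * A[r][a] for a in range(2)) for k in range(p)] for r in range(n)]
H = [[sum(w[r] * ZA[r][k] * A[r][b] for r in range(n)) / n for b in range(2)]
     for k in range(p)]
B = matmul(H, inv2(G))  # sample analogue of H G^{-1}

# resid[r] = (Z[r] - H G^{-1}) A[r]
resid = [[ZA[r][k] - sum(B[k][a] * A[r][a] for a in range(2)) for k in range(p)]
         for r in range(n)]


def weighted_mean_outer(U, V):
    """Mean over r of w[r] * U[r] V[r]' (a p x p matrix)."""
    return [[sum(w[r] * U[r][k] * V[r][q] for r in range(n)) / n for q in range(p)]
            for k in range(p)]


V3 = weighted_mean_outer(resid, ZA)       # analogue of V_3n
info = weighted_mean_outer(resid, resid)  # analogue of the information term
BG = matmul(B, G)
H_minus_BG = [[H[k][b] - BG[k][b] for b in range(2)] for k in range(p)]
cross = matmul(matmul(H_minus_BG, inv2(G)), transpose(H))  # (H - H G^{-1} G) G^{-1} H'

max_cross = max(abs(x) for row in cross for x in row)
max_gap = max(abs(V3[k][q] - info[k][q]) for k in range(p) for q in range(p))
print(max_cross, max_gap)  # both should be zero up to rounding error
```

In this finite-sample analogue the cross term vanishes exactly (up to floating-point rounding) because `B` solves `B G = H`, which mirrors why $\phi = -I(\theta_0)$ and hence why the sandwich $\phi^{-1}I(\theta_0)\phi^{-1}$ collapses to $I(\theta_0)^{-1}$.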

