Bayesian inference for semiparametric binary regression

University of Wisconsin Department of Statistics


Michael A. Newton, Claudia Czado, Rick Chappell. Technical Report 905: Second Revision, April 1995 (Original, August 1993; First Revision, August 1994)

Department of Statistics, University of Wisconsin–Madison, 1210 West Dayton Street, Madison, WI 53706-1685

Phone: (608) 262-0086 Internet: [email protected]

Bayesian inference for semiparametric binary regression

Michael A. Newton, Claudia Czado, Rick Chappell

April, 1995

Abstract

We propose a regression model for binary response data which places no structural restrictions on the link function except monotonicity and known location and scale. Predictors enter linearly. We demonstrate Bayesian inference calculations in this model. By modifying the Dirichlet process, we obtain a natural prior measure over this semiparametric model, and we use Polya sequence theory to formulate this measure in terms of a finite number of unobserved variables. A Markov chain Monte Carlo algorithm is designed for posterior simulation, and the methodology is applied to data on radiotherapy treatments for cancer.

Keywords: Dirichlet process, Polya sequence, logistic regression, Markov chain Monte

Carlo, latent variables, link function

Michael A. Newton and Rick Chappell are Assistant Professors of Statistics and Biostatistics at the University of Wisconsin–Madison. Claudia Czado is Associate Professor of Statistics at York University, North York, Ontario.

1 INTRODUCTION

Binary response data, measured with covariates, are often modeled by assuming that the probability of a positive response, after suitable transformation, is linear in the covariates. This transformation, or link function, connects the probability to a linear predictor, and is usually assumed to be a known function, as in logistic or probit regression. Logistic regression is particularly popular because the regression parameters can be interpreted in terms of changes in log odds, and because of a certain invariance regarding retrospective and



prospective studies (Armitage 1971). However, models that allow some parametric flexibility in the link function can significantly improve fits, and many authors have studied testing and estimation in such one- and two-parameter models. See Prentice (1976), Pregibon (1980), Aranda-Ordaz (1981), Guerrero and Johnson (1982), Stukel (1988), Taylor (1988), Czado and Santner (1992a, 1992b), Cheng and Wu (1994), and Atkinson (1987) for a summary. To our knowledge, there are no methods available for inference when a completely nonparametric assumption is made about the link function, although nonparametric extensions of logistic regression have been studied. An interesting model has been advocated by Follmann and Lambert (1989). This model is formed by including a random intercept having unknown distribution into the linear predictor. Methods of nonparametric mixture estimation are then used to fit the model. In a different approach, Hastie and Tibshirani (1987) keep the link fixed but allow a smooth nonparametric predictor in place of the parametric linear predictor. This paper develops a semiparametric model for binary regression. The model maintains linearity of the predictor, but allows an arbitrary link function, subject to identifiability. Our approach to inference is Bayesian, and while the derived estimators are liable to have good frequentist properties, we do not develop such arguments. Rather, we construct a prior measure and demonstrate posterior computations over the space of regression parameters and link functions. The novel component of the prior is a variant of the Dirichlet process (Ferguson 1973), one supported on link functions having fixed location and scale. To enable computations, we recast the prior as a measure over a high-dimensional space of latent variables. This representation is an application of Polya sequence theory (Blackwell and MacQueen, 1973). We construct a Markov chain Monte Carlo (MCMC) algorithm to simulate the posterior.
(For a recent review of MCMC, see the discussion papers of Smith and Roberts, 1993, Besag and Green, 1993, and Gilks et al., 1993.) The algorithm we propose is a cycle of five Metropolis-Hastings chains. We apply the proposed methods to data from clinical trials to assess radiotherapy treatments for cancer. Chappell, Nondahl, and Fowler (1992) develop a regression model for the binary response indicating whether or not the therapy has successfully controlled local growth of a tumor. This is a particularly interesting example because the model is derived from cell kinetics, and the link function has an interpretation in terms of cell growth. Logistic regression fits poorly, and posterior computations suggest asymmetries in this link. Posterior computations enable us to study two scientifically relevant quantities: the probability of local control as a function of treatment time and the time-dose trade-off parameter. The latter is


a ratio of regression parameters having an interpretation independent of the link function, and hence inference that is unconditional on the link is desirable. Section 2 presents the semiparametric binary regression model. The prior measure and its reformulation are presented in Section 3. In Section 4, we develop the MCMC algorithm, and then apply the method to the radiotherapy data in Section 5. Results are compared to a parametric link analysis. Proofs are in Appendix A and details of one MCMC step are in Appendix B.

2 SAMPLING MODEL

2.1 Background

Consider binary regression data. Responses {y_i}, each a 0–1 variable, are recorded with covariate vectors x_i = (1, x_{i,1}, x_{i,2}, ..., x_{i,l})^t, for i indexing each case in a sample of size n. Examples abound. In Section 5, y_i indicates whether or not a radiotherapy treatment has successfully controlled local growth of a cancer tumor, and x_i gives information about the way the treatment is administered. Assuming independent responses and fixed covariates, a natural model for the probability of positive response is:

P(y_i = 1) = F[η_i(β)]    (1)

where η_i(β) = x_i^t β is the linear predictor associated with a regression parameter vector β = (β_0, β_1, ..., β_l)^t. The linear predictor is linked to the probability of positive response by a right-continuous, non-decreasing function F, called the link function, having range [0, 1]. (F is called the inverse link function in the literature on generalized linear models.) Note that if many binary responses are recorded at each covariate value, we have a model for binomial proportions as a special case. In a standard analysis, F is considered to be a known cumulative distribution function, thus allowing relatively simple treatment of the finite-dimensional regression parameter β. For example, logistic regression obtains if F(t) = (1 + e^{−t})^{−1}, and probit regression corresponds to the standard normal cumulative for F. Czado (1992) studied parametric link families in the context of generalized linear models which allow for a single- or both-tail modification. For binary regression she gives the following two-parameter link family:

F[η_i(β)] = G[h(η_i(β); ψ_1, ψ_2)]    (2)


where G is the logistic or probit distribution and

h(η; ψ_1, ψ_2) = ((η + 1)^{ψ_1} − 1)/ψ_1      if η ≥ 0;
h(η; ψ_1, ψ_2) = −((−η + 1)^{ψ_2} − 1)/ψ_2    if η < 0.

The parameters ψ_1 and ψ_2 modify the right and left tails of the link, respectively. As ψ_1 (ψ_2) increases, the right (left) tail of the link becomes lighter. The link is skewed if ψ_1 ≠ ψ_2, and ψ_1 = ψ_2 = 1 corresponds to no link modification. One-parameter submodels arise if only one tail is modified, i.e., one of the link parameters is set to 1. Often a single tail modification is sufficient; however, the example considered in Section 5 requires modification of both tails. Link family (2) is flexible and simpler than the one proposed by Stukel (1988), and the parametrization is locally orthogonal. In models involving parametric link families (see references in Section 1), inference for link and regression parameters follows established theory. Albert and Chib (1993) describe Bayesian inference when the link is in the symmetric family of normal or t distributions. Czado (1993a, 1993b) develops Bayesian inference for model (2). Gelfand and Kuo (1991) show nonparametric posterior computations for the bioassay model, i.e., a binary regression model involving only a single regressor. Recently, Mallick and Gelfand (1994) have studied Bayesian inference in a mixture model allowing more general parametric flexibility in the link function. Their work applies to the larger class of generalized linear models.
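To make the tail behavior concrete, the modification can be sketched in a few lines of Python. This is an illustrative implementation only: the piecewise form follows the display above, G is taken to be the logistic cdf, the function names are ours, and the ψ = 0 log-limits are not handled.

```python
import math

def h(eta, psi1, psi2):
    # Both-tail power modification of the linear predictor (requires psi > 0;
    # the psi = 0 log-limit is omitted in this sketch).
    if eta >= 0:
        return ((eta + 1.0) ** psi1 - 1.0) / psi1
    return -(((-eta + 1.0) ** psi2 - 1.0) / psi2)

def modified_logistic_link(eta, psi1, psi2):
    # F(eta) = G[h(eta; psi1, psi2)], with G the standard logistic cdf.
    return 1.0 / (1.0 + math.exp(-h(eta, psi1, psi2)))
```

Since h(0; ψ_1, ψ_2) = 0 for any ψ's, every member of the family keeps median zero, and ψ_1 = ψ_2 = 1 recovers the untransformed logistic link.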

2.2 Our proposal

We propose to model the binary regression data by (1) where F is an arbitrary distribution function, subject to identifiability. Specifically, denote by F the set of all distribution functions: right-continuous functions such that lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1. For d > 0 and p ∈ (0, 1), we restrict F to the set F_{d,p} of F ∈ F for which:

1. F^{−1}(1/2) = 0
2. F^{−1}[(1 − p)/2] = θ − d
3. F^{−1}[(1 + p)/2] = θ

for some θ ∈ (0, d). Here, F^{−1}(t) = inf{x : F(x) ≥ t} is the left-continuous inverse of the right-continuous distribution function F. Each distribution function F ∈ F_{d,p} corresponds to one value of θ, but each θ ∈ (0, d) corresponds to an infinite set of F's. All F ∈ F_{d,p} have


median 0. The central interval cutting off area (1 − p)/2 in each tail of F is of length d for all F ∈ F_{d,p}. For example, the interquartile range is of length d for all F ∈ F_{d,1/2}. The reason for this restriction of F can be seen by inspection of (1). If we allow F to be completely arbitrary, we see two sources of confounding. First, the intercept term β_0 is confounded with the location of F. Second, the overall scale of the regression parameters is confounded with the scale of F. It might seem natural to restrict F to have a known mean and variance, rather than median and central range; however, it appears to be more difficult to apply Bayesian inference under the former restriction. In any case, we think it is important for the sampling model to be objectively identifiable (Diaconis and Freedman, 1986). Distinct elements in the model correspond to different probability measures for the data. Parameter interpretation, computation, and asymptotic analysis are adversely affected in non-identifiable models. A critical observation is that model (1) is equivalent to one in which independent, F-distributed variables {u_i} are censored by the linear predictor to produce the observed data:

y_i = 1[u_i ≤ η_i(β)].    (3)

The link function is precisely the distribution function of these latent random variables.
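The equivalence of (1) and (3) is easy to check by simulation. The sketch below (function and parameter names are ours) takes F to be the logistic cdf purely for illustration and compares the direct probability with the censoring representation:

```python
import math
import random

def simulate(x_rows, beta, n_rep=100000, seed=2):
    # Compare the direct model (1), P(y = 1) = F(eta), with the censoring
    # representation (3), y = 1[u <= eta] for u ~ F, using a logistic F.
    rng = random.Random(seed)
    out = []
    for x in x_rows:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        p = 1.0 / (1.0 + math.exp(-eta))       # direct probability under (1)
        hits = 0
        for _ in range(n_rep):
            r = rng.random()
            if r <= 0.0 or r >= 1.0:
                continue
            u = math.log(r / (1.0 - r))        # latent u: logistic quantile
            hits += u <= eta                   # censor at the linear predictor
        out.append((p, hits / n_rep))
    return out
```

For each covariate row, the Monte Carlo frequency of {u_i ≤ η_i(β)} matches F[η_i(β)] up to simulation error.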

3 A SEMIPARAMETRIC PRIOR

The model is (3) where the u_i are iid F ∈ F_{d,p}, for fixed d and p. We assume prior independence of F and β, and place an absolutely continuous prior measure on β. (In the example, we use a diffuse normal prior with independent components, each having mean zero and variance 1000.) The first problem is how to construct a prior on F_{d,p}.

3.1 Dirichlet process

Consider again the set F of all distribution functions on the real line. Due to important early work by Ferguson (1973, 1974) and others, the Dirichlet process has become the most well-studied probability measure used by Bayesians as a prior on this set. For a discussion of problems and progress, see Diaconis and Freedman (1986). The hyperparameter indexing this process is a positive finite measure m on the real line, formed from a positive constant a and a distribution function G by m(−∞, t] = aG(t). Rather than defining the Dirichlet


process by finite-dimensional distributions, we present a constructive definition (Sethuraman and Tiwari 1982). Such a random element of F is, for all t ∈ R,

F(t) = Σ_{i=1}^{∞} w_i 1[v_i ≤ t]

where the v_i are iid G, and the weights w_i are the result of a stick-breaking exercise based on a set of iid Beta random variables b_i having density proportional to (1 − b)^{a−1}. Precisely, w_1 = b_1 and, for i ≥ 2, w_i = b_i ∏_{j=1}^{i−1} (1 − b_j). In population genetics, the joint distribution of the ordered weights is called the Poisson-Dirichlet distribution (Kingman, 1977). The Dirichlet process is often criticized because realizations are almost surely discrete, as is clear from the Sethuraman–Tiwari construction. Much recent work indicates that this discreteness actually provides a useful tool for modeling. Random variables sampled from a Dirichlet-distributed F are prone to ties, with the number of unique values dependent on the hyperparameter a. Although ties may not be expected in precisely measured data, ties in unobserved random variables form the basis of flexible random effects models. See for example Erkanli, Muller, and West (1992), Escobar and West (1992), Liu (1993), MacEachern and Bush (1993), and West, Muller, and Escobar (1993). Ferguson (1973) showed that there is positive probability for realizations of the Dirichlet process to be arbitrarily close (in a certain metric) to any distribution that is dominated by the base measure m. When using the Dirichlet process as a prior for an unknown distribution, it is not that we believe the unknown distribution is necessarily discrete. We insist only that positive prior probability is assigned to neighborhoods of any distribution. In the context of our regression problem, a suitable nonparametric prior will assign positive probability to neighborhoods of any link function.
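A truncated version of this construction is straightforward to simulate. In the sketch below (names are ours), the Beta(1, a) sticks are drawn by inverting the cdf 1 − (1 − b)^a, and the atoms come from a logistic base G:

```python
import math
import random

def stick_breaking(a, base_draw, n_atoms=400, seed=3):
    # Truncated Sethuraman-Tiwari construction: atoms v_i ~ G, weights from
    # stick-breaking with b_i ~ Beta(1, a), density proportional to (1-b)^(a-1).
    rng = random.Random(seed)
    atoms, weights, stick = [], [], 1.0
    for _ in range(n_atoms):
        b = 1.0 - (1.0 - rng.random()) ** (1.0 / a)   # Beta(1, a) by inversion
        atoms.append(base_draw(rng))
        weights.append(stick * b)                     # w_i = b_i * prod(1 - b_j)
        stick *= 1.0 - b
    return atoms, weights

def logistic_base(rng):
    # One draw from the logistic distribution via its quantile function.
    r = rng.random()
    while r <= 0.0 or r >= 1.0:
        r = rng.random()
    return math.log(r / (1.0 - r))
```

With a = 1 and a few hundred atoms, the leftover stick mass is negligible, so the weights sum to essentially one; the resulting F is a discrete distribution, illustrating the almost-sure discreteness noted above.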

3.2 Centrally standardized Dirichlet process

The Dirichlet process is not supported on F_{d,p} (i.e., its median and scale are random), and so it must be modified to act as a prior for the link function. The following construction is related to work of Doss (1985) and Hjort (1986). Start with a distribution function G, perhaps the logistic, having some mass outside the interval (−d, d), and a constant a > 0. Note the positive finite measure m determined by a and G. Let h be a probability density function supported on (0, d); the uniform is a natural choice. Let θ be a random draw from h, and observe the partition of the line into four


disjoint intervals,

A_1(θ) = (−∞, θ − d],   A_2(θ) = (θ − d, 0],   A_3(θ) = (0, θ],   A_4(θ) = (θ, ∞).

Four measures are induced upon restricting m to each of these intervals: that is, for any Borel set B, m_j(B) := m(A_j(θ) ∩ B). Next, construct four conditionally independent Dirichlet processes: given θ, the jth process F_j has hyperparameter measure m_j. Finally, paste these four processes together:

F = ((1 − p)/2)(F_1 + F_4) + (p/2)(F_2 + F_3).

We call F a centrally standardized Dirichlet process with parameters m, p, d, and h, and write F ∼ D(m, p, d, h).

Proposition 1 If F ∼ D(m, p, d, h), then F ∈ F_{d,p} with probability one. Being supported on F_{d,p}, this process is a suitable prior for the unknown link.
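Proposition 1 can be checked numerically. The sketch below is our own construction under stated assumptions: a logistic base, uniform h, p = 1/2, and truncated stick-breaking (as in Section 3.1) for each restricted process. Pasting the four components with the weights above forces the three quantile constraints defining F_{d,p}:

```python
import math
import random

def G(t):
    return 1.0 / (1.0 + math.exp(-t))      # logistic cdf

def G_inv(u):
    return math.log(u / (1.0 - u))         # logistic quantile

def truncated_dp(a, glo, ghi, n_atoms, rng):
    # Dirichlet process whose base is the logistic measure (total mass a)
    # restricted to an interval with G-probabilities (glo, ghi); truncated
    # stick-breaking with concentration a_j = a * (ghi - glo).
    aj = a * (ghi - glo)
    atoms, weights, stick = [], [], 1.0
    for _ in range(n_atoms):
        b = 1.0 - (1.0 - rng.random()) ** (1.0 / aj)          # Beta(1, a_j)
        u = glo + (ghi - glo) * (1e-9 + (1.0 - 2e-9) * rng.random())
        atoms.append(G_inv(u))             # atom strictly inside the interval
        weights.append(stick * b)
        stick *= 1.0 - b
    s = sum(weights)
    return atoms, [w / s for w in weights]  # renormalize the truncation

def centrally_standardized(a=1.0, p=0.5, d=2.0 * math.log(3.0), seed=4):
    rng = random.Random(seed)
    theta = d * rng.random()               # theta ~ h = Uniform(0, d)
    cuts = [0.0, G(theta - d), 0.5, G(theta), 1.0]   # G(0) = 1/2
    comp_wt = [(1.0 - p) / 2.0, p / 2.0, p / 2.0, (1.0 - p) / 2.0]
    atoms, weights = [], []
    for j in range(4):
        av, wv = truncated_dp(a, cuts[j], cuts[j + 1], 200, rng)
        atoms.extend(av)
        weights.extend(comp_wt[j] * w for w in wv)
    return theta, atoms, weights
```

Because each component is supported strictly inside its interval and carries exactly the prescribed mass, the pasted F has median 0, F^{−1}[(1 − p)/2] = θ − d, and F^{−1}[(1 + p)/2] = θ, as Proposition 1 asserts.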

It is not clear how sensitive the posterior process will be to the choice of hyperparameters m, d, p, and h. In our worked example, the base measure m is the logistic distribution G times the constant a = 1. Having d = G^{−1}[(1 + p)/2] − G^{−1}[(1 − p)/2] is natural, and ensures that increasing a corresponds to increasing our conviction about the logistic link (when θ = d/2 is fixed). Decreasing a creates a more diffuse prior, but a = 0 is not allowed. The choice of p and h may be less crucial. We restrict attention to the uniform distribution on (0, d) for h and fix p = 1/2, so that d is the interquartile range. Figure 1 shows properties of this prior for two choices of a. Being a particular mixture, the centrally standardized prior inherits many properties from the Dirichlet process. For example, the law of F, conditioned on a number of random draws from F, is still a centrally standardized Dirichlet.
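For the logistic base with p = 1/2, this natural choice of d is the logistic interquartile range, 2 log 3, as a two-line check confirms:

```python
import math

def logistic_quantile(u):
    # Quantile function of the standard logistic distribution.
    return math.log(u / (1.0 - u))

p = 0.5
# d = G^{-1}[(1+p)/2] - G^{-1}[(1-p)/2]: central interval of G with
# area (1-p)/2 in each tail.
d = logistic_quantile((1.0 + p) / 2.0) - logistic_quantile((1.0 - p) / 2.0)
```

Here logistic_quantile(3/4) = log 3 and logistic_quantile(1/4) = −log 3, so d = 2 log 3, the value used in Figure 1.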

Proposition 2 If F ∼ D(m, p, d, h) and, given F, u_1, u_2, ..., u_n are iid observations from F, then given {u_i}, F ∼ D(m*, p, d, h*), where

m* = m + Σ_{i=1}^{n} δ_{u_i},    h*(θ) ∝ h(θ) ∏_{j=1}^{4} Γ[m(A_j(θ))] / Γ[m*(A_j(θ))].

Here δ_u is the measure placing point mass at u, and Γ(·) is the gamma function. Figure 2 illustrates the discontinuous h* computed from a small set of u_i's.


3.3 Polya sequences

Blackwell and MacQueen (1973) discovered a fundamental connection between the Dirichlet process and the sampling of balls from an urn. Imagine an urn containing a finite measure m on a set of `colors'. Though it is most natural to think of a finite set of colors, with the measure m(u) equal to the number of balls in the urn of color u, a continuum of colors is certainly allowed. Let G equal the probability distribution corresponding to m upon normalization. An urn sampling scheme, called a Polya sequence, describes how to produce a sequence of colors u_1, u_2, ..., u_n. Sample u_1 from G. Next change m to m + δ_{u_1} (that is, add a point mass at u_1), and thus change G. Repeat the process to get u_2, and so on. With a finite number of colors, this procedure amounts to drawing a ball from the urn and then replacing two of the same color before resampling. Even if the original distribution of colors is continuous, there clearly is a tendency for ties among the u_i. Each draw is seen to be a mixture of previously drawn u_i and a potentially drawn variable from the original distribution; the mixing probabilities are determined by m. Whenever the draw is from the original distribution, we call it a wild card. Each wild card starts a new cluster of common latent variables. Blackwell and MacQueen's result says that if F is a Dirichlet process with measure m, and a sample u_1, u_2, ..., u_n is iid from F, given F, then marginally (i.e., having integrated out F) these {u_i} are equal in distribution to the first n steps of a Polya sequence based on m. The importance of this result is that for computation, reference can be made to a space of finite, rather than infinite, dimensions. Essentially, the random F has been integrated out.
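The urn scheme can be simulated directly. In the sketch below (names are ours), with m = aG the ith draw is a wild card from G with probability a/(a + i), and otherwise ties with a uniformly chosen earlier draw:

```python
import random

def polya_sequence(n, a, base_draw, seed=5):
    # First n steps of a Polya sequence with measure m = a*G: each draw is
    # a fresh 'wild card' from G with probability a/(a + i), or repeats a
    # previously drawn color (adding a ball of that color) otherwise.
    rng = random.Random(seed)
    u = []
    for i in range(n):
        if rng.random() < a / (a + i):
            u.append(base_draw(rng))   # wild card: starts a new cluster
        else:
            u.append(rng.choice(u))    # tie with an earlier draw
    return u
```

The first draw is always from G (probability a/a = 1), and even with a continuous base the output contains ties; the number of distinct clusters grows only logarithmically in n for fixed a.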

3.4 Modified Polya sequences

A four-urn analog of the Polya sequence result holds for centrally standardized Dirichlet processes. Suppose that in the prior for F, the distribution function G has a density function g. Upon sampling θ ∼ h, four urns are constructed (the sets A_j(θ) from Section 3.2). Associated with each urn is a constant a_j = m(A_j(θ)) = m_j(−∞, ∞), and a probability density

g_j(t) = g(t) 1[t ∈ A_j(θ)] (a/a_j).    (4)

There will be n draws from the set of urns. On the ith draw, urn k_i is chosen, the {k_i} being iid on {1, 2, 3, 4} with probabilities {(1 − p)/2, p/2, p/2, (1 − p)/2}. Upon choosing urn


k_i, a random variable v_i, distributed g_{k_i}, is made available to be drawn from the urn. (It may not be drawn, however.) Also an integer ρ_i is chosen from the following distribution on 1, 2, ..., i:

ρ_i = i   with probability a_{k_i}/(a_{k_i} + l_i − 1);
ρ_i = j   with probability 1/(a_{k_i} + l_i − 1), if j < i and k_j = k_i.

Here l_i counts how many samples have been drawn from urn k_i up to time i: l_i = Σ_{j=1}^{i} 1[k_j = k_i]. If the random index ρ_i equals i, then the available v_i becomes a wild card and is actually drawn from the urn. If ρ_i < i, a previously drawn variable must be redrawn from the urn, creating a tie in the output. In our representation, available v_i for which ρ_i < i are never drawn.

Proposition 3 If F ∼ D(m, p, d, h) and if, given F, u_1, u_2, ..., u_n are iid F, then marginally

(u_1, u_2, ..., u_n) =_d (v_{ρ̃_1}, v_{ρ̃_2}, ..., v_{ρ̃_n})    (5)

where the indices ρ̃_i are defined recursively.
h for which ρ_h = i. Further, we keep the {u_k : k ≠ i} fixed by having v*_l = v_i. If i < j, then joining i to the cluster indexed by j means that this larger cluster is now indexed by i. Therefore, ρ*_h = i for h = i and for {h : ρ_h = j}. To maintain drawn latent variables, put v*_i = v_j and draw a new variable v*_j from the urn density. As above, if i does not start its own cluster in s, then the remaining components in s* are identical to those in s. Otherwise, we must juggle indices a bit to keep the drawn latent variables fixed. Specifically, if l > i denotes the second element in i's cluster in s, then ρ*_h = l for all h > i such that ρ_h = i. Finally, v*_l = v_i. While the specific modifications to produce s* from s in the case above depend on the relative magnitude of i and j and on whether or not ρ_i = i, the result is generally the same. One new proposal v* is drawn from urn k_j and case i joins the cluster with j. Because c_i > 1, the total number of clusters remains constant, although the urn distribution may change. Also, if s* is a possible proposal from s, then s is a possible proposal from s*. Since there are c_j ways to select j leading to the same proposed joining, we have in the MH ratio

q(s*, s)/q(s, s*) = [(c − 1) g_k(v_old)] / [c g_{k'}(v*)]

where the particular indices of v_old and v* depend on the specifics above, and g_k and g_{k'} are urn densities. Combining this with the posterior density in (7), we get nice cancellation in the MH ratio: r = z(k_i, k_j), where

z(l, k) = (a_l + n_l − 1)/(a_k + n_k)   if l ≠ k;   z(l, k) = 1 otherwise.

The remaining cases tend to occur less often. When i exists already in a cluster of size 1, and ρ_j ≠ i, adding i to j's cluster reduces the number of clusters by one. The proposed state in this case follows exactly the reasoning above, except that we do not need to worry about the rest of i's cluster. As we see below, the MH ratio is also rather different, because up to now we have no way to create new clusters, and hence no way to get s back from s*.
The proposal mechanism creates new clusters if j is selected such that ρ_j = i. In this case, we select one of the four urns k* at random and draw v*_i from that urn. Therefore, by the same computation as above, if c_i = 1 and ρ_j ≠ i, we have the MH ratio

r = [(c + 1)/(4 a_{k*})] z(k_i, k_j).

Furthermore, if c_i > 1 and ρ_j = i,

r = [4 a_{k*}/c] z(k_i, k*).

The last case is where i is in a cluster of size c_i = 1 and ρ_j = i. Thus we draw an urn k* and attempt to move i to that urn. Here

r = (a_{k*}/a_{k_i}) z(k_i, k*).
Acknowledgements An earlier version of this work was presented in June 1992 at the Fifth Purdue Symposium on Statistical Decision Theory and Related Topics. A. P. Dawid's insightful question on identifiability allowed us to reformulate the more appropriate solution presented here. We are grateful to the British Institute of Radiology for permission to use their data, and to referees and an associate editor for valuable suggestions on an earlier draft. The second author was supported in part by research grant OGP0089858 of the Natural Sciences and Engineering Research Council of Canada.

References

Albert, J. and S. Chib (1993), "Bayesian analysis of binary and polytomous response data", Journal of the American Statistical Association, 88, 669–679.

Aranda-Ordaz, F. J. (1981), "On two families of transformations to additivity for binary response data", Biometrika, 68, 357–363.

Armitage, P. (1971), Statistical Methods in Medical Research, Blackwell, Oxford.

Arratia, R., A. D. Barbour, and S. Tavare (1992), "Poisson process approximations for the Ewens sampling formula", The Annals of Applied Probability, 2, 519–535.

Atkinson, A. C. (1985), Plots, Transformations, and Regression, Clarendon, Oxford.

Besag, J. and P. J. Green (1993), "Spatial statistics and Bayesian computation (with discussion)", Journal of the Royal Statistical Society, Series B, 55, 25–38.

Blackwell, D. (1973), "Discreteness of Ferguson selections", Annals of Statistics, 1, 356–358.

Blackwell, D. and J. B. MacQueen (1973), "Ferguson distributions via Polya urn schemes", Annals of Statistics, 1, 353–355.

Chappell, R., D. M. Nondahl, and J. F. Fowler (1992), "Modeling dose and local control in radiotherapy", Journal of the American Statistical Association. To appear.

Cheng, K. F. and J. W. Wu (1994), "Testing goodness of fit for a parametric family of link functions", Journal of the American Statistical Association, 89, 657–664.


Czado, C. and T. J. Santner (1992a), "The effect of link misspecification on binary regression inference", Journal of Statistical Planning and Inference, 33, 213–231.

Czado, C. and T. J. Santner (1992b), "Orthogonalizing link transformation families in binary regression analysis", Canadian Journal of Statistics, 20, 51–62.

Czado, C. (1992), "On link selection in generalized linear models", in Advances in GLIM and Statistical Modelling, Proceedings of the GLIM92 Conference and the 7th International Workshop on Statistical Modelling, Lecture Notes in Statistics 78, Springer-Verlag, New York.

Czado, C. (1993a), "Bayesian inference of binary regression models with parametric link". Under review.

Czado, C. (1993b), "Parametric link modification of both tails in binary regression". Under review.

Diaconis, P. and D. Freedman (1986), "On the consistency of Bayes estimates", Annals of Statistics, 14, 1–26.

Doss, H. (1985), "Bayesian nonparametric estimation of the median: Part I: Computation of the estimates", Annals of Statistics, 13, 1432–1444.

Doss, H. (1991), "Bayesian nonparametric estimation for incomplete data via successive substitution sampling". Preprint.

Erkanli, A. and D. Stangl (1993), "A Bayesian analysis of ordinal data using mixtures", Technical Report 93-01, Institute of Statistics and Decision Sciences, Duke University.

Erkanli, A., P. Muller, and M. West (1992), "Curve fitting using Dirichlet process mixtures", Technical Report A09, Institute of Statistics and Decision Sciences, Duke University.

Escobar, M. D. and M. West (1992), "Computing Bayesian nonparametric hierarchical models", Technical Report A20, Institute of Statistics and Decision Sciences, Duke University.

Escobar, M. D. (1994), "Estimating normal means with a Dirichlet process prior", Journal of the American Statistical Association, 89, 268–277.


Feller, W. (1971), An Introduction to Probability Theory and Its Applications, Vol. II, 2nd edition, Wiley, New York.

Ferguson, T. S. (1973), "A Bayesian analysis of some nonparametric problems", Annals of Statistics, 1, 209–230.

Ferguson, T. S. (1974), "Prior distributions on spaces of probability measures", Annals of Statistics, 2, 615–629.

Fieller, E. C. (1954), "Some problems in interval estimation", Journal of the Royal Statistical Society, Series B, 16, 175–185.

Follmann, D. A. and D. Lambert (1989), "Generalizing logistic regression by nonparametric mixing", Journal of the American Statistical Association, 84, 295–300.

Gelfand, A. E. and L. Kuo (1991), "Nonparametric Bayesian bioassay including ordered polytomous response", Biometrika, 78, 657–666.

Gelfand, A. E. and A. F. M. Smith (1990), "Sampling-based approaches to calculating marginal densities", Journal of the American Statistical Association, 85, 398–409.

Geman, S. and D. Geman (1984), "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Geyer, C. J. (1992), "Practical Markov chain Monte Carlo", Statistical Science, 7, 473–482.

Gilks, W. R., D. G. Clayton, D. J. Spiegelhalter, N. G. Best, A. J. McNeil, L. D. Sharples, and A. J. Kirby (1993), "Modelling complexity: Applications of Gibbs sampling in medicine (with discussion)", Journal of the Royal Statistical Society, Series B, 55, 39–52.

Guerrero, V. M. and R. A. Johnson (1982), "Use of the Box-Cox transformation with binary response models", Biometrika, 69, 309–314.

Hastie, T. and R. Tibshirani (1987), "Non-parametric logistic and proportional odds regression", Applied Statistics, 36, 260–276.

Hastings, W. K. (1970), "Monte Carlo sampling methods using Markov chains and their applications", Biometrika, 57, 97–109.


Hjort, N. (1986), contribution to the discussion of Diaconis and Freedman (1986).

Kingman, J. F. C. (1977), "The population structure associated with the Ewens sampling formula", Theoretical Population Biology, 11, 274–283.

Liu, J. (1993), "Nonparametric hierarchical Bayes via sequential imputations", Technical Report R-429, Department of Statistics, Harvard University.

MacEachern, S. and C. Bush (1993), "A semi-parametric Bayesian model for randomized block designs", Technical Report A22, Institute of Statistics and Decision Sciences, Duke University.

MacEachern, S. N. (1992), "Estimating normal means with a conjugate style Dirichlet process prior", Technical Report 487, Department of Statistics, The Ohio State University.

Mallick, B. and A. E. Gelfand (1994), "Generalized linear models with unknown link functions", Biometrika. To appear.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953), "Equation of state calculations by fast computing machines", Journal of Chemical Physics, 21, 1087–1092.

Pregibon, D. (1980), "Goodness of link tests for generalized linear models", Journal of the Royal Statistical Society, Series C, 29, 15–24.

Prentice, R. L. (1976), "A generalization of the probit and logistic methods for dose response curves", Biometrics, 32, 761–768.

Rezvani, M., J. Fowler, J. Hopewell, and C. Alcock (1993), "Sensitivity of human squamous cell carcinoma of the larynx to fractionated radiotherapy", British Journal of Radiology, 66, 245–255.

Sethuraman, J. and R. C. Tiwari (1982), "Convergence of Dirichlet measures and the interpretation of their parameter", in Statistical Decision Theory and Related Topics III (S. S. Gupta and J. O. Berger, eds.), Academic Press, 2, 305–315.

Smith, A. F. M. and G. O. Roberts (1993), "Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion)", Journal of the Royal Statistical Society, Series B, 55, 3–24.


Stukel, T. (1988), "Generalized logistic models", Journal of the American Statistical Association, 83, 426–431.

Tanner, M. and W. Wong (1987), "The calculation of posterior distributions by data augmentation (with discussion)", Journal of the American Statistical Association, 82, 528–550.

Taylor, J. (1988), "The cost of generalizing logistic regression", Journal of the American Statistical Association, 83, 1078–1083.

Tierney, L. (1991), "Markov chains for exploring posterior distributions", Technical Report 560, School of Statistics, University of Minnesota.

West, M., P. Muller, and M. D. Escobar (1993), "Hierarchical priors and mixture models with applications in regression and density estimation", Technical Report A02, Institute of Statistics and Decision Sciences, Duke University.


Table 1: Likelihood calculations for the two-parameter generalized logistic regression model: The test statistics are asymptotically χ². Parentheses after the statistics contain degrees of freedom and p-value. The first row gives the maximized loglikelihood. The next two contain statistics for the likelihood ratio and score tests, respectively.

                            Logit     Generalized Logit
                                      Right Tail     Left Tail      Both Tails

Max LL                      -510.0    -508.1         -510.0         -503.8
LR test vs. logit link      -         3.9 (1, .05)   .04 (1, .85)   12.5 (2, .002)
Score test vs. logit link   -         6.9 (1, .01)   .03 (1, .88)   8.6 (2, .01)


Table 2: Regression parameter estimates (estimated standard errors): The first three columns of numbers refer to the parametric link models. The last column gives Monte Carlo estimates of posterior means and marginal posterior standard deviations under the semiparametric model. The last row gives 95% confidence intervals for γ (and an equi-tailed posterior interval for the semiparametric case).

                        Logit            Generalized Logit                 Semi-
                                         Right Tail       Both Tails      parametric

Link parameters
  ψ_1                   -                .05 (.46)        -.62 (.21)      -
  ψ_2                   -                -                -4.0 (3.3)      -

Regression parameters
  Intercept             -.94 (.98)       -1.6 (1.7)       -20.8 (17)      -3.7 (2.0)
  sII                   -.71 (.19)       -1.5 (.85)       -5.9 (4.7)      -1.0 (.50)
  sIII                  -1.0 (.19)       -2.1 (.93)       -8.1 (6.2)      -1.3 (.57)
  nf·df/100             5.9 (1.9)        12 (4.6)         66 (50)         12 (4.3)
  nf·df²/300            .66 (.86)        .90 (1.6)        14 (12)         2.1 (1.4)
  t/100                 -4.2 (1.2)       -8.6 (3.2)       -42 (32)        -6.6 (2.6)

Time-dose trade-off
  γ                     .71 (.42, 1.3)   .75 (.45, 1.4)   .65 (−∞, ∞)     .56 (.38, .79)


Figure 1: Pointwise 90% Prior Probability Bands for the Link Function Based on the Centrally Standardized Dirichlet Prior. The centering measure is a times the logistic distribution (shown as the solid line). The fixed interquartile range is d = 2 log 3, matching the logistic. These bands are computed by simulating 1000 links from the prior. The dashed line in each panel shows the complementary log-log link function, standardized to match the logistic.

[Two panels, a = 5 (top) and a = 20 (bottom); vertical axis: probability; horizontal axis: linear predictor.]


Figure 2: Posterior for θ. Based on a sample of n = 20 latent variables indicated at the top of the plot, shown is the posterior density for θ, the third quartile. The dashed line gives the uniform prior.




Figure 3: Likelihood Contours for Parametric Analysis of BIR Data. Coordinates correspond to values of (ψ_2, ψ_1); contours are in units of loglikelihood from the maximum. Solid squares indicate MLEs under different submodels.

[Contours labeled −1, −3, and −5; solid squares mark the MLEs for the logistic, right-tail, left-tail, and both-tails submodels.]


Figure 4: Output Monitoring. The horizontal axis in each plot indicates the cycle number in the subsampled Markov chain. Panel (a) shows the proportion of latent variables in the first urn. Panel (b) shows the same for the third urn. As expected, these proportions fluctuate around .25 (a priori they are multinomial). Panels (c) and (d) show time series of regression coefficients: the intercept parameter (c) and the total dose parameter (d).


Figure 5: Posterior Summaries. Panels (a) and (b) study the posterior distribution of the third quartile parameter θ. The autocorrelation function of the time series of 4000 subsampled θ values is shown in (b). The histogram in (a) estimates the marginal posterior density of θ. The uniform prior is indicated as a flat line. Information about the number of clusters among the latent variables {u_i} is in panels (c) and (d). Panel (d) shows that very little autocorrelation exists for this summary of the state. Panel (c) estimates the marginal posterior distribution of this cluster count.


Figure 6: Time-Dose Trade-Off. Panel (a) shows the time series of this ratio from the subsampled Markov chain of length 4000. Panel (b) shows that relatively little autocorrelation exists in this series. A histogram estimator of the marginal posterior density of γ is shown in (c).


Figure 7: Response Probability. For a treatment schedule having a total dose of 65 Gy over 19 fractions applied to a stage I tumor, plotted are estimates of the probability of local control as a function of treatment time. The solid line is the estimated posterior mean for the semiparametric model. The dotted line is the fit under logistic regression and the dashed line is the fit under the two-parameter link model. The shaded area is a Monte Carlo approximation to a pointwise 90% posterior region.
