Rev. R. Acad. Cien. Exact. Fis. Nat. (Esp), Vol. 92, N.º 4, pp. 289-298, 1998. Monográfico: Problemas complejos de decisión.

MODEL SELECTION WITH VAGUE PRIOR INFORMATION

(Asymptotics / Bayes factor / Hypothesis testing / Intrinsic priors / Model comparison)

ELÍAS MORENO*, F. JAVIER GIRÓN** and M. LINA MARTÍNEZ**

* Departamento de Estadística e I.O., Facultad de Ciencias, Universidad de Granada, Campus de Fuente Nueva, s/n, 18071 Granada, Spain.
** Departamento de Estadística e I.O., Facultad de Ciencias, Universidad de Málaga, Campus de Teatinos, s/n, 29071 Málaga, Spain.

ABSTRACT

In the Bayesian approach, the Bayes factor is the main tool for model selection and hypothesis testing. When prior information is weak, "default" or "automatic" priors, which are typically improper, are commonly used; unfortunately, the Bayes factor is then defined only up to a multiplicative constant. In this paper we review some recent but already popular methodologies, intrinsic and fractional, for dealing with improper priors in model selection and hypothesis testing. Special attention is paid to the intrinsic and fractional methods as tools devised to produce proper priors with which actual Bayes factors can be computed. Some illustrations for hypothesis testing problems with more than one population are given; in particular, the Behrens-Fisher problem is considered.

RESUMEN

In the Bayesian approach the Bayes factor is the main tool for model selection and for hypothesis testing. When prior information is scarce, improper distributions are generally used but, unfortunately, in this case the Bayes factor is defined only up to a multiplicative constant. This paper reviews some recent methodologies, namely the intrinsic and the fractional, already common in practice, to overcome the problem posed by improper prior distributions in model selection and hypothesis testing. Special attention is paid to the intrinsic and fractional methods as tools designed to provide proper prior distributions with which to compute the Bayes factor. The above is illustrated with examples of hypothesis testing problems, in particular the Behrens-Fisher problem.

1. Introduction

Suppose we observe a data vector x = (x_1, x_2, ..., x_n) and assume that the observations are i.i.d. replications from either model M_1: {f_1(x|θ_1), π_1(θ_1)} or M_2: {f_2(x|θ_2), π_2(θ_2)}. The model selection problem is that of choosing one of these two models. For a given prior P on the set {M_1, M_2}, and under the 0-1 loss function, it is easily seen that the optimal decision consists in choosing M_1 if and only if the inequality

$$\frac{p(M_1\mid x)}{p(M_2\mid x)} \ge 1 \qquad (1)$$

holds. The ratio of the posterior probabilities of the models in (1) is given by

$$\frac{p(M_1\mid x)}{p(M_2\mid x)} = \frac{m_1(x)}{m_2(x)}\,\frac{P(M_1)}{P(M_2)}, \qquad (2)$$

where

$$m_i(x) = \int_{\Theta_i} f_i(x\mid\theta_i)\,\pi_i(\theta_i)\,d\theta_i, \qquad i = 1, 2,$$

is the marginal distribution of the data under M_i. Other interesting loss functions for model selection can be found in San Martini and Spezzaferri (1984) and Bernardo and Smith (1994). The first factor appearing on the right-hand side of (2) is known as the Bayes factor (BF) of model M_1 against M_2, usually denoted B_12(x). It is the ratio of posterior to prior odds, and it can be seen as a measure of the evidence given by the data in favor of (or against) M_1. The logarithm of B_12(x) is called by Good (1985) the weight of the evidence provided by the data.
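The computation in (2) can be sketched numerically. The following minimal illustration is our own (not from the paper): two proper models for i.i.d. N(θ, 1) data that differ only in the spread of the prior on θ, with the marginals m_i(x) computed by one-dimensional trapezoidal quadrature.

```python
from math import exp, pi, sqrt

def norm_pdf(x, mu=0.0, sd=1.0):
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))

def marginal(data, prior_sd, grid=20000, lo=-50.0, hi=50.0):
    """m(x) = integral of prod_i N(x_i|theta,1) * N(theta|0,prior_sd) dtheta,
    computed by the trapezoid rule."""
    h = (hi - lo) / grid
    total = 0.0
    for k in range(grid + 1):
        theta = lo + k * h
        lik = 1.0
        for xi in data:
            lik *= norm_pdf(xi, theta, 1.0)
        w = 0.5 if k in (0, grid) else 1.0
        total += w * lik * norm_pdf(theta, 0.0, prior_sd)
    return total * h

data = [0.3, -0.1, 0.8, 0.2]          # a small i.i.d. N(theta, 1) sample
m1 = marginal(data, prior_sd=1.0)     # M1: tight prior on theta
m2 = marginal(data, prior_sd=10.0)    # M2: diffuse prior on theta
B12 = m1 / m2                         # Bayes factor in (2)
print("B12 =", B12, " p(M1|x) =", B12 / (1 + B12))
```

With equal prior model probabilities the posterior odds equal B12; since the data sit near zero, the tighter prior is favored here.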


The BF does not depend on the prior P, and this is seen as an attractive property by many authors (Box 1980, Kass and Raftery 1995, Sawa 1987). For nested models, that is, for models that satisfy Θ_1 ⊂ Θ_2, an important property of the BF is that it is consistent (O'Hagan 1994). That means that for any θ_1 ∈ Θ_1 and any θ_2 ∈ Θ_2 outside Θ_1,

$$\lim_{n\to\infty}[P_{\theta_1}]\, B_{12}(x) = \infty, \qquad \lim_{n\to\infty}[P_{\theta_2}]\, B_{12}(x) = 0.$$

Consequently, for any prior P, it follows that asymptotically p(M_1|x) = 1 when sampling from M_1 and p(M_2|x) = 1 when sampling from M_2. Therefore, by using BFs we always choose the true model, asymptotically. On the other hand, O'Hagan (1995) argued that the BF is highly sensitive to the prior distribution on the parameters of the model, even when the sample size increases. In fact, the Laplace approximation to the marginal density m_i(x) is

$$m_i(x) \approx f_i(x\mid\hat\theta_i)\,\pi_i(\hat\theta_i)\,(2\pi)^{d_i/2}\,|V_i|^{1/2},$$

where θ̂_i is the MLE of θ_i and V_i is the modal dispersion matrix. This fact is particularly disturbing when the prior information on the parameters is weak, as assumed throughout this paper, and the prior π_i(θ_i) is replaced by a default prior π_i^N(θ_i). The difficulty arises because the default prior is typically improper, so that it must be written as

$$\pi_i^N(\theta_i) = c_i\, h_i(\theta_i),$$

where h_i(θ_i) is a function whose integral diverges. Therefore, the normalizing constant c_i cannot be specified and, as a consequence, the BF with improper priors is now written as

$$B_{12}^N(x) = \frac{c_1}{c_2}\,\frac{\int_{\Theta_1} f_1(x\mid\theta_1)\,h_1(\theta_1)\,d\theta_1}{\int_{\Theta_2} f_2(x\mid\theta_2)\,h_2(\theta_2)\,d\theta_2},$$

which is defined only up to the multiplicative constant c_1/c_2. Avoiding this indetermination is far from trivial.

The Bayesian Information Criterion (BIC) proposed by Schwarz (1978) uses an asymptotic expansion of the marginal densities of the data, and the approximate Bayes factor is obtained by simply deleting the term that contains the priors. More precisely, the approximate Bayes factor B^S_12(x) is given by

$$-2\log B^S_{12}(x) = -2\log\frac{f_1(x\mid\hat\theta_1(x))}{f_2(x\mid\hat\theta_2(x))} + (d_1 - d_2)\log n,$$

where d_i is the dimension of the parameter space Θ_i and θ̂_i(x) is the MLE of θ_i. Notice that the BIC corrects the likelihood ratio by the factor exp{−(d_1 − d_2) log √n}. Other methods have been proposed; see, for instance, Aitkin (1991), Berger and Pericchi (1996a, 1996b, 1997a, 1997b, 1998), De Santis and Spezzaferri (1996, 1997, 1999), Geisser and Eddy (1979), Kass and Wasserman (1995), Moreno, Bertolino and Racugno (1997, 1998a, 1998b, 1999), Smith and Spiegelhalter (1980), and Verdinelli and Wasserman (1995).
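As a small worked sketch of the Schwarz approximation (our own example, not from the paper): for M_1: N(x|0, 1) versus M_2: {N(x|θ, 1), θ free}, we have d_1 = 0, d_2 = 1, and minus two times the log likelihood ratio equals n x̄², so B^S_12 = √n exp(−n x̄²/2).

```python
from math import exp, log

def bic_bayes_factor(xbar, n):
    """Schwarz approximation B^S_12 for M1: N(x|0,1) vs M2: N(x|theta,1), theta free.
    -2 log B^S_12 = -2 log LR + (d1 - d2) log n, with d1 = 0, d2 = 1,
    and -2 log LR = n * xbar**2, so B^S_12 = sqrt(n) * exp(-n * xbar**2 / 2)."""
    minus2logB = n * xbar ** 2 + (0 - 1) * log(n)
    return exp(-0.5 * minus2logB)

print(bic_bayes_factor(0.0, 100))   # 10.0: the simpler model is rewarded at xbar = 0
print(bic_bayes_factor(0.5, 100))   # strong evidence against the point null
```

The penalty term (d_1 − d_2) log n is what distinguishes B^S_12 from a pure likelihood ratio: at x̄ = 0 the likelihood ratio is 1, but the BIC correction gives √n in favor of the simpler model.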

The simplest method is to set an initial condition on the BF to determine the arbitrary constant c_1/c_2. The idea was proposed by Smith and Spiegelhalter (1980) for nested model situations. They consider an «imaginary training sample» x_0 having the smallest sample size for which 0 < m_i(x_0) < ∞, i = 1, 2, and such that it would most favor the simplest model. Then B^N_12(x_0) is set equal to one, which determines the constant c_1/c_2. This yields

$$\frac{c_1}{c_2} = \frac{\int_{\Theta_2} f_2(x_0\mid\theta_2)\,h_2(\theta_2)\,d\theta_2}{\int_{\Theta_1} f_1(x_0\mid\theta_1)\,h_1(\theta_1)\,d\theta_1},$$

so that the resulting approximation, say B^{IS}_12(x), to the BF is obtained by substituting this value of c_1/c_2 into B^N_12(x).

Unfortunately, this procedure is biased toward the simpler model, since the BF is set equal to one for a sample point that favors the simpler one. Furthermore, even when the imaginary training sample is considered in a symmetric situation with respect to the two models, the resulting BF does not necessarily work well. Let us illustrate this point with the following simple example.

Example 1. Suppose we are interested in testing the hypotheses H_1: X has density N(x|θ, 1) with θ ≤ 0, versus H_2: X has density N(x|θ, 1) with θ ≥ 0. Assuming the conventional uniform prior for θ, the models to be compared are

M_1: {N(x|θ_1, 1), π^N_1(θ_1) = c_1 1_{(θ_1 ≤ 0)}(θ_1)},
M_2: {N(x|θ_2, 1), π^N_2(θ_2) = c_2 1_{(θ_2 ≥ 0)}(θ_2)}.

For a given data set x = (x_1, x_2, ..., x_n), the BF turns out to be

$$B^N_{12}(\bar{x}_n, n) = \frac{c_1}{c_2}\,\frac{\Phi(-\bar{x}_n\sqrt{n})}{\Phi(\bar{x}_n\sqrt{n})},$$

where x̄_n = (1/n) Σ_{i=1}^n x_i and Φ represents the standard normal cumulative distribution function. An obvious condition on the BF would be that for x̄_n = 0 both models are equally likely. This yields B^N_12(0, n) = c_1/c_2 = 1, and therefore c_1 = c_2. The resulting BF is

$$B_{12}(\bar{x}_n, n) = \frac{1 - \Phi(\bar{x}_n\sqrt{n})}{\Phi(\bar{x}_n\sqrt{n})}.$$

Its asymptotic behavior is as follows. When sampling from P_θ = N(θ, 1), the distribution of Φ(x̄_n √n) is, for any u with 0 < u < 1, given by

$$H_n(u) = P_\theta[\Phi(\bar{x}_n\sqrt{n}) \le u] = P_\theta[\bar{x}_n\sqrt{n} \le \Phi^{-1}(u)] = \Phi(\Phi^{-1}(u) - \theta\sqrt{n}).$$

Thus, when sampling from P_{θ=0}, the statistic Φ(x̄_n √n) is distributed as a random variable U uniform on (0, 1) for every n, so that B_12(x̄_n, n) is distributed as (1 − U)/U, and

$$\lim_{n\to\infty}[P_\theta]\, B_{12}(\bar{x}_n, n) = \begin{cases}\infty & \text{if } \theta < 0,\\ 0 & \text{if } \theta > 0.\end{cases}$$

That is, consistency holds for values of θ different from zero, but consistency does not hold when sampling from the N(x|0, 1) distribution. In this example the likelihood ratio coincides with Schwarz's criterion, and it is easy to see that

$$\lim_{n\to\infty}[P_\theta]\, B^S_{12}(\bar{x}_n, n) = \begin{cases}\infty & \text{if } \theta < 0,\\ 1 & \text{if } \theta = 0,\\ 0 & \text{if } \theta > 0,\end{cases}$$

which shows a more reasonable asymptotic behavior than that of B_12(x̄_n, n).

Furthermore, B_12(x̄_n, n) as a measure of the evidence provided by the data in favor of M_1 is not acceptable. For some values of (x̄_n, n), the resulting values of B_12(x̄_n, n) are given in the second column of Table 1. The posterior probabilities of M_1, computed using the prior P(M_1) = P(M_2) = 1/2, are given in the third column. For comparison purposes Schwarz's criterion is also computed and displayed in the fourth column, with the corresponding posterior probabilities in the fifth column.

(x̄_n, n)   B_12(x̄_n, n)   p(M_1|x̄_n, n)   B^S_12(x̄_n, n)   p^S(M_1|x̄_n, n)
(−3, 1)        739.8            0.999             90.0              0.99
(−2, 1)         43.0            0.977              7.4              0.88
(−1, 1)          5.3            0.841              1.6              0.63
(0, 1)           1.0            0.50               1.0              0.50

Table 1. Values of Bayes factors and posterior probabilities.

Table 1 shows that for sample mean values less than three standard deviations from the population mean θ_2 of the model {N(x|θ_2, 1), θ_2 ≥ 0}, B_12(x̄_n, n) is unreasonably large. For instance, the evidence provided by (−2, 1) in favour of model M_1 equals 43, which seems extremely large. Notice that for (−2, 1) the likelihood ratio statistic equals 7.4.
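The entries of Table 1 can be reproduced with a few lines of code (a sketch; Φ is evaluated through the error function):

```python
from math import erf, exp, sqrt

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bayes_factor(xbar, n):
    """B_12(xbar, n) = Phi(-xbar*sqrt(n)) / Phi(xbar*sqrt(n)), after setting c1 = c2."""
    z = xbar * sqrt(n)
    return Phi(-z) / Phi(z)

def schwarz_factor(xbar, n):
    """Here d1 = d2, so B^S_12 equals the likelihood ratio
    sup_{theta<=0} f / sup_{theta>=0} f = exp(n*xbar^2/2) when xbar <= 0."""
    return exp(n * xbar ** 2 / 2) if xbar <= 0 else exp(-n * xbar ** 2 / 2)

for xbar in (-3.0, -2.0, -1.0, 0.0):
    b, bs = bayes_factor(xbar, 1), schwarz_factor(xbar, 1)
    print(f"({xbar:+.0f}, 1): B12={b:7.1f} p(M1|x)={b/(1+b):.3f} "
          f"BS12={bs:5.1f} pS(M1|x)={bs/(1+bs):.2f}")
```

The posterior probabilities in the table are simply B/(1 + B) under equal prior model probabilities.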

From now on, we will adhere to the following principle stated by Berger and Pericchi (1996b):

Principle. Methods that correspond to the use of plausible default (proper) priors are preferable to those that do not correspond to any possible actual Bayesian analysis.

To impose a condition on B_12(x̄_n, n), as we did in the preceding example, defines the arbitrary constant c_1/c_2 in the Bayes factor, but there is no guarantee that the resulting number corresponds to an actual Bayes factor for some reasonable prior distribution. In our opinion, when we use improper priors there is no basis for determining the constants c_1 and c_2, since they are «normalizing» constants of non-integrable functions. Notice that in this problem the Schwarz criterion does correspond to an actual Bayes factor. Kass and Wasserman (1995) suggested that for nested situations the Schwarz criterion does correspond, asymptotically, to an actual Bayes factor.

In this paper we will focus only on methods for dealing with improper priors in model selection that are based on «partial Bayes factors», which may be considered as a second-order correction of the BIC criterion, and we will discuss them in the light of the above Principle.

The paper is organized as follows. Section 2 briefly summarizes the partial, the intrinsic and the fractional


Bayes factors. Section 3 discusses intrinsic and fractional priors. These are priors for which the corresponding Bayes factors are, respectively, asymptotically equivalent to the intrinsic Bayes factor (IBF) and the fractional Bayes factor (FBF). Section 4 gives some applications of the intrinsic and fractional methodologies to important testing problems involving more than one population. Section 5 gives some extensions and lists some open problems, and in Section 6 some concluding remarks are given.

2. Partial, Intrinsic and Fractional Bayes Factors

The partial Bayes factor (PBF) was introduced by Leamer (1978), and the basic idea is as follows. The sample x = (x_1, x_2, ..., x_n) is split into two parts (x(l), x(n − l)). The part x(l), called the training sample, is intended to convert the improper prior into a proper posterior, that is,

$$\pi_i^N(\theta_i\mid x(l)) = \frac{f_i(x(l)\mid\theta_i)\,\pi_i^N(\theta_i)}{m_i^N(x(l))}, \qquad (3)$$

where x(l) is such that 0 < m_i^N(x(l)) < ∞. With the remaining part of the data, x(n − l), the Bayes factor is computed using (3) as the prior. This gives

$$B_{12}(x(n-l)\mid x(l)) = \frac{\int_{\Theta_1} f_1(x(n-l)\mid\theta_1)\,\pi_1^N(\theta_1\mid x(l))\,d\theta_1}{\int_{\Theta_2} f_2(x(n-l)\mid\theta_2)\,\pi_2^N(\theta_2\mid x(l))\,d\theta_2}, \qquad (4)$$

which is called by O'Hagan (1995) a «partial Bayes factor». Equivalently, (4) can be written as

$$B_{12}(x(n-l)\mid x(l)) = B^N_{12}(x)\,B^N_{21}(x(l)). \qquad (5)$$

The PBF corrects the original Bayes factor B^N_12(x) with the correction term B^N_21(x(l)), and by so doing the arbitrary constant c_1/c_2 cancels out in (5). It should be noted that for a given sample x we can typically consider different training samples x(l). Therefore, there is a multiplicity of PBFs, one for each training sample, and the difficulty is to know which of them should be chosen. To avoid this difficulty, Berger and Pericchi (1996b) first suggest considering all possible training samples of minimal size, that is, all subsamples x(l) such that 0 < m_i^N(x(l)) < ∞ for which no proper subsample satisfies these inequalities. They termed such a training sample a minimal training sample. Second, they take the arithmetic mean of the PBFs over all minimal training samples. This produces the so-called arithmetic intrinsic Bayes factor (IBF), defined as

$$B^{AI}_{12}(x) = B^N_{12}(x)\,\frac{1}{L}\sum_{l=1}^{L} B^N_{21}(x(l)), \qquad (6)$$

where L is the number of minimal training samples contained in the sample. Other ways of «averaging» the PBFs are possible. For instance, by taking a geometric mean, the geometric intrinsic Bayes factor results:

$$B^{GI}_{12}(x) = B^N_{12}(x)\left\{\prod_{l=1}^{L} B^N_{21}(x(l))\right\}^{1/L}.$$

By taking the median of the PBFs, they obtain the «median» intrinsic Bayes factor. Notice that these empirical functions re-sample the sample and, although they are termed by their authors «Bayes factors», they are not actual Bayes factors. Their behavior is studied by Berger and Pericchi (1996a, 1996b, 1997a, 1997b) in various contexts.

An alternative way of avoiding the arbitrariness of choosing the training sample for which the PBF is computed was suggested by O'Hagan (1995). Instead of using either a training sample, or all possible minimal training samples, he replaces the correction term B^N_21(x(l)) in (5) by

$$F_{21}(x) = \frac{\int_{\Theta_2}\{f_2(x\mid\theta_2)\}^{1/n}\,\pi_2^N(\theta_2)\,d\theta_2}{\int_{\Theta_1}\{f_1(x\mid\theta_1)\}^{1/n}\,\pi_1^N(\theta_1)\,d\theta_1}. \qquad (7)$$

In this way, he defines the fractional Bayes factor (FBF) as B^F_12(x) = B^N_12(x) F_21(x). Other fractions than 1/n can be considered. In fact, O'Hagan (1995) argues that a larger fraction would reduce sensitivity to the prior, and he also proposes using the fractions log n/n or √n/n. In favour of the fraction 1/n, however, many compelling arguments can be found in the literature (Berger and Mortera 1995, Kass and Wasserman 1995, Moreno 1997). We also remark that even the name «fractional Bayes factor» is misleading, as it is not really an actual Bayes factor.

Advantages and disadvantages of the above IBFs and FBFs have been extensively studied; see, in particular, the comparisons between the IBF and FBF by O'Hagan (1995, 1997), the criticisms on the stability of the IBF with respect to the sample by Bertolino and Racugno (1996), the review by De Santis and Spezzaferri (1997), and the criticisms and comparisons by Berger and Pericchi (1997a).
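To make the definitions concrete, here is a worked sketch of our own (not taken from the paper) for the nested pair M_1: N(x|0, 1) versus M_2: {N(x|θ, 1), π^N_2(θ) = 1}. A minimal training sample is a single observation, and both the arithmetic IBF of (6) and the FBF have closed forms.

```python
from math import exp, pi, sqrt

def norm_pdf(x, mu=0.0, sd=1.0):
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))

def aibf_and_fbf(data):
    """Arithmetic IBF (6) and FBF for M1: N(x|0,1) vs M2: {N(x|theta,1), pi(theta)=1}.
    Closed forms: m1N = prod_i phi(x_i);
    m2N = (2*pi)^(-(n-1)/2) * n^(-1/2) * exp(-S/2), with S = sum (x_i - xbar)^2;
    for one observation m2N(x_j) = 1, so B21N(x_j) = 1/phi(x_j);
    the fractional correction is F21 = sqrt(2*pi) * exp(xbar^2/2)."""
    n = len(data)
    xbar = sum(data) / n
    S = sum((xi - xbar) ** 2 for xi in data)
    m1N = 1.0
    for xi in data:
        m1N *= norm_pdf(xi)
    m2N = (2 * pi) ** (-(n - 1) / 2) / sqrt(n) * exp(-S / 2)
    B12N = m1N / m2N
    aibf = B12N * sum(1.0 / norm_pdf(xj) for xj in data) / n   # eq. (6)
    fbf = B12N * sqrt(2 * pi) * exp(xbar ** 2 / 2)             # B^F_12 = B12N * F21
    return aibf, fbf

print(aibf_and_fbf([0.0, 0.0, 0.0, 0.0]))   # both corrections give 2.0 (up to rounding)
```

Note how both corrections remove the arbitrary constant of the improper uniform prior: B^N_12 alone would change if π^N_2(θ) = c were rescaled, whereas the IBF and FBF would not.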

3. The Intrinsic and Fractional Priors

From a theoretical viewpoint, the important question about the IBF and the FBF is whether they correspond to an actual Bayes factor for sensible priors. If so, this provides a Bayesian justification for these methods, and consistency of these empirical Bayes factors is automatically satisfied.


With the so-called intrinsic priors, this question was solved asymptotically by Berger and Pericchi (1996b). These are priors π_1(θ_1) and π_2(θ_2) for which the corresponding Bayes factor

$$B_{21}(x) = \frac{\int_{\Theta_2} f_2(x\mid\theta_2)\,\pi_2(\theta_2)\,d\theta_2}{\int_{\Theta_1} f_1(x\mid\theta_1)\,\pi_1(\theta_1)\,d\theta_1}$$

and B^{AI}_21(x) are asymptotically equivalent under the two models M_1 and M_2. By equating the limits of B^{AI}_21(x) and B_21(x) as n → ∞ under the two models, Berger and Pericchi (1996b) showed that the intrinsic priors satisfy the functional equations

$$E^{M_1}_{\theta_1} B^N_{12}(x(l)) = \frac{\pi_2(\psi_2(\theta_1))\,\pi^N_1(\theta_1)}{\pi^N_2(\psi_2(\theta_1))\,\pi_1(\theta_1)}, \qquad (8)$$

$$E^{M_2}_{\theta_2} B^N_{12}(x(l)) = \frac{\pi^N_1(\psi_1(\theta_2))\,\pi_2(\theta_2)}{\pi_1(\psi_1(\theta_2))\,\pi^N_2(\theta_2)}. \qquad (9)$$

The expectations in equations (8) and (9) are taken with respect to f_1(x(l)|θ_1) and f_2(x(l)|θ_2), respectively; ψ_2(θ_1) denotes the limit of the maximum likelihood estimator θ̂_2(x) under model M_1 at the point θ_1, and ψ_1(θ_2) the limit of θ̂_1(x) under model M_2 at the point θ_2 (see also Cox (1961), Huber (1967) and Dmochowski (1996)). Unfortunately, these equations may have many solutions or may have no solution at all.

When the models under comparison are nested, say f_1(x|θ_1) is nested in f_2(x|θ_2), the intrinsic priors always exist under some mild conditions, and they are shown (Moreno, Bertolino and Racugno 1998b) to be any pair (π_1(θ_1), π_2(θ_2)) such that (a) π_1(θ_1) is any prior in the class of probability densities on Θ_1, and (b) π_2(θ_2) is obtained from (9), that is,

$$\pi_2(\theta_2) = \pi^N_2(\theta_2)\,\frac{\pi_1(\psi_1(\theta_2))}{\pi^N_1(\psi_1(\theta_2))}\,E^{M_2}_{\theta_2} B^N_{12}(x(l)).$$

Similarly, the fractional priors (π'_1(θ_1), π'_2(θ_2)), that is, priors for which the associated Bayes factor and B^F_12(x) are asymptotically equivalent under the two models M_1 and M_2, are the solutions of the functional equation

$$\frac{\pi'_2(\theta_2)\,\pi^N_1(\psi_1(\theta_2))}{\pi^N_2(\theta_2)\,\pi'_1(\psi_1(\theta_2))} = \lim_{n\to\infty}[P_{\theta_2}]\, F_{21}(x). \qquad (10)$$

It can be shown (Moreno 1997) that the solutions to this equation are any pair (π'_1(θ_1), π'_2(θ_2)) where π'_1(θ_1) is any member of the class of probability densities on Θ_1 and π'_2(θ_2) is determined by (10).

Unfortunately, the intrinsic and fractional priors are classes of distributions for which robustness of their Bayes factor cannot be asserted (Moreno 1997; Moreno, Bertolino and Racugno 1998a). A procedure for choosing intrinsic and fractional priors within their respective classes was proposed by Moreno, Bertolino and Racugno (1998b). Their procedure considers the restriction of the improper prior for the simplest model, say π^N_1(θ_1), to a subset on which it integrates to a finite quantity. More precisely, the proper prior

$$\pi^m_1(\theta_1) = \frac{\pi^N_1(\theta_1)\,1_{C_m}(\theta_1)}{k_m}$$

is considered, where C_m is an increasing sequence of subsets of the parameter space Θ_1 such that ∫_{C_m} π^N_1(θ_1) dθ_1 = k_m and lim_{m→∞} C_m = Θ_1. For this prior, the corresponding π^{I,m}_2(θ_2) is then computed, and π^I_2(θ_2) = lim_{m→∞} π^{I,m}_2(θ_2). It can be shown that the latter prior is a unique density. This procedure gives the intrinsic prior. The same procedure applies with the fractional methodology but, unfortunately, a probability density π'_2(θ_2) does not always exist. Whenever it exists, the fractional priors are given by the analogous limit.
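For the simplest nested case, the solution of (9) is well known: for M_1: N(x|0, 1) versus M_2: {N(x|θ, 1), π^N_2(θ) = 1}, a minimal training sample is one observation, B^N_12(x) = N(x|0, 1), and the intrinsic prior is π^I_2(θ) = E^{M_2}_θ[B^N_12(X)] = N(θ|0, 2). The sketch below (our own illustration) checks this expectation by quadrature.

```python
from math import exp, pi, sqrt

def norm_pdf(x, mu=0.0, sd=1.0):
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))

def intrinsic_prior(theta, grid=4000, lo=-20.0, hi=20.0):
    """pi^I(theta) = E_theta[B12N(X)] = integral of phi(x) * N(x|theta,1) dx,
    computed by the trapezoid rule; should equal the N(theta|0,2) density."""
    h = (hi - lo) / grid
    total = 0.0
    for k in range(grid + 1):
        x = lo + k * h
        w = 0.5 if k in (0, grid) else 1.0
        total += w * norm_pdf(x) * norm_pdf(x, theta, 1.0)
    return total * h

for theta in (0.0, 1.0, 2.5):
    print(theta, intrinsic_prior(theta), norm_pdf(theta, 0.0, sqrt(2.0)))
```

The N(0, 2) form makes the qualitative point of this section explicit: the functional equations turn an improper default prior into a specific proper prior centered at the null.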

The resulting Bayes factor for the intrinsic priors is

$$B^I_{21}(x) = \frac{\int_{\Theta_2} f_2(x\mid\theta_2)\,\pi^N_2(\theta_2)\,E^{M_2}_{\theta_2} B^N_{12}(x(l))\,d\theta_2}{\int_{\Theta_1} f_1(x\mid\theta_1)\,\pi^N_1(\theta_1)\,d\theta_1},$$

and, for the fractional priors,

$$B^F_{21}(x) = \frac{\int_{\Theta_2} f_2(x\mid\theta_2)\,\pi^N_2(\theta_2)\,F_{12}(\theta_2)\,d\theta_2}{\int_{\Theta_1} f_1(x\mid\theta_1)\,\pi^N_1(\theta_1)\,d\theta_1},$$

where F_12(θ_2) denotes the limit of the fractional correction term under M_2.

Let us illustrate the procedure with a simple example.

Example 2. Suppose we want to compare model M_1: N(x|0, 1) versus M_2: {N(x|θ, 1), π^N_2(θ) = c_2 1_{(θ ≥ 0)}(θ)}. Here the prior for model M_1 is concentrated on θ = 0, and the prior for model M_2 is an improper uniform distribution on the half line. It is easy to see that the intrinsic prior exists and is a proper density on θ ≥ 0. However, the fractional prior does not exist: the resulting candidate density π^F(θ) satisfies

$$\int \pi^F(\theta)\,d\theta < 1,$$

that is, it does not integrate to one. Therefore, in this example, the fractional prior for model M_2 is not a probability density.

4. AN APPLICATION TO THE BEHRENS-FISHER PROBLEM

Testing problems for which frequentist methods fail can be solved with the help of the intrinsic and fractional methodologies. A famous example is the well-known Behrens-Fisher problem. Let N(x|μ_1, σ_1²), N(y|μ_2, σ_2²) be two normal distributions where the means μ_1, μ_2 and the variances σ_1², σ_2² are unknown. Samples x = (x_1, ..., x_{n_1}) and y = (y_1, ..., y_{n_2}), of sizes n_1 and n_2, respectively, are taken, and the sample means and variances are denoted by x̄, s_1² and ȳ, s_2², respectively.

The Behrens-Fisher problem consists in testing H_0: μ_1 = μ_2 against one of the alternatives μ_1 < μ_2, μ_1 > μ_2 or μ_1 ≠ μ_2, assuming that σ_1², σ_2² are unrelated. Here we focus on the alternative H_1: μ_1 ≠ μ_2. From the frequentist point of view this problem poses the difficulty that the normal-theory linear model cannot be applied because of the presence of the two unrelated variances σ_1², σ_2². Some approximations to the solution have been given by Fisher (1936), Wald (1955) and Welch (1947), and their relative merits are discussed in Mehta and Srinivasan (1970) and Pfanzagl (1974).

From the Bayesian viewpoint the Behrens-Fisher problem can be formulated as one of model comparison. The models to be compared are

M_1: f_1(z|θ_1) = N(x|μ, τ_1²) N(y|μ, τ_2²),
M_2: f_2(z|θ_2) = N(x|μ_1, σ_1²) N(y|μ_2, σ_2²),

where z = (x, y), θ_1 = (μ, τ_1, τ_2) and θ_2 = (μ_1, μ_2, σ_1, σ_2). The usual priors for μ_1, μ_2, log σ_1, log σ_2 are independent uniform distributions over (−∞, ∞) (Jeffreys, 1961). They are appropriate for inference purposes on the μ's and, in fact, the posterior distribution of μ_1 − μ_2 is a linear transformation of the so-called Behrens-Fisher distribution (Lindley, 1970). However, these priors are improper and can be written as

$$\pi^N_1(\theta_1) = \frac{c_1}{\tau_1\tau_2}\,1_{R\times R^+\times R^+}(\mu, \tau_1, \tau_2),$$

$$\pi^N_2(\theta_2) = \frac{c_2}{\sigma_1\sigma_2}\,1_{R\times R\times R^+\times R^+}(\mu_1, \mu_2, \sigma_1, \sigma_2),$$

thus depending on arbitrary positive constants c_1, c_2, which make them unsuitable for testing H_0 versus H_1. However, intrinsic priors can be derived for the Behrens-Fisher problem.

Lemma 1. (Moreno, Bertolino and Racugno, 1997) For each point μ, τ_1, τ_2, the intrinsic prior of θ_2 is

$$\pi^I(\mu_1, \mu_2, \sigma_1, \sigma_2\mid \mu, \tau_1, \tau_2) = \prod_{i=1}^{2} N\!\left(\mu_i\,\Big|\,\mu, \frac{\sigma_i^2+\tau_i^2}{2}\right) HC^+(\sigma_i\mid 0, \tau_i),$$

where HC⁺(σ_i|0, τ_i) denotes the half-Cauchy density. Integrating out the variables (μ, τ_1, τ_2) with respect to the default prior π^N_1(θ_1) renders the unconditional intrinsic prior of θ_2,

$$\pi^I(\theta_2) = c_1 \int_{R\times R^+\times R^+} \prod_{i=1}^{2} N\!\left(\mu_i\,\Big|\,\mu, \frac{\sigma_i^2+\tau_i^2}{2}\right) HC^+(\sigma_i\mid 0, \tau_i)\,\frac{1}{\tau_1\tau_2}\,d\mu\, d\tau_1\, d\tau_2.$$

The integral in this expression cannot be given in closed form. However, this is not a serious inconvenience.
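A quick numerical sanity check of Lemma 1 (our own sketch; here the half-Cauchy density is HC⁺(σ|0, τ) = 2τ/(π(σ² + τ²)) for σ > 0): for fixed (μ, τ_1, τ_2), each factor of the conditional intrinsic prior is a proper density, so the prior has total mass one.

```python
from math import atan, exp, pi, sqrt

def half_cauchy_pdf(s, tau):
    """HC+(s | 0, tau) = 2*tau / (pi * (s^2 + tau^2)) for s > 0."""
    return 2.0 * tau / (pi * (s * s + tau * tau))

def norm_pdf(x, mu, var):
    return exp(-0.5 * (x - mu) ** 2 / var) / sqrt(2 * pi * var)

def conditional_intrinsic(mu_i, s_i, mu, tau):
    """One (mu_i, sigma_i) factor of the Lemma 1 prior:
    N(mu_i | mu, (sigma_i^2 + tau^2)/2) * HC+(sigma_i | 0, tau)."""
    return norm_pdf(mu_i, mu, (s_i ** 2 + tau ** 2) / 2.0) * half_cauchy_pdf(s_i, tau)

# The mu_i integral equals 1 for each sigma_i, so the total mass reduces to
# the half-Cauchy mass: int_0^K HC+(s|0,tau) ds = (2/pi) * atan(K/tau) -> 1.
tau = 1.0
for K in (10.0, 1e3, 1e6):
    print(K, (2.0 / pi) * atan(K / tau))
```

The heavy half-Cauchy tail is visible here: with τ = 1, a truncation at K = 10 still misses about 6% of the mass, which is why the intrinsic prior is the thicker-tailed of the two priors compared below.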

Similarly, the fractional priors can also be obtained.

Lemma 2. (Moreno, Bertolino and Racugno, 1997) For each point μ, τ_1, τ_2, the fractional prior of θ_2 is a product of densities for (μ_i, σ_i), i = 1, 2, in which each σ_i follows a half-normal density HN⁺(σ_i|0, τ_i/2) and, unlike in the intrinsic prior, μ_i and σ_i are independent.

The intrinsic π^I(μ_1, μ_2, σ_1, σ_2|μ, τ_1, τ_2) and fractional π^F(μ_1, μ_2, σ_1, σ_2|μ, τ_1, τ_2) priors are quite reasonable. The parameters (μ_i, σ_i), i = 1, 2, are exchangeable in both the intrinsic and the fractional priors. This is sensible as long as the labels given to the populations are somewhat arbitrary. The main differences between the intrinsic and fractional priors π^I(θ_2|μ, τ_1, τ_2) and π^F(θ_2|μ, τ_1, τ_2) are that the tails of the densities of the σ_i's are thicker for the intrinsic than for the fractional prior, and that (μ_i, σ_i) are dependent in the intrinsic prior whereas they are independent in the fractional prior.

With the above intrinsic and fractional priors, the corresponding Bayes factors B^I_21(x, y) and B^F_21(x, y) can be computed. For comparison purposes we also consider the Schwarz approximation B^S_21(x, y). It can be shown (Moreno, Bertolino and Racugno 1997) that the Schwarz approximation depends on the data through n = n_1 + n_2 and the point μ̂ that solves the likelihood equation of model M_1, that is, the point that maximizes the likelihood function of model M_1.

Let us illustrate the behavior of the above Bayes factors with the data taken from Box and Tiao (1973, example 2.5.4, p. 107): n_1 = 20, s_1² = 12, n_2 = 12, s_2² = 40, and several values of x̄ − ȳ. In Table 2 the corresponding values of B^S_21, B^I_21 and B^F_21 are displayed.

|x̄ − ȳ|   P-values (Welch)   B^S_21   B^I_21   B^F_21
 0.00           1               0.36     0.15     0.20
 2.20           0.29            0.69     0.29     0.35
 4.22           0.05            3.10     1.04     1.32
 5.00           0.02            6.33     1.96     2.49
10.0            0.0001         >100     >100     >100

Table 2. Welch P-values and values of B^S_21, B^I_21 and B^F_21.

Table 2 shows that B^I_21 and B^F_21 have a similar behavior. For small values of the difference x̄ − ȳ they clearly favour M_1, and as the difference grows the model M_2 is strongly favored. That behavior is sensible. The Schwarz approximation B^S_21 favours the complex model M_2 for small departures of x̄ − ȳ from zero. In particular, for x̄ − ȳ = 5 the evidence in favour of M_2 is larger than that given by B^I_21 and B^F_21.

For α = 0.05, the Welch approximation gives the critical region W_{0.05} = {x, y : |x̄ − ȳ| ≥ 4.22}. This conclusion is in agreement with the Schwarz approach but in disagreement with those given by either B^I_21 or B^F_21.
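The Welch critical region quoted above can be verified directly (a sketch of our own; the 0.975 Student t quantile for 15 degrees of freedom, 2.131, is taken from tables):

```python
from math import sqrt

# Welch two-sample quantities for the Box & Tiao data used in Table 2:
# n1 = 20 with sample variance 12, and n2 = 12 with sample variance 40.
n1, v1 = 20, 12.0
n2, v2 = 12, 40.0

se = sqrt(v1 / n1 + v2 / n2)                       # standard error of xbar - ybar
df = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)                                                  # Welch-Satterthwaite degrees of freedom
t975 = 2.131                                       # t table value, 0.975 quantile, 15 df
print("se =", se, "df =", df, "critical |xbar - ybar| =", t975 * se)
```

The computed degrees of freedom are about 15, and the resulting critical difference is about 4.23, matching the region |x̄ − ȳ| ≥ 4.22 and the 0.05 P-value row of Table 2.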

5. EXTENSIONS AND OPEN PROBLEMS

The application of the intrinsic and fractional methodologies to non-regular models has been discussed by several authors. The kinds of non-regular models considered include models with improper likelihoods such as those appearing in mixture models, likelihoods where the sample space depends on the parameter space, models with an increasing multiplicity of parameters as in the Neyman-Scott problem, and non-nested models such as that considered in Example 1. We remark, however, that non-nested models do not yet have an acceptable general solution. Some particular cases, for instance the class of models having a given invariance structure, are an exception (Berger, Pericchi and Varshavsky (1996) and Moreno, Bertolino and Racugno (1998b)). Some solutions to the problems listed above appear in the papers by Berger and Pericchi (1997), Shui (1998), Moreno and Liseo (1998) and Moreno, Bertolino and Racugno (1999), and we do not review them here. Another exception is that of testing non-nested models for which there exists a model nested in both; Example 1 is in this situation. In this case, the problem can be solved via


transforming it into an equivalent testing problem for nested models. Let us illustrate this point.

Example 1 (continued). By considering the model M_3: {X has density N(x|0, 1)}, which is nested in both M_1 and M_2, we can consider the Bayes factor

$$B^I_{12}(\bar{x}_n, n) = \frac{B^I_{13}(\bar{x}_n, n)}{B^I_{23}(\bar{x}_n, n)},$$

where we are assuming that the prior π_3 appearing in the Bayes factor B_13(x̄_n, n) is exactly the same as that appearing in the Bayes factor B_23(x̄_n, n). In Example 2 the intrinsic prior for comparing model M_3 versus M_2 was seen to be a proper density on θ ≥ 0, and it is easy to see that the intrinsic prior for comparing model M_3 versus M_1 is its mirror image on θ ≤ 0. Therefore, for a given sample x = (x_1, x_2, ..., x_n), the Bayes factor for intrinsic priors B^I_12(x̄_n, n) turns out to be a ratio of one-dimensional integrals of normal densities that is easily computed numerically. The asymptotic behavior of B^I_12(x̄_n, n) is

$$\lim_{n\to\infty}[P_\theta]\, B^I_{12}(\bar{x}_n, n) = \begin{cases}\infty & \text{if } \theta < 0,\\ 1 & \text{if } \theta = 0,\\ 0 & \text{if } \theta > 0,\end{cases}$$

so that the Bayes factor for the nonnested models M_1 and M_2 with the intrinsic priors is consistent.

For some values of (x̄_n, n), the resulting values of B^I_12(x̄_n, n) are given in the second column of Table 3. For the prior P(M_1) = P(M_2) = 1/2, the posterior probabilities of model M_1, say p^I(M_1|x̄_n, n), are given in the third column of Table 3. The Schwarz approximation to the Bayes factor and the corresponding posterior probabilities of model M_1 are displayed in the fourth and fifth columns, respectively.

(x̄_n, n)   B^I_12(x̄_n, n)   p^I(M_1|x̄_n, n)   B^S_12(x̄_n, n)   p^S(M_1|x̄_n, n)
(−3, 1)          7.1              0.88              90.0              0.99
(−2, 1)          3.2              0.76               7.4              0.88
(−1, 1)          1.7              0.63               1.6              0.63
(0, 1)           1.0              0.50               1.0              0.50

Table 3. Values of B^I_12(x̄_n, n) and posterior probabilities.

For values of the sample mean less than or equal to three standard deviations from the mean θ_2 of the alternative model, the intrinsic posterior probabilities of model M_1 are more reasonable than those given by the Schwarz approach. For instance, for (x̄_n, n) = (−3, 1) the posterior probability with the intrinsic prior is 0.88, while with the Schwarz criterion it is 0.99, which seems too large. We recall that for this example there are no fractional priors.

6. CONCLUSIONS

Improper priors are useful tools when prior information is weak. Furthermore, they are seen by many authors as the basic tools for doing «objective» or «scientific» inference. Our position is that a better analysis of a problem is obtained by incorporating prior information as described by a probability density. However, we recognize that a careful elicitation of prior distributions is not always possible, and then automatic methods are necessary.

However, improper priors are not suitable for a direct application of Bayes' theorem to compute posterior probabilities. The price to pay is a more elaborate analysis to derive a proper prior from the improper one with which we start. This is, in fact, what the training sample device suggests. We should say that even though empirical measures of evidence, such as the variety of partial Bayes factors presented in this paper, are interesting in their own right, we feel that we are more in agreement with the Bayesian paradigm if we use these partial Bayes factors as a tool to derive sensible prior distributions. In other words, we should act in agreement with Berger and Pericchi's Principle stated in the Introduction.

In practice, Bayes factors with fractional and intrinsic priors are very close to each other. While it seems that fractional priors are easier to obtain than intrinsic priors (because calculating a limit in probability is typically less involved than an expectation), they may not always exist, and this alone is a strong reason in favor of the intrinsic priors. Interestingly enough, many challenging problems for which the standard classical theory does not


apply, do have, however, a default and sensible Bayesian solution.

ACKNOWLEDGMENTS

This research was supported by grants PB96-1388 and PB97-1403-C03 from the Ministry of Education, Spain.

REFERENCES

1. Aitkin, M. (1991) «Posterior Bayes factors», Journal of the Royal Statistical Society, Ser. B, 53, 111-142.
2. Berger, J.O. & Mortera, J. (1995) Discussion of O'Hagan, Journal of the Royal Statistical Society, Ser. B, 57, 99-138.
3. Berger, J.O. & Pericchi, L.R. (1996a) «The intrinsic Bayes factor for linear models», in Bayesian Statistics 5, eds. J.M. Bernardo et al., London: Oxford University Press, pp. 23-42.
4. Berger, J.O. & Pericchi, L.R. (1996b) «The intrinsic Bayes factor for model selection and prediction», Journal of the American Statistical Association, 91, 109-122.
5. Berger, J.O. & Pericchi, L.R. (1997a) «On the justification of default and intrinsic Bayes factors», in Modeling and Prediction, eds. J.C. Lee et al., New York: Springer-Verlag, pp. 276-293.
6. Berger, J.O. & Pericchi, L.R. (1997b) «Accurate and stable Bayesian model selection: the median intrinsic Bayes factor», Sankhya B, 60, 1-18.
7. Berger, J.O., Pericchi, L.R. & Varshavsky, J. (1996) «Bayes factors and marginal distributions in invariant situations», Technical Report 95-7C, Department of Statistics, Purdue University.
9. Bernardo, J.M. & Smith, A.F.M. (1994) Bayesian Theory, New York: Wiley.
10. Bertolino, F. & Racugno, W. (1996) «Is the intrinsic Bayes factor intrinsic?», Metron, 1, 5-15.
11. Box, G.E.P. (1980) «Sampling and Bayes' inference in scientific modelling and robustness» (with discussion), Journal of the Royal Statistical Society, Ser. A, 143, 383-430.
15. Cox, D.R. (1961) «Tests of separate families of hypotheses», Proceedings of the Fourth Berkeley Symposium, 1, 105-123.
16. Dmochowski, J. (1996) «Intrinsic Bayes factor via Kullback-Leibler geometry», in Bayesian Statistics 5, eds. J.M. Bernardo et al., London: Oxford University Press.
17. De Santis, F. & Spezzaferri, F. (1996) «Comparing hierarchical models using Bayes factor and fractional Bayes factor», in Bayesian Robustness, eds. J.O. Berger, B. Betró, E. Moreno, L.R. Pericchi, F. Ruggeri and L. Wasserman, Hayward, CA: Institute of Mathematical Statistics, vol. 29, pp. 305-314.
18. De Santis, F. & Spezzaferri, F. (1997) «Alternative Bayes factors for model selection», Canadian Journal of Statistics (to appear).
19. De Santis, F. & Spezzaferri, F. (1999) «Methods for default and robust Bayesian model comparison: the fractional Bayes factor approach», International Statistical Review (to appear).
20. Fisher, R.A. (1936) «The fiducial argument in statistical inference», Ann. Eugen., 6, 391-422.
21. Geisser, S. & Eddy, W.F. (1979) «A predictive approach to model selection», Journal of the American Statistical Association, 74, 153-160.
22. Good, I.J. (1985) «Weight of evidence: a brief survey», in Bayesian Statistics 2, eds. J.M. Bernardo et al., Valencia: Valencia University Press, pp. 249-270.
23. Huber, P. (1967) «The behaviour of maximum likelihood estimators under nonstandard conditions», Proceedings of the Fifth Berkeley Symposium, 1, 221-233.
24. Lindley, D.V. (1970) An Introduction to Probability and Statistics from a Bayesian Viewpoint (Vol. 2), Cambridge: Cambridge University Press.
25. Kass, R.E. & Wasserman, L. (1995) «A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion», Journal of the American Statistical Association, 90, 928-934.
26. Kass, R.E. & Raftery, A. (1995) «Bayes factors», Journal of the American Statistical Association, 90, 773-795.
27. Leamer, E.E. (1978) Specification Searches: Ad Hoc Inference with Nonexperimental Data, New York: Wiley.
28. Moreno, E. (1997) «Bayes factors for intrinsic and fractional priors in nested models: Bayesian robustness», in L1-Statistical Procedures and Related Topics, Hayward, CA: Institute of Mathematical Statistics, vol. 31, pp. 257-270.

29.

Moreno, E., Bertolino, F. & Racugno, W. (1997) «Default Bayesian Analysis of the Behrens-Fisher problem», (Accepted for publication in the Joumal of Statistical Inference and Planning).

30.

Moreno, E., Bertolino, F. & Racugno, W. (1998a) «Default approacher to compare means of normal populations» (with discussion), Proceedings of the workshop On Mode! Selection, ed. W. Racugno, Bologna: Pitagora, 133-155.

31.

Moreno, E., Bertolino, F. & Racugno, W. (1998b) «An lntrinsic Limiting Procedure for Model Selection and Hypotheses

6.

7.

8.

Berger, 1.0. & Pericchi, L.R. (1998) On Criticism and Comparison of Default Bayes Factors for Model Selection and Hypothesis Testing. Proceedings of the Workshop on Model Selection, Ed. W. Racugno, Pitagora Ed., Bologna, pp. 1-50

12.

Box, G.E.P. & Tiao, G.C. (1973) Bayesian Inference in Statistical Analysis. Reading, M.A.: Addison-Wesley.

13.

Brown, M.B. & Forsythe, A.B. (l974a) The ANOVA and Multiple Comparisons for Data with Heterogeneous Variances. Biometrics, 30, 719-724.

14.

Brown, M.B. & Forsythe, A.B. (1974b) The Small Sample Behaviour of Sorne Statistics Which Test the Equality of Several Means. Technometrics, 16, 129-132.

298

Testing". Journal of the American Statistical Association, 93, 1451-1460. 32.

Rev.R.Acad.Cien.Exact.Fis.Nat. (Esp), 1998; 92

Problemas complejos de decisión: Elías Moreno et al.

Moreno, E., Bertolino, F. & Racugno, W. (1999) «Bayesian Model Selection Approach to Analysis of Variance under Heteroscedasticy», Technical Report, University of Granada.

33.

Moreno, E. & Liseo, B. (1999) «Default priors for testing the number of components of a mixture», Technical Report, University of Granada.

34.

Mehta, J.S. & Srinivasan, R. (1970) «On the Behrens-Fisher problem» Biometrika, 57, 649-655.

35.

O'Hagan, A. (1994) Bayesian Inference, KendaU's Advances Theory of Statistics (Vol. 2B), London: Edward Amold.

36.

O'Hagan, A. (1995) Fractional Bayes Factor for Model Comparison (with discussion). Journal of the Royal Statistical Society, Ser B, 57, 99-138.

40.

San Martini, A. & Spezzaferri, F. (1984) «Apredictive model selection criterion», Journal of the Royal Statistical Society, Ser. B, 46, 296-303.

41.

Sawa, T. (1987) «Information criteria for discriminating among altemative regression models», Econometrica. 46, 1273-1291.

42.

Schwarz, G. (1978) «Estimating the Dimension of a Model», Annals of Statistics. 6, 461-464.

43.

Shui, C. (1997) «Default Bayesian Analysis of Mixtures Modeis>" Thesis, Purdue University, Dept of Statistics.

44.

Tan, W.Y. & Tabatabai, M.A. (1986) Sorne Motecarlo Studies on the Comparison of Several Menas Under Heteroscedasticity and Robustness with Respect to Departure of Normality. Biomedical Journal. 28,801- 814.

45.

Wald, A. (1955) «Testing the difference between the means of two normal populations with unknown standard derivations», in Selected papers in Statistics and Probability, eds. T.W. An" derson et al., Stanford University Press, pp. 669-695.

37.

O'Hagan, A. (1997) «Properties of lntrinsic and Fractional Bayes Factors», Test, 1, 101-118.

38.

Pfanzagl, J. (1974) «On the Behrens-Fisher problem», Biometrika, 61, 39-47.

46.

Welch, B.L. (1947) «The generalization of Student's problem when several populations are involved», Biometrika, 34, 28-35.

39.

Smith, A.F.M. & Spiegelhalter, D.J. (1980) «Bayes factor and choice criteria for linear models», Journal of the Royal Statistical Society, Ser. B, 42, 213-220.

47.

Verdinelli, I. & Wasserman L. (1995) «Computing Bayes factor using a generalization of the Savage-Dickey density ratio», Journal of the American Statistical Association, 90, 614-618.