Stat Methods Appl (2014) 23:123–148 DOI 10.1007/s10260-013-0246-3

A new bivariate exponential distribution for modeling moderately negative dependence

Muhammad Mohsin · Hannes Kazianka · Jürgen Pilz · Albrecht Gebhardt

Accepted: 5 October 2013 / Published online: 24 October 2013 © Springer-Verlag Berlin Heidelberg 2013

Abstract This paper introduces a new bivariate exponential distribution, called the Bivariate Affine-Linear Exponential distribution, to model moderately negatively dependent data. The construction and characteristics of the proposed bivariate distribution are presented along with estimation procedures for the model parameters based on maximum likelihood and objective Bayesian analysis. We derive Jeffreys prior and discuss its frequentist properties based on a simulation study and MCMC sampling techniques. A real data set of mercury concentration in largemouth bass from Florida lakes is used to illustrate the methodology.

Keywords Bivariate exponential distribution · Copula · Jeffreys prior · Largemouth bass · Mercury concentration

M. Mohsin · H. Kazianka · J. Pilz · A. Gebhardt
Department of Statistics, Alpen-Adria University of Klagenfurt, 9020 Klagenfurt, Austria

M. Mohsin
Department of Statistics, COMSATS Institute of Information Technology, Lahore, Pakistan

1 Introduction

Undoubtedly, exponential distributions are among the most frequently used distributions and functional models in many fields such as reliability, telecommunication, hydrology, medical sciences, environmental science etc. A large number of bivariate exponential distributions have been proposed in the literature, e.g. Gumbel (1960), Freund (1961), Downton (1970), Arnold and Strauss (1988) and others, for a variety of practical problems. Marshall and Olkin (1967a,b) derive a well-known multivariate exponential distribution using shock models. In the bivariate case this structure could describe the lifetimes of two components operating in a random environment and subjected to fatal shocks governed by a Poisson process. The distribution has both an absolutely continuous and a singular part and is the unique multivariate distribution having exponential marginals which exhibits a multivariate lack of memory property. Doss and Graham (1975) suggest a procedure of constructing a multivariate linear exponential distribution from pre-assigned univariate marginal distributions. Hougaard (1986) proposes a class of multivariate lifetime distributions which has become very popular as a frailty model. A substantial amount of literature on various bivariate and multivariate exponential models is given in Basu (1988). More recently, Iyer et al. (2002) derive a bivariate exponential distribution using independent auxiliary random variables which is useful in modeling dependence between the interarrival and service times in a queuing model during a failure process in multi-component systems. Gupta and Kundu (1999, 2001, 2002, 2007) and Kundu and Gupta (2008) introduce a generalized exponential distribution which is used effectively in analyzing many lifetime datasets. They also discuss different methods of parameter estimation and statistical inference for this generalized exponential distribution. Sarhan and Balakrishnan (2007) introduce a new bivariate distribution based on the generalized exponential distribution and discuss several interesting properties. This distribution also has an absolutely continuous as well as a singular component. Franco and Vivo (2010) and Franco et al.
(2011) provide a multivariate extension of Sarhan and Balakrishnan's (modified) bivariate distribution along with its ageing and dependence properties. Kundu and Gupta (2009) propose a bivariate distribution having generalized exponential marginals, discuss its properties and provide maximum likelihood estimators. Regoli (2009) proposes a new class of bivariate exponential distributions generated from quadratic forms of standard multivariate normal variates. However, little work has been done so far in modeling biological and environmental data using bivariate exponential distributions.

In the present paper we propose a new bivariate exponential distribution, which we call the Bivariate Affine-Linear Exponential (BALE) distribution, and demonstrate its application in environmental and ecological sciences. In particular, this new distribution is applied to model the effect of some chemicals from contaminated water on the tissues of largemouth bass (fish). We study the dependence structure of minimum mercury (Hg) concentration and alkalinity (as CaCO3), calcium and chlorophyll and explore whether Hg concentration in the tissue of largemouth bass increases with the alkalinity, calcium and chlorophyll levels. Wiener (1987) argues that an increased mercury level in largemouth bass constitutes a direct threat to the health of humans, piscivorous mammals and birds. Lange et al. (1993) demonstrate that mercury accumulation in the tissue of largemouth bass directly depends on the chemical behaviour of the lake. For this purpose they collected data from 53 Florida lakes to determine the relationship between mercury concentration in bass and physical and chemical lake characteristics. FAO/WHO (2003) report that high exposure to mercury possibly affects the central nervous system, kidney, liver and reproductive organs of the human body.


It is not trivial to extend the univariate exponential distribution to the bivariate or the multivariate case, therefore, none of the models proposed in the literature so far proves to be universally applicable. One of the distinguishing features of the model we propose in this paper is that it is absolutely continuous and well-suited for modeling environmental data, survival data etc., whereas many other well known models such as Marshall and Olkin (1967a), Gupta and Kundu (1999), Sarhan and Balakrishnan (2007) etc. have a continuous part as well as a singular one. Moreover, its parameters can be easily estimated and it is easy to simulate from this distribution. Another interesting feature of our proposed distribution is that it has a closed form representation, which is not the case for the distribution proposed by Regoli (2009). Therefore, for our distribution all important quantities, e.g. product moments, copula function, conditional distribution functions, conditional densities and conditional moments can be given in analytically closed form. The new bivariate exponential distribution is rather flexible, shows interesting dependence properties and offers the potential for generalizations. The paper is organized as follows. Section 2 provides the construction of our novel distribution on the basis of the marginal distribution of X and the conditional distribution of Y given that X = x, along with its characteristics. Maximum likelihood estimation of the model parameters is discussed in Sect. 3. We derive the Jeffreys prior for conducting objective Bayesian analysis and show posterior propriety in Sect. 4. Moreover, we provide a simulation study to analyze the frequentist properties of the Bayes estimators when Jeffreys prior and the flat prior for the model parameters are used. Section 5 presents the application of the proposed distribution to a real data set including goodness-of-fit testing and frequentist as well as Bayesian parameter estimation. 
Finally, Sect. 6 summarizes our conclusions.

2 The Bivariate Affine-Linear Exponential (BALE) distribution and its characteristics

A random variable X is said to have a univariate exponential distribution with parameter α if its probability density function (pdf) is

$$f_X(x) = \alpha \exp\{-\alpha x\}, \quad \alpha > 0,\ x \ge 0. \tag{1}$$

Further, suppose another random variable Y has an exponential distribution with parameter φ(x), where φ(x) is some function of the random variable X. Then the conditional probability density function of Y given φ(x) is

$$f_{Y|X}(y \mid \phi(x)) = \phi(x) \exp\{-\phi(x) y\}, \quad \phi(x) > 0,\ y \ge 0. \tag{2}$$

The density of the compound distribution of Eq. 1 and Eq. 2 is given as

$$f_{XY}(x, y) = \alpha \phi(x) \exp\{-\alpha x - \phi(x) y\}, \quad \alpha, \phi(x) > 0,\ x, y \ge 0. \tag{3}$$

Depending upon the choice of φ(x) the density function Eq. 3 can be used to generate a whole variety of bivariate distributions. We choose φ(x) to be an affine-linear function of x, i.e. φ(x) = β + γx. Plugging φ(x) in Eq. 3 we arrive at the following proposal for a new bivariate exponential distribution:

$$f_{XY}(x, y) = \alpha(\beta + \gamma x) \exp\{-\alpha x - \beta y - \gamma x y\}, \quad \alpha > 0,\ \beta, \gamma, x, y \ge 0,\ \beta + \gamma > 0. \tag{4}$$

Since this distribution is obtained from the more general bivariate exponential distribution in Eq. 3 by letting φ(x) be an affine-linear function of x, we call it the Bivariate Affine-Linear Exponential (BALE) distribution. For γ = 0 the random variable Y becomes independent of X and is exponentially distributed with parameter β.

2.1 Marginals, CDFs and product moments

To derive some mathematical properties of the new bivariate exponential distribution, we need the following result:

Lemma 1 (Prudnikov et al. 1986, Vol. 1, Eq. (2.3.6.9), p. 324). If a, b ∈ ℝ, s > 0 and |arg c| < π,

$$\int_0^\infty \frac{x^{a-1} \exp(-sx)}{(c + x)^b}\, dx = \Gamma(a)\, c^{a-b}\, \Psi(a, a + 1 - b; cs),$$

where Γ(.) is the Euler gamma function and Ψ(.) is Kummer's (confluent hypergeometric) function of second kind which is given by

$$\Psi(x, y; z) = \frac{1}{\Gamma(x)} \int_0^\infty \exp(-zt)\, t^{x-1} (1 + t)^{y-x-1}\, dt.$$

The product moment of the distribution proposed in Eq. 4 is obtained as:

$$E(X^p Y^q) = \alpha \int_0^\infty \int_0^\infty x^p y^q (\beta + \gamma x) \exp\{-\alpha x - \beta y - \gamma x y\}\, dy\, dx, \quad p, q = 1, 2, \ldots$$

$$= \frac{\alpha\, \Gamma(q + 1)}{\gamma^q} \int_0^\infty \frac{x^{(p+1)-1} \exp\{-\alpha x\}}{\left(\frac{\beta}{\gamma} + x\right)^q}\, dx.$$

Using Lemma 1 and further simplifications we arrive at

$$E(X^p Y^q) = \frac{\alpha \beta^{p-q+1}}{\gamma^{p+1}}\, \Gamma(q + 1)\, \Gamma(p + 1)\, \Psi\!\left(p + 1, p - q + 2; \frac{\alpha\beta}{\gamma}\right). \tag{5}$$
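As a plausibility check of Eq. 5, the closed form can be compared with direct numerical integration of the density in Eq. 4. The following is a minimal sketch, assuming SciPy is available; the function names `bale_pdf`, `product_moment` and `product_moment_numeric` and the parameter values are illustrative choices, not part of the paper. SciPy's `hyperu` is used for Kummer's function of the second kind.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.special import gamma, hyperu


def bale_pdf(x, y, alpha, beta, gam):
    """Joint density of the BALE distribution (Eq. 4)."""
    return alpha * (beta + gam * x) * np.exp(-alpha * x - beta * y - gam * x * y)


def product_moment(p, q, alpha, beta, gam):
    """Closed form of E(X^p Y^q) as given in Eq. 5."""
    eta = alpha * beta / gam
    return (alpha * beta ** (p - q + 1) / gam ** (p + 1)
            * gamma(q + 1) * gamma(p + 1) * hyperu(p + 1, p - q + 2, eta))


def product_moment_numeric(p, q, alpha, beta, gam):
    """Same moment by numerical double integration over the positive quadrant."""
    # dblquad expects func(y, x): x is the outer and y the inner variable.
    value, _ = dblquad(
        lambda y, x: x ** p * y ** q * bale_pdf(x, y, alpha, beta, gam),
        0, np.inf, 0, np.inf)
    return value


# Illustrative parameter values (assumed for this sketch only).
closed = product_moment(1, 1, 1.0, 1.0, 1.0)
numeric = product_moment_numeric(1, 1, 1.0, 1.0, 1.0)
print(closed, numeric)
```

Both numbers should agree up to the integration tolerance; a visible mismatch would point to a transcription error in the parameters of Ψ.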

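From Eq. 4, the cumulative distribution function (cdf) of (X, Y)^T is given as:

$$F_{XY}(x, y) = 1 - \exp\{-\alpha x\} - \frac{\alpha \exp\{-\beta y\}\,[1 - \exp\{-(\alpha + \gamma y)x\}]}{\alpha + \gamma y}. \tag{6}$$

The marginal density of Y can be obtained from Eq. 4 by integration with respect to x, which leads to

$$f_Y(y) = \frac{\alpha \exp\{-\beta y\}\,[\gamma + \beta(\alpha + \gamma y)]}{(\alpha + \gamma y)^2}. \tag{7}$$

The corresponding cumulative distribution function is given as

$$F_Y(y) = 1 - \frac{\alpha \exp\{-\beta y\}}{\alpha + \gamma y} \tag{8}$$

and the pth moment of Y reads

$$E(Y^p) = \int_0^\infty y^p\, \frac{\alpha \exp\{-\beta y\}\,[\gamma + \beta(\alpha + \gamma y)]}{(\alpha + \gamma y)^2}\, dy.$$

Since this equation can be rewritten as

$$E(Y^p) = \frac{\alpha}{\gamma} \int_0^\infty \frac{y^{(p+1)-1} \exp\{-\beta y\}}{\left(\frac{\alpha}{\gamma} + y\right)^2}\, dy + \frac{\alpha\beta}{\gamma} \int_0^\infty \frac{y^{(p+1)-1} \exp\{-\beta y\}}{\frac{\alpha}{\gamma} + y}\, dy,$$

the application of Lemma 1 leads to the following expression:

$$E(Y^p) = \Gamma(p + 1) \left(\frac{\alpha}{\gamma}\right)^{p} \left[\Psi\!\left(p + 1, p; \frac{\alpha\beta}{\gamma}\right) + \frac{\alpha\beta}{\gamma}\, \Psi\!\left(p + 1, p + 1; \frac{\alpha\beta}{\gamma}\right)\right]. \tag{9}$$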

It is worth mentioning that the marginal distribution of Y does not constitute an exponential family of distributions (Lehmann and Casella 2003) unless γ = 0. This can be proved by exploiting a criterion due to Klauer (1986): Let $D^{(n)} = \{(y_1, \ldots, y_n)^T : y_1 < y_2 < \ldots < y_n\}$ and $y^1 = (y_{11}, \ldots, y_{n1})^T$, $y^2 = (y_{12}, \ldots, y_{n2})^T \in D^{(n)}$, then the condition "there are positive constants $c_1$ and $c_2$ such that for all α, β, γ ∈ ℝ₊ it holds that $c_1 \prod_{i=1}^n f_Y(y_{i1}) = c_2 \prod_{i=1}^n f_Y(y_{i2})$" implies that

$$c_1 \prod_{i=1}^n \alpha \exp\{-\beta y_{i1}\}\,(\gamma + \beta(\alpha + \gamma y_{i1}))(\alpha + \gamma y_{i2})^2 = c_2 \prod_{i=1}^n \alpha \exp\{-\beta y_{i2}\}\,(\gamma + \beta(\alpha + \gamma y_{i2}))(\alpha + \gamma y_{i1})^2.$$

This equality must hold for all α, γ > 0, β ≥ 0. Thus, the zeros of the left-hand side and right-hand side of the above equation have to coincide, which implies that $y_{i1} = y_{i2}$, i = 1, ..., n. This means that the distribution distinguishes on $D^{(n)}$ (Barndorff-Nielsen 1978, p. 12), which implies that the distribution is not an exponential family. Similar calculations show that, although we call the proposed bivariate distribution in Eq. 4 the BALE distribution, it does not constitute an exponential family of distributions.

Next, we consider the conditional densities and moments. By construction (see Eq. 2), the conditional distribution of Y given X = x is a univariate exponential distribution, Y | X = x ∼ Exp(β + γx). The conditional probability density function of X given Y = y reads

$$f_{X|Y}(x \mid Y = y) = \frac{(\beta + \gamma x)(\alpha + \gamma y)^2 \exp\{-(\alpha + \gamma y)x\}}{\gamma + \alpha\beta + \beta\gamma y}. \tag{10}$$

The pth conditional moment of X given Y = y is obtained as

$$E(X^p \mid Y = y) = \frac{\Gamma(p + 1)\,(\alpha\beta + (1 + p + \beta y)\gamma)}{(\alpha + \gamma y)^p\,(\gamma + \alpha\beta + \beta\gamma y)}. \tag{11}$$

2.2 Ageing properties

For a possible application of the BALE distribution to modeling lifetimes, we investigate its ageing properties. Generally, notions of ageing compare conditional survival probabilities for residual lifetimes for different ages of some of the surviving individuals (Spizzichino 2001). In the following we show that the proposed distribution has the bivariate increasing failure rate property if γ ≤ α² and the bivariate decreasing hazard gradient property. These properties are all based on the survival function $\bar{F}_{XY}(x, y) = 1 - F_X(x) - F_Y(y) + F_{XY}(x, y)$, which can be computed using Eqs. (6) and (8):

$$\bar{F}_{XY}(x, y) = \frac{\alpha \exp\{-\alpha x - \beta y - \gamma x y\}}{\alpha + \gamma y}.$$

To prove the bivariate increasing failure rate property, we need to show that the ratio $\bar{F}_{XY}(x + t, y + t)/\bar{F}_{XY}(x, y)$ is a non-increasing function of x, y for all t > 0. The partial derivatives of the ratio with respect to x and y, therefore, need to be non-positive:

$$\frac{\partial}{\partial x} \frac{\bar{F}_{XY}(x + t, y + t)}{\bar{F}_{XY}(x, y)} = -\gamma t \exp\{-t(\alpha + \beta + \gamma(t + x + y))\} \times (\alpha + \gamma y)(\alpha + \gamma(y + t))^{-1},$$

$$\frac{\partial}{\partial y} \frac{\bar{F}_{XY}(x + t, y + t)}{\bar{F}_{XY}(x, y)} = -\gamma t \exp\{-t(\alpha + \beta + \gamma(t + x + y))\} \times ((\alpha + \gamma y)(\alpha + \gamma(y + t)) - \gamma)(\alpha + \gamma(y + t))^{-2}.$$

While the derivative with respect to x is always non-positive, the derivative with respect to y is non-positive for all t > 0 and y ≥ 0 only if (α + γy)(α + γ(y + t)) ≥ γ, which is the case as long as γ ≤ α². If in addition γ > 0, the ratio $\bar{F}_{XY}(x + t, y + t)/\bar{F}_{XY}(x, y)$ is strictly decreasing. In case that γ > α², the distribution has neither the increasing nor the decreasing failure rate property.


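The bivariate hazard gradient (Johnson and Kotz 1975) is defined as

$$h_{XY}(x, y) = -\left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)^T \log \bar{F}_{XY}(x, y) = \begin{pmatrix} \alpha + \gamma y \\ \gamma(\alpha + \gamma y)^{-1} + \beta + \gamma x \end{pmatrix}.$$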

Since the components of the hazard gradient are non-increasing functions of the corresponding variables, the distribution has the bivariate decreasing hazard gradient property.

In the area of stress-strength models, the life of a component with random strength Y is subjected to random stress X and fails whenever X > Y. Hence, R = P(X < Y) is a measure of component reliability and has been worked out for certain bivariate exponential distributions by Nadarajah and Kotz (2006). The reliability in case that (X, Y)^T follows the BALE distribution can be computed as

$$R = P(X < Y) = \int_0^\infty \int_x^\infty f_{XY}(x, y)\, dy\, dx = \frac{\sqrt{\pi}\,\alpha}{\sqrt{\gamma}} \exp\left\{\frac{(\alpha + \beta)^2}{4\gamma}\right\} \left[1 - \Phi\!\left(\frac{\alpha + \beta}{\sqrt{2\gamma}}\right)\right],$$

where Φ(.) denotes the standard Gaussian distribution function.

2.3 Dependence properties and copula function

From the results of the previous subsection it is straightforward to compute the covariance between X and Y, Cov(X, Y) = −γ⁻¹ Ψ(2, 1; η), where η = αβ/γ. Moreover, the Pearson product moment correlation coefficient is

$$\mathrm{Cor}(X, Y) = \frac{-\Psi(2, 1; \eta)}{\sqrt{2(\Psi(3, 2; \eta) + \eta\Psi(3, 3; \eta)) - (\Psi(2, 1; \eta) + \eta\Psi(2, 2; \eta))^2}}. \tag{12}$$
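Since Eq. 12 depends on η alone, it is easy to evaluate on a grid. The sketch below assumes SciPy's `hyperu` as an implementation of Ψ; the function name `pearson_correlation` and the grid are illustrative choices. It locates the minimum of the correlation numerically.

```python
import numpy as np
from scipy.special import hyperu


def pearson_correlation(eta):
    """Pearson correlation of (X, Y) as a function of eta = alpha*beta/gamma (Eq. 12)."""
    psi21 = hyperu(2, 1, eta)
    # E(Y) and E(Y^2) from Eq. 9, up to the factors (alpha/gamma) and (alpha/gamma)^2.
    first = psi21 + eta * hyperu(2, 2, eta)
    second = 2.0 * (hyperu(3, 2, eta) + eta * hyperu(3, 3, eta))
    return -psi21 / np.sqrt(second - first ** 2)


# Grid search for the minimum (grid limits are arbitrary choices for this sketch).
etas = np.linspace(0.01, 10.0, 5000)
cors = pearson_correlation(etas)
i_min = int(np.argmin(cors))
print(etas[i_min], cors[i_min])
```

On such a grid the correlation stays strictly between −1 and 0, and its minimum is attained at an interior point.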

From the positivity of Ψ(2, 1; .) it is clear that only negative linear correlations can be described by the BALE distribution. Interestingly, the linear correlation depends only on the single parameter η and takes its minimum value −0.3622 at η = 0.1920. This suggests that only relatively weak negative linear dependence can be modeled using this distributional assumption (see also Fig. 2). However, due to the non-linearity of X and Y, the explanatory power of the Pearson product moment correlation coefficient is limited and one should rely on rank correlation coefficients such as Spearman-ρ or Kendall-τ when drawing conclusions about the level of dependence for the proposed distribution.

To examine the non-linear dependence properties of the distribution, we examine its copula (Nelsen 2006). The copula can be obtained by exploiting the relationship $C_{XY}(u, v) = F_{XY}(F_X^{-1}(u), F_Y^{-1}(v))$, u, v ∈ [0, 1], which is known as Sklar's Theorem. After computing the inverse of Eq. 8 and inserting it along with $F_X^{-1}(u) = -\alpha^{-1} \log(1 - u)$ in Eq. 6, we arrive at

$$C_{XY}(u, v) = u - \left((1 - u)^{\frac{1}{\eta} W\left(\frac{\eta \exp(\eta)}{1 - v}\right)} - 1\right)(v - 1). \tag{13}$$

Here, W(.) is the product logarithm or Lambert W-function (Corless et al. 1996). The limiting cases of the copula as η → ∞ and η → 0 are $C_{XY}(u, v) = uv$, known as the independence copula, and $C_{XY}(u, v) = u - ((1 - u)^{1/(1-v)} - 1)(v - 1)$, respectively. The corresponding copula density is

$$c_{XY}(u, v) = \frac{(1 - u)^{\frac{1}{\eta} W\left(\frac{\eta \exp(\eta)}{1 - v}\right) - 1}\; W\!\left(\frac{\eta \exp(\eta)}{1 - v}\right)^{2} (\eta - \log(1 - u))}{\eta^2 \left(1 + W\!\left(\frac{\eta \exp(\eta)}{1 - v}\right)\right)}.$$

It is evident from the copula that the entire dependence structure depends only on the parameter η = αβ/γ. Furthermore, it is easy to see that the copula is not symmetric, i.e. $c_{XY}(u, v) \neq c_{XY}(v, u)$, which means that high and low quantiles of the marginals do not have the same dependence properties. Figure 1 displays the copula density for different choices of the parameter η.
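The copula of Eq. 13 and its density are straightforward to evaluate numerically. The following is a minimal sketch, assuming SciPy's `lambertw` (principal branch, whose real part is taken because the argument is positive); the names `bale_copula` and `bale_copula_density` and the evaluation points are illustrative.

```python
import numpy as np
from scipy.special import lambertw


def bale_copula(u, v, eta):
    """Copula C_XY(u, v) of Eq. 13, valid for 0 <= u <= 1 and 0 <= v < 1."""
    w = lambertw(eta * np.exp(eta) / (1.0 - v)).real / eta
    return u - ((1.0 - u) ** w - 1.0) * (v - 1.0)


def bale_copula_density(u, v, eta):
    """Copula density c_XY(u, v), valid for 0 <= u < 1 and 0 <= v < 1."""
    big_w = lambertw(eta * np.exp(eta) / (1.0 - v)).real
    return ((1.0 - u) ** (big_w / eta - 1.0) * big_w ** 2
            * (eta - np.log(1.0 - u)) / (eta ** 2 * (1.0 + big_w)))


# Evaluate at one point for a small, a moderate and a large eta.
for eta in (1e-8, 0.5, 200.0):
    print(eta, bale_copula(0.3, 0.6, eta), bale_copula_density(0.3, 0.6, eta))
```

For large η the copula value approaches uv, and for η close to zero it approaches the second limiting form stated above, which gives a quick check of an implementation.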

Fig. 1 Copula densities for different choices of η = αβ/γ (four contour panels, (a)–(d), for the values η = 2, 0.5, 0.2 and 0.01, each plotted over u and v)

Inspection of the copula function also reveals that X is stochastically decreasing in Y and Y is stochastically decreasing in X. The property of X being stochastically decreasing in Y means that P(X > x | Y = y) is a nonincreasing function of y for all x. Lehmann (1966) calls this property negative regression dependence. Stochastic monotonicity can be proved (Nelsen 2006, p. 196) by showing that

$$\frac{\partial C_{XY}(u, v)}{\partial u} = 1 - (1 - v)\, \frac{1}{\eta} W\!\left(\frac{\eta \exp(\eta)}{1 - v}\right) (1 - u)^{\frac{1}{\eta} W\left(\frac{\eta \exp(\eta)}{1 - v}\right) - 1},$$

$$\frac{\partial C_{XY}(u, v)}{\partial v} = 1 - (1 - u)^{\frac{1}{\eta} W\left(\frac{\eta \exp(\eta)}{1 - v}\right)} \left(1 - \frac{\log(1 - u)\, W\!\left(\frac{\eta \exp(\eta)}{1 - v}\right)}{\eta + \eta W\!\left(\frac{\eta \exp(\eta)}{1 - v}\right)}\right)$$

are nondecreasing functions of u (for all fixed v) and v (for all fixed u), respectively. This is a consequence of η⁻¹ W(η exp(η)/(1 − v)) ≥ 1 for all v ∈ [0, 1].

In addition, from Eq. (4) we find that the BALE distribution satisfies an even stronger dependence property, namely negative likelihood ratio dependence (Nelsen 2006, p. 200). For all x, y, x′, y′ ≥ 0 such that x ≤ x′ and y ≤ y′ the determining inequality holds:

$$f_{XY}(x, y)\, f_{XY}(x', y') \le f_{XY}(x, y')\, f_{XY}(x', y).$$

Inserting the joint density Eq. (4) into the preceding inequality yields after simplification exp{−γ(x′ − x)(y′ − y)} ≤ 1, which is true since γ ≥ 0, x ≤ x′ and y ≤ y′. Negative likelihood ratio dependence implies that X and Y are left corner set increasing and right corner set decreasing (Nelsen 2006, p. 198).

The copula exhibits zero lower (λ_L) and upper (λ_U) tail dependence (Nelsen 2006, p. 214). This can be verified by applying L'Hôpital's rule and the identity η⁻¹ W(η exp(η)) = 1:

$$\lambda_L = \lim_{t \to 0} \frac{C_{XY}(t, t)}{t} = 1 + \lim_{t \to 0} \frac{(1 - t)^{\frac{1}{\eta} W\left(\frac{\eta \exp(\eta)}{1 - t}\right)} - 1}{t} = 1 - \frac{1}{\eta} W(\eta \exp(\eta)) = 0,$$

$$\lambda_U = 2 - \lim_{t \to 1} \frac{1 - C_{XY}(t, t)}{1 - t} = \lim_{t \to 1} (1 - t)^{\frac{1}{\eta} W\left(\frac{\eta \exp(\eta)}{1 - t}\right)} = 0.$$

This property suggests that the BALE distribution cannot be recommended for modeling extreme phenomena since regardless of the parameter η, if we go far enough into the tail of the distribution, extreme events appear to occur independently in the marginals (Embrechts et al. 2002). To investigate the range of negative dependence described by the proposed distribution, we compute the Spearman-ρ and the Kendall-τ rank correlation coefficients. These two measures of association solely depend on the copula but not on the marginal distribution as the Pearson product moment correlation coefficient does. Since it is not possible to derive closed-form expressions for ρ and τ , computations are performed by exploiting the following relationships numerically:

$$\rho = 12 \int_0^1 \int_0^1 C_{XY}(u, v)\, du\, dv - 3,$$

$$\tau = 4 \int_0^1 \int_0^1 C_{XY}(u, v)\, c_{XY}(u, v)\, du\, dv - 1.$$

Fig. 2 Kendall-τ, Spearman-ρ and Pearson product moment correlation coefficient as functions of η = αβ/γ
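The integral for Spearman-ρ can be approximated with a simple quadrature rule. The sketch below uses a midpoint rule on a regular grid, which is one possible numerical scheme and not necessarily the one used for Fig. 2; the names `bale_copula` and `spearman_rho` and the grid size `m` are assumptions of this example.

```python
import numpy as np
from scipy.special import lambertw


def bale_copula(u, v, eta):
    """Copula C_XY(u, v) of Eq. 13."""
    w = lambertw(eta * np.exp(eta) / (1.0 - v)).real / eta
    return u - ((1.0 - u) ** w - 1.0) * (v - 1.0)


def spearman_rho(eta, m=400):
    """Midpoint-rule approximation of rho = 12 * int int C(u, v) du dv - 3."""
    grid = (np.arange(m) + 0.5) / m            # cell midpoints, strictly inside (0, 1)
    u, v = np.meshgrid(grid, grid, indexing="ij")
    # The mean over equally sized cells approximates the double integral.
    return 12.0 * bale_copula(u, v, eta).mean() - 3.0


for eta in (1e-6, 0.2, 2.0, 50.0):
    print(eta, spearman_rho(eta))
```

For η close to zero the result should approach 12 log 2 − 9 = log(4096) − 9, the value obtained from the limiting copula, and for large η it should approach zero.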

Figure 2 shows the rank correlation coefficients together with the Pearson product moment correlation coefficient as a function of η, in this way displaying the behaviour of dependence between X and Y as η varies. Note the non-monotonic relationship between η and the Pearson product moment correlation coefficient, which can be explained by the non-linearity assumed for X and Y in the distribution. The ranges of ρ and τ are ρ ∈ [−9 + log(4096), 0] and τ ∈ [−1/2, 0], respectively.

We conclude that the proposed BALE distribution is able to describe weakly to moderately negative non-symmetric dependence. All joint or marginal distribution and density functions, the corresponding moments, the copula and its density have comparatively simple analytic expressions that can be easily employed in applications. Because of its non-trivial properties, whether or not the distribution can be applied to a given data set should always be examined thoroughly, e.g. via hypothesis testing and visual inspection.

3 Maximum likelihood estimation of parameters

The likelihood function for the bivariate distribution proposed in Eq. (4) on the basis of n independent pairs of observed data D = {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)} takes the form

$$L(\alpha, \beta, \gamma; D) = \alpha^n \prod_{i=1}^n (\beta + \gamma x_i) \exp\left\{-\alpha \sum_{i=1}^n x_i - \beta \sum_{i=1}^n y_i - \gamma \sum_{i=1}^n x_i y_i\right\}. \tag{14}$$

To find the maximum likelihood estimates $(\hat\alpha, \hat\beta, \hat\gamma)^T$ we compute the corresponding estimating equations by taking partial derivatives of the logarithm of the likelihood function Eq. (14) with respect to (α, β, γ)^T and setting them to zero:

$$\frac{\partial \log L(\alpha, \beta, \gamma; D)}{\partial \alpha} = \frac{n}{\alpha} - \sum_{i=1}^n x_i = 0, \tag{15a}$$

$$\frac{\partial \log L(\alpha, \beta, \gamma; D)}{\partial \beta} = \sum_{i=1}^n \frac{1}{\beta + \gamma x_i} - \sum_{i=1}^n y_i = 0, \tag{15b}$$

$$\frac{\partial \log L(\alpha, \beta, \gamma; D)}{\partial \gamma} = \sum_{i=1}^n \frac{x_i}{\beta + \gamma x_i} - \sum_{i=1}^n x_i y_i = 0. \tag{15c}$$

The maximum likelihood estimate for α can easily be obtained by solving Eq. (15a) as $\hat\alpha = n / \sum_{i=1}^n x_i$. It is the usual ML-estimate of α for the exponential distribution, as discussed by Johnson et al. (1994). An unbiased estimate for α, which we use in the remainder of the paper, is

$$\hat\alpha = \frac{n - 1}{\sum_{i=1}^n x_i}. \tag{16}$$

Solving the non-linear equations Eqs. (15b) and (15c) numerically, we get estimated values of β and γ. One way of doing this is to use an off-the-shelf non-linear equation solving routine like fsolve in Matlab; another possibility is to employ the Fisher information matrix derived in Eq. (17) and to use Fisher's method of scoring. Starting with (β₀, γ₀)^T, at step t ≥ 1 of the algorithm the update is

$$(\beta_t, \gamma_t)^T = (\beta_{t-1}, \gamma_{t-1})^T + \lambda\, I^{(\beta,\gamma)}(\beta_{t-1}, \gamma_{t-1})^{-1} \left.\frac{\partial \log L(\alpha, \beta, \gamma; D)}{\partial (\beta, \gamma)}\right|_{\beta = \beta_{t-1},\, \gamma = \gamma_{t-1}},$$

where $I^{(\beta,\gamma)}$ is the submatrix of the Fisher information matrix corresponding to the parameters (β, γ)^T and λ > 0 is the step size that needs to be selected in advance. Whenever an iterate happens to fall outside the parameter space, which may happen especially in the first few iterations, we simply project it onto the parameter space. As an alternative to solving the non-linear equations, an optimization algorithm, e.g. the BFGS-algorithm, can be used to maximize the likelihood function. The corresponding asymptotic standard errors of the maximum likelihood estimates are

$$\widehat{se}_\alpha = \hat\alpha / \sqrt{n},$$

$$\widehat{se}_\beta = \sqrt{\frac{2 \hat\beta \hat\gamma\, \Psi\!\left(3, 2; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right)}{n \hat\alpha \left(2 \Psi\!\left(3, 2; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right) \Psi\!\left(1, 0; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right) - \Psi\!\left(2, 1; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right)^2\right)}}$$

and

$$\widehat{se}_\gamma = \sqrt{\frac{\hat\gamma^3\, \Psi\!\left(1, 0; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right)}{n \hat\alpha \hat\beta \left(2 \Psi\!\left(3, 2; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right) \Psi\!\left(1, 0; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right) - \Psi\!\left(2, 1; \frac{\hat\alpha\hat\beta}{\hat\gamma}\right)^2\right)}}.$$

Fig. 3 Relative bias of α̂ (red lines), β̂ (blue lines) and γ̂ (green lines) as a function of the sample size n for (a) α = 3.5, β = 0.02, γ = 0.1 and (b) α = 1, β = 1, γ = 1. The dashed line shows the function f(n) = 2/n for visual comparison. (Color figure online)
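The optimization route mentioned above (maximizing the likelihood with BFGS) might look as follows. This is a sketch under stated assumptions: it uses `scipy.optimize.minimize` rather than Matlab, optimizes over log β and log γ to keep both parameters positive (an implementation choice that excludes the boundary cases β = 0 or γ = 0), and fits synthetic data generated with illustrative parameter values. The names `neg_loglik_beta_gamma` and `fit_bale` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def neg_loglik_beta_gamma(log_params, x, y):
    """Average negative log-likelihood of Eq. 14 in (beta, gamma).

    The alpha-terms separate from the (beta, gamma)-terms and are dropped here.
    """
    beta, gam = np.exp(log_params)
    loglik = np.sum(np.log(beta + gam * x)) - beta * np.sum(y) - gam * np.sum(x * y)
    return -loglik / len(x)


def fit_bale(x, y):
    """Return (alpha_hat, beta_hat, gamma_hat) for paired observations x, y."""
    n = len(x)
    alpha_hat = (n - 1) / np.sum(x)                      # unbiased estimate, Eq. 16
    start = np.log([0.5, 0.5])                           # arbitrary starting point
    result = minimize(neg_loglik_beta_gamma, start, args=(x, y), method="BFGS")
    beta_hat, gamma_hat = np.exp(result.x)
    return alpha_hat, beta_hat, gamma_hat


# Synthetic data with alpha = beta = gamma = 1 (values assumed for illustration).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=5000)
y = rng.exponential(scale=1.0 / (1.0 + 1.0 * x))         # Y | X = x ~ Exp(beta + gamma * x)
alpha_hat, beta_hat, gamma_hat = fit_bale(x, y)
print(alpha_hat, beta_hat, gamma_hat)
```

With a sample of this size the three estimates should lie close to the data-generating values; for small samples the bias discussed in the text becomes relevant.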

The expressions can easily be verified by inverting the Fisher information matrix, see Eq. (17).

In the context of maximum likelihood estimation we also study the bias of the estimates. To shed some light on the bias of β̂ and γ̂ we simulate 10,000 draws from the corresponding distribution, D₁, ..., D₁₀₀₀₀, for two parameter constellations and different sample sizes. For simulation of n independent draws from the BALE distribution, we use a two-step approach. For i = 1, ..., n we perform the following steps:

1. Draw xᵢ ∼ Exp(α).
2. Draw yᵢ ∼ Exp(β + γxᵢ).

For every simulated dataset we compute the maximum likelihood estimates of the parameters by solving the non-linear estimating equations Eqs. (15b)–(15c) numerically. The resulting relative biases in case of (α, β, γ)^T = (3.5, 0.02, 0.1)^T (which corresponds to the parameter values that arise in our case study, see the estimates in Table 3) and (α, β, γ)^T = (1, 1, 1)^T are shown in Fig. 3a, b. It is visible that the biases of β̂ and γ̂ are roughly of the order O(n⁻¹), which is in accordance with the theoretical results about the bias of maximum likelihood estimates. The bias of α̂ in Eq. (16) is zero, which also agrees with the theory. In applications, we suggest using a jackknifed maximum likelihood estimator (Miller 1974) if there is the need to reduce the order of the bias, e.g. when the sample size is low.
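The two-step simulation scheme described above can be sketched in a few lines. The example assumes NumPy's random generator, which parameterizes the exponential distribution by its scale (the reciprocal of the rate); the names `simulate_bale` and `mean_y`, the seed and the parameter values are illustrative. The sample mean of Y is compared with Eq. 9 for p = 1 as a sanity check.

```python
import numpy as np
from scipy.special import hyperu


def simulate_bale(n, alpha, beta, gam, rng):
    """Draw n independent pairs: X ~ Exp(alpha), then Y | X = x ~ Exp(beta + gam * x)."""
    x = rng.exponential(scale=1.0 / alpha, size=n)        # step 1
    y = rng.exponential(scale=1.0 / (beta + gam * x))     # step 2, one rate per x_i
    return x, y


def mean_y(alpha, beta, gam):
    """E(Y) from Eq. 9 with p = 1, using hyperu for Kummer's function Psi."""
    eta = alpha * beta / gam
    return (alpha / gam) * (hyperu(2, 1, eta) + eta * hyperu(2, 2, eta))


rng = np.random.default_rng(42)
x, y = simulate_bale(200_000, 1.0, 1.0, 1.0, rng)
print(x.mean(), y.mean(), mean_y(1.0, 1.0, 1.0))
print(np.corrcoef(x, y)[0, 1])                            # expected to be negative
```

The sample correlation should be negative, in line with the dependence properties derived in Sect. 2.3.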


4 Jeffreys rule prior and objective Bayesian analysis

Bayesian analysis combines a-priori information about the model parameters with the sample information for conducting inference about the parameters. A-priori information is provided in terms of a prior distribution, while the sample information is contained in the likelihood function. Applying Bayes' theorem then yields that the a-posteriori distribution of the model parameters is proportional to the likelihood, L(α, β, γ; D), times the prior, π(α, β, γ). Again, D denotes the observed data D = {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)}. The main advantage of Bayesian inference is that there is a complete posterior distribution of the model parameters available instead of only a point estimate and (asymptotic) standard errors and that prior information about the parameters can be incorporated into the model. However, subjective specification of a prior distribution appears to be difficult in situations where there is a lack of prior knowledge. Therefore, objective Bayesian data analysis is concerned with the elicitation of priors in a noninformative and automatic way without requiring any subjective steps or tuning to the problem at hand. The objective Bayesian methodology is used in many branches of statistics; the most widely used noninformative (or default) prior is the so-called Jeffreys rule prior, see Jeffreys (1961).

The Jeffreys rule prior, π^J(α, β, γ), is based upon the notion that probability statements made about the observable random variables should remain invariant under changes in the parameterization. This property is satisfied because Jeffreys prior is proportional to the square root of the determinant of the Fisher information matrix, I^{(α,β,γ)}. Theorem 1 states that the Jeffreys rule prior for the proposed bivariate distribution in Eq. (4) has a very convenient form and that the resulting posterior distribution is proper.
Note that the parameters α and (β, γ) turn out to be orthogonal. Moreover, the Jeffreys prior equals the reference prior (Berger et al. 2009) with ordering {α, {β, γ}}. Interestingly, also the rather ad-hoc choices of a flat prior for all parameters, π^{U1}(α, β, γ) ∝ 1, and a flat prior for (β, γ)^T, π^{U2}(α, β, γ) ∝ 1/α, lead to proper posteriors as can be readily obtained when integrating Eq. (14) with respect to the parameters.

Theorem 1 The Jeffreys prior corresponding to the model Eq. (14) is

$$\pi^J(\alpha, \beta, \gamma) \propto \gamma^{-2} \sqrt{2 \Psi\!\left(3, 2; \frac{\alpha\beta}{\gamma}\right) \Psi\!\left(1, 0; \frac{\alpha\beta}{\gamma}\right) - \Psi\!\left(2, 1; \frac{\alpha\beta}{\gamma}\right)^2}.$$

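Moreover, the posterior distribution p(α, β, γ | D) ∝ L(α, β, γ; D) π^J(α, β, γ) is proper.

Proof It is straightforward to verify that the Fisher information matrix has the following block-diagonal form

$$I^{(\alpha,\beta,\gamma)}(\alpha, \beta, \gamma) = \begin{pmatrix} \frac{n}{\alpha^2} & 0 & 0 \\ 0 & \frac{n\alpha}{\beta\gamma} \Psi\!\left(1, 0; \frac{\alpha\beta}{\gamma}\right) & \frac{n\alpha}{\gamma^2} \Psi\!\left(2, 1; \frac{\alpha\beta}{\gamma}\right) \\ 0 & \frac{n\alpha}{\gamma^2} \Psi\!\left(2, 1; \frac{\alpha\beta}{\gamma}\right) & \frac{2n\alpha\beta}{\gamma^3} \Psi\!\left(3, 2; \frac{\alpha\beta}{\gamma}\right) \end{pmatrix}. \tag{17}$$

By Eq. (14) the log-likelihood function is given by

$$\log(L(\alpha, \beta, \gamma; D)) = n \log(\alpha) + \sum_{i=1}^n \ln(\beta + \gamma x_i) - \alpha \sum_{i=1}^n x_i - \beta \sum_{i=1}^n y_i - \gamma \sum_{i=1}^n x_i y_i$$

and from Lehmann (1966) we know that the entries of the Fisher information matrix can be computed as the negative expected values of the second derivatives of the log-likelihood with respect to the corresponding parameters:

$$-E\left(\frac{\partial^2 \log(L(\alpha, \beta, \gamma; D))}{\partial \alpha^2}\right) = E\left(\frac{n}{\alpha^2}\right) = \frac{n}{\alpha^2},$$

$$-E\left(\frac{\partial^2 \log(L(\alpha, \beta, \gamma; D))}{\partial \beta^2}\right) = E\left(\sum_{i=1}^n (\beta + \gamma x_i)^{-2}\right) = n\alpha \int_0^\infty \frac{\exp\{-\alpha x\}}{(\beta + \gamma x)^2}\, dx = \frac{n\alpha}{\beta\gamma} \Psi\!\left(1, 0; \frac{\alpha\beta}{\gamma}\right),$$

$$-E\left(\frac{\partial^2 \log(L(\alpha, \beta, \gamma; D))}{\partial \gamma^2}\right) = \sum_{i=1}^n E\left(x_i^2 (\beta + \gamma x_i)^{-2}\right) = n\alpha \int_0^\infty \frac{x^2 \exp\{-\alpha x\}}{(\beta + \gamma x)^2}\, dx = \frac{2n\alpha\beta}{\gamma^3} \Psi\!\left(3, 2; \frac{\alpha\beta}{\gamma}\right),$$

$$-E\left(\frac{\partial^2 \log(L(\alpha, \beta, \gamma; D))}{\partial \beta\, \partial \gamma}\right) = \sum_{i=1}^n E\left(x_i (\beta + \gamma x_i)^{-2}\right) = n\alpha \int_0^\infty \frac{x \exp\{-\alpha x\}}{(\beta + \gamma x)^2}\, dx = \frac{n\alpha}{\gamma^2} \Psi\!\left(2, 1; \frac{\alpha\beta}{\gamma}\right).$$

Applying basic rules to compute the determinant yields the desired expression.

Now we prove the propriety of the posterior distribution, i.e.

$$0 < \int_0^\infty \int_0^\infty \int_0^\infty L(\alpha, \beta, \gamma; D)\, \pi^J(\alpha, \beta, \gamma)\, d\alpha\, d\beta\, d\gamma < \infty.$$

It is more convenient to change the parameterization to (α, β, 1/γ)^T and to apply the Hadamard inequality (Maz'ya and Shaposhnikova 1998, p. 385) before evaluating the asymptotics of the integrand. Thus, we need to show that (with γ now denoting the reciprocal of the original parameter)

$$0 < \int_0^\infty \int_0^\infty \int_0^\infty \alpha^n \prod_{i=1}^n \left(\beta + \frac{x_i}{\gamma}\right) \exp\left\{-\sum_{i=1}^n \left(\alpha x_i + \beta y_i + \frac{x_i y_i}{\gamma}\right)\right\} \sqrt{\Psi(3, 2; \alpha\beta\gamma)\, \Psi(1, 0; \alpha\beta\gamma)}\; d\alpha\, d\beta\, d\gamma < \infty.$$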

It holds that Ψ(3, 2; z) ∈ O(z⁻³) and Ψ(1, 0; z) ∈ O(z⁻¹) as z → ∞. Moreover, Ψ(3, 2; z) ∈ O(z⁻¹) and Ψ(1, 0; z) ∈ O(1) as z → 0. Since z⁻² is integrable as z → ∞ and z⁻⁰·⁵ is integrable as z → 0, the posterior is proper.

For conducting Bayesian inference about the model parameters (α, β, γ)^T we need to draw samples from the posterior distribution. We use a Metropolis-Hastings algorithm with a Gaussian proposal distribution centered at the previous sample and covariance matrix proportional to the inverse of the Fisher information matrix as suggested by Gelman et al. (2003). For a given starting sample (α₀, β₀, γ₀)^T, we do the following steps for t = 1, ..., N, where N is sufficiently large:

1. Draw $(\tilde\alpha, \tilde\beta, \tilde\gamma)^T \sim N((\log(\alpha_{t-1}), \log(\beta_{t-1}), \log(\gamma_{t-1}))^T, c^2 V)$, and set $(\alpha^*, \beta^*, \gamma^*)^T = (\exp(\tilde\alpha), \exp(\tilde\beta), \exp(\tilde\gamma))^T$.
2. Set $(\alpha_t, \beta_t, \gamma_t)^T = (\alpha^*, \beta^*, \gamma^*)^T$ with probability ρ, where

$$\rho = \min\left\{1, \frac{L(\alpha^*, \beta^*, \gamma^*; D)\, \pi^J(\alpha^*, \beta^*, \gamma^*)\, \alpha^* \beta^* \gamma^*}{L(\alpha_{t-1}, \beta_{t-1}, \gamma_{t-1}; D)\, \pi^J(\alpha_{t-1}, \beta_{t-1}, \gamma_{t-1})\, \alpha_{t-1} \beta_{t-1} \gamma_{t-1}}\right\},$$

and $(\alpha_t, \beta_t, \gamma_t)^T = (\alpha_{t-1}, \beta_{t-1}, \gamma_{t-1})^T$ with probability 1 − ρ.

The scale matrix V of the Gaussian proposal distribution is taken as the inverse of the Fisher information matrix of the transformed parameters (log α, log β, log γ)^T that arises when the maximum likelihood estimates are inserted:

$$V^{-1} = J\, I^{(\alpha,\beta,\gamma)}(\exp(\hat\alpha), \exp(\hat\beta), \exp(\hat\gamma))\, J, \qquad J = \mathrm{diag}(\exp(\hat\alpha), \exp(\hat\beta), \exp(\hat\gamma)).$$

This approach, which corresponds to a Gaussian approximation around the posterior mode, has been used in various applications and can even be combined with a Langevin-Hastings update (Waagepetersen et al. 2008). The quantity c² is the only parameter that needs to be tuned to the actual problem and is chosen in such a way that the average acceptance probability in the resulting Metropolis-Hastings algorithm lies between 0.2 and 0.45. Gelman et al. (2003) recommend using c = 2.4/√3 as a starting point, which would be the optimal choice if the conditional posterior was truly Gaussian.

Jeffreys priors are known to have deficiencies in certain multiparameter models, see e.g. Berger et al. (2001). Therefore, we investigate the appropriateness of the Jeffreys prior for our likelihood Eq. (14). It is widely accepted in the literature (Kass and Wasserman 1996) to examine frequentist properties of resulting Bayesian inferences when judging prior distributions. Therefore, we compute the frequentist coverage of equal-tailed Bayesian credible intervals for the Jeffreys prior. For every parameter constellation and every sample size (n = 25, n = 50 and n = 100) that are considered, we simulate 3,000 draws from the corresponding distribution, D₁, ..., D₃₀₀₀, and use the MCMC approach outlined above for simulating from the posterior. Each Markov chain is run for N = 15500 iterations, the first 500 being discarded as burn-in. The average acceptance rate in the Metropolis-Hastings step is about 0.3 in all cases.
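The sampler described above can be sketched as follows. This is an illustrative implementation under several assumptions, not the authors' code: the function names are hypothetical; Ψ(1, 0; η) is evaluated through the Kummer transformation Ψ(1, 0; η) = ηΨ(2, 2; η) to avoid calling `hyperu` with a zero second parameter; the estimates are passed on the original scale, so J is taken as diag(α̂, β̂, γ̂); and the demo plugs in the data-generating values as a stand-in for the maximum likelihood estimates.

```python
import numpy as np
from scipy.special import hyperu


def psi_terms(eta):
    """Return Psi(1,0;eta), Psi(2,1;eta), Psi(3,2;eta)."""
    return eta * hyperu(2, 2, eta), hyperu(2, 1, eta), hyperu(3, 2, eta)


def log_posterior(theta, x, y):
    """Log of L (Eq. 14) times the Jeffreys prior (Theorem 1), up to a constant."""
    a, b, g = theta
    p10, p21, p32 = psi_terms(a * b / g)
    log_lik = (len(x) * np.log(a) + np.sum(np.log(b + g * x))
               - a * np.sum(x) - b * np.sum(y) - g * np.sum(x * y))
    log_prior = -2.0 * np.log(g) + 0.5 * np.log(2.0 * p32 * p10 - p21 ** 2)
    return log_lik + log_prior


def fisher_information(theta, n):
    """Fisher information matrix of Eq. 17."""
    a, b, g = theta
    p10, p21, p32 = psi_terms(a * b / g)
    return n * np.array([[1.0 / a ** 2, 0.0, 0.0],
                         [0.0, a * p10 / (b * g), a * p21 / g ** 2],
                         [0.0, a * p21 / g ** 2, 2.0 * a * b * p32 / g ** 3]])


def metropolis_hastings(x, y, theta_hat, n_iter, c, rng):
    """Random-walk Metropolis-Hastings on the log scale with proposal covariance c^2 V."""
    jac = np.diag(theta_hat)
    v = np.linalg.inv(jac @ fisher_information(theta_hat, len(x)) @ jac)
    chol = np.linalg.cholesky(c ** 2 * v)
    theta = np.array(theta_hat, dtype=float)
    # The factor alpha*beta*gamma in the acceptance ratio is the Jacobian of the log transform.
    log_target = log_posterior(theta, x, y) + np.sum(np.log(theta))
    samples = np.empty((n_iter, 3))
    accepted = 0
    for t in range(n_iter):
        proposal = np.exp(np.log(theta) + chol @ rng.standard_normal(3))
        log_target_prop = log_posterior(proposal, x, y) + np.sum(np.log(proposal))
        if np.log(rng.uniform()) < log_target_prop - log_target:
            theta, log_target = proposal, log_target_prop
            accepted += 1
        samples[t] = theta
    return samples, accepted / n_iter


# Demo on synthetic data (alpha = beta = gamma = 1, assumed for illustration).
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100)
y = rng.exponential(scale=1.0 / (1.0 + x))
theta_hat = np.array([1.0, 1.0, 1.0])       # stand-in for the ML estimates
samples, rate = metropolis_hastings(x, y, theta_hat, 3000, 2.4 / np.sqrt(3), rng)
print(rate, samples[500:].mean(axis=0))
```

In practice `theta_hat` would be replaced by the actual estimates, c would be tuned to reach the acceptance range quoted above, and the chain would be run considerably longer with a burn-in period discarded.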


Table 1 Frequentist coverage and average lengths (in brackets) of the 5, 50, 75 and 95 % Bayesian credible intervals derived using the Jeffreys prior and the flat prior for different sample sizes and different choices of the model parameters. Entries are given as π^J / π^{U_1}

n      Level   α                               β                               γ

α = 3.5, β = 0.02, γ = 0.1
25     5 %     0.049 [0.090] / 0.056 [0.093]   0.052 [0.001] / 0.047 [0.001]   0.045 [0.007] / 0.050 [0.006]
25     50 %    0.501 [0.968] / 0.486 [0.994]   0.527 [0.013] / 0.513 [0.013]   0.511 [0.069] / 0.536 [0.068]
25     75 %    0.759 [1.658] / 0.723 [1.701]   0.758 [0.021] / 0.771 [0.022]   0.771 [0.118] / 0.764 [0.117]
25     95 %    0.951 [2.820] / 0.943 [2.898]   0.956 [0.036] / 0.960 [0.036]   0.964 [0.193] / 0.968 [0.192]
50     5 %     0.049 [0.063] / 0.050 [0.064]   0.058 [0.001] / 0.050 [0.001]   0.049 [0.005] / 0.051 [0.005]
50     50 %    0.491 [0.676] / 0.506 [0.685]   0.526 [0.009] / 0.504 [0.009]   0.513 [0.049] / 0.508 [0.048]
50     75 %    0.762 [1.160] / 0.743 [0.170]   0.746 [0.015] / 0.753 [0.015]   0.748 [0.084] / 0.754 [0.083]
50     95 %    0.954 [1.968] / 0.942 [1.995]   0.948 [0.026] / 0.953 [0.026]   0.958 [0.140] / 0.951 [0.139]
100    5 %     0.052 [0.044] / 0.051 [0.045]   0.048 [0.001] / 0.059 [0.001]   0.051 [0.003] / 0.049 [0.003]
100    50 %    0.508 [0.474] / 0.494 [0.478]   0.495 [0.006] / 0.499 [0.006]   0.503 [0.034] / 0.515 [0.034]
100    75 %    0.752 [0.809] / 0.745 [0.814]   0.762 [0.011] / 0.751 [0.011]   0.749 [0.058] / 0.753 [0.058]
100    95 %    0.950 [1.380] / 0.946 [1.391]   0.941 [0.018] / 0.954 [0.018]   0.950 [0.099] / 0.946 [0.099]

α = 2, β = 0.2, γ = 2
25     95 %    0.952 [1.631] / 0.947 [1.666]   0.959 [0.545] / 0.954 [0.570]   0.962 [2.694] / 0.962 [2.683]
50     95 %    0.955 [1.128] / 0.950 [1.138]   0.949 [0.384] / 0.944 [0.385]   0.942 [1.893] / 0.953 [1.876]
100    95 %    0.946 [0.790] / 0.945 [0.794]   0.943 [0.267] / 0.958 [0.266]   0.944 [1.328] / 0.948 [1.310]

α = 0.5, β = 5, γ = 0.5
25     95 %    0.948 [0.400] / 0.940 [0.413]   0.938 [5.526] / 0.964 [5.690]   0.972 [2.575] / 0.962 [2.654]
50     95 %    0.954 [0.280] / 0.950 [0.285]   0.948 [3.986] / 0.966 [4.060]   0.973 [1.744] / 0.967 [1.801]
100    95 %    0.950 [0.197] / 0.945 [0.198]   0.962 [2.919] / 0.967 [2.943]   0.971 [1.266] / 0.965 [1.298]

α = 1, β = 1, γ = 1
25     95 %    0.951 [0.809] / 0.949 [0.829]   0.959 [1.587] / 0.966 [1.605]   0.963 [2.229] / 0.962 [2.226]
50     95 %    0.956 [0.559] / 0.935 [0.569]   0.951 [1.167] / 0.957 [1.158]   0.954 [1.614] / 0.955 [1.602]
100    95 %    0.947 [0.395] / 0.945 [0.397]   0.953 [0.835] / 0.952 [0.828]   0.946 [1.150] / 0.949 [1.149]
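The coverage and average-length entries of Table 1 are Monte Carlo averages over the simulated data sets. The following Python sketch shows one way such entries, and the Monte Carlo error attached to a coverage estimate, could be computed. The function names and the inputs `lower` and `upper` (interval bounds, one pair per simulated data set) are hypothetical, and the binomial formula for the standard error is our assumption about how the accuracy of the coverage estimate is assessed.

```python
import math

def coverage_and_length(lower, upper, true_value):
    """Share of intervals that contain true_value, and their average length."""
    m = len(lower)
    hits = sum(1 for lo, hi in zip(lower, upper) if lo < true_value < hi)
    mean_length = sum(hi - lo for lo, hi in zip(lower, upper)) / m
    return hits / m, mean_length

def coverage_standard_error(nominal, m):
    """Binomial standard error of a coverage estimate based on m replications."""
    return math.sqrt(nominal * (1.0 - nominal) / m)

if __name__ == "__main__":
    # Monte Carlo error for 3,000 replications at the four nominal levels.
    for level in (0.05, 0.50, 0.75, 0.95):
        print(level, round(coverage_standard_error(level, 3000), 4))
```

With 3,000 replications the binomial standard error is about 0.004 at the 5 and 95 % levels, so differences in the third decimal of the coverage entries should not be over-interpreted.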

For every parameter of interest the frequentist coverage is computed, e.g. $\frac{1}{3000}\sum_{i=1}^{3000} \mathbb{1}\big(L(D_i)_\alpha < \tilde\alpha < U(D_i)_\alpha\big)$, where U(D_i)_α and L(D_i)_α denote the upper and lower bounds of the Bayesian credible interval for α, respectively, and α̃ is the true parameter value. It is desirable to have a frequentist coverage close to the nominal level, i.e. in case a confidence level of 95 % is used, the frequentist coverage should also be 95 %, ideally. The standard error of this approximation to the coverage probability is roughly 0.004 (95 and 5 % credible interval), 0.008 (75 % credible interval) and 0.009 (50 % credible interval), respectively, for all the parameters. In addition we compute and report the average length of the credible interval, e.g. $\frac{1}{3000}\sum_{i=1}^{3000}\big(U(D_i)_\alpha - L(D_i)_\alpha\big)$.

Table 1 presents the frequentist coverage and average lengths of the 5, 50, 75 and 95 % Bayesian credible intervals for the Jeffreys prior and the flat prior. Since the difference in performance across the different confidence levels is similar for both


priors, we report the results for 5, 50 and 75 % only for the first scenario. The first scenario is chosen to cover the parameter constellation that arises in our case study (see the estimates in Table 3); the remaining scenarios, in contrast, consider completely different parameter sets to cover a broader range of possible values for the three parameters of interest. It is found that both the Jeffreys prior and the flat prior perform well in terms of frequentist coverage, i.e. the coverage percentages are typically close to the nominal level. As expected, the confidence bounds get tighter when the sample size is increased. The simulation study slightly favors the Jeffreys prior because it more often leads to tighter confidence bounds and better coverage percentages, especially for the parameters α and β. Since there is no indication that the Jeffreys prior has any adverse properties for this multiparameter model, and since it is rather easy to compute, it can be recommended for use without restriction.

5 Application: mercury concentration in largemouth bass

In this section we provide an application of the new BALE distribution to a dataset used by Lange et al. (1993) to explore the mercury (Hg) concentration in largemouth bass. It can be downloaded from http://wiki.stat.ucla.edu/socr/index.php/NISER_081107_ID_Data. The data collected from 53 different Florida lakes were used to examine the factors that influence the level of mercury concentration in bass. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March 1991. Amongst others, the amount of alkalinity (mg/l), calcium (mg/l) and chlorophyll (mg/l) were measured in each sample. The average values of the August and March samples were used in the analysis. Next, a sample of fish was taken from each lake, with sample sizes ranging from 4 to 44 fish. The minimum mercury concentration (µg/g) among the sampled fish was measured as well. Lange et al.
(1993) observed that the bioaccumulation of mercury in the largemouth bass was strongly influenced by the chemical characteristics of the lakes. Therefore, we select three chemical substances, i.e. alkalinity (as CaCO3), calcium and chlorophyll, along with the minimum mercury concentration in the sampled fish. We use the proposed BALE distribution and estimate the model parameters by maximum likelihood and Bayesian estimation.

5.1 Copula test

For assessing the adequacy of the dependence structure implied by the proposed BALE distribution, we perform a bootstrap-based goodness-of-fit test. We use a blanket test that was originally presented by Genest and Remillard (2008) and validated by Genest et al. (2008) for arbitrary copulas. The test relies on a parametric bootstrapping procedure and employs the Kolmogorov-Smirnov statistic, T_n, or the Cramer-von Mises statistic, S_n,

$$S_n = \int_{[0,1]^2} \mathbb{C}_n(x, y)^2 \, dC_n(x, y), \qquad T_n = \sup_{(x,y) \in [0,1]^2} \left| \mathbb{C}_n(x, y) \right|,$$

where $\mathbb{C}_n = \sqrt{n}\,(C_n - C_{XY})$, C_n is the empirical copula calculated using the n data points and C_{XY} is the estimate under the null hypothesis (see Eq. (13)). The steps of the algorithm are as follows:

1. Compute the empirical copula C_n.
2. Estimate the parameters of the theoretical copula using the maximum likelihood approach as outlined in Sect. 3.
3. Calculate the Cramer-von Mises or the Kolmogorov-Smirnov statistic, S_n or T_n, respectively.
4. For a large integer N, repeat the following steps for every k ∈ {1, ..., N}:
   (a) Simulate a dataset whose copula is exactly the estimated theoretical copula from step 2 (see Sect. 3).
   (b) Compute the empirical copula corresponding to the simulated data, C_{n,k}.
   (c) Estimate the parameters of the theoretical copula corresponding to the simulated data, C_{XY,k}.
   (d) Evaluate T_{n,k} or S_{n,k}.
5. In the case of the Kolmogorov-Smirnov statistic an approximate p-value for the test is given by

$$p = \frac{1}{N} \sum_{k=1}^{N} I\left(T_{n,k} > T_n\right),$$

where I(·) is an indicator function. To get an approximate p-value in the case of the Cramer-von Mises statistic, just replace T_{n,k} and T_n by S_{n,k} and S_n, respectively.

When visually inspecting the data, it becomes apparent that there is one observation, data point 40 of 53, which is probably an outlier (Figs. 4 and 5). As can be seen in Fig. 5a, b, this data point (marked with a cross) has a large minimum mercury concentration but also very large values of the chemical substances alkalinity and calcium. Therefore, we test the goodness-of-fit with and without observation 40. The p-values of the Kolmogorov-Smirnov and Cramer-von Mises tests are presented in Table 2. These results show that the Cramer-von Mises test rejects the null hypothesis of (X_Mercury, Y_Alkalinity) and (X_Mercury, Y_Calcium) having copula Eq. (13) at the 5 % significance level when (x_40, y_40) is included. The same holds true for the Kolmogorov-Smirnov test and (X_Mercury, Y_Alkalinity). If (x_40, y_40) is excluded from the

Table 2 p-Values of the copula test for the three pairs of variables (X_Mercury, Y_Alkalinity), (X_Mercury, Y_Calcium) and (X_Mercury, Y_Chlorophyll)

                                Alkalinity   Calcium   Chlorophyll
Results including (x_40, y_40)
  S_n                           0.04         0.05      0.36
  T_n                           0.04         0.16      0.47
Results excluding (x_40, y_40)
  S_n                           0.09         0.45      0.37
  T_n                           0.19         0.62      0.54
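The parametric bootstrap behind the p-values in Table 2 can be sketched in Python as follows. This is an illustration under explicit assumptions rather than the procedure used for Table 2: the BALE copula of Eq. (13) is replaced by the Farlie-Gumbel-Morgenstern (FGM) copula as a stand-in, the maximum likelihood step is replaced by a moment estimate based on Spearman's rho, and the supremum in T_n is approximated by the maximum over the sample points. All function names are hypothetical.

```python
import numpy as np

def pseudo_obs(x):
    """Rank-transform a sample to (0, 1) via rank / (n + 1)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)

def fgm_cdf(u, v, theta):
    """FGM copula, used here only as a stand-in for Eq. (13)."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def fgm_fit(u, v):
    """Moment estimate: Spearman's rho of the FGM copula equals theta / 3."""
    rho = np.corrcoef(u, v)[0, 1]      # Pearson on pseudo-observations
    return float(np.clip(3.0 * rho, -1.0, 1.0))

def fgm_sample(n, theta, rng):
    """Draw from the FGM copula by inverting the conditional cdf of V given U."""
    u, w = rng.uniform(size=n), rng.uniform(size=n)
    a = theta * (1.0 - 2.0 * u)
    v = 2.0 * w / ((1.0 + a) + np.sqrt((1.0 + a) ** 2 - 4.0 * a * w))
    return u, v

def gof_statistics(u, v, theta):
    """Cramer-von Mises S_n and (approximate) Kolmogorov-Smirnov T_n."""
    n = len(u)
    # Empirical copula evaluated at every sample point.
    emp = np.mean((u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None]), axis=1)
    diff = emp - fgm_cdf(u, v, theta)
    return float(np.sum(diff ** 2)), float(np.sqrt(n) * np.max(np.abs(diff)))

def bootstrap_pvalues(x, y, n_boot=200, seed=0):
    """Steps 1 to 5 of the algorithm; returns the p-values for S_n and T_n."""
    rng = np.random.default_rng(seed)
    u, v = pseudo_obs(x), pseudo_obs(y)
    theta = fgm_fit(u, v)
    s_n, t_n = gof_statistics(u, v, theta)
    s_hits = t_hits = 0
    for _ in range(n_boot):
        us, vs = fgm_sample(len(u), theta, rng)          # step 4(a)
        us, vs = pseudo_obs(us), pseudo_obs(vs)          # step 4(b)
        theta_k = fgm_fit(us, vs)                        # step 4(c)
        s_k, t_k = gof_statistics(us, vs, theta_k)       # step 4(d)
        s_hits += s_k > s_n
        t_hits += t_k > t_n
    return s_hits / n_boot, t_hits / n_boot
```

A call such as `bootstrap_pvalues(x, y, n_boot=1000)` returns the pair of approximate p-values; small values speak against the assumed copula family. Re-estimating the parameter in every bootstrap replication, as in step 4(c), is what distinguishes this procedure from a naive simulation with the fitted parameter held fixed.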

Fig. 4 Estimated copula densities along with the rank transformed observations (panels for alkalinity, calcium and chlorophyll, each plotted against minimum mercury). Sample point 40 is marked with a cross

Fig. 5 Estimated densities along with the observations (panels for alkalinity, calcium and chlorophyll, each plotted against minimum mercury). Sample point 40 is marked with a cross

sample, none of the tests rejects the null hypothesis at the 5 % significance level. Although the p-values are rather low in the case of (X_Mercury, Y_Alkalinity), they are still acceptable for our needs. The estimated copula densities and the rank transformed observations are shown in Fig. 4. Data point (x_40, y_40) is marked with a cross. Based on the results of the goodness-of-fit test and the visual inspections, it is decided to leave out data point 40 for the remaining analyses and when drawing conclusions.

5.2 ML-estimation of model parameters and Bayesian analysis

Parameter point estimation is performed by the maximum likelihood approach as outlined in Sect. 3. To account for the bias of β̂ and γ̂ we additionally compute the first order jackknife estimates. The ML point estimates and jackknife estimates, the estimated asymptotic standard deviations and the 95 % two-sided confidence intervals of the model parameters (based on the normal or log-normal approximation) are given in Table 3. The results are obtained by leaving out data point 40, which otherwise would influence the actual values considerably. For a parameter θ the asymptotic confidence bounds are constructed by using the normal distribution or the log-normal distribution and are given as $\hat\theta \pm q_{(1-\xi)/2}\, \widehat{se}_\theta$ and $\hat\theta \exp\{\pm q_{(1-\xi)/2}\, \widehat{se}_\theta / \hat\theta\}$, respectively, where $q_{(1-\xi)/2}$ is the (1 − ξ)/2 quantile of the standard normal distribution. The log-normal approximation can be more appropriate when the parameter is restricted to be positive. It is used

Table 3 Summary statistics corresponding to maximum likelihood and Bayesian inference for α, β and γ

       Maximum likelihood                            Bayesian inference
Para.  ML est.  Jackknife  SD      95 % conf. int.   Prior    Mean    Median  SD      95 % cred. int.

X = minimum Mercury, Y = Alkalinity
α      3.6335   3.6578     0.5040  (2.6452, 4.6198)  π^{U_1}  3.7755  3.7568  0.5217  (2.8152, 4.8563)
                                                     π^J      3.7238  3.6994  0.5148  (2.7900, 4.7982)
β      0.0019   0.0021     0.0017  (0.0004, 0.0103)  π^{U_1}  0.0046  0.0040  0.0032  (0.0003, 0.0122)
                                                     π^J      0.0026  0.0017  0.0028  (0.0000, 0.0099)
γ      0.1924   0.1871     0.0328  (0.1282, 0.2566)  π^{U_1}  0.1806  0.1801  0.0346  (0.1150, 0.2502)
                                                     π^J      0.1875  0.1874  0.0332  (0.1225, 0.2536)

X = minimum Mercury, Y = Calcium
α      3.6335   3.6578     0.5040  (2.6452, 4.6198)  π^{U_1}  3.7750  3.7539  0.5142  (2.8270, 4.8316)
                                                     π^J      3.7193  3.6959  0.5167  (2.7760, 4.7992)
β      0.0101   0.0100     0.0051  (0.0038, 0.0270)  π^{U_1}  0.0123  0.0115  0.0071  (0.0012, 0.0281)
                                                     π^J      0.0087  0.0076  0.0070  (0.0000, 0.0248)
γ      0.2499   0.2390     0.0534  (0.1450, 0.3550)  π^{U_1}  0.2472  0.2461  0.0571  (0.1397, 0.3618)
                                                     π^J      0.2589  0.2592  0.0579  (0.1457, 0.3713)

X = minimum Mercury, Y = Chlorophyll
α      3.6335   3.6578     0.5040  (2.6452, 4.6198)  π^{U_1}  3.7779  3.7474  0.5155  (2.8431, 4.8509)
                                                     π^J      3.7228  3.6992  0.5130  (2.7858, 4.7855)
β      0.0092   0.0077     0.0046  (0.0002, 0.0183)  π^{U_1}  0.0112  0.0105  0.0064  (0.0011, 0.0257)
                                                     π^J      0.0079  0.0069  0.0064  (0.0000, 0.0229)
γ      0.2251   0.2252     0.0483  (0.1305, 0.3196)  π^{U_1}  0.2227  0.2222  0.0515  (0.1238, 0.3274)
                                                     π^J      0.2340  0.2339  0.0531  (0.1317, 0.3383)
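The asymptotic confidence intervals in Table 3 follow the normal and log-normal constructions of Sect. 5.2. A small Python sketch of that rule is given below; the function name is hypothetical, the switch to the log-normal form when the standard deviation exceeds half of the point estimate is taken from the text, and the upper standard normal quantile is used because the sign of the quantile is irrelevant under the ± sign.

```python
import math
from statistics import NormalDist

def asymptotic_interval(estimate, se, level=0.95):
    """Two-sided asymptotic confidence interval for a positive parameter.

    Uses the log-normal form when se is larger than half of the estimate,
    and the normal form otherwise.
    """
    q = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level = 0.95
    if se > 0.5 * estimate:
        factor = math.exp(q * se / estimate)
        return estimate / factor, estimate * factor
    return estimate - q * se, estimate + q * se

if __name__ == "__main__":
    # Rounded inputs taken from Table 3 (gamma for alkalinity, beta for calcium).
    print(asymptotic_interval(0.1924, 0.0328))   # normal form
    print(asymptotic_interval(0.0101, 0.0051))   # log-normal form
```

Because the inputs are the rounded values printed in the table, the resulting bounds agree with the tabulated intervals only up to rounding in the last digit or so.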

Fig. 6 Probability plots (observed versus expected probabilities, panels a–d) for minimum mercury concentration, alkalinity, calcium and chlorophyll

when the standard deviation of the respective parameter is larger than half of its point estimate. As can be seen from Table 3, the 95 % confidence interval for γ is bounded away from zero for each of the three pairs of variables. At least on the basis of the asymptotic distribution of the parameters it can therefore be concluded that the minimum mercury concentration is not independent of alkalinity, calcium and chlorophyll. Figure 5a–c, which show the estimated pdfs along with the corresponding sample points as black circles, underpin this conclusion and provide visual evidence that the model fits the data well. Using the ML estimates from Table 3 and inserting them into Eq. (9), the estimated means of alkalinity, calcium and chlorophyll (leaving out data point 40) are derived as Ê(Y_Alkalinity) = 54.17, Ê(Y_Calcium) = 25.00 and Ê(Y_Chlorophyll) = 27.55, whereas the empirical means are Ẽ(Y_Alkalinity) = 36.57, Ẽ(Y_Calcium) = 20.98 and Ẽ(Y_Chlorophyll) = 23.18, respectively. The estimated and the actually observed mean of minimum mercury concentration of course always coincide, i.e. Ê(X_Mercury) = Ẽ(X_Mercury) = 0.2700.

Further conclusions about the goodness-of-fit for the marginal distributions can be drawn from probability plots. In probability plots, the observed probability is plotted against the predicted probability for the fitted model. To draw probability plots for the variables X (mercury) and Y (alkalinity, calcium and chlorophyll), respectively, we plot F_X(x_i) and F_Y(y_i) versus (i − 0.375)/(n + 0.25), i = 1, ..., 53, as recommended by Blom (1958) and Chambers et al. (1983), where F_X(·) and F_Y(·) represent the estimated marginal cdfs of X and Y, while x_i and y_i are the samples sorted in ascending order. Figure 6 shows that all probability plots indicate an adequate fit of the estimated marginals.

Metropolis-Hastings sampling as presented in Sect. 4 is used to sample from the posterior distribution when π^J(α, β, γ) and π^{U_1}(α, β, γ) are taken as prior distributions. Figure 7 shows the shape of the posterior densities of the parameters along with trace plots for 305000 MCMC iterations after a burn-in of 5000 iterations. Thinning the chain by retaining only every 30th sample yields 10000 samples from the posterior distribution. The average acceptance rates are 26, 28 and 27 %, respectively. Convergence is monitored by the trace plots for each parameter. The trace plots appear stable and well mixed, with no visible trend over the iterations. Note the fundamental difference in the posteriors for β: the likelihood as a function of β is concentrated so close to zero that in the case of the Jeffreys prior the posterior adopts the asymptotics of the prior and is of order O(β^{−0.5}) as β → 0.
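The bookkeeping behind the thinned chain can be checked with a few lines of Python. The helper names are hypothetical, and the sketch assumes that the 305000 iterations include the 5000 burn-in iterations, which is the reading under which every 30th draw gives exactly 10000 retained samples.

```python
def thinned_sample_count(total_iterations, burn_in, thin):
    """Number of draws kept after dropping the burn-in and keeping every thin-th draw."""
    kept = total_iterations - burn_in
    return len(range(0, kept, thin))

def thin_chain(chain, burn_in, thin):
    """Return the retained draws of a chain stored as a sequence."""
    return chain[burn_in::thin]

if __name__ == "__main__":
    print(thinned_sample_count(305000, 5000, 30))
```

Thinning reduces the autocorrelation between stored draws and keeps the trace plots readable; it does not replace a check of convergence.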

Fig. 7 Trace plots and posterior densities for (α, β, γ)^T in case of X_Mercury and Y_Calcium when π^J(α, β, γ) (red) and π^{U_1}(α, β, γ) (black) are used as prior distributions. (Color figure online)

Table 3 presents the summary statistics of the posterior distributions for the parameters α, β and γ when data point 40 is excluded. There is not much difference between the parameter estimates obtained by the ML and the Bayesian methods. Due to the additional uncertainty about the parameters, the standard errors of the Bayesian estimates are slightly larger than those of the ML method.

As described in Sect. 1, a number of bivariate exponential distributions have been proposed in the literature so far. Naturally, one is interested in how well the proposed BALE distribution performs in comparison with those alternatives. Since the data do not indicate that a distribution with a singular component could be a suitable model, the following comparison is restricted to absolutely continuous bivariate exponential distributions. We compare our proposal to the distributions of Gumbel (1960), Arnold and Strauss (1988) and Hougaard (1986), all of them having three parameters, and base our model choice on a likelihood ratio test. This means that the model with the highest maximum value of its log-likelihood function is declared the best fitting model. The maxima of the log-likelihood functions for the different distributions and the different combinations of X (mercury) and Y (alkalinity, calcium and chlorophyll) are given in Table 4. The results indicate that the proposed BALE distribution fits best among these distributions for all combinations of mercury with one of the three chemical substances. The distribution proposed by Gumbel (1960) is second best in all cases. According to the scale given in Jeffreys (1961), the evidence in favor of the BALE distribution varies from substantial (difference in log-likelihood value around 1) for alkalinity and calcium to decisive (difference in log-likelihood value around 4) for chlorophyll.

Table 4 Maxima of the log-likelihood functions for the three pairs of variables (X_Mercury, Y_Alkalinity), (X_Mercury, Y_Calcium) and (X_Mercury, Y_Chlorophyll) and different distributional assumptions

                            Alkalinity   Calcium    Chlorophyll
BALE distribution           −204.43      −181.37    −186.63
Gumbel (1960)               −205.96      −182.47    −190.31
Arnold and Strauss (1988)   −211.71      −186.19    −191.41
Hougaard (1986)             −223.29      −182.95    −191.49

5.3 Conditional probabilities of mercury exceeding a threshold

The Florida Department of Health and Rehabilitative Services regards 0.5 µg Hg/g in the edible tissue as an unsafe level for health. Ware et al. (1991) point out that the mercury concentrations have exceeded 0.5 µg Hg/g in largemouth bass from all the distant and urban lakes as well as rivers in Florida. Using the above threshold level of the mercury concentration, we can estimate the conditional probabilities of the minimum mercury exceeding the threshold value for a given amount of alkalinity. Similarly, we can estimate the conditional probabilities of the minimum Hg exceeding the threshold level for a given amount of calcium and chlorophyll as well. For this purpose we use Eq. (10) to find the conditional probabilities by the different parameter estimation methods. In the case of the Bayesian approach we approximate the true conditional probabilities by averaging Eq. (10) evaluated at the posterior samples obtained from the Metropolis-Hastings algorithm. The results can be seen in Table 5. Interestingly, the exceedance probabilities obtained using the Jeffreys prior, π^J(α, β, γ), are closer to the results for ML than those obtained using the flat prior, π^{U_1}(α, β, γ).
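The Bayesian approximation described above, averaging a probability formula over posterior draws, can be sketched in Python. Since Eq. (10) is not reproduced in this part of the paper, `exceedance_placeholder` is a purely hypothetical stand-in chosen only because it is a smooth function of the parameters; it is not the BALE formula, and the posterior draws in the example are made up for illustration.

```python
import math

def exceedance_placeholder(x, y, alpha, beta, gamma):
    """Hypothetical stand-in for Eq. (10); NOT the BALE conditional probability."""
    return math.exp(-(alpha + gamma * y) * x)

def posterior_mean_probability(prob_fn, x, y, draws):
    """Average prob_fn(x, y, alpha, beta, gamma) over posterior draws."""
    return sum(prob_fn(x, y, a, b, g) for a, b, g in draws) / len(draws)

if __name__ == "__main__":
    # Illustrative draws (alpha, beta, gamma); real draws would come from the sampler.
    draws = [(3.0, 0.01, 0.2), (4.0, 0.01, 0.2)]
    averaged = posterior_mean_probability(exceedance_placeholder, 0.5, 1.0, draws)
    plug_in = exceedance_placeholder(0.5, 1.0, 3.5, 0.01, 0.2)
    print(averaged, plug_in)
```

The averaged value generally differs from the plug-in value at the posterior mean because the probability is a nonlinear function of the parameters, which is one reason why the Bayesian and ML entries of a table of exceedance probabilities need not coincide.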
Table 5 highlights that for alkalinity equal to 1 mg/l there is a chance of about 40 % that the minimum amount of mercury in largemouth bass exceeds the threshold level; if the alkalinity level rises to 20 mg/l, this chance decreases to about 10 %. In simple words, as the alkalinity level increases, the chance of the minimum mercury concentration exceeding the unsafe level postulated by the Florida Department of Health and Rehabilitative Services decreases. These results are visualized in the conditional cumulative probability graphs of Fig. 8. These graphs also show the probability of minimum mercury absorption in largemouth bass exceeding thresholds other than 0.5 µg Hg/g given the level of the chemical substances in the lakes.

Table 5 Conditional probabilities of minimum mercury concentration exceeding 0.5 µg Hg/g given certain levels of alkalinity, calcium and chlorophyll computed using the maximum likelihood and the Bayesian method

P(min Mercury ≥ 0.5 | Alkalinity = y)
Alkalinity      1        5        10       20       50       75       120
ML              0.4197   0.3213   0.2266   0.1063   0.0091   0.0010   0.0000
Bayes π^{U_1}   0.3922   0.3021   0.2166   0.1068   0.0122   0.0021   0.0001
Bayes π^J       0.4078   0.3142   0.2233   0.1092   0.0118   0.0019   0.0001

P(min Mercury ≥ 0.5 | Calcium = y)
Calcium         1        5        10       20       40       60       90
ML              0.3845   0.2647   0.1612   0.0561   0.0059   0.0006   0.0000
Bayes π^{U_1}   0.3616   0.2490   0.1531   0.0565   0.0080   0.0013   0.0001
Bayes π^J       0.3799   0.2601   0.1583   0.0570   0.0076   0.0012   0.0001

P(min Mercury ≥ 0.5 | Chlorophyll = y)
Chlorophyll     1        5        10       20       50       100      150
ML              0.3874   0.2771   0.1779   0.0694   0.0033   0.0000   0.0000
Bayes π^{U_1}   0.3644   0.2606   0.1684   0.0688   0.0051   0.0001   0.0000
Bayes π^J       0.3826   0.2719   0.1740   0.0694   0.0047   0.0001   0.0000

Fig. 8 Conditional probabilities of minimum mercury exceeding a certain threshold given different levels of alkalinity, calcium and chlorophyll. The panels show P(min Mercury ≥ x | Alkalinity), P(min Mercury ≥ x | Calcium) and P(min Mercury ≥ x | Chlorophyll) as functions of x for the levels listed in Table 5

6 Conclusion

We have presented a new bivariate exponential distribution to model the concentration of chemical elements and substances in an eco-environmental system. In many real life situations the standard distributions do not work properly because of increased variation and uncertainty in the data to be attributed to hidden sources of randomness. The proposed BALE distribution is quite flexible for modeling moderately negative dependence and all marginal and conditional distributions and density functions, the corresponding moments, the copula and its density have comparatively accessible analytic expressions. Adjusting the function φ(x) in Eq. (3) would introduce still more flexibility. This flexibility can be easily extended to multivariate distributions by using the same procedure. Furthermore, it is rather easy to simulate data according to this bivariate distribution and to estimate parameters using maximum likelihood and (objective) Bayesian analysis. The real data example has shown that this bivariate distribution works well for modeling the mercury concentration in bass. A comparative study shows that the proposed distribution provides a better fit to the data than other well-known bivariate exponential distributions. Therefore, the proposed distribution makes it possible to draw reliable conclusions about exceedance probabilities of important thresholds.

Acknowledgments The authors are thankful to the associate editor and the two referees for their valuable comments and suggestions which certainly helped to improve the paper. The first author is also thankful to the Higher Education Commission of Pakistan for their financial support of this project.

References

Arnold BC, Strauss D (1988) Bivariate distributions with exponential conditionals. J Am Stat Assoc 83:522–527
Barndorff-Nielsen O (1978) Information and exponential families. Wiley, New York
Basu AP (1988) Multivariate exponential distribution and their application in reliability. In: Krishnaiah PR, Sen PK (eds) Handbook of statistics-7. North-Holland, New York, pp 467–477
Berger J, De Oliveira V, Sanso B (2001) Objective Bayesian analysis of spatially correlated data. J Am Stat Assoc 96:1361–1374
Berger J, Bernardo J, Sun D (2009) The formal definition of reference priors. Ann Stat 37:905–938
Blom G (1958) Statistical estimates and transformed beta-variables. Wiley, New York
Chambers J, Cleveland W, Kleiner B, Tukey P (1983) Graphical methods for data analysis. Chapman & Hall/CRC, London
Corless R, Gonnet G, Hare D, Jeffrey D, Knuth D (1996) On the Lambert W function. Adv Comput Math 5:329–359
Doss DC, Graham RC (1975) Construction of multivariate linear exponential distribution from univariate marginals. Sankhya A 37:257–268
Downton F (1970) Bivariate exponential distributions in reliability theory. J R Stat Soc B 32:408–417
Embrechts P, McNeil A, Straumann D (2002) Correlation and dependence in risk management: properties and pitfalls. In: Dempster M (ed) Risk management: value at risk and beyond. Cambridge University Press, Cambridge, pp 176–223
FAO/WHO (2003) Joint FAO/WHO expert committee on food additives: summary and conclusions. Sixty first meeting, food and agriculture organization of the United Nations and World Health Organization, Rome
Franco M, Vivo JM (2010) A multivariate extension of Sarhan and Balakrishnan's bivariate distribution and its ageing and dependence properties. J Multivar Anal 101:491–499
Franco M, Kundu D, Vivo JM (2011) Multivariate extension of modified Sarhan–Balakrishnan bivariate distribution. J Stat Plan Inference 141:3400–3412
Freund JE (1961) A bivariate extension of the exponential distribution. J Am Stat Assoc 56:971–977
Gelman A, Carlin J, Stern H, Rubin D (2003) Bayesian data analysis. Chapman & Hall/CRC, Boca Raton
Genest C, Remillard B (2008) Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models. Ann Inst Henri Poincare B 44:1096–1127
Genest C, Remillard B, Beaudoin D (2008) Goodness-of-fit tests for copulas: a review and a power study. Insur Math Econ 44:199–213
Gumbel EJ (1960) Bivariate exponential distributions. J Am Stat Assoc 55:698–707
Gupta RD, Kundu D (1999) Generalized exponential distributions. Aust NZ J Statist 41:173–188
Gupta RD, Kundu D (2001) Generalized exponential distributions: different methods of estimation. J Stat Comput Simul 69:315–338
Gupta RD, Kundu D (2002) Generalized exponential distributions: statistical inferences. J Stat Theory Appl 1:101–118
Gupta RD, Kundu D (2007) Generalized exponential distributions: existing methods and recent developments. J Stat Plan Inference 137:3537–3547
Hougaard P (1986) A class of multivariate failure time distributions. Biometrika 73:671–678
Iyer S, Manjunath D, Manivasakan R (2002) Bivariate exponential distributions using linear structures. Sankhya A 64:156–166
Jeffreys H (1961) Theory of probability. Oxford University Press, Oxford
Johnson NL, Kotz S (1975) A vector multivariate hazard rate. J Multivar Anal 5:53–66
Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions, vol 1. Wiley, New York
Kass R, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 88:1343–1370
Klauer KC (1986) Non-exponential families of distributions. Metrika 33:299–305
Kundu D, Gupta RD (2008) Generalized exponential distribution: Bayesian estimations. Comp Stat Data Anal 52:1873–1883
Kundu D, Gupta RD (2009) Bivariate generalized exponential distribution. J Multivar Anal 100:581–593
Lange TL, Royals HE, Connor LL (1993) Influence of water chemistry on mercury concentration in largemouth bass from Florida lakes. Trans Am Fish Soc 122:74–84
Lehmann EL (1966) Some concepts of dependence. Ann Math Stat 37:1137–1153
Lehmann EL, Casella G (2003) Theory of point estimation. Springer, New York
Marshall AW, Olkin I (1967a) A multivariate exponential distribution. J Am Stat Assoc 62:30–44
Marshall AW, Olkin I (1967b) A generalized bivariate exponential distribution. J Appl Prob 4:291–302
Maz'ya V, Shaposhnikova T (1998) Jacques Hadamard, a universal mathematician. AMS/LMS, Providence
Miller RG (1974) The Jackknife - a review. Biometrika 61:1–15
Nadarajah S, Kotz S (2006) Reliability for some bivariate exponential distributions. Math Probl Eng. doi:10.1155/MPE/2006/41652
Nelsen R (2006) An introduction to copulas. Springer, New York
Prudnikov AP, Brychkov YA, Marichev OI (1986) Integrals and series, vol 1. Gordon and Breach, Amsterdam
Regoli G (2009) A class of bivariate exponential distributions. J Multivar Anal 100:1261–1269
Sarhan AM, Balakrishnan N (2007) A new class of bivariate distributions and its mixture. J Multivar Anal 98:1508–1527
Spizzichino F (2001) Subjective probability models for lifetimes. Chapman & Hall/CRC, Boca Raton
Waagepetersen R, Ibanez-Escriche N, Sorensen D (2008) A comparison of strategies for Markov chain Monte Carlo computation in quantitative genetics. Genet Sel Evol 40:161–176
Ware FJ, Royals H, Lange T (1991) Mercury contamination in Florida largemouth bass. Proc Ann Conf SEAFWA 44:5–12
Wiener JG (1987) Metal contamination of fish in low pH-lakes and potential implications for piscivorous wildlife. Trans North Am Wild Nat Resour Conf 52:645–657