Metrika (2013) 76:873–885 DOI 10.1007/s00184-012-0421-9

Monotone dependence in graphical models for multivariate Markov chains Roberto Colombi · Sabrina Giordano

Received: 19 January 2012 / Published online: 19 December 2012 © Springer-Verlag Berlin Heidelberg 2012

Abstract We show that a deeper insight into the relations among marginal processes of a multivariate Markov chain can be gained by testing hypotheses of Granger noncausality, contemporaneous independence and monotone dependence. Granger noncausality and contemporaneous independence conditions are read off a mixed graph, and the dependence of a univariate component of the chain on its parents (according to the graph terminology) is described in terms of stochastic dominance criteria. The examined hypotheses are proven to be equivalent to equality and inequality constraints on some parameters of a multivariate logistic model for the transition probabilities. The introduced hypotheses are tested on real categorical time series.

Keywords Granger causality · Conditional independence · Stochastic orderings · Order-restricted inference · Chi-bar-square distribution

1 Introduction

When multivariate categorical data are collected over time, the dynamic character of their association must be taken into account. This aspect plays an important role in modelling discrete time-homogeneous multivariate Markov chains (MMCs).

R. Colombi Department of Engineering, University of Bergamo, viale Marconi 5, 24044 Dalmine, BG, Italy e-mail: [email protected] S. Giordano (B) Department of Economics, Statistics and Finance, University of Calabria, Cubo 0C, 87036 Arcavacata di Rende, CS, Italy e-mail: [email protected]


In this work the dynamic relations among the marginal processes of an MMC are described through Granger noncausality (hereafter also G-noncausality; Chamberlain 1982) and contemporaneous independence conditions, which are presented as Markov properties encoded by a mixed graph. Moreover, the dependence of a univariate marginal process on the past of the chain is modelled by stochastic dominance conditions. Our approach enables us to establish whether the dependence of a component of the MMC on its parents, in the graph terminology, satisfies an appropriate stochastic ordering by testing constraints on certain parameters.

The work is organized as follows: in Sect. 3, we present the Granger noncausality and contemporaneous independence conditions which specify when an MMC is Markov with respect to a mixed graph and then, in Sect. 4, we investigate hypotheses of monotone dependence between the components of the chain. In Sect. 5, we illustrate how to express these hypotheses through equality and inequality constraints on some parameters of a multivariate logistic model for transition probabilities. When the null hypothesis is specified by inequality constraints, the likelihood ratio statistic has a chi-bar-square asymptotic distribution whose tail probabilities are computed by simulation. This case and all the likelihood ratio tests for the discussed hypotheses are described in Sect. 6. The methodology is applied to real data in the final section.

2 Basic notation

Let $A_V = \{A_V(t) : t \in \mathbb{N}\} = \{A_j(t) : t \in \mathbb{N},\ j \in V\}$, with $V = \{1, \dots, q\}$ and $\mathbb{N} = \{0, 1, 2, \dots\}$, be a time-homogeneous first order $q$-variate Markov chain. For all $t \in \mathbb{N}$, $A_V(t) = \{A_j(t) : j \in V\}$ is a discrete random vector where each element $A_j(t)$ is an ordinal categorical variable taking values in the finite set of integers $\mathcal{A}_j = \{1, 2, \dots, s_j\}$, and $\mathcal{I} = \times_{j \in V} \mathcal{A}_j$ indicates the joint state space.

For every $S \subset V$, a marginal process of the chain is represented by $A_S = \{A_S(t) : t \in \mathbb{N}\}$, where $A_S(t) = \{A_j(t) : j \in S\}$ takes values in the set $\mathcal{I}_S = \times_{j \in S} \mathcal{A}_j$. When $S = \{j\}$, the univariate marginal process is indicated by $A_j = \{A_j(t) : t \in \mathbb{N}\}$, $j \in V$. A state of the MMC is denoted by the vector $\mathbf{i} = (i_1, i_2, \dots, i_q) \in \mathcal{I}$. If $S \subset V$, then $\mathbf{i}_S \in \mathcal{I}_S$ indicates the vector with components $i_j$, $j \in S$. When $S = \{j\}$ is a singleton we use the simpler notation $i_j$. Moreover, we adopt the notation $X \perp\!\!\!\perp Y \mid W$ (Dawid 1979) for conditional independence, that is, when the random variables $X$ and $Y$ are independent once the value of a third variable $W$ is given.

3 Mixed graphs associated with an MMC

We provide a graphical representation of the dynamic dependence relations among the component processes of an MMC through a mixed graph that encodes Granger noncausal and contemporaneous independence statements. Mixed graphs have been previously used by Eichler (2007) in the context of multivariate time series. Other mixed graphs featuring different Markov properties are
discussed by many authors, e.g. Andersson et al. (2001), Cox and Wermuth (1996) and Richardson (2003). Here, we briefly recall some features of this kind of graph and refer to Colombi and Giordano (2012) for a deeper discussion of its properties.

In the mixed graph $G = (V, E)$, each node $j$ corresponds to the univariate marginal process $A_j$, $j \in V$, of the $q$-variate MMC $A_V$, and the edges in the edge set $E$ describe the interdependence among these processes. A pair of nodes $i, k \in V$ of the mixed graph may be joined by the directed edges $i \to k$, $i \leftarrow k$, and by the bi-directed edge $i \leftrightarrow k$. Each pair of distinct nodes $i, k \in V$ can be connected by up to all three types of edges. For each single node $i \in V$, the directed edge $i \to i$ may or may not be present. As the self-loop $i \to i$ is allowed in this class of mixed graphs, such graphs cannot be included in the well-known family of acyclic mixed graphs by Richardson (2003).

If $i \to k \in E$ then $i$ is a parent of $k$ and $k$ is a child of $i$. The set of parents of the node $i$ is denoted by $\mathrm{pa}(i) = \{j \in V : j \to i \in E\}$. Moreover, when $i \leftrightarrow k \in E$ the nodes $i, k$ are spouses and the set of spouses of $i$ is $\mathrm{sp}(i) = \{j \in V : i \leftrightarrow j \in E\}$. Note that the generic node $i$ may also be parent and child of itself. More generally, $\mathrm{pa}(S)$ and $\mathrm{sp}(S)$ are the collections of parents and spouses of nodes in $S$, for every non-empty subset $S$ of $V$. A subset $S \subseteq V$ is said to be connected if for every pair of nodes $i, j \in S$ there exists a path between $i$ and $j$ involving only nodes belonging to $S$. Finally, $\mathcal{P}(V)$ will denote the family of all non-empty subsets of the node set $V$.

The mixed graphs used here obey Markov properties which associate sets of G-noncausality and contemporaneous independence restrictions with missing directed and bi-directed edges, respectively.
In particular, missing bi-directed edges lead to independencies of marginal processes at the same point in time; missing directed edges, instead, refer to independencies which involve marginal processes at two consecutive instants. The definition below specifies the properties required for an MMC to be Markov with respect to a mixed graph.

Definition 1 (MMC Markov wrt a mixed graph) A multivariate Markov chain is Markov with respect to a mixed graph $G = (V, E)$ if and only if its transition probabilities satisfy the following conditional independencies for all $t \in \mathbb{N} \setminus \{0\}$:

$$A_S(t) \perp\!\!\!\perp A_{V \setminus \mathrm{pa}(S)}(t-1) \mid A_{\mathrm{pa}(S)}(t-1) \qquad \forall S \in \mathcal{P}(V), \tag{1}$$

$$A_S(t) \perp\!\!\!\perp A_{V \setminus \mathrm{sp}(S)}(t) \mid A_V(t-1) \qquad \forall S \in \mathcal{P}(V). \tag{2}$$

In the context of first order MMC, condition (1) corresponds to the classical notion of Granger noncausality (Proposition 1 in Colombi and Giordano 2012). Henceforth, we will refer to (1) as the Granger noncausality condition for MMCs, saying that $A_S$ is not G-caused by $A_{V \setminus \mathrm{pa}(S)}$ with respect to $A_V$, and use the shorthand notation $A_{V \setminus \mathrm{pa}(S)} \nrightarrow A_S\ [V]$ (hereafter the full set $[V]$ is implicitly assumed for simplicity). Condition (2) does not involve the marginal processes $A_j$, $j \in \mathrm{sp}(S) \setminus S$, at time $t$; more precisely, it states that the transition probabilities must satisfy the bi-directed Markov property (Richardson 2003) with respect to the graph obtained by removing the directed edges from $G$. Here, we will refer to (2) as the contemporaneous independence condition for MMCs, using the shorthand notation $A_S \nleftrightarrow A_{V \setminus \mathrm{sp}(S)}$ (the full set $[V]$ is implicitly assumed), and say that $A_S$ and $A_{V \setminus \mathrm{sp}(S)}$ are contemporaneously independent.

Example 1 The graph in Fig. 1 displays the contemporaneous independence relation $A_{23} \nleftrightarrow A_1$ and the G-noncausal restrictions $A_3 \nrightarrow A_2$; $A_1 \nrightarrow A_3$; $A_{23} \nrightarrow A_1$; $A_3 \nrightarrow A_{12}$. Note that in this example the presence of the edges $i \to i$ is assumed even if these edges are not drawn in Fig. 1.

Fig. 1 Example of a mixed graph (nodes 1, 2, 3)

The above definition means that the lack of a directed edge from node $i$ to $k$ ($i, k \in V$) implies the independence of the present of the univariate marginal process $A_k$ from the immediate past of $A_i$ given the most recent past of the marginal process $A_{V \setminus \{i\}}$, that is, for all $t \in \mathbb{N} \setminus \{0\}$

$$i \to k \notin E \;\Rightarrow\; A_k(t) \perp\!\!\!\perp A_i(t-1) \mid A_{V \setminus \{i\}}(t-1). \tag{3}$$

Moreover, from Definition 1, a missing bi-directed arrow between $i$ and $k$ implies that the corresponding marginal processes are contemporaneously independent given the recent past of the MMC, that is, for all $t \in \mathbb{N} \setminus \{0\}$

$$i \leftrightarrow k \notin E \;\Rightarrow\; A_i(t) \perp\!\!\!\perp A_k(t) \mid A_V(t-1). \tag{4}$$
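As a minimal sketch of what conditions (3) and (4) ask of a transition matrix, the following Python snippet builds a small bivariate binary chain and checks both conditions numerically. The chain, its probabilities and the helper names (`marginal`, `no_granger_cause`, `contemporaneously_independent`) are illustrative assumptions and do not come from the paper; components are indexed 0 and 1 in the code.

```python
# Hypothetical bivariate chain with categories {1, 2} for both components.
STATES = [(a, b) for a in (1, 2) for b in (1, 2)]   # joint states (i1, i2)

# Assumed probabilities of moving to category 2, given the past state.
# Component 0 looks only at its own past; component 1 looks at both.
P0_HIGH = {1: 0.3, 2: 0.7}                          # keyed by i1'
P1_HIGH = {(1, 1): 0.2, (1, 2): 0.5,
           (2, 1): 0.4, (2, 2): 0.8}                # keyed by (i1', i2')


def bern(p_high, value):
    """Probability of `value` when category 2 has probability p_high."""
    return p_high if value == 2 else 1.0 - p_high


# Joint transition probabilities p(i | i'), built as a product of the two
# marginals, so the components are independent given the past by design.
P = {past: {now: bern(P0_HIGH[past[0]], now[0]) * bern(P1_HIGH[past], now[1])
            for now in STATES}
     for past in STATES}


def marginal(P, j, value, past):
    """P[A_j(t) = value | A_V(t-1) = past]."""
    return sum(p for now, p in P[past].items() if now[j] == value)


def no_granger_cause(P, i, k, tol=1e-12):
    """Condition (3): A_k(t) does not depend on A_i(t-1) given the rest of the past."""
    for past in STATES:
        for alt in (1, 2):
            changed = tuple(alt if pos == i else cat for pos, cat in enumerate(past))
            for value in (1, 2):
                gap = marginal(P, k, value, past) - marginal(P, k, value, changed)
                if abs(gap) > tol:
                    return False
    return True


def contemporaneously_independent(P, tol=1e-12):
    """Condition (4): the joint transition probability factorizes given the past."""
    return all(
        abs(P[past][now] - marginal(P, 0, now[0], past) * marginal(P, 1, now[1], past)) <= tol
        for past in STATES for now in STATES
    )
```

With these assumed numbers, `no_granger_cause(P, 1, 0)` holds (the edge from the second to the first component can be dropped), `no_granger_cause(P, 0, 1)` fails, and `contemporaneously_independent(P)` holds, so the corresponding mixed graph would have a single cross edge and no bi-directed edge.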

The conditional independencies (3) and (4) are interpretable in terms of pairwise Granger noncausality and contemporaneous independence conditions, respectively. However, note that (1) and (2) are not equivalent to the implied set of pairwise conditions, as shown in depth by Eichler (2012) and Colombi and Giordano (2012).

Conditions (1) and (2) encoded by the proposed mixed graph coincide with Drton's type IV block recursive Markov properties (Drton 2009) of a two-component Markov chain graph where each chain component contains the variables of the processes at the same time, and the parameterization (Sect. 5) for the transition probabilities of an MMC is a special case of that proposed for the multivariate regression chain graph models by Marchetti and Lupparelli (2011). Such connections between chain and mixed graphs are discussed in Sect. 8 of Colombi and Giordano (2012).

Although the chain graph approach is widely appreciated in the literature, we believe that, in the context of Markov chains, it is more appropriate to use a graphical representation where nodes correspond to processes rather than to univariate variables. In fact, a mixed graph gives a more compact visualization of the whole MMC without imposing the choice of two arbitrary contiguous time points. Moreover, a drawback of the chain graph is that, in the case of first order multivariate Markov chains, it involves $2q$ nodes instead of only $q$ as in the mixed graph. Furthermore, the mixed graph with $q$ nodes is meaningful also for Markov chains of any order, whereas an extension of the chain graph for $m$-order Markov chains would require $q(m + 1)$ nodes. Moreover, the problem of investigating G-noncausality and contemporaneous independence relations
for marginal processes of an MMC, described as global Markov properties of mixed graphs (Sect. 4 in Colombi and Giordano 2012), would be quite complicated through chain graphs with a node for each variable at each time point. Furthermore, the idea of using the nodes of graphs to represent processes is not new in the graphical modelling literature: this structure has already been adopted to model composable finite Markov processes by Didelez (2006), multivariate time series by Eichler (2007) and marked point processes by Gottard (2007).

4 Monotone MMC

Besides the Granger noncausality and independence conditions, the form of dependence of $A_j$ on $A_{\mathrm{pa}(j)}$, $j \in V$, is also relevant. Here, we address the problem of assessing the relationships between the components of an MMC by using the notion of monotone dependence.

Given two ordered categorical variables $Y$ and $X$, the nature of the monotone dependence of $Y$ on $X$ can be specified by requiring that the conditional distributions of $Y$ given $X$ satisfy an appropriate stochastic dominance criterion. The dominance criterion can be chosen from the simple, uniform and likelihood ratio stochastic orderings, which correspond to successively stronger notions of monotone dependence. There is a wide literature on stochastic orderings and the related kinds of monotone dependence (see, among others, Shaked and Shanthikumar 1994; Dardanoni and Forcina 1998).

In an MMC the hypothesis of monotone dependence of one marginal component $A_j$ on $A_k$, $j \in V$, $k \in \mathrm{pa}(j)$, states that the distributions of $A_j(t)$ conditioned on $A_{\mathrm{pa}(j)}(t-1)$ can be ordered, for all $t$, according to a stochastic dominance criterion coherently with the order of the categories of $A_k(t-1)$, while the categories of $A_{\mathrm{pa}(j) \setminus k}(t-1)$ are held fixed. An MMC where at least one component depends monotonically on all its parents will henceforth be referred to as a marginally monotone MMC, as stated by the following definition.
Definition 2 (Marginally Monotone MMC) An MMC, which is Markov with respect to a mixed graph $G = (V, E)$, is a marginally monotone MMC if there exists at least one $j \in V$ such that the dependence of $A_j$ on $A_k$ is monotone for every $k \in \mathrm{pa}(j)$.

To illustrate different hypotheses of monotone dependence in the MMC framework we use the equivalence between inequality constraints on different types of logits and stochastic orderings shown by Shaked and Shanthikumar (1994) and Douglas et al. (1990), among others.

The positive (negative) simple monotone dependence of $A_j$ on $A_k$, $k \in \mathrm{pa}(j)$, holds if the global logits

$$\eta_g^{\{j\}}(i_j \mid \mathbf{i}'_{\mathrm{pa}(j) \setminus k}, i'_k) = \log \frac{P[A_j(t) > i_j \mid A_k(t-1) = i'_k,\ A_{\mathrm{pa}(j) \setminus k}(t-1) = \mathbf{i}'_{\mathrm{pa}(j) \setminus k}]}{P[A_j(t) \le i_j \mid A_k(t-1) = i'_k,\ A_{\mathrm{pa}(j) \setminus k}(t-1) = \mathbf{i}'_{\mathrm{pa}(j) \setminus k}]}$$

are increasing (decreasing) with respect to $i'_k$, $i'_k = 1, \dots, s_k - 1$, for all $i_j = 1, \dots, s_j - 1$, $t \in \mathbb{N}$ and $\mathbf{i}'_{\mathrm{pa}(j) \setminus k} \in \mathcal{I}_{\mathrm{pa}(j) \setminus k}$.

There is positive (negative) uniform dependence of $A_j$ on $A_k$, $k \in \mathrm{pa}(j)$, if the continuation logits

$$\eta_c^{\{j\}}(i_j \mid \mathbf{i}'_{\mathrm{pa}(j) \setminus k}, i'_k) = \log \frac{P[A_j(t) \ge i_j \mid A_k(t-1) = i'_k,\ A_{\mathrm{pa}(j) \setminus k}(t-1) = \mathbf{i}'_{\mathrm{pa}(j) \setminus k}]}{P[A_j(t) = i_j \mid A_k(t-1) = i'_k,\ A_{\mathrm{pa}(j) \setminus k}(t-1) = \mathbf{i}'_{\mathrm{pa}(j) \setminus k}]}$$

are increasing (decreasing) with respect to $i'_k$, $i'_k = 1, \dots, s_k - 1$, for all $i_j = 1, \dots, s_j - 1$, $t \in \mathbb{N}$ and $\mathbf{i}'_{\mathrm{pa}(j) \setminus k} \in \mathcal{I}_{\mathrm{pa}(j) \setminus k}$.


The positive (negative) dependence is of the likelihood ratio type when the local logits

$$\eta_l^{\{j\}}(i_j \mid \mathbf{i}'_{\mathrm{pa}(j) \setminus k}, i'_k) = \log \frac{P[A_j(t) = i_j + 1 \mid A_k(t-1) = i'_k,\ A_{\mathrm{pa}(j) \setminus k}(t-1) = \mathbf{i}'_{\mathrm{pa}(j) \setminus k}]}{P[A_j(t) = i_j \mid A_k(t-1) = i'_k,\ A_{\mathrm{pa}(j) \setminus k}(t-1) = \mathbf{i}'_{\mathrm{pa}(j) \setminus k}]}$$

are increasing (decreasing) with respect to $i'_k$, $i'_k = 1, \dots, s_k - 1$, for all $i_j = 1, \dots, s_j - 1$, $t \in \mathbb{N}$ and $\mathbf{i}'_{\mathrm{pa}(j) \setminus k} \in \mathcal{I}_{\mathrm{pa}(j) \setminus k}$.

To sum up, an MMC with positive time-homogeneous transition probabilities and Markov with respect to a mixed graph $G = (V, E)$ is positive (negative) marginally monotone if and only if the following inequality constraints on logits are satisfied

$$\eta^{\{j\}}(i_j \mid \mathbf{i}'_{\mathrm{pa}(j) \setminus k}, i'_k) \le (\ge)\ \eta^{\{j\}}(i_j \mid \mathbf{i}'_{\mathrm{pa}(j) \setminus k}, i'_k + 1), \qquad i'_k = 1, \dots, s_k - 1, \tag{5}$$

$\mathbf{i}'_{\mathrm{pa}(j) \setminus k} \in \mathcal{I}_{\mathrm{pa}(j) \setminus k}$, $k \in \mathrm{pa}(j)$, for at least one $j \in V$, where the logits $\eta^{\{j\}}(i_j \mid \mathbf{i}')$ are of global, continuation or local type according to the simple, uniform and likelihood ratio orderings, respectively.

The monotone dependence of a univariate component of an MMC can be positive with respect to certain components, and negative with respect to others. It is worth noting that the dominance criterion concerns only the marginal processes in an MMC and does not refer to their joint behaviour. In the special case of univariate Markov chains, our definition of marginally monotone MMC coincides with the one proposed by Kijima (1997).
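The three kinds of logits and the check behind constraint (5) can be sketched in a few lines of Python. The function names and the two sets of conditional distributions below are assumptions made for illustration only; each row is a hypothetical conditional distribution of $A_j(t)$, with rows ordered by the category of the parent $A_k(t-1)$ and the other parents held fixed. The continuation logit uses the numerator $P[A_j(t) \ge i_j]$ as written in the text; replacing it with $P[A_j(t) > i_j]$ changes the values but not the outcome of the check, because the two ratios differ by exactly 1.

```python
from math import log


def global_logits(p):
    """log P[Y > i] / P[Y <= i] for i = 1, ..., s-1 (p lists the s probabilities)."""
    return [log(sum(p[i:]) / sum(p[:i])) for i in range(1, len(p))]


def continuation_logits(p):
    """log P[Y >= i] / P[Y = i] for i = 1, ..., s-1, following the text."""
    return [log(sum(p[i - 1:]) / p[i - 1]) for i in range(1, len(p))]


def local_logits(p):
    """log P[Y = i+1] / P[Y = i] for i = 1, ..., s-1."""
    return [log(p[i] / p[i - 1]) for i in range(1, len(p))]


def is_increasing(rows, logit):
    """Positive version of (5): every logit is non-decreasing from one row to the next."""
    values = [logit(p) for p in rows]
    return all(a <= b
               for lower, upper in zip(values, values[1:])
               for a, b in zip(lower, upper))


# Assumed example 1: ordered under all three criteria.
rows_lr = [[0.5, 0.3, 0.2],
           [0.3, 0.4, 0.3],
           [0.2, 0.3, 0.5]]

# Assumed example 2: simple and uniform orderings hold, likelihood ratio fails.
rows_weak = [[0.30, 0.50, 0.20],
             [0.25, 0.35, 0.40]]

checks = {name: (is_increasing(rows_lr, f), is_increasing(rows_weak, f))
          for name, f in [("global", global_logits),
                          ("continuation", continuation_logits),
                          ("local", local_logits)]}
```

The second example illustrates why the three orderings are described as successively stronger: its global and continuation logits increase with the parent category, while its first local logit decreases.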

5 A multivariate logistic model for transition probabilities

Here, we clarify that the requirements of Definitions 1 and 2 are equivalent to equality and inequality constraints on suitable interactions of a multivariate logistic model which parameterizes the transition probabilities.

The time-homogeneous joint transition probabilities are denoted by $p(\mathbf{i} \mid \mathbf{i}')$, for every pair of states $\mathbf{i} \in \mathcal{I}$, $\mathbf{i}' \in \mathcal{I}$, at two consecutive instants. Any state which includes categories $i_j \in \mathcal{A}_j$, for $j \notin S$, $S \subset V$, at the baseline value (the first category), is denoted by $(\mathbf{i}_S, \mathbf{i}^*_{V \setminus S})$. Given a state $\mathbf{i}'$, for the transition probabilities $p(\mathbf{i} \mid \mathbf{i}')$, $\mathbf{i} \in \mathcal{I}$, we adopt the Glonek and McCullagh (1995) multivariate logistic model whose marginal interaction parameters are denoted by $\eta^P(\mathbf{i}_P \mid \mathbf{i}')$, for every non-empty subset $P$ of $V$ and for every $\mathbf{i}_P \in \mathcal{I}_P$. The Glonek-McCullagh baseline interactions $\eta^P(\mathbf{i}_P \mid \mathbf{i}')$ are given by the following contrasts of logarithms of marginal transition probabilities $p_P(\mathbf{i}_P \mid \mathbf{i}') = \sum_{\mathbf{j} : \mathbf{j}_P = \mathbf{i}_P} p(\mathbf{j} \mid \mathbf{i}')$, from the state $\mathbf{i}' \in \mathcal{I}$ to one of the states in $\mathcal{I}_P$,

$$\eta^P(\mathbf{i}_P \mid \mathbf{i}') = \sum_{K \subseteq P} (-1)^{|P \setminus K|} \log p_P(\mathbf{i}_K, \mathbf{i}^*_{P \setminus K} \mid \mathbf{i}'). \tag{6}$$
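The contrast in Eq. (6) can be written out directly. The sketch below evaluates it for one fixed past state, taking the conditional distribution over current states as a dictionary from state tuples (1-based categories) to probabilities. The helper names and the two toy distributions are assumptions for illustration; this is not the implementation used in the hmmm package.

```python
from itertools import chain, combinations
from math import log


def subsets(items):
    """All subsets of `items`, from the empty set to the full set."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def marginal(p, P, cell):
    """Marginal probability p_P(cell): sum of p over states agreeing with `cell` on P."""
    return sum(prob for state, prob in p.items()
               if all(state[j] == c for j, c in zip(P, cell)))


def gm_interaction(p, P, cell):
    """Baseline interaction (6) for the margin P (tuple of 0-based component
    indices) evaluated at the categories in `cell`."""
    total = 0.0
    for K in subsets(P):
        # components in K keep their category, those in P \ K go to baseline 1
        mixed = tuple(c if j in K else 1 for j, c in zip(P, cell))
        total += (-1) ** (len(P) - len(K)) * log(marginal(p, P, mixed))
    return total


# Assumed example: independent components, so the two-way interaction vanishes.
p1, p2 = [0.2, 0.3, 0.5], [0.6, 0.4]
independent = {(a, b): p1[a - 1] * p2[b - 1] for a in (1, 2, 3) for b in (1, 2)}

# Assumed example: a dependent 2 x 2 distribution with odds ratio 16.
dependent = {(1, 1): 0.4, (1, 2): 0.1, (2, 1): 0.1, (2, 2): 0.4}

logit_1 = gm_interaction(independent, (0,), (3,))        # log(0.5 / 0.2)
odds_indep = gm_interaction(independent, (0, 1), (2, 2))
odds_dep = gm_interaction(dependent, (0, 1), (2, 2))     # log(16)
```

For a singleton margin the function returns a baseline logit, and for a pair of components it returns a baseline log odds ratio, which is zero under independence.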

Only when $P = \{j\}$ is a singleton are the parameters in Eq. (6) logits, which will be denoted by $\eta^{\{j\}}(i_j \mid \mathbf{i}')$. The proof of the next proposition follows from a result by Marchetti and Lupparelli (2011).


Proposition 1 For an MMC with positive time-homogeneous transition probabilities, it holds that:

(i) the Granger noncausality condition (1) is true if and only if $\eta^P(\mathbf{i}_P \mid \mathbf{i}') = \eta^P(\mathbf{i}_P \mid \mathbf{i}'_{\mathrm{pa}(P)})$, $\mathbf{i}_P \in \mathcal{I}_P$, $\mathbf{i}' \in \mathcal{I}$, $P \subseteq V$, $P \neq \emptyset$;

(ii) the contemporaneous independence condition (2) is equivalent to $\eta^P(\mathbf{i}_P \mid \mathbf{i}') = 0$, $\mathbf{i}_P \in \mathcal{I}_P$, $\mathbf{i}' \in \mathcal{I}$, for all $P$ that are not connected sets in the bi-directed graph obtained by removing every directed edge from the mixed graph $G$.

The definition of the Glonek-McCullagh interactions in terms of baseline log-linear contrasts of marginal transition probabilities is not necessarily the most convenient. When specific forms of monotone dependence are of interest, more general types of Glonek-McCullagh interactions are needed, as shown by Bartolucci et al. (2007), in order to provide the logits that must be constrained.

Example 2 For every $i'_1 = 1, \dots, s_1$, $i'_2 = 1, \dots, s_2$, the transition probabilities $p(i_1, i_2 \mid i'_1, i'_2)$, $i_1 = 1, \dots, s_1$, $i_2 = 1, \dots, s_2$, of the two-component Markov chain $\{A_j(t) : t \in \mathbb{N},\ j = 1, 2\}$ can be parameterized by the local logits

$$\eta_l^{\{1\}}(i_1 \mid i'_1, i'_2) = \log \frac{p_1(i_1 + 1 \mid i'_1, i'_2)}{p_1(i_1 \mid i'_1, i'_2)}, \qquad i_1 = 1, \dots, s_1 - 1,$$

$$\eta_l^{\{2\}}(i_2 \mid i'_1, i'_2) = \log \frac{p_2(i_2 + 1 \mid i'_1, i'_2)}{p_2(i_2 \mid i'_1, i'_2)}, \qquad i_2 = 1, \dots, s_2 - 1,$$

and by the local log odds ratios

$$\eta^{1,2}(i_1, i_2 \mid i'_1, i'_2) = \log \frac{p(i_1, i_2 \mid i'_1, i'_2)\, p(i_1 + 1, i_2 + 1 \mid i'_1, i'_2)}{p(i_1, i_2 + 1 \mid i'_1, i'_2)\, p(i_1 + 1, i_2 \mid i'_1, i'_2)},$$

$i_1 = 1, \dots, s_1 - 1$, $i_2 = 1, \dots, s_2 - 1$.

The hypothesis $\mathrm{pa}(j) = \{j\}$, $j = 1, 2$, that every component $A_j$, $j = 1, 2$, depends on its own past only is equivalent to the constraints $\eta_l^{\{1\}}(i_1 \mid i'_1, i'_2) = \eta_l^{\{1\}}(i_1 \mid i'_1)$, $\eta_l^{\{2\}}(i_2 \mid i'_1, i'_2) = \eta_l^{\{2\}}(i_2 \mid i'_2)$, $i_1 = 1, \dots, s_1 - 1$, $i_2 = 1, \dots, s_2 - 1$, for every $i'_1$, $i'_2$. If $\eta_l^{\{1\}}(i_1 \mid i'_1) \le (\ge)\ \eta_l^{\{1\}}(i_1 \mid i'_1 + 1)$, $i_1, i'_1 = 1, \dots, s_1 - 1$, or $\eta_l^{\{2\}}(i_2 \mid i'_2) \le (\ge)\ \eta_l^{\{2\}}(i_2 \mid i'_2 + 1)$, $i_2, i'_2 = 1, \dots, s_2 - 1$, the chain satisfies the positive (negative) likelihood ratio monotone dependence. If all the log odds ratios $\eta^{1,2}(i_1, i_2 \mid i'_1, i'_2)$ are equal to zero, the contemporaneous independence $A_1(t) \perp\!\!\!\perp A_2(t) \mid A_1(t-1), A_2(t-1)$ holds. The local logits can be replaced by the global or continuation logits if a less restrictive stochastic order is appropriate.

To model the dependence of the transition probabilities on the conditioning states $\mathbf{i}' \in \mathcal{I}$, the following factorial expansion of the Glonek-McCullagh marginal interactions can be used

$$\eta^P(\mathbf{i}_P \mid \mathbf{i}') = \theta^P(\mathbf{i}_P) + \sum_{\emptyset \subset Q \subseteq V} \theta^{P,Q}(\mathbf{i}_P \mid \mathbf{i}'_Q). \tag{7}$$

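The Möbius inversion theorem (Lauritzen 1996) ensures that

$$\theta^{P,Q}(\mathbf{i}_P \mid \mathbf{i}'_Q) = \sum_{H \subseteq Q} (-1)^{|Q \setminus H|}\, \eta^P(\mathbf{i}_P \mid \mathbf{i}'_H, \mathbf{i}^*_{V \setminus H}) \tag{8}$$

where $|Q \setminus H|$ is the cardinality of the set $Q \setminus H$. The next proposition clarifies that the requirements of Proposition 1 correspond to simple zero constraints on the $\theta^{P,Q}(\mathbf{i}_P \mid \mathbf{i}'_Q)$ parameters.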
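The relation between Eqs. (7) and (8) can be checked numerically. In the sketch below, `eta` holds the values of one fixed interaction $\eta^P(\mathbf{i}_P \mid \mathbf{i}')$ for every past state of a hypothetical bivariate chain with three categories per component; the numbers are arbitrary and the function names are assumptions. The term with $Q = \emptyset$ in the code is the value of the interaction at the baseline past state, which is what Eq. (7) gives for $\theta^P(\mathbf{i}_P)$ once Eq. (8) is applied at the baseline.

```python
from itertools import chain, combinations, product


def subsets(items):
    """All subsets of `items`, from the empty set to the full set."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def theta(eta, Q, past, q):
    """Eq. (8): effect of the past categories of the components in Q."""
    total = 0.0
    for H in subsets(Q):
        # components in H keep their past category, all others sit at baseline 1
        state = tuple(past[j] if j in H else 1 for j in range(q))
        total += (-1) ** (len(Q) - len(H)) * eta[state]
    return total


def rebuild(eta, past, q):
    """Eq. (7): sum of the effects over every Q, the empty set included."""
    return sum(theta(eta, Q, past, q) for Q in subsets(range(q)))


q = 2
pasts = list(product((1, 2, 3), repeat=q))

# Arbitrary values with a genuine interaction between the two past components.
eta_general = {(a, b): 0.4 * a - 0.3 * b + 0.05 * a * b for a, b in pasts}

# Arbitrary additive values: no term involves both past components at once.
eta_additive = {(a, b): 0.4 * a * a - 0.3 * b for a, b in pasts}

max_error = max(abs(rebuild(eta_general, past, q) - eta_general[past]) for past in pasts)
joint_effects = [theta(eta_additive, (0, 1), past, q) for past in pasts]
```

Here `max_error` is zero up to rounding, and every entry of `joint_effects` vanishes, which is the zero pattern implied by an additive dependence on the past.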


Proposition 2 For an MMC with positive time-homogeneous transition probabilities, it holds that: (i) the Granger noncausality condition (1) is equivalent to $\theta^{P,Q}(\mathbf{i}_P \mid \mathbf{i}'_Q) = 0$ for all $Q \nsubseteq \mathrm{pa}(P)$, $\mathbf{i}_P \in \mathcal{I}_P$, $\mathbf{i}'_Q \in \mathcal{I}_Q$, and (ii) the contemporaneous independence condition (2) is equivalent to $\theta^{P,Q}(\mathbf{i}_P \mid \mathbf{i}'_Q) = 0$, $\mathbf{i}_P \in \mathcal{I}_P$, $\mathbf{i}'_Q \in \mathcal{I}_Q$, for all $P$ that are not connected sets in the bi-directed graph obtained by removing every directed edge from the mixed graph $G$.

Proof It follows from Proposition 1 and Eq. (8); see also Colombi and Giordano (2012).

A useful restriction that considerably simplifies the monotone dependence constraints (5) is the hypothesis of additivity of the effects of $A_V(t-1)$ on $A_j(t)$ for all $t$ and $j \in V$. This marginal additive dependence allows the logits $\eta^{\{j\}}(i_j \mid \mathbf{i}')$, $j \in V$, to be expressed by the factorial expansion

$$\eta^{\{j\}}(i_j \mid \mathbf{i}') = \theta^{\{j\}}(i_j) + \sum_{k \in V} \theta^{j,k}(i_j \mid i'_k). \tag{9}$$

Note that under this hypothesis, the parameters $\theta^{P,Q}(\mathbf{i}_P \mid \mathbf{i}'_Q)$ are null if $P = \{j\}$ and $|Q| > 1$. When the additivity is assumed, (1) and (2) are equivalent to simple zero constraints and inequality constraints on the main effects $\theta^{j,k}(i_j \mid i'_k)$. More precisely, assuming additivity, the main effect $\theta^{j,k}(i_j \mid i'_k)$ must be null when $k \notin \mathrm{pa}(j)$, and positive and increasing (negative and decreasing) in $i'_k$ if the dependence of $A_j$ on $A_k$ is positive (negative) monotone.

6 Likelihood ratio tests

Before illustrating the testing procedure, let us rewrite the constraints introduced in the previous section in a compact form. Following Bartolucci et al. (2007), it can be proved that the set of zero restrictions imposed under the G-noncausality and contemporaneous independence hypotheses of Proposition 1 can be rewritten in the form $\mathbf{C} \ln(\mathbf{M}\boldsymbol{\pi}) = \mathbf{0}$, while the inequality constraints (5) for monotone dependence have a compact expression given by $\mathbf{K} \ln(\mathbf{M}\boldsymbol{\pi}) \ge \mathbf{0}$, where $\boldsymbol{\pi}$ is the vector of all the transition probabilities, $\mathbf{C}$ and $\mathbf{K}$ are matrices of contrasts and $\mathbf{M}$ is a zero-one matrix.

Let $H_G$ be the hypothesis $\mathbf{C} \ln(\mathbf{M}\boldsymbol{\pi}) = \mathbf{0}$, stating that an MMC is Markov with respect to a mixed graph, and let $H_M$: $\mathbf{C} \ln(\mathbf{M}\boldsymbol{\pi}) = \mathbf{0}$, $\mathbf{K} \ln(\mathbf{M}\boldsymbol{\pi}) \ge \mathbf{0}$ be the hypothesis of a marginally monotone MMC. Let $L_G$, $L_M$ and $L_U$ denote the maxima of the log-likelihood function under $H_G$, under $H_M$ and under the unrestricted model, respectively.

Under the assumptions provided by Basawa and Prakasa Rao (1980) for Markov chains, the likelihood ratio test (LRT) statistic $2(L_U - L_G)$ for testing $H_G$ has the classical chi-square asymptotic distribution. In contrast, the statistics $2(L_G - L_M)$ and $2(L_U - L_M)$, for $H_M$ against $H_G$ and $H_M$ against the unrestricted alternative $H_U$, are asymptotically chi-bar-square distributed (Silvapulle and Sen 2005). The chi-bar-square distribution is a mixture of chi-square distributions. The asymptotic chi-bar-square distribution follows from the same assumptions of Basawa
and Prakasa Rao (1980) needed for the asymptotic distribution of $2(L_U - L_G)$, and from the fact that the parameter space under the null hypothesis is defined by linear equality and inequality constraints on the model parameters.

It may also be of interest to test the null hypothesis $H_0$: $\mathbf{C} \ln(\mathbf{M}\boldsymbol{\pi}) = \mathbf{0}$, $\mathbf{K} \ln(\mathbf{M}\boldsymbol{\pi}) = \mathbf{0}$ against $H_M$. In this case, the LRT statistic $2(L_M - L_0)$ has a chi-bar-square asymptotic distribution as well. According to Silvapulle and Sen's terminology, testing $H_0$ against $H_M$ is a testing problem of type A, and in this case the alternative hypothesis is specified by inequality constraints. Testing $H_M$ against $H_G$ or $H_U$ is of type B, and the inequalities are under the null hypothesis.

The ML estimation methods developed by Cazzaro and Colombi (2009) for multinomial data under equality and inequality constraints are easily adapted to the MMC context of this work. Moreover, Monte Carlo methods can be employed to compute the p values of the LRT statistics $2(L_G - L_M)$, $2(L_U - L_M)$ and $2(L_M - L_0)$ (details in Silvapulle and Sen 2005).
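To give a feel for how a chi-bar-square tail probability can be simulated, the sketch below treats a deliberately simplified type B problem: $k$ inequality constraints on the mean of a standard normal vector with identity covariance, for which the statistic is the squared distance from the nonnegative orthant. This identity-covariance setting is an assumption made here for illustration. In the models of this paper the mixture weights depend on the constraint matrices and on the information matrix, so the code is not the procedure of Silvapulle and Sen (2005) or of the hmmm package. The function names are hypothetical.

```python
import random
from math import erfc, exp, sqrt


def chi_bar_pvalue_mc(observed, k, n_sim=200_000, seed=0):
    """Monte Carlo estimate of P[T >= observed], where T = sum_i min(Z_i, 0)^2
    and Z ~ N(0, I_k). Identity covariance is a simplifying assumption."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        stat = sum(min(rng.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(k))
        if stat >= observed:
            hits += 1
    return hits / n_sim


def chi_bar_pvalue_exact_k2(observed):
    """Exact tail for k = 2 in the same setting: weights 1/4, 1/2, 1/4 on
    chi-square distributions with 0, 1 and 2 degrees of freedom."""
    if observed <= 0:
        return 1.0
    tail_1df = erfc(sqrt(observed / 2.0))    # P[chi2_1 >= observed]
    tail_2df = exp(-observed / 2.0)          # P[chi2_2 >= observed]
    return 0.5 * tail_1df + 0.25 * tail_2df


simulated = chi_bar_pvalue_mc(2.0, k=2, n_sim=50_000, seed=1)
exact = chi_bar_pvalue_exact_k2(2.0)
```

Two features carry over to the general case: the simulated tail probability should agree with the weighted sum of chi-square tails, and an observed statistic of 0 always yields a p value of 1, because the mixture includes a point mass at zero.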

7 Examples

The introduced hypotheses are tested on two datasets. All the procedures for computing ML estimates and p values used in this section are implemented in the R package hmmm (Colombi et al. 2012).

The first dataset consists of a 3-dimensional binary time series of sales levels (low, high) of three well-known Italian brands (Amato, Barilla, Divella) of pasta (spaghetti) sold by a wholesale dealer operating in a region of Southern Italy. The data were collected on 365 days in the period between December 2006 and January 2009. The sales series are reasonably assumed to follow a first order 3-variate Markov chain.

One question that arises in managing the sales inventory of pasta is whether the quantity of spaghetti sold by one brand depends on the amount of sales of the two competitors on the same day, given the past sales of all brands. Moreover, it is also important to ascertain whether the current sales of one brand of spaghetti are influenced by the previous demand for every brand of spaghetti. Monotone dependence hypotheses can also be plausible. For example, we can hypothesize that the probability of selling a high amount of spaghetti of a certain brand is greater when a large quantity of spaghetti of all three companies has been sold in the past.

The answer to these questions can be obtained by testing hypotheses of G-noncausality, contemporaneous independence and stochastic order. This boils down to testing equality and inequality constraints on the interactions which parameterize the transition probabilities of the spaghetti Markov chain and to identifying the mixed graph which encodes the underlying independence conditions. To this end, various hypotheses associated with edges of the mixed graph $G = (V, E)$, with one node for each brand of pasta, $V = \{1, 2, 3\}$, have been tested. In short, 1, 2, 3 stand for the brands Amato, Barilla and Divella.

In this example, all the marginal processes have only two states 0, 1 which correspond to low and high sales level, respectively; thus for every $P \subseteq \{1, 2, 3\}$ and every level $\mathbf{i}' = (l, m, n)$, $l = 0, 1$,
$m = 0, 1$, $n = 0, 1$, of the past sales, there is only one Glonek-McCullagh multivariate logistic interaction $\eta^P(\mathbf{i}_P \mid \mathbf{i}')$, which will be denoted by $\eta^P(l, m, n)$.

According to Proposition 1, the hypothesis of mutual contemporaneous independence requires that all the interactions $\eta^P(l, m, n)$ are null, except the 24 logits $\eta^{\{j\}}(l, m, n)$, $j \in \{1, 2, 3\}$, which, instead, are to be constrained to satisfy the Granger noncausality and the monotone dependence conditions. The hypothesis of contemporaneous independence cannot be rejected, as LRT = 34.212, df = 32, p value = 0.362. The graph corresponding to this hypothesis has all the directed edges and no bi-directed edges.

Under the hypothesis of additivity of the past sales effects on the current sales level, the logits $\eta^{\{j\}}(l, m, n)$ have the factorial expansion $\eta^{\{j\}}(l, m, n) = \theta^{\{j\}} + \theta_l^{j,1} + \theta_m^{j,2} + \theta_n^{j,3}$ in terms of a general effect and three main effects. The main effects are null if their index $l$, $m$ or $n$ is zero. Note that under this hypothesis, the 24 logits are parameterized by three general effects $\theta^{\{j\}}$ and nine main effects $\theta_1^{j,k}$; moreover, the constraints (5) of monotone dependence reduce to simple inequality constraints on these main effects. More precisely, the main effect $\theta_1^{j,k}$ must be null when $k \notin \mathrm{pa}(j)$ and positive (negative) if the dependence of $A_j$ on $A_k$ is positive (negative) monotone. The hypothesis of contemporaneous independence together with additivity is not rejected, as LRT = 44.81, df = 44, p value = 0.44.

Then, we add the order-restricted hypothesis that all causal relations are monotone. In particular, the hypothesis that the monotone dependence associated with all the edges is positive, except for $2 \to 3$ and $3 \to 2$ which correspond to a negative monotone dependence, is not rejected (LRT = 1.24, p value = 0.86). These properties require the main effects $\theta_1^{j,k}$ to be non-negative when associated with the edges $1 \to 1$, $2 \to 2$, $3 \to 3$, $2 \to 1$, $1 \to 2$, $1 \to 3$ and $3 \to 1$, and non-positive when related to $2 \to 3$ and $3 \to 2$.

Alternatively, we can proceed by considering that the graph which includes all directed and no bi-directed edges may be simplified, since the hypothesis that the edges $2 \to 1$ and $3 \to 1$ can be removed is not rejected (LRT = 4.4, df = 2, p value = 0.11). Therefore, the order-restricted hypotheses may be tested on this reduced graph where $2 \to 1$ and $3 \to 1$ are missing. In this case the hypothesis of monotone dependence of positive type for the edges $1 \to 1$, $2 \to 2$, $3 \to 3$, $1 \to 2$, $1 \to 3$ and negative for $2 \to 3$ and $3 \to 2$ is clearly not rejected; in fact LRT = 0, p value = 1.

In conclusion, the spaghetti Markov chain can be described by a marginally monotone MMC with respect to the mixed graph in Fig. 2, where the edges $1 \to 1$, $2 \to 2$, $3 \to 3$ are implicitly inserted.
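The classical chi-square p values quoted for the spaghetti data can be cross-checked without any statistical library, because every df involved (32, 44 and 2) is even and the chi-square upper tail then has a closed form as a finite Poisson-type sum. The function name `chi2_sf_even` is an assumption; the order-restricted tests (LRT = 1.24 and LRT = 0) are chi-bar-square distributed and are not covered by this sketch.

```python
from math import exp


def chi2_sf_even(x, df):
    """Upper tail P[chi2_df >= x] for a positive even df:
    exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return exp(-half) * total


# (LRT, df) pairs reported in the text for the chi-square tests.
reported = {
    "contemporaneous independence": (34.212, 32),
    "independence + additivity": (44.81, 44),
    "removing edges 2->1 and 3->1": (4.4, 2),
}
p_values = {name: chi2_sf_even(lrt, df) for name, (lrt, df) in reported.items()}
```

The resulting values can be compared with the reported 0.362, 0.44 and 0.11, allowing for the rounding of the published LRT statistics.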

Fig. 2 Mixed graph for spaghetti MMC (nodes 1, 2, 3)


This means that the current sales level of Amato does not depend on previous sales of either Divella or Barilla. Moreover, a high level of sales of Barilla and Divella on one day is more probable when the quantity which Amato previously sold was high. On the contrary, given the previous high sales level of Divella spaghetti, a low level of Barilla sales is more probable, and vice versa. Moreover, there is no influence among the contemporaneous sales of all 3 brands, while sales of all brands depend positively on their own previous sales performance.

Unlike the previous binary data, where the order-restricted hypotheses are verified by constraining logits of local type, we now discuss another case with the aim of exemplifying the use of different types of logits to test different hypotheses of monotone dependence. We analyze a bivariate time series of the daily utilization rate of one server located in Milan (Italy) that provides connections for wap and web services. The data were collected every day for 6 months by an Italian mobile telephone company. We tested the hypotheses of G-noncausality and monotone dependence in the first order bivariate Markov chain $A_V = \{A_j(t) : t \in \mathbb{N},\ j = 1, 2\}$ which describes the joint dynamic behaviour of the server utilization rates for wap and web connections.

In this example, the wap marginal component $A_1$ and the web marginal component $A_2$ of the MMC take values in the same set of 3 ordered categories $\mathcal{A} = \{1, 2, 3\}$, which correspond to the low, medium and high level of the utilization rate. The transition probabilities $p(i_1, i_2 \mid i'_1, i'_2)$, $i_1, i'_1, i_2, i'_2 = 1, 2, 3$, of the bivariate chain $A_V$ are parameterized by the 18 logits for the wap marginal component $\eta^{\{1\}}(i_1 \mid i'_1, i'_2)$, the 18 logits for the web marginal component $\eta^{\{2\}}(i_2 \mid i'_1, i'_2)$ and by the 36 log odds ratios $\eta^{1,2}(i_1, i_2 \mid i'_1, i'_2)$.

Under the hypothesis of additivity of the past server utilization rate effects on the current rate for wap-web services, the logits $\eta^{\{1\}}(i_1 \mid i'_1, i'_2)$ and $\eta^{\{2\}}(i_2 \mid i'_1, i'_2)$ are rewritten according to (9) as $\eta^{\{1\}}(i_1 \mid i'_1, i'_2) = \theta^{\{1\}}(i_1) + \theta^{1,1}(i_1 \mid i'_1) + \theta^{1,2}(i_1 \mid i'_2)$ and $\eta^{\{2\}}(i_2 \mid i'_1, i'_2) = \theta^{\{2\}}(i_2) + \theta^{2,2}(i_2 \mid i'_2) + \theta^{2,1}(i_2 \mid i'_1)$. So the number of parameters is reduced, since the 36 logits are expressed in terms of 4 general effects $\theta^{\{1\}}(i_1)$, $\theta^{\{2\}}(i_2)$ and 16 main effects $\theta^{1,1}(i_1 \mid i'_1)$, $\theta^{1,2}(i_1 \mid i'_2)$, $\theta^{2,2}(i_2 \mid i'_2)$, $\theta^{2,1}(i_2 \mid i'_1)$. Under this assumption, the conditions of Granger noncausality and monotone dependence correspond to equality and inequality constraints on these main effects.

A further simplification comes from the hypothesis that the association between the daily utilization of the server for wap and web services does not depend on the utilization levels on the day before. This form of contemporaneous dependence between the two marginal processes forces the 36 log odds ratios $\eta^{1,2}(i_1, i_2 \mid i'_1, i'_2)$ to be invariant with respect to the past states $i'_1, i'_2$, i.e. $\eta^{1,2}(i_1, i_2 \mid i'_1, i'_2) = \eta^{1,2}(i_1, i_2)$, so the number of log odds ratios needed to parameterize the transition matrix reduces to 4. In this case, local log odds ratios have been considered.

The aim now is to verify whether the MMC of the server utilization rate for wap-web connections is Markov with respect to a mixed graph and whether it is monotone. Table 1 summarizes the results of the testing procedure. The hypothesis $H_A$ of additivity and constant association against the saturated model where no parameters are constrained is not rejected (rows 1, 4, 7 in Table 1) for all the logits of global, continuation and local types, denoted by g, c and l.
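The parameter counts quoted in the two examples, and the degrees of freedom they imply, can be cross-checked with simple bookkeeping. The helper `parameter_counts` is a hypothetical name; the only assumptions are the ones stated in the text: one free logit per non-baseline category and, under additivity, general effects plus main effects that vanish at the baseline past category.

```python
from math import prod


def parameter_counts(levels):
    """Counts for a first order chain whose component j has levels[j] categories."""
    n_states = prod(levels)                       # joint states, also past states
    per_past_total = n_states - 1                 # free probabilities per past state
    per_past_logits = sum(s - 1 for s in levels)  # marginal logits per past state
    main_effect_levels = sum(s - 1 for s in levels)
    return {
        "logits": n_states * per_past_logits,
        "higher_order": n_states * (per_past_total - per_past_logits),
        # additive logits: (s_j - 1) general effects plus
        # (s_j - 1)(s_k - 1) main effects for each component k
        "additive": sum((s - 1) * (1 + main_effect_levels) for s in levels),
    }


# Spaghetti data: three binary components.
pasta = parameter_counts([2, 2, 2])
df_independence = pasta["higher_order"]
df_independence_additivity = df_independence + pasta["logits"] - pasta["additive"]

# Server data: two components with three categories each.
server = parameter_counts([3, 3])
constant_odds_ratios = (3 - 1) * (3 - 1)   # log odds ratios kept under constant association
df_h_a = (server["logits"] - server["additive"]) + (server["higher_order"] - constant_odds_ratios)
```

Under these counts the spaghetti hypotheses involve 32 and 44 constraints, and the hypothesis $H_A$ for the server data involves 48, which can be compared with the df values reported in the text and in Table 1.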


Table 1 Hypotheses test

| H | Hypotheses | Test | Logit | LRT | df | p value |
|---|---|---|---|---|---|---|
| $H_A$ | add. + const. ass. | $H_A$ vs $H_U$ | g | 58.3 | 48 | 0.14 |
| $H_G$ | add. + const. ass. + G-noncausality | $H_G$ vs $H_A$ | g | 6.17 | 8 | 0.63 |
| $H_M$ | add. + const. ass. + G-noncausality + monot. | $H_M$ vs $H_G$ | g | 0.00 | 0–8* | 1 |
| $H_A$ | add. + const. ass. | $H_A$ vs $H_U$ | c | 55.4 | 48 | 0.22 |
| $H_G$ | add. + const. ass. + G-noncausality | $H_G$ vs $H_A$ | c | 9.07 | 8 | 0.34 |
| $H_M$ | add. + const. ass. + G-noncausality + monot. | $H_M$ vs $H_G$ | c | 0.91 | 0–8* | 0.89 |
| $H_A$ | add. + const. ass. | $H_A$ vs $H_U$ | l | 56.1 | 48 | 0.19 |
| $H_G$ | add. + const. ass. + G-noncausality | $H_G$ vs $H_A$ | l | 8.43 | 8 | 0.40 |
| $H_M$ | add. + const. ass. + G-noncausality + monot. | $H_M$ vs $H_G$ | l | 3.07 | 0–8* | 0.63 |

* LRT for this test is chi-bar-square distributed and the range of degrees of freedom of the chi-square variables in the mixture is reported
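
The p values in Table 1 are of two kinds: chi-square tail probabilities for the equality-constrained tests and chi-bar-square tail probabilities for the rows marked with an asterisk. The sketch below shows how both can be evaluated and how the degrees of freedom follow from the parameter counts. It is an illustration under stated assumptions, not the procedure used in the paper, where the chi-bar-square tail probabilities are computed by simulation. In particular, the binomial mixture weights are a placeholder that holds only in the special case of independent inequality constraints, and all function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X >= x) of a chi-square variable with integer df >= 0."""
    if df == 0:
        # A chi-square with 0 degrees of freedom is a point mass at zero.
        return 1.0 if x <= 0 else 0.0
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, a = math.exp(-half), 1.0              # Q(1, half)
    else:
        tail, a = math.erfc(math.sqrt(half)), 0.5   # Q(1/2, half)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        tail += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def chi_bar_square_pvalue(lrt, weights):
    """Mixture tail probability: sum over k of weights[k] * P(chi2_k >= lrt)."""
    return sum(w * chi2_sf(lrt, k) for k, w in enumerate(weights))


# Degrees of freedom from the parameter counts given in the text.
saturated = 18 + 18 + 36          # logits and log odds ratios of H_U
additive = 4 + 16 + 4             # general effects, main effects, log odds ratios
df_additivity = saturated - additive
df_noncausality = 8               # main effects set to zero under H_G

# Placeholder weights (assumption): binomial(8, 1/2), valid only for
# independent constraints; the actual weights depend on the model.
n_constraints = 8
weights = [math.comb(n_constraints, k) / 2 ** n_constraints
           for k in range(n_constraints + 1)]

p_noncausality_global = chi2_sf(6.17, df_noncausality)
p_monotone_global = chi_bar_square_pvalue(0.00, weights)
print(df_additivity, round(p_noncausality_global, 2), p_monotone_global)
```

Whatever the weights, an LRT equal to zero gives a chi-bar-square p value of 1, as in the third row of Table 1, because every component of the mixture has tail probability 1 at zero.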

The hypothesis $H_G$ refers to the double G-noncausality between the two processes, in short form $A_1 \nrightarrow A_2$ and $A_2 \nrightarrow A_1$. Under this hypothesis the 8 main effects $\theta^{1,2}(i_1 \mid i'_2)$, $\theta^{2,1}(i_2 \mid i'_1)$, $i_1, i_2 = 1, 2$, $i'_2, i'_1 = 1, 2$, are required to be null. $H_G$ is not rejected against $H_A$ for all the considered logits (rows 2, 5, 8 in Table 1), so in the mixed graph associated with the chain the edges 1 → 2 and 2 → 1 are omitted. The test confirms that the knowledge of yesterday's requests for the wap (web) service does not have any influence in determining the most probable level of today's utilization of the server for the web (wap) service.

Moreover, it is also important to ascertain whether the current working of the server for the web or wap connections is influenced by the extent to which the server worked for the same service on the previous day. This relation, if it exists, may reasonably be monotone. Under the additivity assumption, the hypothesis that the dependence of $A_1(t)$ and $A_2(t)$ on their own past, associated with the edges 1 → 1 and 2 → 2, is positive monotone requires that $\theta^{1,1}(i_1 \mid i'_1)$ and $\theta^{2,2}(i_2 \mid i'_2)$ are positive and increasing with respect to $i'_1$ and $i'_2$. The hypothesis $H_M$ of positive monotone dependence according to the simple stochastic order is clearly not rejected against $H_G$ (row 3 in Table 1). The data are also strongly coherent with the positive monotone dependence according to both the uniform and likelihood ratio orderings associated with the edges 1 → 1 and 2 → 2 (rows 6, 9 in Table 1). In conclusion, the wap-web MMC turns out to be marginally monotone with respect to the mixed graph with the bi-directed edge 1 ↔ 2 and the edges 1 → 1, 2 → 2, associated with the monotone dependence of positive type.

References
Andersson S, Madigan D, Perlman M (2001) Alternative Markov properties for chain graphs. Scand J Stat 28:33–85
Bartolucci F, Colombi R, Forcina A (2007) An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Stat Sin 17:691–711


Basawa IV, Prakasa Rao BLS (1980) Statistical inference for stochastic processes. Academic Press, New York
Cazzaro M, Colombi R (2009) Multinomial-Poisson models subject to inequality constraints. Stat Model 9(3):215–233
Chamberlain G (1982) The general equivalence of Granger and Sims causality. Econometrica 50(3):569–582
Colombi R, Giordano S (2012) Graphical models for multivariate Markov chains. J Multivar Anal 107:90–103
Colombi R, Giordano S, Cazzaro M (2012) R-package hmmm: hierarchical multinomial marginal models. http://CRAN.R-project.org/package=hmmm
Cox DR, Wermuth N (1996) Multivariate dependencies: models, analysis and interpretation. Chapman and Hall, London
Dawid AP (1979) Conditional independence in statistical theory (with discussion). J R Stat Soc Ser B 41:1–13
Didelez V (2006) Graphical models for composable finite Markov processes. Scand J Stat 34:169–185
Douglas R, Fienberg SE, Lee MT, Sampson AR, Whitaker LR (1990) Positive dependence concepts for ordinal contingency tables. Institute of Mathematical Statistics. Lecture Notes-Monograph Series: Topics in Statistical Dependence 16:189–202
Eichler M (2007) Granger causality and path diagrams for multivariate time series. J Econom 137:334–353
Eichler M (2012) Graphical modelling of multivariate time series. Prob Theory Relat Fields 153:233–268
Glonek GJN, McCullagh P (1995) Multivariate logistic models. J R Stat Soc B 57:533–546
Gottard A (2007) On the inclusion of bivariate marked point processes in graphical models. Metrika 66:269–287
Lauritzen SL (1996) Graphical models. Clarendon Press, Oxford
Kijima M (1997) Markov chains and stochastic modeling. Chapman-Hall, London
Marchetti GM, Lupparelli M (2011) Chain graph models of multivariate regression type for categorical data. Bernoulli 17:736–753
Richardson T (2003) Markov properties for acyclic directed mixed graphs. Scand J Stat 30:145–157
Shaked M, Shanthikumar JG (1994) Stochastic orders and their applications. Academic Press, Boston
Silvapulle MJ, Sen PK (2005) Constrained statistical inference. Wiley, New Jersey
