A Bayesian State Space Approach to

A Metropolis-Hastings algorithm for reduced rank covariance matrices with application to Bayesian factor models

GAETANO CARMECI 

Department of Economics and Statistics, University of Trieste, Piazzale Europa 1; 34127 Trieste, Italy. E-mail address: [email protected] tel.: +39 0405587100; fax: +39 040567543.

A Metropolis-Hastings algorithm for reduced rank covariance matrices with application to Bayesian factor models

Abstract Markov Chain Monte Carlo (MCMC) algorithms for Bayesian factor models are generally parametrized in terms of the loading matrix and the latent common factors which are sampled into two separate blocks. In this paper, we propose a novel implementation of the MCMC algorithm that is designed for the model parametrized in terms of the reduced rank covariance matrix underlying the factor model. Hence, the strategy proposed makes it possible to sample directly from the singular covariance matrix. For this purpose, we develop an efficient Metropolis-Hastings (M-H) step that takes explicitly into account the curved geometry of the support of the target distribution. The developed M-H proposal distribution is based on a mixture of Wishart Singular distributions. The M-H step is general and can be applied, in principle, to any Bayesian model in which inference on a random positive semi-definite matrix is required. Moreover, we propose a Bayesian point estimator which preserves the original structure of the singular covariance matrix. In the paper we implement the MCMC algorithm for the static factor model and the multivariate local level model with common trends and present two empirical illustrations on financial data. The algorithm works very well in both applications.

KEYWORDS: Wishart Singular distribution, Generalized Inverse Wishart distribution, Hausdorff measure, state space model, local level model with common trends, cointegration.

JEL classification: C15, C11, C32

2

1. Introduction

Factor models have a long tradition in social science; they are important in a wide range of subjects, including economics, finance, statistics and econometrics. Furthermore in latest years, as testified by the 2004’s special issue of the Annals in the Journal of Econometrics edited by Croux et. al. (2004), there has been renewed interest in dynamic factor models and related state space models, particularly in financial and macroeconometrics. From the Bayesian viewpoint, Markov Chain Monte Carlo (MCMC) methods for estimating static and dynamic factor models have been proposed, among others, by Geweke and Zhou (1996), Pitt and Shephard (1999), Aguilar and West (2000), Jacquier et al. (1995) and Lopes and West (2004). All these MCMC algorithms are parametrized in terms of the loading matrix and the latent common factors which are sampled into two separate blocks. In this paper, we propose a novel implementation of the MCMC algorithm which is designed for the model parametrized in terms of the reduced rank covariance matrix underlying the factor model. Hence, the strategy proposed makes it possible to sample directly from the reduced rank covariance matrix. Indeed, the reduced rank covariance matrix is at the heart of the factor model. For example, in classical factor analysis the covariance matrix of the linear combinations of common factors is a singular, positive semidefinite matrix; see, e.g., Anderson (2003). As another simple but important example in time series analysis, consider the multivariate random walk plus noise process, usually named as local level model; see, e.g. Harvey (1989) and Harvey and Shephard (1993). A (nonstationary) dynamic common factor model arises when the covariance matrix of the disturbances driving the trend component is of positive but reduced rank; see, e.g. Harvey and Koopman (1997). In this model the rank deficiency of the covariance matrix implies the existence of common trends in the time series, and in turn this means that the time series are cointegrated. In the paper, we apply the proposed novel implementation of the MCMC algorithm to these examples, namely the static factor model and the local level model with common trends. However, our approach can be fruitfully employed in all models in which Bayesian inference on a random positive semidefinite matrix is required. Better

3

still, in the paper we will show that, with minor modifications, it can also be applied to positive definite matrices. Using the alternative parameterization of the static factor model in terms of the reduced rank covariance matrix we obtain two significant advantages over the standard parameterization. First, in contrast to the standard parameterization where the loading matrix and the latent common factors enter the model as a product, in the proposed parameterization the singular unobserved components and their reduced rank covariance matrix appear separately. As a result, a better mixing of the Markov chain, which improves the overall efficiency of the algorithm, is obtained. Second, contrary to the standard formulation where a priori constraints must be imposed on the loading matrix to achieve parameter identification1, in the proposed formulation the identification of the parameters associated to the systematic component of the model requires fulfilment of the reduced rank condition only. Moreover, the alternative parameterization in terms of the reduced rank covariance matrix is undoubtedly more natural for the linear dynamic factor model. In this paper we adopt, as prior for the singular covariance matrix2, the noninformative prior distribution first considered by Diaz-Garcia and Gutierrez (2006), and we develop an efficient Metropolis-Hastings (M-H) step that takes explicitly into account the curved geometry of the support of the target distribution. Indeed, the crucial issue in the design of the M-H step is that the support of the target distribution is not a vector space but a manifold. As such, a linear combination of any two elements of this set may not be an element of the same set. For this purpose, we develop a M-H proposal distribution based on a mixture of Wishart Singular distributions.3 In the paper we show that, by fine tuning the setting parameters of this proposal distribution, we succeed in efficiently moving the chain through the whole support of the target density. Further, we notice that the curved geometry of the support has implications for the Bayesian inference on the reduced rank covariance matrix too. In fact, although 1

For example, Geweke and Zhou (1996) and subsequently many other authors have specified the factor loading matrix as a full rank, block lower triangular matrix with diagonal elements strictly positive. 2 In recent years the study of singular distributions has received renewed attention in the literature. Among the analyses about singular random matrices see, e.g., Katri (1968), Uhlig (1994), Diaz-Garcia et al. (1997), Diaz-Garcia and Gutierrez (1997), Srivastava (2003), Diaz-Garcia and Ramos-Quiroga (2003), Diaz-Garcia and Gutierrez (2006) and references therein.

4

posterior means for the parameters are routinely calculated as averages of the simulated samples, the averages thus obtained do not preserve the original structure of the unknown singular covariance matrix. To avoid misleading results, a Bayesian point estimator which preserves such matrix structure is needed. For this purpose, in the paper we propose a Bayesian point estimator obtained by the generalized Choleski decomposition for reduced rank covariance matrices. It is worth noting that, as explained by Diaz-Garcia et al. (1997) and, very recently, by Diaz-Garcia and Gutierrez (2006) and Diaz-Garcia (2007), the singular random matrices considered here possess a density only with respect to Hausdorff measure. Therefore, the prior and posterior densities, as well as the density of the proposal distribution in the M-H step, are specified with respect to Hausdorff measure and integral, instead of the usual Lebesgue measure and integral. However, we will show that, from a practical point of view, working with Hausdorff measure and integral does not require any special attention in the MCMC implementation. Finally, in the paper we consider the case when the number of common factors is unknown; hence the rank of the singular covariance matrix must be selected too. The rank selection will be based on the Bayesian Deviance Information Criterion (DIC), proposed by Spiegelhalter et al. (2002) as a simple alternative to Bayes factors or posterior odds computations. The plan of the paper is as follows. In section two we present our approach for drawing singular covariance matrices in a different context from factor models. That is, for ease of presentation, but without any loss of generality, we start the analysis making inference on a singular normal population. In section three we apply the proposed novel implementation of the MCMC algorithm to the static factor model and we introduce the Bayesian DIC. Finally, an application to exchange rates is presented. In section four we consider the multivariate local level model with common trends and show how our algorithm has to be adapted. The approach is then applied to three monthly US short-term interest rates. Section five concludes.

3

For a definition of the Wishart Singular distribution see, e.g., Diaz-Garcia et al. (1997).

5

2. A Metropolis-Hastings algorithm for singular covariance matrices For illustrative purposes, but without any loss of generality, we initially introduce our approach for drawing singular covariance matrices in a different context from factor models, i.e. we start the analysis making inference on a normal m-dimensional singular population. This model, which can be thought as the simplest Bayesian singular multivariate linear model4, is defined as follows. For the m x 1 vector of observations y t , it is assumed that (1)

y t  ηt ,

ηt ~ SNID(0; Σ ), t  1,...,T ,

where SNID (0; Σ ) denotes that the vector is serially independent and distributed as a singular Normal with zero mean vector and m x m positive semidefinite covariance matrix Σ of known reduced rank n with n  m  T

.

This model can be written in matrix form as (2)

Y  ,

where Y  y 1 ,..., y T  '

T ,n 0, Σ  I T . As in Diaz-Garcia and and P( / Σ )  N Txm

T ,n Gutierrez (2006), we denote by N Txm 0, Σ  IT  the corresponding T x m matrix-

variate singular Normal distribution with rank ( Σ )  n and we propose the following noninformative prior density for Σ : n

(3)

dPΣ    i 2 m  n 1/ 2 dΣ  i 1

where i , i  1,.., n are the non-null eigenvalues of Σ . Diaz-Garcia and Gutierrez (2006) point out that in eq. (3) the measure dΣ  is not the Lebesgue measure, but the Hausdorff measure defined on S m (n) , the manifold of dimension mn-n(n-1)/2 of 4

MCMC methods are not required for this model, because by now the joint posterior distribution for

the parameters of the Bayesian singular multivariate linear model has been analytically obtained by Diaz-Garcia and Gutierrez (2006) and Diaz-Garcia and Ramos-Quiroga (2003), under the assumption of a noninformative prior and a proper generalized natural coniugate prior respectively. Nevertheless, this model allows us to neatly set up the key ingredients of the proposed M-H step.

6

the real symmetric rank-n positive semi-definite mxm matrices with n distinct positive eigenvalues. Moreover, by using the nonsingular part of the spectral decomposition for Σ , the Hausdorff measure dΣ  can be explicitly written in terms of exterior product of differential forms as follows:

(4)

(dΣ )  2

m

Λ

( m n )

n

 

i

  j  ( H1' dH1 ) (dΛ )

i j

where Σ  H1 ΛH1' , is the nonsingular part of the spectral decomposition of Σ , with Λ  diag 1 , 2 ,..., n  , 1  2  ...  n  0 and H1 Vn , m , the Stiefel manifold,

i.e. the set of the real m x n full-rank matrices, such that H1' H1  I n . Notice that (dΛ) is the exterior product of the positive eigenvalues of Σ ; see Uhlig (1994) and DiazGarcia and Gutierrez (1997). Remark: As far as the differential form (H1' dH1 ) is concerned, notice that it is invariant under the transformations H1  QH1 , Q  Om  and H1  H1P, P  On  , where

Om   Vm, m is the so-called orthogonal group. Therefore such differential form defines an (unnormalized) invariant measure on Vn , m ; see James (1954) and Muirhead (1982). This measure is invariant because for any region   Vn , m ,  Q   P     , where    ( H1' dH1 ) ; so that   represents the volume of the region  on the Stiefel





manifold Vn , m . The normalized invariant measure is a probability measure on Vn , m defined as

   

2 n  mn / 2 1   ' '   ( H d H )  , where Vol V  ( H d H )  , n,m 1 1  1 1 Vol Vn ,m   Vol Vn ,m  m Vn ,m n   2

with n  denoting the multivariate Gamma function; see Muirhead (1982), pgg. 67-72.

The likelihood function of the normal singular population can be written as (5)

n  1  l ( Σ / Y )  dP / Σ dΣ    iT / 2 exp  trΣY ' Y dΣ   2  i 1

where Σ  H1Λ 1H1' is the Moore-Penrose generalized inverse of Σ .

7

Thus, the posterior distribution of Σ is dPΣ / Y   l ( Σ / Y )dP Σ dΣ  (6)

n  1    i(T  2m  n 1) / 2 exp  trΣY ' Y dΣ   2  i 1

which, according to the definition given by Diaz-Garcia and Gutierrez (2006), is a mdimensional central, Singular Generalized Inverse Wishart of rank n, with

  T  n  1 degrees of freedom and scale matrix G  Y ' Y , denoted by Wm n, , G  ; that is we have that Σ / Y ~ Wm n, , G  . We now expound the MCMC approach that will be subsequently used in the context of factor models, pretending the posterior distribution of Σ to be unknown and setting up a simple M-H algorithm for drawing from it. Therefore, the target density of the M-H algorithm is (7)

 Σ   l ( Σ / Y )dPΣ dΣ 

and in what follows we develop a proposal distribution from which the candidate values for Σ  S m (n ) can be efficiently drawn. As outlined in the introduction, we explicitly take into account the curved geometry of the support of the target distribution, which prevents us from using standard candidate-generating densities such as the random walk M-H chain; see, e.g., Chib (2001). In fact, the crucial issue in the design of the M-H step is that a linear combination of any two elements of the manifold S m (n ) may not be an element of the same manifold. We resolve this issue by setting up a proposal distribution which is based on a mixture of Wishart Singular distributions. Yet, before presenting such proposal distribution we briefly define the Wishart Singular distribution. The following definition is taken from Diaz-Garcia et al. (1997) and adapted to our objective. If the matrix W has a m-dimensional Wishart Singular distribution with degrees of freedom v and scale matrix C, where both W and C are mxm symmetric positive semi-definite matrices of rank n