Understanding Forecasting: A Unified framework for combining both analyst and strategy forecasts¹

James Sefton, Mark Bulsing and Alan Scowcroft

Abstract

A fund manager needs to process a plethora of forecasts. These forecasts will be a mixture of stock recommendations, sector calls or style plays. He must interpret them, possibly checking them for mutual consistency, before combining them into a final estimate of expected returns. This paper develops a structured framework in which this problem can be expressed precisely. A forecast is based on additional information, be it information directly relating to the likely future performance of a class of securities or information about market processes. We therefore place our analysis within an information-theoretic framework. This has the implication that the information within a forecast is expressed in the forecasted return relative to the consensus forecast, and not relative to any long-term market equilibrium. However, there are strong connections between this framework, the Bayesian framework of Black and Litterman (1992) and the more classical one of Grinold and Kahn (2000). In fact we argue that our more general framework can be understood loosely as a fusion of these two approaches. We draw out the connections and discuss the estimation of our model. The fusion suggests various extensions, which we explore. A major problem for any fund manager is to ensure that he uses a mutually consistent set of forecasts. We discuss two approaches to testing the consistency of any two sets of forecasts. The first is a direct statistical test using the relative risk statistic, defined as the unconditional probability of observing the first set of forecasts divided by the conditional probability of observing the first set of forecasts given the second set. A statistic greater than one indicates possible incompatibility. We derive confidence bounds for this statistic. The second approach focuses more explicitly on the mutual consistency between a set of stock-level forecasts and a set of strategists' or top-down forecasts. We use the top-down forecasts to inform the estimation of the factor returns implicit within a set of stock recommendations. This can be used to investigate whether these stock recommendations are consistent with the strategy calls.

¹ The views and opinions expressed in this article are those of the authors and are not necessarily those of UBS. UBS accepts no liability over the content of the article. It is published solely for informational purposes and is not to be construed as a solicitation or an offer to buy or sell any securities or related financial instruments.


1. The Problem

A fund manager needs to process a plethora of forecasts. These forecasts will be a mixture of stock recommendations, sector calls or style plays. He must interpret them, possibly checking them for mutual consistency, before combining them into a final estimate of expected returns. This paper develops a structured framework in which this problem can be expressed precisely. In a companion paper, Bulsing and Sefton (2004), we illustrate the use of the framework with a set of case studies.

Fund managers have, of course, developed various pragmatic approaches to some of these problems; the most commonly used is stock screening². However, there have been only two attempts to formalise the process mathematically. The first was the Black and Litterman (1992) global asset allocation model, a Bayesian approach based on the Theil (1971) estimator. The second is the more classical approach of Grinold and Kahn (1995), which is framed in terms of conditional expectations. This work is closely related to both. Our framework is primarily an extension of the Grinold and Kahn (1995) approach. However, we make a different assumption about the covariance of the forecasts, one that is in fact implicit within the work of Black and Litterman. This enables us to combine the flexibility of the Grinold and Kahn approach with the analytical tractability of Black and Litterman's model.

The fusion of these two approaches suggests various extensions, which are explored in this paper. First, we use the framework to test the mutual consistency of any two sets of forecasts. Here we employ the relative risk statistic³, defined as the unconditional probability of observing the first set of forecasts divided by the conditional probability of observing the first set of forecasts given the second set. A statistic greater than one indicates possible incompatibility. We derive confidence bounds for this statistic. Secondly, we focus more explicitly on the mutual consistency between a set of stock-level forecasts and a set of strategists' or top-down forecasts. We use the top-down forecasts to inform the estimation of the factor returns implicit within a set of stock recommendations. This can be used to investigate whether these stock recommendations are consistent with the strategy calls.

The structure of the paper is as follows. In the rest of this introduction we look in more detail at the sort of forecasts a fund manager might wish to exploit, and suggest a simple taxonomy of forecast types based around the linear factor model of stock returns. In Section 2 we set out our forecasting framework, and we explore its connections with the Black and Litterman model in Section 3 and with the Grinold and Kahn framework in Section 4. In Section 5 we discuss how our model might be calibrated. In Section 6 we derive our test statistic for the mutual consistency of two sets of forecasts, and in Section 7 we analyse the compatibility of top-down and bottom-up forecasts. Section 8 concludes.

A Taxonomy of Forecast Types

Anyone holding only consensus views on expected returns should hold the benchmark portfolio. As Grinold and Kahn state, ‘active management is forecasting’. A fund manager will base his forecasts on private information of some kind. It may be private market information, or it may be the knowledge embedded in one of his proprietary models. As such, all forecasts used in fund management are conditional forecasts: conditional on some additional information.

² In a stock screen, stocks are ranked into quantiles with respect to each of a number of chosen criteria, e.g. book to price, dividend yield, ROIC etc. These rankings are then added using a chosen set of weights, e.g. an equal weighting. This aggregate ranking can be regarded as a scaled estimate of expected returns. Later we shall discuss how our approach can be used to inform this stock screening technique, by suggesting a ‘best’ set of weights.

³ This statistic is commonly employed in both biostatistics and sociology.


Market forecasts are expressed in many forms; for example, the analyst’s price target or the strategist’s sector view. If we are to build a framework that is flexible enough to embed within it the majority of these forecast types, we must first develop an understanding of these types. We can then develop a simple classification system. Before we proceed further, we must introduce some notation. Assume we can invest in any of n stocks, and denote the returns to these stocks from time t to time t+1 by the n-vector $r_{t+1}$. This return can be decomposed into the vector of consensus or expected returns $\mu_{t+1}$ and the return innovation $\theta_{t+1}$:

$$ r_{t+1} = \mu_{t+1} + \theta_{t+1}. \qquad (1) $$

We shall assume that a linear factor model can describe the return innovation in any period:

$$ \theta_{t+1} = B f_{t+1} + \varepsilon_{t+1} \qquad (2) $$

where $f_{t+1}$ is the k-vector of factor returns, B is the n-by-k matrix of factor sensitivities and $\varepsilon_{t+1}$ is the n-vector of idiosyncratic returns. As usual, the distinction between factor and idiosyncratic returns is that in a sufficiently diversified portfolio the idiosyncratic risk is diversified away. In this paper we are concerned with three forecast types: portfolio, factor and company specific forecasts. These three types refer respectively to forecasts of the return innovation $\theta_{t+1}$, the factor returns $f_{t+1}$ and the idiosyncratic returns $\varepsilon_{t+1}$, and are denoted $g_{t+1}$, $g_{f,t+1}$ and $g_{\varepsilon,t+1}$ respectively. Each forecast could relate to the return of one stock or to the return of a number of stocks. Further, there could be more than one forecast of any type in any given period. To allow for both possibilities, we write the forecasting equations as

$$ \begin{aligned} g_{t+1} &= P\,E(\theta_{t+1}) + \eta_{t+1} \\ g_{f,t+1} &= P_f\,E(f_{t+1}) + \eta_{f,t+1} \\ g_{\varepsilon,t+1} &= P_\varepsilon\,E(\varepsilon_{t+1}) + \eta_{\varepsilon,t+1} \end{aligned} \qquad (3) $$

where the expectation is taken with respect to the private information, and $\eta_{t+1}$, $\eta_{f,t+1}$ and $\eta_{\varepsilon,t+1}$ are the respective forecast errors. Again, we defer a detailed discussion to the next section. However, we need to make a few points about the structure of these equations; we make them with reference to the portfolio forecasting equation, but they apply equally to the other two types. The p-by-n matrix P is a matrix of portfolio weights. Each row of the matrix refers to a separate portfolio forecast, and the elements of the row give the weights of the stocks in the portfolio being forecast⁴. The error vector $\eta_{t+1}$ allows for forecast errors. These errors may arise for any number of reasons, for example data errors, model misspecification or difficulties in interpreting the forecast. Clearly any portfolio forecast can be understood as a combination of factor and company specific forecasts. This follows directly from the relation in equation (4). Section 7 discusses how to effect such a decomposition (this work is a development of the ideas in Chapter 14 of Grinold and Kahn (2000)). This decomposition can be useful for revealing the implied factor expectations in a set of analysts' company forecasts.

⁴ Thus for a single-stock absolute forecast, the corresponding row of the matrix P is a row of zeros except for a 1 in the relevant column. For a single-stock relative forecast, the corresponding row of P is minus the vector of market weights, except that in the relevant column the element is equal to 1 minus that stock's market weight.

Now we are ready to develop a basic classification system for forecasts. The motivation is to demonstrate that almost all forecasts can be understood as one of our three forecast types: portfolio, factor or company specific. In Figure 1 we attempt just such a taxonomy. The principal division is between top-down or strategist forecasts and bottom-up or analyst forecasts. The market makes a very clear distinction between these roles. Both the buy and sell sides of the investment management business employ a host of analysts, who each follow a small number of companies, and a number of strategists, who comment on sectors or regional markets. Recent work by Vuolteenaho (2002) suggests that the strategist's role is not just a task of aggregating up the individual stock forecasts. In fact the roles are very distinct and reflect a clear dichotomy in the way the market works. Stock returns are driven either by news about future cash flows or by news about future discount rates. In an econometric analysis of stock returns over the past 40 years, Vuolteenaho found that the principal driver of individual stock returns was cash flow news. Therefore, at the individual stock level, analysts should concentrate their efforts on forecasting the earnings of their covered stocks. This is, of course, what they do. However, this cash flow news is relatively uncorrelated from one stock to another, so at the portfolio level most of its effects are diversified away (Vuolteenaho, 2002). In contrast, news concerning future discount rates is highly correlated from one stock to another, and so relatively little of it is diversified away. Therefore at the portfolio level, returns are mostly driven by news about future discount rates, even though at the individual stock level this news is less important than cash flow news.
Hence strategists are well advised to concentrate on forecasting future discount rates, and leave analysts to concentrate on forecasting cash flows.

Figure 1: A Taxonomy of Forecast Types. [Tree diagram, not reproduced. Root: Conditional Forecasts, divided into Bottom Up/Micro or Analysts' Forecasts and Top Down/Macro or Strategists' Forecasts. Node labels: Portfolio Forecasts (Model Independent); Sector/Country Forecasts; Style Forecasts; Factor Forecasts; Style Screening; Discount Rate Forecasts; Cost of Capital Forecast; Business Risk (Company Specific Forecasts); Cashflow Forecast; Common Drivers (Portfolio Forecasts); Company Specific Drivers (Company Specific Forecasts).] Source: UBS

It is tempting to assume that all analysts do is forecast future earnings and all strategists do is forecast future discount rates, as this would simplify the discussion enormously. Yet this is too extreme a polarisation, as analysts sometimes do consider the future cost of capital. We have therefore broken down the bottom-up forecasts into forecasts about cash flows and forecasts concerning discount rates. Even so, we believe the majority of bottom-up forecasts are concerned with future cash flows. These forecasts may be specific to one company, such as a change in management or product base, or common to a small number of stocks, such as a change in regulatory requirements. Therefore, in the absence of any other information, these forecasts are best modelled as company specific forecasts. Indeed, this is one way to understand the assumptions implicit in Grinold and Kahn’s approach to modelling ‘multiple assets, multiple forecasts’. Alternatively, these forecasts could be written as a portfolio forecast. In this case, it might be necessary to check that the implied factor forecasts are consistent with those of the strategists. Section 6 discusses how this might be done. Certainly if the information is no longer specific to a few companies, for example if the analyst anticipates that economic conditions for the sector as a whole are changing, the forecast is definitely a portfolio forecast. Analysts do sometimes write about changes in the cost of capital. This may be because changes in the business interests of the company make it more or less sensitive to market conditions, effectively a change in the company’s beta. Such a forecast is company specific, but it concerns the risk characteristics rather than the return to the stock. A forecast of this type is unusual, and can only be modelled rather indirectly in our framework. More common is for analysts to forecast a change in the cost of capital for their sector.
A change to the cost of capital, by definition, cannot be company specific but must relate to a change in the expected return to a priced common factor of returns. A factor forecast can be expressed in a variety of ways. Given a structural factor model of returns, the forecast can be expressed directly as an expected return to one of the model’s factors, a factor forecast. Though such a forecast has the advantage of being transparent or easily interpretable, its disadvantage is that it can be model dependent. An alternative approach is to build a factor-mimicking portfolio and to express the factor forecast as the expected return to this portfolio. Expressed this way it is a portfolio forecast. In this case, it is probably worth checking whether there are implicit company specific forecasts; again, this is relatively easy in this framework. A typical stock screen is a type of factor forecast. In a stock screen all potential investments are ranked according to some criterion or set of criteria, with the most desirable investments given the highest ranks. A common example is to rank stocks by their price to earnings ratio, with the stocks of longer duration (higher P/E) given a higher ranking if risk premiums are forecast to fall, or vice versa. Such a forecast is a factor forecast, but now the betas have been specified as well. In this example, the analyst is forecasting a positive return to a growth factor, and further believes that a stock’s P/E is a good proxy for the sensitivity of that stock to this factor. In this framework, one would include such a forecast either directly as a growth factor forecast or indirectly as a portfolio forecast. In the latter case the portfolio would be long high P/E stocks and short low P/E stocks. Strategists work at the aggregate level. They may make factor forecasts too.
It is this that makes it necessary to sometimes check the consistency of the analysts’ implicit or explicit forecasts for the cost of capital with those of the strategists. Section 6 is devoted to developing this idea. However, strategists also forecast changing market conditions; thus they may believe that current economic conditions favour the basic material sectors, or that the downside political risks in one country are about to ease, or that nominal interest rates will fall, reducing future debt payments for highly geared companies etc. All these types of forecast are naturally expressed as portfolio forecasts.
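The indirect route described above for a stock screen, expressing a P/E ranking as a portfolio forecast that is long high P/E stocks and short low P/E stocks, might be sketched as follows. This is a minimal illustration, not the authors' procedure: the helper name `screen_to_portfolio_row`, the equal weighting within the top and bottom quantiles, and the sample P/E ratios are all assumptions made for the example.

```python
import numpy as np


def screen_to_portfolio_row(pe_ratios, n_quantiles=5):
    """Hypothetical helper: turn a P/E screen into one row of P.

    Stocks in the highest P/E quantile are held long with equal weights
    summing to +1; stocks in the lowest quantile are held short with equal
    weights summing to -1; all other stocks get zero weight.
    """
    pe = np.asarray(pe_ratios, dtype=float)
    n = pe.size
    # Rank 0 = lowest P/E, rank n-1 = highest P/E.
    ranks = pe.argsort().argsort()
    # Map ranks onto quantile buckets 0 .. n_quantiles-1.
    buckets = ranks * n_quantiles // n
    longs = buckets == n_quantiles - 1
    shorts = buckets == 0
    row = np.zeros(n)
    row[longs] = 1.0 / longs.sum()
    row[shorts] = -1.0 / shorts.sum()
    return row


# Illustrative (made-up) P/E ratios for ten stocks.
pe = [8, 12, 15, 20, 30, 40, 10, 25, 18, 35]
p_row = screen_to_portfolio_row(pe)
print(p_row)
```

A forecast that the growth factor will earn, say, 2% would then enter the forecasting equation as one extra row `p_row` of P, with the corresponding element of g set to 0.02. Because the weights sum to zero, the forecast is a long-short spread rather than an absolute return view.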


However, there is one important final point: a degree of iteration may be required to find the best way to express a strategy forecast. This is best illustrated with a topical example. Market conditions are expected to improve, and it is forecast that the industries manufacturing investment goods will disproportionately benefit from the upturn, as their values were particularly badly hit by the previous recession. The first thought might be to express this forecast as a positive return to a portfolio that is long the manufacturing companies and short the market. However, these industrial companies tend to have high market betas, and so this forecast could be interpreted as implying a positive market factor return; such an event would in all probability deliver a positive return to this high-beta portfolio. Yet the forecast is more specific than that, and further information needs to be included. This could be achieved by including a market portfolio forecast as well, or by including a less positive forecast for a portfolio that is long other high-beta stocks and short the market.
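The high-beta point can be made concrete with a little arithmetic: the market beta of the long-short portfolio is its weight vector multiplied by the stocks' betas. The sketch below uses six hypothetical stocks with made-up betas (chosen so that the equally weighted market has a beta of exactly one); none of these numbers come from the paper.

```python
import numpy as np

# Hypothetical market betas; the first three stocks are the
# investment-goods manufacturers, which tend to have high betas.
betas = np.array([1.4, 1.3, 1.5, 0.6, 0.5, 0.7])
market_w = np.full(6, 1.0 / 6.0)             # equal market weights (assumption)
assert np.isclose(market_w @ betas, 1.0)     # the market portfolio has beta one

# Long the manufacturers (equal weights), short the market.
long_w = np.array([1 / 3, 1 / 3, 1 / 3, 0.0, 0.0, 0.0])
p_row = long_w - market_w

net_beta = p_row @ betas                     # 1.4 - 1.0 = 0.4
market_move = 0.05                           # a hypothetical 5% market factor return
print(f"net beta {net_beta:.2f}; return from the market move alone {net_beta * market_move:.2%}")
```

With a net beta of about 0.4, a 5% market rise alone would deliver roughly 2% to this portfolio, which is why an accompanying market forecast, or a forecast for a portfolio of other high-beta stocks, is needed to stop the sector view being read simply as a market call.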


2. The Basic Framework

As we have argued, investors may have very different subjective expectations of future returns, possibly because they have access to different information, or possibly because they have very different views on market mechanisms. We attempt to codify this as follows. Assume an investor’s information at time t on the market is denoted by the set $I_t^+$, which consists of publicly available information $I_t$, his own private information $P_t$ and his prior on the model parameters, $\theta$. Thus

$$ I_t^+ = \underbrace{I_t}_{\text{Public information}} \cup \underbrace{P_t}_{\text{Private information}} \cup \underbrace{\theta}_{\text{Distributional prior information}} \qquad (5) $$

Investors employ their information to make the best forecast they can of future returns: their conditional expected return forecast. We shall denote the conditional forecast of next period’s return given only public information as $\mu_{t+1}$ and refer to it as the consensus forecast. We shall call the difference between this consensus forecast and the investor’s forecast based on his additional information the incremental forecast, and denote it as $\mu_{t+1}^+$; thus

$$ \mu_{t+1} = E(r_{t+1} \mid I_t) \quad \text{and} \quad \mu_{t+1}^+ = E(r_{t+1} \mid I_t^+) - E(r_{t+1} \mid I_t). \qquad (6) $$

Clearly, if the investor has only publicly available information, then his forecast will equal the publicly available forecast and the incremental forecast will be zero. This is in line with Fama’s (1970) definition of market efficiency. Equation (7) decomposes total stock returns into their expected component and the return innovation $\theta_{t+1}$. Thus the incremental forecast $\mu_{t+1}^+$ can be regarded as the best forecast of $\theta_{t+1}$ given the investor’s additional information. Recently there has been a considerable volume of literature investigating the predictability of asset returns; this work is well summarised in Cochrane (2001, Chapter 20). However, these researchers are investigating whether there is evidence of predictable patterns or cycles in the expected return to the market given publicly available information, that is, whether $\mu_{t+1} \neq \mu_t$. They are therefore looking for variables in $I_t$ that predict returns. Both the dividend to price ratio and the term premium have repeatedly been shown to have some power to forecast market returns. Further, Fama and French (1989) found that the default spread and even the return on cash can predict future returns too. We, by contrast, are concerned with the advantage that private information or developed intuition (model priors) can give in forecasting returns. This distinction is perhaps not always as clear as the terminology might suggest. For example, consider market flow information, which may be useful in predicting short-term market momentum. This information is generally the private information of the large brokerage houses, but can usually be purchased from them for a fee. Does this make it public? The same goes for analyst reports. In practice, therefore, there are costs to acquiring most types of information, and every investor has to decide whether to acquire a given piece of information based on its costs versus its investment benefits.
In this respect, we have in mind a market very much in the spirit of the stylised model of Grossman and Stiglitz (1980). The theory of efficient markets makes the assumption that all investors act as if they know the ‘true’ structural model describing asset returns. However, even if investors agree on the structural form, they still need to estimate or ‘learn about’ the parameters of the model. (Disagreement about structural forms can always be rephrased as a disagreement about parameters in a more general framework that embeds the differing forms as special cases.) In estimating these parameters, investors (consciously or unconsciously) will undoubtedly use their priors or beliefs about the world to improve the efficiency of the process. One example is the choice of the data sample on which to estimate the model. This choice may be based on expediency or on rational arguments, but either way the parameter estimates are a function of this choice. Another example is whether to assume returns are normally or lognormally distributed. All this must imply that investors’ return expectations will be a function of their individual beliefs or priors. Detemple and Murphy (1993) explore the influence of these priors on the equilibrium pricing of assets in a very simplified model. The incremental forecast $\mu_{t+1}^+$ in equation (8) describes the effects of any additional private information and individually held priors on future expected returns. It does, however, exclude the element of returns that is forecastable simply from the use of publicly available information. Is there any evidence that better informed investors can better predict, and therefore regularly outperform, other investors? A growing number of papers find evidence of ‘hot hand’ funds, funds that consistently outperform. Bollen and Busse (2002) and Wermers (2003) find support for hot hands in the US, and Blake and Timmermann (1999) and Tonks (2002) find similar support in the UK. Further, Grinblatt and Titman (1987, 1992, 1993) and Wermers (1997, 2000) find that mutual funds do pick stocks that on average outperform their benchmark before costs. They found this ability to be particularly strong amongst growth-orientated funds.

The Portfolio Forecasting Equation

This section derives the central results for portfolio forecasts only. The approach is identical for the other forecast types, so those results are simply quoted in the next section. If the fund manager received information on the implied incremental forecasts $\mu_{t+1}^+$ for every stock in his universe, then his forecasting work would be finished. Unfortunately he does not observe $\mu_{t+1}^+$. He does, however, receive a wide variety of forecasts from very different sources that incorporate some of this information, albeit rather imperfectly, possibly because of forecast errors and possibly because they need ‘interpreting’. For example, a forecast might estimate that a particular sector is underpriced by x%; some assumption then needs to be made about the speed of adjustment to fair price before this forecast can be put into return units. The fund manager’s problem is therefore to combine these very different forecasts consistently so as to reveal the implied underlying incremental forecast $\mu_{t+1}^+$. The forecasting equation is our representation of the relationship between the information or forecasts available to the fund manager, $g_{t+1}$, and the underlying incremental forecasts $\mu_{t+1}^+$. Formally, let the p-vector $g_{t+1}$ be the return forecasts for the p portfolios given by the p-by-n matrix P; then

$$ g_{t+1} = P\mu_{t+1}^+ + \eta_{t+1} \qquad (9) $$

where $\eta_{t+1}$ is a random error. This equation is flexible enough to embed a stock forecast; in this case the relevant row of the P matrix is all zeros except for the one element corresponding to the stock, which is set to 1. It can incorporate a sector or regional forecast by setting the relevant row to the weights of the stocks in the sector or regional portfolio. Further, this forecast could be expressed as an absolute return or as a return relative to the market, by expressing the portfolio weights in absolute or relative terms. In order to estimate the underlying incremental forecasts from our observed forecasts, we make the following strong but standard assumptions:

Assumption 1: The incremental forecast $\mu_{t+1}^+$, the stock return innovation $\theta_{t+1}$ and the forecast error $\eta_{t+1}$ are jointly normally distributed.

There is, of course, evidence that the return distribution is positively skewed; however, normality is a reasonable working assumption, and relaxing it would significantly complicate the analysis. If returns are jointly normally distributed, then the conditional means of the incremental forecasts given the observed forecasts are equal to the linear least squares estimates of the incremental forecasts given the observed forecasts. This is often called the Gauss-Markov property; see for example Theorem 1.2.11 of Muirhead (1982, p. 12). Therefore, given Assumption 1,

$$ E(\mu_{t+1}^+ \mid g_{t+1}, I_t) = \operatorname{Cov}(r_{t+1}, g_{t+1})\,\operatorname{Var}(g_{t+1})^{-1}\, g_{t+1} \qquad (10) $$
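Equation (10) rests on the fact that, under joint normality, the conditional mean coincides with the linear least squares projection. A quick Monte Carlo check makes this tangible: simulate incremental forecasts, innovations and noisy portfolio forecasts, regress the innovation on the forecasts, and compare the fitted slopes with the population matrix Cov(θ, g)Var(g)⁻¹. The covariance matrices, the two forecast portfolios and the sample size are assumptions chosen for illustration; the unforecastable part of the innovation and the forecast errors are drawn independently of everything else, so Cov(θ, g) = Var(µ⁺)P′ holds by construction here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 3, 2, 200_000                      # stocks, forecasts, simulated periods

# Assumed covariance of the incremental forecast mu+.
var_mu = 0.01 * np.array([[1.0, 0.3, 0.1],
                          [0.3, 1.0, 0.2],
                          [0.1, 0.2, 1.0]])
var_u = 0.03 * np.eye(n)                     # unforecastable part of theta
P = np.array([[1.0, 0.0, 0.0],               # absolute view on stock 0
              [0.0, 1.0, -1.0]])             # stock 1 relative to stock 2
Omega = np.diag([0.005, 0.002])              # forecast-error covariance

mu_plus = rng.multivariate_normal(np.zeros(n), var_mu, size=T)
theta = mu_plus + rng.multivariate_normal(np.zeros(n), var_u, size=T)
g = mu_plus @ P.T + rng.multivariate_normal(np.zeros(p), Omega, size=T)

# Least squares regression of theta on g (all means are zero: no intercept).
coef, *_ = np.linalg.lstsq(g, theta, rcond=None)   # shape (p, n)
K_empirical = coef.T                                # shape (n, p)

# Population projection Cov(theta, g) Var(g)^{-1} = var_mu P' (P var_mu P' + Omega)^{-1}.
K_theory = var_mu @ P.T @ np.linalg.inv(P @ var_mu @ P.T + Omega)
print(np.round(K_empirical, 3))
print(np.round(K_theory, 3))
```

The two matrices should agree up to sampling noise; the point is only that the regression slope and the covariance formula coincide, which is what allows the conditional mean to be written in the closed form (10).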

This is the basic forecasting equation. Unfortunately, it is difficult to use this equation in all but the simplest situations, because we generally have little information on the structure of the covariance matrices. Further, only in very rare circumstances do we have a time history of forecasts from which to estimate these matrices. It is therefore necessary to make some further assumptions.

Assumption 2: The forecasts are unbiased; by this we understand that the forecast error $\eta_{t+1}$ is uncorrelated with the incremental forecast $\mu_{t+1}^+$.

With Assumption 2 we are assuming that there are no consistent errors in the forecasts. This may show a rather naïve faith in the quality of the forecasts. Any preliminary analysis will immediately reveal that analysts’ forecasts are habitually over-optimistic. It will therefore probably be necessary to adjust or demean the raw forecasts for the analysts’ naturally positive outlook on life. There are also the well-documented conflicts of interest that are likely to induce a bias. However, it seems reasonable to assume that where there may be such a conflict, the forecast will not be used for investment purposes. Assumption 2 is therefore best understood as referring only to forecasts of investment grade.

Lemma 1: Denote the covariances of the stock return innovations $\theta_{t+1}$ and the forecast errors $\eta_{t+1}$ by $\operatorname{Var}(\theta_{t+1}) = V$ and $\operatorname{Var}(\eta_{t+1}) = \Omega$ respectively. Then, given Assumptions 1 and 2, the best estimates of the underlying incremental returns and of the returns, given forecasts $g_{t+1}$ and public information $I_t$, are

$$ \begin{aligned} E(\mu_{t+1}^+ \mid g_{t+1}, I_t) &= \operatorname{Var}(\mu_{t+1}^+)P'\left(P\operatorname{Var}(\mu_{t+1}^+)P' + \Omega\right)^{-1} g_{t+1} \\ E(r_{t+1} \mid g_{t+1}, I_t) &= \mu_{t+1} + \operatorname{Var}(\mu_{t+1}^+)P'\left(P\operatorname{Var}(\mu_{t+1}^+)P' + \Omega\right)^{-1} g_{t+1} \end{aligned} \qquad (11) $$

and the expected covariances of these estimates are given by

$$ \begin{aligned} \operatorname{Var}(\mu_{t+1}^+ \mid g_{t+1}, I_t) &= \operatorname{Var}(\mu_{t+1}^+) - \operatorname{Var}(\mu_{t+1}^+)P'\left(P\operatorname{Var}(\mu_{t+1}^+)P' + \Omega\right)^{-1} P\operatorname{Var}(\mu_{t+1}^+) \\ \operatorname{Var}(r_{t+1} \mid g_{t+1}, I_t) &= V - \operatorname{Var}(\mu_{t+1}^+)P'\left(P\operatorname{Var}(\mu_{t+1}^+)P' + \Omega\right)^{-1} P\operatorname{Var}(\mu_{t+1}^+) \end{aligned} \qquad (12) $$

Proof: This is relegated to the Appendix, which shows that Assumption 2 implies the incremental forecasts are uncorrelated with the return innovations. The proof is then a straightforward application of Theorem 1.2.11 of Muirhead (1982, p. 12).
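Lemma 1 translates directly into a few lines of linear algebra. The sketch below also builds rows of P as in footnote 4 (an absolute single-stock row, and a relative row equal to minus the market weights plus one in the stock's own column). The function names, the three-stock universe, the market weights and every covariance input are hypothetical, and a linear solve is used in place of an explicit inverse purely for numerical hygiene.

```python
import numpy as np


def absolute_row(n, i):
    """Row of P for an absolute forecast on stock i (footnote 4)."""
    row = np.zeros(n)
    row[i] = 1.0
    return row


def relative_row(market_w, i):
    """Row of P for stock i relative to the market: -w, plus 1 in column i."""
    row = -np.asarray(market_w, dtype=float)
    row[i] += 1.0
    return row


def lemma1_posterior(P, g, var_mu, Omega, V, mu_consensus):
    """Posterior means (11) and covariances (12) of mu+ and r given g."""
    S = P @ var_mu @ P.T + Omega              # Var(g | I_t), p x p
    K = np.linalg.solve(S, P @ var_mu).T      # Var(mu+) P' S^{-1}, n x p
    mean_incr = K @ g                         # E(mu+ | g, I_t)
    mean_ret = mu_consensus + mean_incr       # E(r | g, I_t)
    shrink = K @ P @ var_mu                   # Var(mu+) P' S^{-1} P Var(mu+)
    return mean_incr, mean_ret, var_mu - shrink, V - shrink


# Hypothetical three-stock example.
market_w = np.array([0.5, 0.3, 0.2])
P = np.vstack([absolute_row(3, 0),            # stock 0: +2% relative to consensus
               relative_row(market_w, 2)])    # stock 2: beats the market by 1%
g = np.array([0.02, 0.01])
V = np.array([[0.04, 0.01, 0.01],
              [0.01, 0.05, 0.02],
              [0.01, 0.02, 0.06]])
var_mu = 0.1 * V                              # an assumed Var(mu+)
Omega = np.diag([0.004, 0.002])
mu_consensus = np.array([0.06, 0.07, 0.05])

m_incr, m_ret, c_incr, c_ret = lemma1_posterior(P, g, var_mu, Omega, V, mu_consensus)
print(np.round(m_incr, 4))
```

Note that stock 1, on which no view is expressed, still receives a non-zero incremental forecast through its correlation with the other two stocks; this spill-over is exactly what the term Var(µ⁺)P′ in (11) produces.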


To proceed any further, we must make some assumption about the covariance of the conditional incremental returns.

Assumption 3: The covariance of incremental forecasts is proportional to the covariance of the stock return innovations:

$$ \operatorname{Var}(\mu_{t+1}^+ \mid I_t) = \tau^2 V \qquad (13) $$

We are therefore assuming that the component of returns that is forecastable from our private information is correlated in an identical way to the return innovation. The constant τ can be regarded as a coefficient of forecastability; τ² is the proportion of the return innovation variance that is forecastable. Given this assumption and Lemma 1, we can write down the distributions of the best estimates of the unobserved incremental forecasts and of the returns:

$$ \begin{aligned} \mu_{t+1}^+ \mid g_{t+1}, I_t &\sim N\!\left(\tau^2 VP'(\tau^2 PVP' + \Omega)^{-1} g_{t+1},\; \tau^2 V - \tau^4 VP'(\tau^2 PVP' + \Omega)^{-1} PV\right) \\ r_{t+1} \mid g_{t+1}, I_t &\sim N\!\left(\mu_{t+1} + \tau^2 VP'(\tau^2 PVP' + \Omega)^{-1} g_{t+1},\; V - \tau^4 VP'(\tau^2 PVP' + \Omega)^{-1} PV\right) \end{aligned} \qquad (14) $$
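Why τ² rather than τ measures the forecastable share: because µ⁺ is a conditional expectation of θ, the unforecastable remainder θ − µ⁺ is uncorrelated with µ⁺, so under Assumption 3 its covariance is (1 − τ²)V and the variance of θ splits into τ²V plus (1 − τ²)V. The short simulation below checks this decomposition for an assumed τ = 0.3 and a made-up two-stock V; it is an illustration only, not an estimation procedure for τ.

```python
import numpy as np

rng = np.random.default_rng(2)
T, tau = 500_000, 0.3
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])                  # assumed innovation covariance

# Forecastable part with covariance tau^2 V (Assumption 3) and an
# independent unforecastable remainder with covariance (1 - tau^2) V.
mu_plus = rng.multivariate_normal(np.zeros(2), tau**2 * V, size=T)
resid = rng.multivariate_normal(np.zeros(2), (1 - tau**2) * V, size=T)
theta = mu_plus + resid

share = mu_plus.var(axis=0) / theta.var(axis=0)   # forecastable variance share
print(np.round(share, 3))                          # both close to tau**2 = 0.09
print(np.round(np.cov(theta, rowvar=False), 4))    # close to V
```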

In a later section we will discuss how we might estimate the coefficient of forecastability τ and the error covariance Ω. It is worth pointing out that if all one is interested in is the central estimate of the revealed incremental returns, then all one cares about is the ratio Ω/τ², for dividing through by τ² gives

$$ \begin{aligned} E(\mu_{t+1}^+ \mid g_{t+1}, I_t) &= VP'(PVP' + \tau^{-2}\Omega)^{-1} g_{t+1} \\ E(r_{t+1} \mid g_{t+1}, I_t) &= \mu_{t+1} + VP'(PVP' + \tau^{-2}\Omega)^{-1} g_{t+1} \end{aligned} \qquad (15) $$
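The observation that only Ω/τ² matters for the central estimate is easy to confirm numerically: scaling τ² and Ω by the same factor leaves the posterior mean in (14) unchanged, while the posterior covariance of µ⁺ scales with τ². The inputs below (a two-stock V, a single relative view and the error variances) are hypothetical.

```python
import numpy as np


def posterior_mean(P, g, V, Omega, tau):
    """Mean of mu+ given g from (14): tau^2 V P' (tau^2 P V P' + Omega)^{-1} g."""
    S = tau**2 * P @ V @ P.T + Omega
    return tau**2 * V @ P.T @ np.linalg.solve(S, g)


def posterior_cov(P, V, Omega, tau):
    """Covariance of mu+ given g from (14)."""
    S = tau**2 * P @ V @ P.T + Omega
    return tau**2 * V - tau**4 * V @ P.T @ np.linalg.solve(S, P @ V)


V = np.array([[0.04, 0.012],
              [0.012, 0.09]])
P = np.array([[1.0, -1.0]])        # stock 0 relative to stock 1
g = np.array([0.02])
Omega0 = np.array([[0.05]])

# Same ratio Omega / tau^2 = 0.05 in both cases.
mean_a = posterior_mean(P, g, V, 0.04 * Omega0, tau=0.2)
mean_b = posterior_mean(P, g, V, 0.16 * Omega0, tau=0.4)
cov_a = posterior_cov(P, V, 0.04 * Omega0, tau=0.2)
cov_b = posterior_cov(P, V, 0.16 * Omega0, tau=0.4)
print(np.allclose(mean_a, mean_b), np.allclose(cov_b, 4 * cov_a))   # True True
```

So a calibration that fixes only the ratio Ω/τ² pins down the central forecast, but the uncertainty around it still depends on τ itself.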

The parameters Ω and τ excepted, all other parameters are known or can be estimated from a history of stock return data. In the next section we shall extend this framework to the other forecasts types.

Company Specific and Factor Forecasts and a decomposition of expected returns

In this section we discuss the derivation of the best conditional forecast of expected returns given any combination of company specific, factor or portfolio forecasts. We shall also show how the conditional expected return forecast can be decomposed into the implied factor returns and the implied idiosyncratic returns. This decomposition is used later to check the internal consistency of a set of forecasts. The assumptions underlying the derivation are the obvious analogues of those in the previous section, so we keep the discussion brief. We first need some notation. Equation (16) describes a linear factor model of the return innovations $\theta_{t+1}$. As the factor and idiosyncratic returns have been defined with respect to the innovation rather than the total return, their expected values given only public information are zero:

$$ E(f_{t+1} \mid I_t) = 0 \quad \text{and} \quad E(\varepsilon_{t+1} \mid I_t) = 0. \qquad (17) $$

We shall denote the conditional expected factor and idiosyncratic returns given the full information set as

$$ \varphi_{t+1}^+ = E(f_{t+1} \mid I_t^+) \quad \text{and} \quad \alpha_{t+1}^+ = E(\varepsilon_{t+1} \mid I_t^+). \qquad (18) $$

The factor and company specific forecasting equations reveal information about these expected factor and company specific returns. We write the basic forecasting equations as

$$ \begin{aligned} g_{f,t+1} &= P_f\,\varphi_{t+1}^+ + \eta_{f,t+1} \\ g_{\varepsilon,t+1} &= P_\varepsilon\,\alpha_{t+1}^+ + \eta_{\varepsilon,t+1} \end{aligned} \qquad (19) $$

where $\eta_{f,t+1}$ and $\eta_{\varepsilon,t+1}$ are random forecast errors. As before, we shall assume that all the relevant processes are jointly normally distributed. The covariances of the factor and idiosyncratic returns will be denoted by $\operatorname{Var}(f_{t+1}) = F$ and $\operatorname{Var}(\varepsilon_{t+1}) = D$ respectively. As the factor returns are assumed to be uncorrelated with the idiosyncratic returns, the covariance matrix of the return innovations is

$$ \operatorname{Var}(\theta_{t+1}) = V = BFB' + D \qquad (20) $$

As before, we assume the random forecast errors are uncorrelated with the return innovations $\theta_{t+1}$; that is, the forecasts are unbiased. Further, though this is simply for notational ease, we assume that the random errors in the portfolio, factor and company specific forecasts are mutually uncorrelated. The final assumption concerns the covariance of the forecastable elements of the factor and idiosyncratic return processes. These covariance matrices are assumed to be proportional to the covariances of the underlying processes,

$$\mathrm{Var}(\phi^{+}_{t+1}) = \tau^2 F \quad \text{and} \quad \mathrm{Var}(\alpha^{+}_{t+1}) = \tau^2 D. \tag{21}$$

The assumption that the constant of proportionality is the same in each case is for ease only and could also be relaxed. We can now write down the joint distribution of returns and forecasts. However, rather than the joint distribution of the forecasts and the return innovations, we write the joint distribution of the forecasts and the factor and idiosyncratic returns. This allows us to derive the conditional expected factor and idiosyncratic returns. As the conditional expected return innovation is the beta matrix B times the conditional factor returns plus the conditional idiosyncratic returns, this also delivers the promised decomposition. The joint distribution of returns and forecasts given public information is

$$\begin{pmatrix} f_{t+1} \\ \varepsilon_{t+1} \\ g_{t+1} \\ g_{f,t+1} \\ g_{\varepsilon,t+1} \end{pmatrix} \Big|\, I_t \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} F & 0 & \tau^2 FB'P' & \tau^2 FP_f' & 0 \\ 0 & D & \tau^2 DP' & 0 & \tau^2 DP_\varepsilon' \\ \tau^2 PBF & \tau^2 PD & \tau^2 PVP' + \Omega & \tau^2 PBFP_f' & \tau^2 PDP_\varepsilon' \\ \tau^2 P_f F & 0 & \tau^2 P_f FB'P' & \tau^2 P_f FP_f' + \Omega_f & 0 \\ 0 & \tau^2 P_\varepsilon D & \tau^2 P_\varepsilon DP' & 0 & \tau^2 P_\varepsilon DP_\varepsilon' + \Omega_\varepsilon \end{pmatrix} \right). \tag{22}$$

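Given this distribution, we can calculate the expected factor and idiosyncratic returns conditioned on the forecasts. Theorem 1.2.11 of Muirhead (1982, p. 12) implies that

$$E\left( \begin{pmatrix} f_{t+1} \\ \varepsilon_{t+1} \end{pmatrix} \Big|\, g_{t+1}, g_{f,t+1}, g_{\varepsilon,t+1}, I_t \right) = \begin{pmatrix} \tau^2 FB'P' & \tau^2 FP_f' & 0 \\ \tau^2 DP' & 0 & \tau^2 DP_\varepsilon' \end{pmatrix} \begin{pmatrix} \tau^2 PVP' + \Omega & \tau^2 PBFP_f' & \tau^2 PDP_\varepsilon' \\ \tau^2 P_f FB'P' & \tau^2 P_f FP_f' + \Omega_f & 0 \\ \tau^2 P_\varepsilon DP' & 0 & \tau^2 P_\varepsilon DP_\varepsilon' + \Omega_\varepsilon \end{pmatrix}^{-1} \begin{pmatrix} g_{t+1} \\ g_{f,t+1} \\ g_{\varepsilon,t+1} \end{pmatrix}. \tag{23}$$

Finally, the expected stock returns given our set of forecasts are obtained by combining these conditional expectations, B times the factor component plus the idiosyncratic component:

$$\begin{aligned} E\left(r_{t+1} \mid g_{t+1}, g_{f,t+1}, g_{\varepsilon,t+1}, I_t\right) - \mu_{t+1} &= \begin{pmatrix} B & I \end{pmatrix} E\left( \begin{pmatrix} f_{t+1} \\ \varepsilon_{t+1} \end{pmatrix} \Big|\, g_{t+1}, g_{f,t+1}, g_{\varepsilon,t+1}, I_t \right) \\ &= \begin{pmatrix} \tau^2 VP' & \tau^2 BFP_f' & \tau^2 DP_\varepsilon' \end{pmatrix} \begin{pmatrix} \tau^2 PVP' + \Omega & \tau^2 PBFP_f' & \tau^2 PDP_\varepsilon' \\ \tau^2 P_f FB'P' & \tau^2 P_f FP_f' + \Omega_f & 0 \\ \tau^2 P_\varepsilon DP' & 0 & \tau^2 P_\varepsilon DP_\varepsilon' + \Omega_\varepsilon \end{pmatrix}^{-1} \begin{pmatrix} g_{t+1} \\ g_{f,t+1} \\ g_{\varepsilon,t+1} \end{pmatrix}, \end{aligned} \tag{24}$$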

where we have used equation (20) to simplify the first term on the right hand side. We spare the reader the rather unwieldy expression for the covariance of these expectations⁵; it can be derived from the joint distribution in equation (22) exactly as in the previous section. Equation (23) decomposes the forecasts into implied factor and idiosyncratic returns, and equation (24) is the fundamental conditioning equation: it shows how portfolio, factor and company specific forecasts can be combined to derive the best estimate of expected returns given the additional forecast information available.

In the next section, we suggest an interpretation of the Grinold and Kahn multi-asset forecasting procedure. Their procedure focuses almost exclusively on processing bottom up analyst forecasts, and is best represented as assuming one has a full set of company specific forecasts. In contrast, Black and Litterman (1992) and Litterman (2003), in their global asset allocation model, focus almost exclusively on top down forecasts. In Section 3 we show the equivalence between their approach and one based on processing portfolio forecasts. Equation (24) can be understood as unifying the two in a general framework that processes both bottom up and top down forecasts.

Equation (24) can be given the following interpretation. The forecasts are first transformed, or rotated, into a set of independent forecasts. The rotation matrix is the inverse of any square root⁶ of the 3-by-3 block covariance matrix on the right hand side of (24). If we denote the transformed forecasts by a superscript ‘*’, and the square root in the standard way, then

$$\begin{pmatrix} g^*_{t+1} \\ g^*_{f,t+1} \\ g^*_{\varepsilon,t+1} \end{pmatrix} = \begin{pmatrix} \tau^2 PVP' + \Omega & \tau^2 PBFP_f' & \tau^2 PDP_\varepsilon' \\ \tau^2 P_f FB'P' & \tau^2 P_f FP_f' + \Omega_f & 0 \\ \tau^2 P_\varepsilon DP' & 0 & \tau^2 P_\varepsilon DP_\varepsilon' + \Omega_\varepsilon \end{pmatrix}^{-1/2} \begin{pmatrix} g_{t+1} \\ g_{f,t+1} \\ g_{\varepsilon,t+1} \end{pmatrix}. \tag{33}$$

⁵ Available on request.
⁶ A square root of the matrix Y is any matrix X that satisfies XX′ = Y.

Now these forecasts are independent and have unit covariance, that is $\mathrm{Var}\big[(g^{*\prime}_{t+1}, g^{*\prime}_{f,t+1}, g^{*\prime}_{\varepsilon,t+1})'\big] = I$. We can therefore analyse each of these forecasts separately. Each forecast is multiplied by its covariance vector, the vector of covariances between it and each of the individual asset return series; this gives the expected asset returns conditioned on that single forecast. The best estimate of returns conditioned on all forecasts is just the sum of the implications of each of these independent forecasts. To summarise: the forecasts are first transformed into an independent set; once independent, they can be processed separately, and the conditional expected returns are just the sum of the individual effects.
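The rotation argument can be checked numerically. The sketch below is illustrative only: it assumes an arbitrary positive-definite forecast covariance `S` and an arbitrary covariance block `C` between returns and forecasts (random values, not calibrated ones), and takes the lower Cholesky factor as the square root `X`, which is one valid choice in the sense of footnote 6 (XX′ = S). It rotates the forecasts, processes each rotated forecast on its own, and compares the sum with the one-step result C S⁻¹ g.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 5                                   # number of forecasts, number of assets
A = rng.normal(size=(m, m))
S = A @ A.T + m * np.eye(m)                   # forecast covariance (positive definite, assumed)
C = rng.normal(size=(n, m))                   # Cov(returns, forecasts), assumed
g = rng.normal(size=m)                        # raw forecasts

X = np.linalg.cholesky(S)                     # one square root: X @ X.T == S
g_star = np.linalg.solve(X, g)                # rotated forecasts X^{-1} g, with Var(g*) = I
C_star = np.linalg.solve(X, C.T).T            # Cov(returns, g*) = C X'^{-1}

# Each independent forecast shifts expected returns by its covariance vector times its value
contributions = [C_star[:, j] * g_star[j] for j in range(m)]
combined = np.sum(contributions, axis=0)

direct = C @ np.linalg.solve(S, g)            # C S^{-1} g in one step
print(np.allclose(combined, direct))          # True
```

Any other square root (for example a symmetric one) would give different rotated forecasts but the same summed result, since C X′⁻¹ X⁻¹ = C S⁻¹ for every X with XX′ = S.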

The final point concerns the decomposition in equation (23). Grinold and Kahn discuss the motivation for a decomposition of this type in their Appendix to Chapter 14. It enables one to examine the factor positions implicit in a set of forecasts. Thus if we processed a set of bottom up analyst forecasts as portfolio forecasts, we could investigate the implied factor positions within these forecasts; for example, whether the individual stock calls are consistent with a value or a growth call for the market as a whole. The decomposition, though, is very different from the one proposed by Grinold and Kahn. The decomposition in equation (23) is the ‘natural’ decomposition within this conditional framework: it is the maximum likelihood decomposition, that is, of all possible decompositions, the one that is most likely in a clearly defined probabilistic sense. Grinold and Kahn effect their decomposition independently of the framework used for estimating the expected returns, by projecting the vector of expected returns onto the space spanned by the set of minimum variance factor mimicking portfolios. It is not clear why this set is any more desirable than any other, say the set of factor mimicking portfolios with the lowest tracking error or the set closest to the benchmark. Before we discuss calibrating the model, we first consider the connections between our approach and that of Grinold and Kahn (2000), and the Bayesian approach of Black and Litterman (1992).
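As an illustration of reading implied factor views from bottom-up calls, here is a minimal NumPy sketch under stated assumptions: a single hypothetical "value" factor, four stocks with exposures +1, +1, −1, −1, stock forecasts processed as portfolio forecasts with P = I, and made-up values for F, D, Ω and τ. It computes the implied factor and idiosyncratic returns from (23) and checks that B times the former plus the latter gives the expected return innovation in (24).

```python
import numpy as np

tau2 = 0.3 ** 2                                 # tau^2 (assumed)
B = np.array([[1.0], [1.0], [-1.0], [-1.0]])   # exposures to a hypothetical value factor
F = np.array([[0.02]])                          # value factor variance (assumed)
D = 0.03 * np.eye(4)                            # idiosyncratic covariance (assumed)
V = B @ F @ B.T + D                             # equation (20)
P = np.eye(4)                                   # one portfolio forecast per stock
Omega = 0.05 * np.eye(4)                        # forecast error covariance (assumed)
g = np.array([0.05, 0.05, -0.05, -0.05])        # bullish on value stocks, bearish on growth

w = np.linalg.solve(tau2 * P @ V @ P.T + Omega, g)
phi_hat = tau2 * F @ B.T @ P.T @ w              # implied factor return, top block of (23)
alpha_hat = tau2 * D @ P.T @ w                  # implied idiosyncratic returns, bottom block of (23)
excess = tau2 * V @ P.T @ w                     # E(r | g, I_t) - mu, equation (24)

print(phi_hat)                                  # positive: the stock calls imply a value call
print(np.allclose(B @ phi_hat + alpha_hat, excess))   # True
```

In this symmetric setup the factor accounts for 0.08/0.11, roughly 73%, of each stock's refined forecast, so these four stock calls would read largely as a single value call.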


3. Connections to the Black and Litterman Forecasting Model

Black and Litterman (1992) developed an approach for combining estimates of equilibrium returns with the additional market views of the investor. Their model is set in a Theil (1971) Bayesian framework where, as Scowcroft and Satchell (2000) point out, the investor's views must be seen as the priors, which are updated in the light of data on the expected equilibrium returns. In this section, we discuss the connections between this Bayesian framework and our approach based on conditional expectations. The Black and Litterman (1992) model is a Global Asset Allocation model. Hence they only consider top down or strategist forecasts, all phrased as portfolio forecasts. These forecasts are then combined with the market equilibrium returns. The equilibrium is modelled explicitly: expected returns are assumed to be equal to the equilibrium expected returns that clear the market in an International Capital Asset Pricing Model (ICAPM); see Adler and Dumas (1983). In this equilibrium all investors hold the market portfolio and partially hedge their currency risk by holding the same hedging portfolio with a weight in proportion to their market risk (Black, 1990).

In contrast, we distinguish between expected returns conditional on publicly available information and expected returns conditional on both public and private information sets. Thus our forecasts are expressed relative to consensus views rather than to an equilibrium set of returns. This has an advantage: both the forecasts and the consensus views are observable, so the relative views can be incorporated directly into the model without any adjustments. To do this we simply need to assume that if a manager’s views concord with the consensus, he will hold the benchmark. In the Black and Litterman model, by contrast, forecasts are expressed relative to the market equilibrium. In their Global Asset Allocation model this difference is rather academic. The equilibrium returns are calculated as just those returns that imply a ‘mean-variance’ optimiser would hold the global market portfolio. Therefore, given that the consensus view is that the global market is in equilibrium, the two approaches are equivalent. We make this assumption for the duration of this section in order to draw out the connections. Yet under different conditions the two may not be equivalent; for example, if the benchmark is no longer the global market, or if the equilibrium returns are estimated from a different source or by a different method. For this section, therefore, assume that the consensus view of the expected asset returns, $\mu_{t+1}$, is that they equal the market equilibrium returns. Under this assumption, we now show the equivalence of the conditional portfolio forecasting equation and the Black and Litterman model. Invoking the matrix inversion lemma (see, for example, Lütkepohl, 1996), the following identity holds provided the relevant matrices are invertible:

$$\left(\tau^2 PVP' + \Omega\right)^{-1} = \Omega^{-1} - \Omega^{-1} P \left(\tau^{-2} V^{-1} + P'\Omega^{-1} P\right)^{-1} P'\Omega^{-1}. \tag{36}$$
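A quick numerical check of identity (36), assuming randomly generated V, P and a diagonal Ω (illustrative values only). The left side needs only a p-by-p inverse, while the right side needs n-by-n inverses.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, tau = 8, 3, 0.3
A = rng.normal(size=(n, n))
V = A @ A.T / n + 0.01 * np.eye(n)                 # return covariance (positive definite)
P = rng.normal(size=(p, n))                        # p portfolio forecasts
Omega = np.diag(rng.uniform(0.01, 0.05, p))        # diagonal forecast error covariance

Om_inv = np.linalg.inv(Omega)
lhs = np.linalg.inv(tau**2 * P @ V @ P.T + Omega)                      # p-by-p inverse
inner = np.linalg.inv(np.linalg.inv(V) / tau**2 + P.T @ Om_inv @ P)    # n-by-n inverse
rhs = Om_inv - Om_inv @ P @ inner @ P.T @ Om_inv

print(np.allclose(lhs, rhs))   # True
```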

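Substituting this identity into the second line of the conditional portfolio forecasting equation of Section 2 and rearranging gives

$$E(r_{t+1} \mid g_{t+1}, I_t) = \left(\tau^{-2} V^{-1} + P'\Omega^{-1}P\right)^{-1} \left(\tau^{-2} V^{-1}\mu_{t+1} + P'\Omega^{-1}\left(g_{t+1} + P\mu_{t+1}\right)\right). \tag{38}$$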

Details of the matrix manipulation are given in the Appendix. Now the final term in this expression is the forecast absolute return to the portfolios P, given the public and private information set,

$$g_{t+1} + P\mu_{t+1} = P\,E(r_{t+1} \mid I^{+}_t) + \eta_{t+1}. \tag{39}$$

Hence if $\mu_{t+1}$ is thought of as the market equilibrium returns, equation (38) is identical to the Black and Litterman formula. (Their formulation requires the inversion of the risk matrix, Σ, whereas the conditional portfolio forecasting equation of Section 2 requires only the inversion of the smaller p-by-p matrix $\tau^2 PVP' + \Omega$, and so is probably easier to implement when the universe of stocks is large.) If the two approaches are so similar, they must be making very similar assumptions. Though they never state it explicitly, Black and Litterman do assume that the uncertainty in the forecast views is uncorrelated with the uncertainty in the equilibrium returns, our Assumption 2. This is implicit in point 7 of their Appendix, for only under this condition is the joint distribution of the two random variables the product of their distributions. Further, their explicit assumption (1992, p. 35) about the distribution of equilibrium expected returns is in essence equivalent to our Assumption 3 about the distribution of incremental forecast returns, $\mu^{+}_{t+1}$.
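The equivalence can also be checked numerically. A minimal sketch, assuming random illustrative values for V, P, Ω, the consensus (here equilibrium) returns μ and the relative views g: the conditional form solves only a p-by-p system, the precision-weighted form (38) inverts n-by-n matrices and uses the absolute views g + Pμ from (39), and the two agree to rounding error.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, tau = 10, 2, 0.3
A = rng.normal(size=(n, n))
V = A @ A.T / n + 0.01 * np.eye(n)            # return covariance (assumed)
P = rng.normal(size=(p, n))                   # portfolio views
Omega = np.diag([0.02, 0.04])                 # view uncertainty (assumed)
mu = rng.normal(scale=0.02, size=n)           # consensus = equilibrium expected returns
g = rng.normal(scale=0.03, size=p)            # relative portfolio forecasts

# Conditional form: only a p-by-p system is solved
cond = mu + tau**2 * V @ P.T @ np.linalg.solve(tau**2 * P @ V @ P.T + Omega, g)

# Precision-weighted (Black-Litterman style) form, equation (38)
V_inv = np.linalg.inv(V)
Om_inv = np.linalg.inv(Omega)
M = V_inv / tau**2 + P.T @ Om_inv @ P
bl = np.linalg.solve(M, V_inv @ mu / tau**2 + P.T @ Om_inv @ (g + P @ mu))

print(np.allclose(cond, bl))   # True
```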


4. Connections to the Grinold and Kahn Forecasting Model

Our approach is also similar to that of Grinold and Kahn (2000). In this section we highlight the differences in assumptions and argue for the advantages of our more general framework.

Grinold and Kahn (2000) derive a forecasting ‘rule of thumb’. The purpose of the rule is to provide practitioners with a straightforward method for generating refined, or conditioned, forecasts from raw forecasts. Given a set of raw forecasts, one for each asset, they show that, under a set of assumptions, the refined forecast equals the raw forecast score (the forecast divided by the forecast’s volatility) times an information coefficient (usually assumed to be about 0.1) times the volatility of the corresponding asset return. The rule is abbreviated as: the refined forecast equals Volatility times IC times Score. We argue that their assumptions, and therefore this rule, are consistent with the forecasts being interpreted as a set of company specific forecasts. Later, we derive a similar rule for the case of a single factor forecast; in this case the rule becomes cross-sectional return volatility times IC times cross-sectional score. We thereby attempt to clarify Grinold and Kahn's rather confusing statement ‘Forecasts have the form volatility × IC × Score. Sometimes this is simply proportional to IC ⋅ cross-sectional score.’

Grinold and Kahn (2000) concentrate almost exclusively on bottom up forecasts. They begin with a simple model with one forecast and one asset, and then generalise it to the case where there is one forecast for every asset in a large universe. In the single asset, single forecast case, they derive the forecasting ‘rule of thumb’ by rearranging the forecasting equation as

$$E\left(\mu^{+}_{t+1,i} \mid g_{t+1,i}\right) = \underbrace{\mathrm{Std}(r_{t+1,i})}_{\text{Volatility}} \times \underbrace{\frac{\mathrm{Cov}(r_{t+1,i}, g_{t+1,i})}{\mathrm{Std}(r_{t+1,i})\,\mathrm{Std}(g_{t+1,i})}}_{\text{IC}} \times \underbrace{\frac{g_{t+1,i}}{\mathrm{Std}(g_{t+1,i})}}_{\text{Score}} \tag{43}$$

where we have used a subscript i to denote element i of a vector, and introduced the obvious notation Std for the standard deviation. In this single asset case the formula is very general. In their Appendix to Chapter 10 they make an assumption equivalent to our Assumption 1 so as to derive some results concerning this forecasting rule. Grinold and Kahn refer to this rule as: the refined forecast equals ‘Volatility times IC times Score’. The IC refers to the information coefficient, the correlation coefficient between the forecast and the asset return; the Score to the normalised forecast; and the Volatility to the time series standard deviation of the asset returns. The joy of this approach is that the volatility can be estimated relatively accurately from past return data and the IC can be assumed to be approximately 0.1 (see Section 5), so that all that needs to be estimated in order to condition a raw forecast is the volatility of the forecast. However, generalising this formula is difficult. As Grinold and Kahn state in the section entitled Refining Forecasts: Multiple Assets and Multiple Forecasts, 'with multiple assets and multiple forecasts it is more difficult to apply the basic forecast rule. This is because we lack sufficient data to uncover the required structure'. They therefore need to make a number of assumptions about the structure in the multiple forecast case; these can be regarded as their equivalent of our Assumptions 2 and 3. Their assumptions are: (1) that there is one forecast for each asset (P is an n-by-n diagonal matrix), and that the IC is the same for each forecast;


(2) that $\mathrm{Cov}(r_{t+1}, g_{t+1}) = IC \times \mathrm{Std}(r_{t+1}) \times \mathrm{Std}^{-1}(g_{t+1}) \times \mathrm{Cov}(g_{t+1}, g_{t+1})$, where $\mathrm{Std}(r_{t+1})$ and $\mathrm{Std}(g_{t+1})$ denote the diagonal matrices whose ith diagonal elements are $\mathrm{Std}(r_{t+1,i})$ and $\mathrm{Std}(g_{t+1,i})$ respectively; equation (11.A.2) in Grinold and Kahn (2000).

These two assumptions imply that the raw forecasts are related to the observed returns by

$$r_{t+1} = IC \times \left(\mathrm{Std}(r_{t+1})\,\mathrm{Std}^{-1}(g_{t+1})\right) g_{t+1} + \eta_{t+1} \tag{44}$$

where $E(g_{t+1}\eta'_{t+1}) = 0$. Equation (44) has a diagonal structure and therefore reduces to equation (43) for each asset. It can be understood as assuming that if each asset return series were regressed in turn on all the forecasts, then in every regression the only significant coefficient would be the one on the corresponding forecast; that is, there is no extra information in any of the other forecasts. These assumptions have the advantage that the simplicity of the single stock, single forecast result carries over to a particular multivariate case. So if we assume that we are equally good at forecasting the returns to each stock (i.e. IC = 0.1 for each forecast), all that needs estimating in the multivariate case is, again, the volatilities of the forecasts.

We believe these assumptions are only likely to be satisfied if the forecasts are pure company specific forecasts. We argue this from two angles. Firstly, the assumption that the ICs are the same for each stock is only likely to be satisfied if the forecasts are company specific. Secondly, the implicit assumption about the structure of the covariance matrix of the forecast errors is only likely to be satisfied if the forecasts are company specific too. Hence the Grinold and Kahn assumptions are inconsistent with any type of factor forecast, which rules out most forecasts based on stock screens. For such forecasts, the information relates principally to a factor return, the factor being the one most highly correlated with the characteristic or characteristics on which the stocks are sorted.

Grinold and Kahn’s assumption that the ICs are equal for each asset implies that their forecasting rule relates to a set of company specific forecasts. We argue this by contradiction. Assume the forecast is principally a factor forecast.
Then each element of the forecast vector $g_{t+1}$ must be some linear function of the same factor forecast, otherwise assumption (2) would not hold. This follows because if we were forecasting, say, a growth factor for asset 1 and a size factor for asset 2, and asset 1 is also sensitive to the size factor, then the error $\eta_{t+1,1}$ will be correlated with the second forecast $g_{t+1,2}$, contradicting assumption (2). So assume this is the case. For equation (44) to hold, two conditions must then be satisfied. The first is that the forecast for each stock must be the estimated factor return times a constant, where the constant has the same sign as the stock’s β, or sensitivity, to this factor. This is because the forecast must correctly predict the directional sensitivity of the stock returns to the factor, otherwise the ICs will not all be positive. It is difficult (but not impossible) to see how this can be done without estimating the full return covariance or risk matrix. The second condition is that the factor being forecast must explain the same proportion of each stock’s volatility, otherwise the ICs would not be of equal magnitude. Such an assumption seems highly implausible. For example, one would not expect a growth factor forecast to explain as much of a growth neutral stock’s return volatility as it does of a growth stock’s. Similarly, it is improbable that a sector factor forecast would explain as much of the return volatility of a stock outside that sector as of one inside it. Hence assumptions (1) and (2) appear to rule out factor forecasts.

In contrast, these assumptions are consistent with a set of company specific forecasts. In this case, assumption (2) is satisfied as long as the idiosyncratic returns are uncorrelated from one stock to another, which is usually assumed in any factor model. Further, assumption (1) is satisfied if we are equally successful at forecasting the idiosyncratic component of each stock. Though, of course, neither condition is likely to hold exactly, both appear reasonable approximations. To show this more rigorously, equation (24) implies that if we only have a set of company specific forecasts for each asset, our forecasting rule reduces to

$$E\left(r_{t+1} \mid g_{\varepsilon,t+1}, I_t\right) - \mu_{t+1} = \tau^2 DP_\varepsilon' \left(\tau^2 P_\varepsilon DP_\varepsilon' + \Omega_\varepsilon\right)^{-1} g_{\varepsilon,t+1}. \tag{49}$$

Now $P_\varepsilon$ is diagonal by assumption (1). Further, it is standard to assume that the covariance of idiosyncratic returns D is diagonal, and we assume the covariance of the forecast errors $\Omega_\varepsilon$ is diagonal. Hence equation (49) reduces to equation (43) for each asset, where

$$IC_i = \frac{\tau^2 D_{ii}}{V_{ii}^{1/2}\left(\tau^2 D_{ii} + \Omega_{\varepsilon,ii}\right)^{1/2}}, \tag{51}$$

and the subscript i denotes the ith element of a vector and ii the ith diagonal element of a matrix. Assumption (1) then amounts to assuming that these terms are equal for every asset.

Grinold and Kahn also implicitly assume a very specific structure for the forecast error covariance matrix, which is only likely to be satisfied if the forecasts are company specific. To argue this, we write down our forecasting equation

$$g_{t+1} = P\mu^{+}_{t+1} + \eta_{t+1}, \tag{52}$$

where P is assumed to be diagonal in line with their assumption (1). Given our Assumption 2 (but not 3), this implies that the respective covariances are

$$\mathrm{Cov}(g_{t+1}, g_{t+1}) = P\,\mathrm{Cov}(\mu^{+}_{t+1}, \mu^{+}_{t+1})\,P' + \Omega \quad \text{and} \quad \mathrm{Cov}(r_{t+1}, g_{t+1}) = \mathrm{Cov}(\mu^{+}_{t+1}, \mu^{+}_{t+1})\,P'. \tag{53}$$

Now assumption (2) can only be satisfied in our framework if the following condition holds:

$$\Omega = \gamma^2 P\,\mathrm{Cov}(\mu^{+}_{t+1}, \mu^{+}_{t+1})\,P', \tag{54}$$

that is, the forecast error covariance matrix is proportional to the incremental forecast covariance matrix. Though one could in theory assume any structure for the forecast error covariance, it is difficult to argue that this condition is likely to be satisfied if the forecasts are factor forecasts: it is equivalent to assuming that one knows the factor βs exactly. To show this, assume that the incremental forecast is equal to these βs times an incremental factor forecast, $\mu^{+}_{t+1} = \beta\phi^{+}_{t+1}$, and that the forecasts are equal to $g_{t+1} = (\beta + \delta\beta)\phi^{+}_{t+1}$, where δβ is the error in the estimate of the βs. (We have not assumed an error in the estimate of the factor return, because such errors would be consistent with equation (54).) Then condition (54) is equivalent to requiring that

$$\mathrm{Cov}(\delta\beta\,\phi^{+}_{t+1}, \delta\beta\,\phi^{+}_{t+1}) = \gamma^2 \beta\,\mathrm{Cov}(\phi^{+}_{t+1}, \phi^{+}_{t+1})\,\beta', \tag{57}$$

where the cross terms drop out because we assumed the forecasts were unbiased, Assumption 2. It therefore, implausibly, requires that the errors in the βs are related to the covariance of the incremental factor returns. So the implicit supposition about the error structure within the Grinold and Kahn assumptions also seems to rule out factor forecasts. In contrast, condition (54) appears reasonable if the forecasts relate to a set of company specific forecasts. The idiosyncratic return component is assumed to have a diagonal structure. Therefore condition (54) amounts to assuming that the forecast error covariance matrix Ω has a diagonal structure too, or equivalently that the error in one company specific forecast is uncorrelated with the error in another.
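To make the company specific case concrete, the sketch below checks that (49) and the Volatility × IC × Score rule (43), with IC_i from (51), give the same refined forecasts. It assumes P_ε = I (one unit-weight company specific forecast per stock, the case in which (51) applies as written), diagonal D and Ω_ε, and random illustrative values for everything else.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, tau2 = 5, 2, 0.3 ** 2
B = rng.normal(size=(n, k))                  # factor exposures (illustrative)
F = np.diag([0.03, 0.01])                    # factor covariance (illustrative)
D_diag = rng.uniform(0.01, 0.06, n)          # diagonal idiosyncratic variances
V = B @ F @ B.T + np.diag(D_diag)            # equation (20)
Om_e = rng.uniform(0.01, 0.05, n)            # diagonal forecast error variances
g_e = rng.normal(scale=0.05, size=n)         # one company specific forecast per stock

# Equation (49) in matrix form, with P_eps = I (assumed)
D, Omega_e = np.diag(D_diag), np.diag(Om_e)
refined_matrix = tau2 * D @ np.linalg.solve(tau2 * D + Omega_e, g_e)

# Because D and Omega_eps are diagonal, (49) decouples stock by stock
refined = tau2 * D_diag / (tau2 * D_diag + Om_e) * g_e

# Equation (43): Volatility x IC x Score, with IC_i taken from (51)
vol = np.sqrt(np.diag(V))                    # time series volatility of each stock
std_g = np.sqrt(tau2 * D_diag + Om_e)        # volatility of each forecast
ic = tau2 * D_diag / (vol * std_g)           # equation (51)
score = g_e / std_g
rule = vol * ic * score

print(np.allclose(refined, refined_matrix), np.allclose(refined, rule))   # True True
```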

Does one use Cross-Sectional Scores or Time Series Scores?

Finally in this section, we look at the involved discussion in Chapter 11 of Grinold and Kahn on whether to use cross-sectional or time series scores. They argue that a fund manager typically does not process time series histories of forecasts for each stock, but rather a single cross-sectional forecast for all stocks, this cross-sectional forecast usually being based on a stock screen of some form. This leads onto the debate on whether the ‘rule of thumb’ should be Volatility times IC times Score, or cross-sectional Volatility times cross-sectional IC times cross-sectional Score. We believe this debate can be interpreted in terms of our earlier discussion on whether a particular set of return forecasts is best understood as a factor forecast or as a set of company specific forecasts. To illustrate, assume a set of stock forecasts is based on an analysis of recent earnings revisions. Then, as we argued in the introduction, the evidence is that earnings news is relatively company specific, and so should be modelled as a set of company specific forecasts. As we have shown, this interpretation is consistent with the rule that the refined forecast equals Volatility × IC × Score. In contrast, assume the set of stock forecasts is constructed by screening stocks on the basis of price to book, sales growth and so on. Then this information is a factor forecast. We argued that such a forecast is not consistent with Grinold and Kahn’s assumptions. It is better interpreted directly as a factor forecast or, more indirectly (but more generally), as the return to a factor-mimicking portfolio. Yet if it is modelled as a factor forecast, we now show that the refined forecast can be interpreted as Cross-Sectional Volatility × Cross-Sectional IC × Cross-Sectional Score, or XS Volatility × XS IC × XS Score for short. This would appear to clear up the confusion.
To elaborate, assume a forecast is best interpreted as a factor forecast and we choose to model it as a forecast of the return to a single factor-mimicking portfolio, $P_i$; the subscript i reinforces that this is a single row vector. Our forecasting equation then implies that the best estimate of the incremental returns is

$$E\left(\mu^{+}_{t+1} \mid g_{t+1}, I_t\right) = \tau^2 VP_i' \left(\tau^2 P_i VP_i' + \Omega\right)^{-1} g_{t+1}. \tag{60}$$

The term $VP_i'$ is (up to the scale factor $P_i VP_i'$) the vector of sensitivities, the βs, of each stock to the portfolio return $P_i r_{t+1}$. Thus $\tilde{g}_{t+1} = VP_i' g_{t+1}$ can be understood as a forecast screen: it ranks stocks by their sensitivity to the characteristic portfolio return $P_i r_{t+1}$. We can now calculate the expected cross-sectional variances and covariances, denoted XSVar and XSCov, under our earlier assumptions as

$$\begin{aligned} \mathrm{XSVar}(r_{t+1}) &= \tfrac{1}{n} E\left[(r_{t+1} - \mu_{t+1})'(r_{t+1} - \mu_{t+1})\right] = \mathrm{trace}(V)/n, \\ \mathrm{XSVar}(\tilde{g}_{t+1}) &= \tfrac{1}{n} E\left[(VP_i' g_{t+1})'(VP_i' g_{t+1})\right] = \frac{P_i V^2 P_i'}{n}\left(\tau^2 P_i VP_i' + \Omega\right), \\ \mathrm{XSCov}(r_{t+1} - \mu_{t+1}, \tilde{g}_{t+1}) &= \tfrac{1}{n} E\left[(r_{t+1} - \mu_{t+1})' VP_i' g_{t+1}\right] = \tau^2 P_i V^2 P_i'/n. \end{aligned} \tag{61}$$

These terms can now be used to re-express the forecasting equation (60) as

$$E\left(\mu^{+}_{t+1} \mid g_{t+1}, I_t\right) = \underbrace{\mathrm{trace}^{1/2}\!\left(\frac{V}{n}\right)}_{\text{XS Volatility of } r_{t+1}} \times \underbrace{\frac{\tau^2 P_i V^2 P_i'/n}{\mathrm{trace}^{1/2}\!\left(\frac{V}{n}\right)\left(\frac{P_i V^2 P_i'}{n}\right)^{1/2}\left(\tau^2 P_i VP_i' + \Omega\right)^{1/2}}}_{\text{Cross-Sectional IC}} \times \underbrace{\frac{VP_i' g_{t+1}}{\left(\frac{P_i V^2 P_i'}{n}\right)^{1/2}\left(\tau^2 P_i VP_i' + \Omega\right)^{1/2}}}_{\text{Cross-Sectional Score}}. \tag{63}$$

Again the expression has the form of a volatility times an IC times a score, but now each term is calculated from the cross-section. Hence if the set of forecasts is equivalent to a factor forecast, the ‘rule of thumb’ is that the refined forecast equals XS Volatility × XS IC × XS Score. To summarise the argument: if the forecast is best interpreted as a factor forecast, then the forecasting ‘rule of thumb’ is XS Volatility × XS IC × XS Score; in contrast, if the forecasts are best interpreted as a set of company specific forecasts, then the ‘rule of thumb’ is Volatility × IC × Score. This clarifies Grinold and Kahn's rather ambiguous statement ‘Forecasts have the form volatility × IC × Score. Sometimes this is simply proportional to IC × cross-sectional score.’ There may be examples where it is not immediately apparent whether the forecast is better interpreted as a set of company specific forecasts or as a factor forecast. In Section 7 we demonstrate how the decomposition in equation (23) can be used to analyse a set of forecasts for their implied factor views; this could then be used to determine whether the set of forecasts is principally a set of company specific forecasts or a number of factor forecasts. This is the advantage of our approach: it provides a rigorous way of combining all types of forecasts, be they portfolio, factor or company specific. One can combine the bottom up Grinold and Kahn approach with the more top down asset allocation approach of Black and Litterman. One does not need to make assumptions about the structure of the forecasts, such as that there is one corresponding to every asset; if some assets are not covered, our approach estimates an implied return based on past correlations. Further, as we shall see, it is also possible to audit these forecasts to check for mutual consistency.
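The cross-sectional factorisation in (61) to (63) can be checked numerically. A sketch, assuming a random covariance V, a hypothetical factor-mimicking portfolio `Pi`, a scalar forecast error variance Ω and a single scalar forecast g (all illustrative values): it builds XS Volatility, XS IC and XS Score exactly as defined in (63) and confirms that their product reproduces (60).

```python
import numpy as np

rng = np.random.default_rng(5)
n, tau = 50, 0.3
A = rng.normal(size=(n, n))
V = A @ A.T / n + 0.01 * np.eye(n)          # return covariance (assumed)
Pi = rng.normal(size=n) / n                 # hypothetical factor-mimicking portfolio (row vector)
Omega = 0.001                               # scalar forecast error variance (assumed)
g = 0.02                                    # single factor forecast (assumed)

s2 = tau**2 * Pi @ V @ Pi + Omega           # Var(g) = tau^2 P_i V P_i' + Omega
direct = tau**2 * V @ Pi / s2 * g           # equation (60)

m2 = Pi @ V @ V @ Pi / n                    # P_i V^2 P_i' / n
xs_vol = np.sqrt(np.trace(V) / n)           # XS volatility of returns
xs_ic = tau**2 * m2 / (xs_vol * np.sqrt(m2) * np.sqrt(s2))   # XSCov / (XS std of r * XS std of screen)
g_screen = V @ Pi * g                       # the screened forecast V P_i' g
xs_score = g_screen / (np.sqrt(m2) * np.sqrt(s2))            # screen / its XS standard deviation

print(np.allclose(xs_vol * xs_ic * xs_score, direct))   # True
```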


5. Calibration of the model

In this section we discuss calibration of the forecasting model. In Section 2 we derived the conditional portfolio forecasting equation. This equation is expressed in terms of the return covariance matrix V, the portfolio composition matrix P, the forecast error covariance matrix Ω and the forecastability constant τ². Of these, the first is estimated from data and the second is determined by the forecaster, leaving only the final two to calibrate. We assume that the forecast errors are uncorrelated. This implies that the forecast error covariance matrix Ω is diagonal, and allows us to consider each forecast separately. Let us first consider the single portfolio forecast

$$g_{t+1,i} = P_i\mu^{+}_{t+1} + \eta_{t+1,i}, \quad \text{where } \mathrm{Std}(\eta_{t+1,i}) = \Omega_{ii}^{1/2}, \tag{65}$$

and $P_i$ denotes the ith row of P. Given a time history of these forecasts, we can observe two statistics. The first is the ratio of the volatility of the forecasts to the volatility of the portfolio returns,

$$\kappa = \frac{\mathrm{Std}(g_{t+1,i})}{\mathrm{Std}(P_i r_{t+1})} = \frac{\mathrm{Std}(g_{t+1,i})}{\left(P_i VP_i'\right)^{1/2}}, \tag{66}$$

and the second is the information, or correlation, coefficient of the forecasts,

$$IC = \frac{\mathrm{Cov}(g_{t+1,i}, P_i r_{t+1})}{\mathrm{Std}(P_i r_{t+1})\,\mathrm{Std}(g_{t+1,i})}. \tag{67}$$

We can use these two statistics to estimate the two parameters $\Omega_{ii}$ and $\tau^2$, where under our assumptions

$$\mathrm{Std}(g_{t+1,i}) = \left(\tau^2 P_i VP_i' + \Omega_{ii}\right)^{1/2} \quad \text{and} \quad \mathrm{Cov}(g_{t+1,i}, P_i r_{t+1}) = \tau^2 P_i VP_i'. \tag{68}$$

Straightforward substitution of (68) into (66) and (67), and rearrangement, gives

$$\tau^2 = \kappa \cdot IC \quad \text{and} \quad \Omega_{ii}^{1/2} = \kappa\left(1 - \frac{IC}{\kappa}\right)^{1/2} \mathrm{Std}(P_i r_{t+1}). \tag{69}$$
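The calibration formulas (66) to (69) can be sanity-checked by simulation. A sketch, assuming a scalar portfolio return whose forecastable part has variance τ²·Std²(P_i r), an independent Gaussian forecast error, and true values τ = 0.3, Std(P_i r) = 0.05 and Ω_ii^{1/2} = 0.02 chosen purely for illustration: estimating κ and IC from a long simulated history and applying (69) should recover the true parameters up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200_000                                   # long simulated history, to limit sampling noise
sigma, tau_true, om_true = 0.05, 0.3, 0.02    # Std(P_i r), tau, Omega_ii^{1/2} (assumed)

mu_plus = rng.normal(scale=tau_true * sigma, size=T)                       # forecastable part, Var = tau^2 sigma^2
r = mu_plus + rng.normal(scale=sigma * np.sqrt(1 - tau_true**2), size=T)   # portfolio return innovation, Var = sigma^2
g = mu_plus + rng.normal(scale=om_true, size=T)                            # forecasts, equation (65)

kappa = g.std() / r.std()                     # equation (66)
ic = np.corrcoef(g, r)[0, 1]                  # equation (67)

tau2_hat = kappa * ic                         # equation (69)
om_hat = kappa * np.sqrt(1 - ic / kappa) * r.std()

print(np.sqrt(tau2_hat), om_hat)              # close to 0.3 and 0.02
```

With these illustrative inputs the population values are κ = 0.5 and IC = 0.18.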

Therefore, we have expressed the two unobservable parameters as functions of observable statistics. Unfortunately, we do not always have a consistent time history of forecasts. Further, the constant τ² is constrained to be the same for every forecast. We therefore choose to calibrate the model to average values. In Chart 2, we have plotted the distribution of information coefficients estimated from UBS analysts’ stock price forecasts. The analysts cover every stock in the FT European universe. They continuously maintain a price target for each stock, defined as the expected price in one year’s time. At each month end over the period February 1998 to July 2003, we calculated the implied price return for each stock. Where there were over 30 observations, we estimated the information, or correlation, coefficient between these estimates and the realised price return over that period. Even though the periods overlap, this should induce no bias in the estimates, though it will of course induce correlation patterns in the errors. Of the 749 stocks that were in the universe at some point, we had sufficient data to calculate an IC for 566. The chart sorts these estimates into baskets of width 0.1. The average IC is 0.18 and the median IC 0.2.

Chart 2: Distribution of the ICs estimated from UBS analysts’ stock price forecasts for all stocks in the FT European universe [chart: % of observations per basket, by mid point of basket from −1 to 1]. Source: UBS

For the same data set, we also calculated κ, the ratio of the volatility of the realised stock price return to the volatility of the target-price-implied return. The distribution of these estimates is plotted in Chart 3, again with the estimates sorted into baskets of width 0.1. This distribution is heavily right-skewed, with an average of 0.65 but a median of only 0.48.

Chart 3: Distribution of κ estimated from UBS analysts’ stock price forecasts for all stocks in the FT European universe (horizontal axis: mid-point of basket, from 0 to 1.4; vertical axis: % of observations per basket). Source: UBS
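The κ estimate is a ratio of two sample volatilities. The sketch below shows how a right-skewed set of estimates produces a mean well above its median. It uses sample standard deviations (an assumption, since the estimator is not specified above), and the numbers in the test are hypothetical.

```python
import statistics


def kappa(realised, implied):
    """kappa: volatility of realised price returns divided by the volatility of
    target-price-implied returns, using sample standard deviations."""
    return statistics.stdev(realised) / statistics.stdev(implied)


def mean_and_median(estimates):
    """A mean well above the median is the signature of a right-skewed distribution."""
    return statistics.mean(estimates), statistics.median(estimates)
```

For example, the hypothetical estimates `[0.3, 0.4, 0.5, 1.4]` have a mean of 0.65 but a median of 0.45. A single large κ drags the mean up, which is the same pattern as in Chart 3.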

These estimates for IC and κ give the following likely ranges for our parameters:

$$0.25 \le \tau \le 0.35 \qquad\text{and}\qquad \mathrm{Std}(P_i r_{t+1}) \le \sqrt{\Omega_{ii}/\tau^2} \le 2\cdot\mathrm{Std}(P_i r_{t+1}) \tag{70}$$

We shall assume for the rest of the article that τ = 0.3, and adopt a default value of $\Omega_{ii}^{1/2} = 0.45$ (measured in units of $\mathrm{Std}(P_i r_{t+1})$), implying that $\sqrt{\Omega_{ii}/\tau^2} = 0.45/0.3 = 1.5 \times \mathrm{Std}(P_i r_{t+1})$. If the forecast is considered more accurate than average, the forecast error volatility can be reduced to $\sqrt{\Omega_{ii}/\tau^2} = \mathrm{Std}(P_i r_{t+1})$. Conversely, if it is considered less accurate, it can be increased to $\sqrt{\Omega_{ii}/\tau^2} = 2 \times \mathrm{Std}(P_i r_{t+1})$ or beyond.

The argument is identical if we consider factor or company-specific forecasts; the only difference is that the forecast error volatilities are then denoted $\Omega_{f,ii}^{1/2}$ and $\Omega_{\varepsilon,ii}^{1/2}$ respectively.
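The calibration in (70) reduces to one line of arithmetic: $\sqrt{\Omega_{ii}} = m\,\tau\,\mathrm{Std}(P_i r_{t+1})$, where the multiple $m$ is 1 for a more accurate forecast, 1.5 by default and 2 (or more) for a less accurate one. The sketch below is minimal. The function name and the 25% return volatility in the usage are hypothetical. With $\mathrm{Std}(P_i r_{t+1})$ normalised to 1, it returns the default value of 0.45.

```python
def forecast_error_vol(std_return, tau=0.3, multiple=1.5):
    """sqrt(Omega_ii) implied by sqrt(Omega_ii / tau**2) = multiple * Std(P_i r_{t+1})."""
    if std_return <= 0 or tau <= 0 or multiple <= 0:
        raise ValueError("inputs must be positive")
    return multiple * tau * std_return


if __name__ == "__main__":
    # Hypothetical forecast of a return with 25% volatility over the forecast horizon.
    for label, m in [("more accurate", 1.0), ("default", 1.5), ("less accurate", 2.0)]:
        print(f"{label:>13}: sqrt(Omega_ii) = {forecast_error_vol(0.25, multiple=m):.4f}")
```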

6. Auditing Forecasts

Fund managers often receive their forecasts from very different sources. It is therefore a concern whether these forecasts are premised on similar market expectations. If they are not, then attempts to combine ‘good’ forecasts with ‘poor’ ones will at best add noise to the useful information in the forecasts, and at worst remove it entirely. It is therefore imperative for the fund manager to have some way of checking the consistency of his forecasts; we shall call this the audit procedure. In this section we discuss how to implement an audit procedure within our framework. We assume that a fund manager has two sets of portfolio forecasts and wishes to check the consistency of these two sets. One could equally check the consistency of two sets of forecasts, each of which is some combination of portfolio, factor and company-specific forecasts. The procedure is identical, but the expressions become more unwieldy; it should be clear from the exposition how to derive the formulae for these more involved cases.

Let us assume that we wish to check the compatibility of a number of portfolio forecasts. If these forecasts are written in the form of our original portfolio forecasting equation (5),

$$g_{t+1} = P\mu^{+}_{t+1} + \eta_{t+1} \tag{71}$$

then the portfolio matrix P will have at least two rows. We can therefore divide these forecasts into two subsets and rewrite the equation accordingly as

$$\begin{pmatrix} g_{1,t+1}\\ g_{2,t+1}\end{pmatrix} = \begin{pmatrix} P_1\\ P_2\end{pmatrix}\mu^{+}_{t+1} + \begin{pmatrix}\eta_{1,t+1}\\ \eta_{2,t+1}\end{pmatrix}, \qquad\text{where}\qquad \mathrm{Var}\begin{pmatrix}\eta_{1,t+1}\\ \eta_{2,t+1}\end{pmatrix} = \begin{pmatrix}\Omega_1 & 0\\ 0 & \Omega_2\end{pmatrix} \tag{72}$$

where there are $p_1$ forecasts in the first set and $p_2$ in the second. Without loss of generality, we assume $p_2 \le p_1$; obviously the forecasts can always be reordered so that this condition is satisfied. We have assumed that the errors in the two sets of forecasts are uncorrelated; this is done for notational ease only. Under our Assumptions 1-3, the joint distribution of the forecasts is

$$\begin{pmatrix} g_{1,t+1}\\ g_{2,t+1}\end{pmatrix} \sim N\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}\tau^2 P_1VP_1' + \Omega_1 & \tau^2 P_1VP_2'\\ \tau^2 P_2VP_1' & \tau^2 P_2VP_2' + \Omega_2\end{pmatrix}\right) \tag{73}$$

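To simplify the expressions later, we denote the covariance matrix of the forecasts by Λ and partition it conformably,

$$\begin{pmatrix}\Lambda_{11} & \Lambda_{12}\\ \Lambda_{12}' & \Lambda_{22}\end{pmatrix} = \begin{pmatrix}\tau^2 P_1VP_1' + \Omega_1 & \tau^2 P_1VP_2'\\ \tau^2 P_2VP_1' & \tau^2 P_2VP_2' + \Omega_2\end{pmatrix} \tag{74}$$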

Finally, we make the weak assumption that the submatrices $\Lambda_{11}$ and $\Lambda_{22}$ are invertible. This is equivalent to assuming that no forecast within either set can be expressed as a linear combination of the others. We are now in a position to audit the forecasts. The second set of forecasts is assumed to be normally distributed, $g_{2,t+1} \sim N(0, \Lambda_{22})$. We can therefore test immediately whether the magnitudes of the forecasts are unusually large by looking at the statistic

$$g_2'\Lambda_{22}^{-1}g_2 \sim \chi^2(p_2) \tag{75}$$

This statistic is a weighted sum of the squares of the forecasts, with the weight being the inverse of the unconditional covariance matrix; thus, if a forecast is expected to have a higher variance, it is given a smaller weight in the statistic. Given that we have assumed the forecasts $g_2$ are normally distributed with covariance matrix $\Lambda_{22}$, this statistic is distributed as Chi-squared with $p_2$ degrees of freedom. We shall refer to this statistic as the measure of the unconditional size of the forecasts. It can be used to test whether the magnitude of the forecasts is unexpectedly large; however, it tells us nothing about the compatibility of the two sets of forecasts.

Figure 4: Auditing Forecasts (probability densities of the unconditional and conditional distributions of $g_2$, with their means $E(g_2)$ and $E(g_2|g_1)$ marked, plotted against the return forecast on the left-hand axis; relative risk statistic on the right-hand axis). Source: UBS
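The unconditional size test (75) needs only a matrix inverse and a Chi-squared tail probability. The sketch below is minimal and pure Python. It assumes no NumPy or SciPy is available, so it uses the closed-form Chi-squared survival function for integer degrees of freedom. The function names are hypothetical, and the test uses an invented covariance matrix $\Lambda_{22}$.

```python
import math


def inverse(a):
    """Inverse of a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-14:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def quad_form(x, a):
    """x' A x for a vector x and a square matrix A."""
    return sum(x[i] * a[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def chi2_sf(x, k):
    """P(X > x) for X ~ chi-squared with integer k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:  # exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd k: erfc(sqrt(x/2)) plus terms sqrt(2x/pi) e^{-x/2} x^{i-1} / (1*3*...*(2i-1)).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def unconditional_size(g2, lam22):
    """Statistic (75), g2' Lambda22^{-1} g2, and its chi-squared(p2) tail probability."""
    stat = quad_form(g2, inverse(lam22))
    return stat, chi2_sf(stat, len(g2))
```

For example, `unconditional_size([3.0, -1.0], [[4.0, 1.0], [1.0, 2.0]])` gives a size of 4 with two degrees of freedom. Its tail probability is $e^{-2} \approx 0.135$, so these two forecasts would not be flagged as unusually large at the 5% level.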

To assess the compatibility of the forecasts, we must examine the conditional distributions, or more precisely the distribution of the second set of forecasts given the first set. We write this conditional distribution as

$$g_2 \mid g_1 \sim N\left(\Lambda_{12}'\Lambda_{11}^{-1}g_1,\; \Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right) \tag{76}$$

Therefore, if the two sets of forecasts are correlated ($\Lambda_{12} \ne 0$), the information in the first set of forecasts can be used to inform expectations about the distribution of the second set. We could then ask whether the second set of forecasts lies in the tail of this conditional distribution, using the Chi-squared statistic

$$\left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right)'\left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right) \sim \chi^2(p_2) \tag{77}$$

This statistic is identical to the statistic in equation (75), except that it is now computed for the conditional distribution. We can therefore test whether the magnitude of a set of forecasts is unexpectedly large given the information in the first set of forecasts. We shall refer to this statistic as the measure of the conditional size of the forecasts.
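The conditional moments (76) and the conditional size (77) can be computed directly from the blocks of Λ. The sketch below is minimal and pure Python, with a hypothetical function name. As in the previous sketch, it assumes no SciPy and uses a closed-form Chi-squared tail for integer degrees of freedom.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-14:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def chi2_sf(x, k):
    """P(X > x) for X ~ chi-squared with integer k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        term = total = 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def conditional_size(g1, g2, lam11, lam12, lam22):
    """Conditional mean and covariance (76) of g2 given g1, the conditional-size
    statistic (77) and its chi-squared(p2) tail probability.
    lam12 is the p1 x p2 block Cov(g1, g2)."""
    k = matmul(transpose(lam12), inverse(lam11))            # Lambda12' Lambda11^{-1}
    cond_mean = matvec(k, g1)
    kl = matmul(k, lam12)                                    # Lambda12' Lambda11^{-1} Lambda12
    p2 = len(g2)
    cond_cov = [[lam22[i][j] - kl[i][j] for j in range(p2)] for i in range(p2)]
    resid = [a - b for a, b in zip(g2, cond_mean)]
    s_inv = inverse(cond_cov)
    stat = sum(resid[i] * s_inv[i][j] * resid[j] for i in range(p2) for j in range(p2))
    return cond_mean, cond_cov, stat, chi2_sf(stat, p2)
```

With the hypothetical blocks $\Lambda_{11} = \Lambda_{12} = 0.9375$, $\Lambda_{22} = 4$ and $g_1 = 1.5$, the conditional distribution has mean 1.5 and standard deviation 1.75, as in Figure 4. Both $g_2 = 5$ and $g_2 = -2$ then have a conditional size of exactly 4, so they sit on the two-standard-deviation boundary. Their unconditional sizes differ, however: 6.25 for $g_2 = 5$ and 1 for $g_2 = -2$.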


This is a type of compatibility statistic and is closely related to Theil’s (1963) compatibility statistic. We test the compatibility of a forecast with another set of forecasts by testing whether it lies in the tail of the related conditional distribution or, equivalently, whether the conditional size of the forecasts is unexpectedly large. We illustrate this in Figure 4 for the case of a single second forecast. In the figure, we have plotted the unconditional distribution of the forecast; the y-axis shows the value of the probability density function for different values of the forecasted return. This distribution has mean zero and a standard deviation of 2. We have also plotted the conditional distribution of this forecast given a first set of forecasts, which are assumed to be positively correlated with the second forecast. This shifts the conditional distribution to the right of the unconditional distribution, because, given the information in the first set, it is more likely that the second forecast will be positive. It also reduces the variance, because we now have more information. A possible example of such forecasts would be forecasts for the MSCI Energy and Materials sectors: as the performance of these sectors is strongly correlated, a positive forecast for one makes a positive forecast for the other more likely. The statistic in equation (77) amounts to testing whether the second forecast lies in the tails of the conditional distribution. Using a 95% confidence bound, this amounts to testing whether the forecast lies within approximately 2 standard deviations of the conditional mean. As the conditional distribution has a mean of 1.5 and a standard deviation of 1.75, the forecast needs to lie between 1.5 − 2×1.75 = −2 and 1.5 + 2×1.75 = 5. Though sensible, this approach has two problems. The first is that one may reject a set of forecasts as inconsistent not because they are inconsistent, but simply because they are unlikely.
In our example, take a forecast of 5%. It would be rejected as inconsistent, even though it is in the tail of the conditional distribution only because it is in the tail of the unconditional distribution. It could therefore reasonably be argued that this is a consistent but unusually bullish forecast. This contrasts with a forecast of −2%, which is not in the tail of the unconditional distribution but is in the tail of the conditional distribution; such a forecast does appear to be inconsistent with the first set of forecasts. Though one might be able to work round this problem when testing the consistency of a single forecast, it is not obvious how one might do so when testing the consistency of many forecasts. The second problem is subtler. A compatibility statistic should be symmetric in its arguments: if forecast 2 is inconsistent with forecast 1, then forecast 1 should be inconsistent with forecast 2. Unfortunately, if a forecast is in the tail of its conditional distribution with respect to another set of forecasts, this does not imply that the other set of forecasts is in the tail of its conditional distribution with respect to the original forecast. Therefore any test based only on the conditional distribution will not be symmetric. This does not mean that such a statistic is not useful, merely that it must be interpreted with caution. A better statistic for testing forecast compatibility is the relative risk statistic, RR (Daniel 1999). In our Gaussian framework, the relative risk is simply a likelihood ratio statistic. If we denote the unconditional probability of observing the second set of forecasts by $p(g_2)$, and the conditional probability of observing these forecasts given a first set of forecasts by $p(g_2 \mid g_1)$, then the relative risk statistic is defined as the ratio of these probabilities,

$$RR(g_2, g_1) = \frac{p(g_2)}{p(g_2 \mid g_1)} \tag{78}$$

If the ratio is one, the information in the first set of forecasts makes the second set of forecasts neither more nor less likely. If the ratio is less than 1, the first set of forecasts has increased the probability of observing the second set; if it is greater than 1, it has decreased that probability. This compatibility statistic is symmetric, for by Bayes’ Theorem

$$RR(g_2, g_1) = \frac{p(g_2)}{p(g_2 \mid g_1)} = \frac{p(g_2)\,p(g_1)}{p(g_1 \mid g_2)\,p(g_2)} = \frac{p(g_1)}{p(g_1 \mid g_2)} = RR(g_1, g_2) \tag{79}$$

Thus, if forecasts $g_1$ are less likely given forecasts $g_2$, then equally forecasts $g_2$ are less likely given forecasts $g_1$. We can therefore infer inconsistency of the two sets of forecasts from high values of the relative risk statistic, and consistency from low values. To illustrate, return to Figure 4, in which we have also plotted the relative risk statistic for different return forecasts, using the right-hand axis. The forecast of 5% is now associated with a very low value of the statistic, whereas −2% is associated with a very high value, so this statistic clearly differentiates between the two cases. Further, from equation (79), it is symmetric. It therefore circumvents both problems of the earlier compatibility statistic. It is, however, very closely related. Given our assumptions, the relative risk statistic is equal to

$$RR(g_2, g_1) = \frac{\det^{1/2}\left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)}{\det^{1/2}\left(\Lambda_{22}\right)}\exp\left(-\tfrac{1}{2}g_2'\Lambda_{22}^{-1}g_2 + \tfrac{1}{2}\left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right)'\left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right)\right)$$

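Taking logs and rearranging gives

$$2\log RR(g_2, g_1) - \log\frac{\det\left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)}{\det\left(\Lambda_{22}\right)} = \left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right)'\left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right) - g_2'\Lambda_{22}^{-1}g_2 \tag{80}$$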
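For the single-forecast illustration of Figure 4 (unconditional standard deviation 2, conditional mean 1.5, conditional standard deviation 1.75), both the definition (78) and identity (80) can be checked numerically. The sketch below is minimal, and the function names are hypothetical. In the scalar case, the determinant ratio in (80) is just the ratio of the conditional and unconditional variances.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def relative_risk(g2, sd_uncond, mean_cond, sd_cond):
    """RR(g2, g1) = p(g2) / p(g2 | g1), definition (78), for a single Gaussian forecast."""
    return normal_pdf(g2, 0.0, sd_uncond) / normal_pdf(g2, mean_cond, sd_cond)


def likelihood_ratio(g2, sd_uncond, mean_cond, sd_cond):
    """Conditional size minus unconditional size: the right-hand side of (80)."""
    return ((g2 - mean_cond) / sd_cond) ** 2 - (g2 / sd_uncond) ** 2


# Figure 4 set-up: unconditional sd 2, conditional mean 1.5, conditional sd 1.75.
SD_U, MEAN_C, SD_C = 2.0, 1.5, 1.75
BOUNDS = (MEAN_C - 2 * SD_C, MEAN_C + 2 * SD_C)  # two-sd band of the conditional test

if __name__ == "__main__":
    for g2 in (5.0, -2.0):
        rr = relative_risk(g2, SD_U, MEAN_C, SD_C)
        lhs = 2 * math.log(rr) - math.log(SD_C ** 2 / SD_U ** 2)
        print(f"g2 = {g2:+.1f}: RR = {rr:.3f}, LR = {likelihood_ratio(g2, SD_U, MEAN_C, SD_C):.3f}, "
              f"2 log RR - log variance ratio = {lhs:.3f}")
```

Both forecasts sit on the bounds (−2, 5) of the conditional test. The relative risk separates them clearly, however: it is roughly 0.28 for the forecast of 5 and roughly 3.92 for the forecast of −2.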

Therefore, apart from the constant determinant term, twice the log of the relative risk statistic is equal to the difference between our two earlier statistics in equations (77) and (75); it is a simple transform of the difference between the conditional and the unconditional size of the forecasts. If the conditional size is large relative to the unconditional size, the relative risk statistic will be greater than 1; if it is relatively small, the relative risk will be less than 1. The next theorem establishes the distribution of this difference.

Theorem 1: Define the likelihood ratio statistic as

$$LR = \left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right)'\left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\left(g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\right) - g_2'\Lambda_{22}^{-1}g_2 \tag{81}$$

then, given the forecasts are distributed as in equation (73), this statistic is distributed as a weighted sum of $2p_2$ independent Chi-squared variates, each with one degree of freedom. The weights are the eigenvalues of the $2p_2 \times 2p_2$ matrix

$$\begin{pmatrix} -I_{p_2} & -\Lambda_{22}^{-1}\left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)\\ I_{p_2} & I_{p_2}\end{pmatrix} \tag{82}$$

These eigenvalues are real and symmetrically distributed about zero, i.e. if λ is an eigenvalue then so is −λ. To estimate the confidence bounds for this statistic, first calculate the eigenvalues of the matrix in (82). If we denote the non-negative eigenvalues of this matrix by $\lambda_1, \lambda_2, \ldots, \lambda_{p_2}$, then the likelihood ratio statistic has the same distribution as

$$\sum_{i=1}^{p_2}\lambda_i u_i - \sum_{i=1}^{p_2}\lambda_i u_{i+p_2} \tag{83}$$

where $u_i$, $i = 1, 2, \ldots, 2p_2$, are independent Chi-squared random variables with one degree of freedom. The confidence bounds for this statistic can be estimated using the standard procedures of Imhof (1961) and Pan (1964). We illustrate the use of the statistic with the following analytical example.
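Before the analytical example, the two steps of the procedure (weights, then bounds) can be sketched numerically. This is a minimal, pure-Python illustration under two stated assumptions. First, it obtains the weights from the squared-weight identity: multiplying out the eigenvalue equation of (82), with the $-\Lambda_{22}^{-1}(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12})$ block in the top-right corner, shows that the $\lambda_i^2$ are the eigenvalues of $\Lambda_{22}^{-1}\Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}$. This is the sign convention under which the worked example yields weights $\pm\rho$. Second, it replaces the Imhof (1961) numerical inversion with plain Monte Carlo simulation of (83). The function names are hypothetical.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-14:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sym_eigvals(a, sweeps=100, tol=1e-24):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P' A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def lr_weights(lam11, lam12, lam22):
    """Non-negative weights lambda_1..lambda_p2 of Theorem 1 (lam12 is p1 x p2)."""
    k = matmul(matmul(transpose(lam12), inverse(lam11)), lam12)  # Lambda12' Lambda11^{-1} Lambda12
    l_inv = inverse(cholesky(lam22))                               # Lambda22 = L L'
    sym = matmul(matmul(l_inv, k), transpose(l_inv))              # similar to Lambda22^{-1} K
    return [math.sqrt(max(ev, 0.0)) for ev in sym_eigvals(sym)]


def lr_bounds(weights, level=0.95, draws=100_000, seed=0):
    """Monte Carlo bounds for sum_i w_i (u_i - u_{i+p2}), u independent chi2(1), as in (83)."""
    rng = random.Random(seed)
    sims = sorted(
        sum(w * (rng.gauss(0.0, 1.0) ** 2 - rng.gauss(0.0, 1.0) ** 2) for w in weights)
        for _ in range(draws))
    lo = sims[int(round((1 - level) / 2 * draws))]
    hi = sims[int(round((1 + level) / 2 * draws)) - 1]
    return lo, hi
```

Because the weights enter in ± pairs, the simulated distribution is symmetric about zero. The Monte Carlo error shrinks with `draws`, and Imhof's method would give the bounds to machine precision instead.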

A worked example illustrating the use of the Relative Risk Statistic

Assume we observe two single portfolio forecasts that are jointly distributed as

$$\begin{pmatrix} g_1\\ g_2\end{pmatrix} \sim N\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right) \tag{84}$$

Assume we observe the values of the two forecasts in a given period and wish to test whether they are compatible. First, we can calculate the relative risk statistic as

$$RR(g_2, g_1) = \frac{\dfrac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}g_1^2\right)}{\dfrac{1}{\sqrt{2\pi(1-\rho^2)}}\exp\left(-\tfrac{1}{2}\dfrac{(g_1 - \rho g_2)^2}{1-\rho^2}\right)} \tag{85}$$

This statistic is just a log transform of the associated likelihood ratio statistic,

$$2\ln RR(g_2, g_1) - \ln\left(1 - \rho^2\right) = LR = -g_1^2 + \frac{(g_1 - \rho g_2)^2}{1 - \rho^2} \tag{86}$$

In this very simple example, we can give a neat interpretation to this likelihood ratio. First, write the statistic in matrix form as

$$LR = \begin{pmatrix} g_1\\ g_1 - \rho g_2\end{pmatrix}'\begin{pmatrix}-1 & 0\\ 0 & \dfrac{1}{1-\rho^2}\end{pmatrix}\begin{pmatrix} g_1\\ g_1 - \rho g_2\end{pmatrix} \tag{87}$$

which can be rewritten in a normalised form as

$$LR = \begin{pmatrix}\dfrac{g_1 + g_2}{\sqrt{2(1+\rho)}}\\[2mm] \dfrac{g_1 - g_2}{\sqrt{2(1-\rho)}}\end{pmatrix}'\begin{pmatrix}-\rho & 0\\ 0 & \rho\end{pmatrix}\begin{pmatrix}\dfrac{g_1 + g_2}{\sqrt{2(1+\rho)}}\\[2mm] \dfrac{g_1 - g_2}{\sqrt{2(1-\rho)}}\end{pmatrix} \tag{88}$$

It is the normalised form because the two forecasts have been transformed into two related forecasts that are uncorrelated and have unit variance. That is,

$$\mathrm{Var}\begin{pmatrix}\dfrac{g_1 + g_2}{\sqrt{2(1+\rho)}}\\[2mm] \dfrac{g_1 - g_2}{\sqrt{2(1-\rho)}}\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} \tag{89}$$

The transformed forecasts are the sum of and the difference between the two original forecasts, each scaled by its standard deviation. The likelihood ratio statistic has therefore been rewritten as a weighted sum of the squares of two uncorrelated forecasts. The weights, as proved in Theorem 1, are symmetric about zero and in this case are equal to ±ρ. The relative risk statistic therefore tests the compatibility of the forecasts by testing whether the difference between the forecasts is of the right order given their sum. If the difference is much smaller than would be expected, the statistic is less than 1; if it is larger than would be expected, the statistic is greater than 1. Two points are worth making about this example. First, the statistic is symmetric in the two forecasts, $g_1$ and $g_2$; therefore, if $g_1$ is incompatible with $g_2$, then $g_2$ is incompatible with $g_1$. Second, it tests the difference between the forecasts against their sum. Therefore, even if the unconditional size of the forecasts is large, the forecasts will not necessarily fail the relative risk compatibility test; they will only fail if the difference between the forecasts is large relative to their sum.
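The algebra of the worked example is easy to confirm numerically. The direct form (86) and the normalised form (88) agree for any forecasts and any $|\rho| < 1$, and the statistic is unchanged when $g_1$ and $g_2$ are swapped. The sketch below is minimal, and the function names are hypothetical.

```python
import math


def lr_direct(g1, g2, rho):
    """Likelihood ratio of the worked example, eq. (86)."""
    return -g1 ** 2 + (g1 - rho * g2) ** 2 / (1 - rho ** 2)


def lr_normalised(g1, g2, rho):
    """The same statistic in normalised form (88): weights -rho and +rho on the
    squared unit-variance sum and difference of the two forecasts."""
    s = (g1 + g2) / math.sqrt(2 * (1 + rho))
    d = (g1 - g2) / math.sqrt(2 * (1 - rho))
    return -rho * s ** 2 + rho * d ** 2


def relative_risk(g1, g2, rho):
    """Invert (86): RR = sqrt(1 - rho^2) * exp(LR / 2)."""
    return math.sqrt(1 - rho ** 2) * math.exp(0.5 * lr_direct(g1, g2, rho))


if __name__ == "__main__":
    rho = 0.5
    print("agreeing forecasts   :", relative_risk(1.0, 1.0, rho))   # small difference, large sum
    print("disagreeing forecasts:", relative_risk(1.0, -1.0, rho))  # large difference, small sum
```

With ρ = 0.5, two equal forecasts give RR ≈ 0.62, while forecasts of equal size and opposite sign give RR ≈ 2.35.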

7. Decomposing forecasts into company-specific and factor forecasts

Analysts make a large number of single-stock forecasts, whereas strategists make a smaller number of market- or sector-level forecasts. Hence analysts are principally forecasting the stock-specific component of stock returns, and strategists the factor return component. However, implicit within the analyst forecasts there is also a factor position and, to a lesser extent, within the strategist forecasts there is a stock-specific component. If these forecasts are to be combined reliably, it is important that the implicit position of the analysts is consistent with the strategists’ forecasts (and vice versa). In the previous section, we derived a set of compatibility statistics for checking the mutual consistency of the forecasts. However, one may also wish to analyse or quantify these implicit positions. In this section, we look at how to decompose a set of forecasts into their implied factor and stock-specific positions. We referred to this problem in Section 2, where we derived a basic decomposition of a complete set of forecasts into their implied factor and stock-specific components. Here we examine a related problem: how can a set of top-down forecasts, which are predominantly factor forecasts, be used to effect an improved decomposition of a set of analyst forecasts? The fund manager can then analyse this decomposition. If the implicit factor positions in the bottom-up forecasts are consistent with those in his top-down forecasts, he can relax in the knowledge that his forecasts are consistent. However, if the positions are different, the fund manager has two choices: either to incorporate these positions in his final forecasts, or to remove them and use only the company-specific information in the bottom-up forecasts. We show later how this can be done.
In a symmetrical manner, one can of course use the bottom-up forecasts to inform the decomposition of the top-down forecasts; however, we do not envisage that this will be particularly useful. To detail the decomposition, we assume, as in the previous section, that there are two sets of portfolio forecasts. Again the procedure can easily be generalised to incorporate the other forecast types, but the expressions become more complicated. In line with our assumptions in Section 2, we can write down the joint distribution of the two forecast sets together with the factor and company-specific returns,

$$\begin{pmatrix} f_{t+1}\\ \varepsilon_{t+1}\\ g_{1,t+1}\\ g_{2,t+1}\end{pmatrix} \sim N\left(\begin{pmatrix}0\\ 0\\ 0\\ 0\end{pmatrix}, \begin{pmatrix} F & 0 & \tau^2FB'P_1' & \tau^2FB'P_2'\\ 0 & D & \tau^2DP_1' & \tau^2DP_2'\\ \tau^2P_1BF & \tau^2P_1D & \tau^2P_1VP_1' + \Omega_1 & \tau^2P_1VP_2'\\ \tau^2P_2BF & \tau^2P_2D & \tau^2P_2VP_1' & \tau^2P_2VP_2' + \Omega_2\end{pmatrix}\right) \tag{90}$$

We shall associate the first set of forecasts with a set of bottom-up analyst forecasts, and the second set with a set of top-down forecasts. Rather than calculate the expected asset returns given both sets of forecasts in a single step, we perform the calculation stepwise. We first calculate the conditional distribution given the second set of forecasts (the top-down forecasts). The mean of this conditional distribution is

$$E\left[\begin{pmatrix} f_{t+1}\\ \varepsilon_{t+1}\\ g_{1,t+1}\end{pmatrix}\,\middle|\, g_{2,t+1}\right] = \begin{pmatrix}\tau^2FB'\\ \tau^2D\\ \tau^2P_1V\end{pmatrix}P_2'\left(\tau^2P_2VP_2' + \Omega_2\right)^{-1}g_{2,t+1} \tag{91}$$

and the covariance is

$$\mathrm{Var}\left[\begin{pmatrix} f_{t+1}\\ \varepsilon_{t+1}\\ g_{1,t+1}\end{pmatrix}\,\middle|\, g_{2,t+1}\right] = \begin{pmatrix} F & 0 & \tau^2FB'P_1'\\ 0 & D & \tau^2DP_1'\\ \tau^2P_1BF & \tau^2P_1D & \tau^2P_1VP_1' + \Omega_1\end{pmatrix} - \begin{pmatrix}\tau^2FB'\\ \tau^2D\\ \tau^2P_1V\end{pmatrix}P_2'\left(\tau^2P_2VP_2' + \Omega_2\right)^{-1}P_2\begin{pmatrix}\tau^2BF & \tau^2D & \tau^2VP_1'\end{pmatrix} \tag{92}$$
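A minimal numerical sketch of (91) and (92) follows. It assumes a hypothetical one-factor, three-stock universe with returns $r = Bf + \varepsilon$, so that $V = BFB' + D$; this is our reading of the factor structure behind (90). The loadings, covariances, forecast portfolios and Ω matrices are invented for illustration, and τ = 0.3 follows the calibration above. Matrices are plain Python lists of rows, and all names are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(*ms):
    """Product of a chain of matrices (lists of rows)."""
    out = ms[0]
    for b in ms[1:]:
        bt = transpose(b)
        out = [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in out]
    return out


def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, k):
    return [[k * x for x in row] for row in a]


def zeros(r, c):
    return [[0.0] * c for _ in range(r)]


def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


def block(rows):
    """Assemble a block matrix from a list of block rows."""
    return [sum((blk[i] for blk in row), []) for row in rows for i in range(len(row[0]))]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


# Hypothetical one-factor, three-stock universe (all numbers illustrative).
TAU2 = 0.3 ** 2
B = [[1.0], [1.2], [0.8]]                          # factor loadings (N x K)
F = [[0.04]]                                        # factor covariance (K x K)
D = diag([0.02, 0.03, 0.025])                       # company-specific covariance
V = add(matmul(B, F, transpose(B)), D)              # V = B F B' + D
P1, OM1 = diag([1.0, 1.0, 1.0]), diag([0.05] * 3)   # bottom-up: one forecast per stock
P2, OM2 = [[1 / 3, 1 / 3, 1 / 3]], [[0.01]]         # top-down: one market call


def condition_on_top_down(g2):
    """Mean (91) and covariance (92) of (f, eps, g1) given g2 (a column), plus the prior covariance."""
    h = scale(matmul(F, transpose(B)), TAU2) + scale(D, TAU2) + scale(matmul(P1, V), TAU2)
    w_inv = inverse(add(scale(matmul(P2, V, transpose(P2)), TAU2), OM2))
    gain = matmul(h, transpose(P2), w_inv)           # stacked [tau^2 F B'; tau^2 D; tau^2 P1 V] P2' W^{-1}
    prior = block([
        [F, zeros(1, 3), scale(matmul(F, transpose(B), transpose(P1)), TAU2)],
        [zeros(3, 1), D, scale(matmul(D, transpose(P1)), TAU2)],
        [scale(matmul(P1, B, F), TAU2), scale(matmul(P1, D), TAU2),
         add(scale(matmul(P1, V, transpose(P1)), TAU2), OM1)],
    ])
    cov = add(prior, matmul(gain, P2, transpose(h)), -1.0)
    return matmul(gain, g2), cov, prior
```

The first row of the returned mean is the factor return implied by the top-down call. The last three rows are $E[g_1 \mid g_2]$, the part of the analyst forecasts that the top-down forecasts already explain.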

If the second set of forecasts are top-down forecasts, and therefore principally factor forecasts, then in equation (91) we are removing from the first set of forecasts the factor positions implied by the second set. If the two sets of forecasts are consistent, in that the implicit factor positions in both sets are similar, then most of the remaining information in the first set of forecasts should be company specific. We can test this by comparing the implied asset factor positions after conditioning on just the second set of forecasts with those after conditioning on both sets. In the Appendix, we derive the following expression for the difference between these conditional means,

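$$E\left[\begin{pmatrix} f_{t+1}\\ \varepsilon_{t+1}\end{pmatrix}\,\middle|\, g_{1,t+1}, g_{2,t+1}\right] - E\left[\begin{pmatrix} f_{t+1}\\ \varepsilon_{t+1}\end{pmatrix}\,\middle|\, g_{2,t+1}\right] = \begin{pmatrix}\tau^2FB'\\ \tau^2D\end{pmatrix}\tilde{P}_1'\left(\tau^2\tilde{P}_1\tilde{V}\tilde{P}_1' + \Omega_1\right)^{-1}\tilde{g}_{1,t+1} \tag{93}$$

where

$$\begin{aligned}\tilde{g}_{1,t+1} &= g_{1,t+1} - \tau^2P_1VP_2'\left(\tau^2P_2VP_2' + \Omega_2\right)^{-1}g_{2,t+1}\\ \tilde{P}_1 &= P_1\left(I + \tau^2VP_2'\Omega_2^{-1}P_2\right)^{-1}\\ \tilde{V} &= V + \tau^2VP_2'\Omega_2^{-1}P_2V\end{aligned} \tag{94}$$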

The expression has been written so as to draw out its similarities with the earlier equation for the conditional means after conditioning on the first set of forecasts. By conditioning the first set of forecasts on the second set, we remove from the first set the returns already implied by the second set of forecasts, and we adjust both the portfolio and covariance matrices for the reduction in covariance in equation (92). However, if the coefficient τ² is small, the adjustment is not likely to be large. The first line of equation (93) gives the change in the implied factor positions due to the first set of forecasts, over and above those implied by the second set. If both sets of forecasts are consistent and the second set already expresses the expected factor returns, then one might expect these changes to be small. If they are not, then the fund manager needs to decide whether to accept these changes. If he is unaware of any good reason why the bottom-up forecasts should contain new factor information, he might decide to ignore this information. This could be done by simply setting these changes to zero, in which case the expected returns conditioned on the adjusted set of forecasts will be equal to

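$$E\left[\begin{pmatrix} f_{t+1}\\ \varepsilon_{t+1}\end{pmatrix}\,\middle|\, g^{\mathrm{adj}}_{1,t+1}, g_{2,t+1}\right] = E\left[\begin{pmatrix} f_{t+1}\\ \varepsilon_{t+1}\end{pmatrix}\,\middle|\, g_{2,t+1}\right] + \begin{pmatrix}0\\ \tau^2D\end{pmatrix}\tilde{P}_1'\left(\tau^2\tilde{P}_1\tilde{V}\tilde{P}_1' + \Omega_1\right)^{-1}\tilde{g}_{1,t+1} \tag{95}$$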
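The stepwise route of (93) and (94) should reproduce the one-shot conditional mean given both forecast sets, and (95) simply drops the factor rows of the update. The sketch below checks this on the same kind of hypothetical one-factor, three-stock universe, assuming $V = BFB' + D$. All numbers and names are illustrative, and the one-shot calculation stacks $P = [P_1; P_2]$ with a block-diagonal Ω.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(*ms):
    """Product of a chain of matrices (lists of rows)."""
    out = ms[0]
    for b in ms[1:]:
        bt = transpose(b)
        out = [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in out]
    return out


def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, k):
    return [[k * x for x in row] for row in a]


def zeros(r, c):
    return [[0.0] * c for _ in range(r)]


def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


def block(rows):
    return [sum((blk[i] for blk in row), []) for row in rows for i in range(len(row[0]))]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


TAU2 = 0.3 ** 2
B, F, D = [[1.0], [1.2], [0.8]], [[0.04]], diag([0.02, 0.03, 0.025])   # hypothetical
V = add(matmul(B, F, transpose(B)), D)
P1, OM1 = diag([1.0, 1.0, 1.0]), diag([0.05] * 3)
P2, OM2 = [[1 / 3, 1 / 3, 1 / 3]], [[0.01]]


def conditional_means(g1, g2):
    """E[(f, eps) | g2], E[(f, eps) | g1, g2] in one shot, the stepwise result (93)-(94)
    and the adjusted mean (95). g1 and g2 are columns (lists of one-element rows)."""
    hx = scale(matmul(F, transpose(B)), TAU2) + scale(D, TAU2)        # [tau^2 F B'; tau^2 D]
    w2_inv = inverse(add(scale(matmul(P2, V, transpose(P2)), TAU2), OM2))
    mean_g2 = matmul(hx, transpose(P2), w2_inv, g2)
    p = P1 + P2                                                         # stacked [P1; P2]
    lam = add(scale(matmul(p, V, transpose(p)), TAU2), block([[OM1, zeros(3, 1)], [zeros(1, 3), OM2]]))
    mean_both = matmul(hx, transpose(p), inverse(lam), g1 + g2)
    om2_inv = inverse(OM2)
    m = add(diag([1.0] * 3), scale(matmul(V, transpose(P2), om2_inv, P2), TAU2))
    p1_t = matmul(P1, inverse(m))                                       # P1 tilde
    v_t = add(V, scale(matmul(V, transpose(P2), om2_inv, P2, V), TAU2))  # V tilde
    g1_t = add(g1, scale(matmul(P1, V, transpose(P2), w2_inv, g2), TAU2), -1.0)
    gain = matmul(hx, transpose(p1_t),
                  inverse(add(scale(matmul(p1_t, v_t, transpose(p1_t)), TAU2), OM1)))
    update = matmul(gain, g1_t)                                         # right-hand side of (93)
    stepwise = add(mean_g2, update)
    adjusted = add(mean_g2, [[0.0]] + update[1:])                       # (95): zero the factor row (K = 1)
    return mean_g2, mean_both, stepwise, adjusted
```

In the adjusted result, the factor return is left exactly where the top-down forecasts put it. The bottom-up forecasts still move the company-specific returns.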

8. Conclusions

In this paper, we have developed a coherent and rigorous framework for processing and auditing forecasts. It has focussed particularly on the problem of combining top-down forecasts, be they sector or country calls, with bottom-up stock recommendations, and on checking the mutual consistency of these forecasts. Our framework has been presented as a fusion of the popular Black and Litterman (1992) global asset allocation model and the more micro-level approach of Grinold and Kahn (2000). We implicitly make similar assumptions about the covariance of forecasts as Black and Litterman, but use the more classical approach of Grinold and Kahn. This combines the flexibility of Grinold and Kahn with the tractability of Black and Litterman. We have explored various extensions suggested by this fusion, all concerned with checking the mutual consistency of a set of forecasts. First, we developed the relative risk statistic, where a value greater than one indicates possible incompatibility of the forecasts, and we derived confidence bounds for this statistic. Secondly, we focused explicitly on the mutual consistency between a set of stock-level forecasts and a set of top-down forecasts. We used a set of top-down forecasts to decompose a set of bottom-up forecasts into their implied factor and company-specific returns. The implied factor returns could then be compared directly with those implied by the top-down forecasts.


Appendix

Proof of Lemma 1: We first show that Assumption 2 implies that the return innovation and the forecast errors are uncorrelated. If $\mu^{+}_{t+1}$ is the best estimate of returns given the information set $I^{+}_{t}$, then the return innovation $r_{t+1} - \mu_{t+1} - \mu^{+}_{t+1} = \theta_{t+1} - \mu^{+}_{t+1}$ must be uncorrelated with any information in $I^{+}_{t}$. Clearly the forecasts $g_{t+1}$ are in this information set, so that

$$E\left[(\theta_{t+1} - \mu^{+}_{t+1})\,g_{t+1}' \mid I^{+}_{t}\right] = E\left[(\theta_{t+1} - \mu^{+}_{t+1})\left(\mu^{+\prime}_{t+1}P' + \eta_{t+1}'\right) \mid I^{+}_{t}\right] = 0 \tag{96}$$

By definition this implies that

$$E\left[(\theta_{t+1} - \mu^{+}_{t+1})\,\eta_{t+1}' \mid I^{+}_{t}\right] = 0 \tag{97}$$

Therefore Assumption 2, that $E\left[\mu^{+}_{t+1}\eta_{t+1}' \mid I^{+}_{t}\right] = 0$, implies $E\left[\theta_{t+1}\eta_{t+1}' \mid I^{+}_{t}\right] = 0$. We can now write down the joint distribution of the stock returns and the forecasts given Assumption 1,

$$\begin{pmatrix} r_{t+1}\\ g_{t+1}\end{pmatrix} \Bigm| I_t \sim N\left(\begin{pmatrix}\mu_{t+1}\\ 0\end{pmatrix}, \begin{pmatrix} V & \mathrm{Var}(\mu^{+}_{t+1})P'\\ P\,\mathrm{Var}(\mu^{+}_{t+1}) & P\,\mathrm{Var}(\mu^{+}_{t+1})P' + \Omega\end{pmatrix}\right) \tag{98}$$

An application of Theorem 1.2.11 of Muirhead (1982, p. 12) proves the result.

Derivation of the Black and Litterman forecasting equation: Substituting the identity

$$\left(\tau^2PVP' + \Omega\right)^{-1} = \Omega^{-1} - \Omega^{-1}P\left(\tau^{-2}V^{-1} + P'\Omega^{-1}P\right)^{-1}P'\Omega^{-1} \tag{99}$$

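into the second line of the earlier expression for $E\left[r_{t+1} \mid g_{t+1}, I_t\right]$ gives

$$\begin{aligned} E\left[r_{t+1} \mid g_{t+1}, I_t\right] &= \mu_{t+1} + \tau^2VP'\left(\Omega^{-1} - \Omega^{-1}P\left(\tau^{-2}V^{-1} + P'\Omega^{-1}P\right)^{-1}P'\Omega^{-1}\right)g_{t+1}\\ &= \mu_{t+1} + \tau^2V\left(\left(\tau^{-2}V^{-1} + P'\Omega^{-1}P\right) - P'\Omega^{-1}P\right)\left(\tau^{-2}V^{-1} + P'\Omega^{-1}P\right)^{-1}P'\Omega^{-1}g_{t+1}\\ &= \mu_{t+1} + \left(\tau^{-2}V^{-1} + P'\Omega^{-1}P\right)^{-1}P'\Omega^{-1}g_{t+1}\end{aligned} \tag{100}$$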

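Finally, rewriting

$$\mu_{t+1} = \left(\tau^{-2}V^{-1} + P'\Omega^{-1}P\right)^{-1}\left(\tau^{-2}V^{-1} + P'\Omega^{-1}P\right)\mu_{t+1}$$

and rearranging gives the Black and Litterman forecasting equation.

Proof of Theorem 1: Write the likelihood ratio statistic as

$$LR = \begin{pmatrix} g_2\\ g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\end{pmatrix}'\begin{pmatrix}-\Lambda_{22}^{-1} & 0\\ 0 & \left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\end{pmatrix}\begin{pmatrix} g_2\\ g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\end{pmatrix} \tag{101}$$

Now denote the variance of the transformed vector of forecasts by D, where

$$D = \mathrm{Var}\begin{pmatrix} g_2\\ g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\end{pmatrix} = \begin{pmatrix}\Lambda_{22} & \Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\\ \Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12} & \Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\end{pmatrix} \tag{102}$$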

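Let C be the Cholesky factor of D, such that CC′ = D. The matrix $C^{-1}$ transforms the forecast vector into a set of independent unit normal variates. We can therefore rewrite LR as

$$LR = \left(C^{-1}\begin{pmatrix} g_2\\ g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\end{pmatrix}\right)'C'\begin{pmatrix}-\Lambda_{22}^{-1} & 0\\ 0 & \left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\end{pmatrix}C\left(C^{-1}\begin{pmatrix} g_2\\ g_2 - \Lambda_{12}'\Lambda_{11}^{-1}g_1\end{pmatrix}\right) \tag{103}$$

The matrix between the two transformed vectors is symmetric, so it can be diagonalised by an orthogonal matrix, and an orthogonal rotation of independent unit normal variates leaves them independent and unit normal. Therefore the statistic is a weighted sum of independent Chi-squared variates, each with one degree of freedom. The weights are the eigenvalues

$$\lambda\left(C'\begin{pmatrix}-\Lambda_{22}^{-1} & 0\\ 0 & \left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\end{pmatrix}C\right) = \lambda\left(\begin{pmatrix}-\Lambda_{22}^{-1} & 0\\ 0 & \left(\Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}\right)^{-1}\end{pmatrix}D\right) \tag{104}$$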

where λ(X) denotes the set of eigenvalues of the matrix X. Writing $S = \Lambda_{22} - \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}$ for brevity and substituting in the expression for D gives the first result, since

$$\begin{pmatrix}-\Lambda_{22}^{-1} & 0\\ 0 & S^{-1}\end{pmatrix}\begin{pmatrix}\Lambda_{22} & S\\ S & S\end{pmatrix} = \begin{pmatrix}-I & -\Lambda_{22}^{-1}S\\ I & I\end{pmatrix}$$

To prove that the eigenvalues are real, assume λ is an eigenvalue with associated eigenvector $[v_1' \; v_2']'$, so that

$$\begin{pmatrix}-I & -\Lambda_{22}^{-1}S\\ I & I\end{pmatrix}\begin{pmatrix}v_1\\ v_2\end{pmatrix} = \lambda\begin{pmatrix}v_1\\ v_2\end{pmatrix} \tag{105}$$

Multiplying out the matrix equation, we get

$$-\Lambda_{22}v_1 - Sv_2 = \lambda\Lambda_{22}v_1, \qquad v_1 + v_2 = \lambda v_2 \tag{106}$$

The second equation gives $v_1 = (\lambda - 1)v_2$, so $v_2 \ne 0$ (otherwise the eigenvector would be zero); substituting into the first gives

$$Sv_2 = \left(1 - \lambda^2\right)\Lambda_{22}v_2 \tag{107}$$

Thus $\nu = 1 - \lambda^2$ is an eigenvalue of the generalised eigenproblem $Sv = \nu\Lambda_{22}v$. As S and $\Lambda_{22}$ are real symmetric and positive definite, ν is real and strictly positive; and as $\Lambda_{22} - S = \Lambda_{12}'\Lambda_{11}^{-1}\Lambda_{12}$ is positive semi-definite, ν ≤ 1. Hence

$$0 \le \lambda^2 = 1 - \nu < 1 \tag{108}$$

and so the eigenvalues are real. To show that they are symmetric about zero, note that equation (106) implies that

$$\begin{pmatrix}-I & -\Lambda_{22}^{-1}S\\ I & I\end{pmatrix}\begin{pmatrix}(1+\lambda)v_1\\ (1-\lambda)v_2\end{pmatrix} = -\lambda\begin{pmatrix}(1+\lambda)v_1\\ (1-\lambda)v_2\end{pmatrix} \tag{109}$$

where the vector on the right is non-zero because $|\lambda| < 1$ and $v_2 \ne 0$. So −λ is an eigenvalue if λ is one.

Derivation of Equation (93)

In equations (91) and (92) we state the mean and the variance of the conditional distribution of the factor returns, the company-specific returns and the first set of forecasts, given the second set of forecasts. Equation (93) follows directly from a further application of the formula for the conditional means. First we must rearrange equation (92). The (1,3) element of the block matrix on the right-hand side of (92) can be rewritten as

$$\begin{aligned}(1,3) &= \tau^2FB'P_1' - \tau^2FB'P_2'\left(\tau^2P_2VP_2' + \Omega_2\right)^{-1}\tau^2P_2VP_1'\\ &= FB'V^{-1}\left(\tau^2V - \tau^2VP_2'\left(\tau^2P_2VP_2' + \Omega_2\right)^{-1}\tau^2P_2V\right)P_1'\\ &= FB'V^{-1}\left(\tau^{-2}V^{-1} + P_2'\Omega_2^{-1}P_2\right)^{-1}P_1'\\ &= \tau^2FB'\left(I + \tau^2P_2'\Omega_2^{-1}P_2V\right)^{-1}P_1' = \tau^2FB'\tilde{P}_1'\end{aligned} \tag{110}$$

where, to go from the second to the third line, we used the matrix inversion lemma (discussed in the previous section). In a similar manner, we can rearrange the (2,3) and (3,3) blocks of equation (92) to derive the identities

$$\begin{aligned}(2,3) &= \tau^2D\left(I + \tau^2P_2'\Omega_2^{-1}P_2V\right)^{-1}P_1' = \tau^2D\tilde{P}_1'\\ (3,3) &= P_1\left(\tau^{-2}V^{-1} + P_2'\Omega_2^{-1}P_2\right)^{-1}P_1' + \Omega_1\\ &= \tau^2P_1\left(I + \tau^2VP_2'\Omega_2^{-1}P_2\right)^{-1}\left(V + \tau^2VP_2'\Omega_2^{-1}P_2V\right)\left(I + \tau^2P_2'\Omega_2^{-1}P_2V\right)^{-1}P_1' + \Omega_1\\ &= \tau^2\tilde{P}_1\tilde{V}\tilde{P}_1' + \Omega_1\end{aligned} \tag{111}$$

Equation (93) now follows immediately using the Gauss Markov Theorem, restated in Theorem 1.2.11 of Muirhead (1982, p. 12).

References

Adler, Michael and Dumas, Bernard (1983). ‘International Portfolio Choice and Corporate Finance: A Synthesis’, Journal of Finance, 38, pp. 925-984.
Black, Fischer (1990). ‘Equilibrium Exchange Rate Hedging’, Journal of Finance, 45(3), pp. 899-907.
Black, F. and Litterman, R. (1992). ‘Global Portfolio Optimization’, Financial Analysts’ Journal, Sept-Oct, pp. 28-43.
Blake, D. and Timmermann, A. (1999). ‘Mutual Fund Performance: Evidence from the UK’, European Finance Review, 2, 1998, pp. 57-77.
Bulsing, M. and Sefton, J. (2004). ‘The Hurdle of Credulity: Translating insight into alpha’, UBS, April.
Cochrane, J. (2001). Asset Pricing, Princeton University Press.
Daniel, W.D. (1999). Biostatistics: A Foundation for Analysis in the Health Sciences, 7th Edition, John Wiley & Sons.
Fama, Eugene (1970). ‘Efficient Capital Markets: A Review of Theory and Empirical Work’ (in Session Topic: Stock Market Price Behavior), Journal of Finance, 25(2), pp. 383-417.
Grinblatt, Mark and Titman, Sheridan (1993). ‘Performance measurement without benchmarks: An examination of mutual fund returns’, Journal of Business, 66, pp. 47-68.
Grinold, Richard and Kahn, Ronald (2000). Active Portfolio Management: A Quantitative approach for providing superior returns and controlling risk, 2nd Edition, McGraw-Hill, New York.
Grinblatt, Mark and Titman, Sheridan (1989). ‘Mutual fund performance: An analysis of quarterly portfolio holdings’, Journal of Business, 62, pp. 394-415.
Grinblatt, Mark and Titman, Sheridan (1992). ‘The persistence of mutual fund performance’, Journal of Finance, 47, pp. 1977-1984.
Imhof, J. P. (1961). ‘Computing the distribution of quadratic forms in Normal variables’, Biometrika, 48, pp. 419-426.
Lütkepohl, H. (1996). Handbook of Matrices, John Wiley.
Litterman, B. (2003). Modern Investment Management, John Wiley.
Muirhead (1982). Aspects of Multivariate Statistical Theory, John Wiley and Sons.
Pan Jie-Jian (1964). ‘Distributions of the noncircular serial correlation coefficients’, Shuxue Jinzhan, 7, pp. 328-337.
Satchell, S. and Scowcroft, A. (2000). ‘A demystification of the Black-Litterman model: Managing quantitative and traditional portfolio construction’, Journal of Asset Management, 1(2), pp. 138-150.
Theil, H. and Goldberger, S. A. (1961). ‘On Pure and Mixed Statistical Estimation in Economics’, International Economic Review, 2, pp. 65-78.
Theil, H. (1963). ‘On the use of incomplete prior information in regression analysis’, Journal of the American Statistical Association, 58, pp. 401-414.
Theil, H. (1971). Principles of Econometrics, Wiley, New York.
Tonks, I. (2002). ‘Performance Persistence of Pension Fund Managers’, Financial Markets Group, Discussion Paper 423.
Wermers, R. (2003). ‘Is Money Really “Smart”? New Evidence on the Relation Between Mutual Fund Flows, Manager Behavior, and Performance Persistence’, Robert H. Smith School of Business Working Paper.
Wermers, R. (2000). ‘Mutual Fund Performance: An Empirical Decomposition into Stock-Picking Talent, Style, Transactions Costs, and Expenses’, Journal of Finance, 55(4).
Wermers, R. (1997). ‘Momentum investment strategies of mutual funds, performance persistence, and survivorship bias’, Working paper, University of Colorado.