Asset Pricing with Conditioning Information: A New Test

Kevin Q. Wang
Rotman School of Management, University of Toronto
105 St. George Street, Toronto, ON M5S 3E6, Canada
[email protected]
October 1999

* I would like to thank Ravi Bansal, Phelim Boyle, Michael Brandt, Xiaohong Chen, Philip Dybvig, Burton Hollifield, David Hsieh, Kris Jacobs, Ravi Jagannathan, Ekaterini Kyriazidou, Kai Li, Tom McCurdy, Robert McCulloch, Ruey Tsay, Guofu Zhou, seminar participants at McGill University, University of Chicago, University of North Carolina, University of Toronto, University of Waterloo, Washington University in Saint Louis, participants at the 1998 Northern Finance Association meetings in Toronto and the 1999 American Finance Association meetings in New York, and particularly Yacine Aït-Sahalia, John Cochrane, George Constantinides, Lars Hansen, and Raymond Kan for many helpful comments and suggestions. I am grateful to Eugene Fama for providing me with data. Comments from René Stulz (the editor) and an anonymous referee have significantly improved the article. I am responsible for any remaining errors.


Abstract

This paper develops a new approach to testing dynamic linear factor models, which targets time-variation in Jensen's alphas while using a nonparametric pricing kernel to incorporate conditioning information. In application we find that the conditional CAPM performs substantially better than the static CAPM, but it is still statistically rejected. The conditional CAPM exhibits time-varying Jensen's alphas that have a strong size pattern in volatility and a clear book-to-market pattern in time-series average. These features are well captured by a simple dynamic version of the Fama and French (1993) three-factor model. Moreover, even with portfolios formed on past returns we cannot reject this model.

Recent research has documented evidence of time-variation in expected returns, return volatilities, and betas of financial assets. Meanwhile, researchers have identified a number of anomalies against the unconditional version of the Capital Asset Pricing Model (CAPM) of Sharpe (1964) and Lintner (1965). The evidence against this constant-beta model is so forceful that some argue that the CAPM is dead (see Fama and French (1992, 1996b)). However, as Dybvig and Ross (1985) and Hansen and Richard (1987) show theoretically, the conditional version of the CAPM can hold perfectly even when the static CAPM exhibits serious pricing errors. In general, dynamic models that allow for time-varying betas and risk premia can perform substantially better than static models. These findings have motivated a growing literature on testing conditional asset pricing models.1

A challenging issue arises when evaluating dynamic linear factor models. Asset pricing theory does not specify how betas, risk premia, and the pricing kernel (i.e., the stochastic discount factor) vary with the state variables that represent conditioning information. To construct tests, previous studies typically assume statistical models relating betas, the pricing kernel, or certain moments of asset returns to state variables. Consequently, their empirical results can be shaped by the modelling assumptions. Ghysels (1998) discusses the problem in detail and argues that the effects of misspecified auxiliary models on inference and estimation are of first-order importance. As a demonstration, he examines several time-varying beta models that have been proposed in the literature. He shows that these time-varying beta models are so seriously misspecified that, in many cases, they give rise to even larger pricing errors than constant-beta models.2 In this paper we propose a new testing approach that completely avoids specification of time-varying betas.
We focus on the pricing kernel and use a flexible nonparametric approach to incorporate conditioning information into the testing. Our approach is constructed in two steps. First, we utilize the fact that the conditional CAPM implies a pricing kernel which is determined by the first two conditional moments of the market return. Using standard kernel regression estimates of the two moments, we obtain a nonparametric estimate of the

1. A partial list includes Ferson, Kandel, and Stambaugh (1987), Bollerslev, Engle, and Wooldridge (1988), Harvey (1989), Shanken (1990), Carhart, Krail, Steven, and Welch (1995), Cochrane (1996), He, Kan, Ng, and Zhang (1996), Jagannathan and Wang (1996), Ferson and Siegel (1998), and Ferson and Harvey (1999).
2. Harvey (1991) and He, Kan, Ng, and Zhang (1996) have also noted the problem. More recently, Brandt (1999) has stressed the importance of this issue in the estimation of portfolio and consumption choice.


pricing kernel. Next, we use a regression approach to test whether excess returns discounted by the nonparametric pricing kernel are predictable. This design not only circumvents misspecification of the pricing kernel, but also provides a simple way to look into time-variation in Jensen's alphas.3 To test for the significance of additional factors, we construct an approach for multifactor models by combining the nonparametric test with Hansen's (1982) generalized method of moments (GMM).

In an empirical application of the new methodology, we find that the conditional Sharpe-Lintner CAPM performs substantially better than the static version, but it is still statistically rejected. The conditional CAPM pricing errors, or Jensen's alphas, exhibit interesting patterns. On one hand, they are positively correlated with the stock market. The pricing errors have a strong size pattern in volatility but not in time-series average. To exploit this dynamic size effect, one can generate abnormal returns using simple strategies that increase holdings of small stocks and reduce holdings of large stocks when the stock market goes up, and vice versa. On the other hand, these pricing errors have a clear book-to-market pattern in time-series average but not in volatility. We find that these prominent features of the conditional CAPM are well captured by a simple version of the conditional Fama and French (FF 1993) three-factor model. More surprisingly, we cannot reject this conditional FF model even with momentum portfolios.

The FF (1993) three-factor model has become a focal point in empirical asset pricing. The empirical success of the model has received multiple competing interpretations (see FF (1996a) and MacKinlay (1995)). While the issue is far from closed, many have begun to use the FF model in various applications.
However, recent tests of the conditional FF model by He, Kan, Ng, and Zhang (1996), Ferson and Siegel (1998), and Ferson and Harvey (1999) cast doubt on the empirical performance of the FF model. These tests strongly reject conditional versions of the FF model and generate the impression that the conditional FF model fails miserably at capturing the dynamics of asset returns. The results seem surprising, as one would expect that, by allowing for time-varying betas and risk premia, dynamic versions of the FF model should perform even better than the unconditional version. One explanation is that the Ghysels (1998) critique applies here, i.e.,

3. Jensen's alpha is a popular metric of mispricing by factor-based asset pricing models. Generally we refer to it as the difference between the expected return implied by the data and the value implied by the asset pricing model. In conditional asset pricing, the alphas are time-varying.


the results against the FF model may arise from serious misspecification of the betas, the pricing kernel, and so on. Consistent with this argument, our more flexible nonparametric tests give rise to a totally different conclusion. We find that the conditional FF model performs well: it captures the most significant features of deviations from the conditional CAPM, and it significantly outperforms the unconditional FF model.

Our testing approach has an important feature. We show that although nonparametric estimates are used in its construction, the regression coefficient estimates underlying our test converge at the standard parametric rate (√N-convergence), regardless of the number of state variables.4 This fast convergence property is pleasant as it provides hope that our test may, to some extent, escape the "curse of dimensionality" that plagues applications of nonparametric methods.5 In other words, there is a theoretical reason why the test may have good power in finite-sample applications. Indeed, the test works well in our Monte Carlo experiments. In the simulations, we compare it with a standard GMM approach that uses a parametric pricing kernel. Our test outperforms the parametric GMM approach even when the pricing kernel is correctly specified!

In the empirical asset pricing literature, Bansal and Viswanathan (1993) and Bansal, Hsieh, and Viswanathan (1993) have advocated the idea of using a flexible pricing kernel. They propose a series expansion GMM approach to testing nonlinear APT models. In this approach, one first approximates the stochastic discount factor with a series expansion, such as a long polynomial, and then employs GMM for estimation and testing. Chapman (1997) extends this approach to evaluating consumption-based models.
While the target and empirical results of these papers can be easily contrasted with ours, it is a challenging task to effectively compare the methodologies, since the large-sample and finite-sample properties of this series expansion GMM approach remain largely unknown.

This paper is organized as follows. Section I details the testing approach and presents analytical results for the test statistic and the underlying estimator. In Section I we also

4. On implementing the regression testing idea, this paper is related to Powell, Stock, and Stoker (1989), Robinson (1989), Yoshihara (1990), Lee (1992), and Khashimov (1993). The regression estimator underlying our testing approach is similar to the estimators of Powell et al. and Lee. We derive the limiting distribution of this estimator using an extension of classical U-statistic theorems to β-mixing processes. Similar extensions have been obtained by Robinson, Yoshihara, and Khashimov.
5. It is well known that nonparametric estimates typically converge at slower rates than parametric estimates, and the problem is more serious in higher-dimensional applications (i.e., with a larger number of state variables in our context). This is known as the curse of dimensionality (e.g., see Silverman (1986)).


discuss the selection of state variables and present a test for conditional multifactor models. Section II provides simulation evidence. Section III presents empirical results. Section IV concludes with a summary. The Appendix contains technical assumptions and proofs.

I. Econometric Approach

A. Basic Idea

We use a nonparametric discount factor to incorporate conditioning information into the testing of dynamic asset pricing models. Our econometric approach is based on a regression testing idea, which we carry out through a weighted least squares estimator. This section presents the basic idea in detail.

We consider a framework in which there is a conditionally riskless asset. Let r_{p,t+1} be the return on a benchmark portfolio p in excess of the riskless rate, and r_{i,t+1} be the excess return on the i-th test asset for i = 1, ..., n. Let x_t be a k × 1 vector of state variables such that

$$E(r_{p,t+1} \mid I_t) = E(r_{p,t+1} \mid x_t), \qquad (1)$$

$$E(r_{p,t+1}^2 \mid I_t) = E(r_{p,t+1}^2 \mid x_t), \qquad (2)$$

where I_t is the time-t information set of investors.6 The excess returns and the state variables are assumed to be strictly stationary. We present our approach as a test of conditional mean-variance efficiency of a given portfolio. If the benchmark portfolio p is conditionally mean-variance efficient, then

$$E(r_{i,t+1} \mid I_t) = \frac{\mathrm{cov}(r_{i,t+1}, r_{p,t+1} \mid I_t)}{\mathrm{var}(r_{p,t+1} \mid I_t)}\, E(r_{p,t+1} \mid I_t), \qquad (3)$$

or equivalently

$$E(r_{i,t+1} \mid I_t) = \frac{E(r_{i,t+1}\, r_{p,t+1} \mid I_t)}{E(r_{p,t+1}^2 \mid I_t)}\, E(r_{p,t+1} \mid I_t), \qquad (4)$$

for i = 1, ..., n. The covariance representation (3) is the familiar beta-pricing equation. Our test aims at equation (4), the `cross-moment' representation.7

6. Note that (1) and (2) apply only to the benchmark portfolio p. They do not require x_t to be a full characterization of the information set I_t, but they are sufficient for developing the nonparametric test.
7. Conditional Jensen's alphas from (4) are smaller in absolute value than those from (3).


Let g_p(x_t) = E(r_{p,t+1} | x_t), g_{pp}(x_t) = E(r_{p,t+1}^2 | x_t), and b(x_t) = g_p(x_t)/g_{pp}(x_t). Given that (1) and (2) hold, conditional Jensen's alphas from (4) can be expressed as

$$E(r_{i,t+1} \mid I_t) - \frac{E(r_{i,t+1}\, r_{p,t+1} \mid I_t)}{E(r_{p,t+1}^2 \mid I_t)}\, E(r_{p,t+1} \mid I_t) = E(m_{t+1}\, r_{i,t+1} \mid I_t),$$

where m_{t+1} = 1 − b(x_t) r_{p,t+1}. Thus (4) is equivalent to

$$E(m_{t+1}\, r_{i,t+1} \mid I_t) = 0. \qquad (5)$$
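As a one-line consistency check (implicit in the definitions above, not spelled out in the original text), the discount factor m_{t+1} = 1 − b(x_t) r_{p,t+1} prices the benchmark itself exactly, using only (1) and (2):

$$E(m_{t+1}\, r_{p,t+1} \mid I_t) = E(r_{p,t+1} \mid I_t) - b(x_t)\, E(r_{p,t+1}^2 \mid I_t) = g_p(x_t) - \frac{g_p(x_t)}{g_{pp}(x_t)}\, g_{pp}(x_t) = 0.$$

Hence the benchmark is automatically priced, and only the test assets r_{i,t+1} generate testable restrictions in (5).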

Let e_{i,t+1} = m_{t+1} r_{i,t+1} (i.e., the discounted excess return) and let z_t be a q × 1 vector of observed variables in I_t. If we could observe the discount factor m_{t+1}, a natural way to test the condition E(e_{i,t+1} | I_t) = 0 would be a regression approach: regress e_{i,t+1} on z_t and test whether the regression coefficients are zero. This works because the regression equations

$$e_{i,t+1} = z_t' \delta_i + u_{i,t+1}, \qquad (6)$$

where E(u_{i,t+1} | I_t) = 0, for i = 1, ..., n, are always consistent with (5). Obviously, the moment condition (5) implies that the regression equations in (6) hold with coefficients satisfying δ = 0, where δ = (δ_1' δ_2' ... δ_n')'.

To implement this simple idea, we replace m_{t+1} with a nonparametric discount factor m̂_{t+1} and estimate the parameter vector δ_i by

$$\hat\delta_i = \left(\frac{1}{N}\sum_{t=1}^{N} \hat w_t z_t z_t'\right)^{-1} \left(\frac{1}{N}\sum_{t=1}^{N} \hat w_t z_t \hat e_{i,t+1}\right), \qquad (7)$$

for i = 1, ..., n, where ê_{i,t+1} = m̂_{t+1} r_{i,t+1} and

$$\hat m_{t+1} = 1 - \hat b(x_t)\, r_{p,t+1},$$

with b̂(x) = ĝ_p(x)/ĝ_{pp}(x). The weighting function is set to ŵ_t = f̂(x_t) ĝ_{pp}(x_t). Here f̂, ĝ_p, and ĝ_{pp} are the kernel estimators defined below:

$$\hat f(x) = N^{-1} h^{-k} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right),$$

$$\hat g_p(x) = N^{-1} h^{-k} \hat f(x)^{-1} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right) r_{p,s+1},$$

$$\hat g_{pp}(x) = N^{-1} h^{-k} \hat f(x)^{-1} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right) r_{p,s+1}^2. \qquad (8)$$

7 (cont.). The ratio of the errors from (3) and (4) is one plus the squared conditional Sharpe ratio. A nonparametric regression test that targets (3) was considered in a previous version; it is omitted since the test based on (4) is simpler and performs better in simulations.

The nonparametric estimators in (8) are standard: f̂(x) is the Rosenblatt-Parzen kernel density estimator, with kernel function K(·) and bandwidth parameter h, and ĝ_p(x) and ĝ_{pp}(x) are Nadaraya-Watson kernel regression function estimators. The weighting function is chosen purely for a technical reason. Because of this choice, both N^{-1} Σ_{t=1}^N ŵ_t z_t z_t' and N^{-1} Σ_{t=1}^N ŵ_t z_t ê_{i,t+1} can be expressed as second-order generalized U-statistics, making it straightforward to analyze the large-sample properties of δ̂_i.8 In contrast, rather complex technical problems arise in developing the distribution theory if we use weighting functions that do not give rise to simple U-statistic structures.9 The weighting function chosen above is the simplest among those that yield U-statistic structures. In a similar spirit, Powell, Stock, and Stoker (1989) employed density weighting to obtain an instrumental variables estimator for limited dependent variable models, Robinson (1989) provided a class of unconditional moment tests for general econometric applications, and Lee (1992) designed a density-weighted least squares estimator for heteroskedasticity testing.

The test that we propose is based on the weighted least squares estimator δ̂_N, where δ̂_N = (δ̂_1' δ̂_2' ... δ̂_n')'. Intuitively, δ̂_N converges to zero if the benchmark p is conditionally mean-variance efficient. Otherwise, the estimator converges to a nonzero limit in general (unless e_{i,t+1} is orthogonal to all the components of z_t for i = 1, ..., n). Thus we can conduct a test by checking how far δ̂_N is from zero, using asymptotic distribution theory to account for sampling errors.

Note that this regression testing approach provides a simple way to look into pricing errors. By design, our test aims at detecting time-variation in Jensen's alphas. A significant component of δ̂_N indicates that the expected return errors are correlated with the corresponding state variable. Thus the test may potentially lead to a simple characterization of time-variation in the pricing errors. Moreover, in applications we may view (6) as a model for pricing errors. That is, z_t'δ_i may serve as an approximation for E(e_{i,t+1} | I_t), the time-varying Jensen's alphas from (4).

8. There is a large literature on U-statistics. See Arcones (1995), Khashimov (1993), Powell et al. (1989), Robinson (1989), and Yoshihara (1990) for some recent contributions.
9. In particular, setting ŵ_t = 1 does not yield U-statistic structures, which is why it is not used here.
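To make the construction concrete, the sketch below implements the kernel estimators in (8) and the density-weighted least squares estimator in (7) on simulated data. This is not the author's code: it assumes a scalar state variable (k = 1), a Gaussian kernel (the asymptotics below require a higher-order kernel), instruments z_t = (1, x_t)', and a data-generating process invented purely for illustration, with the test asset satisfying the conditional CAPM by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
x = rng.normal(size=N)                         # scalar state variable x_t (k = 1)
rp = 0.05 * x + 0.2 * rng.normal(size=N)       # benchmark excess return r_{p,t+1}
beta = 1.0 + 0.5 * x                           # time-varying beta (illustrative)
ri = beta * rp + 0.1 * rng.normal(size=N)      # test asset: conditional CAPM holds

# Kernel estimators of equation (8): Gaussian kernel, bandwidth h
h = 0.3
u = (x[:, None] - x[None, :]) / h              # (t, s) scaled distances
K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
f_hat = K.mean(axis=1) / h                     # Rosenblatt-Parzen density f^(x_t)
g_p = (K * rp).mean(axis=1) / h / f_hat        # Nadaraya-Watson g^_p(x_t)
g_pp = (K * rp**2).mean(axis=1) / h / f_hat    # Nadaraya-Watson g^_pp(x_t)

# Nonparametric discount factor and discounted excess returns
m_hat = 1.0 - (g_p / g_pp) * rp                # m^_{t+1} = 1 - b^(x_t) r_{p,t+1}
e_hat = m_hat * ri                             # e^_{i,t+1} = m^_{t+1} r_{i,t+1}

# Density-weighted least squares estimator of equation (7)
w_hat = f_hat * g_pp                           # weights w^_t = f^(x_t) g^_pp(x_t)
z = np.column_stack([np.ones(N), x])           # instruments z_t = (1, x_t)'
A = (w_hat[:, None, None] * z[:, :, None] * z[:, None, :]).mean(axis=0)
b = (w_hat[:, None] * z * e_hat[:, None]).mean(axis=0)
delta_hat = np.linalg.solve(A, b)

print(delta_hat)                               # near zero: the CAPM holds here by construction
```

Since the simulated asset satisfies the conditional CAPM, δ̂ should be near zero; replacing ri with a return whose alpha depends on x_t would push the corresponding component of δ̂ away from zero.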

B. Asymptotics and the Test Statistic

This section presents an important property of our test: although we use nonparametric kernel density and regression estimates in its construction, the estimator δ̂_N behaves like a parametric estimator. It has a standard limiting multivariate normal distribution and, in particular, it converges at the fast parametric rate.10 This property makes the testing approach straightforward to construct and implement. The fast convergence rate is especially pleasant, as it provides a theoretical reason to expect that the test may have good power in applications.

We use the following notation to present the results. Let r_{t+1} be the vector of scaled excess returns (r_{1,t+1} ... r_{n,t+1})' ⊗ z_t and y_{t+1} = (x_t' z_t' r_{p,t+1} r_{t+1}')', where `⊗' is the Kronecker operator. Denote w_t = f(x_t) g_{pp}(x_t), A = I_n ⊗ E[w_t z_t z_t'], and Â_N = I_n ⊗ N^{-1} Σ_{t=1}^N ŵ_t z_t z_t', where I_n is the n × n identity matrix. Let δ = (δ_1' ... δ_n')' with δ_i = [E(w_t z_t z_t')]^{-1} E[w_t z_t e_{i,t+1}]. Define

$$\gamma(y_{t+1}) = \eta(y_{t+1}) - [I_n \otimes a(y_{t+1})]\,\delta, \qquad (9)$$

$$\eta(y_{t+1}) = f(x_t)\left[g_{pp}(x_t)\, r_{t+1} - g_p(x_t)\, r_{p,t+1} r_{t+1} + g_r(x_t)\, r_{p,t+1}^2 - g_{pr}(x_t)\, r_{p,t+1}\right], \qquad (10)$$

$$a(y_{t+1}) = f(x_t)\left[g_{pp}(x_t)\, z_t z_t' + r_{p,t+1}^2\, g_{zz}(x_t)\right], \qquad (11)$$

where g_r(x_t) = E(r_{t+1} | x_t), g_{pr}(x_t) = E(r_{p,t+1} r_{t+1} | x_t), and g_{zz}(x_t) = E(z_t z_t' | x_t). In Appendix B, we show that (i) Â_N converges in probability to A, (ii) the limiting distribution of √N Â_N (δ̂_N − δ) is identical to that of N^{-1/2} Σ_{t=1}^N γ(y_{t+1}), and (iii) Eγ(y_{t+1}) = 0. Because N^{-1} Σ_{t=1}^N γ(y_{t+1}) is a simple average of stationary random vectors, application of a central limit theorem thus gives the following result.

10. This result arises from the fact that the average of some functions of slow-converging nonparametric kernel estimates can converge at the parametric rate (√N). See Härdle and Stoker (1989), Powell, Stock, and Stoker (1989), Robinson (1989), and Lee (1992) for examples of this fact.


Proposition 1: Given Assumptions A1-A6 stated in Appendix A, if h → 0, Nh^{2k} → ∞, and Nh^{2k+2} → 0, then the weighted least squares estimator δ̂_N is such that √N(δ̂_N − δ) has a limiting multivariate normal distribution with mean 0 and variance-covariance matrix Ω, where

$$\Omega = A^{-1} \Gamma A^{-1}, \qquad \Gamma = \sum_{j=-\infty}^{\infty} \Gamma_j, \qquad \Gamma_j = E\left[\gamma(y_{t+1})\,\gamma(y_{t+j+1})'\right].$$

Proposition 1 shows that the weighted least squares estimator δ̂_N has the standard limiting properties of parametric estimators: √N-consistency and asymptotic normality. This result does not rely upon (1) and (2), nor does it require that the regression equations in (6) be correctly specified. Note that the bandwidth conditions differ from those for kernel density and regression function estimators. The conditions Nh^{2k} → ∞ and Nh^{2k+2} → 0 place upper and lower bounds on the rate at which the bandwidth h converges to 0 for δ̂_N to exhibit the desired asymptotic behavior. The condition Nh^{2k+2} → 0 is due to the use of a kernel of order k + 1 (Assumption A4), and hence the admissible range for the rate can be relaxed by using a kernel of order higher than k + 1.

To estimate the covariance matrix Ω, we first consider estimation of γ(y_{t+1}). Replacing the functions f(x), g_p(x), g_{pp}(x), g_r(x), g_{pr}(x), and g_{zz}(x) in (10) and (11) by standard kernel estimators,11 and replacing δ in (9) by δ̂_N, we obtain a natural approximation for γ(y_{t+1}):

$$\hat\gamma_N(y_{t+1}) = \hat\eta_N(y_{t+1}) - [I_n \otimes \hat a_N(y_{t+1})]\,\hat\delta_N, \qquad (12)$$

$$\hat\eta_N(y_{t+1}) = \hat f(x_t)\left[\hat g_{pp}(x_t)\, r_{t+1} - \hat g_p(x_t)\, r_{p,t+1} r_{t+1} + \hat g_r(x_t)\, r_{p,t+1}^2 - \hat g_{pr}(x_t)\, r_{p,t+1}\right], \qquad (13)$$

$$\hat a_N(y_{t+1}) = \hat f(x_t)\left[\hat g_{pp}(x_t)\, z_t z_t' + r_{p,t+1}^2\, \hat g_{zz}(x_t)\right], \qquad (14)$$

where f̂, ĝ_p, and ĝ_{pp} are defined as in (8), and

$$\hat g_r(x) = N^{-1} h^{-k} \hat f(x)^{-1} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right) r_{s+1},$$

$$\hat g_{pr}(x) = N^{-1} h^{-k} \hat f(x)^{-1} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right) r_{p,s+1} r_{s+1},$$

$$\hat g_{zz}(x) = N^{-1} h^{-k} \hat f(x)^{-1} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right) z_s z_s'.$$

11. Note that g_{zz}(x_t) = z_t z_t' when z_t is a fixed transformation of x_t, for example, when z_t = (1 x_t')'. In such circumstances there is no need to use the kernel estimator ĝ_{zz}(x_t) in (14); instead one can simply replace g_{zz}(x_t) by z_t z_t', which gives â_N(y_{t+1}) = f̂(x_t)[ĝ_{pp}(x_t) + r_{p,t+1}^2] z_t z_t'.


We show in Appendix B that the estimator

$$\hat\Gamma_j = N^{-1} \sum_{t=1}^{N-j} \hat\gamma_N(y_{t+1})\, \hat\gamma_N(y_{t+j+1})'$$

is consistent for Γ_j. It is also shown that, given (1) and (2), Γ_j = 0 for any j ≠ 0 when the regression equations in (6) hold. Thus we propose to use the test statistic

$$\hat T_\delta = N\, \hat\delta_N' \hat\Omega_N^{-1} \hat\delta_N, \qquad (15)$$

where Ω̂_N = Â_N^{-1} Γ̂_0 Â_N^{-1}, for testing conditional mean-variance efficiency of the benchmark. The following proposition gives the limiting distribution of the test statistic.

Proposition 2: Let the conditions of Proposition 1 hold. (i) Given (1) and (2), if the portfolio p is conditionally mean-variance efficient, then the test statistic T̂_δ has a limiting chi-squared distribution with q × n degrees of freedom. (ii) Γ̂_j is a consistent estimator of Γ_j for any fixed j.

The regression test is constructed using a simple covariance matrix estimator. Note that Proposition 2 also provides the necessary inputs (Γ̂_j) for covariance matrix estimators that are consistent under more general circumstances. Whether there is any advantage to using such estimators in applications remains to be studied.12

A few remarks are in order. First, the limiting distribution of δ̂_N does not depend on the scaling constant c when we set the bandwidth as h = cN^{-1/(2k+1)}. This property suggests that, unlike kernel density and kernel regression estimates, the estimate δ̂_N becomes less sensitive to the choice of c as the sample size increases. In other words, the bandwidth choice in our approach is not as important as in kernel density and kernel regression estimation.13 The trade-off is that we cannot justify cross-validation procedures. In applications, we adopt a simple rule supported by simulations, and we find that results are not very sensitive to small changes in the bandwidth. Secondly, it may look like a "free lunch" that δ̂_N has a standard limiting distribution and the parametric convergence rate no matter how many state variables we use. It should be noted, however, that there is a cost to the nonparametric estimation, which is embedded in the covariance matrix of δ̂_N. In finite samples, it still pays to eliminate redundant conditioning variables. Given the sample sizes available to us, we probably have to use only a small number of conditioning variables to obtain powerful tests. We consider this issue in the next section. Finally, note that the test will not have power if we choose a vector z_t that is orthogonal to e_{i,t+1} (i.e., E(z_t e_{i,t+1} | x_t) = 0, i = 1, ..., n).14 However, the test will have power as long as one component of z_t can significantly forecast e_{i,t+1}, whether or not the regression model (6) is correctly specified.

12. In application we have attempted to use a Newey-West covariance matrix estimator with a lag length of six, twelve, and eighteen. However, the estimates are singular (or so close to singular) that the computer cannot invert them. On the other hand, the regression residuals look much like martingale sequences, supplying no incentive to pursue a more complex approach to estimation of the covariance matrix.
13. This is nice, as it suggests that bandwidth problems due to persistence of state variables (see Pritsker (1998) and Chapman and Pearson (1999)) may not be a serious issue for our approach.
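The test statistic (15) can be sketched end to end. As before, this is an illustration, not the paper's implementation: it assumes a scalar state, a Gaussian kernel, one test asset (n = 1) with z_t = (1, x_t)' so q × n = 2, and, for brevity, a naive plug-in Γ̂_0 built from the scores ŵ_t z_t(ê_{t+1} − z_t'δ̂) rather than the full correction terms γ̂_N(y_{t+1}) of (12)-(14).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
x = rng.normal(size=N)                        # state variable x_t
rp = 0.05 * x + 0.2 * rng.normal(size=N)      # benchmark excess return
ri = (1.0 + 0.5 * x) * rp + 0.1 * rng.normal(size=N)   # null: conditional CAPM holds

# Kernel estimates as in (8)
h = 0.3
u = (x[:, None] - x[None, :]) / h
K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
f_hat = K.mean(axis=1) / h
g_p = (K * rp).mean(axis=1) / h / f_hat
g_pp = (K * rp**2).mean(axis=1) / h / f_hat

e_hat = (1.0 - (g_p / g_pp) * rp) * ri        # discounted excess return e^_{t+1}
w_hat = f_hat * g_pp                          # density weights w^_t
z = np.column_stack([np.ones(N), x])

# Weighted least squares estimate (7)
A = (w_hat[:, None, None] * z[:, :, None] * z[:, None, :]).mean(axis=0)
b = (w_hat[:, None] * z * e_hat[:, None]).mean(axis=0)
delta_hat = np.linalg.solve(A, b)

# Naive score-based Gamma_0 and sandwich covariance Omega = A^-1 Gamma_0 A^-1
s = w_hat[:, None] * z * (e_hat - z @ delta_hat)[:, None]
Gamma0 = (s[:, :, None] * s[:, None, :]).mean(axis=0)
A_inv = np.linalg.inv(A)
Omega = A_inv @ Gamma0 @ A_inv

# Test statistic (15); compare with chi-squared(2) critical values
T_delta = N * delta_hat @ np.linalg.solve(Omega, delta_hat)
print(T_delta)
```

Under the null, T̂_δ is approximately χ² with q × n = 2 degrees of freedom; the paper's Ω̂_N additionally accounts for the nonparametric estimation step, which the naive covariance here ignores.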

C. Selection of State Variables

A challenging issue in testing conditional asset pricing models is that we cannot observe the complete set of conditioning information. Even if we could, a test constructed using a large number of state variables is likely to have poor power. In practice, we can only use a subset of the information set, typically consisting of a small number of state variables. Therefore, a practically important question is whether we can use a much smaller subset of the conditioning information to develop a valid testing procedure. The result below, implicit in Dybvig and Ross (1985), shows that security-specific information can be ignored: one can use a subset as long as it contains the variables that characterize the first two moments of the market.15

Proposition 3: Let x_t satisfy (1) and (2), and let I*_t be such that x_t ∈ I*_t ⊆ I_t. Define

$$\varepsilon_{i,t} \equiv E(r_{i,t+1} \mid I_t) - \frac{E(r_{i,t+1}\, r_{p,t+1} \mid I_t)}{E(r_{p,t+1}^2 \mid I_t)}\, E(r_{p,t+1} \mid I_t),$$

$$\varepsilon^*_{i,t} \equiv E(r_{i,t+1} \mid I^*_t) - \frac{E(r_{i,t+1}\, r_{p,t+1} \mid I^*_t)}{E(r_{p,t+1}^2 \mid I^*_t)}\, E(r_{p,t+1} \mid I^*_t).$$

Then ε*_{i,t} = E(ε_{i,t} | I*_t). Thus it follows that (i) if ε_{i,t} = 0, then ε*_{i,t} = 0; (ii) E(ε*_{i,t}) = E(ε_{i,t}); (iii) var(ε*_{i,t}) ≤ var(ε_{i,t}); (iv) if z_t ∈ I*_t, then cov(z_t, ε*_{i,t}) = cov(z_t, ε_{i,t}).

14. In statistical terms, the test is not consistent, for a given vector z_t, against certain processes that generate e_{i,t+1}. Chen and Fan (1999) recently proposed a consistent test. The trade-off is that the limiting distribution of their test is nonstandard, and thus it is complicated to implement in simulations and applications.
15. The proof of Proposition 3 is straightforward and hence omitted.

Proposition 3 shows that as long as we use a conditioning information set (I*_t) that includes the variables (x_t) characterizing the first two moments of the benchmark portfolio, our testing approach based on I*_t is indeed a test of implications of the model.16 Yet there are consequences of using a subset of the information set. First, the volatility of pricing errors based on I*_t is not equal to that based on I_t; instead, it is a lower bound. Second, any test based on I*_t is likely to be inconsistent, since ε*_{i,t} can be zero while ε_{i,t} is not. For power considerations, it is sensible to use an information set that consists of just x_t. A simple way to select such variables is to use linear regression methods. Next we outline a nonparametric selection test, which is a test of (1) and (2) given a candidate vector x_t. This approach is more robust to possible nonlinear relations.

Let z_t be a q_1 × 1 vector in the information set I_t, where the components of z_t are different from those of x_t. The idea of the selection test is to check whether z_t can forecast the residuals r_{p,t+1} − g_p(x_t) and r_{p,t+1}^2 − g_{pp}(x_t). If (1) holds, i.e., if E(r_{p,t+1} | I_t) = g_p(x_t), then

$$r_{p,t+1} - g_p(x_t) = z_t' \mu + \epsilon_{t+1},$$

with E(ε_{t+1} | I_t) = 0 and μ = 0. Under certain regularity conditions, if Nh^{2k} → ∞ and Nh^{2k+2} → 0, then we can show that the estimator

$$\hat\mu = \left(\frac{1}{N}\sum_{t=1}^{N} \hat f(x_t) z_t z_t'\right)^{-1} \left(\frac{1}{N}\sum_{t=1}^{N} \hat f(x_t) z_t \left[r_{p,t+1} - \hat g_p(x_t)\right]\right)$$

is such that √N(μ̂ − μ) →d N(0, Ω_μ).

We may construct a covariance matrix estimator as in Proposition 2. Specifically, we can use Ω̂_μ = Â_μ^{-1} Γ̂_μ Â_μ^{-1}, where Â_μ = N^{-1} Σ_{t=1}^N f̂(x_t) z_t z_t' and Γ̂_μ = N^{-1} Σ_{t=1}^N γ̂_{μ,t+1} γ̂'_{μ,t+1}. The input γ̂_{μ,t+1} is

$$\hat\gamma_{\mu,t+1} = \hat f(x_t)\left[z_t r_{p,t+1} - z_t \hat g_p(x_t) - r_{p,t+1}\, \hat g_z(x_t) + \hat g_{zp}(x_t) - \hat g_{zz}(x_t)\hat\mu - z_t z_t' \hat\mu\right],$$

with ĝ_{zz}(x) defined as for (14), and

$$\hat g_z(x) = N^{-1} h^{-k} \hat f(x)^{-1} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right) z_s,$$

$$\hat g_{zp}(x) = N^{-1} h^{-k} \hat f(x)^{-1} \sum_{s=1}^{N} K\!\left(\frac{x - x_s}{h}\right) z_s r_{p,s+1}.$$

These results yield a selection test statistic N μ̂' Ω̂_μ^{-1} μ̂, which has a limiting χ²(q_1) distribution under (1). A test of (2) can be constructed similarly by replacing r_{p,t+1} and r_{p,s+1} with r_{p,t+1}^2 and r_{p,s+1}^2, respectively.

16. In practice most asset pricing tests use monthly data. To predict the stock market one month ahead, it seems reasonable to assume that a small number of variables can practically summarize all the relevant information.
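The first-moment selection test can be sketched in the same style. Again this is an illustration under simplifying assumptions (scalar x_t, one candidate variable so q_1 = 1, a Gaussian kernel, and a naive Γ̂_μ built from f̂(x_t)z_t[r_{p,t+1} − ĝ_p(x_t) − z_t'μ̂] rather than the full γ̂_{μ,t+1} above):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
x = rng.normal(size=N)                       # retained state variable x_t
z_cand = rng.normal(size=N)                  # candidate variable; irrelevant by construction
rp = 0.05 * x + 0.2 * rng.normal(size=N)     # E(r_p | x, z_cand) depends on x only

# Nadaraya-Watson estimate of g_p(x) as in (8)
h = 0.3
u = (x[:, None] - x[None, :]) / h
K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
f_hat = K.mean(axis=1) / h
g_p = (K * rp).mean(axis=1) / h / f_hat

resid = rp - g_p                             # first-moment residual r_{p,t+1} - g^_p(x_t)
z = z_cand[:, None]                          # q_1 = 1 candidate regressor

# Density-weighted estimate of mu
A = (f_hat[:, None, None] * z[:, :, None] * z[:, None, :]).mean(axis=0)
b = (f_hat[:, None] * z * resid[:, None]).mean(axis=0)
mu_hat = np.linalg.solve(A, b)

# Naive sandwich covariance and selection statistic N mu' Omega^-1 mu
s = f_hat[:, None] * z * (resid - z @ mu_hat)[:, None]
Gamma = (s[:, :, None] * s[:, None, :]).mean(axis=0)
A_inv = np.linalg.inv(A)
Omega_mu = A_inv @ Gamma @ A_inv
T_sel = N * mu_hat @ np.linalg.solve(Omega_mu, mu_hat)
print(T_sel)                                 # approximately chi-squared(1) under (1)
```

A large T_sel would indicate that z_cand forecasts the benchmark beyond x_t, i.e., that (1) fails for the candidate x_t; the second-moment test replaces rp with rp² throughout.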

D. Testing the Significance of Additional Factors

We extend our approach to testing whether it is significant to include additional factors. We consider the case that allows a parametric structure for excess returns on the benchmark portfolio p. That is, excess returns r_{p,t+1}(θ) are assumed to be a function of an l × 1 parameter vector θ. The hypothesis is that the benchmark is conditionally mean-variance efficient for some parameter value θ_0. The methodology developed in this section targets multifactor asset pricing predictions that a portfolio of several factor-mimicking portfolios is on the conditional mean-variance efficiency frontier.

For example, a conditional version of the Fama and French (1993) three-factor model is such that a benchmark portfolio with time-(t+1) excess return

$$r_{p,t+1}(\theta) = MKT_{t+1} + \theta_1\, SMB_{t+1} + \theta_2\, HML_{t+1}$$

is conditionally mean-variance efficient, where MKT_{t+1} is the excess return on the Fama and French market portfolio, and SMB_{t+1} and HML_{t+1} are the returns on the mimicking portfolios for the size and book-to-market factors.17 Another example is the model of Jagannathan and Wang (1996), which can be viewed as a conditional two-factor model. This model predicts that a benchmark portfolio with returns equal to a linear combination of a stock market index and the labor income growth rate is conditionally mean-variance efficient.

17. Note that this is not the most general version of the conditional FF model. Here the proportions of the size and book-to-market portfolios in the benchmark are fixed over time; that is, θ_1 and θ_2 are constants. More generally, we can let θ_1 and θ_2 be time-varying as functions of state variables. In principle, we can completely avoid any parametric structure on the benchmark return. To do so, however, we need to cope with generalized U-statistics of order k + 1 for a k-factor model. This possibility is left for future research.


For any given value of θ, one can obtain an estimator δ̂_N(θ) from equations (7) and (8), with r_{p,t+1}(θ) replacing r_{p,t+1}. Let Ω(θ) and Ω̂_N(θ) denote the covariance matrix of δ̂_N(θ) and the estimator for this matrix, respectively, defined as those for Proposition 2 with r_{p,t+1}(θ) replacing r_{p,t+1}. Let θ̂_N be the parameter value that minimizes

δ̂_N(θ)' Ŵ_N δ̂_N(θ),

where the weighting matrix Ŵ_N is regarded as fixed with respect to θ, Ŵ_N →p Ω_0^{-1}, and Ω_0 ≡ Ω(θ_0). Let D_N(θ) denote the l × q matrix of partial derivatives of δ̂_N(θ) with respect to the parameter vector θ: D_N(θ) ≡ ∂δ̂_N(θ)'/∂θ. The estimator θ̂_N is assumed to satisfy the following first-order condition

D_N(θ̂_N) Ŵ_N δ̂_N(θ̂_N) = 0.    (16)

We propose to use the test statistic N δ̂_N(θ̂_N)' Ŵ_N δ̂_N(θ̂_N). The following proposition gives the limiting distributions of the estimator θ̂_N and of the test statistic under the hypothesis that the benchmark p is conditionally mean-variance efficient.18

Proposition 4: Let the conditions of Propositions 1 and 2 hold with r_{p,t+1}(θ_0) replacing r_{p,t+1}. Let δ̂_N(θ) be differentiable in θ and θ̂_N be the estimator satisfying (16) with l < q. Let {Ŵ_N}_{N=1}^∞ be a sequence of positive definite matrices such that Ŵ_N →p Ω_0^{-1}. Suppose further that (i) θ̂_N →p θ_0, and (ii) for any sequence {θ*_N}_{N=1}^∞ satisfying θ*_N →p θ_0, plim D_N(θ*_N) = plim D_N(θ_0) ≡ D_0, with the l rows of D_0 linearly independent. Then

√N (θ̂_N − θ_0) →L N(0, (D_0 Ω_0^{-1} D_0')^{-1}),

and the conditional efficiency test statistic N δ̂_N(θ̂_N)' Ŵ_N δ̂_N(θ̂_N) has a limiting chi-squared distribution with qn − l degrees of freedom.

In applications we set the weighting matrix Ŵ_N to be the inverse of the covariance matrix estimate Ω̂_N(θ) and update the value of θ through iteration. The weighting matrix is evaluated at zero (θ = 0) for the initial-round estimation. Then the first-stage estimate of θ is used to update the weighting matrix at the second stage, and so on.

18 The proof of this proposition is straightforward and hence omitted.
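As a minimal numerical sketch of the iterated procedure, the quadratic-form minimization and the chi-squared test can be coded as follows. The functions delta_hat and omega_hat below are synthetic placeholders standing in for the nonparametric estimators of equations (7) and (8); they are not the paper's estimators, and the degrees of freedom here use a single block of q moments rather than the paper's qn.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# Sketch of the iterated minimum-distance estimation and chi-squared test.
# delta_hat and omega_hat are synthetic placeholders, NOT the paper's
# nonparametric estimators.

rng = np.random.default_rng(0)
N = 1000                               # sample size
q = 6                                  # number of moment conditions (q > dim(theta))
A = rng.normal(size=(q, 1))            # synthetic sensitivity of moments to theta
noise = rng.normal(size=q) / np.sqrt(N)
theta_true = np.array([0.5])

def delta_hat(theta):
    """Placeholder moment vector: zero at theta_true up to sampling noise."""
    return (A @ (np.asarray(theta) - theta_true)).ravel() + noise

def omega_hat(theta):
    """Placeholder covariance estimate of sqrt(N) * delta_hat(theta)."""
    return np.eye(q)

def estimate(theta0, rounds=3):
    """Iterate: weighting matrix from the previous theta, then minimize the
    quadratic form delta' W delta, as described in the text."""
    theta = theta0
    for _ in range(rounds):
        W = np.linalg.inv(omega_hat(theta))              # W_N = Omega_N(theta)^{-1}
        theta = minimize(lambda t: delta_hat(t) @ W @ delta_hat(t), theta).x
    J = N * delta_hat(theta) @ W @ delta_hat(theta)      # test statistic
    dof = q - theta.size                                 # moments minus parameters
    return theta, J, chi2.sf(J, dof)

theta_hat, J_stat, p_value = estimate(np.zeros(1))
```

The first round uses the weighting matrix evaluated at θ = 0, matching the initial-round choice described above; subsequent rounds update it at the latest estimate.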


II. Simulation Evidence

A. Returns and Conditioning Variables

We use two sets of test portfolios in the simulations. The first set consists of five NYSE stock portfolios, which are the value-weighted NYSE size decile 1, 3, 5, 7, and 9 portfolios (SZ1, SZ3, SZ5, SZ7, and SZ9 for short). With this set of returns, we use the value-weighted portfolio of NYSE stocks as the market portfolio. In the second set are five of the Fama and French (FF 1993) size and book-to-market portfolios, which have the size-BE/ME quintile combinations SZ1/BM1, SZ1/BM5, SZ3/BM3, SZ5/BM1, and SZ5/BM5.19 SZ1 through SZ5 and BM1 through BM5 stand for the FF quintiles on size and BE/ME. See Table I for details. SZ1/BM1 refers, for example, to the portfolio of stocks in the smallest size quintile (SZ1) and the lowest book-to-market equity quintile (BM1). Along with these test assets, we use the FF three-factor mimicking portfolios in the design of the simulations.

The conditioning variables are the dividend/price ratio (DPR), the default premium (DEF), the one-month Treasury bill rate (RTB), the excess return on the NYSE equally weighted portfolio (EWR), and the term premium (TERM). These variables are selected out of a larger set of ten variables that also includes the industry growth rate, the inflation rate, the short-end term structure slope, a January dummy, and the excess return on the NYSE value-weighted index. The ten variables are lagged one month behind the stock returns. We proceed in two steps, applying first the ordinary least squares regression method and then the nonparametric selection test of Section I (C). The linear regressions indicate that joint use of the three popular forecasters DPR, DEF, and RTB drives out the other variables in predicting the market, and that EWR shows up strongly in the second moment. The nonparametric tests do not reject that the four variables DPR, DEF, RTB, and EWR are sufficient to characterize the first two moments of the market, but they produce rejections against dropping any one of the four. We keep TERM as a redundant state variable in order to check the sensitivity of the test results to the selection of conditioning variables.

19 When using all 25 FF portfolios, the test depends on the inverse of a 125 × 125 matrix. In simulations with a large number of replications, this matrix is near singular from time to time, causing serious problems. We therefore use these five portfolios to reduce the dimension of the matrix while keeping a good representation of the FF portfolios.


B. Data-Generating Models

We generate the conditioning variables DPR, DEF, RTB, EWR, and TERM through a simple ARCH model. Let y_{1t} = ln(DPR_t), y_{2t} = ln(DEF_t), y_{3t} = ln(RTB_t), y_{4t} = EWR_t, and y_{5t} = TERM_t. The vector y_t = (y_{1t}, y_{2t}, y_{3t}, y_{4t}, y_{5t})' is assumed to follow the process

y_t − ȳ = Φ(y_{t−1} − ȳ) + ε_t,

where ȳ is the mean vector of y_t, and the residual ε_t has a normal distribution conditional on time-(t−1) information. We let the conditional correlation matrix of ε_t be constant but the conditional variance of each component be

σ²_{i,t−1} = a_{i,0} + a_{i,1} ε²_{i,t−1}.

The joint conditional distribution of all asset returns and factors is multivariate normal with a constant correlation matrix. We generate returns and factor returns using two models, denoted by H0 and H1. Under H1, the first two time-t conditional moments (i.e., E_t(x_{t+1}) and E_t(x²_{t+1})) of all asset returns and factors are linear functions of DPR_t, DEF_t, RTB_t, and EWR_t. For example, the first two moments of the excess return on the FF market portfolio (MKT) are

E_t(MKT_{t+1}) = a_0 + a_1 DPR_t + a_2 DEF_t + a_3 RTB_t + a_4 EWR_t,    (17)

E_t(MKT²_{t+1}) = b_0 + b_1 DPR_t + b_2 DEF_t + b_3 RTB_t + b_4 EWR_t.    (18)
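A minimal simulation sketch of this state-variable process (AR(1) mean dynamics, ARCH(1) variances, constant conditional correlations) is given below. All parameter values are illustrative placeholders, not the historical estimates the paper uses.

```python
import numpy as np

# Sketch of the data-generating process for the conditioning variables:
# y_t - ybar = Phi (y_{t-1} - ybar) + eps_t, with ARCH(1) conditional
# variances and a constant conditional correlation matrix R.
# All parameter values are illustrative placeholders.

rng = np.random.default_rng(1)
k, T = 5, 600                          # DPR, DEF, RTB, EWR, TERM; months

y_bar = np.zeros(k)                    # mean vector (illustrative)
Phi   = 0.9 * np.eye(k)                # AR(1) matrix (illustrative persistence)
a0    = np.full(k, 0.5)                # ARCH intercepts a_{i,0}
a1    = np.full(k, 0.3)                # ARCH slopes a_{i,1} (< 1 for stationarity)
R     = 0.2 * np.ones((k, k)) + 0.8 * np.eye(k)   # constant correlation matrix
L     = np.linalg.cholesky(R)

y = np.zeros((T, k))
eps_prev = np.zeros(k)
for t in range(1, T):
    sigma = np.sqrt(a0 + a1 * eps_prev**2)   # conditional standard deviations
    eps = sigma * (L @ rng.normal(size=k))   # correlated ARCH shocks
    y[t] = y_bar + Phi @ (y[t - 1] - y_bar) + eps
    eps_prev = eps
```

Levels would then be recovered by undoing the logs, e.g. DPR_t = exp(y_{1t}).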

Under H0, we generate the test portfolios' conditional expected returns according to the conditional CAPM (i.e., by (3) in Section I). We estimate all of these models with historical data and use the parameter estimates as the "true" parameter values in the simulations. By doing so, the simulated data have many features similar to those of the actual data.20

We have also used log-linear and quadratic specifications of the moments, in addition to the linear functional forms above. The performance of our test is similar across these functional forms. The linear specifications are chosen for the following reasons. On one hand, linearity allows us to conveniently consider GMM tests with correctly specified pricing kernels, which serve as a benchmark in evaluating our nonparametric test. Otherwise, nonlinear specifications of the moments imply that correctly specified pricing kernels are nonlinear, giving rise to complicated multivariate nonlinear optimization problems in the simulations. On the other hand, there is no reason that linear specifications would give any advantage to our approach.21 In contrast, the GMM tests that we use for comparison may benefit from the linearity, since they are based on parametric pricing kernels of linear form.

20 For brevity, we do not report the parameter estimates. They are available upon request.

C. Kernel and Bandwidth

We use two kernel functions in the simulations. The first is an independent multivariate normal density function,

K(u) = ∏_{i=1}^{k} φ_i(u_i),

where φ_i is the density of a univariate normal with mean zero and variance σ_i², and σ_i is the standard deviation of the i-th state variable. In computation, σ_i is replaced by the sample standard deviation estimate. Note that scale adjustment of the state variables is already made in the above kernel through the standard deviations σ_i. An independent multivariate normal kernel is a popular choice in kernel estimation methods. This kernel is referred to as the normal kernel in the following discussion.

The second is a "bias-corrected" kernel, constructed from the normal kernel as follows (see Schucany and Sommers (1977) or Powell et al. (1989)):

K*(u) = [K(u) − Σ_{j=1}^{k} a_j b_j^{−k} K(u/b_j)] / [1 − Σ_{j=1}^{k} a_j],

where a = B^{−1}e. Here a = (a_1, ..., a_k)', B is a k × k matrix with (i, j)-th component B_{ij} = b_j^i, e is a k × 1 vector of ones, and b_j = k + j for j = 1, ..., k. It is easy to verify that this is a kernel of order k + 1 satisfying Assumption A4 in Appendix A.

Optimal bandwidth selection is an unresolved issue. We let

h = c N^{−1/(2k+1)}.

21 The kernel regression estimates are designed to handle all functional forms. There is no reason that they would perform better for linear functions than for others. In fact, our test has slightly better power in the simulations when we use log-linear or quadratic specifications.


Obviously, this takes into account the bandwidth convergence rate conditions of Proposition 1. However, as pointed out in Section I, the limiting distribution of our test statistic does not depend on the constant c. Thus commonly used cross-validation procedures cannot be justified in our context. For a practical choice, we set c = 1. This simple bandwidth rule is regarded as an objective starting point and has been adopted by many authors (e.g., Silverman (1986), Pagan and Schwert (1990), and Harvey (1991)).22 In the simulations we find that this simple rule gives rise to good test size when using the NYSE size portfolios and the momentum portfolios. For the FF size and BE/ME portfolios, the test size is off by a few percent. We adjust the rule to correct the test size by searching over the interval (0.9, 1.1). The test has satisfactory size at c = 0.93. In our empirical investigation with actual data, presented in Section III, we use exactly the same bandwidths and kernel functions.
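The two kernels and the bandwidth rule can be sketched in code as follows. This is a minimal illustration rather than the paper's implementation, and the reading B_{ij} = b_j^i of the matrix B (with 1-based row index i) is our assumption about the garbled definition.

```python
import numpy as np

# Minimal sketch of the two kernels and the bandwidth rule. We assume
# B[i, j] = b_j**(i+1), i.e. B_ij = b_j^i with 1-based i; illustrative only.

def normal_kernel(u, sigma):
    """Product of univariate normal densities with standard deviations sigma."""
    z = np.asarray(u, dtype=float) / sigma
    return float(np.prod(np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))))

def bias_corrected_kernel(u, sigma):
    """Higher-order (order k+1) kernel built from the normal kernel."""
    u = np.asarray(u, dtype=float)
    k = u.size
    b = np.arange(1, k + 1) + k                       # b_j = k + j
    B = np.vstack([b ** (i + 1) for i in range(k)])   # assumed B_ij = b_j^i
    a = np.linalg.solve(B, np.ones(k))                # a = B^{-1} e
    correction = sum(a[j] * b[j] ** (-float(k)) * normal_kernel(u / b[j], sigma)
                     for j in range(k))
    return (normal_kernel(u, sigma) - correction) / (1.0 - a.sum())

def bandwidth(N, k, c=1.0):
    """h = c * N**(-1/(2k+1)); the text sets c = 1 (c = 0.93 for the FF portfolios)."""
    return c * N ** (-1.0 / (2 * k + 1))
```

In practice sigma would be the vector of sample standard deviations of the k state variables, so the scale adjustment described above is built in.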

D. Simulation Results

Table II presents the performance of our test in the simulations. The vector of conditioning variables we use is x_t = (DPR_t, DEF_t, RTB_t, EWR_t)', except for Panel B, where we add the redundant variable TERM. The vector of regressors is z_t = (1, x_t')' in our WLS regressions throughout the simulations.

The test performs rather well. Panel A records rejection rates for testing the conditional CAPM. By construction the conditional CAPM holds under H0, but not under H1. The test is nearly perfect for the set of FF size-BE/ME portfolios: the rejection rates are close to one under H1 but close to the significance levels under H0. For the NYSE size test portfolios, the power under H1 is less strong but still impressive.

Panel B presents the case in which the redundant variable TERM is added to the conditioning information set. That is, x_t is now composed of the five variables DPR, DEF, RTB, EWR, and TERM. The presence of this redundant variable affects the power of the test to some degree. For the NYSE size portfolios the rejection probabilities drop, by between 5% and 15%. However, the test performance is nearly unaffected in the other cases.

The results based on the higher-order kernel are similar for the size-BE/ME portfolios, but much weaker for the NYSE size portfolios. As one would expect, the rejection probabilities

22 We tried a cross-validation method to determine the constant c in h = cN^{−1/(2k+1)}. For the postwar period (January 1947 to December 1995), it gives c = 1.04.


with the higher-order kernel are always lower. This is consistent with the fact that higher-order kernels tend to produce more variable estimates (see, e.g., Härdle 1990). We also find that, in terms of bias reduction, the higher-order kernel is not effective in this finite-sample context. Thus from this point on, we focus on the normal kernel.

We have studied the sensitivity of the test to the selection of state variables. We have considered cases in which one component of x is missing and cases in which the redundant variable TERM replaces one of the components of x. Pleasantly, the test is still well behaved. The rejection rates under H0 are all fairly close to the corresponding significance levels, and the power under the alternative remains strong. For brevity, these results are not reported. Also omitted is the performance of the WLS regression estimates. Under H0, the WLS regression estimates are well behaved in the sense that most of the estimates are within one standard deviation of zero (and all are within two standard deviations).

Panel C presents results on testing the model of Jagannathan and Wang (1996) using the size and BE/ME portfolios. The hypothesis is that a portfolio of the stock market and the labor income factor, with return (1 − θ)MKT + θLBR, is conditionally mean-variance efficient. By construction, this hypothesis holds with θ = 0 under H0, but does not hold

under H1. The results show that our test has nearly perfect power under H1, and for test size, the rejection rates are slightly above the benchmark levels. The estimate of θ is small and within one standard error of zero. The two-stage and iterated procedures produce nearly identical results.

The test also performs well in the simulations for the conditional FF model. This model states that a benchmark with excess returns of the form MKT_{t+1} + θ_{1,t} SMB_{t+1} + θ_{2,t} HML_{t+1} is conditionally mean-variance efficient, where MKT, SMB, and HML are the three FF factors. Panel D reports results for two cases. In the first case, θ_{1,t} = θ_1 and θ_{2,t} = θ_2. In the second case, θ_{1,t} = θ_1 EWR²_t and θ_{2,t} = θ_2. The test has good power and size in both cases. The parameter estimates have means within one standard deviation of zero under H0. Under the alternative H1, there is no prediction of what the estimates should look like. The estimates are all positive on average, but the averages are below two standard errors. Again, the two-stage procedure and the iterated procedure produce similar results.
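To make the benchmark construction concrete, the two Panel D cases can be sketched as follows. The factor series below are synthetic placeholders, not the Fama-French data, and the coefficient values are illustrative.

```python
import numpy as np

# Sketch of the conditional FF benchmark excess return:
# MKT_{t+1} + theta_{1,t} SMB_{t+1} + theta_2 HML_{t+1}, with
# theta_{1,t} = theta_1 (first case) or theta_{1,t} = theta_1 * EWR_t**2
# (second case). All series here are synthetic placeholders.

rng = np.random.default_rng(2)
T = 480
MKT = rng.normal(0.006, 0.04, T)   # monthly excess market return (illustrative)
SMB = rng.normal(0.002, 0.03, T)
HML = rng.normal(0.003, 0.03, T)
EWR = rng.normal(0.008, 0.05, T)   # equally weighted index excess return

def benchmark_return(theta1, theta2, dynamic_size=False):
    """Time-(t+1) benchmark excess return; EWR enters lagged (at time t)."""
    theta1_t = theta1 * EWR[:-1] ** 2 if dynamic_size else np.full(T - 1, theta1)
    return MKT[1:] + theta1_t * SMB[1:] + theta2 * HML[1:]

r_static  = benchmark_return(1.0, 0.5)                     # theta_{1,t} = theta_1
r_dynamic = benchmark_return(2.0, 0.5, dynamic_size=True)  # theta_{1,t} = theta_1 EWR_t^2
```

Note that EWR is aligned one period behind the factors, so θ_{1,t} is known at time t, as the conditioning logic requires.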


E. GMM Tests with Parametric Pricing Kernels

We study GMM tests of the conditional CAPM using a set of parametric pricing kernels. Our purpose is to provide a performance benchmark for evaluating our nonparametric test, and to demonstrate the effects of pricing kernel misspecification. The pricing kernels that we use are listed in Table III. All the specifications are of linear form, as recommended by Carhart et al. (1995) and Cochrane (1996). The first (SP1) is the one implied by the unconditional CAPM, or the unconditional mean-variance efficiency of the market. The specifications SP2, SP3, and SP4 (or their variants) have been employed by Carhart et al. (1995), Cochrane (1996), and Jagannathan and Wang (1996) in their tests of conditional asset pricing models.23

The above specifications are clearly misspecified in the sense that they are not implied by the conditional CAPM, given that the first two moments of the market are generated according to equations (17) and (18). SP5, SP6, and SP7 are extensions of the above choices. Given (17) and (18), SP6 is the correctly specified pricing kernel, the one implied by the conditional CAPM. SP7 is even more general (with the redundant variable TERM), and hence also correctly specified. The tests based on SP6 and SP7 provide an interesting benchmark for comparison. Including SP2 through SP4 as special cases, SP5 is close to being correctly specified, as it misses only one variable.

Table III presents the results. First, Panel A shows that arbitrary functional form assumptions can easily create erroneous rejections. SP1 through SP4 generate tests that reject much too often under H0: at the 10% level, their rejection rates range from 53% to 99%! Panel A also shows that the more flexible specifications SP5, SP6, and SP7 generate tests with weak power. For example, for the NYSE size portfolios, the rejection rates are all below 20% under H1. They perform better for the size-BE/ME portfolios. In either case our test has better power than the GMM tests based on the correctly specified pricing kernels SP6 and SP7 (see Panels A and B of Table II). There is an intuitive explanation for why our test outperforms. Note that SP5 through SP7 contain a substantial number of parameters, all of which are used to fit more or less similar moment conditions in the standard GMM testing, giving rise to problems of overfitting.24 In contrast,

23 Cochrane studies a two-factor model. Jagannathan and Wang consider a framework without a riskless rate. We ignore such details when drawing similarities in specifications.

24 Our results on GMM are broadly consistent with the finding of Ferson and Foerster (1994) that, in


our nonparametric estimate of the pricing kernel does not use any of the test portfolios, and hence our approach does not overfit the cross-section. Secondly, Panel B shows that the pricing error estimates are also sensitive to the pricing kernel specification. The estimated expected return errors vary wildly across the specifications, suggesting that either significant biases or substantial noise exist in the pricing error estimation.

III. Empirical Results

A. Testing the Conditional CAPM

We apply to actual data the same test procedures that we carried out in the simulations. The set-up is identical. We report results based on the normal kernel and the set of four conditioning variables selected in Section II: the dividend/price ratio (DPR), the default premium (DEF), the one-month Treasury bill rate (RTB), and the excess return on the NYSE equally weighted index (EWR). In sensitivity analyses, we have used the higher-order kernel, considered small changes in the bandwidth, and checked the results when adding the term premium (TERM) to the conditioning information set. The test results are qualitatively robust. For brevity, these robustness checks are omitted.

Our tests strongly reject the conditional Sharpe-Lintner CAPM. In Table IV, Panel A shows that the conditional mean-variance efficiency of the NYSE market proxy is rejected with a p-value of 0.008, using the NYSE size portfolios as test assets. The significance tests of individual regressors show that this rejection comes largely from the conditioning variable EWR, which produces a huge test statistic (around 30 for a χ²(5)). For the FF size and book-to-market portfolios, our test rejects with a p-value of 0.0 percent that the FF market portfolio is conditionally mean-variance efficient. Again, the lagged market index EWR generates the largest test statistic in the regressor significance tests. The intercepts for the FF portfolios are also highly significant, with a p-value around one percent.25

III. Empirical Results A. Testing the Conditional CAPM We apply the same test procedures that we have carried out in the simulations to actual data. The set-up is identical. We report results based on the normal kernel and the set of the four conditioning variables that have been selected in Section II: the dividend/price ratio (DPR), the default premium (DEF), the one-month Treasury bill rate (RTB), and excess return on the NYSE equally-weighted index (EWR). In sensitivity analyses, we have used the higher order kernel. We have considered small changes in the bandwidth. We have also checked results when adding the term premium (TERM) to the conditioning information set. The test results are qualitatively robust. For brevity, these robustness checking results are omitted. Our tests strongly reject the conditional Sharpe-Lintner CAPM. In Table IV, Panel A shows that the conditional mean-variance e±ciency of the NYSE market proxy is rejected at p-value of 0.008, using the NYSE size portfolios as test assets. The signi¯cance tests of individual regressors show that this rejection comes largely from the conditioning variable EWR, which produces a huge test statistic (around 30 for Â2 (5)). For the FF size and bookto-market portfolios our test rejects with a p-value of 0.0 percent that the FF market portfolio is conditionally mean-variance e±cient. Still, the lagged market index EWR generates the largest test statistic in the regressor signi¯cance tests. The intercepts for the FF portfolios are also highly signi¯cant with a p-value around one percent.25 absence of misspeci¯cation, GMM tests for models with more parameters are more likely to have problematic performance in ¯nite samples. For GMM tests in presence of some form of misspeci¯cation, see Kan and Zhang (1999). 25 We let the four regressors be the conditioning variables minus their means, so that the intercepts are just average pricing errors. 
This transformation does not a®ect estimation and inference for slopes on the


The WLS regression results in Panel B provide more details about the Jensen's alphas. The conditional CAPM does not seem to have any difficulty pricing average returns on the NYSE size portfolios. The intercepts, or average pricing errors, are small, only between −0.02% and 0.08%, and are statistically insignificant. However, for the small to medium size portfolios (SZ1, SZ3, SZ5), the pricing errors are positively correlated with the stock market index EWR, and the slopes on EWR are salient, about three or four standard errors above zero. These slopes are also large in economic terms, in the sense that they indicate very volatile pricing errors. Note that the pricing error components 0.21EWR, 0.11EWR, and 0.09EWR in the three regressions have standard deviations of 1.02%, 0.53%, and 0.44%, respectively. The slopes on EWR are clearly related to size: the slope falls strictly with size, from 0.21 to −0.01. To a large extent, this determines a negative size pattern in the pricing error volatilities. The regressions with the FF size and BE/ME portfolios confirm the pattern of loadings on EWR. We also find the same pattern when replacing EWR with the value-weighted index (VWR).

Are there significant size and book-to-market effects in the pricing errors of the conditional CAPM? In Panel C we present the time-series means and standard deviations of the pricing errors for the twenty-five FF portfolios. The panel shows that there are strong size and book-to-market effects, and interestingly, the effects appear in different dimensions. The average pricing errors are clearly related to book-to-market equity. In every size quintile, the means tend to increase with BE/ME. On average, they increase strictly from −0.16% in the lowest BE/ME quintile to 0.33% in the highest. There is, however, no clear size pattern in the average pricing errors, which is consistent with the regression results for the NYSE size portfolios.
On the other hand, the standard deviations of the pricing errors show a strong negative relation to size, but no clear BE/ME pattern.26

Ghysels (1998) points out that if we correctly specify the dynamics of the beta risk, time-varying beta models are sure to outperform constant beta models. Yet he argues that if beta

nonconstant regressors. Nor does it alter the joint statistic or the pricing error estimates. For inference about the intercepts (which is affected), we ignore estimation noise in the sample means of the regressors. This is equivalent to assuming that the regressors are observed in deviation form.

26 The absence of a size effect in the average pricing errors seems consistent with the results of Knez and Ready (1997), who questioned the robustness of the size factor in explaining average returns. Our pricing error estimates are based on the WLS method. We also used the kernel regression method (see Appendix C for details of the estimation methods). The conclusions are identical.


risk is misspecified, we may commit serious pricing errors, which could be even bigger than those of a constant beta model, and he shows that this has indeed happened in many cases. With our nonparametric formulation of the conditional CAPM, can it outperform the unconditional CAPM? We find that the conditional CAPM performs much better than the constant beta version, for each and every one of our test portfolios. In Figure 1 we plot the means and standard deviations of the pricing errors. Clearly, the conditional CAPM is a substantial improvement over the static CAPM, in terms of both the average and the volatility of the Jensen's alphas. To some extent, Figure 1 confirms that our nonparametric approach works well. It also illustrates that there are significant payoffs to studying conditional asset pricing models.

B. Labor Income Risk

Jagannathan and Wang (JW 1996) argue that it is important to include labor income risk. They find that a labor income risk factor plays a significant role in explaining cross-sectional variation in average stock returns.27 Their labor income risk factor is motivated as a proxy for the return on human capital. Alternatively, their model may be viewed as a conditional two-factor model. We test the two-factor model assuming that the benchmark portfolio has a return of the form (1 − θ)MKT + θLBR, where LBR is the labor income risk factor, measured as the per capita labor income growth rate, and MKT is a stock market index. Our tests strongly reject

that the benchmark is conditionally mean-variance efficient. Panel A of Table V shows that the model is rejected with p-values below one percent for both the NYSE size portfolios and the FF size-BE/ME portfolios. Estimates of the parameter θ are small and statistically insignificant, showing no support for the significance of the labor income risk factor.

The labor income risk factor does not help much in capturing the dynamics of asset returns. Panel B presents the minimums of three summary measures of the pricing errors that can be achieved with the free parameter θ. The measures are the average absolute bias (AAB), the average standard deviation (ASD), and the average root mean squared error (ARMSE). Detailed definitions are given in Appendix C. As the panel shows, the ARMSE cannot achieve any significant

27 Using Japanese stock return data and an unconditional four-factor model, Jagannathan, Kubota, and Takehara (1998) also find that a labor income risk factor is significant in explaining average stock returns.


reduction with the free parameter θ. The minimums of the ARMSE in the two cases are 0.59% and 0.82%, respectively, compared to 0.61% and 0.84% for the conditional CAPM without the labor income risk factor.

Why are our results different from those of JW? Figure 2(a) points to an explanation. Plotted in Figure 2(a) are the two summary measures AAB and ASD of the pricing errors for the twenty-five FF size and BE/ME portfolios. It shows that the JW labor income factor can hardly reduce the pricing error volatility to any significant extent. This is why it receives strong rejections in our tests. Consistent with JW, however, the labor income factor can significantly reduce the average pricing errors. The AAB has a minimum value of 0.13%, obtained at θ = 0.94, which is more than a thirty percent reduction of the bias of 0.19% at θ = 0.28 This suggests that JW obtain favorable results because their tests are focused on average pricing errors. In fact, JW consider only the pricing of average returns and do not challenge their model with the dynamics of asset returns.29
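The summary measures used above can be sketched in code as follows. The precise definitions are in the paper's Appendix C, so the formulas below are our natural reading (cross-sectional averages of each portfolio's |mean|, standard deviation, and RMSE of pricing errors), and the grid-search helper is an illustrative stand-in for the minimization over θ.

```python
import numpy as np

# Sketch of the pricing-error summary measures; our reading of AAB, ASD,
# and ARMSE (exact definitions are in the paper's Appendix C).

def summary_measures(errors):
    """errors: T x n matrix of pricing errors (T months, n portfolios)."""
    aab   = np.mean(np.abs(errors.mean(axis=0)))          # average absolute bias
    asd   = np.mean(errors.std(axis=0))                   # average std deviation
    armse = np.mean(np.sqrt((errors ** 2).mean(axis=0)))  # average RMSE
    return aab, asd, armse

def minimize_over_theta(error_fn, grid):
    """Grid search for the theta minimizing ARMSE, where error_fn(theta)
    returns the T x n pricing-error matrix of the benchmark at theta."""
    return min(((th, summary_measures(error_fn(th))[2]) for th in grid),
               key=lambda pair: pair[1])
```

Since RMSE² = mean² + variance for each portfolio, ARMSE always weakly exceeds ASD; this is why a factor that reduces only the bias (like LBR above) barely moves the ARMSE when the volatility component dominates.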

C. Size and Book-to-Market

We conduct tests of the conditional FF three-factor model. In its most general form, the model predicts the conditional mean-variance efficiency of a benchmark portfolio with time-(t+1) excess return

MKT_{t+1} + θ_{1,t} SMB_{t+1} + θ_{2,t} HML_{t+1},

where MKT is the excess return on the Fama-French market portfolio, and SMB and HML are returns on the FF mimicking portfolios for the size factor and the book-to-market factor, respectively.30 We consider three simple versions of the model: (i) θ_{1,t} = θ_1 and θ_{2,t} = θ_2, (ii) θ_{1,t} = θ_1 EWR_t and θ_{2,t} = θ_2, (iii) θ_{1,t} = θ_1 EWR²_t and θ_{2,t} = θ_2.

In (i), the proportions θ_{1,t} and θ_{2,t} are assumed to be constant over time. This may be restrictive, as Table IV shows that the size and book-to-market effects seem to appear in

28 Similarly, JW get estimates of the fraction parameter θ that are close to 1.0.

29 In their GMM tests, they do not use any conditioning instrument to scale returns on test portfolios.

30 Note that, without any restrictions on θ_{1,t} and θ_{2,t}, MKT_{t+1} + θ_{1,t} SMB_{t+1} + θ_{2,t} HML_{t+1} indeed represents the excess return on a risky portfolio.


different dimensions. In (ii) and (iii), we let the size factor be "dynamic", in that we let the proportion of the size portfolio change according to a stock market signal. The signal is about the market level in (ii) and about the market volatility in (iii).

Panel A of Table VI presents the test results using the size and BE/ME portfolios as test assets. Version (i) is strongly rejected, with a p-value as low as 0.1 percent. The estimates of θ_1 and θ_2 are more than two standard errors from zero, showing some support for the two factors. For (ii), the test statistic drops substantially, with a p-value of 0.046, indicating a marginal rejection at the 5% level. The estimates of θ_1 and θ_2 are positive and statistically significant. Version (iii) is not rejected, with a p-value of around 38 percent! Again, the estimates of θ_1 and θ_2 are positive and statistically significant.31

Next we look at the pricing errors computed using the dynamic version (iii) of the FF model. Panel B reports the minimums of the three summary measures of pricing errors that can be achieved with the free parameters θ_1 and θ_2. The test portfolios are the twenty-five FF size and book-to-market portfolios. The cases with the constraint θ_1 = 0 or θ_2 = 0 are also included. Figure 2(b) plots the AAB and ASD measures against θ_1 while fixing θ_2 = 0. Figure 2(c) presents the case in which θ_2 is free but θ_1 is fixed at zero.

The conditional FF model captures the prominent features of the conditional CAPM pricing errors. Used together, the size and book-to-market factors can significantly reduce both the bias measure (AAB) and the volatility measure (ASD) of the pricing errors. The minimum AAB represents more than a sixty percent reduction, and the minimum ASD about a fifty-five percent reduction, compared to the measures from the conditional CAPM. The size factor can significantly reduce both the pricing error volatility and the bias. This is evident from Panel B and Figure 2(b).
Used alone, the size factor can achieve an ASD of 0.35%, the same as the minimum from the joint use of both factors, and a 40% drop in the AAB (0.11%, compared to the CAPM value of 0.19%). The book-to-market factor, on the other hand, is effective at bias reduction. Used alone, it gives a minimum AAB of 0.09%, almost the same as that from the joint use of the two factors. However, Panel B and Figure 2(c) show that the book-to-market factor can hardly reduce the ASD. The two factors jointly produce a minimum ARMSE of 0.37%, which is less than half of the conditional CAPM value (0.84%).

31 Note that (iii) is a special form of the most general conditional version of the FF model. If we cannot reject it, we certainly cannot reject the conditional FF model in its most general form.


Our results differ from those of recent studies by He, Kan, Ng, and Zhang (1996), Ferson and Siegel (1998), and Ferson and Harvey (1999). These tests strongly reject some dynamic specifications of the FF model. However, the results against the FF model are based on auxiliary functional form assumptions that are subject to Ghysels' (1998) critique. For example, Ferson and Harvey (1999) assume linear forms for time-varying betas. Ferson and Siegel (1998) assume a parametric form of the pricing kernel.32 He et al. (1996) also make auxiliary functional form assumptions. Therefore, there is a real possibility that these tests produce strong rejections largely because of serious misspecification of the betas, the pricing kernel, etc. Consistent with this explanation, using our more flexible conditioning-information approach, we find evidence supporting the conditional FF model.

D. Portfolios Formed on Past Returns

How does the conditional FF model perform with portfolios formed on past returns? Portfolios formed on past returns have created much controversy in finance. The reversal of long-term returns documented by DeBondt and Thaler (1985) and the continuation of short-term returns documented by Jegadeesh and Titman (1993) appear to be serious challenges to the CAPM. Recently, the continuation of short-term returns, or momentum, has attracted special attention, as FF (1996a) find that it is the main embarrassment of their three-factor model. We test whether these dynamic patterns of asset returns present any serious problems for the conditional FF model. We focus on the simple conditional version (iii).

In Table VII we present the test results, where three sets of winner-loser decile portfolios are used as test assets. The portfolios are formed monthly on short-term (11-month) and long-term (up to 5-year) past returns. The data sets are identical to those used in Tables VI and VII of FF (1996a). For comparison, we also report test statistics for the conditional CAPM.

We cannot reject the conditional FF model with these portfolios. The test statistics are much smaller than those of the conditional CAPM, with p-values all above 0.9! The size factor shows up strongly in all three cases. The conditional FF model has considerably smaller

32 Ferson and Siegel (1998) extend the variance bound approach of Hansen and Jagannathan (1991). They consider a pricing kernel of the form m_{t+1} = a(z_t) + B(z_t)F_{t+1}, where F_{t+1} is the vector of the FF factors and a(z_t) and B(z_t) are linear functions of lagged instruments, a Treasury bill rate and a dividend yield.


pricing errors than the conditional CAPM, as Panel B shows. Interestingly, as with the size and BE/ME portfolios, the size and book-to-market factors appear to work in different dimensions of the pricing errors: the former reduces the volatility while the latter reduces the bias. Using the set of momentum portfolios (those formed on 11-month returns), Figure 3 compares the pricing errors of the conditional FF model with those of the unconditional FF model. Figure 3 shows unambiguously that the conditional FF model has much smaller pricing errors than the unconditional FF model.

IV. Summary

Conditional asset pricing models have received considerable attention in recent years. Numerous authors have examined or applied various linear factor models with time-varying betas and time-varying risk premia. However, as Ghysels (1998) has recently emphasized, there is an unresolved methodological issue. Ghysels points out that the empirical performance of conditional models is sensitive to the specification of time-varying betas. Using some well-known time-varying beta models from the literature, he shows that these models are seriously misspecified, such that in many cases the pricing errors of the time-varying beta models are even larger than those of constant beta models. Ghysels' critique is challenging, as it suggests that the empirical results we have about conditional asset pricing models may be unreliable or even misleading.

In this paper we have proposed a new testing approach that completely avoids the specification of time-varying betas. We focus on the pricing kernel and estimate it nonparametrically. We use the nonparametric pricing kernel to incorporate conditioning information into the testing and to circumvent misspecification of the pricing kernel. Our test is designed to aim directly at time-variation in Jensen's alphas, providing a straightforward way to look into the dynamic features of the pricing errors.

As with any nonparametric test, one of the most critical questions is whether the test has power in applications. Nonparametric tests can avoid the effects of misspecification, but typically the underlying nonparametric estimators converge at rates slower than parametric estimators. In contrast, our approach has a very pleasant property: we show that although it is based on a nonparametric pricing kernel, the estimator underlying our test
converges at the standard parametric rate, no matter how many conditioning variables we use. Our simulation results are consistent with this fast convergence rate: in the Monte Carlo experiments, we find that our test performs rather well when compared with a set of standard GMM tests.

In the empirical investigation we find that the conditional CAPM is a substantial improvement over the unconditional CAPM (as one would expect in the absence of misspecified time-varying betas and risk premia), but the model is still rejected by our test. There are significant size and book-to-market effects in the conditional CAPM pricing errors. Interestingly, the effects appear in different dimensions: for average pricing errors there is a clear book-to-market pattern but no clear size pattern, whereas pricing error volatility shows a strong size pattern but no sign of a book-to-market effect.

We find that the labor income risk factor is not significant in capturing the dynamics of the deviations from the conditional CAPM. The factor can hardly reduce the pricing error volatility, giving rise to strong rejections in our tests. However, we find that the labor income factor is significant in reducing the average pricing errors, which is consistent with the results of Jagannathan and Wang (1996).

Our results show that the size and book-to-market factors do play an important role in explaining the dynamics of the cross-section of asset returns. Specifically, we find that a simple version of the conditional Fama and French (1993) three-factor model performs well. This model captures the prominent size and book-to-market features of the conditional CAPM pricing errors, and it significantly outperforms the unconditional FF model. Even the momentum effect, the "main embarrassment" of the unconditional three-factor model (FF 1996a), does not seem to be a serious challenge to the conditional FF model.
With portfolios formed on past returns, the pricing errors of the conditional FF model are jointly statistically insignificant and much smaller than those of the unconditional FF model.


Appendix A. Assumptions

There is a large statistical literature on the U-statistics introduced by Hoeffding (1948). Numerous asymptotic results for U-statistics or generalized U-statistics have been established under various statistical setups. In particular, Robinson (1989), Yoshihara (1990), and Khashimov (1993) have provided conditions and established central limit theorems for second order generalized U-statistics for $\beta$-mixing processes. We present a set of conditions oriented to the asset pricing applications. In particular, simplicity of the conditions on the bandwidth and the kernel is pursued to facilitate practical choices. In addition, restrictions on the existence of higher order moments are kept at a minimal level, due to concerns about possibly fat-tailed distributions of asset returns. Moreover, the $\beta$-mixing condition below requires no change for higher order extensions. Under this set of assumptions, we provide a large sample justification for the regression testing approach.

Notation: For a matrix of random variables $X \equiv (x_{ij})$ and any positive real $\rho$, define $\|X\|_\rho \equiv (\|x_{ij}\|_\rho)$ and $\|x_{ij}\|_\rho \equiv [E|x_{ij}|^\rho]^{1/\rho}$. The matrix $\|X\|_\rho$ is referred to as the $\rho$-norm of $X$. Let $|X|^\rho$ denote $(|x_{ij}|^\rho)$. These notations are also used for vectors. Let $\|x\|$ stand for the Euclidean norm of a vector $x$. For convenience, we use an inequality sign between a matrix (vector) and a scalar, or between two matrices (vectors), to denote component-by-component inequalities. As usual, the information set $I_t$ stands for a $\sigma$-field which contains in particular the $\sigma$-field generated by the data sequence $\{y_t, y_{t-1}, \ldots\}$. In addition, define
$$W_1(t,s) \equiv (r_{p,s+1}^2 - r_{p,s+1} r_{p,t+1})\, r_{t+1} + (r_{p,t+1}^2 - r_{p,t+1} r_{p,s+1})\, r_{s+1},$$
$$W_2(t,s) \equiv r_{p,s+1}^2\, z_t z_t' + r_{p,t+1}^2\, z_s z_s'.$$
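For concreteness, the definitions of $W_1$ and $W_2$ translate directly into code. The sketch below is our illustration, not part of the paper: `rp` is a series of portfolio returns, `r` a matrix of asset return vectors, and `z` a matrix of instrument vectors, with row $t$ corresponding to time $t$.

```python
import numpy as np

def W1(rp, r, t, s):
    # W1(t,s) = (rp_{s+1}^2 - rp_{s+1} rp_{t+1}) r_{t+1}
    #         + (rp_{t+1}^2 - rp_{t+1} rp_{s+1}) r_{s+1}
    return ((rp[s + 1] ** 2 - rp[s + 1] * rp[t + 1]) * r[t + 1]
            + (rp[t + 1] ** 2 - rp[t + 1] * rp[s + 1]) * r[s + 1])

def W2(rp, z, t, s):
    # W2(t,s) = rp_{s+1}^2 z_t z_t' + rp_{t+1}^2 z_s z_s'
    return (rp[s + 1] ** 2 * np.outer(z[t], z[t])
            + rp[t + 1] ** 2 * np.outer(z[s], z[s]))
```

Both quantities are symmetric in $(t,s)$ by construction, which is what makes them natural kernels for the second order U-statistics analyzed in Appendix B.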

A function $\phi(x)$ is said to satisfy a local Lipschitz condition for some function $m(x)$ if
$$|\phi(x+\nu) - \phi(x)| < m(x)\|\nu\|.$$
Let $\nabla_{l_1,\ldots,l_j}\phi(x)$ stand for the partial derivative $\partial^j \phi(x)/\partial x_{l_1} \cdots \partial x_{l_j}$, where $x_l$ is the $l$-th element of $x$.

Assumptions:

Assumption A1: The data sequence $\{y_{t+1}\}$ is a strictly stationary $\beta$-mixing process, and the subvector $x_t$ has an absolutely continuous distribution with density $f(x_t)$. For some $\rho > 2$, the mixing numbers $\beta_n$, $n = 1, 2, \ldots$, satisfy $\sum_{n=1}^{\infty} n \beta_n^{(\rho-2)/\rho} < \infty$.

Assumption A2: (i) $r_{p,t+1}^2$, $r_{t+1}$, and $r_{p,t+1} r_{t+1}$ have finite first moment; (ii) $\|W_1(t,s)\|_\rho < \infty$ and $\|W_2(t,s)\|_\rho < \infty$ for all $t < s$; (iii) $\|\eta(y_{t+1})\|_\rho < \infty$ and $\|a(y_{t+1})\|_\rho < \infty$.
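To see what the summability condition in Assumption A1 demands in practice, suppose the mixing numbers decay polynomially, $\beta_n = n^{-c}$ (a hypothetical decay rate used purely for illustration). Then $\sum_n n \beta_n^{(\rho-2)/\rho}$ converges exactly when $c(\rho-2)/\rho > 2$, as the partial sums below illustrate:

```python
def a1_partial_sum(c, rho, N=100_000):
    """Partial sum of n * beta_n^((rho-2)/rho) with beta_n = n^(-c)."""
    e = 1.0 - c * (rho - 2.0) / rho  # net exponent on n
    return sum(n ** e for n in range(1, N + 1))

# c = 5, rho = 4: net exponent is -1.5, so the series converges.
converging = a1_partial_sum(5.0, 4.0)
# c = 2, rho = 4: net exponent is 0, so the partial sums grow without bound.
```

As the discussion of A2 notes, when $\beta_n$ decays exponentially the condition holds with $\rho$ arbitrarily close to 2.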

Assumption A3: $fg_p$, $fg_{pp}$, $fg_r$, $fg_{pr}$, and $fg_{zz}$ satisfy the local Lipschitz condition for some $m(x)$, where $m(x_t) r_{t+1}$, $m(x_t) r_{p,t+1} r_{t+1}$, $m(x_t) r_{p,t+1}$, $m(x_t) r_{p,t+1}^2$, and $m(x_t) z_t z_t'$ have finite $\rho$-norm.


Assumption A4: The kernel $K(u)$ is a bounded symmetric function satisfying
$$\int K(u)\,du = 1, \qquad \int \|u\|^j |K(u)|\,du < \infty \quad \text{if } 0 \le j \le k+1,$$
$$\int u_1^{l_1} \cdots u_k^{l_k} K(u)\,du = 0 \quad \text{if } 0 < l_1 + \cdots + l_k < k+1,$$
where $u_j$ is the $j$-th element of the vector $u$. That is, the kernel $K(u)$ is of order $k+1$.

Assumption A5: (i) The $j$-th partial derivatives of $fg_p$, $fg_{pp}$, $fg_r$, $fg_{pr}$, and $fg_{zz}$ exist for all $j \le k+1$; (ii) the expectations $E[g_{pr}\,\nabla_{l_1,\ldots,l_j}(fg_p)]$, $E[g_r\,\nabla_{l_1,\ldots,l_j}(fg_{pp})]$, $E[g_{pp}\,\nabla_{l_1,\ldots,l_j}(fg_r)]$, $E[g_p\,\nabla_{l_1,\ldots,l_j}(fg_{pr})]$, and $E[g_{pp}\,\nabla_{l_1,\ldots,l_j}(fg_{zz})]$ exist for all $j \le k+1$, where the functions and the partial derivatives are evaluated at $x_t$.

Assumption A6: The matrices $A$ and $\Gamma_0$ are nonsingular.

The mixing condition in Assumption A1 restricts the amount of dependence allowed in the data sequence, permitting among other things a central limit theorem to be applied. Conditions that require $\beta_n$ to vanish as a power of $n$ do not seem to be restrictive for most financial data processes, and such assumptions are commonly used in the literature. For instance, a $\beta$-mixing condition of this type is adopted by Aït-Sahalia (1996) for continuous time models of interest rates. Note that Assumption A2 is related to A1: as the moment restrictions become stronger (larger $\rho$), more dependence is allowable. Such a trade-off is not uncommon in establishing asymptotic results for serially correlated data (e.g., White and Domowitz (1984)). Lipschitz conditions and higher order kernel assumptions, similar to Assumptions A3 and A4, have been employed by many authors (e.g., Härdle and Stoker (1989)). Assumption A5 is a regular condition for asymptotic bias correction through the use of a higher order kernel (see Powell et al. (1989)).

For the moment conditions in A2, note first that $\rho$ can be set arbitrarily close to 2 if $\beta_n$ decays exponentially. Next, it is straightforward to verify that Assumption A2 holds for all $\rho > 2$ if the joint distribution of all the variables is normal, lognormal, or mixture normal. For a more interesting example, consider the case in which the return vector is generated as follows:
$$\hat r_{t+1} = \mu(z_t) + \sigma(z_t)\varepsilon_{t+1},$$
where $\hat r_{t+1} = (r_{p,t+1}, r_{t+1}')'$.
The vector $z_t$ has finite second moment and contains $x_t$ as a subvector. The shock $\varepsilon_{t+1}$ is independent of $z_t$, with zero mean and finite moments of all orders. Then Assumption A2 holds for all $\rho > 2$ if $\mu(\cdot)$, $\sigma(\cdot)$, and $f(\cdot)$ are bounded functions. Of course, the conditional mean and standard deviation do not have to be bounded for A2 to hold: for instance, $\mu(\cdot)$ and $\sigma(\cdot)$ can be polynomials of any order when $z_t$ has finite moments of all orders, which is enough to deliver A2 in the example above. These examples suggest that the moment conditions in A2 are not as restrictive as they may seem.
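The sufficient condition in this example is easy to probe by simulation. The sketch below uses hypothetical bounded choices of $\mu(\cdot)$ and $\sigma(\cdot)$ (our choices, not the paper's) and checks that the sample $\rho$-norms of the simulated returns remain stable as $\rho$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bounded conditional mean and volatility, for illustration.
mu = lambda z: np.tanh(z)
sigma = lambda z: 0.1 + 0.1 / (1.0 + z ** 2)

T = 200_000
z = rng.standard_normal(T)    # conditioning variable z_t
eps = rng.standard_normal(T)  # shock, independent of z_t
r = mu(z) + sigma(z) * eps    # r_{t+1} = mu(z_t) + sigma(z_t) eps_{t+1}

# Sample rho-norms [E|r|^rho]^(1/rho); boundedness of mu and sigma keeps
# these finite for every rho > 2, consistent with Assumption A2.
norms = {rho: (np.abs(r) ** rho).mean() ** (1 / rho) for rho in (2.5, 4.0, 8.0)}
```

Because $|r| \le 1 + 0.2|\varepsilon|$ here, each $\rho$-norm is bounded by a small constant regardless of how large $\rho$ is, mirroring the argument in the text.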

Appendix B. Proofs of the Propositions

We use three basic lemmas for the proofs of the propositions. Proofs of these lemmas are omitted but available upon request. The first lemma is a modified version of Yoshihara's (1976) Lemma 1, which has been an indispensable tool for analyzing U-statistics for $\beta$-mixing processes.

Lemma 1 (Yoshihara's Fundamental Lemma): Let $\{y_t\}$ be a strictly stationary $\beta$-mixing process with mixing numbers $\beta_n$, $n = 1, 2, \ldots$. For any given $j$, $1 \le j \le m-1$, and $t_1 < \cdots < t_m$, let $\xi_{j+1}, \ldots, \xi_m$ be $m-j$ random vectors which are identical in joint distribution to $y_{t_{j+1}}, \ldots, y_{t_m}$ but independent of $y_{t_1}, \ldots, y_{t_j}$. Let $\phi(y_{t_1}, \ldots, y_{t_m})$ be a function such that $E[\phi(y_{t_1}, \ldots, y_{t_j}, \xi_{j+1}, \ldots, \xi_m)] = 0$ and $\sup_{1 \le t_1 < \cdots < t_m} \|\phi(y_{t_1}, \ldots, y_{t_m})\|_\rho < \infty$ for some $\rho > 2$. If there exists $B_N = o(\sqrt{N})$ such that $\|\gamma_N(y_{t+1})\|_\rho \le B_N$ and $\sup_{1 \le t}$