Testing the Mean-Variance Efficiency of a Portfolio: a Multivariate Approach

Traditional studies of the CAPM had used one of two approaches to test for the existence of a positive linear relationship between asset returns and β coefficients (or, alternatively, for the mean-variance efficiency of the proxy used). Fama & MacBeth [1973] and Litzenberger & Ramaswamy [1979], inter alios, used non-linear terms, e.g. total variance, β² or dividends, as well as univariate tests, e.g. the standard t-statistic, to assess the significance of such additional terms. For example, Black, Jensen & Scholes [1972] estimated equation (5.15) for a sample of assets and reported a series of univariate t-statistics for the null hypothesis that the α coefficient on each portfolio was equal to zero. The limitation of this first approach - outlined in section 5.3.3 - is that the linearity of the relationship between returns and βs is tested against a specific alternative hypothesis. It is also conditional on previously calculated estimates of the β coefficients, thus reducing the power of the test.

The limitation of the second approach - outlined in section 5.3.6 - is that it relies on a series of univariate statistics on separate time-series, without giving any assessment of the joint significance of observed deviations from linearity. Again, this reduces the power of the test.

As mentioned in section 5.3.2, multivariate techniques can also be used to estimate factor risk premia associated with, for example, βs or residual risk. The procedures of Black, Jensen & Scholes (section 5.3.3) and Fama & MacBeth (section 5.3.3) are both two-pass. In the first pass, time-series regressions are run to estimate, for example, βs and residual risk. In the second pass, cross-sectional regressions are run to estimate factor risk premia. Multivariate techniques essentially amount to running the time-series and cross-sectional regressions at the same time, thus eliminating the problems that arise when βs estimated with error in the first pass are used as regressors in the second.
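The two-pass logic described above can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the exact procedure of either study: the function name `two_pass`, the simulated data and all parameter values are assumptions made for illustration, and the second pass is collapsed into a single cross-sectional regression of mean returns on the estimated βs.

```python
import numpy as np

def two_pass(returns, market):
    """Simplified two-pass regression.

    returns: (T, N) array of asset returns; market: (T,) array of proxy returns.
    """
    T, N = returns.shape
    # First pass: time-series OLS of every asset on a constant and the market.
    X = np.column_stack([np.ones(T), market])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)  # shape (2, N)
    betas = coef[1]
    # Second pass: cross-sectional OLS of mean returns on the ESTIMATED betas.
    # The betas carry estimation error, which is the errors-in-variables problem.
    Z = np.column_stack([np.ones(N), betas])
    gamma, *_ = np.linalg.lstsq(Z, returns.mean(axis=0), rcond=None)
    return betas, gamma  # gamma[0]: intercept, gamma[1]: premium on beta

# Hypothetical simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
T, N = 600, 25
true_betas = np.linspace(0.5, 1.5, N)
market = 0.01 + 0.05 * rng.standard_normal(T)
returns = market[:, None] * true_betas + 0.02 * rng.standard_normal((T, N))

betas, gamma = two_pass(returns, market)
```

Because the second-pass regressor is itself an estimate, the second-pass coefficients inherit the first-pass sampling error; a joint (multivariate) estimation avoids this by treating all parameters as estimated simultaneously.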

Econometrically, multivariate procedures are superior to two-pass regression procedures. They have not been used more frequently, however, due to their programming complexity and the usual requirement that the number of time-series observations exceed the number of cross-sectional observations.

MacBeth [1975] was the first to use a multivariate test for linearity based on the Hotelling T²-statistic, a generalisation of the standard t-statistic. However, inference based on MacBeth's test is complicated by the fact that it requires the observability of market βs. Since estimates of these are substituted in the test statistic, the resulting distribution is unknown.

Gibbons [1982] improved upon MacBeth's approach by using maximum likelihood estimation in a multivariate test of the mean-variance efficiency of the proxy used. The test statistic used was based on a standard likelihood-ratio test ("LRT") statistic, in conjunction with its limiting chi-squared distribution. Since the distribution of the LRT accounts for the fact that all parameters in the relationship tested, including β coefficients, are jointly estimated, this technique dominates the T²-statistic, at least asymptotically.

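Gibbons's technique can be described as follows. A 'market model' relationship is assumed:

$$R_i = \alpha_i \iota_T + \beta_i R_m + \varepsilon_i \qquad (5.26)$$

where $R_i$ is a time-series of returns on asset i, $\iota_T$ is a vector of ones, $R_m$ is the time-series of returns on the market proxy, and $\varepsilon_i$ is a vector of residuals, assumed to be multivariate normal with a diagonal variance-covariance matrix. If the index is mean-variance efficient, then:

$$\alpha_i = \gamma (1 - \beta_i) \qquad (5.27)$$

for i = 1,..., N, where $\gamma$ is the expected return on the zero-beta portfolio.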

The null hypothesis therefore places a non-linear restriction on a series of N regression equations. To obtain a test statistic, the restricted and unrestricted versions of equation (5.26) can be estimated by maximum likelihood. The LRT, which compares the statistical fit of the unrestricted model with that of the restricted model, is thus obtained.41
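The comparison of restricted and unrestricted fits can be sketched in Python. This is a minimal illustration under stated assumptions, not Gibbons's own procedure: it assumes the restriction takes the zero-beta form α_i = γ(1 − β_i), estimates γ by a one-dimensional bounded search over the concentrated likelihood (rather than the Gauss-Newton step of footnote 41), and uses the asymptotic chi-squared distribution with N − 1 degrees of freedom. The names `gibbons_lrt`, `log_det_resid_cov`, the search bounds and the simulated data are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def log_det_resid_cov(Y, X):
    """Log-determinant of the ML residual covariance from regressing Y on X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    _, logdet = np.linalg.slogdet(E.T @ E / Y.shape[0])
    return logdet

def gibbons_lrt(R, Rm, bounds=(-0.05, 0.05)):
    """LRT of alpha_i = gamma * (1 - beta_i); R is (T, N), Rm is (T,)."""
    T, N = R.shape
    # Unrestricted model: a free intercept and beta for every asset.
    ld_u = log_det_resid_cov(R, np.column_stack([np.ones(T), Rm]))

    # Restricted model: for a fixed gamma the restriction implies
    # R_i - gamma = beta_i * (Rm - gamma) + e_i, a regression with no intercept.
    def ld_r(gamma):
        return log_det_resid_cov(R - gamma, (Rm - gamma)[:, None])

    # Assumption: a bounded scalar search; it may stop at a local minimum.
    res = minimize_scalar(ld_r, bounds=bounds, method="bounded")
    stat = T * (res.fun - ld_u)
    return stat, chi2.sf(stat, N - 1), res.x

# Hypothetical data simulated under the null (illustrative values only).
rng = np.random.default_rng(0)
T, N = 240, 10
beta = np.linspace(0.5, 1.5, N)
Rm = 0.01 + 0.05 * rng.standard_normal(T)
R = 0.003 * (1 - beta) + np.outer(Rm, beta) + 0.02 * rng.standard_normal((T, N))

stat, pval, gamma_hat = gibbons_lrt(R, Rm)
```

Since the restricted model is nested in the unrestricted one, the statistic is non-negative for any value of γ the search returns; a poor search can only overstate it.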

Using this methodology, Gibbons tested for the mean-variance efficiency of the equal-weighted CRSP index using monthly data for the period 1926-1975. To allow for non-stationarity, this period was divided into ten equal five-year sub-periods. To avoid having more assets than observations (which would make the estimated variance-covariance matrix singular), forty equally-weighted portfolios were formed based on the β coefficients of individual assets.42 Based on the asymptotic chi-squared distribution of the LRT statistic, the mean-variance efficiency of the index was rejected in five out of ten periods. When the results for individual sub-periods were aggregated into an overall statistic, the null hypothesis was strongly rejected.
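The singularity that motivates grouping assets into portfolios can be checked numerically. The sketch below uses hypothetical simulated data with T = 60 monthly observations (one five-year sub-period) and N = 100 assets, an illustrative choice; it shows that the residual variance-covariance matrix from the market-model regressions then has rank below N and so cannot be inverted.

```python
import numpy as np

# Hypothetical simulated data: more assets (N) than observations (T).
rng = np.random.default_rng(1)
T, N = 60, 100
Rm = rng.standard_normal(T)
R = np.outer(Rm, rng.uniform(0.5, 1.5, N)) + rng.standard_normal((T, N))

# Market-model residuals for every asset.
X = np.column_stack([np.ones(T), Rm])
B, *_ = np.linalg.lstsq(X, R, rcond=None)
E = R - X @ B

# The N x N residual covariance is built from only T observations and
# two estimated coefficients per asset, so its rank cannot reach N.
Sigma = E.T @ E / T
rank = np.linalg.matrix_rank(Sigma)
```

With forty portfolios and sixty monthly observations, by contrast, the number of cross-sectional units falls below the number of time-series observations.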

A problem with Gibbons's tests is that inferences were drawn based on the asymptotic distribution of the test statistic. Stambaugh [1982] examined the small-sample validity of Gibbons's results and the impact of performing multivariate tests with alternative market proxies. He reported simulation evidence that the LRT statistic does not conform closely to its limiting distribution in small samples, and suggested that the null hypothesis is rejected too often. A related Lagrange multiplier test ("LMT") was proposed which appeared to conform more closely in small samples to the limiting distribution. When the Gibbons test was repeated based on the LMT, the null hypothesis of mean-variance efficiency of the equal-weighted CRSP index could no longer be rejected. Similar results were obtained by Jobson & Korkie [1982], who modified the LRT statistics using Bartlett's correction factor. Transformed in this manner, the statistics reported by Gibbons no longer rejected the null hypothesis.

The LMT statistic was also used by Stambaugh [1982; 1983] to test for the efficiency of a variety of market proxies. This was to verify one of the implicit assumptions forming the basis of Roll's critique, namely that inferences about the model's validity are sensitive to the incorrect specification of the market index portfolio. As Stambaugh [1982, p. 235] put it:

'that one index portfolio can reverse inferences about the model made with another index portfolio is certainly true and is not an empirical question. The empirical question is whether such a reversal occurs with indices that approximate returns on portfolios of aggregate wealth. It is the latter question that bears on the testability of the CAPM.'

Four market indices were constructed, using weighted estimates of the market value of each class of assets:

  • the first index consisted of the value-weighted NYSE equity index;
  • the second combined the first index with corporate bonds, government bonds and Treasury bills;
  • the third added home furnishings, automobiles and real estate; and
  • the fourth assigned a weight of only 0.10 to equity and a weight of 0.90 to the other six categories in the third index.

The test was also repeated to explain the returns of alternative sets of assets including industrial equity portfolios, β-sorted equity portfolios, preferred equities, and bond portfolios. The general conclusion was that linearity tests are more sensitive to the selection of assets than to the composition of the market index. In particular, based on the most comprehensive set of assets, linearity could not be rejected for any of the indices.

A fourth multivariate test statistic, termed the 'cross-sectional regression test', was proposed by Shanken [1985; 1986b].43 This statistic has the same limiting chi-squared distribution as the LRT and LMT statistics, but its small-sample distribution can be bounded above and below. Using this test statistic, Shanken was able to reject the mean-variance efficiency of the equally-weighted CRSP index using a sample of twenty portfolios stratified by company size. This provided evidence for the existence of a 'small company effect'. When the test statistic was recomputed by excluding observations in January,44 the mean-variance efficiency of the index was still rejected, although no significant relationship emerged between excess return and size.

More recently, Gibbons, Ross & Shanken [1989] have obtained an analytical formula and an exact small-sample distribution for the LRT, and Kandel & Stambaugh [1989] extensively investigated the geometric interpretation of this test in the context of the mean-variance frontier.
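A finite-sample test of this kind can be sketched in Python. The sketch assumes the commonly cited form of the statistic for the case with a riskless asset, in which returns are measured in excess of the riskless rate and the null hypothesis is that every intercept is zero; under normality the statistic then follows an F distribution with N and T − N − 1 degrees of freedom. The function name `grs_test` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

def grs_test(Z, Zm):
    """Exact F-test that all intercepts are zero.

    Z: (T, N) excess returns on the test assets; Zm: (T,) excess proxy returns.
    """
    T, N = Z.shape
    X = np.column_stack([np.ones(T), Zm])
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)
    alpha = B[0]
    E = Z - X @ B
    Sigma = E.T @ E / T                    # ML residual covariance
    sharpe_sq = Zm.mean() ** 2 / Zm.var()  # squared ex post Sharpe ratio (ML variance)
    quad = alpha @ np.linalg.solve(Sigma, alpha)
    stat = (T - N - 1) / N * quad / (1.0 + sharpe_sq)
    return stat, f_dist.sf(stat, N, T - N - 1)

# Hypothetical excess returns simulated under the null (illustrative values).
rng = np.random.default_rng(0)
T, N = 120, 10
Zm = 0.005 + 0.05 * rng.standard_normal(T)
Z = np.outer(Zm, np.linspace(0.5, 1.5, N)) + 0.02 * rng.standard_normal((T, N))

stat, pval = grs_test(Z, Zm)
# Shifting every asset's return by 1% per period violates the null.
stat_alt, pval_alt = grs_test(Z + 0.01, Zm)
```

The quadratic form in the intercepts uses the full residual covariance matrix, so correlation across portfolios enters the statistic directly, which a set of separate univariate t-statistics cannot capture.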

Gibbons, Ross & Shanken also applied the LRT statistic to test the efficiency of the equally-weighted CRSP index. Using a data set very similar to that used by Black, Jensen & Scholes [1972], and ten β-sorted portfolios, they could not reject the ex ante efficiency of the index. The same conclusion was reached when the efficiency of the index was tested against ten portfolios stratified by company size. While this result contrasts with existing evidence on the 'small company effect', the authors suggested that this evidence might simply be due to the inappropriateness of previous univariate tests.45 In particular, Gibbons, Ross & Shanken showed the existence of a negative sample correlation between the highest decile portfolio and the other decile portfolios. This implies that if the lowest decile portfolios perform well in the sample, the highest decile would be expected to perform poorly. As this correlation across portfolios is taken into account by multivariate test statistics, but not by univariate ones, it may explain the difference in results. However, the 'size effect' was significant, and the efficiency of the index was rejected, when only January returns were considered.46


41. Gibbons actually uses an equivalent one-step Gauss-Newton procedure, which linearises the restriction in equation (5.27), using a Taylor series expansion around the OLS estimates.

42. Since the multivariate regression model does not use the βs as an explanatory variable, there is no bias from selecting securities based on contemporaneous βs. The use of contemporaneous βs increases the dispersion of portfolio βs, thus improving the power of the test.

43. Roll [1985] discusses the geometric interpretation of the cross-sectional regression test statistic in the context of the mean-variance-efficient frontier. MacKinlay [1987] provides a review.

44. This is consistent with the evidence reported by Keim [1983] and Bogle [1994] that the size effect is more pronounced in January than in any other month.

45. See, for example, Blume & Stambaugh [1983], Keim [1983] and Roll [1983].

46. See footnote 44 above.