Example: Fama-MacBeth regression
Estimating the Risk Premia using Fama-MacBeth Regressions
This example highlights how to implement a Fama-MacBeth 2-stage regression to estimate factor risk premia, make inference on the risk premia, and test whether a linear factor model can explain a cross-section of portfolio returns. This example closely follows [Cochrane::2001] (See also [JagannathanSkoulakisWang::2010]). As in the previous example, the first segment contains the imports.
from numpy import mat, cov, mean, hstack, multiply, sqrt, diag, \
    squeeze, ones, array, vstack, kron, zeros, eye, savez_compressed
from numpy.linalg import inv
from scipy.stats import chi2
from pandas import read_csv
import statsmodels.api as sm
Next, the data are imported. I formatted the data downloaded from Ken French's website into an easy-to-import CSV which can be read by pandas.read_csv. The data are split using named columns for the small set of variables and iloc for the portfolios. The code uses pure NumPy arrays, so values is used to retrieve the array from the DataFrame. The dimensions are determined using shape. Finally, the risk-free rate is forced to have two dimensions so that it is broadcastable with the portfolio returns when constructing the excess returns on the size- and value-sorted portfolios. mat (an alias for asmatrix) is used to return matrix views of all of the arrays. This code is linear algebra-heavy and so matrices are easier to use than arrays.
data = read_csv('FamaFrench.csv')
# Split using both named columns and iloc for larger blocks
dates = data['date'].values
factors = data[['VWMe', 'SMB', 'HML']].values
riskfree = data['RF'].values
portfolios = data.iloc[:, 5:].values
# Use mat for easier linear algebra
factors = mat(factors)
riskfree = mat(riskfree)
portfolios = mat(portfolios)
# Shape information
T,K = factors.shape
T,N = portfolios.shape
# Reshape rf and compute excess returns
riskfree.shape = T,1
excessReturns = portfolios - riskfree
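Since the portfolios are the 25 size- and value-sorted portfolios and there are three factors, a small optional sanity check of the assumed layout can be added at this point; the assertions below are purely illustrative.
# Optional check of the assumed data layout: 3 factors and 25 portfolios
assert factors.shape == (T, 3)
assert portfolios.shape == (T, 25)
assert excessReturns.shape == (T, 25)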
The next block does two things:
- Compute the time-series $\beta$s. This is done by regressing the full array of excess returns on the factors (augmented with a constant) using OLS.
- Compute the risk premia using a cross-sectional regression of average excess returns on the estimated $\beta$s. This is a standard regression where the step 1 $\beta$ estimates are used as regressors, and the dependent variable is the average excess return.
# Time series regressions
X = sm.add_constant(factors)
ts_res = sm.OLS(excessReturns, X).fit()
alpha = ts_res.params[0]
beta = ts_res.params[1:]
avgExcessReturns = mean(excessReturns, 0)
# Cross-section regression
cs_res = sm.OLS(avgExcessReturns.T, beta.T).fit()
riskPremia = cs_res.params
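For comparison, the second stage can also be run period by period, as in the original Fama-MacBeth procedure: regress the cross-section of excess returns at each date on the $\beta$s, then average the estimates over time. The sketch below reuses excessReturns, beta, T and K from above; the names lambdas, fmRiskPremia and fmStdErr are illustrative. Because the $\beta$s are the same in every period, the averaged estimates are numerically identical to riskPremia, while fmStdErr gives the classic Fama-MacBeth standard errors which, unlike the GMM standard errors computed below, ignore the estimation error in the $\beta$s.
# Period-by-period second stage: one cross-sectional regression per date
lambdas = zeros((T, K))
for t in range(T):
    rets_t = array(excessReturns[t]).ravel()  # date-t excess returns as a 1-d array
    lambdas[t] = sm.OLS(rets_t, beta.T).fit().params
# Fama-MacBeth estimates and standard errors from the time series of lambdas
fmRiskPremia = lambdas.mean(0)
fmStdErr = lambdas.std(0, ddof=1) / sqrt(T)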
The asymptotic variance requires computing the covariance of the demeaned returns and the weighted pricing errors. The problem is formulated using 2-step GMM where the moment conditions are \begin{equation} g_{t}\left(\theta\right)=\left[\begin{array}{c} \epsilon_{1t}\\ \epsilon_{1t}f_{t}\\ \epsilon_{2t}\\ \epsilon_{2t}f_{t}\\ \vdots\\ \epsilon_{Nt}\\ \epsilon_{Nt}f_{t}\\ \beta u_{t} \end{array}\right] \end{equation}
where $\epsilon_{it}=r_{it}^{e}-\alpha_{i}-\beta_{i}^{\prime}f_{t}$, $\beta_{i}$ is a $K$ by 1 vector of factor loadings, $f_{t}$ is a $K$ by 1 set of factors, $\beta=\left[\beta_{1}\,\beta_{2}\ldots\beta_{N}\right]$ is a $K$ by $N$ matrix of all factor loadings, $u_{t}=r_{t}^{e}-\beta'\lambda$ is the $N$ by 1 vector of pricing errors, and $\lambda$ is a $K$ by 1 vector of risk premia. The vector of parameters is then $\theta= \left[\alpha_{1}\:\beta_{1}^{\prime}\:\alpha_{2}\:\beta_{2}^{\prime}\:\ldots\:\alpha_{N}\,\beta_{N}^{\prime}\:\lambda'\right]'$. To make inference on this problem, the derivative of the moments with respect to the parameters, $\partial g_{t}\left(\theta\right)/\partial\theta^{\prime}$, is needed. With some work, the estimator of this matrix can be seen to be
\begin{equation} G=E\left[\frac{\partial g_{t}\left(\theta\right)}{\partial\theta^{\prime}}\right]=\left[\begin{array}{cc} -I_{N}\otimes\Sigma_{X} & 0\\ G_{21} & -\beta\beta^{\prime} \end{array}\right]. \end{equation}where $X_{t}=\left[1\: f_{t}^{\prime}\right]'$ and $\Sigma_{X}=E\left[X_{t}X_{t}^{\prime}\right]$. $G_{21}$ is a matrix with the structure
\begin{equation} G_{21}=\left[G_{21,1}\, G_{21,2}\,\ldots G_{21,N}\right] \end{equation}where
\begin{equation} G_{21,i}=\left[\begin{array}{cc} 0_{K,1} & \textrm{diag}\left(E\left[u_{i}\right]-\beta_{i}\odot\lambda\right)\end{array}\right]\end{equation}and where $E\left[u_{i}\right]$ is the expected pricing error. In estimation, all expectations are replaced with their sample analogues. Note that there are $N\left(K+1\right)+K$ moment conditions and the same number of parameters, so the system is exactly identified.
# Moment conditions
X = sm.add_constant(factors)
p = vstack((alpha, beta))
epsilon = excessReturns - X @ p
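# Each row of moments1 is [eps_1t * X_t', eps_2t * X_t', ..., eps_Nt * X_t'] with X_t' = [1 f_t']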
moments1 = kron(epsilon, ones((1, K + 1)))
moments1 = multiply(moments1, kron(ones((1, N)), X))
u = excessReturns - riskPremia[None,:] @ beta
moments2 = u * beta.T
# Score covariance
S = mat(cov(hstack((moments1, moments2)).T))
# Jacobian
G = mat(zeros((N * K + N + K, N * K + N + K)))
SigmaX = (X.T @ X) / T
G[:N * K + N, :N * K + N] = kron(eye(N), SigmaX)
G[N * K + N:, N * K + N:] = -beta @ beta.T
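# Fill the G_21 blocks, one K by (K + 1) block for each portfolio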
for i in range(N):
    temp = zeros((K, K + 1))
    values = mean(u[:, i]) - multiply(beta[:, i], riskPremia)
    temp[:, 1:] = diag(values)
    G[N * K + N:, i * (K + 1):(i + 1) * (K + 1)] = temp
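# vcv estimates the covariance of [alpha_1 beta_1' ... alpha_N beta_N' lambda']'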
vcv = inv(G.T) * S * inv(G) / T
The $J$-test examines whether the average pricing errors, $\hat{\alpha}$, are zero. The $J$ statistic has an asymptotic $\chi_{N}^{2}$ distribution, and the model is badly rejected.
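In terms of the estimates, the statistic is \begin{equation} J=\hat{\alpha}^{\prime}\widehat{\textrm{avar}}\left[\hat{\alpha}\right]^{-1}\hat{\alpha} \end{equation} where $\hat{\alpha}$ is the $N$ by 1 vector of estimated $\alpha$s and $\widehat{\textrm{avar}}\left[\hat{\alpha}\right]$ is the block of the estimated parameter covariance corresponding to the $\hat{\alpha}$s.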
# Select the covariance of the alphas (every (K + 1)-th parameter)
vcvAlpha = vcv[0:N * K + N:K + 1, 0:N * K + N:K + 1]
J = alpha @ inv(vcvAlpha) @ alpha.T
J = J[0, 0]
Jpval = 1 - chi2(N).cdf(J)
The next block uses formatted output to present all of the results in a readable manner.
vcvRiskPremia = vcv[N * K + N:, N * K + N:]
annualizedRP = 12 * riskPremia
arp = list(squeeze(annualizedRP))
arpSE = list(sqrt(12 * diag(vcvRiskPremia)))
print(' Annualized Risk Premia')
print(' Market SMB HML')
print('--------------------------------------')
print('Premia {0:0.4f} {1:0.4f} {2:0.4f}'.format(arp[0], arp[1], arp[2]))
print('Std. Err. {0:0.4f} {1:0.4f} {2:0.4f}'.format(arpSE[0], arpSE[1], arpSE[2]))
print('\n\n')
print('J-test: {:0.4f}'.format(J))
print('P-value: {:0.4f}'.format(Jpval))
i = 0
betaSE = []
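# Loop over the 5 size and 5 value categories, reporting coefficients, standard errors and t-stats for each of the 25 portfolios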
for j in range(5):
    for k in range(5):
        a = alpha[i]
        b = beta[:, i]
        variances = diag(vcv[(K + 1) * i:(K + 1) * (i + 1), (K + 1) * i:(K + 1) * (i + 1)])
        betaSE.append(sqrt(variances))
        s = sqrt(variances)
        c = hstack((a, b))
        t = c / s
        print('Size: {:}, Value:{:} Alpha Beta(VWM) Beta(SMB) Beta(HML)'.format(j + 1, k + 1))
        print('Coefficients: {:>10,.4f} {:>10,.4f} {:>10,.4f} {:>10,.4f}'.format(a, b[0], b[1], b[2]))
        print('Std Err.      {:>10,.4f} {:>10,.4f} {:>10,.4f} {:>10,.4f}'.format(s[0], s[1], s[2], s[3]))
        print('T-stat        {:>10,.4f} {:>10,.4f} {:>10,.4f} {:>10,.4f}'.format(t[0], t[1], t[2], t[3]))
        print('')
        i += 1
The final block converts the standard errors of $\beta$ to an array and saves the results.
betaSE = array(betaSE)
savez_compressed('fama-macbeth-results', alpha=alpha, beta=beta,
betaSE=betaSE, arpSE=arpSE, arp=arp, J=J, Jpval=Jpval)
Save Results
Save the estimated values for use in the $\LaTeX$ notebook.
from numpy import savez
savez('fama-macBeth-results.npz', arp=arp, beta=beta, arpSE=arpSE,
betaSE=betaSE, J=J, Jpval=Jpval)