# Seemingly Unrelated Regression (SUR/SURE)¶

Seemingly unrelated regression is a system regression estimator which jointly estimates multiple models. This allows joint hypothesis testing of parameters across models since the parameter covariance is robust to correlation of residuals across models. It can also lead to more precise parameter estimates if some residuals are conditionally homoskedastic and regressors differ across equations.

## Basic Notation¶

Assume \(K\) series are observed for \(N\) time periods. Denote the stacked observations of series \(i\) as \(Y_{i}\) and its corresponding regressor matrix \(X_{i}\). The complete model spanning all series can be specified as

The OLS estimator of \(\beta\) is then

A GLS estimator can be similarly defined as

where \(\Omega=\Sigma\otimes I_{N}\) is the joint covariance of the residuals. In practice \(\Sigma\) is not known as so a feasible GLS (FGLS) is implemented in two steps. The first uses OLS to estimate \(\hat{\epsilon}\) and then estimates the residual covariance as

The feasible GLS estimator is then

where \(\hat{\Omega}=\hat{\Sigma}\otimes I_{N}\).

# Covariance Estimation¶

**Homoskedastic Data**

When residuals are assumed to be homoskedastic, the covariance can be consistently estimated by

where \(\Delta\) is the weighting matrix used in the parameter estimator. For example, in OLS \(\Delta=I_{NK}\) while in FGLS\(\Delta=\hat{\Omega}\). The estimator supports using FGLS with an assumption that \(\hat{\Sigma}\) is diagonal or using a user-specified value of \(\Sigma\). When the FGLS estimator is used, this simplifies to

**Heteroskedastic Data**

When residuals are heteroskedastic, the covariance can be consistently estimated by

where \(\hat{S}\) is a covariance estimator that is clustered by time. Define the matrix of scores as

where \(\iota_{M}\) is a vector of ones with \(M\) elements where \(M\) is the number of columns in \(X\) and \(\odot\) is element by element multiplication. The clustered-by-time score covariance estimator is then

where

is the sum of the scores across models \(\left(j\right)\) that have the same observation index \(i\). This estimator allows arbitrary correlation of residuals with the same observation index.

**Debiased Estimators**

When the debiased flag is set, a small sample adjustment if applied so that element \(ij\) of \(\hat{\Sigma}\) is scaled by

## Other Statistics¶

**Goodness of fit**

The reported \(R^{2}\) is always for the data, or weighted data is weights are used, for either OLS or GLS. The means that the reported \(R^{2}\) for the GLS estimator may be negative

**F statistics**

When the debiased covariance estimator is used (small sample adjustment) the reported \(F\) statistics use \(K\left(T-\bar{P}\right)\) where \(\bar{P}=K^{-1}\sum_{i=1}^{K}P_{i}\) where \(P_{i}\) is the number of variables including the constant in model \(i\). When models include restrictions it may be the case that the covariance is singular. When this occurs, the \(F\) statistic cannot be calculated.

# Memory efficient calculations¶

The parameter estimates in the SUR model can are

where

where \(\sigma^{ij}=\left[\Sigma^{-1}\right]_{ij}\). Since \(X\) is block diagonal,

and

Similarly

This suggests that constructing the high dimensional \(\Omega^{-1}\)
can be avoided and many needless multiplications can be avoided by
directly computing these two components and the solution can be found
using `solve`

.

# Three Stage Least Squares (3SLS)¶

Three-stage least squares extends SUR to the case of endogenous variables. It is the system extension of two-stage least squares (2SLS). Like SUR, 3SLS jointly estimates multiple models which allows joint hypothesis testing of parameters. It can also lead to more precise parameter estimates if some residuals are conditionally homoskedastic and regressors differ across equations.

## Basic Notation¶

Assume \(K\) series are observed for \(N\) time periods. Denote the stacked observations of series \(i\) as \(Y_{i}\) and its corresponding regressor matrix \(X_{i}\). The complete model spanning all series can be specified as

where \(X_{i}\) contains both exogenous and endogenous variables. For each equation denote the available instruments as \(Z_{i}\). The instrument include both exogenous regressors and excluded instrumental variables. Define the fitted values of the exogenous variables as

The IV estimator of \(\beta\) is then

where

A GLS estimator can be similarly defined as

Inference is identical to the SUR replacing \(X\) with \(\hat{X}\) where model residuals are estimated using \(X\) and \(Y\).

# System Generalized Method of Moments (GMM)¶

The system GMM estimator uses a set of moment conditions of the form

where \(g_{ji}=z_{ji}^{\prime}\left(y_{ji}-x_{ji}\beta\right)\) where \(j\) is the equation index, \(z_{ji}\) is a set of instruments for equation \(j\) at time period i and \(x_{ji}\) and \(y_{ji}\) are similarly defined. Define

then the parameters are estimated to minimize

The solution to this minimization problem is

\(W\) is the moment weighting matrix and for efficiency should be set to the covariance of the moment conditions. In practice a 2-step estimator is required since estimators of \(W\) depend on \(\beta\). In the first step, \(W=N^{-1}Z^{\prime}Z\) is used to construct an initial estimate of \(\hat{\beta}\). This would be efficient if residuals were homoskedastic and uncorrelated. This choice will produce estimates that are identical to system 2SLS. Using the initial parameter estimated, the covariance of the moment conditions is computed used one of two forms using the estimated model residuals \(\hat{\epsilon}\).

## Homoskedastic Weighting¶

When residuals are assumed to be conditionally homoskedastic, then

where \(\hat{\Sigma}\)is the \(K\) by \(K\) covariance of the model residuals. This specification assumes that residuals are not correlated across observation index (only across models)

## Heteroskedastic Weighting¶

When residuals are heteroskedastic, the weighting matrix is estimated from

which allows cross-sectional dependence but no dependence across observation index.

The efficient estimates are computed using the same expression only with \(\hat{W}\),

Both weighting estimators can be centered which will enforce an exact mean 0 to the moment conditions. When models are over-identified, this can improve accuracy of 2-step estimators.

## Parameter Covariance¶

Covariance estimators are available for either homoskedastic or heteroskedastic data. The only difference is in how the covariance of the moments is computed. In either case, the parameter covariance can be estimated by

The covariance of the scores \(\hat{\Omega}\) estimated using either the homoskedastic weighting formula or the heteroskedastic weighting formula immediately above. This allow the choice of weighting matrix to be set separately from the score covariance estimator, although in most cases these should be the same, and so the covariance of the estimated parameters will simplify to

# Testing Covariance and Correlations¶

Two tests are available to test whether the residual covariance is diagonal. These are useful diagnostics when considering GLS estimation. If the tests reject the null, then the data suggest that GLS estimation should improve efficiency as long as the regressors are not all common. If the null is not rejected, then the covariance is not statistically different from a diagonal covariance and there are unlikely to be gains to using GLS. The Breusch-Pagan test directly examines the correlations of the residuals, and is defined as

The likelihood ratio is defined as the difference between the log determinants of a diagonal covariance matrix and the full unrestricted covariance matrix,

The asymptotic distribution of the likelihood ratio test requires homoskedasticity.

# System Measures of Fit (\(R^{2}\))¶

Most measures of fit for systems of equations assume that all equations contains a constant (or equivalent). Caution is needed when interpreting if equations exclude constant terms.

## Overall \(R^{2}\)¶

The overall \(R^{2}\) is defined as

where \(TSS_{i}\) is centered if equation \(i\) contains a constant and uncentered if it does not. When all equations contain constants, it is identical to Judge’s measure.

## McElroy¶

McElroy’s (1977) measure is defined as

where \(\iota\) is a \(N\) by 1 vector of 1s. This is implemented as

where

and

where the vector of mean parameters is estimated by fitting a SURE to the data (using user specified weights, if provided) where \(X_{i}=\iota\) contains only a constant. Greene provides an alternative formulation of this measure as

where \(\hat{\Psi}=N^{-1}\tilde{Y}^{\prime}\tilde{Y}\) is the covariance of the demeaned data.

## Berndt¶

Berndt’s measure is defined as

## Judge¶

Judge’s measure is the naive OLS \(R^{2}\) for the system,

## Dhrymes¶

Dhrymes’ measure of fit is a weighted average of the \(R^{2}\) of each equation,

where \(R_{i}^{2}\) is the coefficient of determination from equation \(i\).

Sources used in writing the code include [Greene], [Stata] and [Henningsen].