System Regression Estimators

Seemingly Unrelated Regression (SUR/SURE)

class SUR(equations, *, sigma=None)[source]

Seemingly unrelated regression estimation (SUR/SURE)

Parameters:
  • equations (dict) – Dictionary-like structure containing dependent and exogenous variable values. Each key is an equation label and must be a string. Each value must be either a tuple of the form (dependent, exog, [weights]) or a dictionary with keys ‘dependent’ and ‘exog’ and the optional key ‘weights’.
  • sigma (array-like) – Pre-specified residual covariance to use in GLS estimation. If not provided, FGLS is implemented based on an estimate of sigma.

Notes

Estimates a set of regressions which are seemingly unrelated in the sense that separate estimation would lead to consistent parameter estimates. Each equation is of the form

\[y_{i,k} = x_{i,k}\beta_k + \epsilon_{i,k}\]

where k denotes the equation and i denotes the observation index. By vertically stacking the arrays of dependent variables and placing the exogenous variables into a block diagonal array, the entire system can be compactly expressed as

\[Y = X\beta + \epsilon\]

where

\[\begin{split}Y = \left[\begin{array}{c}Y_1 \\ Y_2 \\ \vdots \\ Y_K\end{array}\right]\end{split}\]

and

\[\begin{split}X = \left[\begin{array}{cccc} X_1 & 0 & \ldots & 0 \\ 0 & X_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & X_K \end{array}\right]\end{split}\]

The system OLS estimator is

\[\hat{\beta}_{OLS} = (X'X)^{-1}X'Y\]
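
The stacking described above can be sketched with NumPy alone. This is a minimal illustration on simulated data, not the library's implementation; the sample size, coefficients and variable names are assumptions made for the example. It builds the stacked Y and block-diagonal X for two equations and checks that system OLS reproduces equation-by-equation OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 200

# Two equations with different regressors (a constant plus one variable each).
x1 = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
x2 = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
y1 = x1 @ np.array([1.0, 0.5]) + rng.standard_normal(n_obs)
y2 = x2 @ np.array([-1.0, 2.0]) + rng.standard_normal(n_obs)

# Stack the dependent variables and place the regressors on a block diagonal.
y = np.concatenate([y1, y2])
x = np.zeros((2 * n_obs, 4))
x[:n_obs, :2] = x1
x[n_obs:, 2:] = x2

# System OLS, (X'X)^{-1} X'Y, computed with a linear solve rather than an inverse.
beta_ols = np.linalg.solve(x.T @ x, x.T @ y)

# Because X is block diagonal, this matches OLS applied to each equation separately.
beta_eq1 = np.linalg.lstsq(x1, y1, rcond=None)[0]
beta_eq2 = np.linalg.lstsq(x2, y2, rcond=None)[0]
same = np.allclose(beta_ols, np.concatenate([beta_eq1, beta_eq2]))
```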

When certain conditions are satisfied, a GLS estimator of the form

\[\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y\]

can improve accuracy of coefficient estimates where

\[\Omega = \Sigma \otimes I_N\]

where \(\Sigma\) is the covariance matrix of the residuals.
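
A feasible version of the GLS estimator above can be sketched in three steps: system OLS, an estimate of \(\Sigma\) from the OLS residuals, and GLS with \(\Omega^{-1} = \Sigma^{-1} \otimes I_N\). The code below is an illustrative NumPy sketch on simulated data. The divisor N in the covariance estimate and all names are assumptions, and forming the full Kronecker product is only sensible for small samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_eq = 300, 2

# Errors are correlated across equations with covariance sigma_true.
sigma_true = np.array([[1.0, 0.8], [0.8, 1.0]])
eps = rng.multivariate_normal(np.zeros(n_eq), sigma_true, size=n_obs)

x1 = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
x2 = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
y1 = x1 @ np.array([1.0, 0.5]) + eps[:, 0]
y2 = x2 @ np.array([-1.0, 2.0]) + eps[:, 1]

y = np.concatenate([y1, y2])
x = np.zeros((n_eq * n_obs, 4))
x[:n_obs, :2] = x1
x[n_obs:, 2:] = x2

# Step 1: system OLS and the N-by-K array of residuals.
beta_ols = np.linalg.solve(x.T @ x, x.T @ y)
resid = (y - x @ beta_ols).reshape(n_eq, n_obs).T

# Step 2: estimate Sigma and form Omega^{-1} = Sigma^{-1} kron I_N.
sigma_hat = resid.T @ resid / n_obs
omega_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(n_obs))

# Step 3: feasible GLS.
xt_oinv = x.T @ omega_inv
beta_gls = np.linalg.solve(xt_oinv @ x, xt_oinv @ y)
```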

SUR is a special case of 3SLS where there are no endogenous regressors and no instruments.

add_constraints(r, q=None)
Parameters:
  • r (DataFrame) – Constraint matrix. nconstraints by nparameters
  • q (Series, optional) – Constraint values (nconstraints). If not set, set to 0

Notes

Constraints are of the form

\[r \beta = q\]

The property param_names can be used to determine the order of parameters.
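
As an illustration of the constraint form, the sketch below builds an r DataFrame and a q Series that impose equal slopes across two equations. The parameter names are hypothetical placeholders; in practice the column labels would be taken from param_names.

```python
import numpy as np
import pandas as pd

# Hypothetical parameter names; in practice read these from param_names.
names = ["eq1_Intercept", "eq1_x1_1", "eq2_Intercept", "eq2_x2_1"]

# One constraint: the two slopes are equal, i.e.
# 1 * eq1_x1_1 - 1 * eq2_x2_1 = 0.
r = pd.DataFrame(np.zeros((1, len(names))), columns=names, index=["equal_slopes"])
r.loc["equal_slopes", "eq1_x1_1"] = 1.0
r.loc["equal_slopes", "eq2_x2_1"] = -1.0
q = pd.Series([0.0], index=r.index)

# Check whether a candidate parameter vector satisfies r beta = q.
beta = pd.Series([0.2, 1.5, -0.4, 1.5], index=names)
satisfied = np.allclose(r.to_numpy() @ beta.to_numpy(), q.to_numpy())
```

Objects of this shape (nconstraints by nparameters for r, nconstraints for q) are what add_constraints(r, q) is documented to accept.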

constraints

Model constraints

Returns:cons – Constraint object
Return type:LinearConstraint
fit(*, method=None, full_cov=True, iterate=False, iter_limit=100, tol=1e-06, cov_type='robust', **cov_config)

Estimate model parameters

Parameters:
  • method ({None, 'gls', 'ols'}) – Estimation method. The default (None) selects the method automatically based on the regressors, using OLS only if all regressors are identical. The other two options force the use of GLS or OLS.
  • full_cov (bool) – Flag indicating whether to utilize information in correlations when estimating the model with GLS
  • iterate (bool) – Flag indicating to iterate GLS until convergence or until iter_limit iterations have been completed
  • iter_limit (int) – Maximum number of iterations for iterative GLS
  • tol (float) – Tolerance to use when checking for convergence in iterative GLS
  • cov_type (str) –

    Name of covariance estimator. Valid options are

    • ‘unadjusted’, ‘homoskedastic’ - Classic covariance estimator
    • ‘robust’, ‘heteroskedastic’ - Heteroskedasticity robust covariance estimator
    • ‘kernel’ - Allows for heteroskedasticity and autocorrelation
  • **cov_config – Additional parameters to pass to covariance estimator. All estimators support debiased which employs a small-sample adjustment
Returns:

results – Estimation results

Return type:

SystemResults
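
The roles of iterate, iter_limit and tol can be illustrated with a small NumPy sketch of iterated feasible GLS. This is not the library's code. The convergence rule (largest absolute change in the coefficients below tol) and the helper name iterated_fgls are assumptions made for the example.

```python
import numpy as np

def iterated_fgls(y, x, n_eq, iter_limit=100, tol=1e-6):
    """Repeat FGLS on a stacked system until the coefficients stop changing."""
    n_obs = y.shape[0] // n_eq
    beta = np.linalg.solve(x.T @ x, x.T @ y)  # start from system OLS
    for iteration in range(1, iter_limit + 1):
        # Re-estimate Sigma from the current residuals.
        resid = (y - x @ beta).reshape(n_eq, n_obs).T
        sigma = resid.T @ resid / n_obs
        omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n_obs))
        xt_oinv = x.T @ omega_inv
        beta_new = np.linalg.solve(xt_oinv @ x, xt_oinv @ y)
        # Assumed stopping rule: largest coefficient change below tol.
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, sigma, iteration

rng = np.random.default_rng(2)
n_obs = 250
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=n_obs)
x1 = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
x2 = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
y = np.concatenate([
    x1 @ np.array([1.0, 0.5]) + eps[:, 0],
    x2 @ np.array([-1.0, 2.0]) + eps[:, 1],
])
x = np.zeros((2 * n_obs, 4))
x[:n_obs, :2] = x1
x[n_obs:, 2:] = x2

beta_hat, sigma_hat, n_iter = iterated_fgls(y, x, n_eq=2)
```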

formula

Set or get the formula used to construct the model

classmethod from_formula(formula, data, *, sigma=None, weights=None)[source]

Specify a SUR using the formula interface

Parameters:
  • formula ({str, dict-like}) – Either a string or a dictionary of strings where each value in the dictionary represents a single equation. See Notes for a description of the accepted syntax
  • data (DataFrame) – Frame containing named variables
  • sigma (array-like) – Pre-specified residual covariance to use in GLS estimation. If not provided, FGLS is implemented based on an estimate of sigma.
  • weights (dict-like) – Dictionary like object (e.g. a DataFrame) containing variable weights. Each entry must have the same number of observations as data. If an equation label is not a key in weights, the weights will be set to unity
Returns:

model – Model instance

Return type:

SUR

Notes

Models can be specified in one of two ways. The first uses curly braces to encapsulate equations. The second uses a dictionary where each key is an equation name.

Examples

The simplest format uses standard Patsy formulas for each equation in a dictionary. Best practice is to use an Ordered Dictionary

>>> import pandas as pd
>>> import numpy as np
>>> data = pd.DataFrame(np.random.randn(500, 4), columns=['y1', 'x1_1', 'y2', 'x2_1'])
>>> from linearmodels.system import SUR
>>> formula = {'eq1': 'y1 ~ 1 + x1_1', 'eq2': 'y2 ~ 1 + x2_1'}
>>> mod = SUR.from_formula(formula, data)

The second format uses curly braces {} to surround distinct equations

>>> formula = '{y1 ~ 1 + x1_1} {y2 ~ 1 + x2_1}'
>>> mod = SUR.from_formula(formula, data)

It is also possible to include equation labels when using curly braces

>>> formula = '{eq1: y1 ~ 1 + x1_1} {eq2: y2 ~ 1 + x2_1}'
>>> mod = SUR.from_formula(formula, data)
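
To make the relationship between the two formats concrete, the following sketch splits a curly-brace formula into a dictionary of per-equation formulas. It is a hypothetical helper written for illustration, not the parser used by the library, and the default labels it assigns to unlabelled equations (eq1, eq2, ...) are an assumption.

```python
import re

def split_system_formula(formula):
    """Split '{label: formula} {formula}' into a dict of equation formulas."""
    equations = {}
    for i, body in enumerate(re.findall(r"\{([^{}]*)\}", formula), start=1):
        # A label, if present, is an identifier followed by a colon.
        match = re.match(r"\s*([A-Za-z_]\w*)\s*:(.*)", body, flags=re.DOTALL)
        if match and "~" in match.group(2):
            label, eq = match.group(1), match.group(2)
        else:
            label, eq = f"eq{i}", body  # hypothetical default label
        equations[label] = eq.strip()
    return equations

labelled = split_system_formula("{eq1: y1 ~ 1 + x1_1} {eq2: y2 ~ 1 + x2_1}")
unlabelled = split_system_formula("{y1 ~ 1 + x1_1} {y2 ~ 1 + x2_1}")
```
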
has_constant

Vector indicating which equations contain constants

classmethod multivariate_ls(dependent, exog)[source]

Interface for specification of multivariate regression models

Parameters:
  • dependent (array-like) – nobs by ndep array of dependent variables
  • exog (array-like) – nobs by nvar array of exogenous regressors common to all models
Returns:

model – Model instance

Return type:

SUR

Notes

Utility function to simplify the construction of multivariate regression models which all use the same regressors. Constructs the dictionary of equations from the variables using the common exogenous variable.

Examples

A simple CAPM can be estimated as a multivariate regression

>>> from linearmodels.datasets import french
>>> from linearmodels.system import SUR
>>> data = french.load()
>>> portfolios = data[['S1V1','S1V5','S5V1','S5V5']]
>>> factors = data[['MktRF']].copy()
>>> factors['alpha'] = 1
>>> mod = SUR.multivariate_ls(portfolios, factors)
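
The construction described in the Notes can be sketched directly: one equation per column of the dependent array, each sharing the same exogenous regressors. The snippet uses simulated stand-ins for the data, and the column names are hypothetical. It also shows that, with identical regressors, the coefficients of every equation come from a single least squares solve.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_obs = 100

# Hypothetical stand-ins for portfolio returns and a single factor.
dependent = pd.DataFrame(
    rng.standard_normal((n_obs, 3)), columns=["p1", "p2", "p3"]
)
exog = pd.DataFrame({"MktRF": rng.standard_normal(n_obs), "alpha": 1.0})

# One equation per dependent column, every equation using the same regressors.
equations = {
    col: {"dependent": dependent[[col]], "exog": exog}
    for col in dependent.columns
}

# With identical regressors, one least squares solve yields all coefficients.
coefs, *_ = np.linalg.lstsq(exog.to_numpy(), dependent.to_numpy(), rcond=None)
params = pd.DataFrame(coefs, index=exog.columns, columns=dependent.columns)
```
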
param_names

Model parameter names

Returns:names – Normalized, unique model parameter names
Return type:list[str]
predict(params, *, equations=None, data=None, eval_env=8)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • equations (dict) – Dictionary-like structure containing exogenous and endogenous variables. Each key is an equation label and must match the labels used to fit the model. Each value must be either a tuple of the form (exog, endog) or a dictionary with keys ‘exog’ and ‘endog’. If predictions are not required for one or more of the model equations, these keys can be omitted.
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth to use when evaluating formulas using Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then equations must be None. Predictions from models constructed using formulas can be computed using either equations, which will be treated as arrays of values corresponding to the formula-processed data, or using data, which will be processed using the formula used to construct the values corresponding to the original model specification.

When using exog and endog, the regressor array for a particular equation is assembled as [equations[eqn][‘exog’], equations[eqn][‘endog’]] where eqn is an equation label. These must correspond to the columns in the estimated model.
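
The assembly rule above can be sketched as follows. This is an illustrative simplification: the equation label, the data and the coefficients are hypothetical, and the parameters are held per equation rather than as the single stacked vector that predict accepts.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_new = 5

# Hypothetical new data for one equation labelled "eq1".
equations = {
    "eq1": {
        "exog": np.column_stack([np.ones(n_new), rng.standard_normal(n_new)]),
        "endog": rng.standard_normal((n_new, 1)),
    }
}
# Hypothetical coefficients, ordered as exog columns followed by endog columns.
params = {"eq1": np.array([0.5, 1.0, -2.0])}

predictions = {}
for label, parts in equations.items():
    # Regressors are the exogenous block followed by the endogenous block.
    regressors = np.hstack([parts["exog"], parts["endog"]])
    predictions[label] = regressors @ params[label]
predictions = pd.DataFrame(predictions)
```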

reset_constraints()

Remove all model constraints

Three-Stage Least Squares (3SLS)

class IV3SLS(equations, *, sigma=None)[source]

Three-stage Least Squares (3SLS) Estimator

Parameters:
  • equations (dict) – Dictionary-like structure containing dependent, exogenous, endogenous and instrumental variables. Each key is an equation label and must be a string. Each value must be either a tuple of the form (dependent, exog, endog, instrument[, weights]) or a dictionary with keys ‘dependent’, and at least one of ‘exog’ or ‘endog’ and ‘instruments’. When using a tuple, values must be provided for all 4 variables, although either empty arrays or None can be passed if a category of variable is not included in a model. The dictionary may contain optional keys for ‘exog’, ‘endog’, ‘instruments’, and ‘weights’. ‘exog’ can be omitted if all variables in an equation are endogenous. Alternatively, ‘exog’ can contain either an empty array or None to indicate that an equation contains no exogenous regressors. Similarly ‘endog’ and ‘instruments’ can either be omitted or may contain an empty array (or None) if all variables in an equation are exogenous.
  • sigma (array-like) – Pre-specified residual covariance to use in GLS estimation. If not provided, FGLS is implemented based on an estimate of sigma.

Notes

Estimates a set of regressions which are seemingly unrelated in the sense that separate estimation would lead to consistent parameter estimates. Each equation is of the form

\[y_{i,k} = x_{i,k}\beta_k + \epsilon_{i,k}\]

where k denotes the equation and i denotes the observation index. By vertically stacking the arrays of dependent variables and placing the regressors into a block diagonal array, the entire system can be compactly expressed as

\[Y = X\beta + \epsilon\]

where

\[\begin{split}Y = \left[\begin{array}{c}Y_1 \\ Y_2 \\ \vdots \\ Y_K\end{array}\right]\end{split}\]

and

\[\begin{split}X = \left[\begin{array}{cccc} X_1 & 0 & \ldots & 0 \\ 0 & X_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & X_K \end{array}\right]\end{split}\]

The system instrumental variable (IV) estimator is

\[\begin{split}\hat{\beta}_{IV} & = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y \\ & = (\hat{X}'\hat{X})^{-1}\hat{X}'Y\end{split}\]

where \(\hat{X} = Z(Z'Z)^{-1}Z'X\). When certain conditions are satisfied, a GLS estimator of the form

\[\hat{\beta}_{3SLS} = (\hat{X}'\Omega^{-1}\hat{X})^{-1}\hat{X}'\Omega^{-1}Y\]

can improve accuracy of coefficient estimates where

\[\Omega = \Sigma \otimes I_N\]

where \(\Sigma\) is the covariance matrix of the residuals.
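
The three stages can be sketched with NumPy on simulated data: project X on Z, compute the system IV estimate, then apply GLS to the projected regressors using the estimated residual covariance. This is an illustration under assumed data-generating values, not the library's implementation. The helper block_diag and all variable names are introduced only for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n_obs, n_eq = 500, 2

def block_diag(a, b):
    """Place two 2-D arrays on a block diagonal."""
    out = np.zeros((a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]))
    out[: a.shape[0], : a.shape[1]] = a
    out[a.shape[0]:, a.shape[1]:] = b
    return out

# Structural errors are correlated across equations.
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n_obs)
x_blocks, z_blocks, y_blocks = [], [], []
for k, beta_k in enumerate([np.array([1.0, 0.5]), np.array([-1.0, 2.0])]):
    instr = rng.standard_normal(n_obs)
    # The regressor depends on the error term, so it is endogenous.
    w = 0.8 * instr + 0.5 * eps[:, k] + rng.standard_normal(n_obs)
    x_k = np.column_stack([np.ones(n_obs), w])
    x_blocks.append(x_k)
    z_blocks.append(np.column_stack([np.ones(n_obs), instr]))
    y_blocks.append(x_k @ beta_k + eps[:, k])

y = np.concatenate(y_blocks)
x = block_diag(*x_blocks)
z = block_diag(*z_blocks)

# Stages 1 and 2: project X on Z, then compute the system IV estimator.
x_hat = z @ np.linalg.solve(z.T @ z, z.T @ x)
beta_iv = np.linalg.solve(x_hat.T @ x_hat, x_hat.T @ y)

# Stage 3: GLS on the projected regressors using the estimated Sigma.
resid = (y - x @ beta_iv).reshape(n_eq, n_obs).T
sigma_hat = resid.T @ resid / n_obs
omega_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(n_obs))
xh_oinv = x_hat.T @ omega_inv
beta_3sls = np.linalg.solve(xh_oinv @ x_hat, xh_oinv @ y)
```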

add_constraints(r, q=None)[source]
Parameters:
  • r (DataFrame) – Constraint matrix. nconstraints by nparameters
  • q (Series, optional) – Constraint values (nconstraints). If not set, set to 0

Notes

Constraints are of the form

\[r \beta = q\]

The property param_names can be used to determine the order of parameters.

constraints

Model constraints

Returns:cons – Constraint object
Return type:LinearConstraint
fit(*, method=None, full_cov=True, iterate=False, iter_limit=100, tol=1e-06, cov_type='robust', **cov_config)[source]

Estimate model parameters

Parameters:
  • method ({None, 'gls', 'ols'}) – Estimation method. The default (None) selects the method automatically based on the regressors, using OLS only if all regressors are identical. The other two options force the use of GLS or OLS.
  • full_cov (bool) – Flag indicating whether to utilize information in correlations when estimating the model with GLS
  • iterate (bool) – Flag indicating to iterate GLS until convergence or until iter_limit iterations have been completed
  • iter_limit (int) – Maximum number of iterations for iterative GLS
  • tol (float) – Tolerance to use when checking for convergence in iterative GLS
  • cov_type (str) –

    Name of covariance estimator. Valid options are

    • ‘unadjusted’, ‘homoskedastic’ - Classic covariance estimator
    • ‘robust’, ‘heteroskedastic’ - Heteroskedasticity robust covariance estimator
    • ‘kernel’ - Allows for heteroskedasticity and autocorrelation
  • **cov_config – Additional parameters to pass to covariance estimator. All estimators support debiased which employs a small-sample adjustment
Returns:

results – Estimation results

Return type:

SystemResults

formula

Set or get the formula used to construct the model

classmethod from_formula(formula, data, *, sigma=None, weights=None)[source]

Specify a 3SLS using the formula interface

Parameters:
  • formula ({str, dict-like}) – Either a string or a dictionary of strings where each value in the dictionary represents a single equation. See Notes for a description of the accepted syntax
  • data (DataFrame) – Frame containing named variables
  • sigma (array-like) – Pre-specified residual covariance to use in GLS estimation. If not provided, FGLS is implemented based on an estimate of sigma.
  • weights (dict-like) – Dictionary like object (e.g. a DataFrame) containing variable weights. Each entry must have the same number of observations as data. If an equation label is not a key in weights, the weights will be set to unity
Returns:

model – Model instance

Return type:

IV3SLS

Notes

Models can be specified in one of two ways. The first uses curly braces to encapsulate equations. The second uses a dictionary where each key is an equation name.

Examples

The simplest format uses standard Patsy formulas for each equation in a dictionary. Best practice is to use an Ordered Dictionary

>>> import pandas as pd
>>> import numpy as np
>>> cols = ['y1', 'x1_1', 'x1_2', 'z1', 'y2', 'x2_1', 'x2_2', 'z2']
>>> data = pd.DataFrame(np.random.randn(500, 8), columns=cols)
>>> from linearmodels.system import IV3SLS
>>> formula = {'eq1': 'y1 ~ 1 + x1_1 + [x1_2 ~ z1]',
...            'eq2': 'y2 ~ 1 + x2_1 + [x2_2 ~ z2]'}
>>> mod = IV3SLS.from_formula(formula, data)

The second format uses curly braces {} to surround distinct equations

>>> formula = '{y1 ~ 1 + x1_1 + [x1_2 ~ z1]} {y2 ~ 1 + x2_1 + [x2_2 ~ z2]}'
>>> mod = IV3SLS.from_formula(formula, data)

It is also possible to include equation labels when using curly braces

>>> formula = '{eq1: y1 ~ 1 + x1_1 + [x1_2 ~ z1]} {eq2: y2 ~ 1 + x2_1 + [x2_2 ~ z2]}'
>>> mod = IV3SLS.from_formula(formula, data)
has_constant

Vector indicating which equations contain constants

classmethod multivariate_ls(dependent, exog=None, endog=None, instruments=None)[source]

Interface for specification of multivariate IV models

Parameters:
  • dependent (array-like) – nobs by ndep array of dependent variables
  • exog (array-like, optional) – nobs by nexog array of exogenous regressors common to all models
  • endog (array-like, optional) – nobs by nendog array of endogenous regressors common to all models
  • instruments (array-like, optional) – nobs by ninstr array of instruments to use in all equations
Returns:

model – Model instance

Return type:

IV3SLS

Notes

At least one of exog or endog must be provided.

Utility function to simplify the construction of multivariate IV models which all use the same regressors and instruments. Constructs the dictionary of equations from the variables using the common exogenous, endogenous and instrumental variables.

param_names

Model parameter names

Returns:names – Normalized, unique model parameter names
Return type:list[str]
predict(params, *, equations=None, data=None, eval_env=8)[source]

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • equations (dict) – Dictionary-like structure containing exogenous and endogenous variables. Each key is an equation label and must match the labels used to fit the model. Each value must be either a tuple of the form (exog, endog) or a dictionary with keys ‘exog’ and ‘endog’. If predictions are not required for one or more of the model equations, these keys can be omitted.
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth to use when evaluating formulas using Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then equations must be None. Predictions from models constructed using formulas can be computed using either equations, which will be treated as arrays of values corresponding to the formula-processed data, or using data, which will be processed using the formula used to construct the values corresponding to the original model specification.

When using exog and endog, the regressor array for a particular equation is assembled as [equations[eqn][‘exog’], equations[eqn][‘endog’]] where eqn is an equation label. These must correspond to the columns in the estimated model.

reset_constraints()[source]

Remove all model constraints

Generalized Method of Moments (GMM) Estimation of Systems

class IVSystemGMM(equations, *, sigma=None, weight_type='robust', **weight_config)[source]

System Generalized Method of Moments (GMM) estimation of linear IV models

Parameters:
  • equations (dict) – Dictionary-like structure containing dependent, exogenous, endogenous and instrumental variables. Each key is an equation label and must be a string. Each value must be either a tuple of the form (dependent, exog, endog, instrument[, weights]) or a dictionary with keys ‘dependent’ and ‘exog’. The dictionary may contain optional keys for ‘endog’, ‘instruments’, and ‘weights’. ‘endog’ and/or ‘instruments’ can be empty if all variables in an equation are exogenous.
  • sigma (array-like) – Pre-specified residual covariance to use in GLS estimation. If not provided, FGLS is implemented based on an estimate of sigma. Only used if weight_type is ‘unadjusted’
  • weight_type (str) – Name of moment condition weight function to use in the GMM estimation
  • **weight_config – Additional keyword arguments to pass to the moment condition weight function

Notes

Estimates a linear model using GMM. Each equation is of the form

\[y_{i,k} = x_{i,k}\beta_k + \epsilon_{i,k}\]

where k denotes the equation and i denotes the observation index. By vertically stacking the arrays of dependent variables and placing the regressors into a block diagonal array, the entire system can be compactly expressed as

\[Y = X\beta + \epsilon\]

where

\[\begin{split}Y = \left[\begin{array}{c}Y_1 \\ Y_2 \\ \vdots \\ Y_K\end{array}\right]\end{split}\]

and

\[\begin{split}X = \left[\begin{array}{cccc} X_1 & 0 & \ldots & 0 \\ 0 & X_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & X_K \end{array}\right]\end{split}\]

The system GMM estimator uses the moment condition

\[z_{ij}(y_{ij} - x_{ij}\beta_j) = 0\]

where j indexes the equation. The estimator for the coefficients is given by

\[\hat{\beta}_{GMM} = (X'ZW^{-1}Z'X)^{-1}X'ZW^{-1}Z'Y\]

where \(W\) is a positive definite weighting matrix.
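
The estimator can be sketched for a given weighting matrix with a few lines of NumPy. For brevity the example uses a single over-identified equation on simulated data; the system case would use the block-diagonal X and Z described above. The helper name gmm_beta and the data-generating values are assumptions. With \(W = Z'Z/N\) the result coincides with two-stage least squares, which the code checks.

```python
import numpy as np

rng = np.random.default_rng(6)
n_obs = 400

# One endogenous regressor w with two instruments, so the model is
# over-identified and the choice of W matters in general.
instr = rng.standard_normal((n_obs, 2))
eps = rng.standard_normal(n_obs)
w = instr @ np.array([0.7, 0.4]) + 0.5 * eps + rng.standard_normal(n_obs)
x = np.column_stack([np.ones(n_obs), w])
z = np.column_stack([np.ones(n_obs), instr])
y = x @ np.array([1.0, 0.5]) + eps

def gmm_beta(x, z, y, w_mat):
    """(X'Z W^{-1} Z'X)^{-1} X'Z W^{-1} Z'Y for a given weighting matrix W."""
    xz = x.T @ z
    w_inv_zx = np.linalg.solve(w_mat, z.T @ x)
    w_inv_zy = np.linalg.solve(w_mat, z.T @ y)
    return np.linalg.solve(xz @ w_inv_zx, xz @ w_inv_zy)

# With W = Z'Z / N the GMM estimator reduces to 2SLS.
beta_gmm = gmm_beta(x, z, y, z.T @ z / n_obs)
x_hat = z @ np.linalg.solve(z.T @ z, z.T @ x)
beta_2sls = np.linalg.solve(x_hat.T @ x_hat, x_hat.T @ y)
```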

add_constraints(r, q=None)
Parameters:
  • r (DataFrame) – Constraint matrix. nconstraints by nparameters
  • q (Series, optional) – Constraint values (nconstraints). If not set, set to 0

Notes

Constraints are of the form

\[r \beta = q\]

The property param_names can be used to determine the order of parameters.

constraints

Model constraints

Returns:cons – Constraint object
Return type:LinearConstraint
fit(*, iter_limit=2, tol=1e-06, initial_weight=None, cov_type='robust', **cov_config)[source]

Estimate model parameters

Parameters:
  • iter_limit (int) – Maximum number of iterations for iterative GMM
  • tol (float) – Tolerance to use when checking for convergence in iterative GMM
  • initial_weight (ndarray, optional) – Initial weighting matrix to use in the first step. If not specified, uses the average outer-product of the set containing the exogenous variables and instruments.
  • cov_type (str) –

    Name of covariance estimator. Valid options are

    • ‘unadjusted’, ‘homoskedastic’ - Classic covariance estimator
    • ‘robust’, ‘heteroskedastic’ - Heteroskedasticity robust covariance estimator
  • **cov_config – Additional parameters to pass to covariance estimator. All estimators support debiased which employs a small-sample adjustment
Returns:

results – Estimation results

Return type:

GMMSystemResults
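
The default of iter_limit=2 corresponds to a two-step procedure: estimate with an initial weight, update the weight from the residuals, and estimate again. The sketch below illustrates this for a single equation on simulated heteroskedastic data. It is not the library's code. The stopping rule, the form of the robust weight update and the helper name two_step_gmm are assumptions made for the example.

```python
import numpy as np

def two_step_gmm(x, z, y, iter_limit=2, tol=1e-6):
    """Iterate GMM, re-estimating a heteroskedasticity-robust weight each step."""
    n_obs = y.shape[0]
    # Initial weight: average outer product of exogenous variables and instruments.
    w_mat = z.T @ z / n_obs
    xz = x.T @ z
    beta = None
    for iteration in range(1, iter_limit + 1):
        beta_new = np.linalg.solve(
            xz @ np.linalg.solve(w_mat, z.T @ x),
            xz @ np.linalg.solve(w_mat, z.T @ y),
        )
        # Assumed stopping rule: largest coefficient change below tol.
        done = beta is not None and np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
        # Robust weight: average of e_i^2 z_i z_i'.
        ze = z * (y - x @ beta)[:, None]
        w_mat = ze.T @ ze / n_obs
    return beta, w_mat, iteration

rng = np.random.default_rng(7)
n_obs = 400
instr = rng.standard_normal((n_obs, 2))
# Error variance depends on the first instrument (heteroskedasticity).
eps = rng.standard_normal(n_obs) * (1.0 + 0.5 * np.abs(instr[:, 0]))
w = instr @ np.array([0.7, 0.4]) + 0.5 * eps + rng.standard_normal(n_obs)
x = np.column_stack([np.ones(n_obs), w])
z = np.column_stack([np.ones(n_obs), instr])
y = x @ np.array([1.0, 0.5]) + eps

beta_hat, w_hat, n_iter = two_step_gmm(x, z, y)
```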

formula

Set or get the formula used to construct the model

classmethod from_formula(formula, data, *, weights=None, weight_type='robust', **weight_config)[source]

Specify a system GMM model using the formula interface

Parameters:
  • formula ({str, dict-like}) – Either a string or a dictionary of strings where each value in the dictionary represents a single equation. See Notes for a description of the accepted syntax
  • data (DataFrame) – Frame containing named variables
  • weights (dict-like) – Dictionary like object (e.g. a DataFrame) containing variable weights. Each entry must have the same number of observations as data. If an equation label is not a key in weights, the weights will be set to unity
  • weight_type (str) –

    Name of moment condition weight function to use in the GMM estimation. Valid options are:

    • ‘unadjusted’, ‘homoskedastic’ - Assume moments are homoskedastic
    • ‘robust’, ‘heteroskedastic’ - Allow for heteroskedasticity
  • **weight_config – Additional keyword arguments to pass to the moment condition weight function
Returns:

model – Model instance

Return type:

IVSystemGMM

Notes

Models can be specified in one of two ways. The first uses curly braces to encapsulate equations. The second uses a dictionary where each key is an equation name.

Examples

The simplest format uses standard Patsy formulas for each equation in a dictionary. Best practice is to use an Ordered Dictionary

>>> import pandas as pd
>>> import numpy as np
>>> cols = ['y1', 'x1_1', 'x1_2', 'z1', 'y2', 'x2_1', 'x2_2', 'z2']
>>> data = pd.DataFrame(np.random.randn(500, 8), columns=cols)
>>> from linearmodels.system import IVSystemGMM
>>> formula = {'eq1': 'y1 ~ 1 + x1_1 + [x1_2 ~ z1]',
...            'eq2': 'y2 ~ 1 + x2_1 + [x2_2 ~ z2]'}
>>> mod = IVSystemGMM.from_formula(formula, data)

The second format uses curly braces {} to surround distinct equations

>>> formula = '{y1 ~ 1 + x1_1 + [x1_2 ~ z1]} {y2 ~ 1 + x2_1 + [x2_2 ~ z2]}'
>>> mod = IVSystemGMM.from_formula(formula, data)

It is also possible to include equation labels when using curly braces

>>> formula = '{eq1: y1 ~ 1 + x1_1 + [x1_2 ~ z1]} {eq2: y2 ~ 1 + x2_1 + [x2_2 ~ z2]}'
>>> mod = IVSystemGMM.from_formula(formula, data)
has_constant

Vector indicating which equations contain constants

multivariate_ls(dependent, exog=None, endog=None, instruments=None)

Interface for specification of multivariate IV models

Parameters:
  • dependent (array-like) – nobs by ndep array of dependent variables
  • exog (array-like, optional) – nobs by nexog array of exogenous regressors common to all models
  • endog (array-like, optional) – nobs by nendog array of endogenous regressors common to all models
  • instruments (array-like, optional) – nobs by ninstr array of instruments to use in all equations
Returns:

model – Model instance

Return type:

IV3SLS

Notes

At least one of exog or endog must be provided.

Utility function to simplify the construction of multivariate IV models which all use the same regressors and instruments. Constructs the dictionary of equations from the variables using the common exogenous, endogenous and instrumental variables.

param_names

Model parameter names

Returns:names – Normalized, unique model parameter names
Return type:list[str]
predict(params, *, equations=None, data=None, eval_env=8)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • equations (dict) – Dictionary-like structure containing exogenous and endogenous variables. Each key is an equation label and must match the labels used to fit the model. Each value must be either a tuple of the form (exog, endog) or a dictionary with keys ‘exog’ and ‘endog’. If predictions are not required for one or more of the model equations, these keys can be omitted.
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth to use when evaluating formulas using Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then equations must be None. Predictions from models constructed using formulas can be computed using either equations, which will be treated as arrays of values corresponding to the formula-processed data, or using data, which will be processed using the formula used to construct the values corresponding to the original model specification.

When using exog and endog, the regressor array for a particular equation is assembled as [equations[eqn][‘exog’], equations[eqn][‘endog’]] where eqn is an equation label. These must correspond to the columns in the estimated model.

reset_constraints()

Remove all model constraints