Introduction

Instrumental variable models are used when regressors are endogenous or when a regressor is measured with error. These models make use of instruments that are correlated with the endogenous variable but not with the model error. All models estimated by this package can be described as

\[\begin{split}y_i & = x_{1i}\beta_1 + x_{2i}\beta_2 + \epsilon_i \\ x_{2i} & = z_{1i}\delta + z_{2i}\gamma + \nu_i\end{split}\]

In this expression, \(x_{1i}\) is a set of \(k_1\) regressors that are exogenous while \(x_{2i}\) is a set of \(k_2\) regressors that are endogenous in the sense that \(Cov(x_{2i},\epsilon_i)\neq 0\). In total there are \(k=k_1+k_2\) regressors in the model. The \(p_2\)-element vector \(z_{2i}\) contains instruments that explain \(x_{2i}\) but not \(y_i\). Note that \(x_{1i}\) and \(z_{1i}\) are the same since exogenous regressors are also valid to use when projecting the endogenous variables. In total there are \(p=p_1+p_2=k_1+p_2\) variables available to use when projecting the endogenous regressors.
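The two equations above can be illustrated with a small simulation. The following is a minimal NumPy sketch, not the package's implementation: the data-generating values (coefficients, sample size, seed) are assumptions chosen for illustration. It builds an endogenous regressor whose first-stage error is correlated with the model error, then compares OLS with a hand-rolled two-stage projection.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data; every coefficient below is an illustrative assumption.
z2 = rng.standard_normal(n)                      # instrument z_2
nu = rng.standard_normal(n)                      # first-stage error
eps = 0.8 * nu + 0.6 * rng.standard_normal(n)    # correlated with nu, so x2 is endogenous
x1 = np.ones(n)                                  # exogenous regressor (a constant)
x2 = 1.0 * z2 + nu                               # endogenous regressor
y = 1.0 * x1 + 0.5 * x2 + eps                    # true beta_2 is 0.5

X = np.column_stack([x1, x2])                    # all k regressors
Z = np.column_stack([x1, z2])                    # all p variables used in the projection

# OLS ignores the correlation between x2 and eps.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS: project X onto Z, then regress y on the projected regressors.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print("OLS :", beta_ols)
print("2SLS:", beta_2sls)
```

In this design the OLS coefficient on the endogenous regressor is pulled away from 0.5 by the correlation between \(x_{2i}\) and \(\epsilon_i\), while the projected regression should land close to it.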

There are four estimation methods available to fit models of this type. All accept the same four required inputs:

  • dependent - The variable to be modeled, \(y_i\) in the model

  • exog - The exogenous regressors, \(x_{1i}\) in the model. Note that \(x_{1i}\) and \(z_{1i}\) are the same since exogenous regressors are also valid to use when projecting the endogenous variables.

  • endog - The endogenous regressors, \(x_{2i}\) in the model

  • instruments - The instruments, \(z_{2i}\) in the model

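import numpy as np
import statsmodels.api as sm
from linearmodels.iv import IV2SLS
from linearmodels.datasets import wage

data = wage.load()
# Dependent variable: log wage
dependent = np.log(data.wage)
# Exogenous regressors: a constant and experience
exog = sm.add_constant(data.exper)
# Endogenous regressor: education, instrumented by the number of siblings
endog = data.educ
instruments = data.sibs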

mod = IV2SLS(dependent, exog, endog, instruments)
res = mod.fit(cov_type='unadjusted')
res
                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:                   wage   R-squared:                      0.0459
Estimator:                    IV-2SLS   Adj. R-squared:                 0.0438
No. Observations:                 934   F-statistic:                    23.872
Date:                Mon, Mar 13 2017   P-value (F-stat)                0.0000
Time:                        14:52:30   Distribution:                  chi2(2)
Cov. Estimator:            unadjusted

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          4.4912     0.4692     9.5719     0.0000      3.5716      5.4108
exper          0.0341     0.0073     4.6649     0.0000      0.0198      0.0485
educ           0.1405     0.0290     4.8434     0.0000      0.0837      0.1974
==============================================================================

Endogenous: educ
Instruments: sibs
Unadjusted Covariance (Homoskedastic)
Debiased: False

Estimators

Four methods to estimate models are available.

  • Two-stage least squares (2SLS): IV2SLS

  • Limited Information Maximum Likelihood (LIML) and related k-class estimators: IVLIML

  • Generalized Method of Moments (GMM): IVGMM

  • Generalized Method of Moments using the Continuously Updating Estimator (CUE): IVGMMCUE

All estimators require the same four key inputs: dependent, exog, endog and instruments. In addition to these four required parameters, optional arguments can be used to alter the default configuration.

Optional Arguments

2SLS Estimation

The 2SLS estimator is the simplest and has no optional arguments. The 2SLS estimator nests OLS and so it is possible to estimate models using OLS by specifying both endog and instruments as None.

mod = IV2SLS(dependent, exog, None, None)
ols_res = mod.fit()

LIML Estimation

Two optional arguments can be used to alter the estimation method when using IVLIML:

  • fuller allows Fuller’s \(\alpha\) to be specified, which provides a finite sample correction to the usual LIML estimator.

  • kappa allows a user-specified value of \(\kappa\) to be provided in which case the LIML estimated value of \(\kappa\) is ignored.
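To make the role of \(\kappa\) concrete, the sketch below implements a textbook k-class estimator in NumPy. It is an illustration under stated assumptions, not the code behind IVLIML: the helper names `k_class` and `liml_kappa` are hypothetical, the data are simulated, and the Fuller adjustment uses the common form \(\kappa - \alpha/(n-p)\), which may differ in detail from the package.

```python
import numpy as np

def k_class(y, X, Z, kappa):
    """Hypothetical helper: solve X'(I - kappa*M_Z)X b = X'(I - kappa*M_Z)y."""
    # M_Z X and M_Z y are the residuals from projecting onto Z.
    MX = X - Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    My = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    A = X.T @ X - kappa * (X.T @ MX)
    b = X.T @ y - kappa * (X.T @ My)
    return np.linalg.solve(A, b)

def liml_kappa(y, x1, x2, Z):
    """Hypothetical helper: smallest eigenvalue of (W'M_Z W)^{-1} (W'M_1 W)."""
    W = np.column_stack([y, x2])
    res_exog = W - x1 @ np.linalg.lstsq(x1, W, rcond=None)[0]
    res_full = W - Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
    ratio = np.linalg.solve(res_full.T @ res_full, res_exog.T @ res_exog)
    return float(np.min(np.linalg.eigvals(ratio).real))

# Simulated, over-identified data (all values are assumptions).
rng = np.random.default_rng(1)
n = 2000
z2 = rng.standard_normal((n, 2))
nu = rng.standard_normal(n)
eps = 0.8 * nu + 0.6 * rng.standard_normal(n)
x1 = np.ones((n, 1))
x2 = z2 @ np.array([1.0, 0.5]) + nu
y = 1.0 + 0.5 * x2 + eps
X = np.column_stack([x1, x2])
Z = np.column_stack([x1, z2])

kappa = liml_kappa(y, x1, x2, Z)
alpha = 1.0                                     # Fuller's alpha
kappa_fuller = kappa - alpha / (n - Z.shape[1])

beta_ols = k_class(y, X, Z, 0.0)                # kappa = 0 reproduces OLS
beta_2sls = k_class(y, X, Z, 1.0)               # kappa = 1 reproduces 2SLS
beta_liml = k_class(y, X, Z, kappa)
beta_fuller = k_class(y, X, Z, kappa_fuller)
```

Setting \(\kappa\) by hand therefore moves the estimator along a family that contains OLS and 2SLS, which is why a user-specified value overrides the LIML estimate.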

GMM and GMM-CUE Estimation

  • weight_type accepts a string which indicates the type of weighting matrix to use in the GMM estimation procedure. There are four classes of weighting matrices available:

    • ‘unadjusted’ - Assumes the GMM moment conditions are homoskedastic. See HomoskedasticWeightMatrix.

    • ‘robust’ - Allows the GMM moment conditions to be heteroskedastic while assuming they are not correlated across observations. See HeteroskedasticWeightMatrix.

    • ‘kernel’ - Allows for both heteroskedasticity and autocorrelation in the moment conditions. See KernelWeightMatrix.

    • ‘clustered’ - Allows for a one-way cluster structure where moment conditions within a cluster are correlated. See OneWayClusteredWeightMatrix.

    Each weight type accepts a set of additional parameters which are similar to those for the corresponding covariance estimator.
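The difference between the weight types is easiest to see in how each one estimates the covariance of the moment conditions \(z_i\epsilon_i\). The functions below are a NumPy sketch of the textbook formulas for three of the four types; the function names are hypothetical and the details (centering, degree-of-freedom corrections) are assumptions that may differ from the referenced classes.

```python
import numpy as np

def weight_unadjusted(z, eps):
    """Homoskedastic moments: s^2 * Z'Z / n."""
    n = z.shape[0]
    return (eps @ eps / n) * (z.T @ z) / n

def weight_robust(z, eps):
    """Heteroskedastic moments: (1/n) * sum_i eps_i^2 z_i z_i'."""
    n = z.shape[0]
    ze = z * eps[:, None]
    return ze.T @ ze / n

def weight_clustered(z, eps, clusters):
    """One-way clusters: sum the moments within each cluster first."""
    n = z.shape[0]
    ze = z * eps[:, None]
    labels, codes = np.unique(clusters, return_inverse=True)
    sums = np.zeros((labels.shape[0], z.shape[1]))
    np.add.at(sums, codes, ze)                  # accumulate rows by cluster
    return sums.T @ sums / n

# Illustrative inputs (simulated, not from the wage data).
rng = np.random.default_rng(5)
n = 400
z = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
eps = rng.standard_normal(n)
clusters = rng.integers(0, 20, size=n)

w_u = weight_unadjusted(z, eps)
w_r = weight_robust(z, eps)
w_c = weight_clustered(z, eps, clusters)
```

When every observation is its own cluster, the clustered formula collapses to the robust one, which is a quick sanity check on the construction.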

Model Estimation and Covariance Specification

All models are estimated using the fit method which provides an opportunity to customize the parameter covariance estimator used to perform inference. Four classes of covariance estimators are available:

  • ‘unadjusted’ - Assumes the model scores are homoskedastic. See HomoskedasticCovariance.

  • ‘robust’, ‘heteroskedastic’ - Allows the model scores to be heteroskedastic while assuming they are not correlated across observations. See HeteroskedasticCovariance.

  • ‘kernel’ - Allows for both heteroskedasticity and autocorrelation in the model scores. The estimator allows the kernel to be selected from

    • ‘bartlett’, ‘newey-west’ - Triangular kernel utilized in the common Newey-West estimator.

    • ‘parzen’ - Parzen’s kernel.

    • ‘qs’, ‘quadratic-spectral’ - The quadratic spectral kernel studied by Andrews.

    The bandwidth can also be specified. If not provided, an estimate of the optimal value is used.

    See KernelCovariance.

  • ‘clustered’, ‘one-way’ - Allows for a one-way cluster structure where model scores within a cluster are correlated. See ClusteredCovariance. Using clustered covariance requires passing an array containing cluster membership information.

mod = IV2SLS(dependent, exog, endog, instruments)
iq_bands = data.IQ // 20
res = mod.fit(cov_type='clustered', clusters=iq_bands)
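For the ‘kernel’ option, the Bartlett (Newey-West) kernel weights the autocovariances of the scores with linearly declining weights. The following NumPy sketch shows that construction under simple assumptions: the function names are hypothetical, the bandwidth is supplied by hand rather than estimated, and the scores are random numbers used only to exercise the code.

```python
import numpy as np

def bartlett_weights(bandwidth):
    """Triangular weights w_j = 1 - j / (bandwidth + 1) for j = 0..bandwidth."""
    j = np.arange(bandwidth + 1)
    return 1.0 - j / (bandwidth + 1)

def kernel_score_cov(scores, bandwidth):
    """Kernel-weighted sum of score autocovariances for an (n, p) array."""
    n = scores.shape[0]
    w = bartlett_weights(bandwidth)
    s = scores.T @ scores / n                   # lag-0 term
    for j in range(1, bandwidth + 1):
        gamma = scores[j:].T @ scores[:-j] / n  # lag-j autocovariance
        s += w[j] * (gamma + gamma.T)
    return s

rng = np.random.default_rng(4)
scores = rng.standard_normal((500, 3))          # placeholder scores
cov_nw = kernel_score_cov(scores, bandwidth=4)
print(bartlett_weights(3))
```

With a bandwidth of zero only the lag-0 term remains, so the result coincides with the heteroskedasticity-robust outer product of the scores.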

GMM Estimation

GMM allows additional inputs that affect the method of estimation. The default is two-step GMM. One-step (inefficient) GMM can be forced by setting iter_limit to 1. If iter_limit is raised above 2, an iterative method is used in which the weighting matrix and the parameters are re-estimated over multiple steps. If normalized model parameters change by less than tol across successive iterations, the estimation is assumed to have converged and the iterations stop.

By default, the first step uses the average outer-product of the instruments as the weighting matrix. initial_weight allows a user-specified choice of weighting matrix to be used instead.
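The interaction of iter_limit and tol described above can be sketched as a short loop. This is an illustrative NumPy version, not the package's estimator: `iv_gmm` is a hypothetical function, the weight update is the heteroskedasticity-robust form, and convergence is checked with a plain maximum absolute change rather than the normalized change the package uses.

```python
import numpy as np

def iv_gmm(y, X, Z, iter_limit=2, tol=1e-4):
    """Hypothetical linear IV-GMM; returns the estimate and steps taken."""
    n = y.shape[0]
    XZ = X.T @ Z
    Zy = Z.T @ y
    W = Z.T @ Z / n                 # first step: average outer product of instruments
    beta, steps = None, 0
    for _ in range(iter_limit):
        # Closed-form GMM estimate for the current weighting matrix.
        new_beta = np.linalg.solve(XZ @ np.linalg.solve(W, XZ.T),
                                   XZ @ np.linalg.solve(W, Zy))
        steps += 1
        converged = beta is not None and np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
        # Re-estimate the weighting matrix from the current residuals.
        ze = Z * (y - X @ beta)[:, None]
        W = ze.T @ ze / n
    return beta, steps

# Simulated heteroskedastic data (all values are assumptions).
rng = np.random.default_rng(2)
n = 2000
z2 = rng.standard_normal((n, 2))
nu = rng.standard_normal(n)
eps = (0.8 * nu + 0.6 * rng.standard_normal(n)) * np.exp(0.3 * z2[:, 0])
x2 = z2 @ np.array([1.0, 0.5]) + nu
y = 1.0 + 0.5 * x2 + eps
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2])

beta_one, steps_one = iv_gmm(y, X, Z, iter_limit=1)     # one-step
beta_two, steps_two = iv_gmm(y, X, Z, iter_limit=2)     # two-step
beta_iter, steps_iter = iv_gmm(y, X, Z, iter_limit=50)  # iterated
```

With the default first-step weight, the one-step estimate is numerically the same as 2SLS; later steps differ only through the re-estimated weighting matrix.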

GMM-CUE Estimation

GMM-CUE uses a non-linear optimizer to optimize the GMM objective directly, where both the moment conditions and the weighting matrix (the estimated covariance of the moment conditions) change with the parameter values. starting allows a user-specified set of starting values to be used in place of the default starting values, and display controls whether iterative output is printed during estimation.
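A small sketch may help show what "optimizing the objective directly" means. The code below evaluates a CUE-style objective in which the weighting matrix is recomputed at every candidate parameter value. It is an assumption-laden illustration: `cue_objective` is a hypothetical function, the model has a single endogenous regressor and no constant, the moments are not centered, and a coarse grid search stands in for the non-linear optimizer.

```python
import numpy as np

def cue_objective(beta, y, X, Z):
    """n * g(beta)' W(beta)^{-1} g(beta), with W re-estimated at beta."""
    n = y.shape[0]
    eps = y - X @ beta
    ze = Z * eps[:, None]           # moment conditions z_i * eps_i
    g = ze.mean(axis=0)             # average moments
    W = ze.T @ ze / n               # weighting matrix at this beta
    return n * g @ np.linalg.solve(W, g)

# Simulated data with one endogenous regressor and two instruments.
rng = np.random.default_rng(3)
n = 2000
Z = rng.standard_normal((n, 2))
nu = rng.standard_normal(n)
eps = 0.8 * nu + 0.6 * rng.standard_normal(n)
x2 = Z @ np.array([1.0, 0.5]) + nu
y = 0.5 * x2 + eps                  # true coefficient is 0.5
X = x2[:, None]

# Grid search as a stand-in for the optimizer (illustration only).
grid = np.linspace(0.0, 1.0, 201)
values = np.array([cue_objective(np.array([b]), y, X, Z) for b in grid])
beta_cue = grid[np.argmin(values)]
print("CUE estimate on the grid:", beta_cue)
```

Because the objective is not quadratic in the parameters, starting values matter for a real optimizer, which is the reason the starting argument exists.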