Introduction¶
Instrumental variable models are used when regressors are endogenous or measured with error. These models make use of instruments, which are correlated with the endogenous variable but not with the model error. All models estimated by this package can be described as
\[\begin{aligned}
y_i &= x_{1i}\beta_1 + x_{2i}\beta_2 + \epsilon_i \\
x_{2i} &= z_{1i}\delta + z_{2i}\gamma + \nu_i
\end{aligned}\]
In this expression, \(x_{1i}\) is a set of \(k_1\) regressors that are exogenous while \(x_{2i}\) is a set of \(k_2\) regressors that are endogenous in the sense that \(Cov(x_{2i},\epsilon_i)\neq 0\). In total there are \(k=k_1+k_2\) regressors in the model. The \(p_2\)-element vector \(z_{2i}\) contains instruments that explain \(x_{2i}\) but are uncorrelated with the model error \(\epsilon_i\). Note that \(x_{1i}\) and \(z_{1i}\) are the same, since exogenous regressors are also valid to use when projecting the endogenous variables. In total there are \(p=p_1+p_2=k_1+p_2\) variables available to use when projecting the endogenous regressors.
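To make the roles of \(x_{1i}\), \(x_{2i}\) and \(z_{2i}\) concrete, the following is a minimal NumPy sketch on simulated data. The data-generating process, the coefficient values and all variable names are assumptions chosen for illustration and are not part of the package. It compares OLS with a hand-written two-stage projection.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (illustrative assumptions, not taken from the package)
z2 = rng.standard_normal(n)                        # instrument z_{2i}
u = rng.standard_normal(n)                         # unobserved confounder
x1 = np.ones(n)                                    # exogenous regressor x_{1i} (a constant)
x2 = z2 + 0.8 * u + 0.6 * rng.standard_normal(n)   # endogenous regressor x_{2i}
eps = u                                            # model error, correlated with x2
y = 1.0 * x1 + 0.5 * x2 + eps                      # true coefficients: 1.0 and 0.5

X = np.column_stack([x1, x2])   # all k regressors
Z = np.column_stack([x1, z2])   # z_{1i} = x_{1i} together with z_{2i}

# OLS ignores the correlation between x2 and the error
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS: project the regressors onto Z, then regress y on the projection
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print("OLS :", beta_ols)
print("2SLS:", beta_2sls)
```

In this design the OLS slope on the endogenous regressor is biased away from 0.5, while the two-stage estimate should land close to it.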
There are four estimation methods available to fit models of this type. All accept the same four required inputs:
dependent
- The variable to be modeled, \(y_i\) in the model
exog
- The exogenous regressors, \(x_{1i}\) in the model. Note that \(x_{1i}\) and \(z_{1i}\) are the same, since exogenous regressors are also valid to use when projecting the endogenous variables.
endog
- The endogenous regressors, \(x_{2i}\) in the model
instruments
- The instruments, \(z_{2i}\) in the model
import numpy as np
import statsmodels.api as sm
from linearmodels.iv import IV2SLS
from linearmodels.datasets import wage

data = wage.load()
# Dependent variable: log wage
dependent = np.log(data.wage)
# Exogenous regressors: a constant and experience
exog = sm.add_constant(data.exper)
# Endogenous regressor: education
endog = data.educ
# Instrument: number of siblings
instruments = data.sibs
mod = IV2SLS(dependent, exog, endog, instruments)
res = mod.fit(cov_type='unadjusted')
res
                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:                   wage   R-squared:                      0.0459
Estimator:                    IV-2SLS   Adj. R-squared:                 0.0438
No. Observations:                 934   F-statistic:                    23.872
Date:                Mon, Mar 13 2017   P-value (F-stat)                0.0000
Time:                        14:52:30   Distribution:                  chi2(2)
Cov. Estimator:            unadjusted

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          4.4912     0.4692     9.5719     0.0000      3.5716      5.4108
exper          0.0341     0.0073     4.6649     0.0000      0.0198      0.0485
educ           0.1405     0.0290     4.8434     0.0000      0.0837      0.1974
==============================================================================

Endogenous: educ
Instruments: sibs
Unadjusted Covariance (Homoskedastic)
Debiased: False
Estimators¶
Four methods to estimate models are available.
- Two-stage least squares (2SLS): IV2SLS
- Limited Information Maximum Likelihood (LIML) and related k-class estimators: IVLIML
- Generalized Method of Moments (GMM): IVGMM
- Generalized Method of Moments using the Continuously Updating Estimator (CUE): IVGMMCUE
All estimators require the same four key inputs: dependent, exog, endog and instruments. In addition to these four required parameters, optional arguments can be used to alter the default configuration.
Optional Arguments¶
2SLS Estimation¶
The 2SLS estimator is the simplest and has no optional arguments. The 2SLS estimator nests OLS, so it is possible to estimate models using OLS by specifying both endog and instruments as None.
mod = IV2SLS(dependent, exog, None, None)
ols_res = mod.fit()
LIML Estimation¶
Two optional arguments can be used to alter the estimation method when using IVLIML:
fuller
- allows Fuller’s \(\alpha\) to be specified, which provides a finite sample correction to the usual LIML estimator.
kappa
- allows a user-specified value of \(\kappa\) to be provided, in which case the LIML estimated value of \(\kappa\) is ignored.
GMM and GMM-CUE Estimation¶
weight_type
accepts a string which indicates the type of weighting matrix to use in the GMM estimation procedure. There are four classes of weighting matrices available:
- ‘unadjusted’ - Assumes the GMM moment conditions are homoskedastic. See HomoskedasticWeightMatrix.
- ‘robust’ - Allows the GMM moment conditions to be heteroskedastic while assuming they are not correlated across observations. See HeteroskedasticWeightMatrix.
- ‘kernel’ - Allows for both heteroskedasticity and autocorrelation in the moment conditions. See KernelWeightMatrix.
- ‘cluster’ - Allows for a one-way cluster structure where moment conditions within a cluster are correlated. See OneWayClusteredWeightMatrix.
Each weight type accepts a set of additional parameters which are similar to those for the corresponding covariance estimator.
Model Estimation and Covariance Specification¶
All models are estimated using the fit method, which provides an opportunity to customize the parameter covariance estimator used to perform inference. Four classes of covariance estimators are available:
- ‘unadjusted’ - Assumes the model scores are homoskedastic. See HomoskedasticCovariance.
- ‘robust’, ‘heteroskedastic’ - Allows the model scores to be heteroskedastic while assuming they are not correlated across observations. See HeteroskedasticCovariance.
- ‘kernel’ - Allows for both heteroskedasticity and autocorrelation in the model scores. See KernelCovariance. The estimator allows the kernel to be selected from:
  - ‘bartlett’, ‘newey-west’ - Triangular kernel utilized in the common Newey-West estimator.
  - ‘parzen’ - Parzen’s kernel.
  - ‘qs’, ‘quadratic-spectral’ - The quadratic spectral kernel studied by Andrews.
  The bandwidth can also be specified. If not provided, an estimate of the optimal value is used.
- ‘clustered’, ‘one-way’ - Allows for a one-way cluster structure where model scores within a cluster are correlated. See ClusteredCovariance. Using clustered covariance requires passing an array containing cluster membership information.
mod = IV2SLS(dependent, exog, endog, instruments)
iq_bands = data.IQ // 20
res = mod.fit(cov_type='clustered', clusters=iq_bands)
GMM Estimation¶
GMM allows additional inputs that affect the method of estimation. In particular, the default is to use two-step GMM. One-step (inefficient) GMM can be forced by setting iter_limit to 1. If iter_limit is raised above 2, then an iterative method is used where multiple steps are used to estimate the model parameters. If normalized model parameters change by less than tol across successive iterations, then the estimation is assumed to converge and the iterations are stopped.
By default, the first step uses the average outer-product of the instruments as the weighting matrix. initial_weight allows a user-specified choice of weighting matrix to be used instead.
GMM-CUE Estimation¶
GMM-CUE uses a non-linear optimizer to optimize the GMM objective directly, where both the moment conditions and the estimated covariance of the moment scores (the weighting matrix) change with the parameter values. starting allows a user-specified set of starting values to be used in place of the default starting values, and display controls whether iterative output is printed during estimation.