.. _iv-introduction:

Introduction
------------

Instrumental variable models are used when regressors are endogenous or
measured with error. These models make use of instruments which are
correlated with the endogenous variables but not with the model error. All
models estimated by this package can be described as

.. math::

   y_i    & = x_{1i}\beta_1 + x_{2i}\beta_2 + \epsilon_i \\
   x_{2i} & = z_{1i}\delta + z_{2i}\gamma + \nu_i

In this expression, :math:`x_{1i}` is a set of :math:`k_1` regressors that
are exogenous while :math:`x_{2i}` is a set of :math:`k_2` regressors that
are endogenous in the sense that :math:`Cov(x_{2i},\epsilon_i)\neq 0`. In
total there are :math:`k=k_1+k_2` regressors in the model. The :math:`p_2`
element vector :math:`z_{2i}` contains instruments that explain
:math:`x_{2i}` but not :math:`y_i`. Note that :math:`x_{1i}` and
:math:`z_{1i}` are the same since exogenous regressors are also valid to use
when projecting the endogenous variables. In total there are
:math:`p=p_1+p_2=k_1+p_2` variables available to use when projecting the
endogenous regressors.

There are four estimation methods available to fit models of this type. All
accept the same four required inputs:

* ``dependent`` - The variable to be modeled, :math:`y_i` in the model
* ``exog`` - The exogenous regressors, :math:`x_{1i}` in the model. Note
  that :math:`x_{1i}` and :math:`z_{1i}` are the same since exogenous
  regressors are also valid to use when projecting the endogenous variables.
* ``endog`` - The endogenous regressors, :math:`x_{2i}` in the model
* ``instruments`` - The instruments, :math:`z_{2i}` in the model

.. code-block:: python

    import numpy as np
    import statsmodels.api as sm
    from linearmodels.iv import IV2SLS
    from linearmodels.datasets import wage

    data = wage.load()
    # Model log wages with experience (exogenous) and education (endogenous)
    dependent = np.log(data.wage)
    exog = sm.add_constant(data.exper)
    endog = data.educ
    # Number of siblings is used as the instrument for education
    instruments = data.sibs

    mod = IV2SLS(dependent, exog, endog, instruments)
    res = mod.fit(cov_type='unadjusted')
    print(res)

::

                              IV-2SLS Estimation Summary
    ==============================================================================
    Dep. Variable:                   wage   R-squared:                      0.0459
    Estimator:                    IV-2SLS   Adj. R-squared:                 0.0438
    No. Observations:                 934   F-statistic:                    23.872
    Date:                Mon, Mar 13 2017   P-value (F-stat)                0.0000
    Time:                        14:52:30   Distribution:                  chi2(2)
    Cov. Estimator:            unadjusted

                                 Parameter Estimates
    ==============================================================================
                Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
    ------------------------------------------------------------------------------
    const          4.4912     0.4692     9.5719     0.0000      3.5716      5.4108
    exper          0.0341     0.0073     4.6649     0.0000      0.0198      0.0485
    educ           0.1405     0.0290     4.8434     0.0000      0.0837      0.1974
    ==============================================================================

    Endogenous: educ
    Instruments: sibs
    Unadjusted Covariance (Homoskedastic)
    Debiased: False

Estimators
==========

Four methods to estimate models are available.

* Two-stage least squares (2SLS) :class:`~linearmodels.iv.model.IV2SLS`
* Limited Information Maximum Likelihood (LIML) and related k-class
  estimators :class:`~linearmodels.iv.model.IVLIML`
* Generalized Method of Moments (GMM) :class:`~linearmodels.iv.model.IVGMM`
* Generalized Method of Moments using the Continuously Updating Estimator
  (CUE) :class:`~linearmodels.iv.model.IVGMMCUE`

All estimators require the same four key inputs, ``dependent``, ``exog``,
``endog`` and ``instruments``. In addition to these four required
parameters, optional arguments can be used to alter the default
configuration.
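As a rough illustration of what the first estimator in the list computes, the sketch below carries out the two stages by hand with NumPy on simulated data that follows the two-equation model from the introduction. It is not the code used by :class:`~linearmodels.iv.model.IV2SLS`; the sample size, coefficient values, and variable names are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data; every numeric value below is an illustrative assumption.
x1 = np.column_stack([np.ones(n), rng.standard_normal(n)])  # exogenous x_1 (with constant)
z2 = rng.standard_normal((n, 2))                             # excluded instruments z_2
nu = rng.standard_normal(n)                                  # first-stage error
eps = 0.8 * nu + rng.standard_normal(n)                      # model error, correlated with nu

# Second equation of the model: x_2 = z_1 delta + z_2 gamma + nu, with z_1 = x_1
x2 = x1 @ np.array([0.5, 0.3]) + z2 @ np.array([1.0, -1.0]) + nu
# First equation of the model: y = x_1 beta_1 + x_2 beta_2 + eps
y = x1 @ np.array([1.0, 2.0]) + 0.5 * x2 + eps

X = np.column_stack([x1, x2])  # all k regressors
Z = np.column_stack([x1, z2])  # all p variables used for projection

# Stage 1: project every regressor on the full instrument set
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Stage 2: regress y on the projected regressors
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Equivalent closed form: (X_hat' X)^{-1} X_hat' y
beta_closed_form = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# OLS ignores the endogeneity of x_2 and is shown only for comparison
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("2SLS:", beta_2sls)
print("OLS: ", beta_ols)
```

In this simulation the coefficient on the endogenous regressor is 0.5 by construction, so the 2SLS estimate can be compared against it directly, while the OLS estimate is pulled away from it by the correlation between ``eps`` and ``nu``.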
Optional Arguments
******************

2SLS Estimation
^^^^^^^^^^^^^^^

The 2SLS estimator is the simplest and has no optional arguments. The 2SLS
estimator nests OLS, so a model can be estimated by OLS by specifying both
``endog`` and ``instruments`` as ``None``.

.. code-block:: python

    # With no endogenous regressors or instruments, 2SLS reduces to OLS
    mod = IV2SLS(dependent, exog, None, None)
    ols_res = mod.fit()

LIML Estimation
^^^^^^^^^^^^^^^

Two optional arguments can be used to alter the estimation method when using
``IVLIML``:

* ``fuller`` allows Fuller's :math:`\alpha` to be specified, which provides
  a finite sample correction to the usual LIML estimator.
* ``kappa`` allows a user-specified value of :math:`\kappa` to be provided,
  in which case the LIML estimated value of :math:`\kappa` is ignored.

GMM and GMM-CUE Estimation
^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``weight_type`` accepts a string which indicates the type of weighting
  matrix to use in the GMM estimation procedure. There are four classes of
  weighting matrices available:

  * 'unadjusted' - Assumes the GMM moment conditions are homoskedastic. See
    :class:`~linearmodels.iv.gmm.HomoskedasticWeightMatrix`.
  * 'robust' - Allows the GMM moment conditions to be heteroskedastic while
    assuming they are not correlated across observations. See
    :class:`~linearmodels.iv.gmm.HeteroskedasticWeightMatrix`.
  * 'kernel' - Allows for both heteroskedasticity and autocorrelation in the
    moment conditions. See :class:`~linearmodels.iv.gmm.KernelWeightMatrix`.
  * 'cluster' - Allows for a one-way cluster structure where moment
    conditions within a cluster are correlated. See
    :class:`~linearmodels.iv.gmm.OneWayClusteredWeightMatrix`.

Each weight type accepts a set of additional parameters which are similar to
those for the corresponding covariance estimator.
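To give a sense of how the 'unadjusted' and 'robust' weight types differ, the sketch below builds textbook versions of both moment covariance estimates with NumPy and plugs their inverses into the linear GMM formula. This is a simplified illustration under stated assumptions, not the implementation in :mod:`linearmodels.iv.gmm`: the helper names ``moment_covariance`` and ``gmm_estimate``, the simulated data, and the omission of centering and degree-of-freedom options are all choices made for this example.

```python
import numpy as np


def moment_covariance(z, eps, kind="robust"):
    """Estimate the covariance of the moment conditions z_i * eps_i."""
    n = z.shape[0]
    if kind == "unadjusted":
        # Homoskedastic: a common error variance times the instrument second moment
        return (eps @ eps / n) * (z.T @ z) / n
    if kind == "robust":
        # Heteroskedastic: average outer product of the individual moments
        ze = z * eps[:, None]
        return ze.T @ ze / n
    raise ValueError(f"unknown kind: {kind}")


def gmm_estimate(y, x, z, w):
    """Linear GMM estimate for a given weighting matrix w."""
    xz = x.T @ z
    return np.linalg.solve(xz @ w @ xz.T, xz @ w @ (z.T @ y))


# Simulated data with heteroskedastic errors (illustrative values only)
rng = np.random.default_rng(1)
n = 2000
z = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
nu = rng.standard_normal(n)
x_endog = z @ np.array([0.0, 1.0, 1.0, 1.0]) + nu
eps = (0.5 * nu + rng.standard_normal(n)) * (1 + np.abs(z[:, 1]))
x = np.column_stack([np.ones(n), x_endog])
y = x @ np.array([1.0, 0.5]) + eps

# Residuals from a first-step 2SLS fit are used to build the weights
x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]
beta_2sls = np.linalg.lstsq(x_hat, y, rcond=None)[0]
resid = y - x @ beta_2sls

w_unadj = np.linalg.inv(moment_covariance(z, resid, "unadjusted"))
w_robust = np.linalg.inv(moment_covariance(z, resid, "robust"))
beta_unadj = gmm_estimate(y, x, z, w_unadj)
beta_robust = gmm_estimate(y, x, z, w_robust)

print("unadjusted weights:", beta_unadj)
print("robust weights:    ", beta_robust)
```

With the homoskedastic weights the weighting matrix is proportional to the inverse of the instrument second moment, so the GMM estimate coincides with 2SLS; the robust weights generally give a different estimate when the model is overidentified.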
Model Estimation and Covariance Specification
=============================================

All models are estimated using the ``fit`` method, which provides an
opportunity to customize the parameter covariance estimator used to perform
inference. Four classes of covariance estimators are available:

* 'unadjusted' - Assumes the model scores are homoskedastic. See
  :class:`~linearmodels.iv.covariance.HomoskedasticCovariance`.
* 'robust', 'heteroskedastic' - Allows the model scores to be
  heteroskedastic while assuming they are not correlated across
  observations. See
  :class:`~linearmodels.iv.covariance.HeteroskedasticCovariance`.
* 'kernel' - Allows for both heteroskedasticity and autocorrelation in the
  model scores. The estimator allows the ``kernel`` to be selected from

  * 'bartlett', 'newey-west' - Triangular kernel utilized in the common
    Newey-West estimator.
  * 'parzen' - Parzen's kernel.
  * 'qs', 'quadratic-spectral' - The quadratic spectral kernel studied by
    Andrews.

  The ``bandwidth`` can also be specified. If not provided, an estimate of
  the optimal value is used. See
  :class:`~linearmodels.iv.covariance.KernelCovariance`.
* 'clustered', 'one-way' - Allows for a one-way cluster structure where
  model scores within a cluster are correlated. See
  :class:`~linearmodels.iv.covariance.ClusteredCovariance`.

Using clustered covariance requires passing an array containing cluster
membership information.

.. code-block:: python

    mod = IV2SLS(dependent, exog, endog, instruments)
    # Illustrative clusters: group observations into bands of 20 IQ points
    iq_bands = data.IQ // 20
    res = mod.fit(cov_type='clustered', clusters=iq_bands)

GMM Estimation
**************

GMM allows additional inputs that affect the method of estimation. In
particular, the default is to use two-step GMM. One-step (inefficient) GMM
can be forced by setting ``iter_limit`` to 1. If ``iter_limit`` is raised
above 2, then an iterative method is used where multiple steps are used to
estimate the model parameters.
If the normalized model parameters change by less than ``tol`` across
successive iterations, then the estimation is assumed to have converged and
the iterations are stopped. By default, the first step uses the average
outer-product of the instruments as the weighting matrix.
``initial_weight`` allows a user-specified choice of weighting matrix to be
used instead.

GMM-CUE Estimation
******************

GMM-CUE uses a non-linear optimizer to optimize the GMM objective directly,
where both the moment conditions and the moment score estimator change with
the parameter values. ``starting`` allows a user-specified set of starting
values to be used in place of the default starting values and ``display``
controls whether iterative output is printed during estimation.
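The roles of ``iter_limit`` and ``tol`` described in the GMM Estimation section can be sketched with a small NumPy loop. This is a simplified illustration, not the algorithm used by :class:`~linearmodels.iv.model.IVGMM`: the function names ``gmm_step`` and ``iterated_gmm`` are hypothetical, the first-step weight is assumed to be the inverse of the average outer-product of the instruments, later steps assume heteroskedasticity-robust weights, and convergence is checked with a plain maximum absolute change rather than a normalized one.

```python
import numpy as np


def gmm_step(y, x, z, w):
    """Linear GMM estimate for a given weighting matrix w."""
    xz = x.T @ z
    return np.linalg.solve(xz @ w @ xz.T, xz @ w @ (z.T @ y))


def iterated_gmm(y, x, z, iter_limit=2, tol=1e-4):
    """Return the estimate and the number of GMM steps that were run."""
    n = z.shape[0]
    # Step 1 (assumption): weight from the average outer-product of the instruments
    w = np.linalg.inv(z.T @ z / n)
    beta = gmm_step(y, x, z, w)
    steps = 1
    while steps < iter_limit:
        # Rebuild robust weights from the current residuals, then re-estimate
        eps = y - x @ beta
        ze = z * eps[:, None]
        w = np.linalg.inv(ze.T @ ze / n)
        new_beta = gmm_step(y, x, z, w)
        steps += 1
        # Simplified convergence check (not normalized)
        change = np.max(np.abs(new_beta - beta))
        beta = new_beta
        if change < tol:
            break
    return beta, steps


# Simulated overidentified model (illustrative values only)
rng = np.random.default_rng(2)
n = 2000
z = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
nu = rng.standard_normal(n)
x = np.column_stack([np.ones(n), z @ np.array([0.0, 1.0, 1.0, 1.0]) + nu])
eps = (0.5 * nu + rng.standard_normal(n)) * (1 + np.abs(z[:, 1]))
y = x @ np.array([1.0, 0.5]) + eps

one_step, n_one = iterated_gmm(y, x, z, iter_limit=1)   # one-step GMM
two_step, n_two = iterated_gmm(y, x, z)                 # two-step GMM
iterated, n_iter = iterated_gmm(y, x, z, iter_limit=50) # iterate until tol is met

print("one-step:", one_step, "steps:", n_one)
print("two-step:", two_step, "steps:", n_two)
print("iterated:", iterated, "steps:", n_iter)
```

Under these assumptions the one-step estimate is the same as 2SLS, the two-step estimate re-weights once using the first-step residuals, and the iterated version keeps re-weighting until the parameter change falls below ``tol`` or ``iter_limit`` is reached.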