Further Examples

Linear Instrumental-Variables Regression

These examples follow those in Chapter 6 of Microeconometrics Using Stata by Cameron & Trivedi.

The first step is to import the main estimator for linear IV models:

  • IV2SLS - standard two-stage least squares

  • IVLIML - Limited information maximum likelihood and k-class estimators

  • IVGMM - Generalized method of moment estimation

  • IVGMMCUE - Generalized method of moment estimation using continuously updating

[1]:
from linearmodels import IV2SLS, IVGMM, IVGMMCUE, IVLIML

Importing data

The data uses comes from the Medical Expenditure Panel Survey (MEPS) and includes data on out-of-pocket drug expenditure (in logs), individual characteristics, whether an individual was insured through an employer or union (a likely endogenous variable), and some candidate instruments including the percentage of income from Social Security Income, the size of the individual”s firm and whether the firm has multiple locations.

[2]:
from linearmodels.datasets import meps

data = meps.load()
data = data.dropna()
print(meps.DESCR)

age               Age
age2              Age-squared
black             Black
blhisp            Black or Hispanic
drugexp           Presc-drugs expense
educyr            Years of education
fair              Fair health
female            Female
firmsz            Firm size
fph               fair or poor health
good              Good health
hi_empunion       Insured thro emp/union
hisp              Hiapanic
income            Income
ldrugexp          log(drugexp)
linc              log(income)
lowincome         Low income
marry             Married
midincome         Middle income
msa               Metropolitan stat area
multlc            Multiple locations
poor              Poor health
poverty           Poor
priolist          Priority list cond
private           Private insurance
ssiratio          SSI/Income ratio
totchr            Total chronic cond
vegood            V-good health
vgh               vg or good health

Next the data – dependent, endogenous and controls – are summarized. The controls are grouped into a list to simplify model building.

[3]:
controls = ["totchr", "female", "age", "linc", "blhisp"]
print(data[["ldrugexp", "hi_empunion"] + controls].describe(percentiles=[]))
           ldrugexp   hi_empunion        totchr        female           age  \
count  10089.000000  10089.000000  10089.000000  10089.000000  10089.000000
mean       6.481361      0.382198      1.860938      0.577064     75.051740
std        1.362052      0.485949      1.292858      0.494050      6.682109
min        0.000000      0.000000      0.000000      0.000000     65.000000
50%        6.678342      0.000000      2.000000      1.000000     74.000000
max       10.180172      1.000000      9.000000      1.000000     91.000000

               linc        blhisp
count  10089.000000  10089.000000
mean       2.743275      0.163544
std        0.913143      0.369880
min       -6.907755      0.000000
50%        2.743160      0.000000
max        5.744476      1.000000

It is also worth examining the instruments.

[4]:
instruments = ["ssiratio", "lowincome", "multlc", "firmsz"]
print(data[instruments].describe(percentiles=[]))
           ssiratio     lowincome        multlc        firmsz
count  10089.000000  10089.000000  10089.000000  10089.000000
mean       0.536544      0.187432      0.062048      0.140529
std        0.367818      0.390277      0.241254      2.170389
min        0.000000      0.000000      0.000000      0.000000
50%        0.504522      0.000000      0.000000      0.000000
max        9.250620      1.000000      1.000000     50.000000

And finally the simple correlation between the endogenous variable and the instruments. Instruments must be correlated to be relevant (but also must be exogenous, which can”t be examined using simple correlation). The correlation of firmsz is especially low, which might lead to the weak instruments problem if used exclusively.

[5]:
data[["hi_empunion"] + instruments].corr()
[5]:
hi_empunion ssiratio lowincome multlc firmsz
hi_empunion 1.000000 -0.212431 -0.116419 0.119849 0.037352
ssiratio -0.212431 1.000000 0.253946 -0.190433 -0.044578
lowincome -0.116419 0.253946 1.000000 -0.062465 -0.008232
multlc 0.119849 -0.190433 -0.062465 1.000000 0.187275
firmsz 0.037352 -0.044578 -0.008232 0.187275 1.000000

add_constant from statsmodels is used to simplify the process of adding a constant column to the data.

[6]:
data["const"] = 1
controls = ["const"] + controls

2SLS as OLS

Before examining the IV estimators, it is worth noting that 2SLS nests the OLS estimator, so that a call to IV2SLS using None for the endogenous and instruments will produce OLS estimates of parameters.

The OLS estimates indicate that insurance through an employer or union leads to an increase in out-of-pocket drug expenditure.

[7]:
ivolsmod = IV2SLS(data.ldrugexp, data[["hi_empunion"] + controls], None, None)
res_ols = ivolsmod.fit()
print(res_ols)
                            OLS Estimation Summary
==============================================================================
Dep. Variable:               ldrugexp   R-squared:                      0.1770
Estimator:                        OLS   Adj. R-squared:                 0.1765
No. Observations:               10089   F-statistic:                    2262.6
Date:                Wed, Dec 11 2024   P-value (F-stat)                0.0000
Time:                        10:16:05   Distribution:                  chi2(6)
Cov. Estimator:                robust

                              Parameter Estimates
===============================================================================
             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-------------------------------------------------------------------------------
hi_empunion     0.0739     0.0260     2.8441     0.0045      0.0230      0.1248
const           5.8611     0.1570     37.320     0.0000      5.5533      6.1689
totchr          0.4404     0.0094     47.049     0.0000      0.4220      0.4587
female          0.0578     0.0254     2.2797     0.0226      0.0081      0.1075
age            -0.0035     0.0019    -1.8228     0.0683     -0.0073      0.0003
linc            0.0105     0.0137     0.7646     0.4445     -0.0164      0.0373
blhisp         -0.1513     0.0341    -4.4353     0.0000     -0.2182     -0.0844
===============================================================================

Just identified 2SLS

The just identified two-stage LS estimator uses as many instruments as endogenous variables. In this example there is one of each, using the SSI ratio as the instrument. The with the instrument, the effect of insurance through employer or union has a strong negative effect on drug expenditure.

[8]:
ivmod = IV2SLS(data.ldrugexp, data[controls], data.hi_empunion, data.ssiratio)
res_2sls = ivmod.fit()
print(res_2sls.summary)
                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:               ldrugexp   R-squared:                      0.0640
Estimator:                    IV-2SLS   Adj. R-squared:                 0.0634
No. Observations:               10089   F-statistic:                    2000.9
Date:                Wed, Dec 11 2024   P-value (F-stat)                0.0000
Time:                        10:16:05   Distribution:                  chi2(6)
Cov. Estimator:                robust

                              Parameter Estimates
===============================================================================
             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-------------------------------------------------------------------------------
const           6.7872     0.2688     25.246     0.0000      6.2602      7.3141
totchr          0.4503     0.0102     44.157     0.0000      0.4303      0.4703
female         -0.0204     0.0326    -0.6257     0.5315     -0.0843      0.0435
age            -0.0132     0.0030    -4.4092     0.0000     -0.0191     -0.0073
linc            0.0870     0.0226     3.8436     0.0001      0.0426      0.1314
blhisp         -0.2174     0.0395    -5.5052     0.0000     -0.2948     -0.1400
hi_empunion    -0.8976     0.2211    -4.0592     0.0000     -1.3310     -0.4642
===============================================================================

Endogenous: hi_empunion
Instruments: ssiratio
Robust Covariance (Heteroskedastic)
Debiased: False

Multiple Instruments

Using multiple instruments only requires expanding the data array in the instruments input.

[9]:
ivmod = IV2SLS(
    data.ldrugexp, data[controls], data.hi_empunion, data[["ssiratio", "multlc"]]
)
res_2sls_robust = ivmod.fit()

Alternative covariance estimators

All estimator allow for three types of parameter covariance estimator:

  • "unadjusted" is the classic homoskedastic estimator

  • "robust" is robust to heteroskedasticity

  • "clustered" allows one- or two-way clustering to account for additional sources of dependence between the model scores

  • "kernel" produces a heteroskedasticity-autocorrelation robust covariance estimator

The default is "robust".

These are all passed using the keyword input cov_type. Using clustered requires also passing the clustering variable(s).

[10]:
ivmod = IV2SLS(
    data.ldrugexp, data[controls], data.hi_empunion, data[["ssiratio", "multlc"]]
)
res_2sls_std = ivmod.fit(cov_type="unadjusted")

GMM Estimation

GMM estimation can be more efficient than 2SLS when there are more than one instrument. By default, 2-step efficient GMM is used (assuming the weighting matrix is correctly specified). It is possible to iterate until convergence using the optional keyword input iter_limit, which is naturally 2 by default. Generally, GMM-CUE would be preferred to using multiple iterations of standard GMM.

The default weighting matrix is robust to heteroskedasticity (but not clustering).

[11]:
ivmod = IVGMM(
    data.ldrugexp, data[controls], data.hi_empunion, data[["ssiratio", "multlc"]]
)
res_gmm = ivmod.fit()

Changing the weighting matrix structure in GMM estimation

The weighting matrix in the GMM objective function can be altered when creating the model. This example uses clustered weight by age. The covariance estimator should usually match the weighting matrix, and so clustering is also used here.

[12]:
ivmod = IVGMM(
    data.ldrugexp,
    data[controls],
    data.hi_empunion,
    data[["ssiratio", "multlc"]],
    weight_type="clustered",
    clusters=data.age,
)
res_gmm_clustered = ivmod.fit(cov_type="clustered", clusters=data.age)

Continuously updating GMM

The continuously updating GMM estimator simultaneously optimizes the moment conditions and the weighting matrix. It can be more efficient (in the second order sense) than standard 2-step GMM, although it can also be fragile. Here the optional input display is used to produce the output of the non-linear optimizer used to estimate the parameters.

[13]:
ivmod = IVGMMCUE(
    data.ldrugexp, data[controls], data.hi_empunion, data[["ssiratio", "multlc"]]
)
res_gmm_cue = ivmod.fit(cov_type="robust", display=True)
         Current function value: 1.045365
         Iterations: 10
         Function evaluations: 420
         Gradient evaluations: 51
/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/scipy/optimize/_minimize.py:726: OptimizeWarning: Desired error not necessarily achieved due to precision loss.
  res = _minimize_bfgs(fun, x0, args, jac, callback, **options)

Comparing results

The function compare can be used to compare the results of multiple models, possibly with different variables, estimators and/or instruments. Usually a dictionary or OrderedDict is used to hold results since the keys are used as model names. The advantage of an OrderedDict is that it will preserve the order of the models in the presentation.

With the expectation of the OLS estimate, the parameter estimates are fairly consistent. Standard errors vary slightly although the conclusions reached are not sensitive to the choice of covariance estimator either. T-stats are reported in parentheses.

[14]:
from collections import OrderedDict

from linearmodels.iv.results import compare

res = OrderedDict()
res["OLS"] = res_ols
res["2SLS"] = res_2sls
res["2SLS-Homo"] = res_2sls_std
res["2SLS-Hetero"] = res_2sls_robust
res["GMM"] = res_gmm
res["GMM Cluster(Age)"] = res_gmm_clustered
res["GMM-CUE"] = res_gmm_cue
print(compare(res))
                                                     Model Comparison
==========================================================================================================================
                               OLS          2SLS      2SLS-Homo   2SLS-Hetero           GMM GMM Cluster(Age)       GMM-CUE
--------------------------------------------------------------------------------------------------------------------------
Dep. Variable             ldrugexp      ldrugexp       ldrugexp      ldrugexp      ldrugexp         ldrugexp      ldrugexp
Estimator                      OLS       IV-2SLS        IV-2SLS       IV-2SLS        IV-GMM           IV-GMM        IV-GMM
No. Observations             10089         10089          10089         10089         10089            10089         10089
Cov. Est.                   robust        robust     unadjusted        robust        robust        clustered        robust
R-squared                   0.1770        0.0640         0.0414        0.0414        0.0406           0.0292        0.0388
Adj. R-squared              0.1765        0.0634         0.0409        0.0409        0.0400           0.0286        0.0382
F-statistic                 2262.6        2000.9         1882.3        1955.4        1952.6           1700.8        1949.2
P-value (F-stat)            0.0000        0.0000         0.0000        0.0000        0.0000           0.0000        0.0000
==================     ===========   ===========   ============   ===========   ===========      ===========   ===========
hi_empunion                 0.0739       -0.8976        -0.9899       -0.9899       -0.9933          -1.0359       -1.0002
                          (2.8441)     (-4.0592)      (-5.1501)     (-4.8386)     (-4.8530)        (-5.0683)     (-4.8810)
const                       5.8611        6.7872         6.8752        6.8752        6.8778           6.7277        6.8847
                          (37.320)      (25.246)       (28.030)      (26.660)      (26.658)         (13.299)      (26.657)
totchr                      0.4404        0.4503         0.4512        0.4512        0.4510           0.4482        0.4510
                          (47.049)      (44.157)       (42.942)      (43.769)      (43.738)         (33.833)      (43.701)
female                      0.0578       -0.0204        -0.0278       -0.0278       -0.0282          -0.0245       -0.0288
                          (2.2797)     (-0.6257)      (-0.8933)     (-0.8653)     (-0.8752)        (-0.8398)     (-0.8928)
age                        -0.0035       -0.0132        -0.0141       -0.0141       -0.0142          -0.0119       -0.0142
                         (-1.8228)     (-4.4092)      (-5.0834)     (-4.8753)     (-4.8773)        (-1.8928)     (-4.8969)
linc                        0.0105        0.0870         0.0943        0.0943        0.0945           0.0957        0.0950
                          (0.7646)      (3.8436)       (4.4400)      (4.3079)      (4.3142)         (6.4934)      (4.3333)
blhisp                     -0.1513       -0.2174        -0.2237       -0.2237       -0.2231          -0.2091       -0.2236
                         (-4.4353)     (-5.5052)      (-5.7805)     (-5.6514)     (-5.6344)        (-4.1662)     (-5.6421)
==================== ============= ============= ============== ============= =============    ============= =============
Instruments                             ssiratio       ssiratio      ssiratio      ssiratio         ssiratio      ssiratio
                                                         multlc        multlc        multlc           multlc        multlc
--------------------------------------------------------------------------------------------------------------------------

T-stats reported in parentheses

Testing endogeneity

The Durbin test is a classic of endogeneity which compares OLS estimates with 2SLS and exploits the fact that OLS estimates will be relatively efficient. Durbin”s test is not robust to heteroskedasticity.

[15]:
res_2sls.durbin()
[15]:
Durbin test of exogeneity
H0: All endogenous variables are exogenous
Statistic: 25.2819
P-value: 0.0000
Distributed: chi2(1)
WaldTestStatistic, id: 0x7f6664e4b5b0

The Wu-Hausman test is a variant of the Durbin test that uses a slightly different form.

[16]:
res_2sls.wu_hausman()
[16]:
Wu-Hausman test of exogeneity
H0: All endogenous variables are exogenous
Statistic: 25.3253
P-value: 0.0000
Distributed: F(1,10081)
WaldTestStatistic, id: 0x7f666758b1f0

The test statistic can be directly replicated using the squared t-stat in a 2-stage approach where the first stage regresses the endogenous variable on the controls and instrument and the second stage regresses the dependent variable on the controls, the endogenous regressor and the residuals. If the regressor was in fact exogenous, the residuals should not be correlated with the dependent variable.

[17]:
import pandas as pd

step1 = IV2SLS(data.hi_empunion, data[["ssiratio"] + controls], None, None).fit()
resids = step1.resids
exog = pd.concat([data[["hi_empunion"] + controls], resids], axis=1)
step2 = IV2SLS(data.ldrugexp, exog, None, None).fit(cov_type="unadjusted")
print(step2.tstats.residual**2)
25.34541049972333

Wooldridge”s regression-based test of exogeneity is robust to heteroskedasticity since it inherits the covariance estimator from the model. Here there is little difference.

[18]:
res_2sls.wooldridge_regression
[18]:
Wooldridge's regression test of exogeneity
H0: Endogenous variables are exogenous
Statistic: 26.4542
P-value: 0.0000
Distributed: chi2(1)
WaldTestStatistic, id: 0x7f666485ab60

Wooldridge”s score test is an alternative to the regression test, although it usually has slightly less power since it is an LM rather than a Wald type test.

[19]:
res_2sls.wooldridge_score
[19]:
Wooldridge's score test of exogeneity
H0: Endogenous variables are exogenous
Statistic: 24.9350
P-value: 0.0000
Distributed: chi2(1)
WaldTestStatistic, id: 0x7f666485b0a0

Exogeneity Testing

When there is more than one instrument (the model is overidentified), the J test can be used in GMM models to test whether the model is overidentified – in other words, whether the instruments are actually exogenous (assuming they are relevant). In the case with 2 instruments there is no evidence that against the null.

[20]:
res_gmm.j_stat
[20]:
H0: Expected moment conditions are equal to 0
Statistic: 1.0475
P-value: 0.3061
Distributed: chi2(1)
WaldTestStatistic, id: 0x7f6664e4add0

When all instruments are included the story changes, and some of the additional instrument (lowincome or firmsz) appear to be endogenous.

[21]:
ivmod = IVGMM(data.ldrugexp, data[controls], data.hi_empunion, data[instruments])
res_gmm_all = ivmod.fit()
res_gmm_all.j_stat
[21]:
H0: Expected moment conditions are equal to 0
Statistic: 11.5903
P-value: 0.0089
Distributed: chi2(3)
WaldTestStatistic, id: 0x7f6664934df0

Single Instrument Regressions

It can be useful to run the just identified regressions to see how the IV estimate varies by instrument. The OLS model is included for comparison. The coefficient when using lowincome is very similar to the OLS as is the \(R^2\) which indicates this variable may be endogenous. The coefficient using firmsz is also very different, but this is probably due to the low correlation between firmsz and the endogenous regressor so that this is a weak instrument.

[22]:
od = OrderedDict()
for col in instruments:
    od[col] = IV2SLS(data.ldrugexp, data[controls], data.hi_empunion, data[col]).fit(
        cov_type="robust"
    )
od["OLS"] = res_ols
print(compare(od))
                                     Model Comparison
==========================================================================================
                          ssiratio     lowincome        multlc        firmsz           OLS
------------------------------------------------------------------------------------------
Dep. Variable             ldrugexp      ldrugexp      ldrugexp      ldrugexp      ldrugexp
Estimator                  IV-2SLS       IV-2SLS       IV-2SLS       IV-2SLS           OLS
No. Observations             10089         10089         10089         10089         10089
Cov. Est.                   robust        robust        robust        robust        robust
R-squared                   0.0640        0.1768       -0.0644       -0.9053        0.1770
Adj. R-squared              0.0634        0.1763       -0.0651       -0.9064        0.1765
F-statistic                 2000.9        2250.6        1734.1        950.64        2262.6
P-value (F-stat)            0.0000        0.0000        0.0000        0.0000        0.0000
==================     ===========   ===========   ===========   ===========   ===========
const                       6.7872        5.8201        7.2145        8.7267        5.8611
                          (25.246)      (15.267)      (16.326)      (6.4196)      (37.320)
totchr                      0.4503        0.4399        0.4548        0.4710        0.4404
                          (44.157)      (43.834)      (39.193)      (23.163)      (47.049)
female                     -0.0204        0.0613       -0.0565       -0.1842        0.0578
                         (-0.6257)      (1.6066)     (-1.2596)     (-1.5312)      (2.2797)
age                        -0.0132       -0.0031       -0.0177       -0.0335       -0.0035
                         (-4.4092)     (-0.7521)     (-3.7086)     (-2.3433)     (-1.8228)
linc                        0.0870        0.0071        0.1223        0.2473        0.0105
                          (3.8436)      (0.2280)      (3.2965)      (2.1876)      (0.7646)
blhisp                     -0.2174       -0.1484       -0.2479       -0.3559       -0.1513
                         (-5.5052)     (-3.5639)     (-5.0727)     (-3.2425)     (-4.4353)
hi_empunion                -0.8976        0.1170       -1.3459       -2.9323        0.0739
                         (-4.0592)      (0.3254)     (-3.1760)     (-2.0908)      (2.8441)
==================== ============= ============= ============= ============= =============
Instruments               ssiratio     lowincome        multlc        firmsz
------------------------------------------------------------------------------------------

T-stats reported in parentheses

First Stage Diagnostics

First stage diagnostics are available to assess whether the instruments appear to be credible for the endogenous regressor. The Partial F-statistic is the F-statistic for all instruments once controls have been partialed out. In the case of a single instrument, it is just the squared t-stat.

[23]:
print(res_2sls.first_stage)
    First Stage Estimation Results
======================================
                           hi_empunion
--------------------------------------
R-squared                       0.0761
Partial R-squared               0.0179
Shea's R-squared                0.0179
Partial F-statistic             65.806
P-value (Partial F-stat)     4.441e-16
Partial F-stat Distn           chi2(1)
========================== ===========
const                           1.0290
                              (17.705)
totchr                          0.0128
                              (3.4896)
female                         -0.0734
                             (-7.6226)
age                            -0.0086
                             (-12.184)
linc                            0.0484
                              (7.3266)
blhisp                         -0.0627
                             (-5.1084)
ssiratio                       -0.1916
                             (-8.1121)
--------------------------------------

T-stats reported in parentheses
T-stats use same covariance type as original model

The F-statistic actually has a \(chi^2\) distribution since it is just a Wald test that all of the coefficients are 0. This breaks the “rule-of-thumb” but it can be applied by dividing the F-stat by the number of instruments.

[24]:
ivmod = IV2SLS(data.ldrugexp, data[controls], data.hi_empunion, data[instruments])
res_2sls_all = ivmod.fit()
print(res_2sls_all.first_stage)
    First Stage Estimation Results
======================================
                           hi_empunion
--------------------------------------
R-squared                       0.0821
Partial R-squared               0.0243
Shea's R-squared                0.0243
Partial F-statistic             179.47
P-value (Partial F-stat)        0.0000
Partial F-stat Distn           chi2(4)
========================== ===========
const                           0.9899
                              (16.959)
totchr                          0.0133
                              (3.6494)
female                         -0.0724
                             (-7.5497)
age                            -0.0080
                             (-11.206)
linc                            0.0410
                              (6.3552)
blhisp                         -0.0676
                             (-5.5369)
ssiratio                       -0.1690
                             (-7.3289)
lowincome                      -0.0637
                             (-5.1947)
multlc                          0.1151
                              (5.4799)
firmsz                          0.0037
                              (1.9286)
--------------------------------------

T-stats reported in parentheses
T-stats use same covariance type as original model

LIML

The LIML estimator and related k-class estimators can be used through IVLIML. LIML can have better finite sample properties if the model is not strongly identified. By default the \(\kappa\) parameter is estimated. In this dataset it is very close to 1 and to the results for LIML are similar to 2SLS (they would be exact if \(\kappa=1\)).

[25]:
ivmod = IVLIML(
    data.ldrugexp, data[controls], data.hi_empunion, data[["ssiratio", "multlc"]]
)
res_liml = ivmod.fit(cov_type="robust")
print(compare({"2SLS": res_2sls_robust, "LIML": res_liml, "GMM": res_gmm}))
                       Model Comparison
==============================================================
                              2SLS          LIML           GMM
--------------------------------------------------------------
Dep. Variable             ldrugexp      ldrugexp      ldrugexp
Estimator                  IV-2SLS       IV-LIML        IV-GMM
No. Observations             10089         10089         10089
Cov. Est.                   robust        robust        robust
R-squared                   0.0414        0.0400        0.0406
Adj. R-squared              0.0409        0.0394        0.0400
F-statistic                 1955.4        1952.3        1952.6
P-value (F-stat)            0.0000        0.0000        0.0000
==================     ===========   ===========   ===========
const                       6.8752        6.8807        6.8778
                          (26.660)      (26.577)      (26.658)
totchr                      0.4512        0.4513        0.4510
                          (43.769)      (43.730)      (43.738)
female                     -0.0278       -0.0283       -0.0282
                         (-0.8653)     (-0.8776)     (-0.8752)
age                        -0.0141       -0.0142       -0.0142
                         (-4.8753)     (-4.8781)     (-4.8773)
linc                        0.0943        0.0947        0.0945
                          (4.3079)      (4.3114)      (4.3142)
blhisp                     -0.2237       -0.2241       -0.2231
                         (-5.6514)     (-5.6531)     (-5.6344)
hi_empunion                -0.9899       -0.9957       -0.9933
                         (-4.8386)     (-4.8361)     (-4.8530)
==================== ============= ============= =============
Instruments               ssiratio      ssiratio      ssiratio
                            multlc        multlc        multlc
--------------------------------------------------------------

T-stats reported in parentheses

The estimated value of \(\kappa\).

[26]:
print(res_liml.kappa)
1.0001153166806434

IV2SLS as OLS

As one final check, the “OLS” version of IV2SLS is compared to statsmodels OLS command. The parameters are identical.

[27]:
import pandas as pd

ivolsmod = IV2SLS(data.ldrugexp, data[["hi_empunion"] + controls], None, None)
res_ivols = ivolsmod.fit()
sm_ols = res_ols.params
sm_ols.name = "sm"
print(pd.concat([res_ivols.params, sm_ols], axis=1))
             parameter        sm
hi_empunion   0.073879  0.073879
const         5.861131  5.861131
totchr        0.440381  0.440381
female        0.057806  0.057806
age          -0.003529 -0.003529
linc          0.010482  0.010482
blhisp       -0.151307 -0.151307