Absorbing Regression

An absorbing regression is a model of the form

\[y_i = x_i \beta + z_i \gamma +\epsilon_i\]

where interest is on \(\beta\) and not \(\gamma\). \(z_i\) may be high-dimensional, and may grow with the sample size (i.e., a matrix of fixed effects).

This notebook shows how this type of model can be fit in a simulate data set that mirrors some used in practice. There are three effects, one for the state of the worker (small), one one for the workers firm (large)

[1]:
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)
nobs = 1_000_000
state_id = rs.randint(50, size=nobs)
state_effects = rs.standard_normal(state_id.max() + 1)
state_effects = state_effects[state_id]
# 5 workers/firm, on average
firm_id = rs.randint(nobs // 5, size=nobs)
firm_effects = rs.standard_normal(firm_id.max() + 1)
firm_effects = firm_effects[firm_id]
cats = pd.DataFrame(
    {"state": pd.Categorical(state_id), "firm": pd.Categorical(firm_id)}
)
eps = rs.standard_normal(nobs)
x = rs.standard_normal((nobs, 2))
x = np.column_stack([np.ones(nobs), x])
y = x.sum(1) + firm_effects + state_effects + eps
/tmp/ipykernel_2189/2391153249.py:2: DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466

  import pandas as pd

Including a constant

The estimator can estimate an intercept even when all dummies are included. This is does using a mathematical trick and the intercept is not usually meaningful. This is done as-if the the dummies are orthogonalized to a constant.

[2]:
from linearmodels.iv.absorbing import AbsorbingLS

mod = AbsorbingLS(y, x, absorb=cats)
print(mod.fit())
                         Absorbing LS Estimation Summary
==================================================================================
Dep. Variable:              dependent   R-squared:                          0.8377
Estimator:               Absorbing LS   Adj. R-squared:                     0.7975
No. Observations:             1000000   F-statistic:                     1.962e+06
Date:                Sun, Jan 21 2024   P-value (F-stat):                   0.0000
Time:                        16:27:19   Distribution:                      chi2(2)
Cov. Estimator:                robust   R-squared (No Effects):             0.6664
                                        Variables Absorbed:              1.987e+05
                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
exog.0         0.9477     0.0009     1057.8     0.0000      0.9460      0.9495
exog.1         0.9994     0.0010     990.89     0.0000      0.9974      1.0014
exog.2         1.0008     0.0010     989.09     0.0000      0.9989      1.0028
==============================================================================

Excluding the constant

If the constant is dropped the other coefficient are identical since the dummies span the constant.

[3]:
from linearmodels.iv.absorbing import AbsorbingLS

mod = AbsorbingLS(y, x[:, 1:], absorb=cats)
print(mod.fit())
                         Absorbing LS Estimation Summary
==================================================================================
Dep. Variable:              dependent   R-squared:                          0.8377
Estimator:               Absorbing LS   Adj. R-squared:                     0.7975
No. Observations:             1000000   F-statistic:                     1.962e+06
Date:                Sun, Jan 21 2024   P-value (F-stat):                   0.0000
Time:                        16:27:29   Distribution:                      chi2(2)
Cov. Estimator:                robust   R-squared (No Effects):             0.6664
                                        Variables Absorbed:              1.987e+05
                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
exog.0         0.9994     0.0010     990.89     0.0000      0.9974      1.0014
exog.1         1.0008     0.0010     989.09     0.0000      0.9989      1.0028
==============================================================================

Optimization Options

The residuals from the absorbed variables are either estimated using HDFE or LSMR< depending on the variables included in the regression. HDFE is used when:

  • the model is unweighted; and

  • the absorbed regressors are all categorical (i.e., fixed effects).

If these conditions are not satisfied, then LSMR is used. LSMR can be used by setting method="lsmr" even when the conditions for HDFE are satisfied.

[4]:
import datetime as dt

from linearmodels.iv.absorbing import AbsorbingLS

mod = AbsorbingLS(y, x[:, 1:], absorb=cats)

start = dt.datetime.now()
res = mod.fit(use_cache=False, method="lsmr")
print(f"LSMR Second: {(dt.datetime.now() - start).total_seconds()}")

start = dt.datetime.now()
res = mod.fit()
print(f"HDFE Second: {(dt.datetime.now() - start).total_seconds()}")
LSMR Second: 2.782783
HDFE Second: 1.669036

LSMR is iterative and does not have a closed form. The tolerance can be set using absorb_options which is a dictionary. See scipy.sparse.linalg.lsmr for details on the options.

[5]:
mod = AbsorbingLS(y, x[:, 1:], absorb=cats)
res = mod.fit(method="lsmr", absorb_options={"show": True})

LSMR            Least-squares solution of  Ax = b

The matrix A has 1000000 rows and 198676 columns
damp = 0.00000000000000e+00

atol = 1.00e-08                 conlim = 1.00e+08

btol = 1.00e-08             maxiter =   198676


   itn      x(1)       norm r    norm Ar  compatible   LS      norm A   cond A
     0  0.00000e+00  2.417e+03  2.126e+03   1.0e+00  3.6e-04
     1  1.83446e+02  1.677e+03  6.377e+02   6.9e-01  3.1e-01  1.2e+00  1.0e+00
     2  2.09819e+02  1.563e+03  1.668e+02   6.5e-01  6.2e-02  1.7e+00  1.2e+00
     3  2.17247e+02  1.553e+03  6.446e+01   6.4e-01  2.1e-02  2.0e+00  1.3e+00
     4  2.30067e+02  1.551e+03  6.539e-01   6.4e-01  1.9e-04  2.2e+00  1.5e+00
     5  2.30071e+02  1.551e+03  2.662e-01   6.4e-01  7.0e-05  2.4e+00  1.5e+00
     6  2.29984e+02  1.551e+03  4.542e-03   6.4e-01  1.1e-06  2.6e+00  1.6e+00
     7  2.29984e+02  1.551e+03  3.999e-03   6.4e-01  9.6e-07  2.7e+00  2.5e+00
     8  2.29985e+02  1.551e+03  3.882e-03   6.4e-01  8.7e-07  2.9e+00  6.5e+00
     9  2.29985e+02  1.551e+03  3.882e-03   6.4e-01  8.3e-07  3.0e+00  5.1e+02
    10  2.29990e+02  1.551e+03  3.882e-03   6.4e-01  7.8e-07  3.2e+00  5.5e+02
    13  3.86056e+02  1.551e+03  4.003e-05   6.4e-01  7.2e-09  3.6e+00  8.7e+01

LSMR finished
The least-squares solution is good enough, given atol
istop =       2    normr = 1.6e+03
    normA = 3.6e+00    normAr = 4.0e-05
itn   =      13    condA = 8.7e+01
    normx = 2.3e+03
    13  3.86056e+02   1.551e+03  4.003e-05
   6.4e-01  7.2e-09   3.6e+00  8.7e+01

LSMR            Least-squares solution of  Ax = b

The matrix A has 1000000 rows and 198676 columns
damp = 0.00000000000000e+00

atol = 1.00e-08                 conlim = 1.00e+08

btol = 1.00e-08             maxiter =   198676


   itn      x(1)       norm r    norm Ar  compatible   LS      norm A   cond A
     0  0.00000e+00  1.001e+03  4.470e+02   1.0e+00  4.5e-04
     1  6.08847e-02  8.953e+02  4.167e+00   8.9e-01  4.7e-03  1.0e+00  1.0e+00
     2  2.15243e-01  8.953e+02  1.484e+00   8.9e-01  1.1e-03  1.5e+00  1.1e+00
     3  2.41447e-01  8.953e+02  6.412e-01   8.9e-01  4.0e-04  1.8e+00  1.4e+00
     4  2.88004e-01  8.953e+02  6.266e-03   8.9e-01  3.1e-06  2.2e+00  1.3e+00
     5  2.87319e-01  8.953e+02  2.921e-03   8.9e-01  1.4e-06  2.4e+00  1.4e+00
     6  2.87000e-01  8.953e+02  9.868e-04   8.9e-01  4.2e-07  2.6e+00  1.4e+00
     7  2.87108e-01  8.953e+02  9.866e-04   8.9e-01  4.2e-07  2.6e+00  6.2e+01
     8  2.87255e-01  8.953e+02  9.866e-04   8.9e-01  3.9e-07  2.9e+00  1.3e+02
     9  3.62098e-01  8.953e+02  9.857e-04   8.9e-01  3.7e-07  3.0e+00  7.2e+02
    10  9.04265e-01  8.953e+02  9.789e-04   8.9e-01  3.4e-07  3.2e+00  3.1e+01
    12  3.99381e+01  8.953e+02  2.309e-05   8.9e-01  7.4e-09  3.5e+00  3.1e+01

LSMR finished
The least-squares solution is good enough, given atol
istop =       2    normr = 9.0e+02
    normA = 3.5e+00    normAr = 2.3e-05
itn   =      12    condA = 3.1e+01
    normx = 6.0e+02
    12  3.99381e+01   8.953e+02  2.309e-05
   8.9e-01  7.4e-09   3.5e+00  3.1e+01

LSMR            Least-squares solution of  Ax = b

The matrix A has 1000000 rows and 198676 columns
damp = 0.00000000000000e+00

atol = 1.00e-08                 conlim = 1.00e+08

btol = 1.00e-08             maxiter =   198676


   itn      x(1)       norm r    norm Ar  compatible   LS      norm A   cond A
     0  0.00000e+00  1.000e+03  4.472e+02   1.0e+00  4.5e-04
     1 -9.29233e-01  8.946e+02  4.623e+00   8.9e-01  5.2e-03  1.0e+00  1.0e+00
     2 -7.45416e-01  8.946e+02  1.328e+00   8.9e-01  9.9e-04  1.5e+00  1.1e+00
     3 -8.34706e-01  8.946e+02  1.107e-01   8.9e-01  7.1e-05  1.7e+00  1.5e+00
     4 -8.41880e-01  8.946e+02  7.140e-03   8.9e-01  3.6e-06  2.2e+00  1.8e+00
     5 -8.42126e-01  8.946e+02  4.176e-03   8.9e-01  2.0e-06  2.4e+00  1.8e+00
     6 -8.41939e-01  8.946e+02  3.077e-03   8.9e-01  1.3e-06  2.6e+00  2.4e+00
     7 -8.41643e-01  8.946e+02  3.077e-03   8.9e-01  1.3e-06  2.6e+00  1.9e+02
     8 -8.40968e-01  8.946e+02  3.077e-03   8.9e-01  1.2e-06  2.9e+00  4.2e+02
     9  5.02689e-01  8.946e+02  3.060e-03   8.9e-01  1.1e-06  3.0e+00  7.6e+02
    10  1.66462e+01  8.946e+02  2.851e-03   8.9e-01  9.9e-07  3.2e+00  7.9e+01
    12  1.22889e+02  8.946e+02  2.454e-05   8.9e-01  7.8e-09  3.5e+00  7.9e+01

LSMR finished
The least-squares solution is good enough, given atol
istop =       2    normr = 8.9e+02
    normA = 3.5e+00    normAr = 2.5e-05
itn   =      12    condA = 7.9e+01
    normx = 1.3e+03
    12  1.22889e+02   8.946e+02  2.454e-05
   8.9e-01  7.8e-09   3.5e+00  7.9e+01