linearmodels.iv.absorbing.AbsorbingLS

class linearmodels.iv.absorbing.AbsorbingLS(dependent: ndarray | DataArray | DataFrame | Series, exog: ndarray | DataArray | DataFrame | Series | None = None, *, absorb: DataFrame | Interaction | None = None, interactions: DataFrame | Interaction | Iterable[DataFrame | Interaction] | None = None, weights: ndarray | DataArray | DataFrame | Series | None = None, drop_absorbed: bool = False)

Linear regression with high-dimensional effects

Parameters:
dependent: ndarray | DataArray | DataFrame | Series

Dependent (endogenous) variable (nobs by 1)

exog: ndarray | DataArray | DataFrame | Series | None = None

Exogenous regressors (nobs by nexog)

absorb: DataFrame | Interaction | None = None

The effects or continuous variables to absorb. When using a DataFrame, effects must be categorical variables. Other variable types are treated as continuous variables that should be absorbed. When using an Interaction, variables in the cat argument are treated as effects and variables in the cont argument are treated as continuous.

interactions: DataFrame | Interaction | Iterable[DataFrame | Interaction] | None = None

Interactions containing both categorical and continuous variables. Each interaction is constructed using the Cartesian product of the categorical variables to produce the dummies, which are then separately interacted with each continuous variable.

weights: ndarray | DataArray | DataFrame | Series | None = None

Observation weights used in estimation

drop_absorbed: bool = False

Flag indicating whether to drop absorbed variables

Notes

Capable of estimating models with millions of effects.

Estimates models of the form

\[y_i = x_i \beta + z_i \gamma + \epsilon_i\]

where \(\beta\) are parameters of interest and \(\gamma\) are not. z may be high-dimensional, although it must have fewer variables than the number of observations in y.

The syntax simplifies specifying high-dimensional z when z consists of categorical (factor) variables, also known as effects, or when z contains interactions between continuous variables and categorical variables, also known as fixed slopes.
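To make the terminology concrete, the sketch below builds a small dense z by hand using pandas. It is only an illustration of what effects and fixed slopes mean, not the construction used internally by the estimator. The column names firm, year and size are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
nobs = 8

# Hypothetical data: two categorical variables and one continuous variable
cats = pd.DataFrame({
    "firm": pd.Categorical(rng.integers(2, size=nobs)),
    "year": pd.Categorical(rng.integers(3, size=nobs)),
})
cont = pd.Series(rng.standard_normal(nobs), name="size")

# Effects: one dummy column per level of a categorical variable
firm_effects = pd.get_dummies(cats["firm"], prefix="firm", dtype=float)

# Cartesian product of the categoricals: one dummy per observed (firm, year) cell
cell = cats["firm"].astype(str) + "_" + cats["year"].astype(str)
cell_dummies = pd.get_dummies(cell, prefix="cell", dtype=float)

# Fixed slopes: each cell dummy multiplied by the continuous variable
slopes = cell_dummies.mul(cont, axis=0)

# The columns of z that would otherwise have to be specified explicitly
z = pd.concat([firm_effects, slopes], axis=1)
```

With thousands of levels a dense z of this kind quickly becomes impractical, which is why the effects are described through absorb and interactions rather than passed as explicit columns.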

The high-dimensional effects are fit using LSMR, which avoids inverting or even constructing the inner product of the regressors. This is combined with Frisch-Waugh-Lovell to orthogonalize x and y from z.

z can contain factors that are perfectly linearly dependent. LSMR estimates a particular restricted set of parameters that captures the effect of non-redundant components in z.
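The idea described in the two paragraphs above can be sketched with NumPy and SciPy alone. This is a minimal illustration under simplifying assumptions (no weights, simulated data, a hypothetical helper named absorb), not the implementation used by AbsorbingLS. It residualizes y and each column of x on a sparse, rank-deficient z with scipy.sparse.linalg.lsmr, then regresses the residuals on each other.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)
nobs, ncat = 2000, 50

# Two categorical effects as sparse dummy blocks. Stacking them makes z rank
# deficient because each block of dummies sums to a column of ones.
codes_a = rng.integers(ncat, size=nobs)
codes_b = rng.integers(ncat, size=nobs)
rows = np.arange(nobs)
ones = np.ones(nobs)
z = sp.hstack([
    sp.csr_matrix((ones, (rows, codes_a)), shape=(nobs, ncat)),
    sp.csr_matrix((ones, (rows, codes_b)), shape=(nobs, ncat)),
]).tocsr()

# Simulated model y = x beta + z gamma + eps
x = rng.standard_normal((nobs, 2))
beta = np.array([1.0, -0.5])
gamma = rng.standard_normal(2 * ncat)
y = x @ beta + z @ gamma + rng.standard_normal(nobs)


def absorb(v, z):
    """Residual of v after a least-squares projection on z (hypothetical helper)."""
    coef = lsmr(z, v, atol=1e-10, btol=1e-10, maxiter=1000)[0]
    return v - z @ coef


# Frisch-Waugh-Lovell: orthogonalize y and x from z, then regress
y_tilde = absorb(y, z)
x_tilde = np.column_stack([absorb(x[:, j], z) for j in range(x.shape[1])])
beta_fwl = np.linalg.lstsq(x_tilde, y_tilde, rcond=None)[0]

# Reference: dense least squares on [x, z], feasible only because z is small here
full = np.hstack([x, z.toarray()])
beta_full = np.linalg.lstsq(full, y, rcond=None)[0][:2]
```

The coefficients on z are not unique in this setup, but the residuals after projecting on z are, so beta_fwl and beta_full agree up to the LSMR tolerance.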

Examples

Estimate a model by absorbing two categorical and two continuous variables

>>> import numpy as np
>>> import pandas as pd
>>> from linearmodels.iv import AbsorbingLS, Interaction
>>> dep = np.random.standard_normal((20000,1))
>>> exog = np.random.standard_normal((20000,2))
>>> cats = pd.DataFrame({i: pd.Categorical(np.random.randint(1000, size=20000))
...                      for i in range(2)})
>>> cont = pd.DataFrame({i+2: np.random.standard_normal(20000) for i in range(2)})
>>> absorb = pd.concat([cats, cont], axis=1)
>>> mod = AbsorbingLS(dep, exog, absorb=absorb)
>>> res = mod.fit()

Add interactions between the Cartesian product of the categorical variables and each continuous variable

>>> iaction = Interaction(cat=cats, cont=cont)
>>> absorb = Interaction(cat=cats) # Other encoding of categoricals
>>> mod = AbsorbingLS(dep, exog, absorb=absorb, interactions=iaction)

Methods

fit(*[, cov_type, debiased, method, ...])

Estimate model parameters

resids(params)

Compute model residuals

wresids(params)

Compute weighted model residuals

Properties

absorbed_dependent

Dependent variable with effects absorbed

absorbed_exog

Exogenous variables with effects absorbed

dependent

exog

has_constant

instruments

weights