linearmodels.iv.absorbing.AbsorbingLS

Linear regression with high-dimensional effects

Parameters:

dependent: ndarray | DataArray | DataFrame | Series: Endogenous (dependent) variable (nobs by 1)
exog: ndarray | DataArray | DataFrame | Series | None = None: Exogenous regressors (nobs by nexog)
absorb: DataFrame | Interaction | None = None: The effects or continuous variables to absorb. When using a DataFrame, effects must be categorical variables. Other variable types are treated as continuous variables that should be absorbed. When using an Interaction, variables in the cat argument are treated as effects and variables in the cont argument are treated as continuous.
interactions: DataFrame | Interaction | Iterable[DataFrame | Interaction] | None = None: Interactions containing both categorical and continuous variables. Each interaction is constructed using the Cartesian product of the categorical variables to produce the dummies, which are then separately interacted with each continuous variable.
weights: ndarray | DataArray | DataFrame | Series | None = None: Observation weights used in estimation
drop_absorbed: bool = False: Flag indicating whether to drop absorbed variables
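
The construction described for the interactions parameter can be illustrated with a small NumPy sketch. This is not the library's internal implementation; the variable names (cat_a, cat_b, cont, z_interactions) and the dense dummy matrix are assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(2)
nobs = 8

# Hypothetical inputs: two categorical variables and two continuous variables
cat_a = rng.integers(0, 2, size=nobs)  # 2 levels
cat_b = rng.integers(0, 3, size=nobs)  # 3 levels
cont = rng.standard_normal((nobs, 2))

# Cartesian product of the categoricals: one combined code per (a, b) pair
combined = cat_a * 3 + cat_b           # values in 0..5
dummies = np.eye(6)[combined]          # nobs by 6 indicator matrix

# Interact the dummies separately with each continuous variable
blocks = [dummies * cont[:, [j]] for j in range(cont.shape[1])]
z_interactions = np.hstack(blocks)     # nobs by 12
```

Each row of z_interactions has one non-zero entry per continuous variable, placed in the column of that observation's (cat_a, cat_b) cell. A real problem with many levels would need a sparse representation rather than np.eye.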

Notes

Capable of estimating models with millions of effects.

Estimates models of the form

\[y_i = x_i \beta + z_i \gamma + \epsilon_i\]

where \(\beta\) are parameters of interest and \(\gamma\) are not. z may be high-dimensional, although it must have fewer variables than the number of observations in y.

The syntax simplifies specifying high-dimensional z when z consists of categorical (factor) variables, also known as effects, or when z contains interactions between continuous variables and categorical variables, also known as fixed slopes.

The high-dimensional effects are fit using LSMR, which avoids inverting or even constructing the inner product of the regressors. This is combined with the Frisch-Waugh-Lovell theorem to orthogonalize x and y from z.
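
The Frisch-Waugh-Lovell step can be checked numerically on a small problem. The sketch below is an illustration only: it uses dense dummies and np.linalg.lstsq in place of LSMR, and the helper residualize and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nobs = 500

# Hypothetical data: two regressors of interest (x) and one categorical effect (z)
x = rng.standard_normal((nobs, 2))
groups = rng.integers(0, 20, size=nobs)
z = np.eye(20)[groups]  # dense dummies, only practical for small problems
y = x @ np.array([1.0, -0.5]) + z @ rng.standard_normal(20) + rng.standard_normal(nobs)


def residualize(a, z):
    """Return the part of a that is orthogonal to the columns of z."""
    coef, *_ = np.linalg.lstsq(z, a, rcond=None)
    return a - z @ coef


# Step 1: remove z from both y and x
y_tilde = residualize(y, z)
x_tilde = residualize(x, z)

# Step 2: regress the residualized y on the residualized x
beta_fwl, *_ = np.linalg.lstsq(x_tilde, y_tilde, rcond=None)

# Reference: the full regression of y on [x, z], keeping only the x coefficients
beta_full, *_ = np.linalg.lstsq(np.hstack([x, z]), y, rcond=None)
beta_full = beta_full[:2]
```

The two estimates of \(\beta\) agree up to floating-point error, which is why only x and y need to be orthogonalized to z and \(\gamma\) never has to be reported.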

z can contain factors that are perfectly linearly dependent. LSMR estimates a particular restricted set of parameters that captures the effect of non-redundant components in z.
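
A small example shows why linearly dependent factors are harmless for absorption. Here np.linalg.lstsq, which returns the minimum-norm least squares solution, stands in for LSMR; the data and names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nobs = 300

# Two full sets of dummies: each set sums to a column of ones, so z is rank deficient
z1 = np.eye(5)[rng.integers(0, 5, size=nobs)]
z2 = np.eye(4)[rng.integers(0, 4, size=nobs)]
z = np.hstack([z1, z2])
y = rng.standard_normal(nobs)

# Minimum-norm least squares solution: one particular choice among many
gamma, *_ = np.linalg.lstsq(z, y, rcond=None)

# Shifting z1's coefficients up and z2's down by the same constant
# leaves the fitted values unchanged, so it is another valid solution
gamma_alt = gamma.copy()
gamma_alt[:5] += 3.0
gamma_alt[5:] -= 3.0

rank = np.linalg.matrix_rank(z)
resid = y - z @ gamma
resid_alt = y - z @ gamma_alt
```

The coefficients on z are not unique, but the residuals are, and only the residuals enter the estimation of \(\beta\).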

Examples

Estimate a model by absorbing 2 categoricals and 2 continuous variables

>>> import numpy as np
>>> import pandas as pd
>>> from linearmodels.iv import AbsorbingLS, Interaction
>>> dep = np.random.standard_normal((20000,1))
>>> exog = np.random.standard_normal((20000,2))
>>> cats = pd.DataFrame({i: pd.Categorical(np.random.randint(1000, size=20000))
...                      for i in range(2)})
>>> cont = pd.DataFrame({i+2: np.random.standard_normal(20000) for i in range(2)})
>>> absorb = pd.concat([cats, cont], axis=1)
>>> mod = AbsorbingLS(dep, exog, absorb=absorb)
>>> res = mod.fit()

Add interactions between the Cartesian product of the categoricals and each continuous variable

>>> iaction = Interaction(cat=cats, cont=cont)
>>> absorb = Interaction(cat=cats)  # Alternative encoding of the categoricals
>>> mod = AbsorbingLS(dep, exog, absorb=absorb, interactions=iaction)

Methods

`fit`(*[, cov_type, debiased, method, ...])	Estimate model parameters
`resids`(params)	Compute model residuals
`wresids`(params)	Compute weighted model residuals

Properties

`absorbed_dependent`	Dependent variable with effects absorbed
`absorbed_exog`	Exogenous variables with effects absorbed
`dependent`
`exog`
`has_constant`
`instruments`
`weights`