{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Absorbing Regression\n",
    "\n",
    "An absorbing regression is a model of the form \n",
    "\n",
    "$$ y_i = x_i \\beta + z_i \\gamma +\\epsilon_i $$\n",
    "\n",
    "where interest is in $\\beta$ and not $\\gamma$. $z_i$ may be high-dimensional and may grow with the sample size (e.g., a matrix of fixed effects).\n",
    "\n",
    "This notebook shows how this type of model can be fit on a simulated data set that mirrors those used in practice. There are two effects: one for the worker's state (50 categories, small) and one for the worker's firm (roughly 200,000 categories, large)."
   ]
  },
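  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With a single set of fixed effects, absorbing $z_i$ amounts to subtracting group means from $y_i$ and $x_i$ (the within transformation). By the Frisch-Waugh-Lovell theorem, regressing the demeaned $y_i$ on the demeaned $x_i$ gives the same $\\hat{\\beta}$ as a regression that includes every dummy. The next cell is a small NumPy sketch that checks this equivalence on hypothetical data; the helper names, sizes and seed are illustrative, and it shows the idea rather than how `AbsorbingLS` is implemented. Intermediate variables are kept inside functions so that the simulated `y`, `x` and `cats` are not overwritten."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch on hypothetical data; not the AbsorbingLS implementation\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def group_demean(values, groups, n_groups):\n",
    "    \"\"\"Subtract each observation's group mean (the within transformation).\"\"\"\n",
    "    counts = np.maximum(np.bincount(groups, minlength=n_groups), 1)\n",
    "    means = np.bincount(groups, weights=values, minlength=n_groups) / counts\n",
    "    return values - means[groups]\n",
    "\n",
    "\n",
    "def within_vs_dummies(n=2_000, n_groups=40, seed=12345):\n",
    "    \"\"\"Return beta from demeaned data and beta from an explicit dummy regression.\"\"\"\n",
    "    gen = np.random.default_rng(seed)\n",
    "    groups = gen.integers(n_groups, size=n)\n",
    "    # Hypothetical data: two regressors, a group effect and noise\n",
    "    xs = gen.standard_normal((n, 2))\n",
    "    effects = gen.standard_normal(n_groups)[groups]\n",
    "    ys = xs @ np.array([1.0, -0.5]) + effects + gen.standard_normal(n)\n",
    "\n",
    "    # Within estimator: OLS on group-demeaned y and x\n",
    "    y_dm = group_demean(ys, groups, n_groups)\n",
    "    x_dm = np.column_stack([group_demean(xs[:, j], groups, n_groups) for j in range(2)])\n",
    "    beta_within = np.linalg.lstsq(x_dm, y_dm, rcond=None)[0]\n",
    "\n",
    "    # Dummy-variable estimator: one dummy per group, no separate constant\n",
    "    dummies = (groups[:, None] == np.arange(n_groups)).astype(float)\n",
    "    design = np.column_stack([xs, dummies])\n",
    "    beta_dummies = np.linalg.lstsq(design, ys, rcond=None)[0][:2]\n",
    "    return beta_within, beta_dummies\n",
    "\n",
    "\n",
    "beta_within, beta_dummies = within_vs_dummies()\n",
    "print(beta_within)\n",
    "print(beta_dummies)"
   ]
  },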
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "rs = np.random.RandomState(0)\n",
    "nobs = 1_000_000\n",
    "# 50 states, each with a standard normal effect\n",
    "state_id = rs.randint(50, size=nobs)\n",
    "state_effects = rs.standard_normal(state_id.max() + 1)\n",
    "state_effects = state_effects[state_id]\n",
    "# 5 workers/firm, on average\n",
    "firm_id = rs.randint(nobs // 5, size=nobs)\n",
    "firm_effects = rs.standard_normal(firm_id.max() + 1)\n",
    "firm_effects = firm_effects[firm_id]\n",
    "# Categorical columns are absorbed as fixed effects\n",
    "cats = pd.DataFrame(\n",
    "    {\"state\": pd.Categorical(state_id), \"firm\": pd.Categorical(firm_id)}\n",
    ")\n",
    "eps = rs.standard_normal(nobs)\n",
    "# Regressors: a constant and two standard normal variables\n",
    "x = rs.standard_normal((nobs, 2))\n",
    "x = np.column_stack([np.ones(nobs), x])\n",
    "# The true coefficient on each column of x is 1\n",
    "y = x.sum(1) + firm_effects + state_effects + eps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Including a constant\n",
    "The estimator can estimate an intercept even when a full set of dummies is included. This is done using a mathematical trick: the intercept is computed as if the dummies had been orthogonalized to a constant. As a result, the intercept is not usually meaningful."
   ]
  },
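  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to read this trick: replace each dummy with its deviation from its mean, so the dummies are orthogonal to the constant. The slopes do not change, and the first-order condition for the constant gives an intercept of $\\bar{y} - \\bar{x}\\hat{\\beta}$, which combines the true intercept with the average of the absorbed effects. The next cell checks this formula with NumPy on hypothetical data (names, sizes and seed are illustrative). That `AbsorbingLS` computes its intercept in exactly this way is an assumption, not something the cell verifies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch on hypothetical data; assumes the intercept is computed\n",
    "# as if the dummies were centered, which may differ from AbsorbingLS internals\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def intercept_with_centered_dummies(n=2_000, n_groups=30, seed=2024):\n",
    "    \"\"\"Fit y on [1, x, centered dummies]; compare the intercept with ybar - xbar @ beta.\"\"\"\n",
    "    gen = np.random.default_rng(seed)\n",
    "    groups = gen.integers(n_groups, size=n)\n",
    "    xs = gen.standard_normal((n, 2))\n",
    "    effects = gen.standard_normal(n_groups)[groups]\n",
    "    ys = 1.0 + xs.sum(1) + effects + gen.standard_normal(n)\n",
    "\n",
    "    dummies = (groups[:, None] == np.arange(n_groups)).astype(float)\n",
    "    # Orthogonalize the dummies to a constant by removing their column means.\n",
    "    # The centered dummies sum to zero, so one column is dropped.\n",
    "    centered = dummies - dummies.mean(0)\n",
    "    design = np.column_stack([np.ones(n), xs, centered[:, 1:]])\n",
    "    params = np.linalg.lstsq(design, ys, rcond=None)[0]\n",
    "    intercept, beta = params[0], params[1:3]\n",
    "    return intercept, ys.mean() - xs.mean(0) @ beta\n",
    "\n",
    "\n",
    "demo_intercept, demo_formula = intercept_with_centered_dummies()\n",
    "print(demo_intercept, demo_formula)"
   ]
  },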
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from linearmodels.iv.absorbing import AbsorbingLS\n",
    "\n",
    "mod = AbsorbingLS(y, x, absorb=cats)\n",
    "print(mod.fit())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Excluding the constant\n",
    "If the constant is dropped, the other coefficients are identical, since the dummies span the constant."
   ]
  },
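  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because each observation belongs to exactly one group, the dummy columns add up to a column of ones, so a constant adds nothing to the space they span. The next cell is a NumPy sketch on hypothetical data (names, sizes and seed are illustrative) that checks this and confirms that the slopes are the same with and without a constant when a full set of dummies is included."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch on hypothetical data; not the AbsorbingLS implementation\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def slopes_with_and_without_constant(n=1_000, n_groups=25, seed=7):\n",
    "    \"\"\"Compare slopes from [1, x, dummies] and [x, dummies].\"\"\"\n",
    "    gen = np.random.default_rng(seed)\n",
    "    groups = gen.integers(n_groups, size=n)\n",
    "    xs = gen.standard_normal((n, 2))\n",
    "    effects = gen.standard_normal(n_groups)[groups]\n",
    "    ys = 1.0 + xs.sum(1) + effects + gen.standard_normal(n)\n",
    "\n",
    "    dummies = (groups[:, None] == np.arange(n_groups)).astype(float)\n",
    "    # Every row has exactly one 1, so the dummy columns add up to the constant\n",
    "    spans_constant = np.allclose(dummies.sum(1), 1.0)\n",
    "    # With the constant the design is rank deficient; lstsq returns the\n",
    "    # minimum-norm solution, but the slopes on xs are still uniquely determined\n",
    "    with_const = np.column_stack([np.ones(n), xs, dummies])\n",
    "    without_const = np.column_stack([xs, dummies])\n",
    "    slopes_with = np.linalg.lstsq(with_const, ys, rcond=None)[0][1:3]\n",
    "    slopes_without = np.linalg.lstsq(without_const, ys, rcond=None)[0][:2]\n",
    "    return spans_constant, slopes_with, slopes_without\n",
    "\n",
    "\n",
    "spans_constant, slopes_with, slopes_without = slopes_with_and_without_constant()\n",
    "print(spans_constant, slopes_with, slopes_without)"
   ]
  },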
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from linearmodels.iv.absorbing import AbsorbingLS\n",
    "\n",
    "mod = AbsorbingLS(y, x[:, 1:], absorb=cats)\n",
    "print(mod.fit())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimization Options\n",
    "\n",
    "The dependent variable and regressors are residualized with respect to the absorbed variables using either HDFE or LSMR, depending on the variables included in the regression. HDFE is used when:\n",
    "\n",
    "* the model is unweighted; and\n",
    "* the absorbed regressors are all categorical (i.e., fixed effects).\n",
    "\n",
    "If either condition is not satisfied, LSMR is used. LSMR can also be selected by setting `method=\"lsmr\"`, even when the conditions for HDFE are satisfied."
   ]
  },
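  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "HDFE-style solvers can handle several sets of fixed effects without building the dummy matrix. One classic approach is the method of alternating projections: repeatedly subtract state means and then firm means until the variable stops changing. The next cell is a minimal NumPy sketch of that idea on hypothetical data, checked against an exact least-squares projection on the full dummy matrix. It illustrates the technique only; the helper names, tolerance and iteration cap are illustrative, and the algorithm used by `AbsorbingLS` may differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch of alternating projections on hypothetical data;\n",
    "# the HDFE backend used by AbsorbingLS may use a different algorithm\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def demean_by_group(values, groups):\n",
    "    \"\"\"Subtract each observation's group mean.\"\"\"\n",
    "    n_groups = groups.max() + 1\n",
    "    counts = np.maximum(np.bincount(groups, minlength=n_groups), 1)\n",
    "    means = np.bincount(groups, weights=values, minlength=n_groups) / counts\n",
    "    return values - means[groups]\n",
    "\n",
    "\n",
    "def alternating_projections(values, group_list, tol=1e-12, max_iter=1_000):\n",
    "    \"\"\"Residualize values on several sets of fixed effects by repeated demeaning.\"\"\"\n",
    "    resid = values.copy()\n",
    "    for _ in range(max_iter):\n",
    "        previous = resid\n",
    "        # One sweep: demean by each set of fixed effects in turn\n",
    "        for groups in group_list:\n",
    "            resid = demean_by_group(resid, groups)\n",
    "        if np.max(np.abs(resid - previous)) < tol:\n",
    "            break\n",
    "    return resid\n",
    "\n",
    "\n",
    "def compare_with_lstsq(n=3_000, n_states=10, n_firms=300, seed=0):\n",
    "    \"\"\"Largest gap between alternating-projection and exact least-squares residuals.\"\"\"\n",
    "    gen = np.random.default_rng(seed)\n",
    "    state = gen.integers(n_states, size=n)\n",
    "    firm = gen.integers(n_firms, size=n)\n",
    "    values = gen.standard_normal(n)\n",
    "    map_resid = alternating_projections(values, [state, firm])\n",
    "\n",
    "    # Exact residuals from a dense regression on all state and firm dummies\n",
    "    dummies = np.column_stack(\n",
    "        [state[:, None] == np.arange(n_states), firm[:, None] == np.arange(n_firms)]\n",
    "    ).astype(float)\n",
    "    coef = np.linalg.lstsq(dummies, values, rcond=None)[0]\n",
    "    return np.max(np.abs(map_resid - (values - dummies @ coef)))\n",
    "\n",
    "\n",
    "print(compare_with_lstsq())"
   ]
  },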
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "from linearmodels.iv.absorbing import AbsorbingLS\n",
    "\n",
    "mod = AbsorbingLS(y, x[:, 1:], absorb=cats)\n",
    "\n",
    "# use_cache=False prevents reuse of previously absorbed variables, so both\n",
    "# timings include the full cost of absorbing the fixed effects\n",
    "start = time.perf_counter()\n",
    "res = mod.fit(use_cache=False, method=\"lsmr\")\n",
    "print(f\"LSMR seconds: {time.perf_counter() - start:.2f}\")\n",
    "\n",
    "start = time.perf_counter()\n",
    "res = mod.fit(use_cache=False)\n",
    "print(f\"HDFE seconds: {time.perf_counter() - start:.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LSMR is an iterative method without a closed-form solution, so it stops once a convergence tolerance is met. The tolerance, along with other options, can be set by passing a dictionary to `absorb_options`. See [scipy.sparse.linalg.lsmr](https://docs.scipy.org/doc/scipy-1.2.1/reference/generated/scipy.sparse.linalg.lsmr.html#scipy.sparse.linalg.lsmr) for details on the available options."
   ]
  },
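  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of what the tolerance controls, the next cell calls `scipy.sparse.linalg.lsmr` directly on a sparse matrix of hypothetical state and firm dummies, once with a loose and once with a tight value of its `atol` and `btol` arguments, and reports the iteration count and the largest violation of the normal equations. `lsmr_absorb` is a hypothetical helper written for this sketch. That the same keys can be passed through `absorb_options` follows from the description of `absorb_options` in this notebook, but exactly how `AbsorbingLS` forwards them is an assumption."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch calling SciPy's lsmr directly on hypothetical data;\n",
    "# lsmr_absorb is a hypothetical helper, not part of linearmodels\n",
    "import numpy as np\n",
    "import scipy.sparse as sp\n",
    "from scipy.sparse.linalg import lsmr\n",
    "\n",
    "\n",
    "def lsmr_absorb(tol, n=20_000, n_states=50, n_firms=4_000, seed=1):\n",
    "    \"\"\"Residualize a hypothetical variable on state and firm dummies with LSMR.\"\"\"\n",
    "    gen = np.random.default_rng(seed)\n",
    "    state = gen.integers(n_states, size=n)\n",
    "    firm = gen.integers(n_firms, size=n)\n",
    "    # Sparse dummy matrix: one state column and one firm column per row\n",
    "    rows = np.concatenate([np.arange(n), np.arange(n)])\n",
    "    cols = np.concatenate([state, n_states + firm])\n",
    "    shape = (n, n_states + n_firms)\n",
    "    dummies = sp.csr_matrix((np.ones(2 * n), (rows, cols)), shape=shape)\n",
    "    values = gen.standard_normal(n)\n",
    "    # atol and btol are the stopping tolerances of scipy.sparse.linalg.lsmr\n",
    "    coef, istop, itn = lsmr(dummies, values, atol=tol, btol=tol)[:3]\n",
    "    resid = values - dummies @ coef\n",
    "    # Largest violation of the normal equations (0 for exact residuals)\n",
    "    gap = np.max(np.abs(dummies.T @ resid))\n",
    "    return itn, gap\n",
    "\n",
    "\n",
    "for tol in (1e-3, 1e-8):\n",
    "    itn, gap = lsmr_absorb(tol)\n",
    "    print(f\"tol={tol:g}: {itn} iterations, max |D'r| = {gap:.2e}\")"
   ]
  },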
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
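    "mod = AbsorbingLS(y, x[:, 1:], absorb=cats)\n",
    "# Options in absorb_options are passed to scipy.sparse.linalg.lsmr;\n",
    "# show=True prints the LSMR iteration log\n",
    "res = mod.fit(method=\"lsmr\", absorb_options={\"show\": True})"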
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}