.. _panel-introduction:

Introduction
------------

Panel data includes observations on multiple entities -- individuals, firms,
countries -- over multiple time periods. In most classical applications of
panel data the number of entities, N, is large and the number of time periods,
T, is small (often between 2 and 5). Most asymptotic theory for these
estimators has been developed under the assumption that N diverges while T is
fixed.

Most panel models are designed to estimate the parameters of a model which can
be described as

.. math::

   y_{it} = x_{it}\beta + \alpha_i + \epsilon_{it}

where i indexes the entities and t indexes time. :math:`\beta` contains the
parameters of interest. :math:`\alpha_i` are entity-specific components that
are not usually identified in the standard setup and so cannot be consistently
estimated. :math:`\epsilon_{it}` are idiosyncratic errors uncorrelated with
:math:`\alpha_i` and the covariates :math:`x_{it}`.

All models require two inputs:

* ``dependent`` - The variable to be modeled, :math:`y_{it}` in the model
* ``exog`` - The regressors, :math:`x_{it}` in the model

The models use different techniques to address the presence of
:math:`\alpha_i`. In particular,

* :class:`~linearmodels.panel.model.PanelOLS` uses fixed effects (i.e., entity
  effects) to eliminate the entity-specific components. This is mathematically
  equivalent to including a dummy variable for each entity, although the
  implementation does not do this for performance reasons.
* :class:`~linearmodels.panel.model.BetweenOLS` averages within an entity and
  then regresses the time-averaged values using OLS.
* :class:`~linearmodels.panel.model.FirstDifferenceOLS` takes the first
  difference to eliminate the entity-specific effect.
* :class:`~linearmodels.panel.model.RandomEffects` uses a quasi-difference to
  efficiently estimate :math:`\beta` when the entity effect is independent of
  the regressors. It is, however, not consistent when there is dependence
  between the entity effect and the regressors.
* :class:`~linearmodels.panel.model.PooledOLS` ignores the entity effect and is
  consistent but inefficient when the effect is independent of the regressors.

:class:`~linearmodels.panel.model.PanelOLS` is somewhat more general than the
other estimators and can be used to model two effects (e.g., entity and time
effects).

Model specification is similar to statsmodels. This example estimates a fixed
effect regression on a panel of the wages of working men, modeling the log wage
as a function of squared experience, a dummy indicating if the man is married
and a dummy indicating if the man is a union member.

.. code-block:: python

    from linearmodels.panel import PanelOLS
    from linearmodels.datasets import wage_panel
    import statsmodels.api as sm

    data = wage_panel.load()
    # Panel models expect a two-level index: entity first, then time
    data = data.set_index(['nr', 'year'])
    dependent = data.lwage
    exog = sm.add_constant(data[['expersq', 'married', 'union']])
    mod = PanelOLS(dependent, exog, entity_effects=True)
    res = mod.fit(cov_type='unadjusted')
    print(res)

While the result contains many properties holding specific quantities of
interest (e.g., ``params`` or ``tstats``), the string representation of the
result is a summary table.

::

                              PanelOLS Estimation Summary
    ================================================================================
    Dep. Variable:                    lwage   R-squared:                      0.1365
    Estimator:                     PanelOLS   R-squared (Between):           -0.0674
    No. Observations:                  4360   R-squared (Within):             0.1365
    Date:                  Wed, Apr 19 2017   R-squared (Overall):            0.0270
    Time:                          17:48:58   Log-likelihood                 -1439.0
    Cov. Estimator:              Unadjusted
                                              F-statistic:                    200.87
    Entities:                           545   P-value                         0.0000
    Avg Obs:                         8.0000   Distribution:                F(3,3812)
    Min Obs:                         8.0000
    Max Obs:                         8.0000   F-statistic (robust):           200.87
                                              P-value                         0.0000
    Time periods:                         8   Distribution:                F(3,3812)
    Avg Obs:                         545.00
    Min Obs:                         545.00
    Max Obs:                         545.00

                                 Parameter Estimates
    ==============================================================================
                Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
    ------------------------------------------------------------------------------
    const          1.3953     0.0123     113.50     0.0000      1.3712      1.4194
    expersq        0.0037     0.0002     19.560     0.0000      0.0033      0.0041
    married        0.1073     0.0182     5.8992     0.0000      0.0717      0.1430
    union          0.0828     0.0198     4.1864     0.0000      0.0440      0.1215
    ==============================================================================

    F-test for Poolability: 9.3360
    P-value: 0.0000
    Distribution: F(544,3812)

    Included effects: Entity

Like statsmodels, panel models can be specified using an R-like formula. This
model is identical to the previous one. Note the use of the *special* variable
``EntityEffects`` to include the fixed effects.

.. code-block:: python

    mod = PanelOLS.from_formula('lwage ~ 1 + expersq + union + married + EntityEffects', data)
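
To make the role of :math:`\alpha_i` more concrete, the following is a minimal
sketch, independent of linearmodels and using only the Python standard library,
of how demeaning within an entity and first differencing each remove the entity
effect. The panel values, the slope ``BETA`` and all function names are
hypothetical and chosen only for illustration. The data are noise-free and
:math:`\alpha_i` is deliberately correlated with the regressor, so a pooled
regression that ignores the effect should miss the true slope.

.. code-block:: python

    from statistics import fmean

    BETA = 2.0  # hypothetical true slope

    # entity -> (alpha_i, x values over T = 3 periods); made-up numbers where
    # entities with larger x also have larger alpha_i
    panel = {
        "a": (1.0, [1.0, 2.0, 4.0]),
        "b": (5.0, [4.0, 6.0, 7.0]),
        "c": (9.0, [8.0, 9.0, 12.0]),
    }


    def slope(xs, ys):
        """OLS slope without an intercept: sum(x * y) / sum(x * x)."""
        return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


    def simulate(panel, beta):
        """Return {entity: (x_list, y_list)} with y_it = beta * x_it + alpha_i."""
        return {k: (xs, [beta * x + a for x in xs]) for k, (a, xs) in panel.items()}


    def within_estimate(data):
        """Subtract entity means from x and y, then regress the deviations."""
        xd, yd = [], []
        for xs, ys in data.values():
            mx, my = fmean(xs), fmean(ys)
            xd += [x - mx for x in xs]
            yd += [y - my for y in ys]
        return slope(xd, yd)


    def first_difference_estimate(data):
        """Difference consecutive periods within each entity, then regress."""
        dx, dy = [], []
        for xs, ys in data.values():
            dx += [b - a for a, b in zip(xs, xs[1:])]
            dy += [b - a for a, b in zip(ys, ys[1:])]
        return slope(dx, dy)


    def pooled_estimate(data):
        """OLS with one common intercept, ignoring alpha_i entirely."""
        xs = [x for x_list, _ in data.values() for x in x_list]
        ys = [y for _, y_list in data.values() for y in y_list]
        mx, my = fmean(xs), fmean(ys)
        return slope([x - mx for x in xs], [y - my for y in ys])


    data = simulate(panel, BETA)
    print(within_estimate(data), first_difference_estimate(data), pooled_estimate(data))

In this toy setup the within and first-difference estimates should match
``BETA`` up to floating-point error, because :math:`\alpha_i` is constant over
time and drops out of both transformations. The pooled estimate should land
above ``BETA``, since the omitted effect is positively correlated with the
regressor.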