Models for Panel Data

Fixed Effect Estimation

class PanelOLS(dependent, exog, *, weights=None, entity_effects=False, time_effects=False, other_effects=None)

One- and two-way fixed effects estimator for panel data

Parameters:
  • dependent (array-like) – Dependent (left-hand-side) variable (time by entity).
  • exog (array-like) – Exogenous or right-hand-side variables (variable by time by entity).
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.
  • entity_effects (bool, optional) – Flag whether to include entity (fixed) effects in the model
  • time_effects (bool, optional) – Flag whether to include time effects in the model
  • other_effects (array-like, optional) – Category codes to use for any effects that are not entity or time effects. Each variable is treated as an effect.
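As a sketch of what the weights assumption implies (this is not linearmodels code), if the residual variance is proportional to \(1/w\), rescaling each row of y and x by \(\sqrt{w}\) yields a homoskedastic regression, and OLS on the rescaled data reproduces the textbook WLS formula. The helper names and simulated data below are hypothetical and for illustration only.

```python
import numpy as np


def wls_by_rescaling(y, x, w):
    """WLS computed as OLS on rows scaled by sqrt(w)."""
    sw = np.sqrt(w)
    # If Var(e_i) is proportional to 1 / w_i, then sqrt(w_i) * e_i is homoskedastic
    params, *_ = np.linalg.lstsq(x * sw[:, None], y * sw, rcond=None)
    return params


def wls_closed_form(y, x, w):
    """Textbook WLS formula (X'WX)^{-1} X'Wy with W = diag(w)."""
    xw = x * w[:, None]
    return np.linalg.solve(xw.T @ x, xw.T @ y)


rng = np.random.default_rng(0)
n = 200
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
w = rng.uniform(0.5, 2.0, n)
# Simulated residuals with standard deviation 1 / sqrt(w_i)
y = x @ np.array([1.0, 2.0]) + rng.standard_normal(n) / np.sqrt(w)

params = wls_by_rescaling(y, x, w)
print(params)
```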

Notes

Many models can be estimated. The most common includes entity effects and can be described as

\[y_{it} = \alpha_i + \beta^{\prime}x_{it} + \epsilon_{it}\]

where \(\alpha_i\) is included if entity_effects=True.

Time effects are also supported, which leads to a model of the form

\[y_{it}= \gamma_t + \beta^{\prime}x_{it} + \epsilon_{it}\]

where \(\gamma_t\) is included if time_effects=True.

Both effects can be used simultaneously,

\[y_{it}=\alpha_i + \gamma_t + \beta^{\prime}x_{it} + \epsilon_{it}\]

Additionally, arbitrary effects can be specified using categorical variables.

If both entity_effects and time_effects are False, and no other effects are included, the model reduces to PooledOLS.

The model supports at most 2 effects. These can be entity-time, entity-other, time-other or 2 other effects.
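For a balanced panel, the entity and time effects in the two-way model can be removed by subtracting entity means and time means and adding back the grand mean; including only entity effects corresponds to subtracting entity means alone. The NumPy sketch below assumes a balanced panel stored as an entity-by-time array and noise-free simulated data. It illustrates the within transformation and is not necessarily how PanelOLS removes effects internally; unbalanced two-way panels typically need an iterative or dummy-variable approach instead.

```python
import numpy as np


def two_way_demean(z):
    """Remove entity and time means from a balanced (entity x time) array."""
    entity_mean = z.mean(axis=1, keepdims=True)  # time average for each entity
    time_mean = z.mean(axis=0, keepdims=True)    # cross-sectional average for each period
    return z - entity_mean - time_mean + z.mean()


rng = np.random.default_rng(1)
n_entity, n_time = 30, 8
alpha = rng.standard_normal((n_entity, 1))           # entity effects alpha_i
gamma = rng.standard_normal((1, n_time))             # time effects gamma_t
x = rng.standard_normal((n_entity, n_time)) + alpha  # regressor correlated with alpha_i
y = alpha + gamma + 2.0 * x                          # noise-free, beta = 2

y_dm = two_way_demean(y).ravel()
x_dm = two_way_demean(x).ravel()
beta = (x_dm @ y_dm) / (x_dm @ x_dm)  # OLS slope on demeaned data, no constant
print(beta)
```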

entity_effects

Flag indicating whether entity effects are included

fit(*, use_lsdv=False, cov_type='unadjusted', debiased=True, auto_df=True, count_effects=True, **cov_config)

Estimate model parameters

Parameters:
  • use_lsdv (bool, optional) – Flag indicating to use the Least Squares Dummy Variable estimator to eliminate effects. The default uses only means and does not require constructing dummy variables for each effect.
  • cov_type (str, optional) – Name of covariance estimator. See Notes.
  • debiased (bool, optional) – Flag indicating whether to debias the covariance estimator using a degree of freedom adjustment.
  • auto_df (bool, optional) – Flag indicating that the treatment of estimated effects in the degree of freedom adjustment is handled automatically. This is useful since standard errors clustered on the same variable as an effect do not require a degree of freedom correction, while other estimators, such as the unadjusted covariance, do.
  • count_effects (bool, optional) – Flag indicating that the covariance estimator should be adjusted to account for the estimation of effects in the model. Only used if auto_df=False.
  • **cov_config – Additional covariance-specific options. See Notes.
Returns:

results – Estimation results

Return type:

PanelEffectsResults

Examples

>>> from linearmodels import PanelOLS
>>> mod = PanelOLS(y, x, entity_effects=True)
>>> res = mod.fit(cov_type='clustered', cluster_entity=True)
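The use_lsdv flag chooses between two ways of eliminating entity effects that, by the Frisch-Waugh-Lovell theorem, give identical slope estimates: adding one dummy column per entity (LSDV) or subtracting entity means (the within transformation). The sketch below checks this equivalence on simulated data with NumPy; the helper demean_by is hypothetical and not part of linearmodels.

```python
import numpy as np

rng = np.random.default_rng(2)
n_entity, n_time = 20, 5
entity = np.repeat(np.arange(n_entity), n_time)  # entity id for each row
x = rng.standard_normal(n_entity * n_time)
alpha = rng.standard_normal(n_entity)
y = alpha[entity] + 1.5 * x + 0.1 * rng.standard_normal(n_entity * n_time)

# LSDV: regress y on x plus one dummy column per entity (no separate constant)
dummies = (entity[:, None] == np.arange(n_entity)).astype(float)
lsdv_params, *_ = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)
beta_lsdv = lsdv_params[0]


def demean_by(z, groups):
    """Subtract each group's mean from its members."""
    means = np.bincount(groups, weights=z) / np.bincount(groups)
    return z - means[groups]


# Within: subtract entity means, then OLS without a constant
x_w, y_w = demean_by(x, entity), demean_by(y, entity)
beta_within = (x_w @ y_w) / (x_w @ x_w)
print(beta_lsdv, beta_within)
```

The dummy-variable route builds an n-by-(number of entities) matrix, which is why avoiding it is the default.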

Notes

Four covariance estimators are supported:

  • ‘unadjusted’, ‘homoskedastic’ - Assume residuals are homoskedastic
  • ‘robust’, ‘heteroskedastic’ - Control for heteroskedasticity using White’s estimator
  • ‘clustered’ - One- or two-way clustering. Configuration options are:
    • clusters - Input containing 1 or 2 variables. Clusters should be integer values, although other types will be coerced to integer values by treating them as categorical variables
    • cluster_entity - Boolean flag indicating to use entity clusters
    • cluster_time - Boolean flag indicating to use time clusters
  • ‘kernel’ - Driscoll-Kraay HAC estimator. Configuration options are:
    • kernel - One of the supported kernels (bartlett, parzen, qs). The default is Bartlett’s kernel, which produces a covariance estimator similar to the Newey-West covariance estimator.
    • bandwidth - Bandwidth to use when computing the kernel. If not provided, a naive default is used.
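The sandwich form behind the ‘robust’ and ‘clustered’ options can be sketched in a few lines of NumPy. This minimal version omits the small-sample and degree of freedom adjustments controlled by debiased and auto_df, so its values will not match linearmodels exactly, and the helper names are hypothetical. Clustering on entities sums the scores \(x_{it}\hat{\epsilon}_{it}\) within each entity before forming the outer product; when every observation is its own cluster, the estimator collapses to White’s estimator.

```python
import numpy as np


def ols(y, x):
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    return params, y - x @ params


def white_cov(x, resid):
    """Heteroskedasticity-robust sandwich covariance (HC0, no df adjustment)."""
    xpx_inv = np.linalg.inv(x.T @ x)
    scores = x * resid[:, None]
    return xpx_inv @ (scores.T @ scores) @ xpx_inv


def clustered_cov(x, resid, clusters):
    """One-way cluster-robust sandwich covariance (no df adjustment)."""
    xpx_inv = np.linalg.inv(x.T @ x)
    scores = x * resid[:, None]
    _, codes = np.unique(clusters, return_inverse=True)
    # Sum the scores within each cluster before forming the outer product
    cluster_scores = np.zeros((codes.max() + 1, x.shape[1]))
    np.add.at(cluster_scores, codes, scores)
    return xpx_inv @ (cluster_scores.T @ cluster_scores) @ xpx_inv


rng = np.random.default_rng(3)
n_entity, n_time = 40, 6
entity = np.repeat(np.arange(n_entity), n_time)
x = np.column_stack([np.ones(entity.size), rng.standard_normal(entity.size)])
# Errors share an entity-level component, so they are correlated within entity
errors = rng.standard_normal(n_entity)[entity] + rng.standard_normal(entity.size)
y = x @ np.array([0.5, 1.0]) + errors

params, resid = ols(y, x)
cov_entity = clustered_cov(x, resid, entity)
print(np.sqrt(np.diag(cov_entity)))
```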
formula

Formula used to construct the model

classmethod from_formula(formula, data, *, weights=None, other_effects=None)

Create a model from a formula

Parameters:
  • formula (str) – Formula to transform into model. Conforms to patsy formula rules with two special variable names, EntityEffects and TimeEffects, which can be used to specify that the model should contain an entity effect or a time effect, respectively. See Examples.
  • data (array-like) – Data structure that can be coerced into a PanelData. In most cases, this should be a multi-index DataFrame where the level 0 index contains the entities and the level 1 contains the time.
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.
  • other_effects (array-like, optional) – Category codes to use for any effects that are not entity or time effects. Each variable is treated as an effect.
Returns:

model – Model specified using the formula

Return type:

PanelOLS

Examples

>>> from linearmodels import PanelOLS
>>> mod = PanelOLS.from_formula('y ~ 1 + x1 + EntityEffects', panel_data)
>>> res = mod.fit(cov_type='clustered', cluster_entity=True)
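The data argument is typically a DataFrame with a two-level MultiIndex, entities in level 0 and time in level 1. A minimal pandas sketch of such a structure, using hypothetical entity labels and years and simulated columns named to match the formula 'y ~ 1 + x1', is:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
entities = ["firm_a", "firm_b", "firm_c"]  # hypothetical entity labels
years = [2019, 2020, 2021, 2022]           # hypothetical time periods
# Level 0 holds the entity, level 1 holds the time period
index = pd.MultiIndex.from_product([entities, years], names=["entity", "year"])
panel_data = pd.DataFrame({"x1": rng.standard_normal(len(index))}, index=index)
panel_data["y"] = 1.0 + 2.0 * panel_data["x1"] + rng.standard_normal(len(index))
print(panel_data.head())
```

A DataFrame built this way could then be passed as panel_data to from_formula.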
has_constant

Flag indicating whether the model includes a constant or an implicit constant

not_null

Locations of non-missing observations

other_effects

Flag indicating whether other (generic) effects are included

predict(params, *, exog=None, data=None, eval_env=4)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • exog (array-like) – Exogenous regressors (nobs by nvar)
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth of the calling environment (stack frame) used when evaluating formulas with Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then exog must be None. Predictions from models constructed using formulas can be computed using either exog, which treats the values as arrays corresponding to the formula-processed data, or data, which is processed using the formula from the original model specification.

reformat_clusters(clusters)

Reformat cluster variables

Parameters: clusters (array-like) – Values to use for variance clustering
Returns: reformatted – Original data with matching axes and with observations dropped where they are missing in the model data.
Return type: PanelData

Notes

This is exposed for testing and is not normally needed for estimation

time_effects

Flag indicating whether time effects are included

Random Effects

class RandomEffects(dependent, exog, *, weights=None)

One-way Random Effects model for panel data

Parameters:
  • dependent (array-like) – Dependent (left-hand-side) variable (time by entity)
  • exog (array-like) – Exogenous or right-hand-side variables (variable by time by entity).
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.

Notes

The model is given by

\[y_{it} = \beta^{\prime}x_{it} + u_i + \epsilon_{it}\]

where \(u_i\) is an entity-specific shock that is independent of \(x_{it}\) and common to all time periods for entity i.
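Under the textbook assumptions for a balanced panel with T periods, the random effects GLS estimator can be computed by OLS on quasi-demeaned data \(z_{it} - \theta\bar{z}_i\), with \(\theta = 1 - \sqrt{\sigma^2_\epsilon / (\sigma^2_\epsilon + T\sigma^2_u)}\). The sketch below treats the variance components as known; in practice they must be estimated, and this is not necessarily how RandomEffects computes \(\theta\). Setting \(\theta = 0\) recovers pooled OLS and \(\theta = 1\) recovers the within transformation. The helper names are hypothetical.

```python
import numpy as np


def re_theta(sigma2_eps, sigma2_u, n_time):
    """Quasi-demeaning weight for a balanced one-way random effects model."""
    return 1.0 - np.sqrt(sigma2_eps / (sigma2_eps + n_time * sigma2_u))


def quasi_demean(z, theta):
    """Subtract theta times each entity's time average from an (entity x time) array."""
    return z - theta * z.mean(axis=1, keepdims=True)


rng = np.random.default_rng(9)
n_entity, n_time = 200, 4
sigma2_eps, sigma2_u = 1.0, 2.0  # treated as known in this sketch
u = rng.normal(0.0, np.sqrt(sigma2_u), (n_entity, 1))  # u_i, independent of x
x = rng.standard_normal((n_entity, n_time))
y = 1.0 + 0.5 * x + u + rng.normal(0.0, np.sqrt(sigma2_eps), (n_entity, n_time))

theta = re_theta(sigma2_eps, sigma2_u, n_time)
# The constant column is quasi-demeaned too, becoming (1 - theta)
design = np.column_stack([
    quasi_demean(np.ones_like(x), theta).ravel(),
    quasi_demean(x, theta).ravel(),
])
params, *_ = np.linalg.lstsq(design, quasi_demean(y, theta).ravel(), rcond=None)
print(theta, params)
```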

formula

Formula used to construct the model

classmethod from_formula(formula, data, *, weights=None)

Create a model from a formula

Parameters:
  • formula (str) – Formula to transform into model. Conforms to patsy formula rules.
  • data (array-like) – Data structure that can be coerced into a PanelData. In most cases, this should be a multi-index DataFrame where the level 0 index contains the entities and the level 1 contains the time.
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.
Returns:

model – Model specified using the formula

Return type:

RandomEffects

Notes

Unlike standard patsy, it is necessary to explicitly include a constant using the constant indicator (1)

Examples

>>> from linearmodels import RandomEffects
>>> mod = RandomEffects.from_formula('y ~ 1 + x1', panel_data)
>>> res = mod.fit()
has_constant

Flag indicating whether the model includes a constant or an implicit constant

not_null

Locations of non-missing observations

predict(params, *, exog=None, data=None, eval_env=4)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • exog (array-like) – Exogenous regressors (nobs by nvar)
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth of the calling environment (stack frame) used when evaluating formulas with Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then exog must be None. Predictions from models constructed using formulas can be computed using either exog, which treats the values as arrays corresponding to the formula-processed data, or data, which is processed using the formula from the original model specification.

reformat_clusters(clusters)

Reformat cluster variables

Parameters: clusters (array-like) – Values to use for variance clustering
Returns: reformatted – Original data with matching axes and with observations dropped where they are missing in the model data.
Return type: PanelData

Notes

This is exposed for testing and is not normally needed for estimation

Between OLS

class BetweenOLS(dependent, exog, *, weights=None)

Between estimator for panel data

Parameters:
  • dependent (array-like) – Dependent (left-hand-side) variable (time by entity)
  • exog (array-like) – Exogenous or right-hand-side variables (variable by time by entity).
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.

Notes

The model is given by

\[\bar{y}_{i}= \beta^{\prime}\bar{x}_{i}+\bar{\epsilon}_{i}\]

where \(\bar{z}_i\) denotes the time average of \(z_{it}\) for entity i.
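Because the between estimator is OLS on entity averages, it can be sketched with NumPy for a balanced panel stored as an entity-by-time array. The regression has one observation per entity. The simulated data below are noise-free, so the estimates match the true intercept and slope exactly; the sketch ignores weights and the reweight option.

```python
import numpy as np

rng = np.random.default_rng(5)
n_entity, n_time = 25, 4
x = rng.standard_normal((n_entity, n_time))
y = 1.0 + 2.0 * x  # noise-free, so the estimate is exact

x_bar = x.mean(axis=1)  # one time average per entity
y_bar = y.mean(axis=1)
design = np.column_stack([np.ones(n_entity), x_bar])
params, *_ = np.linalg.lstsq(design, y_bar, rcond=None)
print(params)
```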

fit(*, reweight=False, cov_type='unadjusted', debiased=True, **cov_config)

Estimate model parameters

Parameters:
  • reweight (bool) – Flag indicating to reweight observations if the input data is unbalanced using a WLS estimator. If weights are provided, these are accounted for when reweighting. Has no effect on balanced data.
  • cov_type (str, optional) – Name of covariance estimator. See Notes.
  • debiased (bool, optional) – Flag indicating whether to debias the covariance estimator using a degree of freedom adjustment.
  • **cov_config – Additional covariance-specific options. See Notes.
Returns:

results – Estimation results

Return type:

PanelResults

Examples

>>> from linearmodels import BetweenOLS
>>> mod = BetweenOLS(y, x)
>>> res = mod.fit(cov_type='robust')

Notes

Three covariance estimators are supported:

  • ‘unadjusted’, ‘homoskedastic’ - Assume residuals are homoskedastic
  • ‘robust’, ‘heteroskedastic’ - Control for heteroskedasticity using White’s estimator
  • ‘clustered’ - One- or two-way clustering. Configuration options are:
    • clusters - Input containing 1 or 2 variables. Clusters should be integer values, although other types will be coerced to integer values by treating them as categorical variables

When using a clustered covariance estimator, all cluster ids must be identical within an entity.

formula

Formula used to construct the model

classmethod from_formula(formula, data, *, weights=None)

Create a model from a formula

Parameters:
  • formula (str) – Formula to transform into model. Conforms to patsy formula rules.
  • data (array-like) – Data structure that can be coerced into a PanelData. In most cases, this should be a multi-index DataFrame where the level 0 index contains the entities and the level 1 contains the time.
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.
Returns:

model – Model specified using the formula

Return type:

BetweenOLS

Notes

Unlike standard patsy, it is necessary to explicitly include a constant using the constant indicator (1)

Examples

>>> from linearmodels import BetweenOLS
>>> mod = BetweenOLS.from_formula('y ~ 1 + x1', panel_data)
>>> res = mod.fit()
has_constant

Flag indicating whether the model includes a constant or an implicit constant

not_null

Locations of non-missing observations

predict(params, *, exog=None, data=None, eval_env=4)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • exog (array-like) – Exogenous regressors (nobs by nvar)
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth of the calling environment (stack frame) used when evaluating formulas with Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then exog must be None. Predictions from models constructed using formulas can be computed using either exog, which treats the values as arrays corresponding to the formula-processed data, or data, which is processed using the formula from the original model specification.

reformat_clusters(clusters)

Reformat cluster variables

Parameters: clusters (array-like) – Values to use for variance clustering
Returns: reformatted – Original data with matching axes and with observations dropped where they are missing in the model data.
Return type: PanelData

Notes

This is exposed for testing and is not normally needed for estimation

First Difference Estimation

class FirstDifferenceOLS(dependent, exog, *, weights=None)

First difference model for panel data

Parameters:
  • dependent (array-like) – Dependent (left-hand-side) variable (time by entity)
  • exog (array-like) – Exogenous or right-hand-side variables (variable by time by entity).
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.

Notes

The model is given by

\[\Delta y_{it}=\beta^{\prime}\Delta x_{it}+\Delta\epsilon_{it}\]
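Differencing within each entity removes any time-invariant term, such as an entity effect or a constant, so the slope can be estimated by OLS on the differenced data without an intercept. A NumPy sketch for a balanced panel stored as an entity-by-time array, using simulated noise-free data with a hypothetical entity effect alpha, is:

```python
import numpy as np

rng = np.random.default_rng(6)
n_entity, n_time = 30, 5
alpha = rng.standard_normal((n_entity, 1))           # entity effects
x = rng.standard_normal((n_entity, n_time)) + alpha  # correlated with alpha_i
y = 3.0 + alpha + 0.5 * x                            # constant and alpha_i both drop out

# Differences are taken within each entity (along the time axis) only
dy = np.diff(y, axis=1).ravel()
dx = np.diff(x, axis=1).ravel()
beta = (dx @ dy) / (dx @ dx)  # OLS without a constant
print(beta)
```

Each entity loses its first period, so the differenced sample has n_entity * (n_time - 1) observations.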

fit(*, cov_type='unadjusted', debiased=True, **cov_config)

Estimate model parameters

Parameters:
  • cov_type (str, optional) – Name of covariance estimator. See Notes.
  • debiased (bool, optional) – Flag indicating whether to debias the covariance estimator using a degree of freedom adjustment.
  • **cov_config – Additional covariance-specific options. See Notes.
Returns:

results – Estimation results

Return type:

PanelResults

Examples

>>> from linearmodels import FirstDifferenceOLS
>>> mod = FirstDifferenceOLS(y, x)
>>> res = mod.fit(cov_type='robust')
>>> res = mod.fit(cov_type='clustered', cluster_entity=True)

Notes

Four covariance estimators are supported:

  • ‘unadjusted’, ‘homoskedastic’ - Assume residuals are homoskedastic
  • ‘robust’, ‘heteroskedastic’ - Control for heteroskedasticity using White’s estimator
  • ‘clustered’ - One- or two-way clustering. Configuration options are:
    • clusters - Input containing 1 or 2 variables. Clusters should be integer values, although other types will be coerced to integer values by treating them as categorical variables
    • cluster_entity - Boolean flag indicating to use entity clusters
  • ‘kernel’ - Driscoll-Kraay HAC estimator. Configuration options are:
    • kernel - One of the supported kernels (bartlett, parzen, qs). The default is Bartlett’s kernel, which produces a covariance estimator similar to the Newey-West covariance estimator.
    • bandwidth - Bandwidth to use when computing the kernel. If not provided, a naive default is used.

When using a clustered covariance estimator, all cluster ids must be identical within a first difference. In most scenarios, this requires ids to be identical within an entity.

formula

Formula used to construct the model

classmethod from_formula(formula, data, *, weights=None)

Create a model from a formula

Parameters:
  • formula (str) – Formula to transform into model. Conforms to patsy formula rules.
  • data (array-like) – Data structure that can be coerced into a PanelData. In most cases, this should be a multi-index DataFrame where the level 0 index contains the entities and the level 1 contains the time.
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.
Returns:

model – Model specified using the formula

Return type:

FirstDifferenceOLS

Notes

Unlike standard patsy, a constant is not included automatically. Because first differencing eliminates any constant term, the formula for this model should not include one.

Examples

>>> from linearmodels import FirstDifferenceOLS
>>> mod = FirstDifferenceOLS.from_formula('y ~ x1', panel_data)
>>> res = mod.fit()
has_constant

Flag indicating whether the model includes a constant or an implicit constant

not_null

Locations of non-missing observations

predict(params, *, exog=None, data=None, eval_env=4)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • exog (array-like) – Exogenous regressors (nobs by nvar)
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth of the calling environment (stack frame) used when evaluating formulas with Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then exog must be None. Predictions from models constructed using formulas can be computed using either exog, which treats the values as arrays corresponding to the formula-processed data, or data, which is processed using the formula from the original model specification.

reformat_clusters(clusters)

Reformat cluster variables

Parameters: clusters (array-like) – Values to use for variance clustering
Returns: reformatted – Original data with matching axes and with observations dropped where they are missing in the model data.
Return type: PanelData

Notes

This is exposed for testing and is not normally needed for estimation

Pooled OLS

class PooledOLS(dependent, exog, *, weights=None)

Pooled coefficient estimator for panel data

Parameters:
  • dependent (array-like) – Dependent (left-hand-side) variable (time by entity)
  • exog (array-like) – Exogenous or right-hand-side variables (variable by time by entity).
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.

Notes

The model is given by

\[y_{it}=\beta^{\prime}x_{it}+\epsilon_{it}\]

fit(*, cov_type='unadjusted', debiased=True, **cov_config)

Estimate model parameters

Parameters:
  • cov_type (str, optional) – Name of covariance estimator. See Notes.
  • debiased (bool, optional) – Flag indicating whether to debias the covariance estimator using a degree of freedom adjustment.
  • **cov_config – Additional covariance-specific options. See Notes.
Returns:

results – Estimation results

Return type:

PanelResults

Examples

>>> from linearmodels import PooledOLS
>>> mod = PooledOLS(y, x)
>>> res = mod.fit(cov_type='clustered', cluster_entity=True)

Notes

Four covariance estimators are supported:

  • ‘unadjusted’, ‘homoskedastic’ - Assume residuals are homoskedastic
  • ‘robust’, ‘heteroskedastic’ - Control for heteroskedasticity using White’s estimator
  • ‘clustered’ - One- or two-way clustering. Configuration options are:
    • clusters - Input containing 1 or 2 variables. Clusters should be integer values, although other types will be coerced to integer values by treating them as categorical variables
    • cluster_entity - Boolean flag indicating to use entity clusters
    • cluster_time - Boolean flag indicating to use time clusters
  • ‘kernel’ - Driscoll-Kraay HAC estimator. Configuration options are:
    • kernel - One of the supported kernels (bartlett, parzen, qs). The default is Bartlett’s kernel, which produces a covariance estimator similar to the Newey-West covariance estimator.
    • bandwidth - Bandwidth to use when computing the kernel. If not provided, a naive default is used.
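The Driscoll-Kraay estimator first sums the scores \(x_{it}\hat{\epsilon}_{it}\) across entities in each period and then applies a Newey-West style kernel to the resulting time series. The NumPy sketch below uses Bartlett weights \(1 - j/(b + 1)\) for lag j and bandwidth b. It requires an explicit bandwidth rather than the library’s default, omits degree of freedom adjustments, and assumes the time codes sort in chronological order; the function name driscoll_kraay_cov is hypothetical.

```python
import numpy as np


def driscoll_kraay_cov(x, resid, time, bandwidth):
    """Driscoll-Kraay covariance with Bartlett weights (no df adjustment)."""
    xpx_inv = np.linalg.inv(x.T @ x)
    scores = x * resid[:, None]
    periods, codes = np.unique(time, return_inverse=True)  # sorted time codes
    # Step 1: sum scores across entities within each time period
    s = np.zeros((periods.size, x.shape[1]))
    np.add.at(s, codes, scores)
    # Step 2: Bartlett-weighted long-run covariance of the summed scores
    meat = s.T @ s
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1.0)
        gamma = s[lag:].T @ s[:-lag]
        meat += weight * (gamma + gamma.T)
    return xpx_inv @ meat @ xpx_inv


rng = np.random.default_rng(7)
n_entity, n_time = 50, 20
time = np.tile(np.arange(n_time), n_entity)
x = np.column_stack([np.ones(time.size), rng.standard_normal(time.size)])
# A common shock in each period induces cross-sectional correlation
y = x @ np.array([1.0, -1.0]) + rng.standard_normal(n_time)[time] + rng.standard_normal(time.size)
params, *_ = np.linalg.lstsq(x, y, rcond=None)
resid = y - x @ params
cov = driscoll_kraay_cov(x, resid, time, bandwidth=3)
print(np.sqrt(np.diag(cov)))
```

With bandwidth=0 the loop is skipped and the result is the covariance clustered by time period.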
formula

Formula used to construct the model

classmethod from_formula(formula, data, *, weights=None)

Create a model from a formula

Parameters:
  • formula (str) – Formula to transform into model. Conforms to patsy formula rules.
  • data (array-like) – Data structure that can be coerced into a PanelData. In most cases, this should be a multi-index DataFrame where the level 0 index contains the entities and the level 1 contains the time.
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.
Returns:

model – Model specified using the formula

Return type:

PooledOLS

Notes

Unlike standard patsy, it is necessary to explicitly include a constant using the constant indicator (1)

Examples

>>> from linearmodels import PooledOLS
>>> mod = PooledOLS.from_formula('y ~ 1 + x1', panel_data)
>>> res = mod.fit()
has_constant

Flag indicating whether the model includes a constant or an implicit constant

not_null

Locations of non-missing observations

predict(params, *, exog=None, data=None, eval_env=4)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • exog (array-like) – Exogenous regressors (nobs by nvar)
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth of the calling environment (stack frame) used when evaluating formulas with Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then exog must be None. Predictions from models constructed using formulas can be computed using either exog, which treats the values as arrays corresponding to the formula-processed data, or data, which is processed using the formula from the original model specification.

reformat_clusters(clusters)

Reformat cluster variables

Parameters: clusters (array-like) – Values to use for variance clustering
Returns: reformatted – Original data with matching axes and with observations dropped where they are missing in the model data.
Return type: PanelData

Notes

This is exposed for testing and is not normally needed for estimation

Fama-MacBeth

class FamaMacBeth(dependent, exog, *, weights=None)

Fama-MacBeth estimator for panel data

Parameters:
  • dependent (array-like) – Dependent (left-hand-side) variable (time by entity)
  • exog (array-like) – Exogenous or right-hand-side variables (variable by time by entity).
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.

Notes

The model is given by

\[y_{it}=\beta^{\prime}x_{it}+\epsilon_{it}\]

The Fama-MacBeth estimator is computed by performing T regressions, one for each time period using all available entity observations. Denote the estimate of the model parameters as \(\hat{\beta}_t\). The reported estimator is then

\[\hat{\beta} = T^{-1}\sum_{t=1}^T \hat{\beta}_t\]

While the model does not explicitly include time effects, the implementation, which regresses all observations within a single time period, behaves “as if” time effects are included.

Parameter inference is made using the set of T parameter estimates, using either the standard covariance estimator or a kernel-based covariance, depending on cov_type.
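The two steps described above, T cross-sectional regressions followed by averaging, can be sketched with NumPy. The covariance of \(\hat{\beta}\) is estimated from the dispersion of the \(\hat{\beta}_t\) divided by T, optionally with Bartlett (Newey-West) weights for serial correlation. This is an illustrative sketch rather than the FamaMacBeth implementation: it uses divisor T without a degree of freedom adjustment, and the function name fama_macbeth is hypothetical.

```python
import numpy as np


def fama_macbeth(y, x, time, bandwidth=None):
    """Average of per-period OLS estimates and a covariance for that average."""
    periods = np.unique(time)
    betas = np.array([
        np.linalg.lstsq(x[time == t], y[time == t], rcond=None)[0]
        for t in periods
    ])                                  # T x k matrix of per-period estimates
    n_periods = betas.shape[0]
    beta = betas.mean(axis=0)           # T^{-1} times the sum of the estimates
    dev = betas - beta
    long_run = dev.T @ dev / n_periods  # variance of the per-period estimates
    if bandwidth:                       # optional Bartlett (Newey-West) weights
        for lag in range(1, bandwidth + 1):
            w = 1.0 - lag / (bandwidth + 1.0)
            g = dev[lag:].T @ dev[:-lag] / n_periods
            long_run += w * (g + g.T)
    return beta, long_run / n_periods   # covariance of the average


rng = np.random.default_rng(8)
n_entity, n_time = 100, 12
time = np.tile(np.arange(n_time), n_entity)
x = np.column_stack([np.ones(time.size), rng.standard_normal(time.size)])
y = x @ np.array([0.2, 1.0]) + rng.standard_normal(time.size)

beta, cov = fama_macbeth(y, x, time)
beta_hac, cov_hac = fama_macbeth(y, x, time, bandwidth=2)
print(beta, np.sqrt(np.diag(cov)), np.sqrt(np.diag(cov_hac)))
```

The point estimate does not depend on the bandwidth; only the covariance changes.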

fit(cov_type='unadjusted', debiased=True, **cov_config)

Estimate model parameters

Parameters:
  • cov_type (str, optional) – Name of covariance estimator. See Notes.
  • debiased (bool, optional) – Flag indicating whether to debias the covariance estimator using a degree of freedom adjustment.
  • **cov_config – Additional covariance-specific options. See Notes.
Returns:

results – Estimation results

Return type:

PanelResults

Examples

>>> from linearmodels import FamaMacBeth
>>> mod = FamaMacBeth(y, x)
>>> res = mod.fit(cov_type='kernel', kernel='Parzen')

Notes

Two covariance estimators are supported:

  • ‘unadjusted’, ‘homoskedastic’, ‘robust’, ‘heteroskedastic’ - Use the standard covariance estimator of the T parameter estimates.
  • ‘kernel’ - HAC estimator. Configuration options are:
    • kernel - One of the supported kernels (bartlett, parzen, qs). The default is Bartlett’s kernel, which implements the Newey-West covariance estimator.
    • bandwidth - Bandwidth to use when computing the kernel. If not provided, a naive default is used.
formula

Formula used to construct the model

classmethod from_formula(formula, data, *, weights=None)

Create a model from a formula

Parameters:
  • formula (str) – Formula to transform into model. Conforms to patsy formula rules.
  • data (array-like) – Data structure that can be coerced into a PanelData. In most cases, this should be a multi-index DataFrame where the level 0 index contains the entities and the level 1 contains the time.
  • weights (array-like, optional) – Weights to use in estimation. Assumes residual variance is proportional to the inverse of the weight, so that the residual multiplied by the square root of the weight is homoskedastic.
Returns:

model – Model specified using the formula

Return type:

FamaMacBeth

Notes

Unlike standard patsy, it is necessary to explicitly include a constant using the constant indicator (1)

Examples

>>> from linearmodels import FamaMacBeth
>>> mod = FamaMacBeth.from_formula('y ~ 1 + x1', panel_data)
>>> res = mod.fit()
has_constant

Flag indicating whether the model includes a constant or an implicit constant

not_null

Locations of non-missing observations

predict(params, *, exog=None, data=None, eval_env=4)

Predict values for additional data

Parameters:
  • params (array-like) – Model parameters (nvar by 1)
  • exog (array-like) – Exogenous regressors (nobs by nvar)
  • data (DataFrame) – Values to use when making predictions from a model constructed from a formula
  • eval_env (int) – Depth of the calling environment (stack frame) used when evaluating formulas with Patsy.
Returns:

predictions – Fitted values from supplied data and parameters

Return type:

DataFrame

Notes

If data is not None, then exog must be None. Predictions from models constructed using formulas can be computed using either exog, which treats the values as arrays corresponding to the formula-processed data, or data, which is processed using the formula from the original model specification.

reformat_clusters(clusters)

Reformat cluster variables

Parameters: clusters (array-like) – Values to use for variance clustering
Returns: reformatted – Original data with matching axes and with observations dropped where they are missing in the model data.
Return type: PanelData

Notes

This is exposed for testing and is not normally needed for estimation