Using formulas to specify models

All of the models can be specified using formulas. The formulas used here rely on formulaic and are similar to those in statsmodels. The basic formula syntax for a single-variable regression is

y ~ 1 + x


The formulas used with BetweenOLS, PooledOLS and RandomEffects are completely standard and are identical to statsmodels. FirstDifferenceOLS is nearly identical with the caveat that the model cannot include an intercept.

PanelOLS, which implements effects (entity, time or other), has a small extension to the formula syntax that allows entity effects, time effects, or both to be specified as part of the formula. Other effects cannot be specified in the formula itself, but they can be passed as an optional parameter when using a formula.

When using formulas, a pandas DataFrame with an entity-time MultiIndex is required. Here the Grunfeld data, from “The Determinants of Corporate Investment”, provided by statsmodels, is used to illustrate the use of formulas. This dataset contains data on firm investment, market value and the stock of plant capital.

set_index is used to set the index using variables from the dataset.

[1]:

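from statsmodels.datasets import grunfeld

# Load the Grunfeld data as a DataFrame, then index it by entity and time
data = grunfeld.load_pandas().data
data = data.set_index(["firm", "year"])
print(data.head())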

                       invest   value  capital
firm           year
General Motors 1935.0   317.6  3078.5      2.8
               1936.0   391.8  4661.7     52.6
               1937.0   410.6  5387.1    156.9
               1938.0   257.7  2792.2    209.2
               1939.0   330.8  4313.2    203.4
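
The index requirement can be illustrated on a tiny hand-made frame. This is only a sketch: the frame `raw`, its column names and its values are hypothetical and are not part of the Grunfeld data. It shows the entity identifier becoming the first index level and the time identifier the second.

```python
import pandas as pd

# Hypothetical toy panel: two firms observed in two years (values are made up).
raw = pd.DataFrame(
    {
        "firm": ["A", "A", "B", "B"],
        "year": [2000, 2001, 2000, 2001],
        "y": [1.0, 2.0, 3.0, 5.0],
        "x": [0.5, 1.0, 1.5, 2.5],
    }
)

# The entity identifier becomes the first index level, time the second.
panel = raw.set_index(["firm", "year"])

print(panel.index.names)  # names of the two index levels
print(panel.loc["A"])     # all time periods for entity "A"
```

After `set_index`, only the model variables remain as columns, so they are the only names a formula can refer to.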


PanelOLS with Entity Effects

Entity effects are specified using the special command EntityEffects. By default a constant is not included, and so if a constant is desired, 1+ should be included in the formula. When including effects, the model and fit are identical whether a constant is included or not.

PanelOLS with Entity Effects and a constant

The constant can be explicitly included using the 1 + notation. When a constant is included in the model, an additional constraint is imposed that the sum of the effects is 0. This allows the constant to be identified using the grand mean of the dependent variable and the regressors.

[2]:

from linearmodels import PanelOLS

mod = PanelOLS.from_formula("invest ~ value + capital + EntityEffects", data=data)
print(mod.fit())

                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:                 invest   R-squared:                        0.7667
Estimator:                   PanelOLS   R-squared (Between):              0.8223
No. Observations:                 220   R-squared (Within):               0.7667
Date:                Fri, Jul 19 2024   R-squared (Overall):              0.8132
Time:                        17:54:59   Log-likelihood                   -1167.4
                                        F-statistic:                      340.08
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,207)
Min Obs:                       20.000
Max Obs:                       20.000   F-statistic (robust):             340.08
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,207)
Avg Obs:                       11.000
Min Obs:                       11.000
Max Obs:                       11.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
value          0.1101     0.0113     9.7461     0.0000      0.0879      0.1324
capital        0.3100     0.0165     18.744     0.0000      0.2774      0.3426
==============================================================================

F-test for Poolability: 49.207
P-value: 0.0000
Distribution: F(10,207)

Included effects: Entity

[3]:

mod = PanelOLS.from_formula("invest ~ 1 + value + capital + EntityEffects", data=data)
print(mod.fit())

                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:                 invest   R-squared:                        0.7667
Estimator:                   PanelOLS   R-squared (Between):              0.8193
No. Observations:                 220   R-squared (Within):               0.7667
Date:                Fri, Jul 19 2024   R-squared (Overall):              0.8071
Time:                        17:54:59   Log-likelihood                   -1167.4
                                        F-statistic:                      340.08
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,207)
Min Obs:                       20.000
Max Obs:                       20.000   F-statistic (robust):             340.08
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,207)
Avg Obs:                       11.000
Min Obs:                       11.000
Max Obs:                       11.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -55.272     10.891    -5.0750     0.0000     -76.743     -33.800
value          0.1101     0.0113     9.7461     0.0000      0.0879      0.1324
capital        0.3100     0.0165     18.744     0.0000      0.2774      0.3426
==============================================================================

F-test for Poolability: 49.207
P-value: 0.0000
Distribution: F(10,207)

Included effects: Entity
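
The two fits above report the same slopes with and without the constant. A minimal NumPy sketch of why, on a synthetic balanced panel rather than the Grunfeld data: removing entity means sweeps out the effects, the intercept is then recovered from the grand means, and the implied effects sum to zero. All names and values here are assumptions made for illustration, and this is not how linearmodels computes its estimates internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 5, 8

# Synthetic balanced panel: y = alpha_i + 2.0 * x + noise (all values made up).
entity = np.repeat(np.arange(n_entities), n_periods)
alpha = rng.normal(size=n_entities)
x = rng.normal(size=n_entities * n_periods) + alpha[entity]
y = alpha[entity] + 2.0 * x + 0.1 * rng.normal(size=x.size)

# Entity means of each variable (one value per entity).
x_bar_i = np.bincount(entity, weights=x) / n_periods
y_bar_i = np.bincount(entity, weights=y) / n_periods

# Within transformation: subtracting entity means removes the entity effects.
x_w = x - x_bar_i[entity]
y_w = y - y_bar_i[entity]
slope = (x_w @ y_w) / (x_w @ x_w)

# With a constant, the intercept is pinned down by the grand means ...
intercept = y.mean() - slope * x.mean()

# ... and the implied entity effects are deviations that sum to zero.
effects = y_bar_i - slope * x_bar_i - intercept

# Cross-check: regressing y on x plus one dummy per entity gives the same slope.
dummies = (entity[:, None] == np.arange(n_entities)).astype(float)
lsdv = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0]

print(slope, lsdv[0], effects.sum())
```

The zero-sum property shown here relies on the panel being balanced, as the Grunfeld panel is (20 observations for each of the 11 firms).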


Panel with Entity and Time Effects

Time effects can be similarly included using TimeEffects. In many models, time effects can be consistently estimated and so they could be equivalently included in the set of regressors using a categorical variable.

[4]:

mod = PanelOLS.from_formula(
    "invest ~ 1 + value + capital + EntityEffects + TimeEffects", data=data
)
print(mod.fit())

                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:                 invest   R-squared:                        0.7253
Estimator:                   PanelOLS   R-squared (Between):              0.7944
No. Observations:                 220   R-squared (Within):               0.7566
Date:                Fri, Jul 19 2024   R-squared (Overall):              0.7856
Time:                        17:54:59   Log-likelihood                   -1153.0
                                        F-statistic:                      248.15
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,188)
Min Obs:                       20.000
Max Obs:                       20.000   F-statistic (robust):             248.15
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,188)
Avg Obs:                       11.000
Min Obs:                       11.000
Max Obs:                       11.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -72.394     12.732    -5.6861     0.0000     -97.509     -47.278
value          0.1167     0.0129     9.0219     0.0000      0.0912      0.1422
capital        0.3514     0.0210     16.696     0.0000      0.3099      0.3930
==============================================================================

F-test for Poolability: 18.476
P-value: 0.0000
Distribution: F(29,188)

Included effects: Entity, Time
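
The equivalence between absorbing time effects and adding them as categorical regressors can be checked numerically. The sketch below uses NumPy on a synthetic balanced panel. The helper `demean` and all data are assumptions made for illustration, not part of linearmodels. It compares the slope obtained by sweeping out both sets of effects with the slope obtained by sweeping out entity effects only and adding time dummies.

```python
import numpy as np

rng = np.random.default_rng(1)
n_entities, n_periods = 6, 5
entity = np.repeat(np.arange(n_entities), n_periods)
period = np.tile(np.arange(n_periods), n_entities)

# Synthetic balanced panel with entity and time effects (values are made up).
alpha = rng.normal(size=n_entities)[entity]
gamma = rng.normal(size=n_periods)[period]
x = rng.normal(size=entity.size) + alpha + gamma
y = alpha + gamma + 1.5 * x + 0.1 * rng.normal(size=entity.size)


def demean(v, groups):
    """Subtract the group mean from every row of v (1-D or 2-D array)."""
    out = v.copy()
    for g in np.unique(groups):
        out[groups == g] -= v[groups == g].mean(axis=0)
    return out


# Approach 1: sweep out both sets of effects (entity, then time).
x_tw = demean(demean(x, entity), period)
y_tw = demean(demean(y, entity), period)
slope_absorbed = (x_tw @ y_tw) / (x_tw @ x_tw)

# Approach 2: sweep out entity effects only and add time dummies as regressors.
# One period is dropped because the full set of dummies sums to a constant,
# which the entity effects already absorb.
time_dummies = (period[:, None] == np.arange(1, n_periods)).astype(float)
regressors = demean(np.column_stack([x, time_dummies]), entity)
coefs = np.linalg.lstsq(regressors, demean(y, entity), rcond=None)[0]
slope_dummies = coefs[0]

print(slope_absorbed, slope_dummies)
```

Sequential demeaning reproduces the two-way within transformation exactly only in a balanced panel, which is why the sketch uses one.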


Between OLS

The other panel models are straightforward and are included for completeness.

[5]:

from linearmodels import BetweenOLS, FirstDifferenceOLS, PooledOLS

mod = BetweenOLS.from_formula("invest ~ 1 + value + capital", data=data)
print(mod.fit())

                         BetweenOLS Estimation Summary
================================================================================
Dep. Variable:                 invest   R-squared:                        0.8644
Estimator:                 BetweenOLS   R-squared (Between):              0.8644
No. Observations:                  11   R-squared (Within):               0.4195
Date:                Fri, Jul 19 2024   R-squared (Overall):              0.7616
Time:                        17:54:59   Log-likelihood                   -61.997
                                        F-statistic:                      25.500
Entities:                          11   P-value                           0.0003
Avg Obs:                       20.000   Distribution:                     F(2,8)
Min Obs:                       20.000
Max Obs:                       20.000   F-statistic (robust):             25.500
                                        P-value                           0.0003
Time periods:                      20   Distribution:                     F(2,8)
Avg Obs:                       11.000
Min Obs:                       11.000
Max Obs:                       11.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -7.3825     40.444    -0.1825     0.8597     -100.65      85.881
value          0.1346     0.0269     5.0065     0.0010      0.0726      0.1966
capital        0.0297     0.1746     0.1700     0.8692     -0.3730      0.4323
==============================================================================
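
The summary above reports 11 observations, one per firm. A minimal sketch of the idea behind a between estimator, on synthetic data: average each variable over time within an entity, then run OLS on the averages. The panel, its column names and its values are assumptions made for illustration and are not meant to reproduce the linearmodels implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
index = pd.MultiIndex.from_product(
    [list("ABCDEF"), range(2000, 2004)], names=["firm", "year"]
)

# Synthetic panel (values are made up for illustration).
panel = pd.DataFrame({"x": rng.normal(size=len(index))}, index=index)
panel["y"] = 1.0 + 0.5 * panel["x"] + 0.1 * rng.normal(size=len(index))

# Collapse to one row per entity by averaging over time.
means = panel.groupby(level="firm").mean()

# OLS of the averaged dependent variable on a constant and the averaged regressor.
design = np.column_stack([np.ones(len(means)), means["x"].to_numpy()])
intercept, slope = np.linalg.lstsq(design, means["y"].to_numpy(), rcond=None)[0]

print(len(panel), len(means))  # observations before and after averaging
print(intercept, slope)
```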


First Difference OLS

The first difference model must never include a constant since this is not identified after differencing.

[6]:

mod = FirstDifferenceOLS.from_formula("invest ~ value + capital", data=data)
print(mod.fit())

                     FirstDifferenceOLS Estimation Summary
================================================================================
Dep. Variable:                 invest   R-squared:                        0.4287
Estimator:         FirstDifferenceOLS   R-squared (Between):              0.8643
No. Observations:                 209   R-squared (Within):               0.7539
Date:                Fri, Jul 19 2024   R-squared (Overall):              0.8461
Time:                        17:54:59   Log-likelihood                   -1071.1
                                        F-statistic:                      77.679
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,207)
Min Obs:                       20.000
Max Obs:                       20.000   F-statistic (robust):             77.679
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,207)
Avg Obs:                       11.000
Min Obs:                       11.000
Max Obs:                       11.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
value          0.0891     0.0078     11.348     0.0000      0.0736      0.1045
capital        0.2786     0.0449     6.1990     0.0000      0.1900      0.3673
==============================================================================
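
The reason a constant is not identified can be seen by differencing one directly. The pandas sketch below uses a synthetic panel; its column names, including the explicit `const` column, and its values are assumptions made for illustration. Differencing within each entity turns the constant into a column of zeros and drops the first period of every entity, consistent with the 209 observations reported above (220 less one per firm).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
index = pd.MultiIndex.from_product(
    [list("ABC"), range(2000, 2006)], names=["firm", "year"]
)

# Synthetic panel with an intercept of 10 (values are made up for illustration).
panel = pd.DataFrame({"x": rng.normal(size=len(index)).cumsum()}, index=index)
panel["const"] = 1.0
noise = 0.1 * rng.normal(size=len(index))
panel["y"] = 10.0 * panel["const"] + 0.8 * panel["x"] + noise

# Difference within each entity; the first period of every entity is lost.
diffed = panel.groupby(level="firm").diff().dropna()

# The differenced constant is identically zero, so it carries no information.
print(diffed["const"].abs().max())

# The slope is still recoverable from the differenced data without a constant.
dx, dy = diffed["x"].to_numpy(), diffed["y"].to_numpy()
slope = (dx @ dy) / (dx @ dx)
print(len(panel), len(diffed), slope)
```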


Pooled OLS

The pooled OLS estimator is a special case of PanelOLS when there are no effects. It is effectively identical to OLS in statsmodels (or WLS) but is included for completeness.

[7]:

mod = PooledOLS.from_formula("invest ~ 1 + value + capital", data=data)
print(mod.fit())

                          PooledOLS Estimation Summary
================================================================================
Dep. Variable:                 invest   R-squared:                        0.8179
Estimator:                  PooledOLS   R-squared (Between):              0.8426
No. Observations:                 220   R-squared (Within):               0.7357
Date:                Fri, Jul 19 2024   R-squared (Overall):              0.8179
Time:                        17:54:59   Log-likelihood                   -1301.3
                                        F-statistic:                      487.28
Entities:                          11   P-value                           0.0000
Avg Obs:                       20.000   Distribution:                   F(2,217)
Min Obs:                       20.000
Max Obs:                       20.000   F-statistic (robust):             487.28
                                        P-value                           0.0000
Time periods:                      20   Distribution:                   F(2,217)
Avg Obs:                       11.000
Min Obs:                       11.000
Max Obs:                       11.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -38.410     8.4134    -4.5654     0.0000     -54.992     -21.828
value          0.1145     0.0055     20.753     0.0000      0.1037      0.1254
capital        0.2275     0.0242     9.3904     0.0000      0.1798      0.2753
==============================================================================