Examples¶
These examples cover the estimators available for panel models. The initial examples all ignore covariance options and so use the default classic covariance, which is appropriate for homoskedastic data. The alternative covariance options are described at the end of this document.
Loading data¶
These examples all make use of the wage panel from
F. Vella and M. Verbeek (1998), “Whose Wages Do Unions Raise? A Dynamic Model of Unionism and Wage Rate Determination for Young Men,” Journal of Applied Econometrics 13, 163-183.
The data set consists of wages and characteristics for men during the 1980s. The entity identifier is nr and the time identifier is year. This data is used extensively in Chapter 14 of Introductory Econometrics by Jeffrey Wooldridge.
Here a MultiIndex DataFrame is used to hold the data in a format that can be understood as a panel. Before setting the index, a year Categorical is created, which facilitates making dummies.
[1]:
import pandas as pd
from linearmodels.datasets import wage_panel
data = wage_panel.load()
year = pd.Categorical(data.year)
data = data.set_index(["nr", "year"])
data["year"] = year
print(wage_panel.DESCR)
print(data.head())
F. Vella and M. Verbeek (1998), "Whose Wages Do Unions Raise? A Dynamic Model
of Unionism and Wage Rate Determination for Young Men," Journal of Applied
Econometrics 13, 163-183.
nr person identifier
year 1980 to 1987
black =1 if black
exper labor market experience
hisp =1 if Hispanic
hours annual hours worked
married =1 if married
educ years of schooling
union =1 if in union
lwage log(wage)
expersq exper^2
occupation Occupation code
black exper hisp hours married educ union lwage expersq \
nr year
13 1980 0 1 0 2672 0 14 0 1.197540 1
1981 0 2 0 2320 0 14 1 1.853060 4
1982 0 3 0 2940 0 14 0 1.344462 9
1983 0 4 0 2960 0 14 0 1.433213 16
1984 0 5 0 3071 0 14 0 1.568125 25
occupation year
nr year
13 1980 9 1980
1981 9 1981
1982 9 1982
1983 9 1983
1984 5 1984
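To make the index layout concrete, here is a minimal sketch on a made-up panel of two entities and three years. The entity ids and wage values are invented for illustration and are not taken from the wage panel. It shows the entity-time MultiIndex and how a Categorical year column expands into dummies with pandas.get_dummies.

```python
import pandas as pd

# Toy long-format panel: 2 entities observed in 3 years (values are made up).
toy = pd.DataFrame(
    {
        "nr": [1, 1, 1, 2, 2, 2],
        "year": [1980, 1981, 1982, 1980, 1981, 1982],
        "lwage": [1.2, 1.3, 1.5, 0.9, 1.0, 1.1],
    }
)

# Keep a categorical copy of year before it moves into the index.
year = pd.Categorical(toy["year"])
toy = toy.set_index(["nr", "year"])
toy["year"] = year

# Entity is index level 0 and time is index level 1.
print(toy.index.names)

# A categorical column expands into one dummy per category;
# drop_first=True omits the base year to avoid collinearity with a constant.
dummies = pd.get_dummies(toy["year"], prefix="year", drop_first=True)
print(dummies.columns.tolist())
```

The first index level identifies the entity and the second identifies the time period, which is the same ordering used by the set_index call on the wage panel.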
Basic regression on panel data¶
PooledOLS is just plain OLS that understands various panel data structures. It is useful as a base model. Here the log wage is modeled using all of the variables and time dummies.
[2]:
import statsmodels.api as sm
from linearmodels.panel import PooledOLS
exog_vars = ["black", "hisp", "exper", "expersq", "married", "educ", "union", "year"]
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
pooled_res = mod.fit()
print(pooled_res)
PooledOLS Estimation Summary
================================================================================
Dep. Variable: lwage R-squared: 0.1893
Estimator: PooledOLS R-squared (Between): 0.2066
No. Observations: 4360 R-squared (Within): 0.1692
Date: Wed, Nov 09 2022 R-squared (Overall): 0.1893
Time: 06:51:48 Log-likelihood -2982.0
Cov. Estimator: Unadjusted
F-statistic: 72.459
Entities: 545 P-value 0.0000
Avg Obs: 8.0000 Distribution: F(14,4345)
Min Obs: 8.0000
Max Obs: 8.0000 F-statistic (robust): 72.459
P-value 0.0000
Time periods: 8 Distribution: F(14,4345)
Avg Obs: 545.00
Min Obs: 545.00
Max Obs: 545.00
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 0.0921 0.0783 1.1761 0.2396 -0.0614 0.2455
black -0.1392 0.0236 -5.9049 0.0000 -0.1855 -0.0930
hisp 0.0160 0.0208 0.7703 0.4412 -0.0248 0.0568
exper 0.0672 0.0137 4.9095 0.0000 0.0404 0.0941
expersq -0.0024 0.0008 -2.9413 0.0033 -0.0040 -0.0008
married 0.1083 0.0157 6.8997 0.0000 0.0775 0.1390
educ 0.0913 0.0052 17.442 0.0000 0.0811 0.1016
union 0.1825 0.0172 10.635 0.0000 0.1488 0.2161
year.1981 0.0583 0.0304 1.9214 0.0548 -0.0012 0.1178
year.1982 0.0628 0.0332 1.8900 0.0588 -0.0023 0.1279
year.1983 0.0620 0.0367 1.6915 0.0908 -0.0099 0.1339
year.1984 0.0905 0.0401 2.2566 0.0241 0.0119 0.1691
year.1985 0.1092 0.0434 2.5200 0.0118 0.0243 0.1942
year.1986 0.1420 0.0464 3.0580 0.0022 0.0509 0.2330
year.1987 0.1738 0.0494 3.5165 0.0004 0.0769 0.2707
==============================================================================
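The statement that pooled OLS is plain OLS on the stacked entity-time observations can be sketched directly in NumPy. This is an illustration on simulated data, not the library's implementation: the panel dimensions, coefficients and the degrees-of-freedom scaling of the error variance are all assumptions, and the library's default scaling may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 50, 8
nobs = n_entities * n_periods

# Stack all entity-time observations into one long regression.
x = rng.standard_normal((nobs, 2))
X = np.column_stack([np.ones(nobs), x])  # constant plus two regressors
beta_true = np.array([1.0, 0.5, -0.25])  # assumed coefficients for the simulation
y = X @ beta_true + rng.standard_normal(nobs)

# Pooled OLS point estimates: ordinary least squares on the stacked data.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical (homoskedastic) covariance: s^2 (X'X)^{-1}.
# Dividing by nobs - k is an assumption made for this sketch.
resid = y - X @ beta_hat
s2 = resid @ resid / (nobs - X.shape[1])
cov = s2 * np.linalg.inv(X.T @ X)
tstats = beta_hat / np.sqrt(np.diag(cov))

print(beta_hat)
print(tstats)
```

Nothing in this calculation uses the entity or time labels, which is why the pooled model serves as a baseline for the estimators that do exploit the panel structure.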
Comparing models¶
Model results can be compared using compare. compare accepts a list of results or a dictionary of results, where each key is interpreted as the model name.
[11]:
from linearmodels.panel import compare
print(compare({"BE": be_res, "RE": re_res, "Pooled": pooled_res}))
Model Comparison
===============================================================
BE RE Pooled
---------------------------------------------------------------
Dep. Variable lwage lwage lwage
Estimator BetweenOLS RandomEffects PooledOLS
No. Observations 545 4360 4360
Cov. Est. Unadjusted Unadjusted Unadjusted
R-squared 0.2155 0.1806 0.1893
R-Squared (Within) 0.1141 0.1799 0.1692
R-Squared (Between) 0.2155 0.1853 0.2066
R-Squared (Overall) 0.1686 0.1828 0.1893
F-statistic 24.633 68.409 72.459
P-value (F-stat) 0.0000 0.0000 0.0000
===================== ============ =============== ============
const 0.2836 0.0234 0.0921
(1.5897) (0.1546) (1.1761)
black -0.1414 -0.1394 -0.1392
(-2.8915) (-2.9054) (-5.9049)
hisp 0.0100 0.0217 0.0160
(0.2355) (0.5078) (0.7703)
exper 0.0278 0.1058 0.0672
(2.4538) (6.8706) (4.9095)
married 0.1416 0.0638 0.1083
(3.4346) (3.8035) (6.8997)
educ 0.0913 0.0919 0.0913
(8.5159) (8.5744) (17.442)
union 0.2587 0.1059 0.1825
(5.6214) (5.9289) (10.635)
expersq -0.0047 -0.0024
(-6.8623) (-2.9413)
year.1981 0.0404 0.0583
(1.6362) (1.9214)
year.1982 0.0309 0.0628
(0.9519) (1.8900)
year.1983 0.0202 0.0620
(0.4840) (1.6915)
year.1984 0.0430 0.0905
(0.8350) (2.2566)
year.1985 0.0577 0.1092
(0.9383) (2.5200)
year.1986 0.0918 0.1420
(1.2834) (3.0580)
year.1987 0.1348 0.1738
(1.6504) (3.5165)
---------------------------------------------------------------
T-stats reported in parentheses
Covariance options¶
Heteroskedasticity Robust Covariance¶
White's robust covariance can be used by setting cov_type="robust". This estimator adds some robustness against certain types of specification issues, such as heteroskedasticity, but should not be used with fixed effects (entity effects), since it is no longer robust in that setting. Instead, a clustered covariance is required.
[12]:
exog_vars = ["black", "hisp", "exper", "expersq", "married", "educ", "union"]
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
robust = mod.fit(cov_type="robust")
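For intuition about what a heteroskedasticity-robust covariance computes, the following is a minimal NumPy sketch of the textbook White (HC0) sandwich estimator on simulated data. It is not the library's code path: the data-generating process is an assumption, and no small-sample adjustment is applied.

```python
import numpy as np

rng = np.random.default_rng(1)
nobs = 500
x = rng.standard_normal(nobs)
X = np.column_stack([np.ones(nobs), x])
# Error variance grows with |x|, so the simulated data are heteroskedastic.
y = 1.0 + 0.5 * x + rng.standard_normal(nobs) * (0.5 + np.abs(x))

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
bread = np.linalg.inv(X.T @ X)

# Classical covariance assumes one common error variance.
cov_classic = (resid @ resid / nobs) * bread

# White (HC0) covariance: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}.
meat = (X * resid[:, None] ** 2).T @ X
cov_white = bread @ meat @ bread

print(np.sqrt(np.diag(cov_classic)))
print(np.sqrt(np.diag(cov_white)))
```

The point estimates are identical under both covariance choices; only the standard errors, and therefore the t-stats, change.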
Clustered by Entity¶
The usual variables to cluster on are entity or entity and time. This can be implemented using cov_type="clustered" and the additional keyword arguments cluster_entity=True and/or cluster_time=True.
[13]:
clust_entity = mod.fit(cov_type="clustered", cluster_entity=True)
This next example clusters by both.
[14]:
clust_entity_time = mod.fit(
cov_type="clustered", cluster_entity=True, cluster_time=True
)
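The idea behind one-way and two-way clustering can be sketched with a hand-rolled sandwich estimator. This is an illustration on a simulated balanced panel, assuming the standard two-way formula (entity plus time minus their intersection) and no small-sample corrections; the helper names cluster_meat and sandwich are hypothetical, and the library's estimates may be scaled differently.

```python
import numpy as np


def cluster_meat(X, resid, labels):
    """Sum over clusters g of (X_g' e_g)(X_g' e_g)'."""
    scores = X * resid[:, None]
    _, codes = np.unique(labels, return_inverse=True)
    sums = np.zeros((codes.max() + 1, X.shape[1]))
    np.add.at(sums, codes, scores)  # per-cluster sums of the scores
    return sums.T @ sums


rng = np.random.default_rng(2)
n_entities, n_periods = 100, 8
entity = np.repeat(np.arange(n_entities), n_periods)
time = np.tile(np.arange(n_periods), n_entities)
nobs = entity.size

# An entity-level shock in both x and the error makes the
# regression scores correlated within each entity.
x = rng.standard_normal(n_entities)[entity] + rng.standard_normal(nobs)
u = rng.standard_normal(n_entities)[entity] + rng.standard_normal(nobs)
X = np.column_stack([np.ones(nobs), x])
y = 1.0 + 0.5 * x + u

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
bread = np.linalg.inv(X.T @ X)


def sandwich(labels):
    return bread @ cluster_meat(X, resid, labels) @ bread


cov_white = sandwich(np.arange(nobs))  # every observation is its own cluster
cov_entity = sandwich(entity)          # one-way, clustered by entity
# Two-way: entity + time - intersection. In a balanced panel each
# entity-time cell holds one observation, so the intersection term
# equals the White covariance.
cov_two_way = cov_entity + sandwich(time) - cov_white

for name, cov in [("white", cov_white), ("entity", cov_entity), ("two-way", cov_two_way)]:
    print(name, np.sqrt(np.diag(cov)))
```

Because the simulated shocks are shared within an entity, the entity-clustered standard error on the slope exceeds the White one, which mirrors the drop in t-stats seen when clustering the wage regressions by entity.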
An OrderedDict is used to hold the results for comparing models. This allows the models to be named as well as for the order of the models to be specified. A standard dict also preserves insertion order on Python 3.7 and later; on older versions its order is effectively arbitrary.
Clustering on entity reduces the t-stats across the board relative to the robust covariance. This suggests there is important correlation in the residuals within each entity. Clustering by both entity and time also yields t-stats below the robust ones, which suggests that there is cross-sectional dependence in the data. Note: clustering by entity addresses correlation across time, and clustering by time controls for correlation between entities in a time period.
[15]:
from collections import OrderedDict
res = OrderedDict()
res["Robust"] = robust
res["Entity"] = clust_entity
res["Entity-Time"] = clust_entity_time
print(compare(res))
Model Comparison
=========================================================
Robust Entity Entity-Time
---------------------------------------------------------
Dep. Variable lwage lwage lwage
Estimator PooledOLS PooledOLS PooledOLS
No. Observations 4360 4360 4360
Cov. Est. Robust Clustered Clustered
R-squared 0.1866 0.1866 0.1866
R-Squared (Within) 0.1679 0.1679 0.1679
R-Squared (Between) 0.2027 0.2027 0.2027
R-Squared (Overall) 0.1866 0.1866 0.1866
F-statistic 142.61 142.61 142.61
P-value (F-stat) 0.0000 0.0000 0.0000
===================== =========== =========== ===========
const -0.0347 -0.0347 -0.0347
(-0.5360) (-0.2892) (-0.3145)
black -0.1438 -0.1438 -0.1438
(-5.9045) (-2.8727) (-3.0067)
hisp 0.0157 0.0157 0.0157
(0.7952) (0.4008) (0.4428)
exper 0.0892 0.0892 0.0892
(8.7881) (7.1728) (6.3223)
expersq -0.0028 -0.0028 -0.0028
(-4.1934) (-3.2747) (-3.1571)
married 0.1077 0.1077 0.1077
(7.0525) (4.1314) (4.8989)
educ 0.0994 0.0994 0.0994
(21.626) (10.802) (12.296)
union 0.1801 0.1801 0.1801
(11.087) (6.5343) (6.6732)
---------------------------------------------------------
T-stats reported in parentheses
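On the ordering of the comparison columns: the short check below illustrates that a plain dict and an OrderedDict both iterate in insertion order on Python 3.7 and later. The placeholder strings stand in for fitted results objects and are purely illustrative.

```python
from collections import OrderedDict

names = ["Robust", "Entity", "Entity-Time"]

# Placeholder strings stand in for fitted results objects.
plain = {}
ordered = OrderedDict()
for name in names:
    plain[name] = f"<{name} results>"
    ordered[name] = f"<{name} results>"

# Both containers iterate in insertion order on Python 3.7+.
print(list(plain))
print(list(ordered))
```

Either container therefore fixes the column order of the comparison table on current Python versions; OrderedDict remains the safe choice when older interpreters must be supported.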
Other clusters¶
Other clusters can be used by directly passing integer arrays (1 or 2 columns, or a 1-d array) using the input clusters. This example clusters by occupation, which is probably not a reliable variable to cluster on since there are only 9 groups and the usual theory for clustered standard errors requires that the number of clusters is large.
[16]:
# Cluster on occupation code rather than on entity or time
clust_occupation = mod.fit(cov_type="clustered", clusters=data.occupation)
print(data.occupation.value_counts())
print(clust_occupation)
5 934
6 881
9 509
4 486
1 453
7 401
2 399
3 233
8 64
Name: occupation, dtype: int64
PooledOLS Estimation Summary
================================================================================
Dep. Variable: lwage R-squared: 0.1866
Estimator: PooledOLS R-squared (Between): 0.2027
No. Observations: 4360 R-squared (Within): 0.1679
Date: Wed, Nov 09 2022 R-squared (Overall): 0.1866
Time: 06:51:49 Log-likelihood -2989.2
Cov. Estimator: Clustered
F-statistic: 142.61
Entities: 545 P-value 0.0000
Avg Obs: 8.0000 Distribution: F(7,4352)
Min Obs: 8.0000
Max Obs: 8.0000 F-statistic (robust): 116.58
P-value 0.0000
Time periods: 8 Distribution: F(7,4352)
Avg Obs: 545.00
Min Obs: 545.00
Max Obs: 545.00
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const -0.0347 0.1479 -0.2346 0.8145 -0.3247 0.2553
black -0.1438 0.0297 -4.8469 0.0000 -0.2020 -0.0857
hisp 0.0157 0.0266 0.5892 0.5557 -0.0365 0.0679
exper 0.0892 0.0134 6.6513 0.0000 0.0629 0.1155
expersq -0.0028 0.0009 -3.2442 0.0012 -0.0046 -0.0011
married 0.1077 0.0139 7.7322 0.0000 0.0804 0.1350
educ 0.0994 0.0112 8.8846 0.0000 0.0775 0.1213
union 0.1801 0.0320 5.6323 0.0000 0.1174 0.2428
==============================================================================
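Since the clusters input expects integer arrays, group labels stored in another form need to be mapped to integer codes first, and counting the distinct groups gives a quick check on whether there are enough clusters. The following is a minimal NumPy sketch; the string labels and the cutoff of 30 are assumptions chosen for illustration and are not rules from the library.

```python
import numpy as np

# Hypothetical string cluster labels, one per observation.
labels = np.array(
    ["clerical", "sales", "clerical", "crafts", "sales", "crafts", "clerical"]
)

# Map each distinct label to an integer code and count cluster sizes.
groups, codes, counts = np.unique(labels, return_inverse=True, return_counts=True)
print(codes)
print(dict(zip(groups.tolist(), counts.tolist())))

n_clusters = groups.size
# The cutoff below is an arbitrary illustration, not a rule from the library.
if n_clusters < 30:
    print(f"only {n_clusters} clusters; clustered standard errors may be unreliable")
```

The resulting codes array has one integer per observation and could then be supplied as the clusters argument.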