Formulas and Mathematical Detail

Models

The estimators in this module are designed to estimate the parameters of a model specified by

\[y_{it}=x_{it}\beta+\alpha_{i}+\epsilon_{it}\]

where \(i\) indexes entities and \(t\) indexes time, \(x_{it}\) is \(1\) by \(k\) (it may include a constant, but not for all models), \(\beta\) is \(k\) by 1, \(\alpha_{i}\) is an entity-specific shock and \(\epsilon_{it}\) is an idiosyncratic shock. The most important difference among the models is the set of assumptions on \(\alpha_{i}\), which determine whether the estimator is consistent and/or efficient. There are \(N\) entities and \(T_{i}\) observations for entity \(i\).

Fixed Effect Estimation (PanelOLS)

The fixed effect estimator with entity effects estimates the model

\[y_{it}-\bar{y}_{i}=(x_{it}-\bar{x}_{i})\beta+\left(\epsilon_{it}-\bar{\epsilon}_{i}\right).\]

When the model includes a constant in \(x_{it}\), the grand means of \(y\), \(\bar{\bar{y}}=(NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}y_{it}\), and of \(x\) are added back to the \(y\) and \(x\) terms, respectively. In practice this imposes the restriction that \(\sum\alpha_{i}=0\). The estimated coefficients for the remaining parameters are unchanged, as are the estimated parameter covariance and related statistics.
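The within transformation above can be illustrated with a minimal sketch. It assumes a single regressor, no weights and plain Python lists; the function name `within_slope` is hypothetical and is not part of the library.

```python
from collections import defaultdict


def within_slope(entity, x, y):
    """Slope from regressing entity-demeaned y on entity-demeaned x (one regressor)."""
    # Accumulate per-entity sums of x, sums of y and observation counts.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i, xv, yv in zip(entity, x, y):
        s = sums[i]
        s[0] += xv
        s[1] += yv
        s[2] += 1
    # OLS on the demeaned data: sum(xd * yd) / sum(xd ** 2).
    num = den = 0.0
    for i, xv, yv in zip(entity, x, y):
        sx, sy, n = sums[i]
        xd = xv - sx / n
        yd = yv - sy / n
        num += xd * yd
        den += xd * xd
    return num / den


entity = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
# y = 2 * x + alpha_i with alpha_a = 10 and alpha_b = -5 (no noise)
y = [12.0, 14.0, 16.0, -3.0, -1.0, 1.0]
beta_hat = within_slope(entity, x, y)
```

Because demeaning removes \(\alpha_{i}\), the slope in this noise-free example is recovered regardless of how different the two entity effects are.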

The fixed effects estimator can handle general fixed effects, not just entity effects. While it has special cases that simplify adding entity and/or time effects, general effects can also be used. For example, one might want to use industry effects rather than firm effects. Generally the fixed effect model can be expressed as a least squares dummy variable (LSDV) estimator of the form

\[y_{it}=x_{it}\beta+d_{1,it}\gamma_{1}+d_{2,it}\gamma_{2}+\epsilon_{it}\]

where \(d_{1,it}\) and \(d_{2,it}\) are the dummy variables for the first and second effect. When the model contains an intercept, one dummy is dropped from each group and the dummies are orthogonalized to a constant so that they have mean 0. This allows the constant to be estimated. If the model does not contain an intercept, the first group of dummies will contain all values and the second will have one dropped.

Weights

When weights are used, the results are identical to the LSDV model where all variables (\(y_{it}\), \(x_{it}\), \(d_{1,it}\) and \(d_{2,it}\), if included) are multiplied by \(\sqrt{w_{it}}\).

Random Effect Estimation (RandomEffects)

The random effects estimator makes use of a quasi-differenced model,

\[y_{it}-\hat{\theta}_{i}\bar{y}_{i}=\left(1-\hat{\theta}_{i}\right)\alpha_{i}+(x_{it}-\hat{\theta}_{i}\bar{x}_{i})\beta+\left(\epsilon_{it}-\hat{\theta}_{i}\bar{\epsilon}_{i}\right)\]

where \(\hat{\theta}_{i}\) is a function of the variance of \(\epsilon_{it}\), the variance of \(\alpha_{i}\) and the number of observations for entity \(i\),

\[\hat{\theta}_{i}=1-\sqrt{\frac{\sigma_{\epsilon}^{2}}{T_{i}\sigma_{\alpha}^{2}+\sigma_{\epsilon}^{2}}}\]

so that \(\hat{\theta}_{i}\approx1\) when \(T_{i}\) is large (as long as \(\sigma_{\alpha}^{2}>0\)) or when \(\alpha_{i}\) is the only source of variance. On the other hand, \(\hat{\theta}_{i}\approx0\) when the variation due to \(\alpha_{i}\) is low. The estimator of the idiosyncratic variance is

\[\hat{\sigma}_{\epsilon}^{2}=\frac{\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}\hat{\epsilon}_{it}^{2}}{\sum_{i=1}^{N}T_{i}-N-K+c}\]

where \(c=1\) if the model includes a constant in the regressors. The variance of \(\alpha_{i}\) is estimated using the residual sum of squares from the between regression (see below), \(RSS_{b}\),

\[\hat{\sigma}_{\alpha}^{2}=\max\{0,\frac{RSS_{b}}{N-K}-\frac{\hat{\sigma}_{\epsilon}^{2}}{\bar{T}}\}\]

where \(\bar{T}=\frac{N}{\sum_{i=1}^{N}T_{i}^{-1}}\) is the harmonic mean of the \(T_{i}\). If the optional argument for a small sample adjustment is used, the Baltagi and Chang (1994) estimator is used. This only has an effect when the data are unbalanced.
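The quantities in this section can be sketched in a few lines. This is an illustration only, assuming the variance estimates and the between-regression residual sum of squares are already available; the function names are hypothetical, and the small sample adjustment is not covered.

```python
import math


def harmonic_mean_obs(t_counts):
    """T-bar: harmonic mean of the per-entity observation counts T_i."""
    return len(t_counts) / sum(1.0 / t for t in t_counts)


def sigma2_alpha(rss_between, n_entities, k, sigma2_eps, t_counts):
    """Variance of alpha_i from the between RSS, truncated at zero."""
    t_bar = harmonic_mean_obs(t_counts)
    return max(0.0, rss_between / (n_entities - k) - sigma2_eps / t_bar)


def theta(sigma2_eps, sigma2_a, t_i):
    """Quasi-differencing coefficient for an entity with t_i observations."""
    return 1.0 - math.sqrt(sigma2_eps / (t_i * sigma2_a + sigma2_eps))


t_counts = [2, 4, 4]  # unbalanced panel with three entities
s2_a = sigma2_alpha(6.0, 3, 1, 1.5, t_counts)
thetas = [theta(1.5, s2_a, t) for t in t_counts]
```

Entities with more observations receive a larger \(\hat{\theta}_{i}\), so their data are transformed more like the fixed effect estimator. When the estimated variance of \(\alpha_{i}\) is truncated to zero, every \(\hat{\theta}_{i}\) is zero and the transformed model is the pooled model.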

Weights

When weights are included the averages are replaced by weighted averages and the final regression terms are all multiplied by \(\sqrt{w_{it}}\).

Between Estimation (BetweenOLS)

Between estimation regresses time averages of the dependent variable on the time averaged values of the regressors,

\[\bar{y}_{i}=\bar{x}_{i}\beta+\bar{\epsilon}_{i}.\]

When weights are included, weighted averages are used so that

\[\bar{y}_{i}^{w}=\frac{\sum_{t=1}^{T_{i}}w_{it}y_{it}}{\sum_{t=1}^{T_{i}}w_{it}}\]

with \(\bar{x}_{i}^{w}\) similarly defined. Note that if the conditional variance of \(y_{it}\) is proportional to \(w_{it}^{-1}\), then the conditional variance of \(\bar{y}_{i}^{w}\) is proportional to \(\left(\sum_{t}w_{it}\right)^{-1}\), and these weights are used when regressing the weighted averages. Also note that when \(w_{it}=1\) but the panel is unbalanced, the conditional variance of \(\bar{y}_{i}^{w}=\bar{y}_{i}\) is proportional to \(T_{i}^{-1}\). Re-weighting unbalanced panels is exposed through the fit option reweight.

Weights

When weights are used, the averages are replaced by weighted averages and reweighting uses the computed variance of the weighted averages in the actual between regression.
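As a rough sketch of the weighted averages and the implied regression weights, the hypothetical helper below returns, for each entity, the weighted mean and the sum of its weights. It assumes plain Python lists and is not the library's implementation.

```python
def weighted_entity_means(entity, values, weights):
    """Return {entity: (weighted mean, sum of weights)}."""
    num, den = {}, {}
    for i, v, w in zip(entity, values, weights):
        num[i] = num.get(i, 0.0) + w * v
        den[i] = den.get(i, 0.0) + w
    return {i: (num[i] / den[i], den[i]) for i in num}


entity = ["a", "a", "b"]
y = [1.0, 3.0, 2.0]
w = [1.0, 3.0, 2.0]
means = weighted_entity_means(entity, y, w)
# Second element of each tuple: sum of weights, the inverse of the
# relative variance of the weighted average for that entity.
```

With unit weights the second element of each tuple is simply \(T_{i}\), which is the unbalanced-panel case described above.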

First Difference Estimation (FirstDifferenceOLS)

First difference estimation regresses the first difference of the dependent variable on the first difference of the regressors,

\[\Delta y_{it}=\Delta x_{it}\beta+\Delta\epsilon_{it}.\]

Weights

When weights are included, the weights are combined so that the weight on \(\Delta y_{it}\) is \(\left(w_{it}^{-1}+w_{it-1}^{-1}\right)^{-1}\). This exploits the structure that the conditional variance of \(y_{it}\) is proportional to \(w_{it}^{-1}\) and that the variance of a difference is the sum of the variances when observations are uncorrelated.
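The weight combination can be checked with a short sketch. It assumes the weights of a single entity are given in time order; the function name is illustrative only.

```python
def first_difference_weights(w):
    """Weights on the differences for one entity, with w ordered by time."""
    return [1.0 / (1.0 / w[t] + 1.0 / w[t - 1]) for t in range(1, len(w))]


fd_w = first_difference_weights([1.0, 1.0, 2.0, 2.0])
```

With unit weights each difference has weight 0.5, which reflects that a difference of two uncorrelated observations has twice the variance of a single observation.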

Pooled Model Estimation (PooledOLS)

The pooled estimator is a standard regression,

\[y_{it}=x_{it}\beta+\epsilon_{it}.\]

Weights

When weights are included, the data are transformed by multiplying by the square root of the weights prior to the regression (i.e., \(y_{it}\) is replaced by \(\sqrt{w_{it}}y_{it}\) and \(x_{it}\) is similarly transformed).

Covariance Estimators

Standard Covariance Estimator (unadjusted)

The standard covariance estimator is

\[s^{2}\Sigma_{XX}^{-1}\]

where

\[\Sigma_{XX}=\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}x_{it}^{\prime}x_{it}\]

and

\[s^{2}=(n_{obs}-k)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}\hat{\epsilon}_{it}^{2}\]

where \(n_{obs}=\sum_{i=1}^{N}T_{i}\). If the debiased option is not used, the \(-k\) term is omitted.

Heteroskedastic Covariance Estimator (robust)

The heteroskedasticity-robust covariance estimator is

\[n_{obs}/(n_{obs}-k)\Sigma_{XX}^{-1}\hat{S}\Sigma_{XX}^{-1}\]

where

\[\hat{S}=\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}\hat{\epsilon}_{it}^{2}x_{it}^{\prime}x_{it}.\]

The \(-k\) term is dropped if the debiased option is not used.
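The unadjusted and robust estimators can be compared in a minimal sketch for the scalar case \(k=1\), where \(\Sigma_{XX}\) and \(\hat{S}\) are plain numbers. Here \(s^{2}\) is taken to be the residual sum of squares divided by \(n_{obs}-k\) (or by \(n_{obs}\) when not debiased). The function name and the `debiased` argument are illustrative, not the library's API.

```python
def scalar_covariances(x, resid, debiased=True):
    """Return (unadjusted, robust) variance of the slope for one regressor."""
    n_obs, k = len(x), 1
    dof = n_obs - k if debiased else n_obs
    sigma_xx = sum(xv * xv for xv in x)
    # Unadjusted: s^2 * Sigma_XX^{-1}
    s2 = sum(e * e for e in resid) / dof
    unadjusted = s2 / sigma_xx
    # Robust: n/(n-k) * Sigma_XX^{-1} * S_hat * Sigma_XX^{-1}
    s_hat = sum(e * e * xv * xv for xv, e in zip(x, resid))
    robust = (n_obs / dof) * s_hat / sigma_xx ** 2
    return unadjusted, robust


x = [1.0, 2.0, 3.0]
same_size = scalar_covariances(x, [1.0, -1.0, 1.0])
growing = scalar_covariances(x, [0.5, -1.0, 2.0])
```

When all squared residuals are equal, as in `same_size`, the two estimators coincide. In `growing`, where the residuals get larger with \(x\), the robust estimate exceeds the unadjusted one.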

Clustered Covariance Estimator

The clustered covariance estimator supports one-way and two-way clustering. It is given by

\[n_{obs}/(n_{obs}-k)\Sigma_{XX}^{-1}\hat{S}_{\mathcal{G}}\Sigma_{XX}^{-1}\]

where in the case of one-way clustering,

\[\hat{S}_{\mathcal{G}}=\frac{G}{G-1}\frac{n_{obs}-1}{n_{obs}}\frac{1}{n_{obs}}\sum_{g=1}^{G}\xi_{g}^{\prime}\xi_{g}\]

where

\[\xi_{g}=\sum_{it\in G_{g}}\hat{\epsilon}_{it}x_{it}\]

and \(it\in G_{g}\) indicates that observation \(it\) belongs to group \(g\). The two-way clustered estimator replaces \(\hat{S}_{\mathcal{G}}\) by

\[\hat{S}_{\mathcal{G}_{1}}+\hat{S}_{\mathcal{G}_{2}}-\hat{S}_{\mathcal{G}_{1}\cap\mathcal{G}_{2}}.\]

The group debiasing is applied individually to each of the three components, using the number of groups in each of the three estimators. If the group debias term is not used, the expression \(\frac{G}{G-1}\frac{n_{obs}-1}{n_{obs}}\) is omitted. The \(-k\) term is dropped if the debiased estimator is not used.
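The structure of the cluster sums and of the two-way combination can be sketched for a single regressor. This sketch deliberately leaves out every scaling and debiasing factor and only computes the raw sums \(\sum_{g}\xi_{g}^{\prime}\xi_{g}\); the function names are hypothetical.

```python
def cluster_meat(x, resid, groups):
    """Unscaled sum over clusters of squared within-cluster score sums (scalar x)."""
    xi = {}
    for xv, e, g in zip(x, resid, groups):
        xi[g] = xi.get(g, 0.0) + e * xv
    return sum(v * v for v in xi.values())


def two_way_meat(x, resid, g1, g2):
    """Combine two one-way terms and subtract the intersection term."""
    both = list(zip(g1, g2))  # intersection clusters are (g1, g2) pairs
    return (cluster_meat(x, resid, g1)
            + cluster_meat(x, resid, g2)
            - cluster_meat(x, resid, both))


x = [1.0, 1.0, 1.0, 1.0]
resid = [1.0, 2.0, -1.0, 3.0]
firm = ["a", "a", "b", "b"]
year = [1, 2, 1, 2]
meat_firm = cluster_meat(x, resid, firm)
meat_two_way = two_way_meat(x, resid, firm, year)
```

Subtracting the intersection term avoids double counting the observations' own squared scores, which appear in both one-way sums.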

Clustering by Variables used as Effects

When clustering by the same variable used in the effect estimation, e.g., entity effects with clustering by entity, the degrees of freedom used in estimating the effects are not counted.

Driscoll-Kraay Covariance Estimator

The Driscoll-Kraay covariance estimator may be appropriate when the number of time periods is relatively large (e.g., a large T panel), and is given by

\[n_{obs}/(n_{obs}-k)\Sigma_{XX}^{-1}\hat{S}_{HAC}\Sigma_{XX}^{-1}\]

where

\[\begin{split}\begin{aligned} \hat{S}_{HAC} & =\hat{\Gamma}_{0}+\sum_{i=1}^{bw}K(i,bw)(\hat{\Gamma}_{i}+\hat{\Gamma}_{i}^{\prime})\\ \hat{\Gamma}_{i} & =\sum_{t=i+1}^{T}\xi_{t}^{\prime}\xi_{t-i}\\ \xi_{t} & =\sum_{i=1}^{n_{t}}\hat{\epsilon}_{it}x_{it}\end{aligned}\end{split}\]

and \(K\left(i,bw\right)\) is a kernel weighting function. Supported kernels include the Bartlett, Parzen and Quadratic Spectral kernels.
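A minimal sketch of the HAC sum for a single regressor follows, where each lag term multiplies the period-\(t\) score sum by the score sum from \(i\) periods earlier. It assumes the per-period sums \(\xi_{t}\) have already been computed and uses the common Bartlett form \(K(i,bw)=1-i/(bw+1)\); this kernel parameterization is an assumption and may differ from what the library uses. The function names are illustrative.

```python
def bartlett(i, bw):
    """Bartlett kernel weight for lag i (assumed form: 1 - i / (bw + 1))."""
    return 1.0 - i / (bw + 1.0)


def hac_meat(xi, bw):
    """Kernel-weighted sum of autocovariances of the scalar scores xi_t."""
    s = sum(v * v for v in xi)  # Gamma_0
    for lag in range(1, bw + 1):
        # Gamma_lag pairs each period with the period `lag` steps earlier.
        gamma = sum(xi[t] * xi[t - lag] for t in range(lag, len(xi)))
        # For a scalar, Gamma + Gamma' is 2 * Gamma.
        s += bartlett(lag, bw) * 2.0 * gamma
    return s


xi = [1.0, 2.0, 3.0]
no_lags = hac_meat(xi, 0)
one_lag = hac_meat(xi, 1)
```

With a bandwidth of zero only \(\hat{\Gamma}_{0}\) remains, which corresponds to clustering by time period.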

\(R^{2}\) Calculation

Three different \(R^{2}\) measures (between, overall and within) are computed for each model, each using two different methods. When models contain entity effects and are not weighted, the two methods agree.

The correlation-based measures do not make use of weights. They are prefixed as \(\texttt{corr\_squared}\) in estimation results.

Between \(R^{2}\)

Define \(\bar{y}_{iw}\) and \(\bar{x}_{iw}\) to be entity-wise weighted means, and the entity-wise weight to be

\[w_{i}=\frac{\sum_{t}w_{it}}{\left(NT\right)^{-1}\sum_{i}\sum_{t}w_{it}}.\]

The weighted between residuals are

\[\bar{\epsilon}_{iw}=\sqrt{w_{i}}\left(\bar{y}_{iw}-\bar{x}_{iw}\hat{\beta}\right)\]

and the weighted between \(R^{2}\) is defined

\[R_{B}^{2}=1-\frac{\bar{\epsilon}_{w}^{\prime}\bar{\epsilon}_{w}}{\ddot{\bar{y}}_{w}^{\prime}\ddot{\bar{y}}_{w}}\]

where \(\ddot{\bar{y}}_{iw}=\bar{y}_{iw}\) if the model does not contain a constant or \(\ddot{\bar{y}}_{iw}=\bar{y}_{iw}-\bar{\bar{y}}_{w}\) if the model does. The overall weighted mean is defined

\[\bar{\bar{y}}_{w}=\frac{\sum_{i=1}^{N}w_{i}\bar{y}_{iw}}{\sum_{i=1}^{N}w_{i}}.\]

Between \(R^{2}\) (correlation method)

This measure matches Stata. It does not reflect weighting.

\[\rho_{B}^{2}=\textrm{Corr}\left[\bar{x}_{i}\hat{\beta},\bar{y}_{i}\right]^{2}\]

Overall \(R^{2}\)

Define the weighted residual as

\[\epsilon_{it,w}=\sqrt{w_{it}}\left(y_{it}-x_{it}\hat{\beta}\right)\]

and the mean deviated dependent as

\[\tilde{y}_{it,w}=\sqrt{w_{it}}\left(y_{it}-\bar{\bar{y}}_{w}\right)\]

where

\[\bar{\bar{y}}_{w}=\frac{\sum_{i}\sum_{t}w_{it}y_{it}}{\sum_{i}\sum_{t}w_{it}}\]

if the model contains a constant or \(\bar{\bar{y}}_{w}=0\) if not. Then

\[R_{O}^{2}=1-\frac{\sum_{i}\sum_{t}\epsilon_{it,w}^{2}}{\sum_{i}\sum_{t}\tilde{y}_{it,w}^{2}}.\]

Overall \(R^{2}\) (correlation method)

This measure matches Stata. It does not reflect weighting.

\[\rho_{O}^{2}=\textrm{Corr}\left[x_{it}\hat{\beta},y_{it}\right]^{2}\]
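The difference between the two overall measures can be seen in a small unweighted sketch. It takes the fitted values \(x_{it}\hat{\beta}\) as given, and the correlation-based measure is computed as a squared correlation; the function names are hypothetical.

```python
def r2_overall(y, fitted, has_constant=True):
    """1 - RSS / TSS, demeaning y only when the model has a constant."""
    y_bar = sum(y) / len(y) if has_constant else 0.0
    rss = sum((yv - f) ** 2 for yv, f in zip(y, fitted))
    tss = sum((yv - y_bar) ** 2 for yv in y)
    return 1.0 - rss / tss


def corr_squared(a, b):
    """Squared sample correlation between two equal-length sequences."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov * cov / (var_a * var_b)


y = [1.0, 2.0, 3.0, 4.0]
good_fit = [1.5, 1.5, 3.5, 3.5]
scaled_fit = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated with y, wrong scale
```

For `good_fit` both measures equal 0.8. For `scaled_fit` the squared correlation is 1 while the residual-based measure is negative, since the correlation ignores the level and scale of the fitted values.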

Within \(R^{2}\)

Define \(\tilde{y}_{it,w}=\sqrt{w_{it}}\left(y_{it}-\bar{y}_{iw}\right)\) and \(\tilde{x}_{it,w}=\sqrt{w_{it}}\left(x_{it}-\bar{x}_{iw}\right)\) and the residuals as

\[\tilde{\epsilon}_{it,w}=\tilde{y}_{it,w}-\tilde{x}_{it,w}\hat{\beta}.\]

The within \(R^{2}\) is defined as

\[R_{W}^{2}=1-\frac{\sum_{i}\sum_{t}\tilde{\epsilon}_{it,w}^{2}}{\sum_{i}\sum_{t}\tilde{y}_{it,w}^{2}}\]
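The weighted within measure, defined as one minus the ratio of the sum of squared within residuals to the sum of squared within-transformed dependent values, can be sketched for a single regressor. It assumes plain Python lists and a known \(\hat{\beta}\); the function name is hypothetical.

```python
import math


def within_r2(entity, x, y, w, beta):
    """Within R^2 using weighted entity means and sqrt(w)-scaled deviations."""
    sw, swx, swy = {}, {}, {}
    for i, xv, yv, wv in zip(entity, x, y, w):
        sw[i] = sw.get(i, 0.0) + wv
        swx[i] = swx.get(i, 0.0) + wv * xv
        swy[i] = swy.get(i, 0.0) + wv * yv
    rss = tss = 0.0
    for i, xv, yv, wv in zip(entity, x, y, w):
        root_w = math.sqrt(wv)
        y_t = root_w * (yv - swy[i] / sw[i])  # y-tilde
        x_t = root_w * (xv - swx[i] / sw[i])  # x-tilde
        e_t = y_t - x_t * beta                # within residual
        rss += e_t * e_t
        tss += y_t * y_t
    return 1.0 - rss / tss


entity = ["a", "a", "b", "b"]
x = [0.0, 2.0, 0.0, 2.0]
y = [1.0, 5.0, 10.0, 12.0]
w = [1.0, 1.0, 1.0, 1.0]
r2_w = within_r2(entity, x, y, w, beta=1.5)
```

In this example the large gap between the entity means of \(y\) does not affect the result, because only deviations from entity means enter the calculation.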

Within \(R^{2}\) (correlation method)

This measure matches Stata. It does not reflect weighting.

\[\rho_{W}^{2}=\textrm{Corr}\left[\left(x_{it}-\bar{x}_{i}\right)\hat{\beta},\left(y_{it}-\bar{y}_{i}\right)\right]^{2}\]