# Formulas and Mathematical Detail

## Models

The estimators in this module are designed to estimate the parameters of a model specified by

\[
y_{it}=x_{it}\beta+\alpha_{i}+\epsilon_{it}
\]

where \(i\) indexes entities and \(t\) indexes time, \(x_{it}\) is \(1\) by \(k\) (it may include a constant, but not in all models), \(\beta\) is \(k\) by \(1\), \(\alpha_{i}\) is an entity-specific shock and \(\epsilon_{it}\) is an idiosyncratic shock. The most important difference between the models is the assumption made about \(\alpha_{i}\), which determines whether the estimator is consistent and/or efficient. There are \(N\) entities and \(T_{i}\) observations for entity \(i\).

### Fixed Effect Estimation (PanelOLS)

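The fixed effect estimator with entity effects estimates the model

\[
y_{it}-\bar{y}_{i}=\left(x_{it}-\bar{x}_{i}\right)\beta+\left(\epsilon_{it}-\bar{\epsilon}_{i}\right)
\]

where \(\bar{y}_{i}\), \(\bar{x}_{i}\) and \(\bar{\epsilon}_{i}\) are averages over the \(T_{i}\) observations of entity \(i\).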

When the model includes a constant in \(x_{it}\), the grand means of \(y\) and \(x\), for example \(\bar{\bar{y}}=\left(\sum_{i=1}^{N}T_{i}\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}y_{it}\), are re-added to the \(y\) and \(x\) terms, respectively. In practice this imposes the restriction that \(\sum\alpha_{i}=0\). The estimated coefficients for the remaining parameters are unchanged, as are the estimated parameter covariance and related statistics.
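The entity-effects estimator can be illustrated with a minimal sketch that demeans each variable by entity and then runs a regression without a constant. The function name `within_slope` and the toy data are hypothetical, and the sketch handles only a single regressor with no weights.

```python
from collections import defaultdict


def within_slope(y, x, entity):
    """Slope from regressing entity-demeaned y on entity-demeaned x (one regressor)."""
    # Accumulate per-entity sums of y, x and the observation count T_i.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, xi, ei in zip(y, x, entity):
        s = sums[ei]
        s[0] += yi
        s[1] += xi
        s[2] += 1
    # Subtract the entity means, then apply OLS without a constant.
    num = den = 0.0
    for yi, xi, ei in zip(y, x, entity):
        sy, sx, n = sums[ei]
        y_dm = yi - sy / n
        x_dm = xi - sx / n
        num += x_dm * y_dm
        den += x_dm * x_dm
    return num / den


# Unbalanced toy panel generated as y = 2 * x + alpha_i with no noise.
entity = ["a", "a", "a", "b", "b"]
x = [1.0, 2.0, 4.0, 0.0, 3.0]
alpha = {"a": 10.0, "b": -5.0}
y = [2.0 * xi + alpha[ei] for xi, ei in zip(x, entity)]
beta_hat = within_slope(y, x, entity)
```

Because the entity shocks are removed by demeaning, the slope is recovered regardless of how large the values in `alpha` are.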

The fixed effects estimator can handle general fixed effects, not just entity effects. While it has special cases that simplify adding entity and/or time effects, general effects can also be used. For example, one might want to use industry effects rather than firm effects. Generally, the fixed effect model can be expressed as a least squares dummy variable (LSDV) estimator of the form

\[
y_{it}=x_{it}\beta+d_{1,it}\gamma_{1}+d_{2,it}\gamma_{2}+\epsilon_{it}
\]

where \(d_{1,it}\) and \(d_{2,it}\) are the dummy variables for the first and second effect, with coefficient vectors \(\gamma_{1}\) and \(\gamma_{2}\). When the model contains an intercept, one dummy is dropped from each group and the dummies are orthogonalized to a constant so that they have mean 0. This allows the constant to be estimated. If the model does not contain an intercept, the first group of dummies will contain all values and the second will have one dropped.

**Weights**

When weights are used, the results are identical to the LSDV model where all variables – \(y_{it}\), \(x_{it}\), \(d_{1,it}\) and \(d_{2,it}\) (if included) – are multiplied by \(\sqrt{w_{it}}\).

### Random Effect Estimation (RandomEffects)

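The random effects estimator makes use of a quasi-differenced model,

\[
y_{it}-\hat{\theta}_{i}\bar{y}_{i}=\left(1-\hat{\theta}_{i}\right)\alpha_{i}+\left(x_{it}-\hat{\theta}_{i}\bar{x}_{i}\right)\beta+\left(\epsilon_{it}-\hat{\theta}_{i}\bar{\epsilon}_{i}\right)
\]

where \(\theta_{i}\) is a function of the variance of \(\epsilon_{it}\), the variance of \(\alpha_{i}\) and the number of observations for entity \(i\),

\[
\theta_{i}=1-\sqrt{\frac{\sigma_{\epsilon}^{2}}{T_{i}\sigma_{\alpha}^{2}+\sigma_{\epsilon}^{2}}}
\]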

so that \(\hat{\theta}_{i}\approx1\) when \(T_{i}\) is large (as long as \(\sigma_{\alpha}^{2}>0\)) or when \(\alpha_{i}\) is the only source of variance. On the other hand, \(\hat{\theta}_{i}\approx0\) when the variation due to \(\alpha_{i}\) is low. The estimator of the idiosyncratic variance is

where \(c=1\) if the model includes a constant in the regressors. The variance of \(\alpha_{i}\) is estimated using the residual sum of squares from the between regression (see below), \(RSS_{b}\),

where \(\bar{T}=\frac{N}{\sum_{i=1}^{N}T_{i}^{-1}}\) is the harmonic mean of the \(T_{i}\). If the optional small sample adjustment is requested, the Baltagi and Chang (1994) estimator is used instead. This only has an effect when the data are unbalanced.
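The limiting behavior of \(\theta_{i}\) described above can be checked numerically with a small sketch. It assumes the textbook form \(\theta_{i}=1-\sqrt{\sigma_{\epsilon}^{2}/\left(T_{i}\sigma_{\alpha}^{2}+\sigma_{\epsilon}^{2}\right)}\) and takes the two variances as given rather than estimating them; the function names are hypothetical.

```python
import math


def theta(sigma2_eps, sigma2_alpha, t_i):
    """Quasi-differencing weight for an entity with t_i observations.

    Assumes theta_i = 1 - sqrt(sigma2_eps / (t_i * sigma2_alpha + sigma2_eps)).
    """
    return 1.0 - math.sqrt(sigma2_eps / (t_i * sigma2_alpha + sigma2_eps))


def harmonic_mean_obs(t_counts):
    """T-bar: the number of entities divided by the sum of 1 / T_i."""
    return len(t_counts) / sum(1.0 / t for t in t_counts)


no_entity_variance = theta(1.0, 0.0, 5)    # alpha_i contributes nothing
only_entity_variance = theta(0.0, 1.0, 5)  # alpha_i is the only source of variance
short_panel = theta(1.0, 1.0, 3)
long_panel = theta(1.0, 1.0, 300)
t_bar = harmonic_mean_obs([1, 3])
```

With no entity variance the weight is 0 and the model collapses to pooled OLS on the raw data. As \(T_{i}\) grows the weight approaches 1 and the transformation approaches the within (fixed effect) transformation.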

**Weights**

When weights are included the averages are replaced by weighted averages and the final regression terms are all multiplied by \(\sqrt{w_{it}}\).

### Between Estimation (BetweenOLS)

Between estimation regresses time averages of the dependent variable on the time-averaged values of the regressors,

\[
\bar{y}_{i}=\bar{x}_{i}\beta+\alpha_{i}+\bar{\epsilon}_{i}
\]

When weights are included, weighted averages are used so that

\[
\bar{y}_{i}^{w}=\frac{\sum_{t=1}^{T_{i}}w_{it}y_{it}}{\sum_{t=1}^{T_{i}}w_{it}}
\]

with \(\bar{x}_{i}^{w}\) similarly defined. Note that if the conditional variance of \(y_{it}\propto w_{it}^{-1}\), then the conditional variance of \(\bar{y}_{i}^{w}\propto\frac{1}{\sum_{t}w_{it}}\), and these weights are used when regressing the weighted averages. Also note that when \(w_{it}=1\) but the panel is unbalanced, the conditional variance of \(\bar{y}_{i}^{w}=\bar{y}_{i}\propto\frac{1}{T_{i}}\). Re-weighting unbalanced panels is exposed through the fit option `reweight`.
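The weighted averaging step can be sketched as follows. The helper `weighted_entity_means` is hypothetical. It returns both the weighted mean and the sum of weights for each entity, since the latter is the quantity that the conditional variance argument above says should be used as the weight in the between regression.

```python
def weighted_entity_means(y, w, entity):
    """Return {entity: (weighted mean of y, sum of weights)}."""
    totals = {}
    for yi, wi, ei in zip(y, w, entity):
        wy, ws = totals.get(ei, (0.0, 0.0))
        totals[ei] = (wy + wi * yi, ws + wi)
    return {ei: (wy / ws, ws) for ei, (wy, ws) in totals.items()}


entity = ["a", "a", "b", "b", "b"]
y = [1.0, 3.0, 2.0, 4.0, 6.0]
w = [1.0, 3.0, 1.0, 1.0, 1.0]
means = weighted_entity_means(y, w, entity)
```

For entity `b` all weights are 1, so the weighted mean is the plain time average and the weight sum equals \(T_{i}=3\).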

**Weights**

When weights are used, the averages are replaced by weighted averages and reweighting uses the computed variance of the weighted averages in the actual between regression.

### First Difference Estimation (FirstDifferenceOLS)

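First difference estimation regresses the first difference of the dependent variable on the first difference of the regressors,

\[
\Delta y_{it}=\Delta x_{it}\beta+\Delta\epsilon_{it}
\]

where \(\Delta y_{it}=y_{it}-y_{it-1}\), and the entity effect \(\alpha_{i}\) is eliminated by differencing.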

**Weights**

When weights are included, the weights are combined so that the weight on \(\Delta y_{it}\) is \(\left(w_{it}^{-1}+w_{it-1}^{-1}\right)^{-1}\). This exploits the structure that the conditional variance of \(y_{it}\propto w_{it}^{-1}\) and that the variance of a difference is the sum of the variances when observations are uncorrelated.
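As a small numeric illustration of the combined weight, the hypothetical helper below implements \(\left(w_{it}^{-1}+w_{it-1}^{-1}\right)^{-1}\) directly.

```python
def fd_weight(w_t, w_tm1):
    """Weight on the first difference built from two adjacent observation weights."""
    return 1.0 / (1.0 / w_t + 1.0 / w_tm1)


equal_unit_weights = fd_weight(1.0, 1.0)
equal_double_weights = fd_weight(2.0, 2.0)
unequal_weights = fd_weight(1.0, 3.0)
```

With unit weights the difference has twice the variance of a single observation, so its weight is one half. With unequal weights the result is always smaller than the smaller of the two inputs.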

### Pooled Model Estimation (PooledOLS)

The pooled estimator is a standard regression,

\[
y_{it}=x_{it}\beta+\epsilon_{it}
\]

**Weights**

When weights are included, the data are transformed by multiplying by the square root of the weights prior to the regression (i.e., \(y_{it}\) is replaced by \(\sqrt{w_{it}}y_{it}\) and \(x_{it}\) is similarly transformed).
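The square-root transformation is equivalent to weighted least squares, which a minimal single-regressor sketch without a constant can confirm. The function names and data are hypothetical.

```python
import math


def ols_slope_no_const(y, x):
    """OLS slope for a single regressor with no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def transformed_slope(y, x, w):
    """Scale y and x by sqrt(w), then run plain OLS on the transformed data."""
    y_s = [math.sqrt(wi) * yi for yi, wi in zip(y, w)]
    x_s = [math.sqrt(wi) * xi for xi, wi in zip(x, w)]
    return ols_slope_no_const(y_s, x_s)


x = [1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0]
w = [1.0, 4.0, 0.25]
beta_transformed = transformed_slope(y, x, w)
# Direct weighted least squares: sum(w * x * y) / sum(w * x ** 2).
beta_direct = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w)) / sum(
    wi * xi * xi for xi, wi in zip(x, w)
)
```

Both routes give the same coefficient up to floating point error.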

## Covariance Estimators

### Standard Covariance Estimator (unadjusted)

The standard covariance estimator is

where

and

where \(n_{obs}=\sum_{i=1}^{N}T_{i}\). If the debiased option is not used, the \(k\) term is omitted.

### Heteroskedastic Covariance Estimator (robust)

The heteroskedasticity-robust covariance estimator is

where

The \(-k\) term is dropped if the debiased option is not used.

### Clustered Covariance Estimator

The clustered covariance estimator supports one-way and two-way clustering.

where in the case of one-way clustering,

where

and \(it\in G_{g}\) indicates that the observation belongs to group \(g\). The two-way clustered estimator replaces \(\hat{S}_{\mathcal{G}}\) by

where the group debiasing is applied individually to each of the three components, depending on the number of groups in each of the three estimators. If the group debias term is not used, the expression \(\frac{G}{G-1}\frac{n_{obs}-1}{n_{obs}}\) is omitted. The \(-k\) term is dropped if the debiased estimator is not used.
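The three components of two-way clustering can be sketched for a single regressor. This assumes the usual construction in which the score \(x_{it}\hat{\epsilon}_{it}\) is summed within each group before squaring, and in which the two-way term is the sum of the two one-way terms minus the term for the intersection of the groups. The function names are hypothetical, and all scaling and debiasing factors are deliberately left out.

```python
def cluster_meat(xe, groups):
    """Sum over groups of the squared within-group sum of scores (scalar case)."""
    sums = {}
    for value, g in zip(xe, groups):
        sums[g] = sums.get(g, 0.0) + value
    return sum(s * s for s in sums.values())


def two_way_meat(xe, g1, g2):
    """Two one-way components minus the component for the group intersections."""
    intersection = list(zip(g1, g2))
    return cluster_meat(xe, g1) + cluster_meat(xe, g2) - cluster_meat(xe, intersection)


xe = [1.0, 2.0, 3.0, 4.0]       # scores x_it * residual_it
g1 = ["a", "a", "b", "b"]        # e.g. entity clusters
g2 = [1, 2, 1, 2]                # e.g. time clusters
one_way_g1 = cluster_meat(xe, g1)
one_way_g2 = cluster_meat(xe, g2)
two_way = two_way_meat(xe, g1, g2)
```

In this toy example every intersection contains a single observation, so the subtracted component is simply the sum of squared scores.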

**Clustering by Variables used as Effects**

When clustering by the same variable used in the effect estimation, e.g., entity effects with clustering by entity, the degrees of freedom used in estimating the effects are *not* counted.

### Driscoll-Kraay Covariance Estimator

The Driscoll-Kraay covariance estimator may be appropriate when the number of time periods is relatively large (e.g., a large \(T\) panel), and is given by

where

and \(K\left(i,bw\right)\) is a kernel weighting function. Supported kernels include the Bartlett, Parzen and Quadratic Spectral kernels.
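A kernel-weighted sum of this kind can be sketched for the scalar case. The sketch assumes the common Bartlett form \(K\left(i,bw\right)=1-\frac{i}{bw+1}\) and assumes that the inputs are the per-period cross-sectional sums of \(x_{it}\hat{\epsilon}_{it}\). The function names are hypothetical and no normalization by the sample size is applied.

```python
def bartlett_weights(bw):
    """Bartlett kernel weights 1 - i / (bw + 1) for i = 0, ..., bw."""
    return [1.0 - i / (bw + 1) for i in range(bw + 1)]


def kernel_weighted_sum(xe_t, bw):
    """Unnormalized kernel-weighted sum of autocovariances of the series xe_t."""
    weights = bartlett_weights(bw)
    n_t = len(xe_t)
    total = sum(v * v for v in xe_t)  # lag 0 term, weight 1
    for j in range(1, min(bw, n_t - 1) + 1):
        gamma_j = sum(xe_t[t] * xe_t[t - j] for t in range(j, n_t))
        total += 2.0 * weights[j] * gamma_j  # lag j and its mirror image
    return total


xe_t = [1.0, -1.0, 2.0]  # one cross-sectional score sum per time period
s_hat = kernel_weighted_sum(xe_t, bw=1)
```

Setting `bw=0` keeps only the lag 0 term, which corresponds to clustering by time period.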

## \(R^{2}\) Calculation

Three different \(R^{2}\) measures are computed for each model, and each is produced using two different methods. When models contain entity effects and are not weighted, the two methods agree.

The correlation-based measures **do not** make use of weights. They are
prefixed as \(\texttt{corr\_squared}\) in estimation results.
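As a rough illustration of a correlation-based measure, the sketch below computes the squared Pearson correlation between a dependent variable and its fitted values. This is an assumption about what the `corr_squared` prefix denotes. Which series enter each variant (raw, entity-averaged or entity-demeaned) is not specified by the sketch, and the function name is hypothetical.

```python
def corr_squared(y, fitted):
    """Squared Pearson correlation between y and fitted values (unweighted)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_f = sum(fitted) / n
    cov = sum((a - mean_y) * (b - mean_f) for a, b in zip(y, fitted))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_f = sum((b - mean_f) ** 2 for b in fitted)
    return cov * cov / (var_y * var_f)


perfect_fit = corr_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
partial_fit = corr_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

The measure is invariant to rescaling the fitted values, which is one reason it can differ from a residual-based \(R^{2}\).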

### Between \(R^{2}\)

Define \(\bar{y}_{iw}\) and \(\bar{x}_{iw}\) to be entity-wise weighted means, and the entity-wise weight to be

The weighted between residuals are

and the weighted between \(R^{2}\) is defined

where \(\ddot{\bar{y}}_{iw}=\bar{y}_{iw}\) if the model does not contain a constant or \(\ddot{\bar{y}}_{iw}=\bar{y}_{iw}-\bar{\bar{y}}_{w}\) if the model does. The overall weighted mean is defined

#### Between \(R^{2}\) (correlation method)

This measure matches Stata. It does not reflect weighting.

### Overall \(R^{2}\)

Define the weighted residual as

and the mean-deviated dependent variable as

where

if the model contains a constant or \(\bar{\bar{y}}_{w}=0\) if not. Then

#### Overall \(R^{2}\) (correlation method)

This measure matches Stata. It does not reflect weighting.

### Within \(R^{2}\)

Define \(\tilde{y}_{it,w}=\sqrt{w_{it}}\left(y_{it}-\bar{y}_{iw}\right)\) and \(\tilde{x}_{it,w}=\sqrt{w_{it}}\left(x_{it}-\bar{x}_{iw}\right)\) and the residuals as

The within \(R^{2}\) is defined as

#### Within \(R^{2}\) (correlation method)

This measure matches Stata. It does not reflect weighting.