{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Unit Root Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_This setup code is required to run in an IPython notebook_" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "warnings.simplefilter(\"ignore\")\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn\n", "\n", "seaborn.set_style(\"darkgrid\")\n", "plt.rc(\"figure\", figsize=(16, 6))\n", "plt.rc(\"savefig\", dpi=90)\n", "plt.rc(\"font\", family=\"sans-serif\")\n", "plt.rc(\"font\", size=14)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most examples will make use of the Default premium, which is the difference between the yields of BAA and AAA rated corporate bonds. The data is downloaded from FRED using pandas." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import arch.data.default\n", "import pandas as pd\n", "import statsmodels.api as sm\n", "\n", "default_data = arch.data.default.load()\n", "default = default_data.BAA.copy()\n", "default.name = \"default\"\n", "default = default - default_data.AAA.values\n", "fig = default.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Default premium is clearly highly persistent. A simple check of the autocorrelations confirms this." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "acf = pd.DataFrame(sm.tsa.stattools.acf(default), columns=[\"ACF\"])\n", "fig = acf[1:].plot(kind=\"bar\", title=\"Autocorrelations\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Augmented Dickey-Fuller Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Augmented Dickey-Fuller test is the most commonly used unit root test. It is a regression of the first difference of the variable on its lagged level as well as additional lags of the first difference. The null is that the series contains a unit root, and the (one-sided) alternative is that the series is stationary. \n", "\n", "By default, the number of lags is selected by minimizing the AIC across a range of lag lengths (the maximum can be set using `max_lags` when initializing the model). Additionally, the basic test includes a constant in the ADF regression.\n", "\n", "The results below reject the unit root null at the 5% level (p-value 0.013), which indicates that the Default premium is stationary." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Augmented Dickey-Fuller Results \n", "=====================================\n", "Test Statistic -3.356\n", "P-value 0.013\n", "Lags 21\n", "-------------------------------------\n", "\n", "Trend: Constant\n", "Critical Values: -3.44 (1%), -2.86 (5%), -2.57 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "from arch.unitroot import ADF\n", "\n", "adf = ADF(default)\n", "print(adf.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of lags can be directly set using `lags`. Changing the number of lags makes no difference to the conclusion.\n", "\n", "**Note**: The ADF assumes residuals are white noise, and that the number of lags is sufficient to pick up any dependence in the data." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting the number of lags" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Augmented Dickey-Fuller Results \n", "=====================================\n", "Test Statistic -3.582\n", "P-value 0.006\n", "Lags 5\n", "-------------------------------------\n", "\n", "Trend: Constant\n", "Critical Values: -3.44 (1%), -2.86 (5%), -2.57 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "adf = ADF(default, lags=5)\n", "print(adf.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deterministic terms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The deterministic terms can be altered using `trend`. The options are:\n", "\n", "* `'nc'` : No deterministic terms\n", "* `'c'` : Constant only\n", "* `'ct'` : Constant and time trend\n", "* `'ctt'` : Constant, time trend and time-trend squared\n", "\n", "Changing the deterministic terms also makes no difference to the conclusion for this data." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Augmented Dickey-Fuller Results \n", "=====================================\n", "Test Statistic -3.786\n", "P-value 0.017\n", "Lags 5\n", "-------------------------------------\n", "\n", "Trend: Constant and Linear Time Trend\n", "Critical Values: -3.97 (1%), -3.41 (5%), -3.13 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "adf = ADF(default, trend=\"ct\", lags=5)\n", "print(adf.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regression output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ADF uses a standard regression when computing results. The regression results can be accessed using `regression`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: y R-squared: 0.095\n", "Model: OLS Adj. R-squared: 0.090\n", "Method: Least Squares F-statistic: 17.83\n", "Date: Tue, 18 May 2021 Prob (F-statistic): 1.30e-22\n", "Time: 13:22:02 Log-Likelihood: 630.15\n", "No. 
Observations: 1194 AIC: -1244.\n", "Df Residuals: 1186 BIC: -1204.\n", "Df Model: 7 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Level.L1 -0.0248 0.007 -3.786 0.000 -0.038 -0.012\n", "Diff.L1 0.2229 0.029 7.669 0.000 0.166 0.280\n", "Diff.L2 -0.0525 0.030 -1.769 0.077 -0.111 0.006\n", "Diff.L3 -0.1363 0.029 -4.642 0.000 -0.194 -0.079\n", "Diff.L4 -0.0510 0.030 -1.727 0.084 -0.109 0.007\n", "Diff.L5 0.0440 0.029 1.516 0.130 -0.013 0.101\n", "const 0.0383 0.013 2.858 0.004 0.012 0.065\n", "trend -1.586e-05 1.29e-05 -1.230 0.219 -4.11e-05 9.43e-06\n", "==============================================================================\n", "Omnibus: 665.553 Durbin-Watson: 2.000\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 146083.295\n", "Skew: -1.425 Prob(JB): 0.00\n", "Kurtosis: 57.113 Cond. No. 5.70e+03\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The condition number is large, 5.7e+03. This might indicate that there are\n", "strong multicollinearity or other numerical problems.\n" ] } ], "source": [ "reg_res = adf.regression\n", "print(reg_res.summary().as_text())" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "\n", "resids = pd.DataFrame(reg_res.resid)\n", "resids.index = default.index[6:]\n", "resids.columns = [\"resids\"]\n", "fig = resids.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the number of lags was set directly, it is good practice to check whether the residuals appear to be white noise." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "acf = pd.DataFrame(sm.tsa.stattools.acf(reg_res.resid), columns=[\"ACF\"])\n", "fig = acf[1:].plot(kind=\"bar\", title=\"Residual Autocorrelations\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dickey-Fuller GLS Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Dickey-Fuller GLS test is an improved version of the ADF which uses a GLS-detrending regression before running an ADF regression with no additional deterministic terms. This test is only available with a constant or constant and time trend (`trend='c'` or `trend='ct'`).\n", "\n", "The results of this test agree with the ADF results." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Dickey-Fuller GLS Results \n", "=====================================\n", "Test Statistic -2.322\n", "P-value 0.020\n", "Lags 21\n", "-------------------------------------\n", "\n", "Trend: Constant\n", "Critical Values: -2.59 (1%), -1.96 (5%), -1.64 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "from arch.unitroot import DFGLS\n", "\n", "dfgls = DFGLS(default)\n", "print(dfgls.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The trend can be altered using `trend`. The conclusion is the same. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Dickey-Fuller GLS Results \n", "=====================================\n", "Test Statistic -3.464\n", "P-value 0.009\n", "Lags 21\n", "-------------------------------------\n", "\n", "Trend: Constant and Linear Time Trend\n", "Critical Values: -3.43 (1%), -2.86 (5%), -2.58 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "dfgls = DFGLS(default, trend=\"ct\")\n", "print(dfgls.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Phillips-Perron Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Phillips-Perron test is similar to the ADF except that the regression does not include lagged values of the first differences. Instead, the PP test corrects the t-statistic using a long-run variance estimate, computed with a Newey-West covariance estimator. \n", "\n", "By default, the number of lags is automatically set, although this can be overridden using `lags`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Phillips-Perron Test (Z-tau) \n", "=====================================\n", "Test Statistic -3.898\n", "P-value 0.002\n", "Lags 23\n", "-------------------------------------\n", "\n", "Trend: Constant\n", "Critical Values: -3.44 (1%), -2.86 (5%), -2.57 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "from arch.unitroot import PhillipsPerron\n", "\n", "pp = PhillipsPerron(default)\n", "print(pp.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important that the number of lags is sufficient to pick up any dependence in the data." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Phillips-Perron Test (Z-tau) \n", "=====================================\n", "Test Statistic -4.024\n", "P-value 0.001\n", "Lags 12\n", "-------------------------------------\n", "\n", "Trend: Constant\n", "Critical Values: -3.44 (1%), -2.86 (5%), -2.57 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "pp = PhillipsPerron(default, lags=12)\n", "print(pp.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The trend can be changed as well." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Phillips-Perron Test (Z-tau) \n", "=====================================\n", "Test Statistic -4.262\n", "P-value 0.004\n", "Lags 12\n", "-------------------------------------\n", "\n", "Trend: Constant and Linear Time Trend\n", "Critical Values: -3.97 (1%), -3.41 (5%), -3.13 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "pp = PhillipsPerron(default, trend=\"ct\", lags=12)\n", "print(pp.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, the PP testing framework includes two types of tests: one uses an ADF-type regression of the first difference on the level, and the other regresses the level on the level. The default is the `tau` test, which is similar to an ADF regression; this can be changed using `test_type='rho'`." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Phillips-Perron Test (Z-rho) \n", "=====================================\n", "Test Statistic -36.114\n", "P-value 0.002\n", "Lags 12\n", "-------------------------------------\n", "\n", "Trend: Constant and Linear Time Trend\n", "Critical Values: -29.16 (1%), -21.60 (5%), -18.17 (10%)\n", "Null Hypothesis: The process contains a unit root.\n", "Alternative Hypothesis: The process is weakly stationary.\n" ] } ], "source": [ "pp = PhillipsPerron(default, test_type=\"rho\", trend=\"ct\", lags=12)\n", "print(pp.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## KPSS Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The KPSS test differs from the three previous tests in that the null is a stationary process and the alternative is a unit root. \n", "\n", "Note that here the null is rejected, which indicates that the series might contain a unit root." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " KPSS Stationarity Test Results \n", "=====================================\n", "Test Statistic 1.088\n", "P-value 0.002\n", "Lags 20\n", "-------------------------------------\n", "\n", "Trend: Constant\n", "Critical Values: 0.74 (1%), 0.46 (5%), 0.35 (10%)\n", "Null Hypothesis: The process is weakly stationary.\n", "Alternative Hypothesis: The process contains a unit root.\n" ] } ], "source": [ "from arch.unitroot import KPSS\n", "\n", "kpss = KPSS(default)\n", "print(kpss.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Changing the trend does not alter the conclusion." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " KPSS Stationarity Test Results \n", "=====================================\n", "Test Statistic 0.393\n", "P-value 0.000\n", "Lags 20\n", "-------------------------------------\n", "\n", "Trend: Constant and Linear Time Trend\n", "Critical Values: 0.22 (1%), 0.15 (5%), 0.12 (10%)\n", "Null Hypothesis: The process is weakly stationary.\n", "Alternative Hypothesis: The process contains a unit root.\n" ] } ], "source": [ "kpss = KPSS(default, trend=\"ct\")\n", "print(kpss.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Zivot-Andrews Test\n", "\n", "The Zivot-Andrews test allows for the possibility of a single structural break in the series. Here it is applied to the default premium." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Zivot-Andrews Results \n", "=====================================\n", "Test Statistic -4.900\n", "P-value 0.040\n", "Lags 21\n", "-------------------------------------\n", "\n", "Trend: Constant\n", "Critical Values: -5.28 (1%), -4.81 (5%), -4.57 (10%)\n", "Null Hypothesis: The process contains a unit root with a single structural break.\n", "Alternative Hypothesis: The process is trend and break stationary.\n" ] } ], "source": [ "from arch.unitroot import ZivotAndrews\n", "\n", "za = ZivotAndrews(default)\n", "print(za.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variance Ratio Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variance ratio tests are not usually used as unit root tests, and are instead used for testing whether a financial return series is a pure random walk versus having some predictability. This example uses the excess return on the market from Ken French's data. 
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Mkt-RF SMB HML RF\n", "count 1109.000000 1109.000000 1109.000000 1109.000000\n", "mean 0.659946 0.206555 0.368864 0.274220\n", "std 5.327524 3.191132 3.482352 0.253377\n", "min -29.130000 -16.870000 -13.280000 -0.060000\n", "25% -1.970000 -1.560000 -1.320000 0.030000\n", "50% 1.020000 0.070000 0.140000 0.230000\n", "75% 3.610000 1.730000 1.740000 0.430000\n", "max 38.850000 36.700000 35.460000 1.350000\n" ] } ], "source": [ "import arch.data.frenchdata\n", "import numpy as np\n", "import pandas as pd\n", "\n", "ff = arch.data.frenchdata.load()\n", "excess_market = ff.iloc[:, 0] # Excess Market\n", "print(ff.describe())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variance ratio compares the variance of a 1-period return to that of a multi-period return. The comparison length has to be set when initializing the test. \n", "\n", "This example compares 1-month to 12-month returns, and the null that the series is a pure random walk is rejected. A negative test statistic means the variance ratio is below one, which indicates negative autocorrelation in the returns (mean reversion) rather than momentum." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Variance-Ratio Test Results \n", "=====================================\n", "Test Statistic -5.029\n", "P-value 0.000\n", "Lags 12\n", "-------------------------------------\n", "\n", "Computed with overlapping blocks (de-biased)\n" ] } ], "source": [ "from arch.unitroot import VarianceRatio\n", "\n", "vr = VarianceRatio(excess_market, 12)\n", "print(vr.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default the VR test uses all overlapping blocks to estimate the variance of the long period's return. This can be changed by setting `overlap=False`. This lowers the power but does not change the conclusion." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Variance-Ratio Test Results \n", "=====================================\n", "Test Statistic -6.206\n", "P-value 0.000\n", "Lags 12\n", "-------------------------------------\n", "\n", "Computed with non-overlapping blocks\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "c:\\git\\arch\\arch\\unitroot\\unitroot.py:1679: InvalidLengthWarning: \n", "The length of y is not an exact multiple of 12, and so the final\n", "4 observations have been dropped.\n", "\n", " warnings.warn(\n" ] } ], "source": [ "warnings.simplefilter(\"always\") # Restore warnings\n", "\n", "vr = VarianceRatio(excess_market, 12, overlap=False)\n", "print(vr.summary().as_text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: The warning is intentional. It appears because, with non-overlapping blocks, not all of the data can be used when the data length is not an integer multiple of the long period. There is little reason to use `overlap=False`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }