Vector Autoregressions tsa.vector_ar

VAR(p) processes

We are interested in modeling a \(T \times K\) multivariate time series \(Y\), where \(T\) denotes the number of observations and \(K\) the number of variables. One way of estimating relationships between the time series and their lagged values is the vector autoregression process:

\[ \begin{align}\begin{aligned}Y_t = A_1 Y_{t-1} + \ldots + A_p Y_{t-p} + u_t\\u_t \sim {\sf Normal}(0, \Sigma_u)\end{aligned}\end{align} \]

where \(A_i\) is a \(K \times K\) coefficient matrix.

We follow in large part the methods and notation of Lutkepohl (2005), which we will not develop here.

Model fitting

Note

The classes referenced below are accessible via the statsmodels.tsa.api module.

To estimate a VAR model, one must first create the model using an ndarray of homogeneous or structured dtype. When using a structured or record array, the class will use the passed variable names. Otherwise they can be passed explicitly:

 # some example data
In [1]: import numpy as np

In [2]: import pandas

In [3]: import statsmodels.api as sm

ImportErrorTraceback (most recent call last)
<ipython-input-3-085740203b77> in <module>()
----> 1 import statsmodels.api as sm

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/api.py in <module>()
      5 from . import regression
      6 from .regression.linear_model import OLS, GLS, WLS, GLSAR
----> 7 from .regression.recursive_ls import RecursiveLS
      8 from .regression.quantile_regression import QuantReg
      9 from .regression.mixed_linear_model import MixedLM

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/regression/recursive_ls.py in <module>()
     14 from statsmodels.regression.linear_model import OLS
     15 from statsmodels.tools.data import _is_using_pandas
---> 16 from statsmodels.tsa.statespace.mlemodel import (
     17     MLEModel, MLEResults, MLEResultsWrapper)
     18 from statsmodels.tools.tools import Bunch

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/mlemodel.py in <module>()
     16 from scipy.stats import norm
     17 
---> 18 from .simulation_smoother import SimulationSmoother
     19 from .kalman_smoother import SmootherResults
     20 from .kalman_filter import (INVERT_UNIVARIATE, SOLVE_LU)

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/simulation_smoother.py in <module>()
      8 
      9 import numpy as np
---> 10 from .kalman_smoother import KalmanSmoother
     11 from . import tools
     12 

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/kalman_smoother.py in <module>()
      9 import numpy as np
     10 
---> 11 from statsmodels.tsa.statespace.representation import OptionWrapper
     12 from statsmodels.tsa.statespace.kalman_filter import (KalmanFilter,
     13                                                       FilterResults)

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/representation.py in <module>()
      8 
      9 import numpy as np
---> 10 from .tools import (
     11     find_best_blas_type, validate_matrix_shape, validate_vector_shape
     12 )

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/tools.py in <module>()
    205             'z': _statespace.zcopy_index_vector
    206         })
--> 207 set_mode(compatibility=None)
    208 
    209 

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/tools.py in set_mode(compatibility)
     57     if not compatibility:
     58         from scipy.linalg import cython_blas
---> 59         from . import (_representation, _kalman_filter, _kalman_smoother,
     60                        _simulation_smoother, _tools)
     61         compatibility_mode = False

ImportError: cannot import name _representation

In [4]: from statsmodels.tsa.api import VAR, DynamicVAR

ImportErrorTraceback (most recent call last)
<ipython-input-4-1efb627027bb> in <module>()
----> 1 from statsmodels.tsa.api import VAR, DynamicVAR

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/api.py in <module>()
----> 1 from .ar_model import AR
      2 from .arima_model import ARMA, ARIMA
      3 from . import vector_ar as var
      4 from .arima_process import arma_generate_sample, ArmaProcess
      5 from .vector_ar.var_model import VAR

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/ar_model.py in <module>()
     15                                           cache_readonly, cache_writable)
     16 from statsmodels.tools.numdiff import approx_fprime, approx_hess
---> 17 from statsmodels.tsa.kalmanf.kalmanfilter import KalmanFilter
     18 import statsmodels.base.wrapper as wrap
     19 from statsmodels.tsa.vector_ar import util

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/kalmanf/__init__.py in <module>()
----> 1 from .kalmanfilter import KalmanFilter

/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/kalmanf/kalmanfilter.py in <module>()
     31 from numpy.linalg import inv, pinv
     32 from statsmodels.tools.tools import chain_dot
---> 33 from . import kalman_loglike
     34 
     35 #Fast filtering and smoothing for multivariate state space models

ImportError: cannot import name kalman_loglike

In [5]: mdata = sm.datasets.macrodata.load_pandas().data

NameErrorTraceback (most recent call last)
<ipython-input-5-67753caef3c7> in <module>()
----> 1 mdata = sm.datasets.macrodata.load_pandas().data

NameError: name 'sm' is not defined

 # prepare the dates index
In [6]: dates = mdata[['year', 'quarter']].astype(int).astype(str)

NameErrorTraceback (most recent call last)
<ipython-input-6-a166f233d576> in <module>()
----> 1 dates = mdata[['year', 'quarter']].astype(int).astype(str)

NameError: name 'mdata' is not defined

In [7]: quarterly = dates["year"] + "Q" + dates["quarter"]

NameErrorTraceback (most recent call last)
<ipython-input-7-c012991f378a> in <module>()
----> 1 quarterly = dates["year"] + "Q" + dates["quarter"]

NameError: name 'dates' is not defined

In [8]: from statsmodels.tsa.base.datetools import dates_from_str

In [9]: quarterly = dates_from_str(quarterly)

NameErrorTraceback (most recent call last)
<ipython-input-9-7592f2cc6963> in <module>()
----> 1 quarterly = dates_from_str(quarterly)

NameError: name 'quarterly' is not defined

In [10]: mdata = mdata[['realgdp','realcons','realinv']]

NameErrorTraceback (most recent call last)
<ipython-input-10-3205dd0ed1fe> in <module>()
----> 1 mdata = mdata[['realgdp','realcons','realinv']]

NameError: name 'mdata' is not defined

In [11]: mdata.index = pandas.DatetimeIndex(quarterly)

NameErrorTraceback (most recent call last)
<ipython-input-11-c7b6b3bbf843> in <module>()
----> 1 mdata.index = pandas.DatetimeIndex(quarterly)

NameError: name 'quarterly' is not defined

In [12]: data = np.log(mdata).diff().dropna()

NameErrorTraceback (most recent call last)
<ipython-input-12-b9ff088fc4dd> in <module>()
----> 1 data = np.log(mdata).diff().dropna()

NameError: name 'mdata' is not defined

 # make a VAR model
In [13]: model = VAR(data)

NameErrorTraceback (most recent call last)
<ipython-input-13-86149d72e5b4> in <module>()
----> 1 model = VAR(data)

NameError: name 'VAR' is not defined

Note

The VAR class assumes that the passed time series are stationary. Non-stationary or trending data can often be transformed to be stationary by first-differencing or some other method. For direct analysis of non-stationary time series, a standard stable VAR(p) model is not appropriate.

To actually do the estimation, call the fit method with the desired lag order. Or you can have the model select a lag order based on a standard information criterion (see below):

In [14]: results = model.fit(2)

NameErrorTraceback (most recent call last)
<ipython-input-14-c45e3c7a54be> in <module>()
----> 1 results = model.fit(2)

NameError: name 'model' is not defined

In [15]: results.summary()

NameErrorTraceback (most recent call last)
<ipython-input-15-1da58c3af1a3> in <module>()
----> 1 results.summary()

NameError: name 'results' is not defined

Several ways to visualize the data using matplotlib are available.

Plotting input time series:

In [16]: results.plot()

NameErrorTraceback (most recent call last)
<ipython-input-16-93a479bf92ff> in <module>()
----> 1 results.plot()

NameError: name 'results' is not defined
_images/var_plot_input.png

Plotting time series autocorrelation function:

In [17]: results.plot_acorr()

NameErrorTraceback (most recent call last)
<ipython-input-17-dd7a1bd9f2f9> in <module>()
----> 1 results.plot_acorr()

NameError: name 'results' is not defined
_images/var_plot_acorr.png

Lag order selection

Choice of lag order can be a difficult problem. Standard analysis employs likelihood test or information criteria-based order selection. We have implemented the latter, accessible through the VAR class:

In [18]: model.select_order(15)

NameErrorTraceback (most recent call last)
<ipython-input-18-6f1cc5879977> in <module>()
----> 1 model.select_order(15)

NameError: name 'model' is not defined

When calling the fit function, one can pass a maximum number of lags and the order criterion to use for order selection:

In [19]: results = model.fit(maxlags=15, ic='aic')

NameErrorTraceback (most recent call last)
<ipython-input-19-4b6b93f0aafb> in <module>()
----> 1 results = model.fit(maxlags=15, ic='aic')

NameError: name 'model' is not defined

Forecasting

The linear predictor is the optimal h-step ahead forecast in terms of mean-squared error:

\[y_t(h) = \nu + A_1 y_t(h − 1) + \cdots + A_p y_t(h − p)\]

We can use the forecast function to produce this forecast. Note that we have to specify the “initial value” for the forecast:

In [20]: lag_order = results.k_ar

NameErrorTraceback (most recent call last)
<ipython-input-20-a11c944a667e> in <module>()
----> 1 lag_order = results.k_ar

NameError: name 'results' is not defined

In [21]: results.forecast(data.values[-lag_order:], 5)

NameErrorTraceback (most recent call last)
<ipython-input-21-f83e64f1c23c> in <module>()
----> 1 results.forecast(data.values[-lag_order:], 5)

NameError: name 'results' is not defined

The forecast_interval function will produce the above forecast along with asymptotic standard errors. These can be visualized using the plot_forecast function:

In [22]: results.plot_forecast(10)

NameErrorTraceback (most recent call last)
<ipython-input-22-7a48b6406ef8> in <module>()
----> 1 results.plot_forecast(10)

NameError: name 'results' is not defined
_images/var_forecast.png

Impulse Response Analysis

Impulse responses are of interest in econometric studies: they are the estimated responses to a unit impulse in one of the variables. They are computed in practice using the MA(\(\infty\)) representation of the VAR(p) process:

\[Y_t = \mu + \sum_{i=0}^\infty \Phi_i u_{t-i}\]

We can perform an impulse response analysis by calling the irf function on a VARResults object:

In [23]: irf = results.irf(10)

NameErrorTraceback (most recent call last)
<ipython-input-23-1e11bcc52a5e> in <module>()
----> 1 irf = results.irf(10)

NameError: name 'results' is not defined

These can be visualized using the plot function, in either orthogonalized or non-orthogonalized form. Asymptotic standard errors are plotted by default at the 95% significance level, which can be modified by the user.

Note

Orthogonalization is done using the Cholesky decomposition of the estimated error covariance matrix \(\hat \Sigma_u\) and hence interpretations may change depending on variable ordering.

In [24]: irf.plot(orth=False)

NameErrorTraceback (most recent call last)
<ipython-input-24-713e3eba5933> in <module>()
----> 1 irf.plot(orth=False)

NameError: name 'irf' is not defined
_images/var_irf.png

Note the plot function is flexible and can plot only variables of interest if so desired:

In [25]: irf.plot(impulse='realgdp')

NameErrorTraceback (most recent call last)
<ipython-input-25-e41dfe9e23e0> in <module>()
----> 1 irf.plot(impulse='realgdp')

NameError: name 'irf' is not defined
_images/var_realgdp.png

The cumulative effects \(\Psi_n = \sum_{i=0}^n \Phi_i\) can be plotted with the long run effects as follows:

In [26]: irf.plot_cum_effects(orth=False)

NameErrorTraceback (most recent call last)
<ipython-input-26-bb40f89ba62b> in <module>()
----> 1 irf.plot_cum_effects(orth=False)

NameError: name 'irf' is not defined
_images/var_irf_cum.png

Forecast Error Variance Decomposition (FEVD)

Forecast errors of component j on k in an i-step ahead forecast can be decomposed using the orthogonalized impulse responses \(\Theta_i\):

\[ \begin{align}\begin{aligned}\omega_{jk, i} = \sum_{i=0}^{h-1} (e_j^\prime \Theta_i e_k)^2 / \mathrm{MSE}_j(h)\\\mathrm{MSE}_j(h) = \sum_{i=0}^{h-1} e_j^\prime \Phi_i \Sigma_u \Phi_i^\prime e_j\end{aligned}\end{align} \]

These are computed via the fevd function up through a total number of steps ahead:

In [27]: fevd = results.fevd(5)

NameErrorTraceback (most recent call last)
<ipython-input-27-6e6bb53ef95d> in <module>()
----> 1 fevd = results.fevd(5)

NameError: name 'results' is not defined

In [28]: fevd.summary()

NameErrorTraceback (most recent call last)
<ipython-input-28-46a76c3e73dd> in <module>()
----> 1 fevd.summary()

NameError: name 'fevd' is not defined

They can also be visualized through the returned FEVD object:

In [29]: results.fevd(20).plot()

NameErrorTraceback (most recent call last)
<ipython-input-29-f2dfd44ce8e5> in <module>()
----> 1 results.fevd(20).plot()

NameError: name 'results' is not defined
_images/var_fevd.png

Statistical tests

A number of different methods are provided to carry out hypothesis tests about the model results and also the validity of the model assumptions (normality, whiteness / “iid-ness” of errors, etc.).

Granger causality

One is often interested in whether a variable or group of variables is “causal” for another variable, for some definition of “causal”. In the context of VAR models, one can say that a set of variables are Granger-causal within one of the VAR equations. We will not detail the mathematics or definition of Granger causality, but leave it to the reader. The VARResults object has the test_causality method for performing either a Wald (\(\chi^2\)) test or an F-test.

In [30]: results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')

NameErrorTraceback (most recent call last)
<ipython-input-30-40c8192e32ca> in <module>()
----> 1 results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')

NameError: name 'results' is not defined

Normality

Whiteness of residuals

Dynamic Vector Autoregressions

Note

To use this functionality, pandas must be installed. See the pandas documentation for more information on the below data structures.

One is often interested in estimating a moving-window regression on time series data for the purposes of making forecasts throughout the data sample. For example, we may wish to produce the series of 2-step-ahead forecasts produced by a VAR(p) model estimated at each point in time.

In [31]: np.random.seed(1)

In [32]: import pandas.util.testing as ptest

In [33]: ptest.N = 500

In [34]: data = ptest.makeTimeDataFrame().cumsum(0)

In [35]: data
Out[35]: 
                    A          B          C          D
2000-01-03   1.624345  -1.719394  -0.153236   1.301225
2000-01-04   1.012589  -1.662273  -2.585745   0.988833
2000-01-05   0.484417  -2.461821  -2.077760   0.717604
2000-01-06  -0.588551  -2.753416  -2.401793   2.580517
2000-01-07   0.276856  -3.012398  -3.912869   1.937644
...               ...        ...        ...        ...
2001-11-26  29.552085  14.274036  39.222558 -13.243907
2001-11-27  30.080964  11.996738  38.589968 -12.682989
2001-11-28  27.843878  11.927114  38.380121 -13.604648
2001-11-29  26.736165  12.280984  40.277282 -12.957273
2001-11-30  26.718447  12.094029  38.895890 -11.570447

[500 rows x 4 columns]

In [36]: var = DynamicVAR(data, lag_order=2, window_type='expanding')

NameErrorTraceback (most recent call last)
<ipython-input-36-0fe4ff8cb1b9> in <module>()
----> 1 var = DynamicVAR(data, lag_order=2, window_type='expanding')

NameError: name 'DynamicVAR' is not defined

The estimated coefficients for the dynamic model are returned as a pandas.Panel object, which can allow you to easily examine, for example, all of the model coefficients by equation or by date:

In [37]: import datetime as dt

In [38]: var.coefs

NameErrorTraceback (most recent call last)
<ipython-input-38-8a75285c8630> in <module>()
----> 1 var.coefs

NameError: name 'var' is not defined

 # all estimated coefficients for equation A
In [39]: var.coefs.minor_xs('A').info()

NameErrorTraceback (most recent call last)
<ipython-input-39-1c14d1933400> in <module>()
----> 1 var.coefs.minor_xs('A').info()

NameError: name 'var' is not defined

 # coefficients on 11/30/2001
In [40]: var.coefs.major_xs(dt.datetime(2001, 11, 30)).T

NameErrorTraceback (most recent call last)
<ipython-input-40-18e5c39db92e> in <module>()
----> 1 var.coefs.major_xs(dt.datetime(2001, 11, 30)).T

NameError: name 'var' is not defined

Dynamic forecasts for a given number of steps ahead can be produced using the forecast function and return a pandas.DataMatrix object:

In [41]: var.forecast(2)

NameErrorTraceback (most recent call last)
<ipython-input-41-acfb44261041> in <module>()
----> 1 var.forecast(2)

NameError: name 'var' is not defined

The forecasts can be visualized using plot_forecast:

In [42]: var.plot_forecast(2)

NameErrorTraceback (most recent call last)
<ipython-input-42-41a0ae3429b9> in <module>()
----> 1 var.plot_forecast(2)

NameError: name 'var' is not defined
_images/dvar_forecast.png

Class Reference

var_model.VAR(endog[, exog, dates, freq, …]) Fit VAR(p) process and do lag order selection
var_model.VARProcess(coefs, coefs_exog, sigma_u) Class represents a known VAR(p) process
var_model.VARResults(endog, endog_lagged, …) Estimate VAR(p) process with fixed number of lags
irf.IRAnalysis(model[, P, periods, order, …]) Impulse response analysis class.
var_model.FEVD(model[, P, periods]) Compute and plot Forecast error variance decomposition and asymptotic standard errors
dynamic.DynamicVAR(data[, lag_order, …]) Estimates time-varying vector autoregression (VAR(p)) using equation-by-equation least squares