Vector Autoregressions tsa.vector_ar
¶
VAR(p) processes¶
We are interested in modeling a \(T \times K\) multivariate time series \(Y\), where \(T\) denotes the number of observations and \(K\) the number of variables. One way of estimating relationships between the time series and their lagged values is the vector autoregression process:
where \(A_i\) is a \(K \times K\) coefficient matrix.
We follow in large part the methods and notation of Lutkepohl (2005), which we will not develop here.
Model fitting¶
Note
The classes referenced below are accessible via the
statsmodels.tsa.api
module.
To estimate a VAR model, one must first create the model using an ndarray of homogeneous or structured dtype. When using a structured or record array, the class will use the passed variable names. Otherwise they can be passed explicitly:
# some example data
In [1]: import numpy as np
In [2]: import pandas
In [3]: import statsmodels.api as sm
ImportErrorTraceback (most recent call last)
<ipython-input-3-085740203b77> in <module>()
----> 1 import statsmodels.api as sm
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/api.py in <module>()
5 from . import regression
6 from .regression.linear_model import OLS, GLS, WLS, GLSAR
----> 7 from .regression.recursive_ls import RecursiveLS
8 from .regression.quantile_regression import QuantReg
9 from .regression.mixed_linear_model import MixedLM
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/regression/recursive_ls.py in <module>()
14 from statsmodels.regression.linear_model import OLS
15 from statsmodels.tools.data import _is_using_pandas
---> 16 from statsmodels.tsa.statespace.mlemodel import (
17 MLEModel, MLEResults, MLEResultsWrapper)
18 from statsmodels.tools.tools import Bunch
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/mlemodel.py in <module>()
16 from scipy.stats import norm
17
---> 18 from .simulation_smoother import SimulationSmoother
19 from .kalman_smoother import SmootherResults
20 from .kalman_filter import (INVERT_UNIVARIATE, SOLVE_LU)
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/simulation_smoother.py in <module>()
8
9 import numpy as np
---> 10 from .kalman_smoother import KalmanSmoother
11 from . import tools
12
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/kalman_smoother.py in <module>()
9 import numpy as np
10
---> 11 from statsmodels.tsa.statespace.representation import OptionWrapper
12 from statsmodels.tsa.statespace.kalman_filter import (KalmanFilter,
13 FilterResults)
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/representation.py in <module>()
8
9 import numpy as np
---> 10 from .tools import (
11 find_best_blas_type, validate_matrix_shape, validate_vector_shape
12 )
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/tools.py in <module>()
205 'z': _statespace.zcopy_index_vector
206 })
--> 207 set_mode(compatibility=None)
208
209
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/statespace/tools.py in set_mode(compatibility)
57 if not compatibility:
58 from scipy.linalg import cython_blas
---> 59 from . import (_representation, _kalman_filter, _kalman_smoother,
60 _simulation_smoother, _tools)
61 compatibility_mode = False
ImportError: cannot import name _representation
In [4]: from statsmodels.tsa.api import VAR, DynamicVAR
ImportErrorTraceback (most recent call last)
<ipython-input-4-1efb627027bb> in <module>()
----> 1 from statsmodels.tsa.api import VAR, DynamicVAR
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/api.py in <module>()
----> 1 from .ar_model import AR
2 from .arima_model import ARMA, ARIMA
3 from . import vector_ar as var
4 from .arima_process import arma_generate_sample, ArmaProcess
5 from .vector_ar.var_model import VAR
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/ar_model.py in <module>()
15 cache_readonly, cache_writable)
16 from statsmodels.tools.numdiff import approx_fprime, approx_hess
---> 17 from statsmodels.tsa.kalmanf.kalmanfilter import KalmanFilter
18 import statsmodels.base.wrapper as wrap
19 from statsmodels.tsa.vector_ar import util
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/kalmanf/__init__.py in <module>()
----> 1 from .kalmanfilter import KalmanFilter
/builddir/build/BUILD/statsmodels-0.9.0/statsmodels/tsa/kalmanf/kalmanfilter.py in <module>()
31 from numpy.linalg import inv, pinv
32 from statsmodels.tools.tools import chain_dot
---> 33 from . import kalman_loglike
34
35 #Fast filtering and smoothing for multivariate state space models
ImportError: cannot import name kalman_loglike
In [5]: mdata = sm.datasets.macrodata.load_pandas().data
NameErrorTraceback (most recent call last)
<ipython-input-5-67753caef3c7> in <module>()
----> 1 mdata = sm.datasets.macrodata.load_pandas().data
NameError: name 'sm' is not defined
# prepare the dates index
In [6]: dates = mdata[['year', 'quarter']].astype(int).astype(str)
NameErrorTraceback (most recent call last)
<ipython-input-6-a166f233d576> in <module>()
----> 1 dates = mdata[['year', 'quarter']].astype(int).astype(str)
NameError: name 'mdata' is not defined
In [7]: quarterly = dates["year"] + "Q" + dates["quarter"]
NameErrorTraceback (most recent call last)
<ipython-input-7-c012991f378a> in <module>()
----> 1 quarterly = dates["year"] + "Q" + dates["quarter"]
NameError: name 'dates' is not defined
In [8]: from statsmodels.tsa.base.datetools import dates_from_str
In [9]: quarterly = dates_from_str(quarterly)
NameErrorTraceback (most recent call last)
<ipython-input-9-7592f2cc6963> in <module>()
----> 1 quarterly = dates_from_str(quarterly)
NameError: name 'quarterly' is not defined
In [10]: mdata = mdata[['realgdp','realcons','realinv']]
NameErrorTraceback (most recent call last)
<ipython-input-10-3205dd0ed1fe> in <module>()
----> 1 mdata = mdata[['realgdp','realcons','realinv']]
NameError: name 'mdata' is not defined
In [11]: mdata.index = pandas.DatetimeIndex(quarterly)
NameErrorTraceback (most recent call last)
<ipython-input-11-c7b6b3bbf843> in <module>()
----> 1 mdata.index = pandas.DatetimeIndex(quarterly)
NameError: name 'quarterly' is not defined
In [12]: data = np.log(mdata).diff().dropna()
NameErrorTraceback (most recent call last)
<ipython-input-12-b9ff088fc4dd> in <module>()
----> 1 data = np.log(mdata).diff().dropna()
NameError: name 'mdata' is not defined
# make a VAR model
In [13]: model = VAR(data)
NameErrorTraceback (most recent call last)
<ipython-input-13-86149d72e5b4> in <module>()
----> 1 model = VAR(data)
NameError: name 'VAR' is not defined
Note
The VAR
class assumes that the passed time series are
stationary. Non-stationary or trending data can often be transformed to be
stationary by first-differencing or some other method. For direct analysis of
non-stationary time series, a standard stable VAR(p) model is not
appropriate.
To actually do the estimation, call the fit method with the desired lag order. Or you can have the model select a lag order based on a standard information criterion (see below):
In [14]: results = model.fit(2)
NameErrorTraceback (most recent call last)
<ipython-input-14-c45e3c7a54be> in <module>()
----> 1 results = model.fit(2)
NameError: name 'model' is not defined
In [15]: results.summary()
NameErrorTraceback (most recent call last)
<ipython-input-15-1da58c3af1a3> in <module>()
----> 1 results.summary()
NameError: name 'results' is not defined
Several ways to visualize the data using matplotlib are available.
Plotting input time series:
In [16]: results.plot()
NameErrorTraceback (most recent call last)
<ipython-input-16-93a479bf92ff> in <module>()
----> 1 results.plot()
NameError: name 'results' is not defined

Plotting time series autocorrelation function:
In [17]: results.plot_acorr()
NameErrorTraceback (most recent call last)
<ipython-input-17-dd7a1bd9f2f9> in <module>()
----> 1 results.plot_acorr()
NameError: name 'results' is not defined

Lag order selection¶
Choice of lag order can be a difficult problem. Standard analysis employs
likelihood test or information criteria-based order selection. We have
implemented the latter, accessible through the VAR
class:
In [18]: model.select_order(15)
NameErrorTraceback (most recent call last)
<ipython-input-18-6f1cc5879977> in <module>()
----> 1 model.select_order(15)
NameError: name 'model' is not defined
When calling the fit function, one can pass a maximum number of lags and the order criterion to use for order selection:
In [19]: results = model.fit(maxlags=15, ic='aic')
NameErrorTraceback (most recent call last)
<ipython-input-19-4b6b93f0aafb> in <module>()
----> 1 results = model.fit(maxlags=15, ic='aic')
NameError: name 'model' is not defined
Forecasting¶
The linear predictor is the optimal h-step ahead forecast in terms of mean-squared error:
We can use the forecast function to produce this forecast. Note that we have to specify the “initial value” for the forecast:
In [20]: lag_order = results.k_ar
NameErrorTraceback (most recent call last)
<ipython-input-20-a11c944a667e> in <module>()
----> 1 lag_order = results.k_ar
NameError: name 'results' is not defined
In [21]: results.forecast(data.values[-lag_order:], 5)
NameErrorTraceback (most recent call last)
<ipython-input-21-f83e64f1c23c> in <module>()
----> 1 results.forecast(data.values[-lag_order:], 5)
NameError: name 'results' is not defined
The forecast_interval function will produce the above forecast along with asymptotic standard errors. These can be visualized using the plot_forecast function:
In [22]: results.plot_forecast(10)
NameErrorTraceback (most recent call last)
<ipython-input-22-7a48b6406ef8> in <module>()
----> 1 results.plot_forecast(10)
NameError: name 'results' is not defined

Impulse Response Analysis¶
Impulse responses are of interest in econometric studies: they are the estimated responses to a unit impulse in one of the variables. They are computed in practice using the MA(\(\infty\)) representation of the VAR(p) process:
We can perform an impulse response analysis by calling the irf function on a VARResults object:
In [23]: irf = results.irf(10)
NameErrorTraceback (most recent call last)
<ipython-input-23-1e11bcc52a5e> in <module>()
----> 1 irf = results.irf(10)
NameError: name 'results' is not defined
These can be visualized using the plot function, in either orthogonalized or non-orthogonalized form. Asymptotic standard errors are plotted by default at the 95% significance level, which can be modified by the user.
Note
Orthogonalization is done using the Cholesky decomposition of the estimated error covariance matrix \(\hat \Sigma_u\) and hence interpretations may change depending on variable ordering.
In [24]: irf.plot(orth=False)
NameErrorTraceback (most recent call last)
<ipython-input-24-713e3eba5933> in <module>()
----> 1 irf.plot(orth=False)
NameError: name 'irf' is not defined

Note the plot function is flexible and can plot only variables of interest if so desired:
In [25]: irf.plot(impulse='realgdp')
NameErrorTraceback (most recent call last)
<ipython-input-25-e41dfe9e23e0> in <module>()
----> 1 irf.plot(impulse='realgdp')
NameError: name 'irf' is not defined

The cumulative effects \(\Psi_n = \sum_{i=0}^n \Phi_i\) can be plotted with the long run effects as follows:
In [26]: irf.plot_cum_effects(orth=False)
NameErrorTraceback (most recent call last)
<ipython-input-26-bb40f89ba62b> in <module>()
----> 1 irf.plot_cum_effects(orth=False)
NameError: name 'irf' is not defined

Forecast Error Variance Decomposition (FEVD)¶
Forecast errors of component j on k in an i-step ahead forecast can be decomposed using the orthogonalized impulse responses \(\Theta_i\):
These are computed via the fevd function up through a total number of steps ahead:
In [27]: fevd = results.fevd(5)
NameErrorTraceback (most recent call last)
<ipython-input-27-6e6bb53ef95d> in <module>()
----> 1 fevd = results.fevd(5)
NameError: name 'results' is not defined
In [28]: fevd.summary()
NameErrorTraceback (most recent call last)
<ipython-input-28-46a76c3e73dd> in <module>()
----> 1 fevd.summary()
NameError: name 'fevd' is not defined
They can also be visualized through the returned FEVD
object:
In [29]: results.fevd(20).plot()
NameErrorTraceback (most recent call last)
<ipython-input-29-f2dfd44ce8e5> in <module>()
----> 1 results.fevd(20).plot()
NameError: name 'results' is not defined

Statistical tests¶
A number of different methods are provided to carry out hypothesis tests about the model results and also the validity of the model assumptions (normality, whiteness / “iid-ness” of errors, etc.).
Granger causality¶
One is often interested in whether a variable or group of variables is “causal”
for another variable, for some definition of “causal”. In the context of VAR
models, one can say that a set of variables are Granger-causal within one of the
VAR equations. We will not detail the mathematics or definition of Granger
causality, but leave it to the reader. The VARResults
object has the
test_causality method for performing either a Wald (\(\chi^2\)) test or an
F-test.
In [30]: results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')
NameErrorTraceback (most recent call last)
<ipython-input-30-40c8192e32ca> in <module>()
----> 1 results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')
NameError: name 'results' is not defined
Normality¶
Whiteness of residuals¶
Dynamic Vector Autoregressions¶
Note
To use this functionality, pandas must be installed. See the pandas documentation for more information on the below data structures.
One is often interested in estimating a moving-window regression on time series data for the purposes of making forecasts throughout the data sample. For example, we may wish to produce the series of 2-step-ahead forecasts produced by a VAR(p) model estimated at each point in time.
In [31]: np.random.seed(1)
In [32]: import pandas.util.testing as ptest
In [33]: ptest.N = 500
In [34]: data = ptest.makeTimeDataFrame().cumsum(0)
In [35]: data
Out[35]:
A B C D
2000-01-03 1.624345 -1.719394 -0.153236 1.301225
2000-01-04 1.012589 -1.662273 -2.585745 0.988833
2000-01-05 0.484417 -2.461821 -2.077760 0.717604
2000-01-06 -0.588551 -2.753416 -2.401793 2.580517
2000-01-07 0.276856 -3.012398 -3.912869 1.937644
... ... ... ... ...
2001-11-26 29.552085 14.274036 39.222558 -13.243907
2001-11-27 30.080964 11.996738 38.589968 -12.682989
2001-11-28 27.843878 11.927114 38.380121 -13.604648
2001-11-29 26.736165 12.280984 40.277282 -12.957273
2001-11-30 26.718447 12.094029 38.895890 -11.570447
[500 rows x 4 columns]
In [36]: var = DynamicVAR(data, lag_order=2, window_type='expanding')
NameErrorTraceback (most recent call last)
<ipython-input-36-0fe4ff8cb1b9> in <module>()
----> 1 var = DynamicVAR(data, lag_order=2, window_type='expanding')
NameError: name 'DynamicVAR' is not defined
The estimated coefficients for the dynamic model are returned as a
pandas.Panel
object, which can allow you to easily examine, for
example, all of the model coefficients by equation or by date:
In [37]: import datetime as dt
In [38]: var.coefs
NameErrorTraceback (most recent call last)
<ipython-input-38-8a75285c8630> in <module>()
----> 1 var.coefs
NameError: name 'var' is not defined
# all estimated coefficients for equation A
In [39]: var.coefs.minor_xs('A').info()
NameErrorTraceback (most recent call last)
<ipython-input-39-1c14d1933400> in <module>()
----> 1 var.coefs.minor_xs('A').info()
NameError: name 'var' is not defined
# coefficients on 11/30/2001
In [40]: var.coefs.major_xs(dt.datetime(2001, 11, 30)).T
NameErrorTraceback (most recent call last)
<ipython-input-40-18e5c39db92e> in <module>()
----> 1 var.coefs.major_xs(dt.datetime(2001, 11, 30)).T
NameError: name 'var' is not defined
Dynamic forecasts for a given number of steps ahead can be produced using the
forecast function and return a pandas.DataMatrix
object:
In [41]: var.forecast(2)
NameErrorTraceback (most recent call last)
<ipython-input-41-acfb44261041> in <module>()
----> 1 var.forecast(2)
NameError: name 'var' is not defined
The forecasts can be visualized using plot_forecast:
In [42]: var.plot_forecast(2)
NameErrorTraceback (most recent call last)
<ipython-input-42-41a0ae3429b9> in <module>()
----> 1 var.plot_forecast(2)
NameError: name 'var' is not defined

Class Reference¶
var_model.VAR (endog[, exog, dates, freq, …]) |
Fit VAR(p) process and do lag order selection |
var_model.VARProcess (coefs, coefs_exog, sigma_u) |
Class represents a known VAR(p) process |
var_model.VARResults (endog, endog_lagged, …) |
Estimate VAR(p) process with fixed number of lags |
irf.IRAnalysis (model[, P, periods, order, …]) |
Impulse response analysis class. |
var_model.FEVD (model[, P, periods]) |
Compute and plot Forecast error variance decomposition and asymptotic standard errors |
dynamic.DynamicVAR (data[, lag_order, …]) |
Estimates time-varying vector autoregression (VAR(p)) using equation-by-equation least squares |