Linear Regression

This module provides linear models for errors that are independently and identically distributed, as well as for errors with heteroskedasticity or autocorrelation. It supports estimation by ordinary least squares (OLS), weighted least squares (WLS), generalized least squares (GLS), and feasible generalized least squares with autocorrelated AR(p) errors.

See Module Reference for commands and arguments.

Examples

# Load modules and data
In [1]: import numpy as np

In [2]: import statsmodels.api as sm

In [3]: spector_data = sm.datasets.spector.load()

In [4]: spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)

# Fit and summarize OLS model
In [5]: mod = sm.OLS(spector_data.endog, spector_data.exog)

In [6]: res = mod.fit()

In [7]: print(res.summary())

Technical Documentation

The statistical model is assumed to be

\(Y = X\beta + \mu\), where \(\mu\sim N\left(0,\Sigma\right).\)

Depending on the properties of \(\Sigma\), four classes are currently available:

  • GLS : generalized least squares for arbitrary covariance \(\Sigma\)
  • OLS : ordinary least squares for i.i.d. errors \(\Sigma=\textbf{I}\)
  • WLS : weighted least squares for heteroskedastic errors \(\text{diag}\left (\Sigma\right)\)
  • GLSAR : feasible generalized least squares with autocorrelated AR(p) errors \(\Sigma=\Sigma\left(\rho\right)\)
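The relationship between these classes can be illustrated with a small NumPy sketch. It does not use statsmodels; the helper `gls_fit` and the simulated data are assumptions made for illustration only. The sketch checks that the GLS formula reduces to ordinary least squares when \(\Sigma=\textbf{I}\), and to least squares on rescaled rows when \(\Sigma\) is diagonal.

```python
import numpy as np

def gls_fit(y, X, sigma):
    """Hypothetical helper: GLS estimate (X' S^-1 X)^-1 X' S^-1 y for error covariance S."""
    sigma_inv = np.linalg.inv(sigma)
    xt_si = X.T @ sigma_inv
    return np.linalg.solve(xt_si @ X, xt_si @ y)

# Simulated data (illustrative only): one regressor plus a constant,
# with a different error variance for every observation.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([rng.normal(size=n), np.ones(n)])
variances = rng.uniform(0.5, 2.0, size=n)
y = X @ np.array([2.0, -1.0]) + rng.normal(size=n) * np.sqrt(variances)

# OLS case: Sigma = I, so GLS coincides with ordinary least squares.
beta_ols = gls_fit(y, X, np.eye(n))
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# WLS case: Sigma is diagonal, so GLS coincides with least squares
# on rows scaled by 1 / sqrt(variance).
beta_wls = gls_fit(y, X, np.diag(variances))
w = 1.0 / np.sqrt(variances)
beta_scaled = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

print(np.allclose(beta_ols, beta_lstsq), np.allclose(beta_wls, beta_scaled))
```

Inverting \(\Sigma\) explicitly keeps the sketch close to the formula; for large n a diagonal \(\Sigma\) would normally be handled through weights rather than an n x n matrix.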

All regression models define the same methods and follow the same structure, and can be used in a similar fashion. Some of them contain additional model specific methods and attributes.

GLS is the superclass of the other regression classes except for RecursiveLS.
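
The whitening transformation behind these classes can be written out explicitly. As a sketch, take any \(\Psi\) with \(\Psi\Psi^{T}=\Sigma^{-1}\) (the same \(\Psi\) that appears in the attribute descriptions) and premultiply the model by \(\Psi^{T}\):

\[\Psi^{T}Y = \Psi^{T}X\beta + \Psi^{T}\mu, \qquad \text{Cov}\left(\Psi^{T}\mu\right) = \Psi^{T}\Sigma\Psi = \textbf{I}.\]

Ordinary least squares on the whitened data then gives the GLS estimator, written here as \(\hat{\beta}\) (notation introduced for this sketch):

\[\hat{\beta} = \left(X^{T}\Psi\Psi^{T}X\right)^{-1}X^{T}\Psi\Psi^{T}Y = \left(X^{T}\Sigma^{-1}X\right)^{-1}X^{T}\Sigma^{-1}Y.\]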

References

General reference for regression models:

  • D.C. Montgomery and E.A. Peck. “Introduction to Linear Regression Analysis.” 2nd. Ed., Wiley, 1992.

Econometrics references for regression models:

  • R. Davidson and J.G. MacKinnon. “Econometric Theory and Methods,” Oxford, 2004.
  • W. Greene. “Econometric Analysis,” 5th ed., Pearson, 2003.

Attributes

The following is a more detailed description of the attributes, most of which are common to all regression classes.

pinv_wexog : array
The p x n Moore-Penrose pseudoinverse of the whitened design matrix. It is approximately equal to \(\left(X^{T}\Sigma^{-1}X\right)^{-1}X^{T}\Psi\), where \(\Psi\) is defined such that \(\Psi\Psi^{T}=\Sigma^{-1}\).
cholsigmainv : array
The n x n upper triangular matrix \(\Psi^{T}\) that satisfies \(\Psi\Psi^{T}=\Sigma^{-1}\).
df_model : float
The model degrees of freedom. This is equal to p - 1, where p is the number of regressors. Note that the intercept is not counted as using a degree of freedom here.
df_resid : float
The residual degrees of freedom. This is equal to n - p, where n is the number of observations and p is the number of parameters. Note that the intercept is counted as using a degree of freedom here.
llf : float
The value of the log-likelihood function of the fitted model.
nobs : float
The number of observations n.
normalized_cov_params : array
A p x p array equal to \((X^{T}\Sigma^{-1}X)^{-1}\).
sigma : array
The n x n covariance matrix of the error terms: \(\mu\sim N\left(0,\Sigma\right)\).
wexog : array
The whitened design matrix \(\Psi^{T}X\).
wendog : array
The whitened response variable \(\Psi^{T}Y\).
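
The whitening-related attributes above can be reproduced by hand. The following is a minimal NumPy sketch under stated assumptions: the data are simulated, the covariance matrix is a hypothetical AR(1)-style example, and the variable names merely mirror the attribute names. It is not how statsmodels computes them internally.

```python
import numpy as np

# Simulated data (illustrative only): p = 3 parameters including the constant.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([rng.normal(size=(n, p - 1)), np.ones(n)])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

# Hypothetical error covariance with entries 0.5 ** |i - j|.
idx = np.arange(n)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
sigma_inv = np.linalg.inv(sigma)

# np.linalg.cholesky returns a lower-triangular L with L @ L.T == sigma_inv,
# so Psi = L and Psi^T = L.T is upper triangular.
psi = np.linalg.cholesky(sigma_inv)
cholsigmainv = psi.T

wexog = cholsigmainv @ X      # whitened design matrix, Psi^T X
wendog = cholsigmainv @ y     # whitened response, Psi^T Y

pinv_wexog = np.linalg.pinv(wexog)                  # p x n pseudoinverse
normalized_cov_params = pinv_wexog @ pinv_wexog.T   # p x p, (X^T Sigma^-1 X)^-1
params = pinv_wexog @ wendog                        # GLS coefficient estimates

df_model = p - 1   # intercept not counted
df_resid = n - p   # intercept counted
```

Because the whitened design matrix has full column rank here, its pseudoinverse equals \(\left(X^{T}\Sigma^{-1}X\right)^{-1}X^{T}\Psi\), which is why the product of `pinv_wexog` with its transpose recovers \((X^{T}\Sigma^{-1}X)^{-1}\).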

Module Reference

Model Classes

OLS(endog[, exog, missing, hasconst]) A simple ordinary least squares model.
GLS(endog, exog[, sigma, missing, hasconst]) Generalized least squares model with a general covariance structure.
WLS(endog, exog[, weights, missing, hasconst]) A regression model with diagonal but non-identity covariance structure.
GLSAR(endog[, exog, rho, missing]) A regression model with an AR(p) covariance structure.
yule_walker(X[, order, method, df, inv, demean]) Estimate AR(p) parameters from a sequence X using the Yule-Walker equations.
QuantReg(endog, exog, **kwargs) Quantile Regression
RecursiveLS
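
To illustrate the idea behind yule_walker, the following NumPy sketch solves the Yule-Walker equations directly. The function `yule_walker_sketch` is a hypothetical stand-in, not the statsmodels implementation: it always demeans the series and uses the biased autocovariance estimate (dividing by n), which may differ from the library's defaults.

```python
import numpy as np

def yule_walker_sketch(x, order=1):
    """Hypothetical helper: estimate AR(order) coefficients from the Yule-Walker equations."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariances r[0], ..., r[order] (divide by n).
    r = np.array([x[: n - k] @ x[k:] / n for k in range(order + 1)])
    # Toeplitz system R @ rho = r[1:], with R[i, j] = r[|i - j|].
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[lags]
    rho = np.linalg.solve(R, r[1:])
    # Innovation standard deviation implied by the fitted coefficients.
    sigma = np.sqrt(r[0] - rho @ r[1:])
    return rho, sigma

# Simulate an AR(2) series with known coefficients and unit-variance
# innovations, then recover the coefficients.
rng = np.random.default_rng(2)
true_rho = np.array([0.6, -0.3])
n = 20000
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(2, n):
    x[t] = true_rho[0] * x[t - 1] + true_rho[1] * x[t - 2] + eps[t]

rho_hat, sigma_hat = yule_walker_sketch(x, order=2)
```

With a long simulated series the estimates should land close to the true coefficients; on short series the choice between biased and unbiased autocovariances matters more.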

Results Classes

Fitting a linear regression model returns a results class. OLS has a specific results class with some additional methods compared to the results class of the other linear models.

RegressionResults(model, params[, …]) This class summarizes the fit of a linear regression model.
OLSResults(model, params[, …]) Results class for an OLS model.
PredictionResults(predicted_mean, …[, df, …])
QuantRegResults(model, params[, …]) Results instance for the QuantReg model
RecursiveLSResults