Generalized Linear Models

Generalized linear models currently support estimation using the one-parameter exponential families.

See Module Reference for commands and arguments.

Examples

# Load modules and data
In [1]: import statsmodels.api as sm

In [2]: data = sm.datasets.scotland.load()

In [3]: data.exog = sm.add_constant(data.exog)

# Instantiate a gamma family model with the default link function.
In [4]: gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())

In [5]: gamma_results = gamma_model.fit()

In [6]: print(gamma_results.summary())

Detailed examples can be found here:

Technical Documentation

The statistical model for each observation \(i\) is assumed to be

\(Y_i \sim F_{EDM}(\cdot|\theta,\phi,w_i)\) and \(\mu_i = E[Y_i|x_i] = g^{-1}(x_i^\prime\beta)\),

where \(g\) is the link function and \(F_{EDM}(\cdot|\theta,\phi,w)\) is a distribution of the family of exponential dispersion models (EDM) with natural parameter \(\theta\), scale parameter \(\phi\) and weight \(w\). Its density is given by

\(f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w) \exp\left(\frac{y\theta-b(\theta)}{\phi}w\right)\,.\)

It follows that \(\mu = b'(\theta)\) and \(Var[Y|x]=\frac{\phi}{w}b''(\theta)\). The inverse of the first equation gives the natural parameter as a function of the expected value \(\theta(\mu)\) such that

\(Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i)\)

with \(v(\mu) = b''(\theta(\mu))\). A GLM is therefore said to be determined by the link function \(g\) and the variance function \(v(\mu)\) alone (and \(x\), of course).
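The identities \(\mu = b'(\theta)\) and \(v(\mu) = b''(\theta(\mu))\) can be checked numerically. The following is a minimal sketch using only the standard library; the helper names (`derivative`, `second_derivative`, `check_family`) are illustrative assumptions and not part of statsmodels, and the three families shown use the \(\theta(\mu)\), \(b(\theta)\) and \(v(\mu)\) entries from the table in this section.

```python
import math


def derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


# Each entry holds (theta(mu), b(theta), v(mu)) for one family.
FAMILIES = {
    "Poisson": (
        lambda mu: math.log(mu),
        lambda t: math.exp(t),
        lambda mu: mu,
    ),
    "Gamma": (
        lambda mu: -1.0 / mu,
        lambda t: -math.log(-t),
        lambda mu: mu ** 2,
    ),
    "Inv. Gauss.": (
        lambda mu: -1.0 / (2 * mu ** 2),
        lambda t: -math.sqrt(-2 * t),
        lambda mu: mu ** 3,
    ),
}


def check_family(name, mu):
    """Return (b'(theta), b''(theta), v(mu)) evaluated at theta = theta(mu)."""
    theta_of_mu, b, v = FAMILIES[name]
    theta = theta_of_mu(mu)
    return derivative(b, theta), second_derivative(b, theta), v(mu)


for family_name in FAMILIES:
    b_prime, b_double_prime, variance = check_family(family_name, 2.0)
    # b'(theta) should reproduce mu, and b''(theta) should reproduce v(mu).
    print(family_name, b_prime, b_double_prime, variance)
```

Up to finite-difference error, the first value in each printed row should agree with the chosen \(\mu\) and the second with the third.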

Note that while \(\phi\) is the same for every observation \(y_i\) and therefore does not influence the estimation of \(\beta\), the weights \(w_i\) might be different for every \(y_i\) such that the estimation of \(\beta\) depends on them.
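To illustrate this point, consider an intercept-only Poisson model with the canonical log link, so that \(\theta_i = \beta\) and \(\mu_i = e^\beta\). Differentiating the log of the density above with respect to \(\beta\) gives the score \(\sum_i \frac{w_i}{\phi}(y_i - \mu_i)\). The sketch below solves this score equation by bisection; the data, the weights and the helper names (`score`, `solve_score`) are made up for illustration and are not statsmodels functions.

```python
import math


def score(beta, y, w, phi):
    """Score of an intercept-only model with log link: sum_i w_i (y_i - mu) / phi."""
    mu = math.exp(beta)
    return sum(w_i * (y_i - mu) for y_i, w_i in zip(y, w)) / phi


def solve_score(y, w, phi, lo=-10.0, hi=10.0, tol=1e-12):
    """Find the root of the score by bisection (the score is decreasing in beta)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, y, w, phi) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y = [1.0, 3.0, 8.0]
equal_weights = [1.0, 1.0, 1.0]
unequal_weights = [1.0, 1.0, 4.0]

beta_phi_1 = solve_score(y, equal_weights, phi=1.0)
beta_phi_5 = solve_score(y, equal_weights, phi=5.0)
beta_weighted = solve_score(y, unequal_weights, phi=1.0)

# Changing phi rescales the score but leaves its root unchanged;
# changing the weights moves the root.
print(math.exp(beta_phi_1), math.exp(beta_phi_5), math.exp(beta_weighted))
```

Here the fitted mean is the weighted average \(\sum_i w_i y_i / \sum_i w_i\), which does not involve \(\phi\) but does change with the weights.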

| Distribution | Domain | \(\mu=E[Y \mid x]\) | \(v(\mu)\) | \(\theta(\mu)\) | \(b(\theta)\) | \(\phi\) |
|---|---|---|---|---|---|---|
| Binomial \(B(n,p)\) | \(0,1,\ldots,n\) | \(np\) | \(\mu-\frac{\mu^2}{n}\) | \(\log\frac{p}{1-p}\) | \(n\log(1+e^\theta)\) | 1 |
| Poisson \(P(\mu)\) | \(0,1,\ldots,\infty\) | \(\mu\) | \(\mu\) | \(\log(\mu)\) | \(e^\theta\) | 1 |
| Neg. Binom. \(NB(\mu,\alpha)\) | \(0,1,\ldots,\infty\) | \(\mu\) | \(\mu+\alpha\mu^2\) | \(\log(\frac{\alpha\mu}{1+\alpha\mu})\) | \(-\frac{1}{\alpha}\log(1-e^\theta)\) | 1 |
| Gaussian/Normal \(N(\mu,\sigma^2)\) | \((-\infty,\infty)\) | \(\mu\) | \(1\) | \(\mu\) | \(\frac{1}{2}\theta^2\) | \(\sigma^2\) |
| Gamma \(G(\mu,\nu)\) | \((0,\infty)\) | \(\mu\) | \(\mu^2\) | \(-\frac{1}{\mu}\) | \(-\log(-\theta)\) | \(\frac{1}{\nu}\) |
| Inv. Gauss. \(IG(\mu,\sigma^2)\) | \((0,\infty)\) | \(\mu\) | \(\mu^3\) | \(-\frac{1}{2\mu^2}\) | \(-\sqrt{-2\theta}\) | \(\sigma^2\) |
| Tweedie \(p\geq 1\) | depends on \(p\) | \(\mu\) | \(\mu^p\) | \(\frac{\mu^{1-p}}{1-p}\) | \(\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}\) | \(\phi\) |

The Tweedie distribution has special cases for \(p=0,1,2\) not listed in the table and uses \(\alpha=\frac{p-2}{p-1}\).
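The Tweedie entries can be evaluated directly from \(p\). The sketch below computes \(\alpha\), \(\theta(\mu)\) and \(b(\theta)\) as written in the table and compares \(b(\theta(\mu))\) with the equivalent closed form \(\frac{\mu^{2-p}}{2-p}\), which follows by substituting \(\alpha=\frac{p-2}{p-1}\). The function names are illustrative assumptions, not statsmodels functions; \(p=1\) and \(p=2\) are rejected because the general formulas are undefined there.

```python
def tweedie_alpha(p):
    """alpha = (p - 2) / (p - 1); undefined for p = 1, and b(theta) is undefined for p = 2."""
    if p in (1, 2):
        raise ValueError("p = 1 and p = 2 are special cases not covered by the general formula")
    return (p - 2.0) / (p - 1.0)


def tweedie_theta(mu, p):
    """Natural parameter theta(mu) = mu^(1 - p) / (1 - p)."""
    return mu ** (1.0 - p) / (1.0 - p)


def tweedie_b(theta, p):
    """Cumulant function b(theta) = (alpha - 1) / alpha * (theta / (alpha - 1))^alpha."""
    alpha = tweedie_alpha(p)
    return (alpha - 1.0) / alpha * (theta / (alpha - 1.0)) ** alpha


mu, p = 4.0, 1.5
theta = tweedie_theta(mu, p)
b_from_table = tweedie_b(theta, p)
b_closed_form = mu ** (2.0 - p) / (2.0 - p)
print(theta, b_from_table, b_closed_form)
```

With \(p=3\) the same functions give \(\theta = -\frac{1}{2\mu^2}\) and \(b(\theta) = -\sqrt{-2\theta}\), matching the inverse Gaussian row of the table.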

Correspondence of mathematical variables to code:

  • \(Y\) and \(y\) are coded as endog, the variable one wants to model
  • \(x\) is coded as exog, the covariates (also called explanatory variables)
  • \(\beta\) is coded as params, the parameters one wants to estimate
  • \(\mu\) is coded as mu, the expectation (conditional on \(x\)) of \(Y\)
  • \(g\) is coded as the link argument to the class Family
  • \(\phi\) is coded as scale, the dispersion parameter of the EDM
  • \(w\) is not yet supported (i.e. \(w=1\)); in the future it might be var_weights
  • \(p\) is coded as var_power for the power of the variance function \(v(\mu)\) of the Tweedie distribution, see table
  • \(\alpha\) is either
    • Negative Binomial: the ancillary parameter alpha, see table
    • Tweedie: an abbreviation for \(\frac{p-2}{p-1}\) of the power \(p\) of the variance function, see table

References

  • Gill, Jeff. 2000. Generalized Linear Models: A Unified Approach. SAGE QASS Series.
  • Green, P.J. 1984. “Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives.” Journal of the Royal Statistical Society, Series B, 46, 149-192.
  • Hardin, J.W. and Hilbe, J.M. 2007. Generalized Linear Models and Extensions. 2nd ed. Stata Press, College Station, TX.
  • McCullagh, P. and Nelder, J.A. 1989. Generalized Linear Models. 2nd ed. Chapman & Hall, Boca Raton.

Module Reference

Model Class

GLM(endog, exog[, family, offset, exposure, …]) Generalized Linear Models class

Results Class

GLMResults(model, params, …[, cov_type, …]) Class to contain GLM results.
PredictionResults(predicted_mean, var_pred_mean)

Families

The distribution families currently implemented are

Family(link, variance) The parent class for one-parameter exponential families.
Binomial([link]) Binomial exponential family distribution.
Gamma([link]) Gamma exponential family distribution.
Gaussian([link]) Gaussian exponential family distribution.
InverseGaussian([link]) InverseGaussian exponential family.
NegativeBinomial([link, alpha]) Negative Binomial exponential family.
Poisson([link]) Poisson exponential family.
Tweedie([link, var_power]) Tweedie family.