Generalized Estimating Equations

Generalized Estimating Equations (GEE) estimate generalized linear models for panel, cluster, or repeated-measures data in which observations may be correlated within a cluster but are uncorrelated across clusters. GEE supports the same one-parameter exponential families as Generalized Linear Models (GLM).
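For reference, the estimating equations of Liang and Zeger (1986) can be written as follows (notation ours). Cluster i = 1, ..., K has response vector y_i with marginal mean μ_i(β) = g^{-1}(X_i β). A_i is the diagonal matrix of the family's variance function v(μ), R_i(α) is the working correlation matrix implied by the chosen dependence structure, and φ is the scale parameter. The second line is the robust (sandwich) covariance of the estimate, which remains consistent even when the working correlation is misspecified.

```latex
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2},
\qquad
A_i = \operatorname{diag}\bigl( v(\mu_{i1}), \dots, v(\mu_{i n_i}) \bigr)

\widehat{\operatorname{Cov}}(\hat\beta) = B^{-1} M B^{-1},
\qquad
B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} (y_i - \mu_i)(y_i - \mu_i)^{\top} V_i^{-1} D_i
```

B^{-1} on its own is the model-based ("naive") covariance, which is only valid if the working correlation is correctly specified.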

See Module Reference for commands and arguments.

Examples

The following example fits a Poisson regression with an exchangeable correlation structure within clusters, using data on epileptic seizures.

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset('epil', package='MASS').data

In [4]: fam = sm.families.Poisson()

In [5]: ind = sm.cov_struct.Exchangeable()

In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
   ...:               cov_struct=ind, family=fam)
   ...: 

In [7]: res = mod.fit()

In [8]: print(res.summary())

Several notebook examples of GEE usage can be found on the Wiki: Wiki notebooks for GEE.

References

  • KY Liang and S Zeger (1986). “Longitudinal data analysis using generalized linear models”. Biometrika, 73(1), 13-22.
  • S Zeger and KY Liang (1986). “Longitudinal data analysis for discrete and continuous outcomes”. Biometrics, 42(1), 121-130.
  • A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”. Biometrika, 77, 485-497.
  • X Guo and W Pan (2002). “Small sample performance of the score test in GEE”. http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf
  • LA Mancl and TA DeRouen (2001). “A covariance estimator for GEE with improved small-sample properties”. Biometrics, 57(1), 126-134.

Module Reference

Model Class

GEE(endog, exog, groups[, time, family, …]) Estimation of marginal regression models using Generalized Estimating Equations (GEE).

Results Classes

GEEResults(model, params, cov_params, scale) This class summarizes the fit of a marginal regression model using GEE.
GEEMargins(results, args[, kwargs]) Estimated marginal effects for a regression model fit with GEE.
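To make the quantities summarized by GEEResults (params, cov_params, scale) concrete, here is a from-scratch NumPy sketch for the Poisson, log-link, exchangeable case, following the Liang and Zeger estimating equations. It is not the statsmodels implementation: poisson_gee and its helpers are hypothetical names, the scale and correlation use simple moment estimators from Pearson residuals, every cluster is assumed to have at least two observations, and the correlation is clipped to [0, 0.99] to keep the working correlation positive definite.

```python
import numpy as np


def _exchangeable(n, rho):
    """Exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    R = np.full((n, n), rho)
    np.fill_diagonal(R, 1.0)
    return R


def _accumulate(y, X, clusters, beta, rho, scale):
    """Sum D'V^-1 D (bread), D'V^-1 (y - mu) (score) and the sandwich meat."""
    p = X.shape[1]
    bread, score, meat = np.zeros((p, p)), np.zeros(p), np.zeros((p, p))
    for idx in clusters:
        mu = np.exp(X[idx] @ beta)        # log link
        D = mu[:, None] * X[idx]          # d mu / d beta
        sd = np.sqrt(mu)                  # Poisson variance function v(mu) = mu
        V = scale * sd[:, None] * _exchangeable(len(idx), rho) * sd[None, :]
        VinvD = np.linalg.solve(V, D)
        s = VinvD.T @ (y[idx] - mu)       # this cluster's score contribution
        bread += D.T @ VinvD
        score += s
        meat += np.outer(s, s)
    return bread, score, meat


def poisson_gee(y, X, groups, maxiter=50, tol=1e-8):
    """Hypothetical helper: Poisson GEE with exchangeable working correlation."""
    clusters = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    beta, rho, scale = np.zeros(X.shape[1]), 0.0, 1.0
    for _ in range(maxiter):
        bread, score, _ = _accumulate(y, X, clusters, beta, rho, scale)
        step = np.linalg.solve(bread, score)   # Fisher-scoring update
        beta = beta + step
        # Moment estimates of scale and rho from Pearson residuals.
        mu = np.exp(X @ beta)
        r = (y - mu) / np.sqrt(mu)
        scale = np.sum(r ** 2) / (len(y) - X.shape[1])
        cross = sum((r[i].sum() ** 2 - np.sum(r[i] ** 2)) / 2 for i in clusters)
        npairs = sum(len(i) * (len(i) - 1) / 2 for i in clusters)
        rho = float(np.clip(cross / (npairs * scale), 0.0, 0.99))
        if np.max(np.abs(step)) < tol:
            break
    bread, _, meat = _accumulate(y, X, clusters, beta, rho, scale)
    cov_naive = np.linalg.inv(bread)           # model-based covariance
    cov_robust = cov_naive @ meat @ cov_naive  # sandwich covariance
    return beta, cov_robust, cov_naive, scale, rho


# Usage on simulated clustered counts (random intercept is an assumption).
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(300), 4)
x = rng.normal(size=groups.size)
u = np.repeat(rng.normal(scale=0.5, size=300), 4)
y = rng.poisson(np.exp(0.3 + 0.5 * x + u))
X = np.column_stack([np.ones_like(x), x])
params, cov_robust, cov_naive, scale, rho = poisson_gee(y, X, groups)
print(params, np.sqrt(np.diag(cov_robust)), scale, rho)
```

Alternating between a scoring step for the regression parameters and a moment update for the dependence parameters mirrors the structure of the original GEE algorithm; production code such as statsmodels adds convergence checks, degrees-of-freedom corrections, and support for every family and dependence structure.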

Dependence Structures

The dependence structures currently implemented are:

CovStruct([cov_nearest_method]) A base class for correlation and covariance structures of grouped data.
Autoregressive([dist_func]) A first-order autoregressive working dependence structure.
Exchangeable() An exchangeable working dependence structure.
GlobalOddsRatio(endog_type) Estimate the global odds ratio for a GEE with ordinal or nominal data.
Independence([cov_nearest_method]) An independence working dependence structure.
Nested([cov_nearest_method]) A nested working dependence structure.
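The working correlation matrices behind three of these structures can be sketched in a few lines of NumPy. The function names are hypothetical and only mimic the textbook forms; in particular, the autoregressive distance |t_j - t_k| between time values is an assumption for this sketch (the Autoregressive class accepts a dist_func argument to change how distance is computed). Nested is omitted.

```python
import numpy as np


def independence(n):
    """Independence: observations in a cluster are treated as uncorrelated."""
    return np.eye(n)


def exchangeable(n, rho):
    """Exchangeable: every pair in the cluster shares the same correlation rho.

    Positive definite only for -1/(n - 1) < rho < 1, since its eigenvalues
    are 1 + (n - 1) * rho (once) and 1 - rho (n - 1 times).
    """
    R = np.full((n, n), float(rho))
    np.fill_diagonal(R, 1.0)
    return R


def autoregressive(times, rho):
    """First-order autoregressive: corr(j, k) = rho ** |t_j - t_k|."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])


print(exchangeable(3, 0.5))
print(autoregressive([0, 1, 2], 0.5))
```

Exchangeable uses a single parameter regardless of cluster size, while the autoregressive form lets correlation decay with the distance between time points.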

Families

The distribution families are the same as for GLM; those currently implemented are:

Family(link, variance) The parent class for one-parameter exponential families.
Binomial([link]) Binomial exponential family distribution.
Gamma([link]) Gamma exponential family distribution.
Gaussian([link]) Gaussian exponential family distribution.
InverseGaussian([link]) InverseGaussian exponential family.
NegativeBinomial([link, alpha]) Negative Binomial exponential family.
Poisson([link]) Poisson exponential family.
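Each family is characterized by a link and a variance function, as in Family(link, variance). In GEE the variance function supplies the diagonal scaling A = diag(v(μ)) of a cluster's working covariance V = φ A^{1/2} R A^{1/2}, where R is the working correlation. The sketch below lists the textbook variance functions of the families above and combines one with a correlation matrix. VARIANCE and working_covariance are hypothetical names, not statsmodels API, and alpha defaults to 1.0 for NegativeBinomial only in this sketch.

```python
import numpy as np

# Textbook variance functions v(mu) for the families listed above.
# NegativeBinomial takes an extra dispersion parameter alpha.
VARIANCE = {
    "Binomial": lambda mu: mu * (1 - mu),
    "Gamma": lambda mu: mu ** 2,
    "Gaussian": lambda mu: np.ones_like(mu),
    "InverseGaussian": lambda mu: mu ** 3,
    "NegativeBinomial": lambda mu, alpha=1.0: mu + alpha * mu ** 2,
    "Poisson": lambda mu: mu,
}


def working_covariance(mu, R, family, scale=1.0):
    """V = scale * A^(1/2) R A^(1/2), with A = diag(v(mu)) from the family."""
    mu = np.asarray(mu, dtype=float)
    sd = np.sqrt(VARIANCE[family](mu))
    return scale * sd[:, None] * R * sd[None, :]


R = np.array([[1.0, 0.5], [0.5, 1.0]])  # exchangeable correlation, rho = 0.5
print(working_covariance([1.0, 4.0], R, "Poisson"))  # V = [[1, 1], [1, 4]]
```

Only the variance function changes between families here; the link enters the estimating equations through the derivative of the mean with respect to the coefficients, not through V.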