Generalized Estimating Equations¶
Generalized Estimating Equations (GEE) estimate generalized linear models for panel, cluster, or repeated-measures data in which observations may be correlated within a cluster but are uncorrelated across clusters. GEE supports estimation of the same one-parameter exponential families as Generalized Linear Models (GLM).
See Module Reference for commands and arguments.
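For orientation, the estimating equations behind the method can be written out as below. This is the standard formulation from Liang and Zeger (1986), listed in the References; the symbols are notation chosen here for illustration and are not identifiers from the library.

```latex
% Cluster i has outcome vector y_i with mean \mu_i(\beta),
% where g(\mu_{ij}) = x_{ij}^\top \beta for a link function g.
\begin{align}
U(\beta) &= \sum_{i=1}^{N} D_i^\top V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0, \\
D_i &= \frac{\partial \mu_i}{\partial \beta^\top}, \qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2}, \\
\widehat{\operatorname{Cov}}(\hat\beta) &= B^{-1} M B^{-1}, \qquad
B = \sum_{i=1}^{N} D_i^\top V_i^{-1} D_i, \qquad
M = \sum_{i=1}^{N} D_i^\top V_i^{-1} (y_i - \mu_i)(y_i - \mu_i)^\top V_i^{-1} D_i .
\end{align}
```

Here A_i is the diagonal matrix of variance-function values for cluster i, R_i(α) is the working correlation matrix (for example independence or exchangeable), and φ is the scale parameter. The sandwich form in the last line is what keeps the standard errors valid even if the working correlation is misspecified.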
Examples¶
The following illustrates a Poisson regression with exchangeable correlation within clusters using data on epilepsy seizures.
In [1]: import statsmodels.api as sm
In [2]: import statsmodels.formula.api as smf
In [3]: data = sm.datasets.get_rdataset('epil', package='MASS').data
In [4]: fam = sm.families.Poisson()
In [5]: ind = sm.cov_struct.Exchangeable()
In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
...: cov_struct=ind, family=fam)
...:
In [7]: res = mod.fit()
In [8]: print(res.summary())
Several notebook examples of the use of GEE can be found on the Wiki: Wiki notebooks for GEE
References¶
- KY Liang and S Zeger. “Longitudinal data analysis using generalized linear models”. Biometrika (1986) 73 (1): 13-22.
- S Zeger and KY Liang. “Longitudinal Data Analysis for Discrete and Continuous Outcomes”. Biometrics Vol. 42, No. 1 (Mar., 1986), pp. 121-130
- A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”, Biometrika, 77, 485-497.
- Xu Guo and Wei Pan (2002). “Small sample performance of the score test in GEE”. http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf
- LA Mancl and TA DeRouen (2001). “A covariance estimator for GEE with improved small-sample properties”. Biometrics 57 (1): 126-134.
Module Reference¶
Model Class¶
GEE(endog, exog, groups[, time, family, …]) | Estimation of marginal regression models using Generalized Estimating Equations (GEE).
Results Classes¶
GEEResults(model, params, cov_params, scale) | This class summarizes the fit of a marginal regression model using GEE.
GEEMargins(results, args[, kwargs]) | Estimated marginal effects for a regression model fit with GEE.
Dependence Structures¶
The dependence structures currently implemented are:
CovStruct([cov_nearest_method]) | A base class for correlation and covariance structures of grouped data.
Autoregressive([dist_func]) | A first-order autoregressive working dependence structure.
Exchangeable() | An exchangeable working dependence structure.
GlobalOddsRatio(endog_type) | Estimate the global odds ratio for a GEE with ordinal or nominal data.
Independence([cov_nearest_method]) | An independence working dependence structure.
Nested([cov_nearest_method]) | A nested working dependence structure.
Families¶
The distribution families are the same as for GLM. Currently implemented are:
Family(link, variance) | The parent class for one-parameter exponential families.
Binomial([link]) | Binomial exponential family distribution.
Gamma([link]) | Gamma exponential family distribution.
Gaussian([link]) | Gaussian exponential family distribution.
InverseGaussian([link]) | InverseGaussian exponential family.
NegativeBinomial([link, alpha]) | Negative Binomial exponential family.
Poisson([link]) | Poisson exponential family.
Link Functions¶
The link functions are the same as for GLM; the ones currently implemented are listed below. Not all link functions are available for each distribution family. The list of link functions available for a given family can be obtained with
>>> sm.families.family.<familyname>.links
Link | A generic link function for one-parameter exponential family.
CDFLink([dbn]) | Use the CDF of a scipy.stats distribution
CLogLog | The complementary log-log transform
Log | The log transform
Logit | The logit transform
NegativeBinomial([alpha]) | The negative binomial link function
Power([power]) | The power transform
cauchy() | The Cauchy (standard Cauchy CDF) transform
cloglog | The CLogLog transform link function.
identity() | The identity transform
inverse_power() | The inverse transform
inverse_squared() | The inverse squared transform
log | The log transform
logit |
nbinom([alpha]) | The negative binomial link function.
probit([dbn]) | The probit (standard normal CDF) transform