cumulative {VGAM} | R Documentation |
Fits a cumulative link regression model to a (preferably ordered) factor response.
cumulative(link = "logit", earg = list(), parallel = FALSE, reverse = FALSE, mv = FALSE, intercept.apply = FALSE)
link |
Link function applied to the J cumulative probabilities.
See |
earg |
List. Extra argument for the link function.
See |
parallel |
A logical or formula specifying which terms have equal/unequal coefficients. See below for more information about the parallelism assumption. |
reverse |
Logical.
By default, the cumulative probabilities used are
P(Y<=1), P(Y<=2),
..., P(Y<=J).
If This should be set to |
mv |
Logical.
Multivariate response? If |
intercept.apply |
Logical.
Whether the |
In this help file the response Y is assumed to be a factor
with ordered values 1,2,…,J+1.
Hence M is the number of linear/additive predictors
eta_j;
for cumulative()
one has M=J.
This VGAM family function fits the class of cumulative link models to (hopefully) an ordinal response. By default, the non-parallel cumulative logit model is fitted, i.e.,
eta_j = logit(P[Y<=j])
where j=1,2,…,M and
the eta_j are not constrained to be parallel.
This is also known as the non-proportional odds model.
If the logit link is replaced by a complementary log-log link
(cloglog
) then
this is known as the proportional-hazards model.
In almost all the literature, the constraint matrices associated
with this family of models are known. For example, setting
parallel = TRUE
will make all constraint matrices
(except for the intercept) equal to a vector of M 1's.
If the constraint matrices are equal, unknown and to be estimated, then
this can be achieved by fitting the model as a
reduced-rank vector generalized
linear model (RR-VGLM; see rrvglm
).
Currently, reduced-rank vector generalized additive models
(RR-VGAMs) have not been implemented here.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
and vgam
.
No check is made to verify that the response is ordinal;
see ordered
.
The response should be either a matrix of counts (with row sums that
are all positive), or a factor. In both cases, the y
slot
returned by vglm
/vgam
/rrvglm
is the matrix
of counts.
The formula must contain an intercept term.
Other VGAM family functions for an ordinal response include
acat
,
cratio
,
sratio
.
For a nominal (unordered) factor response, the multinomial
logit model (multinomial
) is more appropriate.
With the logit link, setting parallel = TRUE
will fit a
proportional odds model. Note that the TRUE
here does
not apply to the intercept term.
In practice, the validity of the proportional odds assumption
needs to be checked, e.g., by a likelihood ratio test (LRT).
If acceptable on the data,
then numerical problems are less likely to occur during the fitting,
and there are less parameters. Numerical problems occur when
the linear/additive predictors cross, which results in probabilities
outside of (0,1); setting parallel=TRUE
will help avoid
this problem.
Here is an example of the usage of the parallel
argument.
If there are covariates x2
, x3
and x4
, then
parallel = TRUE ~ x2 + x3 -1
and
parallel = FALSE ~ x4
are equivalent. This would constrain
the regression coefficients for x2
and x3
to be
equal; those of the intercepts and x4
would be different.
If the data is inputted in long format
(not wide format, as in pneumo
below)
and the self-starting initial values are not good enough then
try using
mustart
,
coefstart
and/or
etatstart
.
See the example below.
To fit the proportional odds model one can use the
VGAM family function propodds
.
Note that propodds(reverse)
is equivalent to
cumulative(parallel=TRUE, reverse=reverse)
(which is equivalent to
cumulative(parallel=TRUE, reverse=reverse, link="logit")
).
It is for convenience only. A call to cumulative()
is preferred
since it reminds the user that a parallelism assumption is made, as
well as being a lot more flexible.
Thomas W. Yee
Agresti, A. (2002) Categorical Data Analysis, 2nd ed. New York: Wiley.
Agresti, A. (2010) Analysis of Ordinal Categorical Data, 2nd ed. New York: Wiley.
Dobson, A. J. and Barnett, A. (2008) An Introduction to Generalized Linear Models, 3rd ed. Boca Raton: Chapman & Hall/CRC Press.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd ed. London: Chapman & Hall.
Simonoff, J. S. (2003) Analyzing Categorical Data, New York: Springer-Verlag.
Yee, T. W. (2010) The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1–34. http://www.jstatsoft.org/v32/i10/.
Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.
Further information and examples on categorical data analysis by the VGAM package can be found at http://www.stat.auckland.ac.nz/~yee/VGAM/doc/categorical.pdf.
propodds
,
prplot
,
margeff
,
acat
,
cratio
,
sratio
,
multinomial
,
pneumo
,
Links
,
logit
,
probit
,
cloglog
,
cauchit
,
golf
,
polf
,
nbolf
,
logistic1
.
# Fit the proportional odds model, p.179, in McCullagh and Nelder (1989) pneumo = transform(pneumo, let = log(exposure.time)) (fit = vglm(cbind(normal, mild, severe) ~ let, cumulative(parallel = TRUE, reverse = TRUE), pneumo)) depvar(fit) # Sample proportions (good technique) fit@y # Sample proportions (bad technique) weights(fit, type = "prior") # Number of observations coef(fit, matrix = TRUE) constraints(fit) # Constraint matrices # Check that the model is linear in let ---------------------- fit2 = vgam(cbind(normal, mild, severe) ~ s(let, df = 2), cumulative(reverse = TRUE), pneumo) ## Not run: plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2) # Check the proportional odds assumption with a LRT ---------- (fit3 = vglm(cbind(normal, mild, severe) ~ let, cumulative(parallel=FALSE, reverse=TRUE), pneumo)) pchisq(2*(logLik(fit3)-logLik(fit)), df = length(coef(fit3))-length(coef(fit)), lower.tail = FALSE) # A factor() version of fit ---------------------------------- # This is in long format (cf. wide format above) nobs = round(fit@y * c(weights(fit, type = "prior"))) sumnobs = colSums(nobs) # apply(nobs, 2, sum) pneumo.long = data.frame(symptoms = ordered(rep(rep(colnames(nobs), nrow(nobs)), times = c(t(nobs))), levels = colnames(nobs)), let = rep(rep(with(pneumo, let), each = ncol(nobs)), times = c(t(nobs)))) with(pneumo.long, table(let, symptoms)) # check it; should be same as pneumo (fit.long1 = vglm(symptoms ~ let, data = pneumo.long, cumulative(parallel=TRUE, reverse=TRUE), trace=TRUE)) coef(fit.long1, matrix=TRUE) # Should be same as coef(fit, matrix=TRUE) # Could try using mustart if fit.long1 failed to converge. mymustart = matrix(sumnobs / sum(sumnobs), nrow(pneumo.long), ncol(nobs), byrow = TRUE) fit.long2 = vglm(symptoms ~ let, fam = cumulative(parallel = TRUE, reverse = TRUE), mustart = mymustart, data = pneumo.long, trace = TRUE) coef(fit.long2, matrix = TRUE) # Should be same as coef(fit, matrix = TRUE)