Predict {rms} | R Documentation |
Predict
allows the user to easily specify which predictors are to
vary. When the vector of values over which a predictor should vary is
not specified, the
range will be all levels of a categorical predictor or equally-spaced
points between the datadist
"Low:prediction"
and
"High:prediction"
values for the variable (datadist
by
default uses the 10th smallest and 10th largest predictor values in the
dataset). Predicted values are
the linear predictor (X beta), a user-specified transformation of that
scale, or estimated probability of surviving past a fixed single time
point given the linear predictor. Predict
is usually used for
plotting predicted values but there is also a print
method.
When the first argument to Predict
is a fit object created by
bootcov
with coef.reps=TRUE
, confidence limits come from
the stored matrix of bootstrap repetitions of coefficients, using
bootstrap percentile nonparametric confidence limits. Such confidence
intervals do not make distributional assumptions.
There is a plot
method for Predict
objects that makes it
easy to show predicted values and confidence bands.
The rbind
method for Predict
objects allows you to create
separate sets of predictions under different situations and to combine
them into one set for feeding to plot.Predict
. For example you
might want to plot confidence intervals for means and for individuals
using ols
, and have the two types of confidence bands be
superposed onto one plot or placed into two panels. Another use for
rbind
is to combine predictions from quantile regression models
that predicted three different quantiles.
If conf.type="simultaneous"
, simultaneous (over all requested
predictions) confidence limits are computed. See the
predictrms
function for details.
Predict(x, ..., fun, type = c("predictions", "model.frame", "x"), np = 200, conf.int = 0.95, conf.type = c("mean", "individual","simultaneous"), adj.zero = FALSE, ref.zero = FALSE, non.slopes, time = NULL, loglog = FALSE, digits=4, name, factors=NULL) ## S3 method for class 'Predict' print(x, ...) ## S3 method for class 'Predict' rbind(..., rename)
x |
an |
... |
One or more variables to vary, or single-valued adjustment values.
Specify a variable name without an equal sign to use the default
display range, or any range
you choose (e.g. |
fun |
an optional transformation of the linear predictor |
type |
defaults to providing predictions. Set to |
np |
the number of equally-spaced points computed for continuous
predictors that vary, i.e., when the specified value is |
conf.int |
confidence level. Default is 0.95. Specify |
conf.type |
type of confidence interval. Default is |
adj.zero |
Set to |
ref.zero |
Set to |
non.slopes |
This is only useful in a multiple intercept model such as the ordinal
logistic model. There to use to second of three intercepts, for example,
specify |
time |
Specify a single time |
loglog |
Specify |
digits |
Controls how “adjust-to” values are plotted. The default is 4 significant digits. |
name |
Instead of specifying the variables to vary in the
|
factors |
an alternate way of specifying ..., mainly for use by
|
rename |
If you are concatenating predictor sets using |
When there are no intercepts in the fitted model, plot subtracts adjustment values from each factor while computing variances for confidence limits.
Specifying time
will not work for Cox models with time-dependent
covariables. Use survest
or survfit
for that purpose.
a data frame containing all model predictors and the computed values
yhat
, lower
, upper
, the latter two if confidence
intervals were requested. The data frame has an additional
class
"Predict"
. If name
is specified or no
predictors are specified in ..., the resulting data frame has an
additional variable called .predictor.
specifying which
predictor is currently being varied. .predictor.
is handy for
use as a paneling variable in lattice
or ggplot2
graphics.
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
datadist
, predictrms
,
contrast.rms
, summary.rms
,
rms
, rms.trans
, survest
,
survplot
, rmsMisc
,
transace
, rbind
, bootcov
n <- 1000 # define sample size set.seed(17) # so can reproduce the results age <- rnorm(n, 50, 10) blood.pressure <- rnorm(n, 120, 15) cholesterol <- rnorm(n, 200, 25) sex <- factor(sample(c('female','male'), n,TRUE)) label(age) <- 'Age' # label is in Hmisc label(cholesterol) <- 'Total Cholesterol' label(blood.pressure) <- 'Systolic Blood Pressure' label(sex) <- 'Sex' units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc units(blood.pressure) <- 'mmHg' # Specify population model for log odds that Y=1 L <- .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y <- ifelse(runif(n) < plogis(L), 1, 0) ddist <- datadist(age, blood.pressure, cholesterol, sex) options(datadist='ddist') fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4))) Predict(fit, age, cholesterol, np=4) Predict(fit, age=seq(20,80,by=10), sex, conf.int=FALSE) Predict(fit, age=seq(20,80,by=10), sex='male') # works if datadist not used # Get simultaneous confidence limits accounting for making 7 estimates # Predict(fit, age=seq(20,80,by=10), sex='male', conf.type='simult') # (this needs the multcomp package) ddist$limits$age[2] <- 30 # make 30 the reference value for age # Could also do: ddist$limits["Adjust to","age"] <- 30 fit <- update(fit) # make new reference value take effect Predict(fit, age, ref.zero=TRUE, fun=exp) # Make two curves, and plot the predicted curves as two trellis panels w <- Predict(fit, age, sex) require(lattice) xyplot(yhat ~ age | sex, data=w, type='l') # To add confidence bands we need to use the Hmisc xYplot function in # place of xyplot xYplot(Cbind(yhat,lower,upper) ~ age | sex, data=w, method='filled bands', type='l', col.fill=gray(.95)) # If non-displayed variables were in the model, add a subtitle to show # their settings using title(sub=paste('Adjusted to',attr(w,'info')$adjust),adj=0) # Easier: feed w into plot.Predict ## Not run: # Predictions form a parametric survival model n <- 1000 set.seed(731) age <- 50 + 12*rnorm(n) label(age) <- "Age" sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) cens <- 15*runif(n) h <- .02*exp(.04*(age-50)+.8*(sex=='Female')) t <- -log(runif(n))/h label(t) <- 'Follow-up Time' e <- ifelse(t<=cens,1,0) t <- pmin(t, cens) units(t) <- "Year" ddist <- datadist(age, sex) Srv <- Surv(t,e) # Fit log-normal survival model and plot median survival time vs. age f <- psm(Surv(t, e) ~ rcs(age), dist=if(.R.)'lognormal' else 'gaussian') med <- Quantile(f) # Creates function to compute quantiles # (median by default) Predict(f, age, fun=function(x)med(lp=x)) # Note: This works because med() expects the linear predictor (X*beta) # as an argument. Would not work if use # ref.zero=TRUE or adj.zero=TRUE. # Also, confidence intervals from this method are approximate since # they don't take into account estimation of scale parameter # Fit an ols model to log(y) and plot the relationship between x1 # and the predicted mean(y) on the original scale without assuming # normality of residuals; use the smearing estimator. Before doing # that, show confidence intervals for mean and individual log(y), # and for the latter, also show bootstrap percentile nonparametric # pointwise confidence limits set.seed(1) x1 <- runif(300) x2 <- runif(300) ddist <- datadist(x1,x2); options(datadist='ddist') y <- exp(x1+ x2 - 1 + rnorm(300)) f <- ols(log(y) ~ pol(x1,2) + x2, x=TRUE, y=TRUE) # x y for bootcov fb <- bootcov(f, B=100, coef.reps=TRUE) pb <- Predict(fb, x1, x2=c(.25,.75)) p1 <- Predict(f, x1, x2=c(.25,.75)) p <- rbind(normal=p1, boot=pb) plot(p) p1 <- Predict(f, x1, conf.type='mean') p2 <- Predict(f, x1, conf.type='individual') p <- rbind(mean=p1, individual=p2) plot(p, label.curve=FALSE) # uses superposition plot(p, ~x1 | .set.) # 2 panels r <- resid(f) smean <- function(yhat)smearingEst(yhat, exp, res, statistic='mean') formals(smean) <- list(yhat=numeric(0), res=r[!is.na(r)]) #smean$res <- r[!is.na(r)] # define default res argument to function Predict(f, x1, fun=smean) ## End(Not run) options(datadist=NULL)