predict.sarlm {spdep} | R Documentation |
predict.sarlm()
calculates predictions as far as is at present
possible for for spatial simultaneous autoregressive linear
model objects, using Haining's terminology for decomposition into
trend, signal, and noise — see reference.
## S3 method for class 'sarlm' predict(object, newdata = NULL, listw = NULL, zero.policy = NULL, ...) ## S3 method for class 'sarlm.pred' print(x, ...)
object |
|
newdata |
Data frame in which to predict — if NULL, predictions are for the data on which the model was fitted |
listw |
a |
zero.policy |
default NULL, use global option value; if TRUE assign zero to the lagged value of zones without
neighbours, if FALSE (default) assign NA - causing |
x |
the object to be printed |
... |
further arguments passed through |
In the following, the trend is the non-spatial smooth, the signal is the spatial smooth, and the noise is the residual. The fit returned is the sum of the trend and the signal.
The function approaches prediction first by dividing invocations between those with or without newdata. When no newdata is present, the response variable may be reconstructed as the sum of the trend, the signal, and the noise (residuals). Since the values of the response variable are known, their spatial lags are used to calculate signal components (Cressie 1993, p. 564). For the error model, trend = X beta, and signal = lambda W y - lambda W X beta. For the lag and mixed models, trend = X beta, and signal = rho W y.
This approach differs from the design choices made in other software, for example GeoDa, which does not use observations of the response variable, and corresponds to the newdata situation described below.
When however newdata is used for prediction, no observations of the response variable being predicted are available. Consequently, while the trend components are the same, the signal cannot take full account of the spatial smooth. In the error model, the signal is set to zero, since the spatial smooth is expressed in terms of the error: inv(I - lambda W) e.
In the lag model, the signal can be expressed in the following way:
(I - rho W) y = X beta + e
,
y = inv(I - rho W) X beta + inv(I - rho W) e
giving a feasible signal component of:
rho W y = rho W inv(I - rho W) X beta
setting the error term to zero. This also means that predictions of the signal component for lag and mixed models require the inversion of an n-by-n matrix.
Because the outcomes of the spatial smooth on the error term are unobservable, this means that the signal values for newdata are incomplete. In the mixed model, the spatially lagged RHS variables influence both the trend and the signal, so that the root mean square prediction error in the examples below for this case with newdata is smallest, although the model was not the best fit
predict.sarlm()
returns a vector of predictions with two attribute
vectors of trend and signal values with class sarlm.pred
.
print.sarlm.pred
is a print function for this class, printing and
returning a data frame with columns: "fit", "trend" and "signal".
Roger Bivand Roger.Bivand@nhh.no
Haining, R. 1990 Spatial data analysis in the social and environmental sciences, Cambridge: Cambridge University Press, p. 258; Cressie, N. A. C. 1993 Statistics for spatial data, Wiley, New York.
data(oldcol) COL.lag.eig <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, nb2listw(COL.nb)) COL.mix.eig <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, nb2listw(COL.nb), type="mixed") COL.err.eig <- errorsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, nb2listw(COL.nb)) print(p1 <- predict(COL.mix.eig)) print(p2 <- predict(COL.mix.eig, newdata=COL.OLD, listw=nb2listw(COL.nb))) AIC(COL.mix.eig) sqrt(deviance(COL.mix.eig)/length(COL.nb)) sqrt(sum((COL.OLD$CRIME - as.vector(p1))^2)/length(COL.nb)) sqrt(sum((COL.OLD$CRIME - as.vector(p2))^2)/length(COL.nb)) AIC(COL.err.eig) sqrt(deviance(COL.err.eig)/length(COL.nb)) sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.err.eig)))^2)/length(COL.nb)) sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.err.eig, newdata=COL.OLD, listw=nb2listw(COL.nb))))^2)/length(COL.nb)) AIC(COL.lag.eig) sqrt(deviance(COL.lag.eig)/length(COL.nb)) sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.lag.eig)))^2)/length(COL.nb)) sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.lag.eig, newdata=COL.OLD, listw=nb2listw(COL.nb))))^2)/length(COL.nb))