maxNR {maxLik}
Unconstrained and equality-constrained maximization based on the quadratic approximation (Newton) method. The Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970), and BHHH (Berndt, Hall, Hall, Hausman 1974) methods are available.
maxNR(fn, grad = NULL, hess = NULL, start, print.level = 0,
      tol = 1e-08, reltol = sqrt(.Machine$double.eps), gradtol = 1e-06,
      steptol = 1e-10, lambdatol = 1e-06, qrtol = 1e-10, iterlim = 150,
      constraints = NULL, finalHessian = TRUE, bhhhHessian = FALSE,
      fixed = NULL, activePar = NULL, ...)
maxBFGSR(fn, grad = NULL, hess = NULL, start, print.level = 0,
      tol = 1e-8, reltol = sqrt(.Machine$double.eps), gradtol = 1e-6,
      steptol = 1e-10, lambdatol = 1e-6, qrtol = 1e-10, iterlim = 150,
      constraints = NULL, finalHessian = TRUE, fixed = NULL,
      activePar = NULL, ...)
maxBHHH(fn, grad = NULL, hess = NULL, start, print.level = 0,
      iterlim = 100, finalHessian = "BHHH", ...)
fn: function to be maximized. It must have the parameter vector as the first argument and must return either a single number or a numeric vector, which is summed over observations. If the BHHH method is used and the gradient is not supplied via grad, fn must return a numeric vector of the log-likelihood values of individual observations.
grad: gradient of the objective function. It must have the parameter vector as the first argument and must return either the gradient vector of the objective function, or a matrix whose columns correspond to individual parameters; in that case the column sums are treated as the gradient components. If missing, finite-difference gradients are computed. For the BHHH method the rows of the matrix must correspond to the gradients of individual observations.
hess: Hessian matrix of the function. It must have the parameter vector as the first argument and must return the Hessian matrix of the objective function. If missing, finite-difference Hessians, based on grad, are computed.
start: initial value for the parameter vector.
print.level: determines the level of printing during the maximization process. The default value of 0 means that no printing occurs, a value of 1 means that initial and final details are printed, and a value of 2 means that full tracing information is printed for every iteration. Higher values produce even more detail.
tol: stopping condition. Stop if the absolute difference between successive iterations is less than tol; return code 2.
reltol: relative convergence tolerance. The algorithm stops if it is unable to increase the value by a factor of reltol * (abs(val) + reltol) at a step. Defaults to sqrt(.Machine$double.eps), typically about 1e-8.
gradtol: stopping condition. Stop if the norm of the gradient is less than gradtol; return code 1.
steptol: stopping/error condition. If the quadratic approximation leads to a lower function value instead of a higher one, or to NA, the step length is shortened and a new attempt is made. If the step length falls below steptol, the algorithm stops with return code 3.
lambdatol: controls whether the Hessian is treated as negative definite. If the largest of the eigenvalues of the Hessian is larger than -lambdatol (i.e. the Hessian is not sufficiently negative definite), a suitable diagonal matrix is subtracted from the Hessian in order to enforce negative definiteness (quadratic hill-climbing in the spirit of Goldfeld and Quandt, 1972).
qrtol: QR-decomposition tolerance for the Hessian.
iterlim: stopping condition. Stop if more than iterlim iterations are performed; return code 4.
constraints: either NULL for unconstrained maximization, or a list with two components, eqA and eqB, specifying the equality constraint A %*% theta + B = 0 (see the constrained-optimization example at the end of this page).
finalHessian: how (and if) to calculate the final Hessian. Either FALSE (do not calculate), TRUE (use the analytic or finite-difference Hessian), or "BHHH" for the information-equality approximation, which is only suitable when the objective function is a log-likelihood calculated by individual observations.
bhhhHessian: logical, indicating whether the Hessian approximation suggested by Berndt, Hall, Hall, and Hausman (1974) should be used.
fixed: parameters that should be fixed at their starting values: either a logical vector of the same length as argument start, or a vector of indices or names of the parameters to be fixed (see the sketch after this argument list).
activePar: this argument is retained for backward compatibility only; please use argument fixed instead.
...: further arguments to fn, grad, and hess.
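For instance, the fixed argument may be supplied as a logical vector. The following minimal sketch re-uses the two-parameter normal-sample log-likelihood from the Examples section and keeps sigma fixed at its starting value; the particular start values are chosen only for illustration.

x <- rnorm(1000, 1, 2)
N <- length(x)
loglik <- function(param) {
  mu <- param[1]
  sigma <- param[2]
  -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
}
## estimate mu only; sigma stays fixed at its start value 2
summary(maxNR(loglik, start = c(0, 2), fixed = c(FALSE, TRUE)))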
The idea of the Newton method is to approximate the function in a given location by a multidimensional parabola, and to use the estimated maximum as the initial value for the next iteration. Such an approximation requires knowledge of both the gradient and the Hessian, the latter of which can be quite costly to compute. Several methods for approximating the Hessian exist, including BFGS and BHHH.
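As an illustration, a single Newton update can be written in a few lines of R. This is a minimal sketch, not the package's internal implementation; newtonStep, gradFn, and hessFn are hypothetical names used only here.

newtonStep <- function(theta, gradFn, hessFn) {
  g <- gradFn(theta)              # gradient at the current value
  H <- hessFn(theta)              # Hessian at the current value
  as.vector(theta - solve(H, g))  # maximum of the local quadratic approximation
}
## one step for f(theta) = -(theta - 2)^2 jumps from 0 straight to the maximum at 2
newtonStep(0, function(th) -2*(th - 2), function(th) matrix(-2, 1, 1))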
The BHHH method (maxNR with argument bhhhHessian = TRUE, or maxBHHH) is suitable only for maximizing log-likelihood functions. It uses the information equality in order to approximate the Hessian of the log-likelihood function. Hence, the log-likelihood values and its gradients must be calculated by individual observations. The Hessian is approximated as the negative of the sum of the outer products of the gradients of individual observations or, in matrix form, -t(gradient) %*% gradient = -crossprod(gradient).
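For instance, the approximation can be computed directly from an observation-wise gradient matrix. This is a minimal sketch re-using the exponential-model gradient from the Examples section; G is a hypothetical name used only here.

t <- rexp(100, 2)                                 # toy data, as in the Examples below
gradlikInd <- function(theta) cbind(1/theta - t)  # 100 x 1 matrix of observation gradients
G <- gradlikInd(1)                                # gradients evaluated at theta = 1
H.bhhh <- -crossprod(G)                           # same as -t(G) %*% G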
The functions maxNR, maxBFGSR, and maxBHHH can work with constant parameters and related changes of parameter values. Constant parameters are useful if a parameter value converges toward the boundary of its support, or for testing purposes. One way is to set the argument fixed to a non-NULL value, specifying which parameters should be treated as constants. However, when using maxNR or maxBHHH, parameters can also be fixed at run time by signaling through the return value of fn. This may be useful if an estimation converges toward the edge of the parameter space, possibly causing problems. The value returned by fn may carry the following attributes (only used by maxNR):

constPar: index vector; the parameters that are redefined as constants.

newVal: a list with the following components:
  index: the parameters that will receive a new value;
  val: the new values of those parameters.

Hence, constPar specifies which parameters are treated as constants, while newVal allows one to overwrite existing parameter values, possibly the non-constant ones as well. If the attribute newVal is present, the new function value need not exceed the previous one (maximization is not performed in that step). A minimal sketch of such a function is given below.
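The sketch assumes a two-parameter normal log-likelihood and an arbitrary boundary threshold of 1e-4; loglikAttr and the reset value 0.1 are hypothetical choices made only for illustration.

x <- rnorm(100, mean = 1, sd = 2)
loglikAttr <- function(theta) {
  mu <- theta[1]
  sigma <- theta[2]
  if (sigma < 1e-4) {           # sigma drifting toward the boundary of support
    sigma <- 0.1                # overwrite it with a safe value ...
    ll <- sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
    attr(ll, "constPar") <- 2   # ... treat parameter 2 as a constant from now on
    attr(ll, "newVal") <- list(index = 2, val = sigma)  # ... and record the new value
    return(ll)
  }
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
summary(maxNR(loglikAttr, start = c(0, 1)))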
A list of class "maxim" with the following components:
maximum: fn value at the maximum (the last calculated value if not converged).
estimate: estimated parameter value.
gradient: vector, the last calculated gradient value. It should be close to 0 in case of normal convergence.
gradientObs: matrix of gradients at parameter value estimate, evaluated at individual observations (only when observation-specific gradients are available, e.g. for the BHHH method).
hessian: Hessian at the maximum (the last calculated value if not converged).
code: return code: 1 (gradient close to zero), 2 (successive function values within tolerance limit), 3 (last step could not find a value above the current one), 4 (iteration limit exceeded); larger values indicate numerical problems such as infinite function, gradient, or Hessian values, and 100 indicates that the initial value was out of range.
message: a short message describing the return code.
last.step: list describing the last unsuccessful step if code == 3, including the previous parameter value and the function value at it.
fixed: logical vector indicating which parameters are constants.
iterations: number of iterations.
type: character string, type of maximization.
constraints: a list describing the constrained optimization (NULL if the maximization was unconstrained).
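For instance, the individual components can be accessed directly from the returned list; the small sketch below re-uses the exponential example from the Examples section.

t <- rexp(100, 2)
res <- maxNR(function(theta) sum(log(theta) - theta*t), start = 1)
res$estimate   # estimated parameter value
res$maximum    # function value at the estimate
res$code       # return code (1 or 2 indicates normal convergence)
res$message    # textual description of the return code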
No attempt is made to ensure that the user-provided analytic gradient or Hessian is correct. However, users are recommended to check them with the compareDerivatives function, which is designed for this purpose. If the analytic gradient or Hessian is wrong, the algorithm may not converge, or may converge to a wrong point.
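For instance, the analytic gradient of the exponential-model log-likelihood from the Examples section can be checked at a trial parameter value. This is a minimal sketch; see the compareDerivatives documentation for its full argument list.

t <- rexp(100, 2)
loglik  <- function(theta) sum(log(theta) - theta*t)
gradlik <- function(theta) sum(1/theta - t)
compareDerivatives(loglik, gradlik, t0 = 1)  # compares analytic and numeric gradients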
As the BHHH method (maxNR with argument bhhhHessian = TRUE, or maxBHHH) uses the likelihood-specific information equality, it is suitable only for maximizing log-likelihood functions!
Quasi-Newton methods, including those mentioned above, do not work well in non-concave regions; this is especially the case with the implementation in maxBFGSR. The user is advised to experiment with various tolerance options to achieve convergence.
Ott Toomet otoomet@ut.ee, Arne Henningsen; the function maxBFGSR was originally developed by Yves Croissant (and placed in the 'mlogit' package).
Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Economic and Social Measurement 3, p. 653-665.
Broyden, C.G. (1970): The Convergence of a Class of Double-rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications 6, p. 76-90.
Fletcher, R. (1970): A New Approach to Variable Metric Algorithms, Computer Journal 13, p. 317-322.
Goldfeld, S.M. and Quandt, R.E. (1972): Nonlinear Methods in Econometrics. Amsterdam: North-Holland.
Goldfarb, D. (1970): A Family of Variable Metric Updates Derived by Variational Means, Mathematics of Computation 24, p. 23-26.
Greene, W.H. (2008): Econometric Analysis, 6th edition, Prentice Hall.
Shanno, D.F. (1970): Conditioning of Quasi-Newton Methods for Function Minimization, Mathematics of Computation 24, p. 647-656.
maxLik for a general framework for maximum likelihood estimation (MLE); maxBHHH for maximization using the Berndt, Hall, Hall, Hausman (1974) algorithm (a wrapper to maxNR); maxBFGS for maximization using the BFGS, Nelder-Mead (NM), and Simulated Annealing (SANN) methods (based on optim); nlm for Newton-Raphson optimization; and optim for other gradient-based optimization methods.
## ML estimation of exponential duration model:
t <- rexp(100, 2)
loglik <- function(theta) sum(log(theta) - theta*t)
## Note: the log-likelihood and gradient are summed over observations
gradlik <- function(theta) sum(1/theta - t)
hesslik <- function(theta) -100/theta^2
## Estimate with finite-difference gradient and Hessian
a <- maxNR(loglik, start=1, print.level=2)
summary(a)
## You would probably prefer 1/mean(t) instead ;-)
## Estimate with analytic gradient and Hessian
a <- maxNR(loglik, gradlik, hesslik, start=1)
summary(a)
## BFGS estimation with finite-difference gradient
a <- maxBFGSR( loglik, start=1 )
summary(a)
## BFGS estimation with analytic gradient
a <- maxBFGSR( loglik, gradlik, start=1 )
summary(a)
## For the BHHH method we need likelihood values and gradients
## of individual observations
loglikInd <- function(theta) log(theta) - theta*t
gradlikInd <- function(theta) 1/theta - t
## Estimate with finite-difference gradient
a <- maxBHHH(loglikInd, start=1, print.level=2)
summary(a)
## Estimate with analytic gradient
a <- maxBHHH(loglikInd, gradlikInd, start=1)
summary(a)
##
## Next, we give an example with a vector argument: estimate the mean and
## variance of a random normal sample by maximum likelihood
## Note: you might want to use maxLik instead
##
loglik <- function(param) {
  mu <- param[1]
  sigma <- param[2]
  ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
  ll
}
x <- rnorm(1000, 1, 2)   # use mean=1, sd=2
N <- length(x)
res <- maxNR(loglik, start=c(0,1))   # use 'wrong' start values
summary(res)
###
### Now an example of constrained optimization
###
## We maximize exp(-x^2 - y^2) where x+y = 1
f <- function(theta) {
  x <- theta[1]
  y <- theta[2]
  ## Note: you may want to use exp(- theta %*% theta) instead ;-)
  exp(-(x^2 + y^2))
}
## use constraints: x + y = 1
A <- matrix(c(1, 1), 1, 2)
B <- -1
res <- maxNR(f, start=c(0,0), constraints=list(eqA=A, eqB=B), print.level=1)
print(summary(res))