maxNR {maxLik}R Documentation

Newton- and Quasi-Newton Maximization

Description

Unconstrained maximization based on the quadratic approximation (Newton) method. The Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970), and BHHH (Berndt, Hall, Hall, Hausman 1974) methods are available.

Usage

maxNR(fn, grad = NULL, hess = NULL, start, print.level = 0,
      tol = 1e-08, reltol=sqrt(.Machine$double.eps), gradtol = 1e-06,
      steptol = 1e-10, lambdatol = 1e-06, qrtol = 1e-10, iterlim = 150,
      constraints = NULL, finalHessian = TRUE, bhhhHessian=FALSE,
      fixed = NULL, activePar = NULL, ... )
maxBFGSR(fn, grad = NULL, hess = NULL, start, print.level = 0,
      tol = 1e-8, reltol=sqrt(.Machine$double.eps), gradtol = 1e-6,
      steptol = 1e-10, lambdatol=1e-6, qrtol=1e-10,
      iterlim = 150,
      constraints = NULL, finalHessian = TRUE,
      fixed = NULL, activePar = NULL, ... )
maxBHHH(fn, grad = NULL, hess = NULL, start, print.level = 0,
      iterlim = 100, finalHessian = "BHHH", ... )

Arguments

fn

function to be maximized. It must have the parameter vector as the first argument and it must return either a single number or a numeric vector, which is summed. If the BHHH method is used and argument gradient is not given, fn must return a numeric vector of observation-specific likelihood values. If the parameters are out of range, fn should return NA. See details for constant parameters.

fn may also return attributes "gradient" and/or "hessian". If these attributes are set, the algorithm uses the corresponding values as gradient and Hessian.

grad

gradient of the objective function. It must have the parameter vector as the first argument and it must return either a gradient vector of the objective function, or a matrix, where columns correspond to individual parameters. The column sums are treated as gradient components. If NULL, finite-difference gradients are computed. If the BHHH method is used, grad must return a matrix, where rows corresponds to the gradient vectors of individual observations and the columns to the individual parameters. If fn returns an object with attribute gradient, this argument is ignored.

hess

Hessian matrix of the function. It must have the parameter vector as the first argument and it must return the Hessian matrix of the objective function. If missing, finite-difference Hessians, based on gradient, are computed. Hessians are used for maximizations with the Newton-Raphson method but not for maximizations with the BFGS or BHHH method.

start

initial value for the parameter vector.

print.level

this argument determines the level of printing which is done during the minimization process. The default value of 0 means that no printing occurs, a value of 1 means that initial and final details are printed and a value of 2 means that full tracing information for every iteration is printed. Higher values will result in even more details.

tol

stopping condition. Stop if the absolute difference between successive iterations is less than tol, return code=2.

reltol

Relative convergence tolerance. The algorithm stops if it is unable to increase the value by a factor of 'reltol * (abs(val) + reltol)' at a step. Defaults to 'sqrt(.Machine\$double.eps)', typically about '1e-8'.

gradtol

stopping condition. Stop if the norm of the gradient less than gradtol, return code=1.

steptol

stopping/error condition. If the quadratic approximation leads to lower function value instead of higher, or NA, the step length is halved and a new attempt is made. This procedure is repeated until step < steptol, thereafter code=3 is returned.

lambdatol

control whether the Hessian is treated as negative definite. If the largest of the eigenvalues of the Hessian is larger than -lambdatol, a suitable diagonal matrix is subtracted from the Hessian (quadratic hill-climbing) in order to enforce nagetive definiteness.

qrtol

QR-decomposition tolerance

iterlim

stopping condition. Stop if more than iterlim iterations, return code=4.

constraints

either NULL for unconstrained optimization or a list with two components eqA and eqB for equality-constrained optimization A %*% theta + B = 0. The constrained problem is forwarded to sumt.

finalHessian

how (and if) to calculate the final Hessian. Either FALSE (do not calculate), TRUE (use analytic/finite-difference Hessian) or "bhhh"/"BHHH" for the information equality approach. The latter approach is only suitable for maximizing log-likelihood functions. It requires the gradient/log-likelihood to be supplied by individual observations. Note that computing the (real, not BHHH) final Hessian does not carry any extra penalty for the NR method, but for the other methods.

bhhhHessian

logical. Indicating whether the approximation for the Hessian suggested by Bernd, Hall, Hall, and Hausman (1974) should be used.

fixed

parameters that should be fixed at their starting values: either a logical vector of the same length as argument start, a numeric (index) vector indicating the positions of the fixed parameters, or a vector of character strings indicating the names of the fixed parameters (parameter names are taken from argument start).

activePar

this argument is retained for backward compatibility only; please use argument fixed instead.

...

further arguments to fn, grad and hess. Further arguments to maxBHHH are also passed to maxNR.

Details

The idea of the Newton method is to approximate the function in a given location with a multidimensional parabola, and use the estimated maximum as the initial value for the next iteration. Such an approximation requires knowledge of both gradient and Hessian, the latter of which can be quite costly to compute. Several methods for approximating Hessian exist, including BFGS and BHHH.

The BHHH method (maxNR with argument bhhhHessian = TRUE ) or maxBHHH) is suitable only for maximizing log-likelihood functions. It uses information equality in order to approximate the Hessian of the log-likelihood function. Hence, the log-likelihood values and its gradients must e calculated by individual observations. The Hessian is approximated as the negative of the sum of the outer products of the gradients of individual observations, or, in the matrix form, -t(gradient) %*% gradient = - crossprod( gradient ).

The functions maxNR, maxBFGSR, and maxBHHH can work with constant parameters and related changes of parameter values. Constant parameters are useful if a parameter value is converging toward the boundary of support, or for testing. One way is to put fixed to non-NULL, specifying which parameters should be treated as constants.

However, when using maxNR or maxBHHH, parameters can also be fixed in runtime by signaling with fn. This may be useful if an estimation converges toward the edge of the parameter space possibly causing problems. The value of fn may have following attributes (only used by maxNR):

Hence, constVal specifies which parameters are treated as constants. newVal allows one to overwrite the existing parameter values, possibly the non-constant values as well. If the attribute newVal is present, the new function value need not to exceed the previous one (maximization is not performed in that step).

Value

list of class "maxim" with following components:

maximum

fn value at maximum (the last calculated value if not converged).

estimate

estimated parameter value.

gradient

vector, last gradient value which was calculated. Should be close to 0 if normal convergence.

gradientObs

matrix of gradients at parameter value estimate evaluated at each observation (only if grad returns a matrix or grad is not specified and fn returns a vector).

hessian

Hessian at the maximum (the last calculated value if not converged).

code

return code:

  • 1 gradient close to zero (normal convergence).

  • 2 successive function values within tolerance limit (normal convergence).

  • 3 last step could not find higher value (probably not converged). This is related to line search step getting too small, usually because hitting the boundary of the parameter space. It may also be related to attempts to move to a wrong direction because of numerical errors. In some cases it can be helped by changing steptol.

  • 4 iteration limit exceeded.

  • 5 Infinite value.

  • 6 Infinite gradient.

  • 7 Infinite Hessian.

  • 8 Successive function values withing relative tolerance limit (normal convergence).

  • 9 (BFGS) Hessian approximation cannot be improved because of gradient did not change. May be related to numerical approximation problems or wrong analytic gradient.

  • 100 Initial value out of range.

message

a short message, describing code.

last.step

list describing the last unsuccessful step if code=3 with following components:

  • theta0 previous parameter value

  • f0 fn value at theta0

  • climb the movement vector to the maximum of the quadratic approximation

fixed

logical vector, which parameters are constants.

iterations

number of iterations.

type

character string, type of maximization.

constraints

A list, describing the constrained optimization (NULL if unconstrained). Includes the following components:

  • type type of constrained optimization

  • outer.iterations number of iterations in the constraints step

  • barrier.value value of the barrier function

Warning

No attempt is made to ensure that user-provided analytic gradient/Hessian correct. However, the users are recommended to use compareDerivatives function, designed for this purpose. If analytic gradient/Hessian are wrong, the algorithm may not converge, or converge to a wrong point.

As the BHHH method (maxNR with argument bhhhHessian = TRUE or maxBHHH) uses the likelihood-specific information equality, it is only suitable for maximizing log-likelihood functions!

Quasi-Newton methods, including those mentioned above, do not work well in non-concave regions. This is especially the case with the implementation in maxBFGSR. The user is advised to experiment with various tolerance options to achieve convergence.

Author(s)

Ott Toomet otoomet@ut.ee, Arne Henningsen, function maxBFGSR was originally developed by Yves Croissant (and placed in 'mlogit' package)

References

Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Social Measurement 3, p. 653-665.

Broyden, C.G. (1970): The Convergence of a Class of Double-rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications 6, p. 76-90.

Fletcher, R. (1970): A New Approach to Variable Metric Algorithms, Computer Journal 13, p. 317-322.

Goldfeld, S.M. and Quandt, R.E. (1972): Nonlinear Methods in Econometrics. Amsterdam: North-Holland.

Goldfarb, D. (1970): A Family of Variable Metric Updates Derived by Variational Means, Mathematics of Computation 24, p. 23-26.

Greene, W.H., 2008, Econometric Analysis, 6th edition, Prentice Hall.

Shanno, D.F. (1970): Conditioning of Quasi-Newton Methods for Function Minimization, Mathematics of Computation 24, p. 647-656.

See Also

maxLik for a general framework for maximum likelihood estimation (MLE), maxBHHH for maximizations using the Berndt, Hall, Hall, Hausman (1974) algorithm (which is a wrapper function to maxNR), maxBFGS for maximization using the BFGS, Nelder-Mead (NM), and Simulated Annealing (SANN) method (based optim), nlm for Newton-Raphson optimization, and optim for different gradient-based optimization methods.

Examples

## ML estimation of exponential duration model:
t <- rexp(100, 2)
loglik <- function(theta) sum(log(theta) - theta*t)
## Note the log-likelihood and gradient are summed over observations
gradlik <- function(theta) sum(1/theta - t)
hesslik <- function(theta) -100/theta^2
## Estimate with finite-difference gradient and Hessian
a <- maxNR(loglik, start=1, print.level=2)
summary(a)
## You would probably prefer 1/mean(t) instead ;-)
## Estimate with analytic gradient and Hessian
a <- maxNR(loglik, gradlik, hesslik, start=1)
summary(a)

## BFGS estimation with finite-difference gradient
a <- maxBFGSR( loglik, start=1 )
summary(a)
## BFGS estimation with analytic gradient
a <- maxBFGSR( loglik, gradlik, start=1 )
summary(a)

## For the BHHH method we need likelihood values and gradients
## of individual observations
loglikInd <- function(theta) log(theta) - theta*t
gradlikInd <- function(theta) 1/theta - t
## Estimate with finite-difference gradient
a <- maxBHHH(loglikInd, start=1, print.level=2)
summary(a)
## Estimate with analytic gradient
a <- maxBHHH(loglikInd, gradlikInd, start=1)
summary(a)

##
## Next, we give an example with vector argument:  Estimate the mean and
## variance of a random normal sample by maximum likelihood
## Note: you might want to use maxLik instead
##
loglik <- function(param) {
  mu <- param[1]
  sigma <- param[2]
  ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2)
  ll
}
x <- rnorm(1000, 1, 2) # use mean=1, stdd=2
N <- length(x)
res <- maxNR(loglik, start=c(0,1)) # use 'wrong' start values
summary(res)
###
### Now an example of constrained optimization
###
## We maximize exp(-x^2 - y^2) where x+y = 1
f <- function(theta) {
  x <- theta[1]
  y <- theta[2]
  exp(-(x^2 + y^2))
  ## Note: you may want to use exp(- theta %*% theta) instead ;-)
}
## use constraints: x + y = 1
A <- matrix(c(1, 1), 1, 2)
B <- -1
res <- maxNR(f, start=c(0,0), constraints=list(eqA=A, eqB=B), print.level=1)
print(summary(res))

[Package maxLik version 1.1-0 Index]