Levenberg–Marquardt algorithm


The Levenberg-Marquardt algorithm provides a numerical solution to the mathematical problem of minimizing a sum of squares of several generally nonlinear functions that depend on a common set of parameters.

This minimization problem arises especially in least squares curve fitting (see also: nonlinear programming).

The Levenberg-Marquardt algorithm (LMA) interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far from the final minimum. On the other hand, for well-behaved functions and reasonable starting parameters, the LMA tends to be somewhat slower than the GNA. The LMA is the most popular curve-fitting algorithm; it is used in almost any software that provides a generic curve-fitting tool, and few users will ever need another curve-fitting algorithm.

The problem

Given m functions f1, ..., fm of n parameters p1, ..., pn, with m ≥ n, it is convenient to use vector notation for both the functions and the parameters,

fT=(f1, ..., fm), and pT=(p1, ..., pn).

The least squares problem consists in finding the parameter vector p for which the cost function

S(p) = fTf = Σi=1..m [fi(p)]2

becomes minimal.

The main application is in the least squares curve fitting problem: given a set of empirical data pairs (ti,yi), optimize the parameters p of the model curve c(t|p) so that the sum of the squares of the deviations

fi(p)=yi - c(ti|p)

becomes minimal.

(A word on notation: we avoid the letter x because it is used sometimes in the place of our p, sometimes in the place of our t).
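
For concreteness, the quantities above can be sketched in Python with NumPy for a hypothetical model curve c(t|p) = p1 exp(-p2 t) fitted to synthetic data; the model, the data, and the evaluation point are illustrative assumptions, not part of the algorithm itself.

import numpy as np

# Hypothetical model curve c(t|p) = p[0] * exp(-p[1] * t); any differentiable model would do.
def model(t, p):
    return p[0] * np.exp(-p[1] * t)

# Residuals f_i(p) = y_i - c(t_i | p) for the data pairs (t_i, y_i).
def residuals(p, t, y):
    return y - model(t, p)

# Cost function S(p) = f^T f, the sum of squared residuals.
def cost(p, t, y):
    f = residuals(p, t, y)
    return f @ f

# Synthetic data: samples of the model at p = (2, 0.5) plus a little noise.
t = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = model(t, np.array([2.0, 0.5])) + 0.01 * rng.standard_normal(t.size)

print(cost(np.array([1.0, 1.0]), t, y))  # S(p) at the uninformed guess p = (1, 1)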

The solution

Like other numeric minimization algorithms, the Levenberg-Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector p. In many cases, an uninformed standard guess like pT=(1,1,...,1) will work fine; in other cases, the algorithm converges only if the initial guess is already somewhat close to the final solution.

In each iteration step, the parameter vector p is replaced by a new estimate p+q. To determine q, the functions fi(p+q) are approximated by their linearizations

f(p+q) ≈ f(p) + Jq

where J is the Jacobian of f at p.
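
If analytic derivatives of the fi are not available, the Jacobian can be approximated numerically. The following forward-difference sketch is one common way to do this; the function name, the step size eps, and the convention that residuals(p) returns the residual vector (for example, a closure over the data from the previous sketch) are assumptions for illustration.

import numpy as np

def finite_difference_jacobian(residuals, p, eps=1e-8):
    # Forward-difference approximation of J[i, j] = d f_i / d p_j at p.
    # `residuals` maps a parameter vector to the residual vector f(p).
    p = np.asarray(p, dtype=float)
    f0 = residuals(p)
    J = np.empty((f0.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (residuals(p + dp) - f0) / eps
    return J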

At a minimum of the sum of squares S, we have ∇qS = 0. Substituting the linearization above gives S(p+q) ≈ (f(p) + Jq)T(f(p) + Jq); setting the derivative of this expression with respect to q to zero leads to the following equation

(JTJ)q = -JTf

from which q can be obtained by inverting JTJ. The key to the LMA is to replace this equation by a 'damped version'

(JTJ + λI)q = -JTf,

where I is the identity matrix.

The (non-negative) damping factor λ is adjusted at each iteration. If the reduction of S is rapid, a smaller value can be used, bringing the algorithm closer to the GNA, whereas if an iteration gives insufficient reduction in the residual, λ can be increased, giving a step closer to the gradient-descent direction. A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems.

If either the length of the calculated step q or the reduction of the sum of squares from the latest parameter vector p falls below predefined limits, the iteration is stopped, and the last parameter vector p is considered to be the solution.
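
The whole iteration can be summarized in a short sketch. The particular λ schedule below (divide by ν after an accepted step, multiply by ν after a rejected one), the stopping thresholds, and the helpers residuals(p) and jac(p) (which could be closures built from the earlier sketches) are illustrative choices, not the only possible ones.

import numpy as np

def levenberg_marquardt(residuals, jac, p0, lam=1e-3, nu=10.0, tol=1e-10, max_iter=100):
    # Minimal LM sketch: damped Gauss-Newton steps with a simple lambda update.
    p = np.asarray(p0, dtype=float)
    f = residuals(p)
    S = f @ f
    for _ in range(max_iter):
        J = jac(p)
        g = J.T @ f
        A = J.T @ J
        # Solve the damped normal equations (J^T J + lambda*I) q = -J^T f.
        q = np.linalg.solve(A + lam * np.eye(A.shape[0]), -g)
        f_new = residuals(p + q)
        S_new = f_new @ f_new
        if S_new < S:                     # the step reduced the sum of squares: accept it
            p, f = p + q, f_new
            if S - S_new < tol or np.linalg.norm(q) < tol:
                break                     # step length or reduction below the limits: stop
            S = S_new
            lam /= nu                     # damp less, moving closer to Gauss-Newton
        else:
            lam *= nu                     # damp more, moving closer to gradient descent, and retry
    return p

Many implementations replace the identity matrix in the damped equation by diag(JTJ), as Marquardt himself proposed, which makes the damping less sensitive to the scaling of the parameters.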


Choice of damping parameter

Various more or less heuristic arguments have been put forward for the best choice of the damping parameter λ. Theoretical arguments exist showing why some of these choices guarantee local convergence of the algorithm; however, such choices can make the global convergence of the algorithm suffer from the undesirable properties of steepest descent, in particular very slow convergence close to the optimum.

Obviously, a suitable absolute value for λ depends on how well-scaled the problem is. Marquardt recommended starting with a value λ0 and a factor ν > 1. Initially set λ = λ0 and compute the residual sum of squares after one step from the starting point, first with the damping factor λ = λ0 and secondly with λ0/ν. If both of these are worse than the initial point, the damping is increased by successive multiplication by ν until a better point is found, with a new damping factor of λ0 νk for some k.

If using the damping factor λ/ν results in a reduction of the squared residual, then λ/ν is taken as the new value of λ (and the new optimum ___location is taken as that obtained with this damping factor) and the process continues; if using λ/ν resulted in a worse residual but using λ resulted in a better residual, then λ is left unchanged and the new optimum is taken as the point obtained with λ as the damping factor.
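
A sketch of this update rule, under the same assumptions as the earlier sketches (residuals(p) returns the residual vector, jac(p) the Jacobian), might look as follows; the default ν and the cap on the number of increases are illustrative, not prescribed by Marquardt.

import numpy as np

def marquardt_update(residuals, jac, p, lam, nu=10.0, max_increase=20):
    # One parameter update following the lambda schedule described above.
    # Returns the new parameter vector and the damping factor to reuse next time.
    f = residuals(p)
    S = f @ f
    J = jac(p)
    A, g = J.T @ J, J.T @ f
    I = np.eye(A.shape[0])

    def try_step(damping):
        q = np.linalg.solve(A + damping * I, -g)
        f_new = residuals(p + q)
        return p + q, f_new @ f_new

    p_small, S_small = try_step(lam / nu)
    if S_small < S:                      # lambda/nu already improves S: shrink the damping
        return p_small, lam / nu
    p_same, S_same = try_step(lam)
    if S_same < S:                       # lambda improves S but lambda/nu does not: keep lambda
        return p_same, lam
    for _ in range(max_increase):        # otherwise grow lambda by nu until a better point appears
        lam *= nu
        p_big, S_big = try_step(lam)
        if S_big < S:
            return p_big, lam
    return p, lam                        # no improvement found: keep the current point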

References

The algorithm was first published by Kenneth Levenberg, while working at the Frankford Army Arsenal. It was rediscovered by Donald Marquardt who worked as a statistician at DuPont.

  • Levenberg, K. "A Method for the Solution of Certain Problems in Least Squares." Quart. Appl. Math. 2, 164-168, 1944.
  • Marquardt, D. "An Algorithm for Least-Squares Estimation of Nonlinear Parameters." SIAM J. Appl. Math. 11, 431-441, 1963.
  • Gill, P. E. and Murray, W. "Algorithms for the solution of the nonlinear least-squares problem", SIAM J. Numer. Anal. 15 [5] 977-992, 1978.

Available implementations: