Levenberg–Marquardt algorithm


The Levenberg-Marquardt algorithm provides a numerical solution to the mathematical problem of minimizing a sum of squares of several generally nonlinear functions that depend on a common set of parameters.

This minimization problem arises especially in least squares curve fitting (see also: nonlinear programming).

The Levenberg-Marquardt algorithm (LMA) interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far from the final minimum. On the other hand, for well-behaved functions and reasonable starting parameters, the LMA tends to be somewhat slower than the GNA. The LMA is the most popular curve-fitting algorithm; it is used in almost any software that provides a generic curve-fitting tool, and few users will ever need another curve-fitting algorithm.

The problem

Given m functions f1, ..., fm of n parameters p1, ..., pn, with m ≥ n, it is convenient to use vector notation for both the functions and the parameters,

fT=(f1, ..., fm), and pT=(p1, ..., pn).

The least squares problem consists in finding the parameter vector p for which the cost function

S(p) = fTf = Σi=1..m [fi(p)]2

becomes minimal.

The main application is in the least squares curve fitting problem: given a set of empirical data pairs (ti,yi), optimize the parameters p of the model curve c(t|p) so that the sum of the squares of the deviations

fi(p)=yi - c(ti|p)

becomes minimal.

(A word on notation: we avoid the letter x because it is used sometimes in the place of our p, sometimes in the place of our t).
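
For concreteness, the quantities above can be sketched in Python with NumPy for a hypothetical model curve c(t|p) = p1 exp(-p2 t) fitted to synthetic data; the model, the data, and the evaluation point are illustrative assumptions, not part of the algorithm itself.

import numpy as np

# Hypothetical model curve c(t|p) = p[0] * exp(-p[1] * t); any differentiable model would do.
def model(t, p):
    return p[0] * np.exp(-p[1] * t)

# Residuals f_i(p) = y_i - c(t_i | p) for the data pairs (t_i, y_i).
def residuals(p, t, y):
    return y - model(t, p)

# Cost function S(p) = f^T f, the sum of squared residuals.
def cost(p, t, y):
    f = residuals(p, t, y)
    return f @ f

# Synthetic data: samples of the model at p = (2, 0.5) plus a little noise.
t = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = model(t, np.array([2.0, 0.5])) + 0.01 * rng.standard_normal(t.size)

print(cost(np.array([1.0, 1.0]), t, y))  # S(p) at the uninformed guess p = (1, 1)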

The solution

Like other numeric minimization algorithms, the Levenberg-Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector p. In many cases, an uninformed standard guess like pT=(1,1,...,1) will work fine; in other cases, the algorithm converges only if the initial guess is already somewhat close to the final solution.

In each iteration step, the parameter vector p is replaced by a new estimate p+q. To determine q, the functions fi(p+q) are approximated by their linearizations

f(p+q) ≈ f(p) + Jq

where J is the Jacobian of f at p.
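
If analytic derivatives of the fi are not available, the Jacobian can be approximated numerically. The following forward-difference sketch is one common way to do this; the function name, the step size eps, and the convention that residuals(p) returns the residual vector (for example, a closure over the data from the previous sketch) are assumptions for illustration.

import numpy as np

def finite_difference_jacobian(residuals, p, eps=1e-8):
    # Forward-difference approximation of J[i, j] = d f_i / d p_j at p.
    # `residuals` maps a parameter vector to the residual vector f(p).
    p = np.asarray(p, dtype=float)
    f0 = residuals(p)
    J = np.empty((f0.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (residuals(p + dp) - f0) / eps
    return J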

At a minimum of the sum of squares S, we have ∇qS = 0. Substituting the linearization above gives S(p+q) ≈ (f(p) + Jq)T(f(p) + Jq); setting the derivative of this expression with respect to q to zero leads to the following equation

(JTJ)q = -JTf

from which q can be obtained by inverting JTJ. The key to the LMA is to replace this equation by a 'damped version'

(JTJ + λI)q = -JTf,

where I is the identity matrix.

The (non-negative) damping factor λ is adjusted at each iteration. If the reduction of S is rapid, a smaller value can be used, bringing the algorithm closer to the GNA, whereas if an iteration gives insufficient reduction in the residual, λ can be increased, giving a step closer to the gradient-descent direction. A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems.

If either the length of the calculated step q or the reduction of the sum of squares from the latest parameter vector p falls below predefined limits, the iteration is stopped, and the last parameter vector p is considered to be the solution.
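
The whole iteration can be summarized in a short sketch. The particular λ schedule below (divide by ν after an accepted step, multiply by ν after a rejected one), the stopping thresholds, and the helpers residuals(p) and jac(p) (which could be closures built from the earlier sketches) are illustrative choices, not the only possible ones.

import numpy as np

def levenberg_marquardt(residuals, jac, p0, lam=1e-3, nu=10.0, tol=1e-10, max_iter=100):
    # Minimal LM sketch: damped Gauss-Newton steps with a simple lambda update.
    p = np.asarray(p0, dtype=float)
    f = residuals(p)
    S = f @ f
    for _ in range(max_iter):
        J = jac(p)
        g = J.T @ f
        A = J.T @ J
        # Solve the damped normal equations (J^T J + lambda*I) q = -J^T f.
        q = np.linalg.solve(A + lam * np.eye(A.shape[0]), -g)
        f_new = residuals(p + q)
        S_new = f_new @ f_new
        if S_new < S:                     # the step reduced the sum of squares: accept it
            p, f = p + q, f_new
            if S - S_new < tol or np.linalg.norm(q) < tol:
                break                     # step length or reduction below the limits: stop
            S = S_new
            lam /= nu                     # damp less, moving closer to Gauss-Newton
        else:
            lam *= nu                     # damp more, moving closer to gradient descent, and retry
    return p

Many implementations replace the identity matrix in the damped equation by diag(JTJ), as Marquardt himself proposed, which makes the damping less sensitive to the scaling of the parameters.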


Choice of damping parameter

Various more or less heuristic arguments have been put forward for the best choice of the damping parameter λ. Theoretical arguments exist showing why some of these choices guarantee local convergence of the algorithm; however, such choices can make the global convergence of the algorithm suffer from the undesirable properties of steepest descent, in particular very slow convergence close to the optimum.

Obviously, a suitable absolute value for λ depends on how well-scaled the problem is. Marquardt recommended starting with a value λ0 and a factor ν > 1. Initially set λ = λ0 and compute the residual sum of squares after one step from the starting point, first with the damping factor λ = λ0 and secondly with λ0/ν. If both of these are worse than the initial point, the damping is increased by successive multiplication by ν until a better point is found, with a new damping factor of λ0 νk for some k.

If using the damping factor λ/ν results in a reduction of the squared residual, then λ/ν is taken as the new value of λ (and the new optimum ___location is taken as that obtained with this damping factor) and the process continues; if using λ/ν resulted in a worse residual but using λ resulted in a better residual, then λ is left unchanged and the new optimum is taken as the point obtained with λ as the damping factor.
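
A sketch of this update rule, under the same assumptions as the earlier sketches (residuals(p) returns the residual vector, jac(p) the Jacobian), might look as follows; the default ν and the cap on the number of increases are illustrative, not prescribed by Marquardt.

import numpy as np

def marquardt_update(residuals, jac, p, lam, nu=10.0, max_increase=20):
    # One parameter update following the lambda schedule described above.
    # Returns the new parameter vector and the damping factor to reuse next time.
    f = residuals(p)
    S = f @ f
    J = jac(p)
    A, g = J.T @ J, J.T @ f
    I = np.eye(A.shape[0])

    def try_step(damping):
        q = np.linalg.solve(A + damping * I, -g)
        f_new = residuals(p + q)
        return p + q, f_new @ f_new

    p_small, S_small = try_step(lam / nu)
    if S_small < S:                      # lambda/nu already improves S: shrink the damping
        return p_small, lam / nu
    p_same, S_same = try_step(lam)
    if S_same < S:                       # lambda improves S but lambda/nu does not: keep lambda
        return p_same, lam
    for _ in range(max_increase):        # otherwise grow lambda by nu until a better point appears
        lam *= nu
        p_big, S_big = try_step(lam)
        if S_big < S:
            return p_big, lam
    return p, lam                        # no improvement found: keep the current point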

References

The algorithm was first published by Kenneth Levenberg, while working at the Frankford Army Arsenal. It was rediscovered by Donald Marquardt who worked as a statistician at DuPont.

  • Levenberg, K. "A Method for the Solution of Certain Problems in Least Squares." Quart. Appl. Math. 2, 164-168, 1944.
  • Marquardt, D. "An Algorithm for Least-Squares Estimation of Nonlinear Parameters." SIAM J. Appl. Math. 11, 431-441, 1963.
  • Gill, P. E. and Murray, W. "Algorithms for the solution of the nonlinear least-squares problem", SIAM J. Numer. Anal. 15 [5] 977-992, 1978.

Available implementations: