The Levenberg-Marquardt algorithm provides a numerical solution to the mathematical problem of minimizing a sum of squares of several generally nonlinear functions that depend on a common set of parameters.
This minimization problem arises especially in least squares curve fitting (see also: nonlinear programming).
The Levenberg-Marquardt algorithm (LMA) interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far from the final minimum. On the other hand, for well-behaved functions and reasonable starting parameters, the LMA tends to be a bit slower than the GNA. The LMA is the most popular curve-fitting algorithm; it is used in almost every software package that provides a generic curve-fitting tool, and few users will ever need another curve-fitting algorithm.
The problem
Given are $m$ functions $f_1, \ldots, f_m$ of $n$ parameters $p_1, \ldots, p_n$, with $m \ge n$. It is convenient to use vector notation for both the functions and the parameters,
- $f^T = (f_1, \ldots, f_m)$ and $p^T = (p_1, \ldots, p_n)$.
The least squares problem consists in finding the parameter vector p for which the cost function
- $S(p) = f^T f = \sum_{i=1}^{m} [f_i(p)]^2$
becomes minimal.
The main application is in the least squares curve fitting problem: given a set of empirical data pairs $(t_i, y_i)$, optimize the parameters $p$ of the model curve $c(t \mid p)$ so that the sum of the squares of the deviations
- $f_i(p) = y_i - c(t_i \mid p)$
becomes minimal.
(A word on notation: we avoid the letter $x$ because it is sometimes used in place of our $p$ and sometimes in place of our $t$.)
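To make the setup concrete, here is a minimal sketch in Python (using NumPy). The two-parameter exponential model $c(t \mid p) = p_1 e^{-p_2 t}$ and all function names are illustrative assumptions, not part of the algorithm itself:

    import numpy as np

    # Hypothetical model curve c(t|p) = p1 * exp(-p2 * t);
    # any smooth model curve could take its place.
    def model(t, p):
        return p[0] * np.exp(-p[1] * t)

    # Residual vector f(p): one component y_i - c(t_i|p) per data pair.
    def residuals(p, t, y):
        return y - model(t, p)

    # Cost function S(p) = f^T f, the sum of squared deviations.
    def cost(p, t, y):
        f = residuals(p, t, y)
        return f @ f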
The solution
Like other numerical minimization algorithms, the Levenberg-Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector $p$. In many cases, an uninformed standard guess like $p^T = (1, 1, \ldots, 1)$ will work fine; in other cases, the algorithm converges only if the initial guess is already somewhat close to the final solution.
In each iteration step, the parameter vector $p$ is replaced by a new estimate $p + q$. To determine $q$, the functions $f_i(p+q)$ are approximated by their linearizations
- $f(p+q) \approx f(p) + Jq$,
where $J$ is the Jacobian matrix of $f$ at $p$.
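When analytic derivatives of the $f_i$ are not available, $J$ can be approximated numerically. A minimal sketch using forward differences (the step size h is an illustrative choice, and the residuals function is the one sketched above):

    import numpy as np

    def jacobian(residuals, p, t, y, h=1e-7):
        # Forward-difference approximation of the m-by-n Jacobian,
        # J[i, j] = d f_i / d p_j, evaluated at the current parameters p.
        f0 = residuals(p, t, y)
        J = np.empty((f0.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = h
            J[:, j] = (residuals(p + dp, t, y) - f0) / h
        return J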
At a minimum of the sum of squares $S$, the gradient of $S$ with respect to $q$ vanishes, $\nabla_q S = 0$. With the above linearization, $S(p+q) \approx (f + Jq)^T (f + Jq)$; setting the derivative with respect to $q$ to zero leads to the following equation
- $(J^T J)\, q = -J^T f$
from which $q$ can be obtained by inverting $J^T J$. The key to the LMA is to replace this equation by a 'damped version'
- $(J^T J + \lambda I)\, q = -J^T f$,
where $I$ is the identity matrix.
The (non-negative) damping factor $\lambda$ is adjusted at each iteration. If the reduction of $S$ is rapid, a smaller value can be used, bringing the algorithm closer to the GNA; if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient-descent direction. A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems.
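In code, one damped step is just the solution of this linear system. A minimal sketch (solving the system directly is preferable to forming an explicit inverse):

    import numpy as np

    def damped_step(J, f, lam):
        # Solve (J^T J + lambda * I) q = -J^T f for the update q.
        n = J.shape[1]
        A = J.T @ J + lam * np.eye(n)
        return np.linalg.solve(A, -(J.T @ f))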
If either the length of the computed step $q$ or the reduction of the sum of squares achieved by the latest parameter vector $p + q$ falls below predefined limits, the iteration is aborted, and the last parameter vector $p$ is considered to be the solution.
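Putting the pieces together gives the skeleton of a complete iteration. This is a sketch built from the helper functions above; the tolerances and the multiply-or-divide-by-ten rule for $\lambda$ are common conventions, not prescribed by the original papers:

    import numpy as np

    def levenberg_marquardt(residuals, jacobian, p, t, y,
                            lam=1e-3, tol=1e-8, max_iter=100):
        f = residuals(p, t, y)
        S = f @ f
        for _ in range(max_iter):
            J = jacobian(residuals, p, t, y)
            q = damped_step(J, f, lam)
            f_new = residuals(p + q, t, y)
            S_new = f_new @ f_new
            if S_new < S:
                # Rapid reduction: accept the step and move closer to the GNA.
                reduction = S - S_new
                p, f, S = p + q, f_new, S_new
                lam /= 10.0
                # Abort when step length or reduction falls below preset limits.
                if np.linalg.norm(q) < tol or reduction < tol:
                    break
            else:
                # Insufficient reduction: keep p, increase the damping factor,
                # stepping closer to the gradient-descent direction.
                lam *= 10.0
        return p

With the exponential model sketched earlier, a call such as levenberg_marquardt(residuals, jacobian, np.ones(2), t, y) starts from the uninformed guess $p^T = (1, 1)$ mentioned above.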
External links
Public ___domain implementations:
- MINPACK has a FORTRAN implementation lmdif.
- A C programming language implementation can be found at sourceforge::lmfit.
- gnuplot uses its own implementation; see gnuplot.info.
References
The algorithm was first published by Kenneth Levenberg while he was working at the Frankford Army Arsenal. It was rediscovered by Donald Marquardt, who worked as a statistician at DuPont.
- Levenberg, K. "A Method for the Solution of Certain Problems in Least Squares." Quart. Appl. Math. 2, 164-168, 1944.
- Marquardt, D. "An Algorithm for Least-Squares Estimation of Nonlinear Parameters." SIAM J. Appl. Math. 11, 431-441, 1963.