The Levenberg-Marquardt algorithm provides a numerical solution to the mathematical problem of minimizing a sum of squares of several generally nonlinear functions that depend on a common set of parameters.
This minimization problem arises especially in least squares curve fitting (see also: nonlinear programming).
The Levenberg-Marquardt algorithm (LMA) interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far from the final minimum. On the other hand, for well-behaved functions and reasonable starting parameters, the LMA tends to be a bit slower than the GNA. The LMA is the most popular curve-fitting algorithm; it is used in almost all software that provides a generic curve-fitting tool, and few users will ever need another curve-fitting algorithm.
The problem
Given m functions f1, ..., fm of n parameters p1, ..., pn, with m ≥ n, it is convenient to use vector notation for both the functions and the parameters:
- fT=(f1, ..., fm), and pT=(p1, ..., pn).
The least squares problem consists in finding the parameter vector p for which the cost function
- S(p) = fTf = Σi=1..m [fi(p)]2
becomes minimal.
The main application is in the least squares curve fitting problem: given a set of empirical data pairs (ti,yi), optimize the parameters p of the model curve c(t|p) so that the sum of the squares of the deviations
- fi(p)=yi - c(ti|p)
becomes minimal.
(A word on notation: we avoid the letter x because it is sometimes used in place of our p and sometimes in place of our t.)
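To make the setup concrete, the following sketch (in Python with NumPy) assembles the residual vector f and the cost S(p) for a hypothetical two-parameter exponential model c(t|p) = p1 exp(-p2 t); the model, the function names, and the data arrays are illustrative assumptions, not part of the algorithm itself.

```python
import numpy as np

# Hypothetical model curve c(t|p) = p[0] * exp(-p[1] * t); any other
# parametric model would serve equally well.
def model(t, p):
    return p[0] * np.exp(-p[1] * t)

def residuals(p, t, y):
    # f_i(p) = y_i - c(t_i|p), one entry per data pair (t_i, y_i)
    return y - model(t, p)

def cost(p, t, y):
    # S(p) = f^T f, the sum of squared deviations
    f = residuals(p, t, y)
    return f @ f
```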
The solution
Like other numeric minimization algorithms, the Levenberg-Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector p. In many cases, an uninformed standard guess like pT=(1,1,...,1) will work fine; in other cases, the algorithm converges only if the initial guess is already somewhat close to the final solution.
In each iteration step, the parameter vector p is replaced by a new estimate p+q. To determine q, the functions fi(p+q) are approximated by their linearizations
- f(p+q) ≈ f(p) + Jq
where J is the Jacobian of f at p.
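If analytic derivatives of the fi are not available, J is commonly approximated by finite differences. The sketch below uses a simple forward-difference scheme; the function name and the fixed step size are illustrative choices, not a recommendation for production use.

```python
import numpy as np

def numerical_jacobian(f, p, eps=1e-8):
    # Forward-difference approximation of the m-by-n Jacobian of f at p.
    p = np.asarray(p, dtype=float)
    f0 = f(p)
    J = np.empty((f0.size, p.size))
    for j in range(p.size):
        dp = np.zeros(p.size)
        dp[j] = eps
        # Column j holds the partial derivatives with respect to p_j.
        J[:, j] = (f(p + dp) - f0) / eps
    return J
```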
At a minimum of the sum of squares S, the gradient with respect to q vanishes, ∇qS = 0. Substituting the linearization into S(p+q) ≈ (f + Jq)T(f + Jq), differentiating with respect to q, and setting the result to zero leads to the following equation
- (JTJ)q = -JTf
from which q can be obtained by inverting JTJ. The key to the LMA is to replace this equation by a 'damped version'
- (JTJ + λI)q = -JTf, where I is the identity matrix.
The (non-negative) damping factor λ is adjusted at each iteration. If the reduction of S is rapid, a smaller value can be used, bringing the algorithm closer to the GNA, whereas if an iteration gives insufficient reduction in the residual, λ can be increased, giving a step closer to the gradient-descent direction. A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems.
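As a sketch of how a single damped step might be computed (the function name is illustrative, and a careful implementation would factorize the damped matrix rather than rely on a generic dense solve):

```python
import numpy as np

def lm_step(J, f, lam):
    # Solve the damped normal equations (J^T J + lam * I) q = -J^T f for q.
    # Note: Marquardt's own variant scales the damping by diag(J^T J)
    # instead of the identity, which makes the step scale-invariant.
    n = J.shape[1]
    A = J.T @ J + lam * np.eye(n)
    g = -J.T @ f
    return np.linalg.solve(A, g)
```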
If either the length of the computed step q or the reduction of the sum of squares at the latest parameter vector p falls below predefined limits, the iteration is stopped and the last parameter vector p is considered to be the solution.
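The following loop pulls these pieces together; the stopping thresholds, the update of λ by a fixed factor ν, and all names are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

def levenberg_marquardt(residuals, jacobian, p0, lam=1e-3, nu=10.0,
                        tol_step=1e-10, tol_reduction=1e-10, max_iter=100):
    # Illustrative LM iteration: residuals(p) returns f, jacobian(p) returns J.
    p = np.asarray(p0, dtype=float)
    f = residuals(p)
    S = f @ f
    for _ in range(max_iter):
        J = jacobian(p)
        # Solve the damped normal equations for the step q.
        q = np.linalg.solve(J.T @ J + lam * np.eye(p.size), -J.T @ f)
        f_new = residuals(p + q)
        S_new = f_new @ f_new
        if S_new < S:
            # Successful step: accept it and relax the damping.
            reduction = S - S_new
            p, f, S = p + q, f_new, S_new
            lam /= nu
            if np.linalg.norm(q) < tol_step or reduction < tol_reduction:
                break
        else:
            # Unsuccessful step: increase the damping and try again.
            lam *= nu
    return p
```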
Choice of damping parameter
Various more or less heuristic arguments have been put forward for the best choice of the damping parameter λ. Theoretical arguments exist showing why some of these choices guarantee local convergence of the algorithm; however, these choices can make the global convergence of the algorithm suffer from the undesirable properties of steepest descent, in particular very slow convergence close to the optimum.
Obviously, an absolute value for any such choice depends on how well-scaled the initial problem is. Marquardt recommended starting with a value λ0 and a factor ν > 1. Initially set λ = λ0 and compute the residual sum of squares after one step from the starting point, once with the damping factor λ = λ0 and once with λ0/ν. If both of these are worse than the initial point, the damping is increased by successive multiplication by ν until a better point is found, with a new damping factor of λ0νk for some k.
If using the damping factor λ/ν results in a reduction of the squared residual, then λ/ν is taken as the new value of λ (and the new optimum ___location is taken as that obtained with this damping factor) and the process continues; if using λ/ν results in a worse residual but using λ results in a better one, then λ is left unchanged and the new optimum is taken as the point obtained with λ as the damping factor.
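A minimal sketch of this decision rule (the function signature and return convention are illustrative; repeating the multiplication by ν until an improvement is found is left to the calling loop):

```python
def marquardt_damping_update(S, S_lam, S_lam_nu, lam, nu):
    # S        : sum of squares at the current point p
    # S_lam    : sum of squares after a step computed with damping lam
    # S_lam_nu : sum of squares after a step computed with damping lam / nu
    if S_lam_nu < S:
        # The less damped (more Gauss-Newton-like) step improves S:
        # reduce the damping and accept that step.
        return lam / nu, "accept_step_with_lam_over_nu"
    if S_lam < S:
        # Only the more damped step improves S: keep lam unchanged.
        return lam, "accept_step_with_lam"
    # Neither step improves S: the caller should keep multiplying lam by nu
    # until a better point is found.
    return lam * nu, "increase_damping_and_retry"
```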
References
The algorithm was first published by Kenneth Levenberg, while working at the Frankford Army Arsenal. It was rediscovered by Donald Marquardt who worked as a statistician at DuPont.
- Levenberg, K. "A Method for the Solution of Certain Problems in Least Squares." Quart. Appl. Math. 2, 164-168, 1944.
- Marquardt, D. "An Algorithm for Least-Squares Estimation of Nonlinear Parameters." SIAM J. Appl. Math. 11, 431-441, 1963.
- Gill, P. E. and Murray, W. "Algorithms for the solution of the nonlinear least-squares problem", SIAM J. Numer. Anal. 15 [5] 977-992, 1978.
External links
Available implementations:
- MINPACK has a FORTRAN implementation lmdif.
- A C programming language implementation can be found at sourceforge::lmfit.
- Open source Java programming language implementations are offered by J.P. Lewis and J. Holopainen (L-M fit package).
- gnuplot uses its own implementation (gnuplot.info).
- levmar: Levenberg-Marquardt non-linear least squares algorithms in C/C++.