Levenberg–Marquardt algorithm

The (non-negative) damping factor {{tmath|\lambda}} is adjusted at each iteration. If the reduction of {{tmath|S}} is rapid, a smaller value can be used, bringing the algorithm closer to the [[Gauss–Newton algorithm]], whereas if an iteration gives insufficient reduction in the residual, {{tmath|\lambda}} can be increased, giving a step closer to the gradient-descent direction. Note that the [[gradient]] of {{tmath|S}} with respect to {{tmath|\boldsymbol\beta}} equals <math>-2\left (\mathbf J^{\mathrm T}\left [\mathbf y - \mathbf f\left (\boldsymbol\beta\right )\right ]\right )^{\mathrm T}</math>. Therefore, for large values of {{tmath|\lambda}}, the step will be taken approximately in the direction opposite to the gradient. If either the length of the calculated step {{tmath|\boldsymbol\delta}} or the reduction of the sum of squares from the latest parameter vector {{tmath|\boldsymbol\beta + \boldsymbol\delta}} falls below predefined limits, the iteration stops, and the last parameter vector {{tmath|\boldsymbol\beta}} is considered to be the solution.
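
The iteration can be sketched as follows (a minimal illustration in Python with NumPy; the halving-and-doubling schedule for {{tmath|\lambda}}, the tolerance, and the iteration cap are illustrative choices, not part of the algorithm's definition):

<syntaxhighlight lang="python">
import numpy as np

def levenberg_marquardt(f, jac, y, beta, lam=1e-2, tol=1e-8, max_iter=200):
    # f(beta): model predictions, jac(beta): Jacobian of f, y: observations.
    S = np.sum((y - f(beta)) ** 2)
    for _ in range(max_iter):
        r = y - f(beta)
        J = jac(beta)
        # Damped normal equations: (J^T J + lam * I) delta = J^T [y - f(beta)].
        delta = np.linalg.solve(J.T @ J + lam * np.eye(beta.size), J.T @ r)
        S_new = np.sum((y - f(beta + delta)) ** 2)
        if S_new < S:
            # Reduction achieved: accept the step and decrease the damping,
            # bringing the iteration closer to Gauss-Newton.
            beta, reduction, S = beta + delta, S - S_new, S_new
            lam /= 2.0
            # Stop when the step length or the reduction of the sum of
            # squares falls below the predefined limits.
            if np.linalg.norm(delta) < tol or reduction < tol:
                break
        else:
            # Insufficient reduction: increase the damping, giving a step
            # closer to the gradient-descent direction.
            lam *= 2.0
    return beta
</syntaxhighlight>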
 
Levenberg's algorithm has the disadvantage that if the value of the damping factor {{tmath|\lambda}} is large, the term {{tmath|\lambda\mathbf I}} dominates {{tmath|\mathbf J^\text{T}\mathbf J}}, so the curvature information in {{tmath|\mathbf J^\text{T}\mathbf J + \lambda\mathbf I}} is effectively not used at all and the update reduces to a short step along the negative gradient. Fletcher provided the insight that each component of the gradient can be scaled according to the curvature, so that there is larger movement along the directions where the gradient is smaller. This avoids slow convergence in the direction of small gradient. Therefore, in his 1971 paper ''A modified Marquardt subroutine for non-linear least squares'', Fletcher replaced the identity matrix {{tmath|\mathbf I}} with the diagonal matrix consisting of the diagonal elements of {{tmath|\mathbf J^\text{T}\mathbf J}}, thus making the solution scale invariant:
 
:<math>\left [\mathbf J^{\mathrm T} \mathbf J + \lambda \operatorname{diag}\left (\mathbf J^{\mathrm T} \mathbf J\right )\right ] \boldsymbol\delta = \mathbf J^{\mathrm T}\left [\mathbf y - \mathbf f\left (\boldsymbol\beta\right )\right ].</math>
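
In code, Fletcher's modification changes only the matrix in the linear solve of the sketch above (the helper name <code>lm_step_fletcher</code> is hypothetical):

<syntaxhighlight lang="python">
import numpy as np

def lm_step_fletcher(J, r, lam):
    # One step of Fletcher's variant: the identity matrix is replaced by
    # diag(J^T J), so each parameter is damped in proportion to the
    # curvature along its own axis, which makes the step scale invariant.
    JTJ = J.T @ J
    D = np.diag(np.diag(JTJ))  # diagonal elements of J^T J, as a matrix
    return np.linalg.solve(JTJ + lam * D, J.T @ r)
</syntaxhighlight>

Because {{tmath|\operatorname{diag}\left (\mathbf J^\text{T}\mathbf J\right )}} rescales together with the parameters, the computed step transforms consistently when a parameter's units are changed, which the {{tmath|\lambda\mathbf I}} term does not.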