{{WikiProject banner shell|class=C|
{{WikiProject Mathematics|importance=low}}
{{WikiProject Statistics|importance=low}}
}}
{{annual readership|scale=log}}
== L-M is for nonlinear least squares problems only ==
The article needs to make clear that the L-M algorithm is for solving nonlinear least squares problems, not the more general class of nonlinear optimization problems. In fact, the introduction used to give the impression that the L-M algorithm is used to solve general nonlinear optimization problems, which is not true.
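For concreteness, the problem class in question (using the article's notation) is
:<math>\min_{\boldsymbol\beta}\; S(\boldsymbol\beta) = \sum_{i=1}^m \left[y_i - f(x_i, \boldsymbol\beta)\right]^2,</math>
i.e. the objective must be a sum of squared residuals, not an arbitrary smooth function of <math>\boldsymbol\beta</math>.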
== Functions vs. Values ==
The article talks about "the functions <math>f(x_i,\boldsymbol \beta+\boldsymbol \delta)</math>". This confused the heck out of me, until I realized that what is meant is the values of the function <math>f(x,\boldsymbol \beta+\boldsymbol \delta)</math> at <math>x_i</math>. Similarly confused terminology is used at other points in the article, for example when calling <math>J_i</math> the gradient. This kind of terminology issue makes the article hard to read. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/80.65.109.181|80.65.109.181]] ([[User talk:80.65.109.181|talk]]) 12:09, 13 September 2013 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
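To restate in symbols: the objects meant are the model values
:<math>f_i(\boldsymbol\beta) = f(x_i, \boldsymbol\beta), \qquad i = 1, \dots, m,</math>
and <math>J_i</math> is the gradient of the single value <math>f_i</math> with respect to <math>\boldsymbol\beta</math>, i.e. the <math>i</math>-th row of the Jacobian of the vector <math>\mathbf{f}</math>, not the gradient of the objective <math>S</math>.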
== Inconsistency ==
In the article it says that Levenberg's contribution is in introducing lambda. But at the same time the references say that he was the first to publish the algorithm, in 1944 or something. This seems a bit contradictory to me.
[[User:Ravn-hawk|Ravn-hawk]] ([[User talk:Ravn-hawk|talk]]) 15:38, 28 September 2011 (UTC)
== Possible replacement article? ==
:Both formulas have the same meaning, namely '''J'''<sup>T</sup>'''Jq''' + λ'''q''' = −'''J'''<sup>T</sup>'''f'''. As far as I am concerned, it is a matter of taste which you prefer, but the second is perhaps clearer. -- [[User:Jitse Niesen|Jitse Niesen]] ([[User talk:Jitse Niesen|talk]]) 20:46, 30 August 2005 (UTC)
------------------------------------------------------------------------------------
All formulas are missing a negative sign. For example:
:<math>\mathbf{(J^{T}J)\boldsymbol \delta = J^{T} [y - f(\boldsymbol \beta)]} \!</math>
should be:
:<math>\mathbf{(J^{T}J)\boldsymbol \delta = -J^{T} [y - f(\boldsymbol \beta)]} \!</math>
Finding this bug took quite some time. It should also be pointed out how to get to this formula. --vogt31337 9:00 26 January 2012 (UTC) <small><span class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Vogt31337|Vogt31337]] ([[User talk:Vogt31337|talk]] • [[Special:Contributions/Vogt31337|contribs]]) </span></small><!-- Template:Unsigned --> <!--Autosigned by SineBot-->
@ Vogt31337:
I think there is no sign missing, since '''J''' is the Jacobian of '''f''' and '''f''' has a negative sign in '''S'''. The goal is to minimize '''S''', and therefore we want to go in the direction of the negative Jacobian of '''S''', which turns out to be the negative Jacobian of '''-f''' and thus '''-(-J) = J'''. -- Georg
@ Vogt31337: I agree with you. -- D <!-- Template:Unsigned IP --><small class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/192.55.54.42|192.55.54.42]] ([[User talk:192.55.54.42#top|talk]]) 23:13, 28 June 2019 (UTC)</small> <!--Autosigned by SineBot-->
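For reference, the sign follows from the residual convention. A short sketch, assuming the residual <math>\mathbf{r}(\boldsymbol\beta) = \mathbf{y} - \mathbf{f}(\boldsymbol\beta)</math> used in the formulas quoted above:
:<math>S(\boldsymbol\beta) = \mathbf{r}^\mathrm{T}\mathbf{r}, \qquad S(\boldsymbol\beta + \boldsymbol\delta) \approx \|\mathbf{r} - \mathbf{J}\boldsymbol\delta\|^2 .</math>
Setting the gradient of the right-hand side with respect to <math>\boldsymbol\delta</math> to zero gives
:<math>\mathbf{J}^\mathrm{T}\mathbf{J}\,\boldsymbol\delta = \mathbf{J}^\mathrm{T}\mathbf{r} = \mathbf{J}^\mathrm{T}[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)],</math>
with no extra minus sign; a minus sign appears only if the residual is defined the other way round, as <math>\mathbf{f}(\boldsymbol\beta) - \mathbf{y}</math>.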
== Transpose ==
:Perhaps you know this, and your question is: why do we want to use a column vector? Well, '''f''' has to be a column vector because '''f'''<sup>T</sup>'''f''' further down the article should be a [[inner product]], and '''p''' has to be a column vector because it's added to '''q''', which is a column vector because otherwise the product '''Jq''' does not make sense. -- [[User:Jitse Niesen|Jitse Niesen]] ([[User talk:Jitse Niesen|talk]]) 23:19, 15 December 2005 (UTC)
== Improvement by Marquardt ==
What about the improvement Marquardt made to the algorithm?
I believe Marquardt's suggested improvement was actually to scale the approximate hessian matrix, <math>J^{T}J</math>. The scaling had an effect similar to replacing the identity matrix with the diagonal of the hessian approximation. Marquardt suggests scaling the hessian approximation by an amount that makes the diagonal elements ones. Adding a constant diagonal matrix to the scaled matrix is similar to adding a proportion of the diagonal elements to the unscaled matrix. The scaling applied to the other elements of the hessian approximation improves the condition of the matrix. I suggest the following:
<math>q = \Sigma_J \hat{q}</math>
where <math>\hat{q}</math> solves the scaled, damped system <math>(\hat{J}^T\hat{J} + \lambda I)\hat{q} = \hat{J}^T[y - f(\boldsymbol \beta)]</math> and <math>\Sigma_J</math> is the diagonal scaling matrix:
<math>\Sigma_J = [\mbox{diag}[J^TJ]]^{-\frac{1}{2}}</math>
and the scaled Jacobian, <math>\hat{J}</math>, is:
<math>\hat{J} = J\Sigma_J</math>
The square matrix, <math>\hat{J}^T\hat{J}</math>, is then a scaled version of <math>J^TJ</math> with unit diagonal elements.
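A minimal sketch of one damped step with this diagonal scaling may make the suggestion concrete (editor's illustration; the function name, the residual convention <math>r = y - f(\boldsymbol\beta)</math>, and the use of NumPy are assumptions, not part of the proposal above):
<syntaxhighlight lang="python">
import numpy as np

def lm_step_scaled(J, r, lam):
    """One damped step with Marquardt-style diagonal scaling.

    J   : (m, n) Jacobian of the model values f(x_i, beta) w.r.t. beta
    r   : (m,)   residuals y - f(beta)
    lam : damping parameter lambda
    Returns the parameter update delta.
    """
    # Sigma_J = [diag(J^T J)]^(-1/2): scale each column of J to unit norm.
    sigma = 1.0 / np.sqrt(np.sum(J * J, axis=0))
    J_hat = J * sigma                      # J_hat = J Sigma_J (column scaling)

    # Scaled, damped normal equations: (J_hat^T J_hat + lam I) q_hat = J_hat^T r.
    A = J_hat.T @ J_hat + lam * np.eye(J.shape[1])
    q_hat = np.linalg.solve(A, J_hat.T @ r)

    # Undo the scaling: delta = Sigma_J q_hat.
    return sigma * q_hat
</syntaxhighlight>
Adding <math>\lambda I</math> in the scaled variables is algebraically the same as adding <math>\lambda\,\mbox{diag}(J^TJ)</math> in the unscaled ones, which matches the "replacing the identity matrix with the diagonal of the hessian approximation" description above.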
- should discuss the approach of Moré, the standard used in most production-quality numerical libraries. It tends to be both faster and more robust than earlier more naive methods. [[User:Jheald|Jheald]] ([[User talk:Jheald|talk]]) 22:26, 30 January 2008 (UTC)
== Explanation of the poor performance of L-M in example ==
The data sequence has very high frequency components. The sparse sampling plan shown in the plots will necessarily contain huge aliasing error that is beyond any inverse algorithm. The way to correct this apparent poor performance of L-M is to sample the data train much more densely, e.g. with a 0.001 step size; then L-M finds the correct answer easily under a very wide range of initial conditions. It is really not L-M that is at fault here. --[[User:Xzhou.lumi|Xzhou.lumi]] ([[User talk:Xzhou.lumi|talk]]) 22:32, 2 July 2008 (UTC)
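A small illustration of the aliasing point (editor's sketch; the model <math>y = a\sin(bx)</math>, the parameter values, and the use of SciPy's <code>least_squares</code> are assumptions chosen only to show the effect of the sampling step, not the article's actual example):
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import least_squares

def model(x, a, b):
    return a * np.sin(b * x)

a_true, b_true = 2.0, 40.0                 # rapidly oscillating "data train"
rng = np.random.default_rng(0)

def fit(step):
    x = np.arange(0.0, 1.0, step)
    y = model(x, a_true, b_true) + 0.01 * rng.standard_normal(x.size)
    # residual vector r(beta) = f(x, beta) - y, minimised with Levenberg-Marquardt
    res = least_squares(lambda p: model(x, p[0], p[1]) - y, x0=[1.0, 35.0], method="lm")
    return res.x

print("coarse sampling (step 0.1):  ", fit(0.1))    # often locks onto an aliased frequency
print("dense sampling  (step 0.001):", fit(0.001))  # typically recovers a = 2, b = 40
</syntaxhighlight>
With the coarse step the oscillation is undersampled (above the Nyquist limit), so any least-squares method, not just L-M, can lock onto an aliased frequency; with the fine step the fit typically recovers the true parameters.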
== Room for clarification ==
I think the article could do with some clarification; I know I could. Most optimization algorithms I have looked at focus on searching in R<sup>n</sup> for the minimum of some scalar-valued function of R<sup>n</sup>. In this article, that function is clearly <math>S(\boldsymbol{\beta})</math>, but the article doesn't emphasize that as much as it emphasizes <math>f(x_i,\boldsymbol{\beta})</math>.
As I try to fully understand L–M, I keep seeing it as a [[trust region]] algorithm in which we do a second-order Taylor expansion of <math>S(\boldsymbol{\beta}+\boldsymbol{\delta})</math> in <math>\boldsymbol{\delta}</math> about <math>\boldsymbol{\delta}=\mathbf{0}</math>. In that case we would have
: <math>S(\boldsymbol{\beta}+\boldsymbol{\delta}) \approx S(\boldsymbol{\beta}) + \operatorname{grad}(S) \cdot \boldsymbol{\delta} + \tfrac{1}{2}\,\boldsymbol{\delta}^\mathrm{T} \operatorname{Hessian}(S)\, \boldsymbol{\delta}</math>
It looks like L–M uses <math>\mathbf{J}^\mathrm{T}\mathbf{J}</math> for the Hessian. Is this the exact Hessian or just an approximation? (It [[Gauss–Newton_algorithm#Derivation_from_Newton.27s_method|looks like an approximation]], but I can't picture the difference between the true Hessian and the approximation.)
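For reference, writing <math>r_i = y_i - f(x_i, \boldsymbol\beta)</math> (a sketch using the residual convention implied above), the exact Hessian of <math>S</math> is
:<math>\nabla^2 S = 2\left(\mathbf{J}^\mathrm{T}\mathbf{J} - \sum_i r_i\, \nabla^2_{\boldsymbol\beta} f(x_i, \boldsymbol\beta)\right),</math>
so <math>\mathbf{J}^\mathrm{T}\mathbf{J}</math> is only an approximation; dropping the second term is exactly the Gauss–Newton approximation, and it is good when the residuals are small or the model is nearly linear in <math>\boldsymbol\beta</math>.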
Now, the gradient of ''S'' WRT '''β''' is given in this article as
: <math>\operatorname{grad}(S) = (-2)(\mathbf{J}^{T} [y - f(\boldsymbol \beta) ] )^{T}</math>.
With some hand waving, I can see this giving us the equation for the minimum of ''S'':
:<math>\mathbf{(J^{T}J)\boldsymbol \delta = J^{T} [y - f(\boldsymbol \beta)]} \!</math>
Then it looks like the damping parameter basically defines the radius of the trust region, and that it isn't a hard-edged trust region but rather the <math>\lambda I</math> term modifies the cost function to keep the local minimum from being too far away (basically adding a paraboloid, scaled by λ, to ''S''). Is that about right?
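For reference, adding <math>\lambda \|\boldsymbol\delta\|^2</math> to the linearized objective (a sketch consistent with the formulas quoted above) and setting the gradient with respect to <math>\boldsymbol\delta</math> to zero gives
:<math>\left(\mathbf{J}^\mathrm{T}\mathbf{J} + \lambda I\right)\boldsymbol\delta = \mathbf{J}^\mathrm{T}[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)],</math>
which is the damped equation, so the "paraboloid added to the cost" reading is consistent with how the damping enters.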
Finally, I don't quite see the intuition for using <math>\operatorname{diag}(\mathbf{D}^\mathrm{T}\mathbf{D})</math> instead of the identity.
Any answers/thoughts/corrections would be appreciated. Thanks. [[User:BenFrantzDale|—Ben FrantzDale]] ([[User talk:BenFrantzDale|talk]]) 22:04, 7 December 2008 (UTC)
--------------------------------------------------------------------------
In the section "Solution" there is the sentence:
: >> Note that the gradient of ''S'' with respect to ''δ'' equals <math>-2(\mathbf{J}^{T} [\mathbf{y} - \mathbf{f}(\boldsymbol \beta) ] )^T </math> <<
: Is it not supposed to be: ... gradient of ''S'' with respect to '''β''' ?? -Georg <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/31.18.248.89|31.18.248.89]] ([[User talk:31.18.248.89|talk]]) 20:45, 27 May 2014 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
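: For reference, both readings agree at the evaluation point: since <math>\partial(\boldsymbol\beta + \boldsymbol\delta)/\partial\boldsymbol\delta = I</math>, the chain rule gives
::<math>\left.\frac{\partial S(\boldsymbol\beta + \boldsymbol\delta)}{\partial \boldsymbol\delta}\right|_{\boldsymbol\delta = 0} = \frac{\partial S(\boldsymbol\beta)}{\partial \boldsymbol\beta} = -2\left(\mathbf{J}^\mathrm{T}[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)]\right)^\mathrm{T},</math>
: so the expression is the gradient with respect to either variable, evaluated at <math>\boldsymbol\delta = 0</math>.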
== Notes and references ==
There was a slight problem with the use of Notes and References. I have fixed it. --[[User:MihalOrela|Михал Орела]] ([[User talk:MihalOrela|talk]]) 20:50, 3 February 2009 (UTC)
== Comparison of Efficacy and Performance with other similar Algorithms ==
Can anyone provide a section or information comparing the LMA with other algorithms such as the Conjugate-Gradient method? [[User:Mxbuck|MxBuck]] ([[User talk:Mxbuck|talk]]) 19:18, 23 September 2009 (UTC)
== Jacobian J ==
I think there's a little mistake in the definition of J:
Shouldn't it be J = dS(β)/dβ ?
This would also correspond to this (from the external links):
http://www.siam.org/books/textbooks/fr18_book.pdf
We're trying to minimize the error, so taking the Jacobian of the original function wouldn't make sense. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/129.132.154.39|129.132.154.39]] ([[User talk:129.132.154.39|talk]]) 10:42, 20 October 2011 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
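For reference, with the definition <math>\mathbf{J}_{ij} = \partial f(x_i, \boldsymbol\beta)/\partial\beta_j</math> used in the formulas quoted above, the Jacobian of the model values is exactly what the gradient of the error requires:
:<math>\frac{\partial S}{\partial \boldsymbol\beta} = -2\,\mathbf{J}^\mathrm{T}[\mathbf{y} - \mathbf{f}(\boldsymbol\beta)],</math>
so differentiating <math>f</math> rather than <math>S</math> is intended; a reference that instead defines <math>\mathbf{J}</math> as the Jacobian of the residual vector differs only in sign.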
== Covariance of the solution ==
There should be discussion of the covariance of the solution. I think it's that if J is left-multiplied by the inverse of the input covariance, then <math>J^\top \Sigma^{-1} J</math> is the covariance of the output, or somesuch. [[User:BenFrantzDale|—Ben FrantzDale]] ([[User talk:BenFrantzDale|talk]]) 17:20, 26 October 2011 (UTC)
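For reference, a standard sketch (assuming Gaussian measurement noise with covariance <math>\Sigma</math> and a good fit): the parameter covariance is approximately
:<math>\operatorname{Cov}(\hat{\boldsymbol\beta}) \approx \left(\mathbf{J}^\mathrm{T} \Sigma^{-1} \mathbf{J}\right)^{-1},</math>
so <math>\mathbf{J}^\mathrm{T}\Sigma^{-1}\mathbf{J}</math> is the ''inverse'' covariance (the Fisher information), not the covariance itself.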
== External links modified ==
Hello fellow Wikipedians,
I have just modified 2 external links on [[Levenberg–Marquardt algorithm]]. Please take a moment to review [[special:diff/816549943|my edit]]. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit [[User:Cyberpower678/FaQs#InternetArchiveBot|this simple FaQ]] for additional information. I made the following changes:
*Added archive https://web.archive.org/web/20110723020836/http://trac.astrometry.net/wiki/PyLevmar to http://trac.astrometry.net/wiki/PyLevmar
*Added archive https://web.archive.org/web/20130722142233/http://www2.imm.dtu.dk/~hbni/Software/SMarquardt.m to http://www2.imm.dtu.dk/~hbni/Software/SMarquardt.m
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
{{sourcecheck|checked=false|needhelp=}}
Cheers.—[[User:InternetArchiveBot|'''<span style="color:darkgrey;font-family:monospace">InternetArchiveBot</span>''']] <span style="color:green;font-family:Rockwell">([[User talk:InternetArchiveBot|Report bug]])</span> 02:18, 22 December 2017 (UTC)
== Broken images ==
Hi just a note to say that the image links on this page appear to be broken. Thanks. <!-- Template:Unsigned IP --><small class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/2A01:110:8012:1012:0:0:0:82|2A01:110:8012:1012:0:0:0:82]] ([[User talk:2A01:110:8012:1012:0:0:0:82#top|talk]]) 11:59, 17 January 2018 (UTC)</small> <!--Autosigned by SineBot-->