Talk:Levenberg–Marquardt algorithm

This is an old revision of this page, as edited by G.k. (talk | contribs) at 13:46, 27 April 2006.


Should

(JᵀJ + λ)q = −Jᵀf

be

(JᵀJ + λI)q = −Jᵀf ?

The preceding unsigned comment was added by 129.13.70.88 (talk • contribs) on 08:18, 29 August 2005.

Both formulas have the same meaning, namely JᵀJq + λq = −Jᵀf. As far as I am concerned, it is a matter of taste which you prefer, but the second is perhaps clearer. -- Jitse Niesen (talk) 20:46, 30 August 2005 (UTC)
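The equivalence of the two forms can be checked numerically: adding λq to JᵀJq is the same as adding λ to each diagonal entry of JᵀJ. A minimal NumPy sketch (the Jacobian, residual vector, and damping value below are made up for illustration):

```python
import numpy as np

# Toy least-squares data, chosen only for illustration.
J = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # Jacobian, m x n
f = np.array([1.0, -1.0, 2.0])   # residual vector
lam = 0.5                        # damping parameter lambda

n = J.shape[1]

# Reading 1: (J^T J + lam * I) q = -J^T f
A1 = J.T @ J + lam * np.eye(n)
q1 = np.linalg.solve(A1, -J.T @ f)

# Reading 2: J^T J q + lam * q = -J^T f, i.e. lam added to each
# diagonal entry of J^T J.
A2 = J.T @ J
A2[np.diag_indices(n)] += lam
q2 = np.linalg.solve(A2, -J.T @ f)

assert np.allclose(q1, q2)  # both readings give the same step q
```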

Transpose

Where f and p are introduced:

fᵀ = (f₁, ..., fₘ), and pᵀ = (p₁, ..., pₙ).

they are presented with superscript T, which I think indicates that they are transposed. But f extends over 1..m, and p extends over 1..n, so I don't understand why both are transposed. Can someone please explain what I am missing here? Thanks. --CarlManaster 22:49, 15 December 2005 (UTC)

I think the reasoning is that (f₁, ..., fₘ) is a row vector. However, we want f to be a column vector, so the transpose is used to convert from rows to columns. It is perhaps clearer to write
f = (f₁, ..., fₘ)ᵀ, and p = (p₁, ..., pₙ)ᵀ.
To be honest, many mathematical texts do not bother to distinguish between row and column vectors, hoping that the reader can deduce from the context which one is meant.
Perhaps you know this, and your question is: why do we want to use a column vector? Well, f has to be a column vector because fᵀf further down the article should be an inner product, and p has to be a column vector because it's added to q, which is a column vector because otherwise the product Jq does not make sense. -- Jitse Niesen (talk) 23:19, 15 December 2005 (UTC)
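The shape constraints described above can be seen directly in NumPy; the sizes m = 3 and n = 2 below are arbitrary choices for illustration:

```python
import numpy as np

m, n = 3, 2                              # illustrative sizes only
f = np.arange(1.0, m + 1).reshape(m, 1)  # column vector, shape (3, 1)
p = np.ones((n, 1))                      # column vector, shape (2, 1)
J = np.ones((m, n))                      # Jacobian, shape (3, 2)
q = np.zeros((n, 1))                     # update step, same shape as p

inner = f.T @ f    # (1, 3) @ (3, 1) -> (1, 1): the inner product f^T f
step = J @ q       # (3, 2) @ (2, 1) -> (3, 1): Jq is conformable with f
p_new = p + q      # column + column: well defined, shape (2, 1)

print(inner.shape, step.shape, p_new.shape)
```

If f or p were row vectors instead, fᵀf would be an outer product and Jq would not be conformable, which is why the column-vector convention is needed.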

Improvement by Marquardt

What about the improvement Marquardt made to the algorithm? He replaced I by diag[H], i.e. the diagonal of the (approximated) Hessian, to incorporate a local curvature estimate. This makes the algorithm go further in directions of smaller gradient, helping it get out of narrow valleys on the error surface.

last entry by G.k. 13:46, 27 April 2006 (UTC)
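Marquardt's refinement can be sketched in NumPy as a single damped step; the function name, toy Jacobian, and damping value below are illustrative assumptions, not from the article:

```python
import numpy as np

def lm_step(J, f, lam, use_marquardt_scaling=True):
    """One illustrative Levenberg-Marquardt step.

    Solves (J^T J + lam * D) q = -J^T f, where D is either the identity
    (Levenberg's original damping) or diag[J^T J], the diagonal of the
    approximated Hessian (Marquardt's refinement), which scales the
    damping by the local curvature along each parameter axis.
    """
    JTJ = J.T @ J
    if use_marquardt_scaling:
        D = np.diag(np.diag(JTJ))  # diagonal of the approximated Hessian
    else:
        D = np.eye(J.shape[1])     # plain identity damping
    return np.linalg.solve(JTJ + lam * D, -J.T @ f)

# Toy problem with very different curvatures along the two axes:
# the second axis has small curvature (small diagonal entry of J^T J),
# so Marquardt's scaling damps it less and the step goes much further
# along it than with identity damping.
J = np.array([[10.0, 0.0],
              [0.0, 0.1]])
f = np.array([1.0, 1.0])
print(lm_step(J, f, lam=1.0, use_marquardt_scaling=True))
print(lm_step(J, f, lam=1.0, use_marquardt_scaling=False))
```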