Multinomial logistic regression
===Estimating the coefficients===
 
The unknown parameters in each vector ''β<sub>k</sub>'' are typically jointly estimated by [[maximum a posteriori]] (MAP) estimation, an extension of [[maximum likelihood]] that applies [[regularization (mathematics)|regularization]] to the weights to prevent pathological solutions. The regularizer is usually a squared function of the weights, which is equivalent to placing a zero-mean [[Gaussian distribution|Gaussian]] [[prior distribution]] on them, though other distributions are also possible. The solution is typically found using an iterative procedure such as [[generalized iterative scaling]],<ref>{{Cite journal |title=Generalized iterative scaling for log-linear models |author1=Darroch, J.N. |author2=Ratcliff, D. |lastauthoramp=yes |journal=The Annals of Mathematical Statistics |volume=43 |issue=5 |pages=1470–1480 |year=1972 |url=http://projecteuclid.org/download/pdf_1/euclid.aoms/1177692379 |doi=10.1214/aoms/1177692379|doi-access=free }}</ref> [[iteratively reweighted least squares]] (IRLS),<ref>{{cite book |first=Christopher M. |last=Bishop |year=2006 |title=Pattern Recognition and Machine Learning |publisher=Springer |pages=206–209}}</ref> a [[gradient-based optimization]] algorithm such as [[L-BFGS]],<ref name="malouf"/> or a specialized [[coordinate descent]] algorithm.<ref>{{cite journal |first1=Hsiang-Fu |last1=Yu |first2=Fang-Lan |last2=Huang |first3=Chih-Jen |last3=Lin |year=2011 |title=Dual coordinate descent methods for logistic regression and maximum entropy models |journal=Machine Learning |volume=85 |issue=1–2 |pages=41–75 |url=http://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf |doi=10.1007/s10994-010-5221-8}}</ref>
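The MAP estimation described above can be sketched in code. The following is a minimal illustrative implementation, not any of the cited algorithms: it minimizes the negative log-posterior (softmax cross-entropy plus a squared regularizer, i.e. a zero-mean Gaussian prior on the weights) by plain gradient descent. The function and parameter names (`fit_multinomial_map`, `lam`, `lr`, `n_iter`) are hypothetical choices for this sketch.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_multinomial_map(X, y, n_classes, lam=0.01, lr=0.1, n_iter=500):
    """MAP estimate of the coefficient vectors beta_k by gradient descent.

    The L2 penalty `lam * B` in the gradient corresponds to a zero-mean
    Gaussian prior on the weights; all hyperparameter values here are
    illustrative, not prescribed by the article.
    """
    n, d = X.shape
    B = np.zeros((d, n_classes))      # column k holds beta_k
    Y = np.eye(n_classes)[y]          # one-hot encoding of the labels
    for _ in range(n_iter):
        P = softmax(X @ B)            # predicted class probabilities
        # Gradient of the negative log-posterior (cross-entropy + L2 prior)
        grad = X.T @ (P - Y) / n + lam * B
        B -= lr * grad
    return B
```

In practice the cited second-order or specialized methods (IRLS, L-BFGS, coordinate descent) converge far faster than this first-order sketch, but the objective they optimize is the same.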
 
===As a log-linear model===