Multinomial logistic regression
===Estimating the coefficients===
 
The unknown parameters in each vector ''β<sub>k</sub>'' are typically jointly estimated by [[maximum a posteriori]] (MAP) estimation, an extension of [[maximum likelihood]] that applies [[regularization (mathematics)|regularization]] to the weights to prevent pathological solutions. The regularizer is usually a squared function of the weights, which is equivalent to placing a zero-mean [[Gaussian distribution|Gaussian]] [[prior distribution]] on them, though other distributions are also possible. The solution is typically found using an iterative procedure such as [[generalized iterative scaling]],<ref>{{Cite journal |title=Generalized iterative scaling for log-linear models |author1=Darroch, J.N. |author2=Ratcliff, D. |lastauthoramp=yes |journal=The Annals of Mathematical Statistics |volume=43 |issue=5 |pages=1470–1480 |year=1972 |url=http://projecteuclid.org/download/pdf_1/euclid.aoms/1177692379 |doi=10.1214/aoms/1177692379|doi-access=free }}</ref> [[iteratively reweighted least squares]] (IRLS),<ref>{{cite book |first=Christopher M. |last=Bishop |year=2006 |title=Pattern Recognition and Machine Learning |publisher=Springer |pages=206–209}}</ref> a [[gradient-based optimization]] algorithm such as [[L-BFGS]],<ref name="malouf"/> or a specialized [[coordinate descent]] algorithm.<ref>{{cite journal |first1=Hsiang-Fu |last1=Yu |first2=Fang-Lan |last2=Huang |first3=Chih-Jen |last3=Lin |year=2011 |title=Dual coordinate descent methods for logistic regression and maximum entropy models |journal=Machine Learning |volume=85 |issue=1–2 |pages=41–75 |url=http://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf |doi=10.1007/s10994-010-5221-8}}</ref>
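The MAP estimation described above can be sketched in code. The following is a minimal illustrative implementation, not any of the cited algorithms: it minimizes the negative log-posterior (softmax cross-entropy plus a squared regularizer, i.e. a zero-mean Gaussian prior on the weights) by plain gradient descent. The function and parameter names (`fit_multinomial_map`, `lam`, `lr`, `n_iter`) are hypothetical choices for this sketch.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_multinomial_map(X, y, n_classes, lam=0.01, lr=0.1, n_iter=500):
    """MAP estimate of the coefficient vectors beta_k by gradient descent.

    The L2 penalty `lam * B` in the gradient corresponds to a zero-mean
    Gaussian prior on the weights; all hyperparameter values here are
    illustrative, not prescribed by the article.
    """
    n, d = X.shape
    B = np.zeros((d, n_classes))      # column k holds beta_k
    Y = np.eye(n_classes)[y]          # one-hot encoding of the labels
    for _ in range(n_iter):
        P = softmax(X @ B)            # predicted class probabilities
        # Gradient of the negative log-posterior (cross-entropy + L2 prior)
        grad = X.T @ (P - Y) / n + lam * B
        B -= lr * grad
    return B
```

In practice the cited second-order or specialized methods (IRLS, L-BFGS, coordinate descent) converge far faster than this first-order sketch, but the objective they optimize is the same.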
 
===As a log-linear model===