Loss functions for classification

[[File:BayesConsistentLosses2.jpg|thumb|Bayes Consistent Losses: Zero-One Loss (gray), Savage Loss (green), Logistic Loss (orange), Exponential Loss (purple), Tangent Loss (brown), Square Loss (blue)]]
[[File:Loss function surrogates.svg|thumb|Plot of common classification loss functions as a function of the margin ''yf''(''x''). Blue is the 0–1 indicator function. Green is the square loss function. Purple is the hinge loss function. Yellow is the logistic loss function. Note that all surrogates give a loss penalty of 1 at {{math|''yf''(''x''){{=}}0}}.]]
 
In [[machine learning]] and [[mathematical optimization]], '''loss functions for classification''' are computationally feasible [[loss functions]] representing the price paid for inaccuracy of predictions in [[statistical classification|classification problem]]s (problems of identifying which category a particular observation belongs to).<ref name="mit">{{Cite journal | last1 = Rosasco | first1 = L. | last2 = De Vito | first2 = E. D. | last3 = Caponnetto | first3 = A. | last4 = Piana | first4 = M. | last5 = Verri | first5 = A. | url = http://web.mit.edu/lrosasco/www/publications/loss.pdf| title = Are Loss Functions All the Same? | doi = 10.1162/089976604773135104 | journal = Neural Computation | volume = 16 | issue = 5 | pages = 1063–1076 | year = 2004 | pmid = 15070510| pmc = | citeseerx = 10.1.1.109.6786 }}</ref> Given <math>X</math> as the space of all possible inputs and <math>Y = \{-1, 1\}</math> as the set of all possible outputs, we wish to find a function <math>f: X \to \mathbb{R}</math> which best maps <math>\vec{x}</math> to <math>y</math>.<ref name="penn">{{Citation | last= Shen | first= Yi | title= Loss Functions For Binary Classification and Class Probability Estimation | publisher= University of Pennsylvania | year= 2005 | url= http://stat.wharton.upenn.edu/~buja/PAPERS/yi-shen-dissertation.pdf | accessdate= 6 December 2014}}</ref> However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, it is possible for the same <math>\vec{x}</math> to generate different <math>y</math>.<ref name="mitlec">{{Citation | last= Rosasco | first= Lorenzo | last2= Poggio | first2= Tomaso | title= A Regularization Tour of Machine Learning | series= MIT-9.520 Lectures Notes | volume= Manuscript | year= 2014}}</ref> As a result, the goal of the learning problem is to minimize the expected risk, defined as
:<math>I[f] = \displaystyle \int_{X \times Y} V(f(\vec{x}),y) p(\vec{x},y) \, d\vec{x} \, dy</math>
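In practice the integral above is estimated from a finite sample as an average loss over the data. The following minimal Python sketch is not taken from the article; it uses made-up data and a hypothetical scorer <code>f</code> to evaluate several of the surrogate losses shown in the figure as functions of the margin <math>y f(\vec{x})</math> and average them to approximate <math>I[f]</math>:

<syntaxhighlight lang="python">
import math

# Illustrative sketch (not from the article): common surrogate losses written
# as functions of the margin v = y * f(x), with labels y in {-1, +1}.
def zero_one_loss(v):
    return 0.0 if v > 0 else 1.0                     # 0-1 indicator loss

def hinge_loss(v):
    return max(0.0, 1.0 - v)                         # hinge loss

def logistic_loss(v):
    return math.log(1.0 + math.exp(-v)) / math.log(2)  # logistic loss, scaled so loss(0) = 1

def exponential_loss(v):
    return math.exp(-v)                              # exponential loss

def square_loss(v):
    return (1.0 - v) ** 2                            # square loss in margin form

# Empirical risk: average loss over a finite sample, an estimate of I[f].
def empirical_risk(loss, f, samples):
    return sum(loss(y * f(x)) for x, y in samples) / len(samples)

# Toy usage with a hypothetical linear scorer f(x) = 2x - 1 and made-up data.
samples = [(0.9, +1), (0.2, -1), (0.6, +1), (0.4, -1)]
f = lambda x: 2 * x - 1
for name, loss in [("0-1", zero_one_loss), ("hinge", hinge_loss),
                   ("logistic", logistic_loss), ("square", square_loss)]:
    print(name, empirical_risk(loss, f, samples))
</syntaxhighlight>

Each of these surrogates returns a penalty of 1 at margin 0, consistent with the caption of the figure above.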