where <math>H</math> indicates the [[Heaviside step function]].
However, this loss function is non-convex and non-smooth, and solving for the optimal solution is an [[NP-hard]] combinatorial optimization problem.<ref name="Utah">{{Citation | last= Rai | first= Piyush | title= Support Vector Machines (Contd.), Classification Loss Functions and Regularizers | publisher= Utah CS5350/6350: Machine Learning | date= 13 September 2011 | url= https://cis.temple.edu/~latecki/Courses/AI-Fall12/Lectures/SVM.pdf | access-date= 4 May 2021}}</ref> As a result, it is preferable to substitute '''loss function surrogates''' that are tractable for commonly used learning algorithms, as they have convenient properties such as convexity and smoothness. In addition to being computationally tractable, one can show that the solutions to the learning problem based on these loss surrogates allow recovery of the actual solution to the original classification problem.<ref name="uci">{{Citation | last= Ramanan | first= Deva | title= Lecture 14 | publisher= UCI ICS273A: Machine Learning | date= 27 February 2008 | url= http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/lectures/lecture14.pdf | access-date= 6 December 2014}}</ref> Some of these surrogates are described below.
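
The practical benefit of a surrogate can be seen in a minimal sketch (assuming NumPy and synthetic data; the hinge loss <math>\max(0, 1 - yf(\vec{x}))</math>, a convex upper bound of the 0–1 loss, stands in for any convex surrogate): the empirical 0–1 risk is piecewise constant and provides no gradient information, whereas the surrogate risk can be minimized by subgradient descent.

<syntaxhighlight lang="python">
import numpy as np

# Synthetic data: labels y in {-1, +1}, features x in R^2 (illustrative only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n))

def zero_one_risk(w):
    # Empirical 0-1 risk: fraction of points with y * f(x) <= 0.
    # Piecewise constant in w, so it gives no usable gradient.
    margins = y * (X @ w)
    return np.mean(margins <= 0)

def hinge_risk(w):
    # Convex surrogate: max(0, 1 - y f(x)) upper-bounds the 0-1 loss.
    margins = y * (X @ w)
    return np.mean(np.maximum(0.0, 1.0 - margins))

def hinge_subgradient(w):
    # Subgradient of the averaged hinge risk with respect to w.
    margins = y * (X @ w)
    active = (margins < 1.0).astype(float)   # points inside the margin contribute a slope
    return -(active * y) @ X / len(y)

# Subgradient descent on the surrogate risk.
w = np.zeros(2)
for t in range(1, 501):
    w -= (0.1 / np.sqrt(t)) * hinge_subgradient(w)

print("hinge (surrogate) risk:", hinge_risk(w))
print("0-1 (true) risk:       ", zero_one_risk(w))
</syntaxhighlight>

In this sketch, driving down the convex surrogate risk also tends to drive down the 0–1 risk, which is the sense in which minimizing the surrogate recovers a solution to the original classification problem.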
 
In practice, the probability distribution <math>p(\vec{x},y)</math> is unknown. Consequently, utilizing a training set of <math>n</math> [[iid|independently and identically distributed]] sample points