Probabilistic classification: Difference between revisions

Content deleted Content added
m Reverted edits by 186.223.223.194 (talk) (HG) (3.1.21)
No edit summary
Tags: references removed Visual edit
Line 1:
{{machine learning bar}}
In [[machine learning]], a '''probabilistic classifier''' is a [[statistical classification|classifier]] that is able to predict, given a sample input, a [[probability distribution]] over a [[Set (mathematics)|set]] of classes, rather than only outputting the most likely class that the sample should belong to. Probabilistic classifiers provide classification with a degree of certainty, which can be useful in its own right,<ref>{{cite book |first1=Trevor |last1=Hastie |first2=Robert |last2=Tibshirani |first3=Jerome |last3=Friedman |year=2009 |title=The Elements of Statistical Learning |url=http://statweb.stanford.edu/~tibs/ElemStatLearn/ |page=348 |quote=[I]n [[data mining]] applications the interest is often more in the class probabilities <math>p_\ell(x), \ell = 1, \dots, K</math> themselves, rather than in performing a class assignment.}}</ref> or when combining classifiers into [[ensemble classifier|ensembles]].
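The distinction can be made concrete with a minimal sketch. The class names and probabilities below are made-up illustrative values, and <code>hard_classify</code> is a hypothetical helper, not part of any library: a probabilistic classifier returns the whole distribution, while an ordinary classifier keeps only its most likely class.

```python
def hard_classify(class_probs):
    """Collapse a distribution over classes to its single most likely class."""
    return max(class_probs, key=class_probs.get)

# Hypothetical output of a probabilistic classifier for one sample.
probs = {"cat": 0.7, "dog": 0.2, "bird": 0.1}

# A valid distribution over the set of classes sums to one.
total = sum(probs.values())

# A non-probabilistic classifier would report only this label.
label = hard_classify(probs)
```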
 
==Types of classification==
Line 28 ⟶ 27:
 
For the [[binary classification|binary]] case, a common approach is to apply [[Platt scaling]], which learns a [[logistic regression]] model on the scores.<ref name="platt99">{{cite journal |last=Platt |first=John |authorlink=John Platt (computer scientist) |title=Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods |journal=Advances in large margin classifiers |volume=10 |issue=3 |year=1999 |pages=61–74 |url=http://www.researchgate.net/publication/2594015_Probabilistic_Outputs_for_Support_Vector_Machines_and_Comparisons_to_Regularized_Likelihood_Methods/file/504635154cff5262d6.pdf}}</ref>
An alternative method using [[isotonic regression]]<ref>{{Cite book | last1 = Zadrozny | first1 = Bianca| last2 = Elkan | first2 = Charles| doi = 10.1145/775047.775151 | chapter = Transforming classifier scores into accurate multiclass probability estimates | url = http://www.cs.cornell.edu/courses/cs678/2007sp/ZadroznyElkan.pdf| title = Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02 | pages = 694–699| year = 2002 | isbn = 1-58113-567-X| pmid = | pmc = | id = [[CiteSeerX]]: {{URL|1=citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.7457|2=10.1.1.13.7457}}}}</ref> is generally superior to Platt's method when sufficient training data is available.<ref name="Niculescu"/>
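Both calibration approaches can be sketched in a few lines of plain Python. This is a simplified illustration, not the published procedures: the sigmoid is fitted with ordinary gradient descent and omits refinements of Platt's original method, the isotonic fit uses the pool-adjacent-violators scheme, and the function names, learning rate, iteration count and toy data are all assumptions made for the example.

```python
import math

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p(y=1 | s) = 1 / (1 + exp(-(a*s + b))) by gradient descent on log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s   # derivative of log-loss w.r.t. a
            grad_b += (p - y)       # derivative of log-loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_predict(a, b, s):
    """Map a raw classifier score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * s + b)))

def fit_isotonic(scores, labels):
    """Pool adjacent violators: non-decreasing fit of labels against sorted scores."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, number of points]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted

# Toy, non-separable data: raw classifier scores and true binary labels.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
sorted_scores, iso_probs = fit_isotonic(scores, labels)
```

The sigmoid fit has only two parameters and is therefore constrained to one shape, whereas the isotonic fit is a free-form step function, which is one way to read the remark that it benefits from more training data.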
 
In the [[multiclass classification|multiclass]] case, one can use a reduction to binary tasks, followed by univariate calibration with an algorithm as described above and further application of the pairwise coupling algorithm by Hastie and Tibshirani.<ref>{{Cite journal | last1 = Hastie | first1 = Trevor| last2 = Tibshirani | first2 = Robert| doi = 10.1214/aos/1028144844 | title = Classification by pairwise coupling | journal = [[The Annals of Statistics]] | volume = 26 | issue = 2 | pages = 451–471| year = 1998 | pmid = | pmc = | zbl = 0932.62071| id = [[CiteSeerX]]: {{URL|1=citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.6032|2=10.1.1.46.6032}}}}</ref>
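The coupling step can be illustrated with a simplified sketch of an iterative scheme in the spirit of pairwise coupling. It assumes that every pair of classes carries equal weight and that <code>r[i][j]</code> already holds a calibrated estimate of the probability of class ''i'' given that the sample belongs to class ''i'' or ''j''. The function name, the number of sweeps and the test values are assumptions for the example.

```python
def pairwise_coupling(r, n_sweeps=200):
    """Combine pairwise estimates r[i][j] ~ P(i | i or j) into one distribution p.

    Each p[i] is rescaled until the model's pairwise probabilities
    p[i] / (p[i] + p[j]) agree with the observed r[i][j] in aggregate.
    """
    k = len(r)
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(n_sweeps):
        for i in range(k):
            observed = sum(r[i][j] for j in range(k) if j != i)
            modelled = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= observed / modelled
            total = sum(p)
            p = [v / total for v in p]  # renormalise after every update
    return p

# Consistency check: build pairwise estimates from a known distribution.
true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) if i != j else 0.0
      for j in range(3)] for i in range(3)]
coupled = pairwise_coupling(r)
```

When the pairwise estimates are mutually consistent, as in this constructed example, the coupled probabilities should reproduce the distribution they were derived from; with real, noisy estimates the result is a compromise between them.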