m formatting
m →Probability calibration: HTTP to HTTPS for Cornell University
(5 intermediate revisions by 5 users not shown)
Line 24:
==Probability calibration==
{{Main article|Calibration (statistics)}}
Not all classification models are naturally probabilistic, and some that are, notably naive Bayes classifiers, [[decision tree learning|decision trees]] and [[Boosting (machine learning)|boosting]] methods, produce distorted class probability distributions.<ref name="Niculescu">{{cite conference |last1=Niculescu-Mizil |first1=Alexandru |first2=Rich |last2=Caruana |title=Predicting good probabilities with supervised learning |conference=ICML |year=2005 |url=http://machinelearning.wustl.edu/mlpapers/paper_files/icml2005_Niculescu-MizilC05.pdf |doi=10.1145/1102351.1102430 |url-status=dead |archive-url=https://web.archive.org/web/20140311005243/http://machinelearning.wustl.edu/mlpapers/paper_files/icml2005_Niculescu-MizilC05.pdf |archive-date=2014-03-11 }}</ref> In the case of decision trees, where {{math|Pr(''y''{{!}}'''x''')}} is the proportion of training samples with label {{mvar|y}} in the leaf where {{math|'''x'''}} ends up, these distortions come about because learning algorithms such as [[C4.5]] or [[Predictive analytics#Classification and regression trees|CART]] explicitly aim to produce homogeneous leaves (giving probabilities close to zero or one, and thus high [[Bias of an estimator|bias]]) while using few samples to estimate the relevant proportion (high [[Bias–variance tradeoff|variance]]).<ref>{{cite conference |first1=Bianca |last1=Zadrozny |first2=Charles |last2=Elkan |title=Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers |url=http://cseweb.ucsd.edu/~elkan/calibrated.pdf |year=2001 |conference=ICML |pages=609–616}}</ref>
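The leaf-frequency estimate described above can be illustrated with a short Python sketch. The function name <code>leaf_probability</code> and the example labels are hypothetical, chosen only for illustration. The sketch shows how a small, pure leaf yields an extreme probability even though it rests on very few samples.

```python
from collections import Counter


def leaf_probability(leaf_labels, y):
    """Raw frequency estimate of Pr(y | x) for a leaf.

    leaf_labels holds the training labels that ended up in the leaf;
    the estimate is simply the fraction of them equal to y.
    """
    counts = Counter(leaf_labels)
    return counts[y] / len(leaf_labels)


# Hypothetical leaves: a tiny homogeneous one and a larger mixed one.
pure_small_leaf = ["spam", "spam", "spam"]
mixed_large_leaf = ["spam"] * 70 + ["ham"] * 30

p_small = leaf_probability(pure_small_leaf, "spam")   # extreme estimate from 3 samples
p_large = leaf_probability(mixed_large_leaf, "spam")  # moderate estimate from 100 samples
print(p_small, p_large)
```

The first estimate is driven to one by the homogeneity of the leaf, while only three samples support it, which is the combination of distortion and high variance noted above.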
Line 29 ⟶ 30:
For the [[binary classification|binary]] case, a common approach is to apply [[Platt scaling]], which learns a [[logistic regression]] model on the scores.<ref name="platt99">{{cite journal |last=Platt |first=John |author-link=John Platt (computer scientist) |title=Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods |journal=Advances in Large Margin Classifiers |volume=10 |issue=3 |year=1999 |pages=61–74 |url=https://www.researchgate.net/publication/2594015}}</ref>
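A minimal sketch of this idea in Python follows. It fits a two-parameter sigmoid to raw classifier scores by maximum likelihood using Newton's method. The names <code>fit_platt</code>, <code>a</code> and <code>b</code>, the tiny ridge term, and the example data are assumptions made for illustration; the sketch also omits the smoothing of target labels that Platt's paper applies, so it should be read as plain logistic regression on the scores rather than a faithful reimplementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, n_iter=100, ridge=1e-9, tol=1e-10):
    """Fit Pr(y=1 | s) = sigmoid(a*s + b) by Newton's method on the log loss.

    scores: raw classifier outputs; labels: 0/1 targets.
    Assumes the data are not perfectly separable by score, so that the
    maximum-likelihood parameters are finite.
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            w = p * (1.0 - p)
            g_a += (p - y) * s      # gradient of the log loss w.r.t. a
            g_b += (p - y)          # gradient of the log loss w.r.t. b
            h_aa += w * s * s       # Hessian entries
            h_ab += w * s
            h_bb += w
        h_aa += ridge               # keeps the 2x2 Hessian invertible
        h_bb += ridge
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a -= da
        b -= db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical held-out scores (sorted ascending) and their true labels.
scores = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
print(a, b)
print(calibrated)
```

In practice the sigmoid is usually fitted on data that were not used to train the underlying classifier, since scores on the training set tend to be overconfident.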
An alternative method using [[isotonic regression]]<ref>{{Cite book | last1 = Zadrozny | first1 = Bianca| last2 = Elkan | first2 = Charles| doi = 10.1145/775047.775151 | chapter = Transforming classifier scores into accurate multiclass probability estimates | chapter-url = }}</ref> is generally superior to Platt's method when sufficient training data is available.<ref name="Niculescu"/>
In the [[multiclass classification|multiclass]] case, one can reduce the problem to a set of binary tasks, calibrate each binary classifier with one of the univariate methods described above, and then combine the calibrated outputs into a single distribution using the pairwise coupling algorithm of Hastie and Tibshirani.<ref>{{Cite journal | last1 = Hastie | first1 = Trevor| last2 = Tibshirani | first2 = Robert| doi = 10.1214/aos/1028144844 | title = Classification by pairwise coupling | journal = [[The Annals of Statistics]] | volume = 26 | issue = 2 | pages = 451–471| year = 1998 | zbl = 0932.62071| id = [[CiteSeerX]]: {{URL|1=citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.6032|2=10.1.1.46.6032}}| citeseerx = 10.1.1.309.4720 }}</ref>
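The coupling step can be sketched in Python as follows. The sketch assumes that the calibrated pairwise estimates are already available as a matrix <code>r</code>, where <code>r[i][j]</code> approximates the probability of class ''i'' given that the class is either ''i'' or ''j'', and that every estimate lies strictly between zero and one. The function name <code>pairwise_coupling</code>, the optional weight matrix <code>n</code> and the example values are illustrative assumptions. The iteration follows the multiplicative update commonly attributed to Hastie and Tibshirani and should be read as a sketch, not a reference implementation.

```python
def pairwise_coupling(r, n=None, max_iter=1000, tol=1e-12):
    """Combine pairwise estimates into one class distribution.

    r[i][j] ~ Pr(class i | class i or j), with r[i][j] + r[j][i] == 1
    for i != j; diagonal entries are ignored.
    n[i][j] is the number of training samples behind pair (i, j);
    equal weights are assumed when n is omitted.
    """
    k = len(r)
    if n is None:
        n = [[1.0] * k for _ in range(k)]
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(k):
            # Observed versus currently implied pairwise mass for class i.
            observed = sum(n[i][j] * r[i][j] for j in range(k) if j != i)
            implied = sum(n[i][j] * p[i] / (p[i] + p[j])
                          for j in range(k) if j != i)
            old = p[i]
            p[i] *= observed / implied
            total = sum(p)
            p = [v / total for v in p]  # renormalise after each update
            largest_change = max(largest_change, abs(p[i] - old))
        if largest_change < tol:
            break
    return p


# Hypothetical check: build consistent pairwise estimates from a known
# distribution and see whether coupling recovers it.
true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) if i != j else 0.0
      for j in range(3)] for i in range(3)]
estimate = pairwise_coupling(r)
print(estimate)
```

When the pairwise estimates are mutually inconsistent, as is typical for separately trained and calibrated binary classifiers, the result is a compromise distribution and not an exact reconstruction.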