Probabilistic classification: Difference between revisions

Revision as of 00:13, 26 December 2023 edit Mazewaxie (talk \| contribs) Extended confirmed users, Pending changes reviewers, Rollbackers 113,751 edits m formatting Tag: AWB ← Previous edit		Latest revision as of 17:26, 19 August 2025 edit undo Bender the Bot (talk \| contribs) Bots 1,064,377 edits m →Probability calibration: HTTP to HTTPS for Cornell University Tag: AWB
(5 intermediate revisions by 5 users not shown)
==Probability calibration==
{{Main article|Calibration (statistics)}}
Not all classification models are naturally probabilistic, and some that are, notably naive Bayes classifiers, [[decision tree learning|decision trees]] and [[Boosting (machine learning)|boosting]] methods, produce distorted class probability distributions.<ref name="Niculescu">{{cite conference |last1=Niculescu-Mizil |first1=Alexandru |first2=Rich |last2=Caruana |title=Predicting good probabilities with supervised learning |conference=ICML |year=2005 |url=http://machinelearning.wustl.edu/mlpapers/paper_files/icml2005_Niculescu-MizilC05.pdf |doi=10.1145/1102351.1102430 |url-status=dead |archive-url=https://web.archive.org/web/20140311005243/http://machinelearning.wustl.edu/mlpapers/paper_files/icml2005_Niculescu-MizilC05.pdf |archive-date=2014-03-11 }}</ref> In the case of decision trees, where {{math|Pr(''y''{{!}}'''x''')}} is the proportion of training samples with label {{mvar|y}} in the leaf where {{math|'''x'''}} ends up, these distortions come about because learning algorithms such as [[C4.5]] or [[Predictive analytics#Classification and regression trees|CART]] explicitly aim to produce homogeneous leaves (giving probabilities close to zero or one, and thus high [[Bias of an estimator|bias]]) while using few samples to estimate the relevant proportion (high [[Bias–variance tradeoff|variance]]).<ref>{{cite conference |first1=Bianca |last1=Zadrozny |first2=Charles |last2=Elkan |title=Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers |url=http://cseweb.ucsd.edu/~elkan/calibrated.pdf |year=2001 |conference=ICML |pages=609–616}}</ref>

For the [[binary classification|binary]] case, a common approach is to apply [[Platt scaling]], which learns a [[logistic regression]] model on the scores.<ref name="platt99">{{cite journal |last=Platt |first=John |author-link=John Platt (computer scientist) |title=Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods |journal=Advances in Large Margin Classifiers |volume=10 |issue=3 |year=1999 |pages=61–74 |url=https://www.researchgate.net/publication/2594015}}</ref> An alternative method using [[isotonic regression]]<ref>{{Cite book | last1 = Zadrozny | first1 = Bianca| last2 = Elkan | first2 = Charles| doi = 10.1145/775047.775151 | chapter = Transforming classifier scores into accurate multiclass probability estimates | chapter-url = https://www.cs.cornell.edu/courses/cs678/2007sp/ZadroznyElkan.pdf| title = Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02 | pages = 694–699| year = 2002 | isbn = 978-1-58113-567-1| id = [[CiteSeerX]]: {{URL|1=citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.7457|2=10.1.1.13.7457}}| citeseerx = 10.1.1.164.8140| s2cid = 3349576}}</ref> is generally superior to Platt's method when sufficient training data is available.<ref name="Niculescu"/> In the [[multiclass classification|multiclass]] case, one can use a reduction to binary tasks, followed by univariate calibration with an algorithm as described above and further application of the pairwise coupling algorithm by Hastie and Tibshirani.<ref>{{Cite journal | last1 = Hastie | first1 = Trevor| last2 = Tibshirani | first2 = Robert| doi = 10.1214/aos/1028144844 | title = Classification by pairwise coupling | journal = [[The Annals of Statistics]] | volume = 26 | issue = 2 | pages = 451–471| year = 1998 | zbl = 0932.62071| id = [[CiteSeerX]]: {{URL|1=citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.6032|2=10.1.1.46.6032}}| citeseerx = 10.1.1.309.4720 }}</ref>
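
The two binary calibration methods named above can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the procedure from the cited papers: the Platt-style fit uses plain gradient descent on the log loss and omits the target smoothing of Platt's original method, the isotonic fit is a basic pool-adjacent-violators pass that assumes no tied scores, and the toy `scores` and `labels`, the function names, and the learning-rate settings are all hypothetical.

```python
import math

def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * s + b) by gradient descent on the mean log loss.

    Simplified sketch: no target smoothing, fixed step size (assumptions).
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s   # d(log loss)/da for one sample
            grad_b += (p - y)       # d(log loss)/db for one sample
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_predict(a, b, s):
    """Calibrated probability for a raw score s."""
    return 1.0 / (1.0 + math.exp(-(a * s + b)))

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: a non-decreasing step function of the score.

    Returns a list of (upper score bound, probability) steps.
    Assumes the scores contain no ties.
    """
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = hi
    return [(hi, total / count) for total, count, hi in blocks]

def isotonic_predict(steps, s):
    """Look up the step whose upper bound is the first one at or above s."""
    for hi, p in steps:
        if s <= hi:
            return p
    return steps[-1][1]  # scores beyond the last bound get the top step

# Hypothetical held-out classifier scores and true binary labels.
scores = [-3.0, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 3.0]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
steps = fit_isotonic(scores, labels)

for s in (-2.5, 0.0, 2.5):
    print(s, round(platt_predict(a, b, s), 3), isotonic_predict(steps, s))
```

The Platt-style fit has only two parameters and always yields a smooth sigmoid, whereas the isotonic fit can take any non-decreasing shape. This is consistent with the statement above that isotonic regression tends to do better once enough calibration data is available. In practice both maps would be fitted on data held out from the classifier's own training set.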