Computational learning theory: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 11:29, 30 July 2023 edit Skyerise (talk \| contribs) Extended confirmed users, Pending changes reviewers, Rollbackers 149,364 edits →See also: sort ← Previous edit		Latest revision as of 11:35, 20 August 2025 edit undo Jochen Burghardt (talk \| contribs) Extended confirmed users 24,305 edits →Overview: resolve {{cn}}
(18 intermediate revisions by 16 users not shown)
Line 6: ==Overview== Theoretical results in machine learning ~~mainly~~often ~~deal~~focus ~~with~~on a type of inductive learning ~~called~~known as [[supervised learning]]. In supervised learning, an algorithm is ~~given~~provided ~~samples~~with ~~that are~~[[Labeled data\|labeled]] ~~in some useful way~~samples. For ~~example~~instance, the samples might be descriptions of mushrooms, ~~and the~~with labels ~~could be~~indicating whether orthey ~~not~~are ~~the~~edible ~~mushrooms~~or ~~are edible~~not. The algorithm ~~takes~~uses these ~~previously~~ labeled samples ~~and uses them~~ to ~~induce~~create a classifier. This classifier ~~is a function that~~ assigns labels to new samples, including ~~samples~~those ~~that~~it ~~have~~has not ~~been seen~~ previously ~~by the algorithm~~encountered. The goal of the supervised learning algorithm is to optimize ~~some~~performance ~~measure of performance~~metrics, such as minimizing ~~the number of mistakes made~~errors on new samples. In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning.{{citation needed\|date=October 2017}} In Line 13: * Positive results{{spaced ndash}}Showing that a certain class of functions is learnable in polynomial time. * Negative results{{spaced ndash}}Showing that certain classes cannot be learned in polynomial time.<ref>{{Cite book \|last1=Kearns \|first1=Michael \|title=An Introduction to Computational Learning Theory \|last2=Vazirani \|first2=Umesh \|date=August 15, 1994 \|publisher=MIT Press \|isbn=978-0262111935}}</ref> Negative results often rely on commonly believed, but yet unproven assumptions,{{citation needed\|date=October 2017}} such as: Line 22: There are several different approaches to computational learning theory based on making different assumptions about the [[inference]] principles used to generalise from limited data. This includes different definitions of [[probability]] (see [[frequency probability]], [[Bayesian probability]]) and different assumptions on the generation of samples.{{citation needed\|date=October 2017}} The different approaches include: * Exact learning, proposed by [[Dana Angluin]];<ref>{{cite thesis \| type=Ph.D. thesis \| author=Dana Angluin \| title=An Application of the Theory of Computational Complexity to the Study of Inductive Inference \| institution=University of California at Berkeley \| year=1976 }}</ref><ref>{{cite journal \| url=http://www.sciencedirect.com/science/article/pii/S0019995878906836 \| author=D. Angluin \| title=On the Complexity of Minimum Inference of Regular Sets \| journal=Information and Control \| volume=39 \| number=3 \| pages=337–350 \| year=1978 }}</ref> * Exact learning, proposed by [[Dana Angluin]]{{citation needed\|date=October 2017}}; * [[Probably approximately correct learning]] (PAC learning), proposed by [[Leslie Valiant]];<ref>{{cite journal \|last1=Valiant \|first1=Leslie \|title=A Theory of the Learnable \|journal=Communications of the ACM \|date=1984 \|volume=27 \|issue=11 \|pages=1134–1142 \|doi=10.1145/1968.1972 \|s2cid=12837541 \|url=https://www.montefiore.ulg.ac.be/~geurts/Cours/AML/Readings/Valiant.pdf \|ref=ValTotL \|access-date=2022-11-24 \|archive-date=2019-05-17 \|archive-url=https://web.archive.org/web/20190517235548/http://www.montefiore.ulg.ac.be/~geurts/Cours/AML/Readings/Valiant.pdf \|url-status=dead }}</ref> * [[VC theory]], proposed by [[Vladimir Vapnik]] and [[Alexey Chervonenkis]];<ref>{{cite journal \|last1=Vapnik \|first1=V. \|last2=Chervonenkis \|first2=A. \|title=On the uniform convergence of relative frequencies of events to their probabilities \|journal=Theory of Probability and Its Applications \|date=1971 \|volume=16 \|issue=2 \|pages=264–280 \|doi=10.1137/1116025 \|url=https://courses.engr.illinois.edu/ece544na/fa2014/vapnik71.pdf \|ref=VCdim}}</ref> * [[Solomonoff's theory of inductive inference\|Inductive inference]] as developed by [[Ray Solomonoff]];<ref>{{cite journal \|last1=Solomonoff \|first1=Ray \|title=A Formal Theory of Inductive Inference Part 1 \|journal=Information and Control \|date=March 1964 \|volume=7 \|issue=1 \|pages=~~1-22~~1–22 \|doi=10.1016/S0019-9958(64)90223-2\|doi-access=free }}</ref><ref>{{cite journal \|last1=Solomonoff \|first1=Ray \|title=A Formal Theory of Inductive Inference Part 2 \|journal=Information and Control \|date=1964 \|volume=7 \|issue=2 \|pages=~~224-254~~224–254 \|doi=10.1016/S0019-9958(64)90131-7}}</ref> * [[Algorithmic learning theory]], from the work of [[E. Mark Gold]];<ref>{{Cite journal \| last1 = Gold \| first1 = E. Mark \| year = 1967 \| title = Language identification in the limit \| journal = Information and Control \| volume = 10 \| issue = 5 \| pages = 447–474 \| doi = 10.1016/S0019-9958(67)91165-5 \| url=http://web.mit.edu/~6.863/www/spring2009/readings/gold67limit.pdf \| doi-access = free }}</ref> * [[Online machine learning]], from the work of Nick Littlestone{{citation needed\|date=October 2017}}. Line 35: * [[Grammar induction]] * [[Information theory]] * [[Occam learning]] * [[Stability (learning theory)]] Line 41 ⟶ 42: ==Further reading== A description of some of these publications is given at ~~[[list of important publications in computer science#Machine learning\|~~important publications in machine learning]].▼ ===Surveys=== * Angluin, D. 1992. Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (May 1992), pages 351–369. http://portal.acm.org/citation.cfm?id=129712.129746 Line 52 ⟶ 54: ===Negative results=== * M. Kearns and [[Leslie Valiant]]. 1989. Cryptographic limitations on learning boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433–444, New York. ACM. http://citeseer.ist.psu.edu/kearns89cryptographic.html{{dl\|date=August 2024}} ===Error tolerance=== Line 61 ⟶ 63: * D.Haussler, M.Kearns, N.Littlestone and [[Manfred K. Warmuth\|M. Warmuth]], Equivalence of models for polynomial learnability, Proc. 1st ACM Workshop on Computational Learning Theory, (1988) 42-55. * {{Cite journal \| last1 = Pitt \| first1 = L. \| last2 = Warmuth \| first2 = M. K. \| year = 1990 \| title = Prediction-Preserving Reducibility \| journal = Journal of Computer and System Sciences \| volume = 41 \| issue = 3\| pages = 430–467 \| doi = 10.1016/0022-0000(90)90028-J \| doi-access = free }} ▲A description of some of these publications is given at [[list of important publications in computer science#Machine learning\|important publications in machine learning]]. ==External links==