{{see also|Statistical learning theory}}
{{Short description|Theory of machine learning}}{{more citations needed|date=November 2018}}
{{Machine learning}}
In [[computer science]], '''computational learning theory''' (or just '''learning theory''') is a subfield of [[artificial intelligence]] devoted to studying the design and analysis of [[machine learning]] algorithms.<ref name="ACL">{{Cite web | url=http://www.learningtheory.org/ | title=ACL - Association for Computational Learning}}</ref>
==Overview==
Theoretical results in machine learning mainly deal with a type of inductive learning called [[supervised learning]]. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance, such as minimizing the number of mistakes made on new samples.
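A minimal sketch of this setting in Python (the threshold concept, the uniform sampling distribution, and the sample sizes below are illustrative assumptions rather than part of any standard formulation):

<syntaxhighlight lang="python">
# Toy supervised-learning setup: labeled samples are drawn from a
# distribution, a classifier is induced from them, and its quality is
# estimated by counting mistakes on previously unseen samples.
import random

random.seed(0)

def target(x):
    """Unknown concept to be learned: is x greater than 0.5?"""
    return x > 0.5

# Draw labeled training samples from a uniform distribution on [0, 1).
train = [(x, target(x)) for x in (random.random() for _ in range(100))]

# Induce a simple threshold classifier from the labeled samples.
threshold = min((x for x, y in train if y), default=1.0)

def classifier(x):
    return x >= threshold

# Estimate the error rate on fresh, previously unseen samples.
test = [random.random() for _ in range(10_000)]
error = sum(classifier(x) != target(x) for x in test) / len(test)
print(f"estimated error on unseen samples: {error:.4f}")
</syntaxhighlight>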
In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning.{{citation needed|date=October 2017}} In computational learning theory, a computation is considered feasible if it can be done in [[polynomial time]].{{citation needed|date=October 2017}} There are two kinds of time complexity results:
* Positive results{{spaced ndash}}Showing that a certain class of functions is learnable in polynomial time.
* Negative results{{spaced ndash}}Showing that certain classes cannot be learned in polynomial time.<ref>{{Cite book |last1=Kearns |first1=Michael |title=An Introduction to Computational Learning Theory |last2=Vazirani |first2=Umesh |date=August 15, 1994 |publisher=MIT Press |isbn=978-0262111935}}</ref>
Negative results often rely on commonly believed but as yet unproven assumptions,{{citation needed|date=October 2017}} such as:
* Computational complexity – [[P versus NP problem|P ≠ NP]] (the P versus NP problem);
* [[cryptography|Cryptographic]] – [[One-way function]]s exist.
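As an illustration of the quantitative guarantees that such positive results provide, one classical sample-complexity bound for a finite hypothesis class <math>H</math> states that, with probability at least <math>1-\delta</math>, every hypothesis consistent with <math>m</math> labeled examples has error at most <math>\varepsilon</math>, provided
<math display="block">m \ \ge\ \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right),</math>
so the number of samples needed grows only polynomially in <math>1/\varepsilon</math> and <math>1/\delta</math>. The restriction to a finite hypothesis class and a consistent learner is an assumption of this particular textbook bound, quoted here only as an example.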
There are several different approaches to computational learning theory based on making different assumptions about the [[inference]] principles used to generalise from limited data. This includes different definitions of [[probability]] (see [[frequency probability]], [[Bayesian probability]]) and different assumptions on the generation of samples.{{citation needed|date=October 2017}} The different approaches include:
* Exact learning, proposed by [[Dana Angluin]];<ref>{{cite thesis | type=Ph.D. thesis | author=Dana Angluin | title=An Application of the Theory of Computational Complexity to the Study of Inductive Inference | institution=University of California at Berkeley | year=1976 }}</ref><ref>{{cite journal | url=http://www.sciencedirect.com/science/article/pii/S0019995878906836 | author=D. Angluin | title=On the Complexity of Minimum Inference of Regular Sets | journal=Information and Control | volume=39 | number=3 | pages=337–350 | year=1978 }}</ref>
* [[Probably approximately correct learning]] (PAC learning), proposed by [[Leslie Valiant]];<ref>{{cite journal |last1=Valiant |first1=Leslie |title=A Theory of the Learnable |journal=Communications of the ACM |date=1984 |volume=27 |issue=11 |pages=1134–1142 |doi=10.1145/1968.1972 |s2cid=12837541 |url=https://www.montefiore.ulg.ac.be/~geurts/Cours/AML/Readings/Valiant.pdf |ref=ValTotL |access-date=2022-11-24 |archive-date=2019-05-17 |archive-url=https://web.archive.org/web/20190517235548/http://www.montefiore.ulg.ac.be/~geurts/Cours/AML/Readings/Valiant.pdf |url-status=dead }}</ref>
* [[VC theory]], proposed by [[Vladimir Vapnik]] and [[Alexey Chervonenkis]];<ref>{{cite journal |last1=Vapnik |first1=V. |last2=Chervonenkis |first2=A. |title=On the uniform convergence of relative frequencies of events to their probabilities |journal=Theory of Probability and Its Applications |date=1971 |volume=16 |issue=2 |pages=264–280 |doi=10.1137/1116025 |url=https://courses.engr.illinois.edu/ece544na/fa2014/vapnik71.pdf |ref=VCdim}}</ref>
* [[Solomonoff's theory of inductive inference|Inductive inference]] as developed by [[Ray Solomonoff]];<ref>{{cite journal |last1=Solomonoff |first1=Ray |title=A Formal Theory of Inductive Inference Part 1 |journal=Information and Control |date=March 1964 |volume=7 |issue=1 |pages=1–22 |doi=10.1016/S0019-9958(64)90223-2|doi-access=free }}</ref><ref>{{cite journal |last1=Solomonoff |first1=Ray |title=A Formal Theory of Inductive Inference Part 2 |journal=Information and Control |date=1964 |volume=7 |issue=2 |pages=224–254 |doi=10.1016/S0019-9958(64)90131-7}}</ref>
* [[Algorithmic learning theory]], from the work of [[E. Mark Gold]];<ref>{{Cite journal | last1 = Gold | first1 = E. Mark | year = 1967 | title = Language identification in the limit | journal = Information and Control | volume = 10 | issue = 5 | pages = 447–474 | doi = 10.1016/S0019-9958(67)91165-5 | url = http://web.mit.edu/~6.863/www/spring2009/readings/gold67limit.pdf }}</ref>
* [[Online machine learning]], from the work of Nick Littlestone{{citation needed|date=October 2017}}.
While its primary goal is to understand learning abstractly, computational learning theory has led to the development of practical algorithms. For example, PAC theory inspired [[Boosting (meta-algorithm)|boosting]], VC theory led to [[support vector machine]]s, and Bayesian inference led to [[belief networks]].
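A minimal illustration of two of these practically motivated algorithms (assuming the third-party scikit-learn library and a synthetic dataset, neither of which is implied by the theory itself):

<syntaxhighlight lang="python">
# Train a support vector machine (associated with VC theory) and an AdaBoost
# ensemble (associated with PAC-style boosting) on a synthetic dataset, then
# compare their accuracy on held-out samples.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

svm = SVC(kernel="rbf").fit(X_train, y_train)
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

print("SVM test accuracy:     ", svm.score(X_test, y_test))
print("AdaBoost test accuracy:", boost.score(X_test, y_test))
</syntaxhighlight>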
==See also==
* [[Grammar induction]]
* [[Information theory]]
* [[Stability (learning theory)]]
* [[Error Tolerance (PAC learning)]]
==References==
{{Reflist}}
==Further reading==
A description of some of these publications is given at [[list of important publications in computer science#Machine learning|important publications in machine learning]].
===Surveys===
* Angluin, D. 1992. Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (May 1992), pages 351–369. http://portal.acm.org/citation.cfm?id=129712.129746
* D. Haussler. Probably approximately correct learning. In AAAI-90 Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pages 1101–1108. American Association for Artificial Intelligence, 1990. http://citeseer.ist.psu.edu/haussler90probably.html
===Feature selection===
* A. Dhagat and L. Hellerstein, "PAC learning with irrelevant attributes", in 'Proceedings of the IEEE Symp. on Foundation of Computer Science', 1994. http://citeseer.ist.psu.edu/dhagat94pac.html
===Optimal O notation learning===
* [[Oded Goldreich]], [[Dana Ron]]. ''On universal learning algorithms''.
===Negative results===
* M. Kearns and [[Leslie Valiant]]. 1989. Cryptographic limitations on learning boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433–444, New York. ACM. http://citeseer.ist.psu.edu/kearns89cryptographic.html{{dl|date=August 2024}}
===Error tolerance===
===Equivalence===
* D. Haussler, M. Kearns, N. Littlestone and [[Manfred K. Warmuth|M. Warmuth]], Equivalence of models for polynomial learnability, Proc. 1st ACM Workshop on Computational Learning Theory, (1988) 42–55.
* {{Cite journal | last1 = Pitt | first1 = L. | last2 = Warmuth | first2 = M. K. | year = 1990 | title = Prediction-Preserving Reducibility | journal = Journal of Computer and System Sciences | volume = 41 | issue = 3 | pages = 430–467 }}
==External links==
[[Category:Computational learning theory| ]]
[[Category:Machine learning]]
[[Category:Computational fields of study]]