Universal approximation theorem: Difference between revisions

=== Arbitrary width ===
The first results addressed the ''arbitrary width'' case. [[George Cybenko]] proved the theorem in 1989 for [[sigmoid function|sigmoid]] activation functions.<ref name="cyb">{{cite journal |citeseerx=10.1.1.441.7873 |doi=10.1007/BF02551274|title=Approximation by superpositions of a sigmoidal function|year=1989|last1=Cybenko|first1=G.|journal=Mathematics of Control, Signals, and Systems|volume=2|issue=4|pages=303–314|bibcode=1989MCSS....2..303C |s2cid=3958369}}</ref> {{ill|Kurt Hornik|de}}, Maxwell Stinchcombe, and [[Halbert White]] showed in 1989 that multilayer [[feed-forward network]]s with as few as one hidden layer are universal approximators.<ref name="MLP-UA" /> In 1991, Hornik also showed<ref name="horn">{{Cite journal|doi=10.1016/0893-6080(91)90009-T|title=Approximation capabilities of multilayer feedforward networks|year=1991|last1=Hornik|first1=Kurt|journal=Neural Networks|volume=4|issue=2|pages=251–257|s2cid=7343126 }}</ref> that it is not the specific choice of activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential to be universal approximators. Moshe Leshno ''et al.'' in 1993<ref name="leshno">{{Cite journal|last1=Leshno|first1=Moshe|last2=Lin|first2=Vladimir Ya.|last3=Pinkus|first3=Allan|last4=Schocken|first4=Shimon|date=January 1993|title=Multilayer feedforward networks with a nonpolynomial activation function can approximate any function|journal=Neural Networks|volume=6|issue=6|pages=861–867|doi=10.1016/S0893-6080(05)80131-5|s2cid=206089312|url=http://archive.nyu.edu/handle/2451/14329 }}</ref> and later Allan Pinkus in 1999<ref name="pinkus">{{Cite journal|last=Pinkus|first=Allan|date=January 1999|title=Approximation theory of the MLP model in neural networks|journal=Acta Numerica|volume=8|pages=143–195|doi=10.1017/S0962492900002919|bibcode=1999AcNum...8..143P|s2cid=16800260 }}</ref> showed that the universal approximation property is equivalent to having a nonpolynomial activation function.
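As an informal illustration of the arbitrary width case, the following Python sketch builds a network with one hidden layer of ReLU units (a nonpolynomial activation) that linearly interpolates a continuous function on an interval. It is not the construction used in any of the cited proofs; the function names <code>build_relu_interpolant</code> and <code>max_error</code>, the choice of equally spaced knots, and the target function <code>sin</code> are assumptions made for this example. The point is only that the worst-case error shrinks as the hidden layer gets wider.

```python
import math


def relu(z):
    """Rectified linear unit, a nonpolynomial activation function."""
    return max(0.0, z)


def build_relu_interpolant(f, a, b, width):
    """Return a one-hidden-layer ReLU network with `width` hidden units.

    Hypothetical helper for illustration: the network linearly
    interpolates f at width + 1 equally spaced knots on [a, b].
    """
    h = (b - a) / width
    knots = [a + i * h for i in range(width + 1)]
    values = [f(x) for x in knots]
    # Slope of the interpolant on each of the `width` sub-intervals.
    slopes = [(values[i + 1] - values[i]) / h for i in range(width)]
    # Output weight of hidden unit i is the change of slope at knot i.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]
    bias = values[0]

    def network(x):
        # Hidden unit i computes relu(x - knots[i]); the output layer is linear.
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots[:width]))

    return network


def max_error(f, g, a, b, samples=2001):
    """Largest absolute difference between f and g on a uniform grid over [a, b]."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(samples))


if __name__ == "__main__":
    for width in (4, 16, 64):
        net = build_relu_interpolant(math.sin, 0.0, math.pi, width)
        print(width, max_error(math.sin, net, 0.0, math.pi))
```

Running the script prints the grid error for three widths; for a twice-differentiable target the error of linear interpolation is bounded by a constant times the squared knot spacing, so each fourfold increase in width should reduce the error substantially.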
 
=== Arbitrary depth ===
The ''arbitrary depth'' case was also studied by a number of authors, including Gustaf Gripenberg in 2003,<ref name= gripenberg >{{Cite journal|last1=Gripenberg|first1=Gustaf|date=June 2003|title= Approximation by neural networks with a bounded number of nodes at each level|journal= Journal of Approximation Theory |volume=122|issue=2|pages=260–266|doi= 10.1016/S0021-9045(03)00078-9 |doi-access=}}</ref> Dmitry Yarotsky in 2017,<ref>{{cite journal |last1=Yarotsky |first1=Dmitry |title=Error bounds for approximations with deep ReLU networks |journal=Neural Networks |date=October 2017 |volume=94 |pages=103–114 |doi=10.1016/j.neunet.2017.07.002 |pmid=28756334 |arxiv=1610.01145 |s2cid=426133 }}</ref> Zhou Lu ''et al.'' in 2017,<ref name="ZhouLu">{{cite journal |last1=Lu |first1=Zhou |last2=Pu |first2=Hongming |last3=Wang |first3=Feicheng |last4=Hu |first4=Zhiqiang |last5=Wang |first5=Liwei |title=The Expressive Power of Neural Networks: A View from the Width |journal=Advances in Neural Information Processing Systems |volume=30 |year=2017 |pages=6231–6239 |url=http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width |publisher=Curran Associates |arxiv=1709.02540 }}</ref> and Boris Hanin and Mark Sellke in 2018,<ref name=hanin>{{cite arXiv |last1=Hanin|first1=Boris|last2=Sellke|first2=Mark|title=Approximating Continuous Functions by ReLU Nets of Minimal Width|eprint=1710.11278|class=stat.ML|date=2018}}</ref> who focused on neural networks with the ReLU activation function. In 2020, Patrick Kidger and Terry Lyons<ref name=kidger>{{Cite conference|last1=Kidger|first1=Patrick|last2=Lyons|first2=Terry|date=July 2020|title=Universal Approximation with Deep Narrow Networks|arxiv=1905.08539|conference=Conference on Learning Theory}}</ref> extended those results to neural networks with ''general activation functions'' such as tanh, GELU, or Swish.
 
A special case of arbitrary depth arises when each component of the composition is drawn from a finite set of mappings. In 2024, Cai<ref name= cai2024 >{{Cite journal|last1=Cai|first1=Yongqiang|date=2024|title= Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions|journal= ICML|pages=5189–5208 |arxiv=2305.12205 |url= https://proceedings.mlr.press/v235/cai24a.html}}</ref> constructed a finite set of mappings, called a vocabulary, such that any continuous function can be approximated by composing a sequence of mappings from the vocabulary. This is analogous to compositionality in linguistics, the idea that a finite vocabulary of basic elements can be combined via grammar to express an infinite range of meanings.
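The composition statement can be written schematically as follows. This is only a sketch of the shape of the claim: the symbols <math>V</math>, <math>\phi_i</math>, <math>\Omega</math>, and the choice of norm are notation assumed here for illustration and are not taken from the cited paper, whose precise hypotheses should be consulted directly.

```latex
% Assumed notation (not from the source): V is the finite vocabulary,
% \Omega \subset \mathbb{R}^d is a compact set, f is continuous on \Omega,
% and \| \cdot \|_{\Omega} is a suitable norm for functions on \Omega.
V = \{\phi_1, \dots, \phi_m\}, \qquad
\bigl\| f - \phi_{i_n} \circ \cdots \circ \phi_{i_2} \circ \phi_{i_1} \bigr\|_{\Omega} < \varepsilon,
\qquad i_1, \dots, i_n \in \{1, \dots, m\}.
```

Here the vocabulary <math>V</math> is fixed in advance, while the length <math>n</math> and the index sequence <math>i_1, \dots, i_n</math> depend on the target function and on the tolerance <math>\varepsilon</math>, much as a sentence's length and word choice depend on what it needs to express.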
 
=== Bounded depth and bounded width ===