Revision as of 20:38, 15 September 2024 Bender235 (talk | contribs) →Kolmogorov network: rewritten
Revision as of 04:39, 9 October 2024 Citation bot (talk | contribs) Add: arxiv, pages, bibcode. | Use this bot. Report bugs. | Suggested by Dominic3203 | #UCB_webform 65/199
Line 18:
=== Arbitrary width ===
The first examples were the ''arbitrary width'' case. [[George Cybenko]] in 1989 proved it for [[sigmoid function|sigmoid]] activation functions.<ref name="cyb">{{cite journal |citeseerx=10.1.1.441.7873 |doi=10.1007/BF02551274|title=Approximation by superpositions of a sigmoidal function|year=1989|last1=Cybenko|first1=G.|journal=Mathematics of Control, Signals, and Systems|volume=2|issue=4|pages=303–314|bibcode=1989MCSS....2..303C |s2cid=3958369}}</ref> {{ill|Kurt Hornik|de}}, Maxwell Stinchcombe, and [[Halbert White]] showed in 1989 that multilayer [[feed-forward network]]s with as few as one hidden layer are universal approximators.<ref name="MLP-UA" /> Hornik also showed in 1991<ref name="horn">{{Cite journal|doi=10.1016/0893-6080(91)90009-T|title=Approximation capabilities of multilayer feedforward networks|year=1991|last1=Hornik|first1=Kurt|journal=Neural Networks|volume=4|issue=2|pages=251–257|s2cid=7343126 }}</ref> that it is not the specific choice of the activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential of being universal approximators.
Moshe Leshno ''et al.'' in 1993<ref name="leshno">{{Cite journal|last1=Leshno|first1=Moshe|last2=Lin|first2=Vladimir Ya.|last3=Pinkus|first3=Allan|last4=Schocken|first4=Shimon|date=January 1993|title=Multilayer feedforward networks with a nonpolynomial activation function can approximate any function|journal=Neural Networks|volume=6|issue=6|pages=861–867|doi=10.1016/S0893-6080(05)80131-5|s2cid=206089312|url=http://archive.nyu.edu/handle/2451/14329 }}</ref> and later Allan Pinkus in 1999<ref name="pinkus">{{Cite journal|last=Pinkus|first=Allan|date=January 1999|title=Approximation theory of the MLP model in neural networks|journal=Acta Numerica|volume=8|pages=143–195|doi=10.1017/S0962492900002919|bibcode=1999AcNum...8..143P|s2cid=16800260 }}</ref> showed that the universal approximation property is equivalent to having a nonpolynomial activation function.

=== Arbitrary depth ===
The ''arbitrary depth'' case was also studied by a number of authors, such as Gustaf Gripenberg in 2003,<ref name="gripenberg">{{Cite journal|last1=Gripenberg|first1=Gustaf|date=June 2003|title= Approximation by neural networks with a bounded number of nodes at each level|journal= Journal of Approximation Theory |volume=122|issue=2|pages=260–266|doi= 10.1016/S0021-9045(03)00078-9 |doi-access=}}</ref> Dmitry Yarotsky,<ref>{{cite journal |last1=Yarotsky |first1=Dmitry |title=Error bounds for approximations with deep ReLU networks |journal=Neural Networks |date=October 2017 |volume=94 |pages=103–114 |doi=10.1016/j.neunet.2017.07.002 |pmid=28756334 |arxiv=1610.01145 |s2cid=426133 }}</ref> Zhou Lu ''et al.'' in 2017,<ref name="ZhouLu">{{cite journal |last1=Lu |first1=Zhou |last2=Pu |first2=Hongming |last3=Wang |first3=Feicheng |last4=Hu |first4=Zhiqiang |last5=Wang |first5=Liwei |title=The Expressive Power of Neural Networks: A View from the Width |journal=Advances in Neural Information Processing Systems |volume=30 |year=2017 |pages=6231–6239 |url=http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width |publisher=Curran Associates |arxiv=1709.02540 }}</ref> and Boris Hanin and Mark Sellke in 2018,<ref name="hanin">{{cite arXiv |last1=Hanin|first1=Boris|last2=Sellke|first2=Mark|title=Approximating Continuous Functions by ReLU Nets of Minimal Width|eprint=1710.11278|class=stat.ML|date=2018}}</ref> who focused on neural networks with the ReLU activation function. In 2020, Patrick Kidger and Terry Lyons<ref name="kidger">{{Cite conference|last1=Kidger|first1=Patrick|last2=Lyons|first2=Terry|date=July 2020|title=Universal Approximation with Deep Narrow Networks|arxiv=1905.08539|conference=Conference on Learning Theory}}</ref> extended those results to neural networks with ''general activation functions'' such as tanh, GeLU, or Swish.

One special case of arbitrary depth is that in which each composition component comes from a finite set of mappings. In 2024, Cai<ref name="cai2024">{{Cite journal|last1=Cai|first1=Yongqiang|date=2024|title= Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions|journal= ICML|pages=5189–5208 |arxiv=2305.12205 |url= https://proceedings.mlr.press/v235/cai24a.html}}</ref> constructed a finite set of mappings, named a vocabulary, such that any continuous function can be approximated by composing a sequence of mappings from the vocabulary. This is similar to the concept of compositionality in linguistics, which is the idea that a finite vocabulary of basic elements can be combined via grammar to express an infinite range of meanings.

=== Bounded depth and bounded width ===

Universal approximation theorem: Difference between revisions