Although Cybenko's 1989 paper states that it uses the "Riesz representation theorem" (which, in its classical form, applies only to Hilbert spaces), the proof actually relies on its generalization, the Riesz–Markov–Kakutani representation theorem, which applies to the Banach spaces of continuous functions that he considers.
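For a compact Hausdorff space <math>K</math>, such as the input domain in Cybenko's setting, this theorem identifies the continuous dual of the Banach space <math>C(K)</math> with the space of finite signed regular Borel measures on <math>K</math>: every bounded linear functional <math>L</math> on <math>C(K)</math> has the form
<math display="block">L(f) = \int_K f \, d\mu, \qquad f \in C(K),</math>
for a unique such measure <math>\mu</math>. In outline, Cybenko's argument applies this representation to a nonzero functional that, by the Hahn–Banach theorem, would vanish on the closure of the set of single-hidden-layer network outputs if that set failed to be dense, and then shows that the corresponding measure must be zero.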
=== Arbitrary depth ===
The ''arbitrary depth'' case was also studied by a number of authors, such as Gustaf Gripenberg in 2003,<ref name= gripenberg >{{Cite journal|last1=Gripenberg|first1=Gustaf|date=June 2003|title= Approximation by neural networks with a bounded number of nodes at each level|journal= Journal of Approximation Theory |volume=122|issue=2|pages=260–266|doi= 10.1016/S0021-9045(03)00078-9 |doi-access=}}</ref> Dmitry Yarotsky,<ref>{{cite journal |last1=Yarotsky |first1=Dmitry |title=Error bounds for approximations with deep ReLU networks |journal=Neural Networks |date=October 2017 |volume=94 |pages=103–114 |doi=10.1016/j.neunet.2017.07.002 |pmid=28756334 |arxiv=1610.01145 |s2cid=426133 }}</ref> Zhou Lu ''et al.'' in 2017,<ref name="ZhouLu">{{cite journal |last1=Lu |first1=Zhou |last2=Pu |first2=Hongming |last3=Wang |first3=Feicheng |last4=Hu |first4=Zhiqiang |last5=Wang |first5=Liwei |title=The Expressive Power of Neural Networks: A View from the Width |journal=Advances in Neural Information Processing Systems |volume=30 |year=2017 |pages=6231–6239 |url=http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width |publisher=Curran Associates |arxiv=1709.02540 }}</ref> and Boris Hanin and Mark Sellke in 2018,<ref name=hanin>{{cite arXiv |last1=Hanin|first1=Boris|last2=Sellke|first2=Mark|title=Approximating Continuous Functions by ReLU Nets of Minimal Width|eprint=1710.11278|class=stat.ML|date=2018}}</ref> who focused on neural networks with the ReLU activation function. In 2020, Patrick Kidger and Terry Lyons<ref name=kidger>{{Cite conference|last1=Kidger|first1=Patrick|last2=Lyons|first2=Terry|date=July 2020|title=Universal Approximation with Deep Narrow Networks|arxiv=1905.08539|conference=Conference on Learning Theory}}</ref> extended those results to neural networks with ''general activation functions'' such as tanh, GeLU, or Swish.
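The following sketch is only an illustration and not one of the constructions from the papers cited above: it fits a deep ReLU network of fixed narrow hidden width (here <math>d_{\text{in}}+1=3</math>) to a smooth function on a compact domain, with the depth, sample count, and optimizer settings chosen arbitrarily.
<syntaxhighlight lang="python">
# Illustrative sketch: a deep, narrow (fixed-width) ReLU network fitted to a
# smooth target function on the compact cube [-1, 1]^2. Hyperparameters are
# arbitrary and unrelated to the constructions in the cited papers.
import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, width, depth = 2, 3, 12          # fixed narrow width, many hidden layers

layers = [nn.Linear(d_in, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]
net = nn.Sequential(*layers)

def target(x):
    # The continuous function to be approximated.
    return torch.sin(3.0 * x[:, :1]) * torch.cos(x[:, 1:2])

x = torch.rand(4096, d_in) * 2 - 1     # samples from [-1, 1]^2
y = target(x)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()  # mean-squared approximation error
    loss.backward()
    opt.step()

print(f"final mean-squared error: {loss.item():.4f}")
</syntaxhighlight>
Increasing the depth while keeping the width fixed is the regime covered by the arbitrary-depth results above; the error actually attained also depends on training, which the approximation theorems do not address.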
One special case of arbitrary depth is that each composition component comes from a finite set of mappings. In 2024, Cai<ref name= cai2024 >{{Cite journal|last1=Cai|first1=Yongqiang|date=2024|title= Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions|journal= ICML|url= https://proceedings.mlr.press/v235/cai24a.html}}</ref> constructed a finite set of mappings, named a vocabulary, such that any continuous function can be approximated by composing a sequence of mappings from this vocabulary. This is similar to the concept of compositionality in linguistics, the idea that a finite vocabulary of basic elements can be combined via a grammar to express an infinite range of meanings.
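Schematically, and using notation introduced here rather than that of the paper, such a result asserts the existence of a finite set <math>V</math> of continuous self-maps of <math>\mathbb{R}^d</math> such that, for every continuous <math>\mathbb{R}^d</math>-valued function <math>f</math> on a compact set <math>K \subset \mathbb{R}^d</math> and every <math>\varepsilon > 0</math>, there are finitely many <math>g_1, \dots, g_k \in V</math> with
<math display="block">\sup_{x \in K} \left\| f(x) - (g_k \circ g_{k-1} \circ \cdots \circ g_1)(x) \right\| < \varepsilon .</math>
The precise hypotheses, in particular the input and output dimensions, are those of the cited paper.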
=== Bounded depth and bounded width ===