Revision as of 22:59, 14 April 2024 by Hellacioussatyr (→Bounded depth and bounded width case); revision as of 12:07, 21 April 2024 by Saung Tadashi (→History: Add errata)
The ''arbitrary depth'' case was also studied by a number of authors, such as Gustaf Gripenberg in 2003,<ref name= gripenberg >{{Cite journal\|last1=Gripenberg\|first1=Gustaf\|date=June 2003\|title= Approximation by neural networks with a bounded number of nodes at each level\|journal= Journal of Approximation Theory \|volume=122\|issue=2\|pages=260–266\|doi= 10.1016/S0021-9045(03)00078-9 \|doi-access=}}</ref> Dmitry Yarotsky,<ref>{{cite journal \|last1=Yarotsky \|first1=Dmitry \|title=Error bounds for approximations with deep ReLU networks \|journal=Neural Networks \|date=October 2017 \|volume=94 \|pages=103–114 \|doi=10.1016/j.neunet.2017.07.002 \|pmid=28756334 \|arxiv=1610.01145 \|s2cid=426133 }}</ref> Zhou Lu ''et al.'' in 2017,<ref name="ZhouLu">{{cite journal \|last1=Lu \|first1=Zhou \|last2=Pu \|first2=Hongming \|last3=Wang \|first3=Feicheng \|last4=Hu \|first4=Zhiqiang \|last5=Wang \|first5=Liwei \|title=The Expressive Power of Neural Networks: A View from the Width \|journal=Advances in Neural Information Processing Systems \|volume=30 \|year=2017 \|pages=6231–6239 \|url=http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width \|publisher=Curran Associates \|arxiv=1709.02540 }}</ref> and Boris Hanin and Mark Sellke in 2018,<ref name=hanin>{{cite arXiv \|last1=Hanin\|first1=Boris\|last2=Sellke\|first2=Mark\|title=Approximating Continuous Functions by ReLU Nets of Minimal Width\|eprint=1710.11278\|class=stat.ML\|date=2018}}</ref> who focused on neural networks with the ReLU activation function. In 2020, Patrick Kidger and Terry Lyons<ref name=kidger>{{Cite conference\|last1=Kidger\|first1=Patrick\|last2=Lyons\|first2=Terry\|date=July 2020\|title=Universal Approximation with Deep Narrow Networks\|arxiv=1905.08539\|conference=Conference on Learning Theory}}</ref> extended those results to neural networks with ''general activation functions'' such as
tanh, GeLU, or Swish. In 2022, their result was made quantitative by Léonie Papon and Anastasis Kratsios,<ref name="jmlr.org">{{Cite journal \|last1=Kratsios \|first1=Anastasis \|last2=Papon \|first2=Léonie \|date=2022 \|title=Universal Approximation Theorems for Differentiable Geometric Deep Learning \|url=http://jmlr.org/papers/v23/21-0716.html \|journal=Journal of Machine Learning Research \|volume=23 \|issue=196 \|pages=1–73 \|arxiv=2101.05390 }}</ref> who derived explicit depth estimates depending on the regularity of the target function and of the activation function. The question of the minimal possible width for universality was first studied in 2021, when Park et al. obtained the minimum width required for the universal approximation of ''[[Lp space\|L<sup>p</sup>]]'' functions using feed-forward neural networks with [[Rectifier (neural networks)\|ReLU]] as the activation function.<ref name="park">{{Cite conference \|last1=Park \|first1=Sejun \|last2=Yun \|first2=Chulhee \|last3=Lee \|first3=Jaeho \|last4=Shin \|first4=Jinwoo \|date=2021 \|title=Minimum Width for Universal Approximation \|conference=International Conference on Learning Representations \|arxiv=2006.08859}}</ref> Similar results that can be directly applied to [[residual neural network]]s were also obtained in the same year by Paulo Tabuada and Bahman Gharesifard using [[Control theory\|control-theoretic]] arguments.<ref>{{Cite conference \|last1=Tabuada \|first1=Paulo \|last2=Gharesifard \|first2=Bahman \|date=2021 \|title=Universal approximation power of deep residual neural networks via nonlinear control theory \|conference=International Conference on Learning Representations \|arxiv=2007.06007}}</ref><ref>{{cite journal \|last1=Tabuada \|first1=Paulo \|last2=Gharesifard \|first2=Bahman \|title=Universal Approximation Power of Deep Residual Neural Networks Through the Lens of Control \|journal=IEEE Transactions on Automatic Control \|date=May 2023 \|volume=68 \|issue=5 \|pages=2715–2728 
\|doi=10.1109/TAC.2022.3190051 \|s2cid=250512115 }}{{Erratum\|doi=10.1109/TAC.2024.3390099\|checked=yes}}</ref> In 2023, Cai<ref name=":1">{{Cite journal \|last=Cai \|first=Yongqiang \|date=2023-02-01 \|title=Achieve the Minimum Width of Neural Networks for Universal Approximation \|url=https://openreview.net/forum?id=hfUJ4ShyDEU \|journal=ICLR \|arxiv=2209.11395 \|language=en}}</ref> obtained the optimal minimum width bound for universal approximation. The bounded depth and bounded width case was first studied by Maiorov and Pinkus in 1999.<ref name=maiorov>{{Cite journal\|last1=Maiorov\|first1=Vitaly\|last2=Pinkus\|first2=Allan\|date=April 1999\|title=Lower bounds for approximation by MLP neural networks\|journal=Neurocomputing\|volume=25\|issue=1–3\|pages=81–91\|doi=10.1016/S0925-2312(98)00111-8}}</ref> They showed that there exists an analytic sigmoidal activation function such that neural networks with two hidden layers and a bounded number of units in each hidden layer are universal approximators.

Universal approximation theorem: Difference between revisions