=== Variants ===
Universal approximation theorems have also been established for discontinuous activation functions,<ref name="leshno" /> noncompact domains,<ref name="kidger" /><ref>{{Cite journal |last1=van Nuland |first1=Teun |year=2024 |title=Noncompact uniform universal approximation |url=https://doi.org/10.1016/j.neunet.2024.106181 |journal=Neural Networks |volume=173 |doi=10.1016/j.neunet.2024.106181 |pmid=38412737 |arxiv=2308.03812 }}</ref> certifiable networks,<ref>{{cite conference |last1=Baader |first1=Maximilian |last2=Mirman |first2=Matthew |last3=Vechev |first3=Martin |date=2020 |title=Universal Approximation with Certified Networks |url=https://openreview.net/forum?id=B1gX8kBtPr |conference=ICLR}}</ref>
random neural networks,<ref>{{Cite journal |last1=Gelenbe |first1=Erol |last2=Mao |first2=Zhi Hong |last3=Li |first3=Yan D. |year=1999 |title=Function approximation with spiked random networks |url=https://zenodo.org/record/6817275 |journal=IEEE Transactions on Neural Networks |volume=10 |issue=1 |pages=3–9 |doi=10.1109/72.737488 |pmid=18252498}}</ref> and alternative network architectures and topologies.<ref name="kidger" /><ref>{{Cite conference |last1=Lin |first1=Hongzhou |last2=Jegelka |first2=Stefanie|author2-link=Stefanie Jegelka |date=2018 |title=ResNet with one-neuron hidden layers is a Universal Approximator |url=https://papers.nips.cc/paper/7855-resnet-with-one-neuron-hidden-layers-is-a-universal-approximator |publisher=Curran Associates |volume=30 |pages=6169–6178 |journal=Advances in Neural Information Processing Systems}}</ref>
The universal approximation property of width-bounded networks has been studied as a ''dual'' of classical universal approximation results on depth-bounded networks. For input dimension <math>d_x</math> and output dimension <math>d_y</math>, the minimum width required for universal approximation of the ''[[Lp space|L<sup>p</sup>]]'' functions by a ReLU network is exactly <math>\max\{d_x+1, d_y\}</math>. ReLU alone does not suffice at this width for the uniform approximation of continuous functions on a compact domain; however, the same minimum width <math>\max\{d_x+1, d_y\}</math> is achieved if ''both'' ReLU and a [[step function|threshold activation function]] are used.<ref name="park" />
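As a minimal illustrative sketch (not from the cited work) of what such a width-bounded network looks like, the Python snippet below builds a deep ReLU network in which every hidden layer has exactly the minimal width <math>\max\{d_x+1, d_y\}</math>. The function names, depth, and random weights are assumptions of the example; the theorem concerns the ''existence'' of suitable weights, which random initialization does not provide.

<syntaxhighlight lang="python">
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def narrow_relu_network(d_x, d_y, depth, seed=0):
    """Random weights for a deep ReLU network whose hidden layers all
    have the minimal width max(d_x + 1, d_y) from the theorem."""
    rng = np.random.default_rng(seed)
    width = max(d_x + 1, d_y)  # minimal width for L^p universal approximation
    dims = [d_x] + [width] * depth + [d_y]
    return [(rng.standard_normal((m, n)), rng.standard_normal(m))
            for n, m in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    for W, b in layers[:-1]:      # ReLU on every hidden layer ...
        x = relu(W @ x + b)
    W, b = layers[-1]             # ... affine-only output layer
    return W @ x + b

layers = narrow_relu_network(d_x=3, d_y=2, depth=8)  # hidden width = max(3+1, 2) = 4
print(forward(layers, np.ones(3)))                   # a 2-dimensional output
</syntaxhighlight>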
Universal function approximation on graphs (or rather on [[Graph isomorphism|graph isomorphism classes]]) by popular [[Graph neural network|graph convolutional neural networks]] (GCNs or GNNs) can be made as discriminative as the Weisfeiler–Leman graph isomorphism test.<ref name="PowerGNNs">{{Cite conference |last1=Xu |first1=Keyulu |last2=Hu |first2=Weihua |last3=Leskovec |first3=Jure |last4=Jegelka |first4=Stefanie|author4-link=Stefanie Jegelka |date=2019 |title=How Powerful are Graph Neural Networks? |url=https://openreview.net/forum?id=ryGs6iA5Km |journal=International Conference on Learning Representations}}</ref> In 2020, Brüel-Gabrielsson established a universal approximation result showing that graph representation with certain injective properties is sufficient for universal function approximation on bounded graphs and for restricted universal function approximation on unbounded graphs, with an accompanying <math>\mathcal O(\left|V\right| \cdot \left|E\right|)</math>-runtime method that achieved state-of-the-art performance on a collection of benchmarks (where <math>V</math> and <math>E</math> are the sets of nodes and edges of the graph, respectively).<ref name="UniversalGraphs">{{Cite conference |last1=Brüel-Gabrielsson |first1=Rickard |date=2020 |title=Universal Function Approximation on Graphs |url=https://proceedings.neurips.cc//paper/2020/hash/e4acb4c86de9d2d9a41364f93951028d-Abstract.html |publisher=Curran Associates |volume=33 |journal=Advances in Neural Information Processing Systems}}</ref>
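For context, the sketch below implements the 1-dimensional Weisfeiler–Leman (color refinement) test that bounds the discriminative power of such GNNs. The adjacency-dictionary encoding and function name are assumptions of this example, not taken from the cited papers; the final line exhibits the classic failure case in which a 6-cycle and two disjoint triangles are not distinguished.

<syntaxhighlight lang="python">
from collections import Counter

def wl_color_histogram(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman (color refinement) test.
    adj maps each node to an iterable of its neighbors. Colors are
    nested "signature" tuples, hence canonical across graphs, so the
    returned histograms of two graphs can be compared directly."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

# Classic 1-WL failure case: a 6-cycle vs. two disjoint triangles.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_color_histogram(cycle6) == wl_color_histogram(triangles))  # True: not distinguished
</syntaxhighlight>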
There are also a variety of results for approximation of maps between [[non-Euclidean space]]s<ref name="NonEuclidean">{{Cite conference |last1=Kratsios |first1=Anastasis |last2=Bilokopytov |first2=Eugene |date=2020 |title=Non-Euclidean Universal Approximation |url=https://papers.nips.cc/paper/2020/file/786ab8c4d7ee758f80d57e65582e609d-Paper.pdf |publisher=Curran Associates |volume=33 |journal=Advances in Neural Information Processing Systems}}</ref> and for other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the [[convolutional neural network]] (CNN) architecture,<ref>{{cite journal |last1=Zhou |first1=Ding-Xuan |year=2020 |title=Universality of deep convolutional neural networks |journal=[[Applied and Computational Harmonic Analysis]] |volume=48 |issue=2 |pages=787–794 |arxiv=1805.10769 |doi=10.1016/j.acha.2019.06.004 |s2cid=44113176}}</ref><ref>{{Cite journal |last1=Heinecke |first1=Andreas |last2=Ho |first2=Jinn |last3=Hwang |first3=Wen-Liang |year=2020 |title=Refinement and Universal Approximation via Sparsely Connected ReLU Convolution Nets |journal=IEEE Signal Processing Letters |volume=27 |pages=1175–1179 |bibcode=2020ISPL...27.1175H |doi=10.1109/LSP.2020.3005051 |s2cid=220669183}}</ref> [[radial basis functions]],<ref>{{Cite journal |last1=Park |first1=J. |last2=Sandberg |first2=I. W. |year=1991 |title=Universal Approximation Using Radial-Basis-Function Networks |journal=Neural Computation |volume=3 |issue=2 |pages=246–257 |doi=10.1162/neco.1991.3.2.246 |pmid=31167308 |s2cid=34868087}}</ref> or neural networks with specific properties.<ref>{{cite journal |last1=Yarotsky |first1=Dmitry |year=2021 |title=Universal Approximations of Invariant Maps by Neural Networks |journal=Constructive Approximation |volume=55 |pages=407–474 |arxiv=1804.10306 |doi=10.1007/s00365-021-09546-1 |s2cid=13745401}}</ref><ref>{{cite journal |last1=Zakwan |first1=Muhammad |last2=d’Angelo |first2=Massimiliano |last3=Ferrari-Trecate |first3=Giancarlo |date=2023 |title=Universal Approximation Property of Hamiltonian Deep Neural Networks |journal=IEEE Control Systems Letters |page=1 |arxiv=2303.12147 |doi=10.1109/LCSYS.2023.3288350 |s2cid=257663609}}</ref>