=== Variants ===
Universal approximation theorems have also been established for discontinuous activation functions,<ref name="leshno" /> noncompact domains,<ref name="kidger" /><ref>{{Cite journal |last1=van Nuland |first1=Teun |year=2024 |title=Noncompact uniform universal approximation |url=https://doi.org/10.1016/j.neunet.2024.106181 |journal=Neural Networks |volume=173 |doi=10.1016/j.neunet.2024.106181 |pmid=38412737 |arxiv=2308.03812 }}</ref> certifiable networks,<ref>{{cite conference |last1=Baader |first1=Maximilian |last2=Mirman |first2=Matthew |last3=Vechev |first3=Martin |date=2020 |title=Universal Approximation with Certified Networks |url=https://openreview.net/forum?id=B1gX8kBtPr |conference=ICLR}}</ref>
random neural networks,<ref>{{Cite journal |last1=Gelenbe |first1=Erol |last2=Mao |first2=Zhi Hong |last3=Li |first3=Yan D. |year=1999 |title=Function approximation with spiked random networks |url=https://zenodo.org/record/6817275 |journal=IEEE Transactions on Neural Networks |volume=10 |issue=1 |pages=3–9 |doi=10.1109/72.737488 |pmid=18252498}}</ref> and alternative network architectures and topologies.<ref name="kidger" /><ref>{{Cite conference |last1=Lin |first1=Hongzhou |last2=Jegelka |first2=Stefanie|author2-link=Stefanie Jegelka |date=2018 |title=ResNet with one-neuron hidden layers is a Universal Approximator |url=https://papers.nips.cc/paper/7855-resnet-with-one-neuron-hidden-layers-is-a-universal-approximator |publisher=Curran Associates |volume=30 |pages=6169–6178 |journal=Advances in Neural Information Processing Systems}}</ref>
The universal approximation property of width-bounded networks has been studied as a ''dual'' of classical universal approximation results on depth-bounded networks. For input dimension <math>d_x</math> and output dimension <math>d_y</math>, the minimum width required for universal approximation of the ''[[Lp space|L<sup>p</sup>]]'' functions by a ReLU network is exactly <math>\max\{d_x+1, d_y\}</math>. ReLU alone does not suffice at this width for the uniform approximation of continuous functions on a compact domain; however, the same minimum width <math>\max\{d_x+1, d_y\}</math> is achieved if ''both'' ReLU and a [[step function|threshold activation function]] are used.<ref name="park" />
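As a minimal illustrative sketch (not from the cited work) of what such a width-bounded network looks like, the Python snippet below builds a deep ReLU network in which every hidden layer has exactly the minimal width <math>\max\{d_x+1, d_y\}</math>. The function names, depth, and random weights are assumptions of the example; the theorem concerns the ''existence'' of suitable weights, which random initialization does not provide.

<syntaxhighlight lang="python">
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def narrow_relu_network(d_x, d_y, depth, seed=0):
    """Random weights for a deep ReLU network whose hidden layers all
    have the minimal width max(d_x + 1, d_y) from the theorem."""
    rng = np.random.default_rng(seed)
    width = max(d_x + 1, d_y)  # minimal width for L^p universal approximation
    dims = [d_x] + [width] * depth + [d_y]
    return [(rng.standard_normal((m, n)), rng.standard_normal(m))
            for n, m in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    for W, b in layers[:-1]:      # ReLU on every hidden layer ...
        x = relu(W @ x + b)
    W, b = layers[-1]             # ... affine-only output layer
    return W @ x + b

layers = narrow_relu_network(d_x=3, d_y=2, depth=8)  # hidden width = max(3+1, 2) = 4
print(forward(layers, np.ones(3)))                   # a 2-dimensional output
</syntaxhighlight>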
Universal function approximation on graphs (or rather on [[Graph isomorphism|graph isomorphism classes]]) by popular [[Graph neural network|graph convolutional neural networks]] (GCNs or GNNs) can be made as discriminative as the Weisfeiler–Leman graph isomorphism test.<ref name="PowerGNNs">{{Cite conference |last1=Xu |first1=Keyulu |last2=Hu |first2=Weihua |last3=Leskovec |first3=Jure |last4=Jegelka |first4=Stefanie|author4-link=Stefanie Jegelka |date=2019 |title=How Powerful are Graph Neural Networks? |url=https://openreview.net/forum?id=ryGs6iA5Km |journal=International Conference on Learning Representations}}</ref> In 2020, Brüel-Gabrielsson established a universal approximation result showing that graph representation with certain injective properties is sufficient for universal function approximation on bounded graphs and for restricted universal function approximation on unbounded graphs, with an accompanying <math>\mathcal O(\left|V\right| \cdot \left|E\right|)</math>-runtime method that achieved state-of-the-art performance on a collection of benchmarks (where <math>V</math> and <math>E</math> are the sets of nodes and edges of the graph, respectively).<ref name="UniversalGraphs">{{Cite conference |last1=Brüel-Gabrielsson |first1=Rickard |date=2020 |title=Universal Function Approximation on Graphs |url=https://proceedings.neurips.cc//paper/2020/hash/e4acb4c86de9d2d9a41364f93951028d-Abstract.html |publisher=Curran Associates |volume=33 |journal=Advances in Neural Information Processing Systems}}</ref>
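For context, the sketch below implements the 1-dimensional Weisfeiler–Leman (color refinement) test that bounds the discriminative power of such GNNs. The adjacency-dictionary encoding and function name are assumptions of this example, not taken from the cited papers; the final line exhibits the classic failure case in which a 6-cycle and two disjoint triangles are not distinguished.

<syntaxhighlight lang="python">
from collections import Counter

def wl_color_histogram(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman (color refinement) test.
    adj maps each node to an iterable of its neighbors. Colors are
    nested "signature" tuples, hence canonical across graphs, so the
    returned histograms of two graphs can be compared directly."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

# Classic 1-WL failure case: a 6-cycle vs. two disjoint triangles.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_color_histogram(cycle6) == wl_color_histogram(triangles))  # True: not distinguished
</syntaxhighlight>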
There are also a variety of results for approximation of maps between [[non-Euclidean space]]s<ref name="NonEuclidean">{{Cite conference |last1=Kratsios |first1=Anastasis |last2=Bilokopytov |first2=Eugene |date=2020 |title=Non-Euclidean Universal Approximation |url=https://papers.nips.cc/paper/2020/file/786ab8c4d7ee758f80d57e65582e609d-Paper.pdf |publisher=Curran Associates |volume=33 |journal=Advances in Neural Information Processing Systems}}</ref> and for other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the [[convolutional neural network]] (CNN) architecture,<ref>{{cite journal |last1=Zhou |first1=Ding-Xuan |year=2020 |title=Universality of deep convolutional neural networks |journal=[[Applied and Computational Harmonic Analysis]] |volume=48 |issue=2 |pages=787–794 |arxiv=1805.10769 |doi=10.1016/j.acha.2019.06.004 |s2cid=44113176}}</ref><ref>{{Cite journal |last1=Heinecke |first1=Andreas |last2=Ho |first2=Jinn |last3=Hwang |first3=Wen-Liang |year=2020 |title=Refinement and Universal Approximation via Sparsely Connected ReLU Convolution Nets |journal=IEEE Signal Processing Letters |volume=27 |pages=1175–1179 |bibcode=2020ISPL...27.1175H |doi=10.1109/LSP.2020.3005051 |s2cid=220669183}}</ref> [[radial basis functions]],<ref>{{Cite journal |last1=Park |first1=J. |last2=Sandberg |first2=I. W. |year=1991 |title=Universal Approximation Using Radial-Basis-Function Networks |journal=Neural Computation |volume=3 |issue=2 |pages=246–257 |doi=10.1162/neco.1991.3.2.246 |pmid=31167308 |s2cid=34868087}}</ref> or neural networks with specific properties.<ref>{{cite journal |last1=Yarotsky |first1=Dmitry |year=2021 |title=Universal Approximations of Invariant Maps by Neural Networks |journal=Constructive Approximation |volume=55 |pages=407–474 |arxiv=1804.10306 |doi=10.1007/s00365-021-09546-1 |s2cid=13745401}}</ref><ref>{{cite journal |last1=Zakwan |first1=Muhammad |last2=d’Angelo |first2=Massimiliano |last3=Ferrari-Trecate |first3=Giancarlo |date=2023 |title=Universal Approximation Property of Hamiltonian Deep Neural Networks |journal=IEEE Control Systems Letters |page=1 |arxiv=2303.12147 |doi=10.1109/LCSYS.2023.3288350 |s2cid=257663609}}</ref>