random neural networks,<ref>{{Cite journal |last1=Gelenbe |first1=Erol |last2=Mao |first2=Zhi Hong |last3=Li |first3=Yan D. |year=1999 |title=Function approximation with spiked random networks |url=https://zenodo.org/record/6817275 |journal=IEEE Transactions on Neural Networks |volume=10 |issue=1 |pages=3–9 |doi=10.1109/72.737488 |pmid=18252498}}</ref> and alternative network architectures and topologies.<ref name="kidger" /><ref>{{Cite conference |last1=Lin |first1=Hongzhou |last2=Jegelka |first2=Stefanie|author2-link=Stefanie Jegelka |date=2018 |title=ResNet with one-neuron hidden layers is a Universal Approximator |url=https://papers.nips.cc/paper/7855-resnet-with-one-neuron-hidden-layers-is-a-universal-approximator |publisher=Curran Associates |volume=30 |pages=6169–6178 |journal=Advances in Neural Information Processing Systems}}</ref>
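The one-neuron-per-hidden-layer ResNet result can be illustrated with a minimal sketch (the NumPy code below is a hypothetical illustration, not taken from the cited paper): each residual block adds a rank-one correction, computed from a single ReLU unit, back onto the identity path, so depth rather than width carries the approximation power.

<syntaxhighlight lang="python">
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def one_neuron_resnet(x, blocks):
    """Forward pass of a ResNet in which every hidden layer has one neuron.

    Each block computes x + v * relu(w @ x + b): a single ReLU unit
    (w, b) followed by a rank-one residual correction v, a sketch of the
    kind of architecture studied by Lin & Jegelka (2018).
    """
    for w, b, v in blocks:
        x = x + v * relu(w @ x + b)
    return x

# Example with input dimension d = 2 and five randomly initialised blocks.
rng = np.random.default_rng(0)
d = 2
blocks = [(rng.standard_normal(d), rng.standard_normal(), rng.standard_normal(d))
          for _ in range(5)]
print(one_neuron_resnet(np.array([0.5, -1.0]), blocks))
</syntaxhighlight>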
 
The universal approximation property of width-bounded networks has been studied as a ''dual'' of classical universal approximation results on depth-bounded networks. For input dimension <math>d_x</math> and output dimension <math>d_y</math>, the minimum width required for universal approximation of the ''[[Lp space|L<sup>p</sup>]]'' functions is exactly <math>\max\{d_x + 1, d_y\}</math> (for a ReLU network). <!-- ReLU alone is not sufficient in general: "In light of Theorem 2, is it impossible to approximate <math>C(K, \mathbb{R}^{d_y})</math> in general while maintaining width <math>\max\{d_x + 1, d_y\}</math>? Theorem 3 shows that an additional activation comes to rescue." --> More generally this also holds if ''both'' ReLU and a [[step function|threshold activation function]] are used.<ref name="park" />
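As a concrete instance of the formula: for <math>d_x = 3</math> and <math>d_y = 2</math>, the minimum width is <math>\max\{3 + 1, 2\} = 4</math>. The sketch below (a hypothetical NumPy illustration, not a construction from the cited proof) evaluates a deep ReLU network whose hidden widths all equal this bound.

<syntaxhighlight lang="python">
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def min_width(d_x, d_y):
    # Exact minimum width for L^p universal approximation with ReLU
    # networks (Park et al.): max{d_x + 1, d_y}.
    return max(d_x + 1, d_y)

def narrow_relu_net(x, layers):
    """Forward pass of a deep ReLU network whose hidden widths never
    exceed min_width(d_x, d_y); depth, not width, supplies capacity."""
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return W @ x + b  # linear output layer

# Example: d_x = 3, d_y = 2, so width max{4, 2} = 4 suffices.
d_x, d_y = 3, 2
w = min_width(d_x, d_y)  # 4
rng = np.random.default_rng(1)
dims = [d_x] + [w] * 6 + [d_y]
layers = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(dims[:-1], dims[1:])]
print(narrow_relu_net(rng.standard_normal(d_x), layers))
</syntaxhighlight>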
 
Universal function approximation on graphs (or rather on [[Graph isomorphism|graph isomorphism classes]]) by popular [[Graph neural network|graph convolutional neural networks]] (GCNs or GNNs) can be made as discriminative as the Weisfeiler–Leman graph isomorphism test.<ref name="PowerGNNs">{{Cite conference |last1=Xu |first1=Keyulu |last2=Hu |first2=Weihua |last3=Leskovec |first3=Jure |last4=Jegelka |first4=Stefanie|author4-link=Stefanie Jegelka |date=2019 |title=How Powerful are Graph Neural Networks? |url=https://openreview.net/forum?id=ryGs6iA5Km |journal=International Conference on Learning Representations}}</ref> In 2020, Brüel-Gabrielsson showed that graph representations with certain injective properties are sufficient for universal function approximation on bounded graphs, and for restricted universal function approximation on unbounded graphs, giving an accompanying <math>\mathcal O(\left|V\right| \cdot \left|E\right|)</math>-runtime method that achieved state-of-the-art performance on a collection of benchmarks (where <math>V</math> and <math>E</math> are the sets of nodes and edges of the graph, respectively).<ref name="UniversalGraphs">{{Cite conference |last1=Brüel-Gabrielsson |first1=Rickard |date=2020 |title=Universal Function Approximation on Graphs |url=https://proceedings.neurips.cc//paper/2020/hash/e4acb4c86de9d2d9a41364f93951028d-Abstract.html |publisher=Curran Associates |volume=33 |journal=Advances in Neural Information Processing Systems}}</ref>
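The role of injectivity can be illustrated with a GIN-style layer in the sense of Xu et al. (the NumPy sketch below is illustrative; the helper names are not from either cited paper): sum aggregation, unlike mean or max aggregation, is injective on multisets of neighbour features, which is what ties the layer's discriminative power to the Weisfeiler–Leman test. Each layer makes a single pass over the edge list.

<syntaxhighlight lang="python">
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gin_layer(H, edges, W, eps=0.0):
    """One GIN-style layer with injective sum aggregation.

    H     : (|V|, d) node feature matrix
    edges : list of directed (u, v) pairs
    W     : (d, d) weight matrix of a one-layer MLP (illustrative)

    Summing (rather than averaging) neighbour features keeps the
    aggregation injective on multisets, which is what lets such a
    network match the discriminative power of the Weisfeiler-Leman test.
    """
    agg = np.zeros_like(H)
    for u, v in edges:  # one pass over the edge list per layer
        agg[v] += H[u]
    return relu(((1.0 + eps) * H + agg) @ W)

# Toy graph: a 4-cycle with 2-dimensional node features.
H = np.eye(4, 2)
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 0), (0, 3)]
rng = np.random.default_rng(2)
W = rng.standard_normal((2, 2))
print(gin_layer(H, edges, W))
</syntaxhighlight>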