Universal approximation theorem: Difference between revisions

Such an <math>f</math> can also be approximated by a network of greater depth by using the same construction for the first layer and approximating the identity function with later layers.
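One way to make the identity-approximation step concrete (assuming, for illustration, that <math>\sigma</math> is differentiable at some point <math>\theta</math> with <math>\sigma'(\theta) \neq 0</math>): a single unit in a later layer can approximate the identity on a compact set, since
<math display="block">\frac{\sigma(\theta + h x) - \sigma(\theta)}{h\,\sigma'(\theta)} \longrightarrow x \quad \text{as } h \to 0,</math>
uniformly for <math>x</math> in a bounded set, so composing such near-identity layers with the first-layer construction changes the overall approximation error by an arbitrarily small amount.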
 
{{Math proof|title=Proof sketch|proof=It suffices to prove the case where <math>m = 1</math>, since uniform convergence in <math>\R^m</math> is just uniform convergence in each coordinate.
 
Let <math>F_\sigma</math> be the set of all one-hidden-layer neural networks constructed with activation function <math>\sigma</math>. Let <math>C_0(\R^d, \R)</math> be the set of all functions in <math>C(\R^d, \R)</math> with compact support.
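Explicitly, for scalar output these are the functions of the form
<math display="block">x \mapsto \sum_{i=1}^{N} c_i\, \sigma(w_i \cdot x + b_i), \qquad N \in \N,\ c_i, b_i \in \R,\ w_i \in \R^d.</math>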
 
Otherwise, we show that <math>F_\sigma</math>'s closure is all of <math>C_0(\R^d, \R)</math>. Suppose we can construct arbitrarily good approximations of the ramp function
<math display="block">r(x) = \begin{cases}
-1 & \text{if } x < -1 \\
\phantom{+}x & \text{if } |x|\leq 1 \\
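For example, with the ReLU activation <math>\sigma(x) = \max(0,x)</math>, this ramp is realized exactly by a two-unit network: <math>r(x) = \sigma(x+1) - \sigma(x-1) - 1</math>.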
 
# For any <math> f \in C[a,b] </math> and <math> \varepsilon > 0</math> there exist numbers <math> c_1,c_2,\theta_1</math> and <math> \theta_2</math> such that for all <math>x \in [a,b] </math> <math display='block'> |f(x) - c_1 \sigma(x - \theta_1) - c_2 \sigma(x - \theta_2)| < \varepsilon</math>
# For any continuous function <math>F</math> on the <math>d</math>-dimensional box <math>[a,b]^{d}</math> and <math>\varepsilon > 0</math>, there exist constants <math>e_p</math>, <math>c_{pq}</math>, <math>\theta_{pq}</math> and <math>\zeta_p</math> such that the inequality <math display='block'> \left| F(\mathbf{x}) - \sum_{p=1}^{2d+2} e_p \sigma \left( \sum_{q=1}^{d} c_{pq} \sigma(\mathbf{w}^{q} \cdot \mathbf{x} - \theta_{pq}) - \zeta_p \right) \right| < \varepsilon</math> holds for all <math>\mathbf{x} = (x_1, \ldots, x_d) \in [a, b]^{d}</math>. Here the weights <math>\mathbf{w}^{q}</math>, <math>q = 1, \ldots, d</math>, are fixed as follows: <math display='block'> \mathbf{w}^{1} = (1, 0, \ldots, 0), \quad \mathbf{w}^{2} = (0, 1, \ldots, 0), \quad \ldots, \quad \mathbf{w}^{d} = (0, 0, \ldots, 1). </math> In addition, all the coefficients <math>e_p</math>, except one, are equal.
}}
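For instance, in the one-dimensional case <math>d = 1</math>, the second statement says that any <math>F \in C[a,b]</math> can be uniformly approximated to within <math>\varepsilon</math> by an expression of the form <math display="block">\sum_{p=1}^{4} e_p\, \sigma\left(c_{p1}\, \sigma(x - \theta_{p1}) - \zeta_p\right),</math> that is, by a network with two hidden layers whose inner weights are fixed to <math>1</math>.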
 
== Graph input ==
 
Achieving useful universal function approximation on graphs (or rather on [[Graph isomorphism|graph isomorphism classes]]) has been a longstanding problem. The popular graph convolutional neural networks (GCNs or GNNs) can be made as discriminative as the Weisfeiler–Leman graph isomorphism test.<ref name=PowerGNNs>{{Cite conference|last1=Xu|first1=Keyulu|last2=Hu|first2=Weihua|last3=Leskovec|first3=Jure|last4=Jegelka|first4=Stefanie|date=2019|title=How Powerful are Graph Neural Networks?|url=https://openreview.net/forum?id=ryGs6iA5Km|journal=International Conference on Learning Representations}}</ref> In 2020,<ref name=UniversalGraphs>{{Cite conference|last1=Brüel-Gabrielsson|first1=Rickard|date=2020|title=Universal Function Approximation on Graphs|url=https://proceedings.neurips.cc//paper/2020/hash/e4acb4c86de9d2d9a41364f93951028d-Abstract.html|publisher=Curran Associates|journal=Advances in Neural Information Processing Systems |volume=33}}</ref> a universal approximation theorem result was established by Brüel-Gabrielsson, showing that a graph representation with certain injective properties suffices for universal function approximation on bounded graphs and for restricted universal function approximation on unbounded graphs, with an accompanying <math>\mathcal{O}(\left|V\right| \cdot \left|E\right|)</math>-runtime method that performed at the state of the art on a collection of benchmarks (where <math>V</math> and <math>E</math> are the sets of nodes and edges of the graph, respectively).
 
== See also ==