Universal approximation theorem: Difference between revisions

Such an <math>f</math> can also be approximated by a network of greater depth by using the same construction for the first layer and approximating the identity function with later layers.
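One way to make the identity-approximation step concrete (assuming, for illustration, that <math>\sigma</math> is differentiable at some point <math>\theta</math> with <math>\sigma'(\theta) \neq 0</math>): a single unit in a later layer can approximate the identity on a compact set, since
<math display="block">\frac{\sigma(\theta + h x) - \sigma(\theta)}{h\,\sigma'(\theta)} \longrightarrow x \quad \text{as } h \to 0,</math>
uniformly for <math>x</math> in a bounded set, so composing such near-identity layers with the first-layer construction changes the overall approximation error by an arbitrarily small amount.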
 
{{Math proof|title=Proof sketch|proof=It suffices to prove the case where <math>m = 1</math>, since uniform convergence in <math>\R^m</math> is just uniform convergence in each coordinate.
 
Let <math>F_\sigma</math> be the set of all one-hidden-layer neural networks constructed with activation function <math>\sigma</math>. Let <math>C_0(\R^d, \R)</math> be the set of all functions in <math>C(\R^d, \R)</math> with compact support.
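Explicitly, for scalar output these are the functions of the form
<math display="block">x \mapsto \sum_{i=1}^{N} c_i\, \sigma(w_i \cdot x + b_i), \qquad N \in \N,\ c_i, b_i \in \R,\ w_i \in \R^d.</math>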
 
Otherwise, we show that <math>F_\sigma</math>'s closure is all of <math>C_0(\R^d, \R)</math>. Suppose we can construct arbitrarily good approximations of the ramp function
<math display="block">r(x) = \begin{cases}
-1 & \text{if } x < -1 \\
\phantom{+}x & \text{if } |x|\leq 1 \\
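For example, with the ReLU activation <math>\sigma(x) = \max(0,x)</math>, this ramp is realized exactly by a two-unit network: <math>r(x) = \sigma(x+1) - \sigma(x-1) - 1</math>.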
 
# For any <math> f \in C[a,b] </math> and <math> \varepsilon > 0</math> there exist numbers <math> c_1,c_2,\theta_1</math> and <math> \theta_2</math> such that for all <math>x \in [a,b] </math> <math display='block'> |f(x) - c_1 \sigma(x - \theta_1) - c_2 \sigma(x - \theta_2)| < \varepsilon</math>
# For any continuous function <math>F</math> on the <math>d</math>-dimensional box <math>[a,b]^{d}</math> and <math>\varepsilon > 0</math>, there exist constants <math>e_p</math>, <math>c_{pq}</math>, <math>\theta_{pq}</math> and <math>\zeta_p</math> such that the inequality <math display='block'> \left| F(\mathbf{x}) - \sum_{p=1}^{2d+2} e_p \sigma \left( \sum_{q=1}^{d} c_{pq} \sigma(\mathbf{w}^{q} \cdot \mathbf{x} - \theta_{pq}) - \zeta_p \right) \right| < \varepsilon</math> holds for all <math>\mathbf{x} = (x_1, \ldots, x_d) \in [a, b]^{d}</math>. Here the weights <math>\mathbf{w}^{q}</math>, <math>q = 1, \ldots, d</math>, are fixed as follows: <math display='block'> \mathbf{w}^{1} = (1, 0, \ldots, 0), \quad \mathbf{w}^{2} = (0, 1, \ldots, 0), \quad \ldots, \quad \mathbf{w}^{d} = (0, 0, \ldots, 1). </math> In addition, all the coefficients <math>e_p</math>, except one, are equal.
}}
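For instance, in the one-dimensional case <math>d = 1</math>, the second statement says that any <math>F \in C[a,b]</math> can be uniformly approximated to within <math>\varepsilon</math> by an expression of the form <math display="block">\sum_{p=1}^{4} e_p\, \sigma\left(c_{p1}\, \sigma(x - \theta_{p1}) - \zeta_p\right),</math> that is, by a network with two hidden layers whose inner weights are fixed to <math>1</math>.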
 
== Graph input ==
 
Achieving useful universal function approximation on graphs (or rather on [[Graph isomorphism|graph isomorphism classes]]) has been a longstanding problem. The popular graph convolutional neural networks (GCNs or GNNs) can be made as discriminative as the Weisfeiler–Leman graph isomorphism test.<ref name=PowerGNNs>{{Cite conference|last1=Xu|first1=Keyulu|last2=Hu|first2=Weihua|last3=Leskovec|first3=Jure|last4=Jegelka|first4=Stefanie|date=2019|title=How Powerful are Graph Neural Networks?|url=https://openreview.net/forum?id=ryGs6iA5Km|journal=International Conference on Learning Representations}}</ref> In 2020,<ref name=UniversalGraphs>{{Cite conference|last1=Brüel-Gabrielsson|first1=Rickard|date=2020|title=Universal Function Approximation on Graphs|url=https://proceedings.neurips.cc//paper/2020/hash/e4acb4c86de9d2d9a41364f93951028d-Abstract.html|publisher=Curran Associates|journal=Advances in Neural Information Processing Systems |volume=33}}</ref> a universal approximation theorem result was established by Brüel-Gabrielsson, showing that a graph representation with certain injective properties suffices for universal function approximation on bounded graphs and for restricted universal function approximation on unbounded graphs, with an accompanying <math>\mathcal{O}(\left|V\right| \cdot \left|E\right|)</math>-runtime method that performed at the state of the art on a collection of benchmarks (where <math>V</math> and <math>E</math> are the sets of nodes and edges of the graph, respectively).
 
== See also ==