 
The first result on the approximation capabilities of neural networks with a bounded number of layers, each containing a limited number of artificial neurons, was obtained by Maiorov and Pinkus.<ref name=maiorov /> They showed that such networks can be universal approximators and that two hidden layers suffice to achieve this property.
{{math theorem
| name = Universal approximation theorem<ref name=maiorov />
| math_statement = There exists an activation function <math>\sigma</math> which is analytic, strictly increasing and sigmoidal and has the following property: For any <math>f\in C[0,1]^{d}</math> and <math>\varepsilon >0</math> there exist constants <math>d_{i}, c_{ij}, \theta_{ij}, \gamma_{i}</math>, and vectors <math>\mathbf{w}^{ij}\in \mathbb{R}^{d}</math> for which
<math display='block'> \left\vert f(\mathbf{x})-\sum_{i=1}^{6d+3} d_{i}\sigma \left( \sum_{j=1}^{3d} c_{ij}\sigma (\mathbf{w}^{ij}\cdot \mathbf{x}-\theta_{ij})-\gamma_{i}\right) \right\vert <\varepsilon </math>
for all <math>\mathbf{x}=(x_{1},\ldots,x_{d})\in [0,1]^{d}</math>.
}}
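The fixed widths in this theorem (<math>3d</math> units in the first hidden layer and <math>6d+3</math> in the second) correspond to a concrete two-hidden-layer network architecture. The following NumPy sketch (with the hypothetical helper <code>two_hidden_layer_net</code>) only evaluates this forward pass with arbitrary illustrative parameters; it does not construct the approximation, and it uses the standard logistic function as a stand-in for the special activation <math>\sigma</math> whose existence the theorem asserts.
<syntaxhighlight lang="python">
import numpy as np

def two_hidden_layer_net(x, d_i, c, theta, gamma, w, sigma):
    """Evaluate the Maiorov-Pinkus network form:
    sum_{i=1}^{6d+3} d_i * sigma( sum_{j=1}^{3d} c_ij * sigma(w_ij . x - theta_ij) - gamma_i ).
    """
    inner = sigma(np.einsum('ijk,k->ij', w, x) - theta)  # first hidden layer, shape (6d+3, 3d)
    outer = sigma(np.sum(c * inner, axis=1) - gamma)     # second hidden layer, shape (6d+3,)
    return np.dot(d_i, outer)                            # linear output

# Stand-in activation: the theorem's sigma is a special constructed function;
# the logistic function here is only illustrative.
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

d = 2                                # input dimension
n_outer, n_inner = 6 * d + 3, 3 * d  # widths fixed by the theorem
rng = np.random.default_rng(0)
w = rng.normal(size=(n_outer, n_inner, d))   # vectors w^{ij}
c = rng.normal(size=(n_outer, n_inner))      # c_{ij}
theta = rng.normal(size=(n_outer, n_inner))  # theta_{ij}
gamma = rng.normal(size=n_outer)             # gamma_i
coeffs = rng.normal(size=n_outer)            # d_i

x = np.array([0.3, 0.7])
print(two_hidden_layer_net(x, coeffs, c, theta, gamma, w, sigma))
</syntaxhighlight>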
 
This is an existence result: it says that activation functions providing the universal approximation property for bounded-depth, bounded-width networks exist. Using certain algorithmic and computer programming techniques, Guliyev and Ismailov efficiently constructed such activation functions depending on a numerical parameter. The developed algorithm allows one to compute the activation functions at any point of the real axis instantly; for the algorithm and the corresponding computer code, see the cited reference.<ref name=guliyev1 /> The theoretical result can be formulated as follows.
{{math theorem
| name = Universal approximation theorem<ref name=guliyev1 /><ref name=guliyev2 />
| math_statement = Let <math>[a,b]</math> be a finite segment of the real line, <math>s=b-a</math> and <math>\lambda</math> be any positive number. Then one can algorithmically construct a computable sigmoidal activation function <math>\sigma \colon \mathbb{R} \to \mathbb{R}</math>, which is infinitely differentiable, strictly increasing on <math>(-\infty, s)</math>, <math>\lambda</math>-strictly increasing on <math>[s,+\infty)</math>, and satisfies the following properties:
# For any <math>f \in C[a,b]</math> and <math>\varepsilon > 0</math> there exist numbers <math>c_1, c_2, \theta_1</math> and <math>\theta_2</math> such that for all <math>x \in [a,b]</math> <math display='block'> |f(x) - c_1 \sigma(x - \theta_1) - c_2 \sigma(x - \theta_2)| < \varepsilon</math>
# For any continuous function <math>F</math> on the <math>d</math>-dimensional box <math>[a,b]^{d}</math> and <math>\varepsilon >0</math>, there exist constants <math>e_p</math>, <math>c_{pq}</math>, <math>\theta_{pq}</math> and <math>\zeta_p</math> such that the inequality <math display='block'> \left| F(\mathbf{x}) - \sum_{p=1}^{2d+2} e_p \sigma \left( \sum_{q=1}^{d} c_{pq} \sigma(\mathbf{w}^{q} \cdot \mathbf{x} - \theta_{pq}) - \zeta_p \right) \right| < \varepsilon</math> holds for all <math>\mathbf{x} = (x_1, \ldots, x_d) \in [a, b]^{d}</math>. Here the weights <math>\mathbf{w}^{q}</math>, <math>q = 1, \ldots, d</math>, are fixed as follows: <math display='block'> \mathbf{w}^{1} = (1, 0, \ldots, 0), \quad \mathbf{w}^{2} = (0, 1, \ldots, 0), \quad \ldots, \quad \mathbf{w}^{d} = (0, 0, \ldots, 1). </math> In addition, all the coefficients <math>e_p</math>, except one, are equal.
}}
 
Here “<math> \sigma \colon \mathbb{R} \to \mathbb{R}</math> is <math>\lambda</math>-strictly increasing on some set <math>X</math>” means that there exists a strictly increasing function <math>u \colon X \to \mathbb{R}</math> such that <math>|\sigma(x) - u(x)| \le \lambda</math> for all <math>x \in X</math>. Clearly, a <math>\lambda</math>-strictly increasing function behaves like a usual strictly increasing function as <math>\lambda</math> gets small.
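The second statement of the theorem above fixes most of the architecture in advance: the first-layer weight vectors are the coordinate unit vectors, so each first-layer unit sees a single coordinate <math>x_q</math>, and all outer coefficients <math>e_p</math> but one coincide. The following NumPy sketch (with the hypothetical helper <code>guliyev_ismailov_form</code>) evaluates this network form with arbitrary illustrative parameters, again using the logistic function as a stand-in for the algorithmically constructed <math>\sigma</math> of the cited papers.
<syntaxhighlight lang="python">
import numpy as np

def guliyev_ismailov_form(x, e, c, theta, zeta, sigma):
    """Evaluate sum_{p=1}^{2d+2} e_p * sigma( sum_{q=1}^{d} c_pq * sigma(x_q - theta_pq) - zeta_p ).
    The fixed weights w^q are the coordinate unit vectors, so w^q . x = x_q.
    """
    inner = sigma(x[None, :] - theta)                 # first hidden layer, shape (2d+2, d)
    outer = sigma(np.sum(c * inner, axis=1) - zeta)   # second hidden layer, shape (2d+2,)
    return np.dot(e, outer)                           # linear output

# Stand-in activation: only illustrative, not the constructed sigma.
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

d = 3
P = 2 * d + 2                      # second-layer width fixed by the theorem
rng = np.random.default_rng(1)
c = rng.normal(size=(P, d))        # c_{pq}
theta = rng.normal(size=(P, d))    # theta_{pq}
zeta = rng.normal(size=P)          # zeta_p
e = np.full(P, 0.5)                # all coefficients e_p equal...
e[0] = 1.7                         # ...except one, as in the theorem

x = np.array([0.1, 0.4, 0.9])
print(guliyev_ismailov_form(x, e, c, theta, zeta, sigma))
</syntaxhighlight>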