Universal approximation theorem: Difference between revisions

Content deleted Content added
No edit summary
Google search list is inadequate citation for supposed statement, tidy cites, remove unnecessary subscripts
Line 1:
In mathematics, the '''universal approximation theorem''' states<ref>Balázs Csanád Csáji. Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary</ref> that the standard [[Multilayer_perceptron|multilayer]] [[feedforward neural network|feed-forward]] network with a single hidden layer that contains finite number of hidden [[neuron]]s, and with arbitrary activation function are universal approximators on a compact subset of [[Euclidean space|'''R'''<mathsup>\mathbb{R}^n</mathsup>]].
 
The [[theorem]] was first proved{{cn}} by [[George Cybenko]] in 1989 for a [[sigmoid function|sigmoid]] activation function, thus it is also called the '''Cybenko theorem'''.<ref name=cyb>Cybenko., G. (1989) [http://wwwactcomm.googledartmouth.comedu/search?q=Cybenko+theoremgvc/papers/approx_by_superposition.pdf "Approximations by superpositions of sigmoidal functions"], ''Mathematics of Control, Signals, and Systems'', 2 (4), 303-314</ref>.
 
Kurt Hornik (1991){{cn}} showed that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture itself which gives neural networks the potential of being universal approximators. The output units are always assumed to be linear. For notational convenience we shall explicitly formulate our results only for the case where there is only one output unit. (The general case can easily be deduced from the simple case.)
 
The theorem<ref name=cyb/><ref>
The theorem<ref>G. Cybenko. Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2:303–314, no. 4 pp.&nbsp;303-314. [http://actcomm.dartmouth.edu/gvc/papers/approx_by_superposition.pdf electronic version], 1989.</ref><ref>
Kurt Hornik: (1991) "Approximation Capabilities of Multilayer Feedforward Networks.",
''Neural Networks, vol. 4, 1991.''{{full}}</ref><ref>Haykin, Simon (1998). ''Neural Networks: A Comprehensive Foundation'', Volume 2, Prentice Hall. ISBN 0132733501.</ref><ref>Hassoun, M. (1995) ''Fundamentals of Artificial Neural Networks'' MIT Press, p.&nbsp;48</ref> in mathematical terms:
 
== Formal statement ==
Line 13:
<blockquote>
 
Let φ(·) be a nonconstant, bounded, and [[monotonic function|monotonically]]-increasing [[continuous function|continuous]] function. Let ''I''<sub>''m''<sub>0</sub></sub> denote the ''m''<sub>0</sub>-dimensional unit hypercube [0,1]<sup>''m''<sub>0</sub></sup>. The space of continuous functions on ''I''<sub>''m''<sub>0</sub></sub> is denoted by ''C''(''I''<sub>''m''<sub>0</sub></sub>). Then, given any function ''f'' ∈ ''C''(''I''<sub>''m''<sub>0</sub></sub>) and є &gt; 0, there exist an integer ''mN''<sub>1</sub> and sets of real constants ''α''<sub>''i''</sub>, ''b''<sub>''i''</sub> ∈ '''R''', ''w''<sub>''i''</sub> ∈ '''R'''<sup>''m''<sub>0</sub></sup>, where ''i'' = 1, ..., ''mN''<sub>1</sub> such that we may define:
 
: <math>
F( x ) =
\sum_{i=1}^{m_1N} \alpha_i \varphi \left( w_i^T x + b_i\right)
</math>
 
Line 26:
</math>
 
for all ''x'' ∈ ''I''<sub>''m''<sub>0</sub></sub>.
</blockquote>