Universal approximation theorem

In the mathematical theory of neural networks, the universal approximation theorem states^[1] that a standard multilayer feed-forward network with a single hidden layer containing a finite number of hidden neurons (a multilayer perceptron) is a universal approximator among continuous functions on compact subsets of Rⁿ, under mild assumptions on the activation function. That is, such a network can approximate any of these functions to arbitrary accuracy.

One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions.^[2]

Kurt Hornik showed in 1991^[3] that it is not the specific choice of the activation function, but rather the multilayer feed-forward architecture itself, that gives neural networks the potential of being universal approximators. The output units are always assumed to be linear. For notational convenience, the result is formulated explicitly only for the case of a single output unit; the general case can easily be deduced from it.

The theorem^[2]^[3]^[4]^[5] can be stated in mathematical terms as follows.

Formal statement

Let φ(·) be a nonconstant, bounded, and monotonically increasing continuous function. Let I_m denote the m-dimensional unit hypercube [0,1]^m, and let C(I_m) denote the space of continuous functions on I_m. Then, given any function f ∈ C(I_m) and ε > 0, there exist an integer N, real constants α_i, b_i ∈ R, and real vectors w_i ∈ R^m, where i = 1, ..., N, such that we may define:

$F(x)=\sum _{i=1}^{N}\alpha _{i}\varphi \left(w_{i}^{T}x+b_{i}\right)$

as an approximate realization of the function f, where f is independent of φ; that is,

$|F(x)-f(x)|<\varepsilon$

for all x ∈ I_m. In other words, functions of the form F(x) are dense in C(I_m).
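
As an illustration of the form F(x) in the statement above, the following Python sketch builds such a sum by hand for the one-dimensional case m = 1, taking φ to be the logistic sigmoid (which is nonconstant, bounded, monotonically increasing and continuous). It is a minimal sketch under stated assumptions, not a proof and not the construction used in the cited papers: the staircase construction, the target function cos(2πx), and the names build_network, evaluate and max_error are all chosen here for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(f, n_steps, steepness):
    """Return (alpha, w, b) for F(x) = sum_i alpha_i * sigmoid(w_i * x + b_i).

    Hypothetical construction for m = 1: a smoothed staircase that passes
    close to f(i / n_steps) for i = 0, ..., n_steps.
    """
    # Unit 0 has w = 0 and b = 0, so it contributes the constant
    # 2 * f(0) * sigmoid(0) = f(0).
    alpha, w, b = [2.0 * f(0.0)], [0.0], [0.0]
    for i in range(1, n_steps + 1):
        left, right = (i - 1) / n_steps, i / n_steps
        centre = 0.5 * (left + right)
        alpha.append(f(right) - f(left))  # height of the i-th step
        w.append(steepness)               # a large w makes the sigmoid step-like
        b.append(-steepness * centre)     # places the step at the midpoint
    return alpha, w, b


def evaluate(params, x):
    """Evaluate F(x) for the parameter lists returned by build_network."""
    alpha, w, b = params
    return sum(a * sigmoid(wi * x + bi) for a, wi, bi in zip(alpha, w, b))


def max_error(f, params, n_grid=2001):
    """Largest |F(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    points = (j / (n_grid - 1) for j in range(n_grid))
    return max(abs(evaluate(params, x) - f(x)) for x in points)


def target(x):
    """Assumed example target: a continuous function on [0, 1]."""
    return math.cos(2.0 * math.pi * x)


# N = 101 hidden units: one constant unit plus 100 steep sigmoid steps.
params = build_network(target, n_steps=100, steepness=4000.0)
error = max_error(target, params)
print("hidden units:", len(params[0]), "max grid error:", error)
```

Under these assumptions, refining the staircase (a larger n_steps, with steepness scaled up accordingly) reduces the measured error at the cost of a larger N. The grid check only samples [0, 1], so it illustrates the bound |F(x) − f(x)| < ε rather than verifying it for every x.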

References

[1] Csáji, Balázs Csanád. Approximation with Artificial Neural Networks. Faculty of Sciences, Eötvös Loránd University, Hungary.
[2] Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314.
[3] Hornik, Kurt (1991). "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, 4(2), 251–257.
[4] Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation, Volume 2, Prentice Hall. ISBN 0-13-273350-1.
[5] Hassoun, M. (1995). Fundamentals of Artificial Neural Networks. MIT Press, p. 48.
