Universal approximation theorem
In the field of [[machine learning]], the '''universal approximation theorems''' state that [[Artificial neural network|neural networks]] with a certain structure can, in principle, approximate any [[continuous function]] on a [[Compact space|compact]] domain to any desired degree of accuracy. These theorems provide a mathematical justification for using neural networks: a sufficiently large or deep network can model the complex, non-linear relationships often found in real-world data.<ref name="MLP-UA2UA">{{cite journal |last1=Hornik |first1=Kurt |last2=Stinchcombe |first2=Maxwell |last3=White |first3=Halbert |date=January 1989 |title=Multilayer feedforward networks are universal approximators |journal=Neural Networks |volume=2 |issue=5 |pages=359–366 |doi=10.1016/0893-6080(89)90020-8}}</ref><ref>Csáji, Balázs Csanád (2001). ''Approximation with Artificial Neural Networks''. Faculty of Sciences, Eötvös Loránd University, Hungary.</ref>
 
The best-known version of the theorem applies to [[Feedforward neural network|feedforward networks]] with a single hidden layer. It states that if the layer's [[activation function]] is non-[[polynomial]] (a condition satisfied by common choices such as the [[sigmoid function]] and [[Rectifier (neural networks)|ReLU]]), then the network can act as a "universal approximator". Universality is achieved by increasing the number of neurons in the hidden layer, making the network "wider". Other versions of the theorem show that universality can also be achieved by keeping the network's width fixed but increasing its number of layers, making it "deeper".
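As an illustration of the arbitrary-width case, the result is often stated roughly as follows (the notation here is one common choice, not the only one): for any continuous function <math>f</math> on a compact set <math>K \subseteq \mathbb{R}^n</math>, any continuous non-polynomial activation <math>\sigma</math>, and any tolerance <math>\varepsilon > 0</math>, there exist a width <math>N</math> and parameters <math>\alpha_i, b_i \in \mathbb{R}</math> and <math>w_i \in \mathbb{R}^n</math> such that

<math display="block">\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\mathsf{T}} x + b_i\right) \right| < \varepsilon.</math>

The sum on the left is the output of a single-hidden-layer network with <math>N</math> hidden neurons, so increasing <math>N</math> (widening the network) is what allows the approximation error to be driven below any chosen <math>\varepsilon</math>.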