Universal approximation theorem

[[Artificial neural networks]] are combinations of multiple simple mathematical functions that implement more complicated functions from (typically) real-valued [[vector (mathematics and physics)|vectors]] to real-valued [[vector (mathematics and physics)|vectors]]. The spaces of multivariate functions that can be implemented by a network are determined by the structure of the network, the set of simple functions, and the network's multiplicative parameters. A great deal of theoretical work has gone into characterizing these function spaces.
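As an illustrative sketch (the notation here is generic and not taken from any particular source), a feedforward network with a single hidden layer of <math>N</math> neurons and activation function <math>\sigma</math> implements a function of the form
<math display="block">f(x) = \sum_{j=1}^{N} c_j\,\sigma\!\left(w_j \cdot x + b_j\right), \qquad x \in \mathbb{R}^d,</math>
where the weights <math>w_j \in \mathbb{R}^d</math>, biases <math>b_j \in \mathbb{R}</math> and output coefficients <math>c_j \in \mathbb{R}</math> are the adjustable multiplicative parameters, and <math>\sigma</math> is the fixed simple function out of which the network is built.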
 
In the [[mathematics|mathematical]] theory of [[artificial neural networks]], '''universal approximation theorems''' are results<ref name=MLP-UA>{{cite journal |last1=Hornik |first1=Kurt |last2=Stinchcombe |first2=Maxwell |last3=White |first3=Halbert |title=Multilayer feedforward networks are universal approximators |journal=Neural Networks |date=January 1989 |volume=2 |issue=5 |pages=359–366 |doi=10.1016/0893-6080(89)90020-8 }}</ref><ref>Balázs Csanád Csáji (2001) Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary</ref> that characterize what classes of functions neural networks can theoretically learn. Specifically, given an algorithm that generates networks within a class of functions, the theorems establish the [[dense set|density]] of the generated functions within a given function space of interest. Typically, these results concern the approximation capabilities of the [[feedforward neural network|feedforward architecture]] on the space of continuous functions between two [[Euclidean space]]s, and the approximation is with respect to the [[compact convergence]] topology. It must be stressed that, while some functions can be approximated arbitrarily well within a region, the proofs do not apply outside of that region; that is, the approximated functions do not [[extrapolate]] beyond it. This holds for all non-periodic [[activation function]]s, which are those used in practice and assumed in most proofs. In recent years, neocortical [[pyramidal neurons]] with an oscillating activation function, capable of individually learning the XOR function, have been discovered in the human brain, and oscillating activation functions have since been explored and shown to outperform popular activation functions on a variety of benchmarks.<ref>{{cite journal |last1=Gidon |first1=Albert |last2=Zolnik |first2=Timothy Adam |last3=Fidzinski |first3=Pawel |last4=Bolduan |first4=Felix |last5=Papoutsi |first5=Athanasia |last6=Poirazi |first6=Panayiota |last7=Holtkamp |first7=Martin |last8=Vida |first8=Imre |last9=Larkum |first9=Matthew Evan |title=Dendritic action potentials and computation in human layer 2/3 cortical neurons |journal=Science |date=3 January 2020 |volume=367 |issue=6473 |pages=83–87 |doi=10.1126/science.aax6239 |pmid=31896716 |bibcode=2020Sci...367...83G }}</ref>
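Schematically, in the classical arbitrary-width setting such a theorem states that for every continuous function <math>f\colon \mathbb{R}^d \to \mathbb{R}^m</math>, every compact set <math>K \subset \mathbb{R}^d</math> and every <math>\varepsilon > 0</math>, there exists a network <math>g</math> in the class with
<math display="block">\sup_{x \in K} \left\| f(x) - g(x) \right\| < \varepsilon,</math>
while nothing is asserted about <math>\left\| f(x) - g(x) \right\|</math> for <math>x \notin K</math>; this is the precise sense in which the guarantee does not extrapolate beyond the region <math>K</math>.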
 
However, there are also a variety of results for functions between [[non-Euclidean space]]s<ref name=NonEuclidean>{{Cite conference|last1=Kratsios|first1=Anastasis|last2=Bilokopytov|first2=Eugene|date=2020|title=Non-Euclidean Universal Approximation|url=https://papers.nips.cc/paper/2020/file/786ab8c4d7ee758f80d57e65582e609d-Paper.pdf|publisher=Curran Associates|journal=Advances in Neural Information Processing Systems |volume=33}}</ref> and for other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the [[convolutional neural network]] (CNN) architecture,<ref>{{cite journal |doi=10.1016/j.acha.2019.06.004 |arxiv=1805.10769|title=Universality of deep convolutional neural networks|year=2020|last1=Zhou|first1=Ding-Xuan|journal=[[Applied and Computational Harmonic Analysis]]|volume=48|issue=2|pages=787–794|s2cid=44113176}}</ref><ref>{{Cite journal|doi = 10.1109/LSP.2020.3005051|title = Refinement and Universal Approximation via Sparsely Connected ReLU Convolution Nets|year = 2020|last1 = Heinecke|first1 = Andreas|last2 = Ho|first2 = Jinn|last3 = Hwang|first3 = Wen-Liang|journal = IEEE Signal Processing Letters|volume = 27|pages = 1175–1179|bibcode = 2020ISPL...27.1175H|s2cid = 220669183}}</ref> [[radial basis functions]],<ref>{{Cite journal|doi=10.1162/neco.1991.3.2.246|title=Universal Approximation Using Radial-Basis-Function Networks|year=1991|last1=Park|first1=J.|last2=Sandberg|first2=I. W.|journal=Neural Computation|volume=3|issue=2|pages=246–257|pmid=31167308|s2cid=34868087}}</ref> or neural networks with specific properties.<ref>{{cite journal |doi=10.1007/s00365-021-09546-1|arxiv=1804.10306|title=Universal Approximations of Invariant Maps by Neural Networks|year=2021|last1=Yarotsky|first1=Dmitry|journal=Constructive Approximation|volume=55 |pages=407–474 |s2cid=13745401}}</ref><ref>{{cite journal |last1=Zakwan |first1=Muhammad |last2=d’Angelo |first2=Massimiliano |last3=Ferrari-Trecate |first3=Giancarlo |title=Universal Approximation Property of Hamiltonian Deep Neural Networks |journal=IEEE Control Systems Letters |date=2023 |page=1 |doi=10.1109/LCSYS.2023.3288350 |arxiv=2303.12147 |s2cid=257663609 }}</ref> Most universal approximation theorems can be grouped into two classes. The first quantifies the approximation capabilities of neural networks with an arbitrary number of artificial neurons ("''arbitrary width''" case), and the second focuses on the case with an arbitrary number of hidden layers, each containing a limited number of artificial neurons ("''arbitrary depth''" case). In addition to these two classes, there are also universal approximation theorems for neural networks with a bounded number of hidden layers and a limited number of neurons in each layer ("''bounded depth and bounded width''" case).
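As a rough schematic of the two classes (the precise hypotheses vary between theorems), the arbitrary-width results concern families such as
<math display="block">\left\{\, x \mapsto C\,\sigma(Wx + b) \;:\; N \in \mathbb{N},\; C \in \mathbb{R}^{m \times N},\; W \in \mathbb{R}^{N \times d},\; b \in \mathbb{R}^{N} \,\right\},</math>
with <math>\sigma</math> applied componentwise, in which the depth is fixed at one hidden layer and the width <math>N</math> is unbounded, whereas the arbitrary-depth results concern compositions <math>f_L \circ \cdots \circ f_1</math> of layers of bounded width with the number of layers <math>L</math> unbounded; the bounded depth and bounded width case restricts both the width and the number of layers simultaneously.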