Universal approximation theorem

In 2018, Guliyev and Ismailov<ref name="guliyev1">{{Cite journal |last1=Guliyev |first1=Namig |last2=Ismailov |first2=Vugar |date=November 2018 |title=Approximation capability of two hidden layer feedforward neural networks with fixed weights |journal=Neurocomputing |volume=316 |pages=262–269 |arxiv=2101.09181 |doi=10.1016/j.neucom.2018.07.075 |s2cid=52285996}}</ref> constructed a smooth sigmoidal activation function providing the universal approximation property for two-hidden-layer feedforward neural networks with fewer units in the hidden layers. In the same year, they also constructed<ref name="guliyev2">{{Cite journal |last1=Guliyev |first1=Namig |last2=Ismailov |first2=Vugar |date=February 2018 |title=On the approximation by single hidden layer feedforward neural networks with fixed weights |journal=Neural Networks |volume=98 |pages=296–304 |doi=10.1016/j.neunet.2017.12.007 |pmid=29301110 |arxiv=1708.06219 |s2cid=4932839}}</ref> single-hidden-layer networks with bounded width that are still universal approximators for univariate functions; however, this does not extend to multivariable functions.
 
In 2022, Shen ''et al.''<ref name=shen22>{{cite journal |last1=Shen |first1=Zuowei |last2=Yang |first2=Haizhao |last3=Zhang |first3=Shijun |date=January 2022 |title=Optimal approximation rate of ReLU networks in terms of width and depth |journal=Journal de Mathématiques Pures et Appliquées |volume=157 |pages=101–135 |arxiv=2103.00502 |doi=10.1016/j.matpur.2021.07.009 |s2cid=232075797}}</ref> obtained precise quantitative information on the depth and width required to approximate a target function by deep and wide ReLU neural networks.
 
=== Quantitative bounds ===
Remark: If the activation is replaced by leaky-ReLU, and the input is restricted to a compact ___domain, then the exact minimum width is<ref name=":1" /> <math>d_m = \max\{n, m, 2\}</math>.
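For instance, for leaky-ReLU networks approximating maps <math>f:[0, 1]^3 \rightarrow \mathbb{R}^2</math> (so <math>n = 3</math> and <math>m = 2</math>), this formula gives an exact minimum width of <math>d_m = \max\{3, 2, 2\} = 3</math>.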
 
''Quantitative refinement:'' In the case where <math>f:[0, 1]^n \rightarrow \mathbb{R} </math> (i.e. <math> m = 1 </math>) and <math>\sigma</math> is the [[Rectifier (neural networks)|ReLU activation function]], the exact depth and width for a ReLU network to achieve <math>\varepsilon</math> error are also known.<ref name=shen22 /> If, moreover, the target function <math>f</math> is smooth, then the required number of layers and their width can be exponentially smaller.<ref>{{cite journal |last1=Lu |first1=Jianfeng |last2=Shen |first2=Zuowei |last3=Yang |first3=Haizhao |last4=Zhang |first4=Shijun |title=Deep Network Approximation for Smooth Functions |journal=SIAM Journal on Mathematical Analysis |date=January 2021 |volume=53 |issue=5 |pages=5465–5506 |doi=10.1137/20M134695X |arxiv=2001.03040 |s2cid=210116459 }}</ref> Even if <math>f</math> is not smooth, the curse of dimensionality can be broken if <math>f</math> admits additional "compositional structure".<ref>{{Cite journal |last1=Juditsky |first1=Anatoli B. |last2=Lepski |first2=Oleg V. |last3=Tsybakov |first3=Alexandre B. |date=2009-06-01 |title=Nonparametric estimation of composite functions |journal=The Annals of Statistics |volume=37 |issue=3 |doi=10.1214/08-aos611 |s2cid=2471890 |issn=0090-5364 |doi-access=free |arxiv=0906.0865 }}</ref><ref>{{Cite journal |last1=Poggio |first1=Tomaso |last2=Mhaskar |first2=Hrushikesh |last3=Rosasco |first3=Lorenzo |last4=Miranda |first4=Brando |last5=Liao |first5=Qianli |date=2017-03-14 |title=Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review |journal=International Journal of Automation and Computing |volume=14 |issue=5 |pages=503–519 |doi=10.1007/s11633-017-1054-2 |s2cid=15562587 |issn=1476-8186 |doi-access=free |arxiv=1611.00740 }}</ref>
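The following is a minimal numerical sketch, using NumPy, of how the width of a single-hidden-layer ReLU network controls the achievable sup-norm error for a univariate target on <math>[0, 1]</math>. The construction (a piecewise-linear interpolant realized by ReLU units) and names such as <code>relu_interpolant</code> are illustrative choices, not the optimal constructions from the works cited above.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch only: a one-hidden-layer ReLU network whose N hidden units
# each place one "kink" at a grid point reproduces the piecewise-linear
# interpolant of a univariate target on [0, 1]. Increasing the width N drives
# the sup-norm error toward zero; this is not an optimal construction, just a
# demonstration of the width/accuracy trade-off.

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(f, N):
    """Width-N network  x -> b + sum_i c_i * relu(x - x_i)  that interpolates
    f at N + 1 equispaced breakpoints in [0, 1]."""
    xs = np.linspace(0.0, 1.0, N + 1)      # breakpoints x_0, ..., x_N
    ys = f(xs)
    slopes = np.diff(ys) / np.diff(xs)     # slope of the interpolant on each piece
    # Output weights: the first unit carries the initial slope, each later unit
    # adds the change of slope at its breakpoint.
    c = np.concatenate(([slopes[0]], np.diff(slopes)))
    b = ys[0]                              # output bias = value at x = 0
    return lambda x: relu(x[:, None] - xs[None, :N]) @ c + b

if __name__ == "__main__":
    f = lambda x: np.sin(2 * np.pi * x)    # smooth univariate target
    grid = np.linspace(0.0, 1.0, 10001)
    for N in (4, 16, 64, 256):
        err = np.max(np.abs(relu_interpolant(f, N)(grid) - f(grid)))
        print(f"width N = {N:4d}   sup-norm error = {err:.2e}")
</syntaxhighlight>

For this smooth target the measured error decays roughly like <math>N^{-2}</math>, consistent with the idea that smoother target functions can be approximated to a given error with far fewer units.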