Universal approximation theorem

In 2018, Guliyev and Ismailov<ref name="guliyev1">{{Cite journal |last1=Guliyev |first1=Namig |last2=Ismailov |first2=Vugar |date=November 2018 |title=Approximation capability of two hidden layer feedforward neural networks with fixed weights |journal=Neurocomputing |volume=316 |pages=262–269 |arxiv=2101.09181 |doi=10.1016/j.neucom.2018.07.075 |s2cid=52285996}}</ref> constructed a smooth sigmoidal activation function providing the universal approximation property for two-hidden-layer feedforward neural networks with fewer units in the hidden layers. In the same year, they also constructed<ref name="guliyev2">{{Cite journal |last1=Guliyev |first1=Namig |last2=Ismailov |first2=Vugar |date=February 2018 |title=On the approximation by single hidden layer feedforward neural networks with fixed weights |journal=Neural Networks |volume=98 |pages=296–304 |doi=10.1016/j.neunet.2017.12.007 |pmid=29301110 |arxiv=1708.06219 |s2cid=4932839}}</ref> single-hidden-layer networks with bounded width that are still universal approximators for univariate functions; however, this does not extend to multivariable functions.
 
In 2022, Shen ''et al.''<ref name=shen22>{{cite journal |last1=Shen |first1=Zuowei |last2=Yang |first2=Haizhao |last3=Zhang |first3=Shijun |date=January 2022 |title=Optimal approximation rate of ReLU networks in terms of width and depth |journal=Journal de Mathématiques Pures et Appliquées |volume=157 |pages=101–135 |arxiv=2103.00502 |doi=10.1016/j.matpur.2021.07.009 |s2cid=232075797}}</ref> obtained precise quantitative information on the depth and width required to approximate a target function by deep and wide ReLU neural networks.
 
=== Quantitative bounds ===
Remark: If the activation is replaced by leaky-ReLU, and the input is restricted to a compact ___domain, then the exact minimum width is<ref name=":1" /> <math>d_m = \max\{n, m, 2\}</math>.
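For instance, for leaky-ReLU networks approximating maps <math>f:[0, 1]^3 \rightarrow \mathbb{R}^2</math> (so <math>n = 3</math> and <math>m = 2</math>), this formula gives an exact minimum width of <math>d_m = \max\{3, 2, 2\} = 3</math>.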
 
''Quantitative refinement:'' In the case where <math>f:[0, 1]^n \rightarrow \mathbb{R} </math> (i.e. <math> m = 1 </math>) and <math>\sigma</math> is the [[Rectifier (neural networks)|ReLU activation function]], the exact depth and width for a ReLU network to achieve <math>\varepsilon</math> error are also known.<ref name=shen22 /> If, moreover, the target function <math>f</math> is smooth, then the required number of layers and their width can be exponentially smaller.<ref>{{cite journal |last1=Lu |first1=Jianfeng |last2=Shen |first2=Zuowei |last3=Yang |first3=Haizhao |last4=Zhang |first4=Shijun |title=Deep Network Approximation for Smooth Functions |journal=SIAM Journal on Mathematical Analysis |date=January 2021 |volume=53 |issue=5 |pages=5465–5506 |doi=10.1137/20M134695X |arxiv=2001.03040 |s2cid=210116459 }}</ref> Even if <math>f</math> is not smooth, the curse of dimensionality can be broken if <math>f</math> admits additional "compositional structure".<ref>{{Cite journal |last1=Juditsky |first1=Anatoli B. |last2=Lepski |first2=Oleg V. |last3=Tsybakov |first3=Alexandre B. |date=2009-06-01 |title=Nonparametric estimation of composite functions |journal=The Annals of Statistics |volume=37 |issue=3 |doi=10.1214/08-aos611 |s2cid=2471890 |issn=0090-5364 |doi-access=free |arxiv=0906.0865 }}</ref><ref>{{Cite journal |last1=Poggio |first1=Tomaso |last2=Mhaskar |first2=Hrushikesh |last3=Rosasco |first3=Lorenzo |last4=Miranda |first4=Brando |last5=Liao |first5=Qianli |date=2017-03-14 |title=Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review |journal=International Journal of Automation and Computing |volume=14 |issue=5 |pages=503–519 |doi=10.1007/s11633-017-1054-2 |s2cid=15562587 |issn=1476-8186 |doi-access=free |arxiv=1611.00740 }}</ref>
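The following is a minimal numerical sketch, using NumPy, of how the width of a single-hidden-layer ReLU network controls the achievable sup-norm error for a univariate target on <math>[0, 1]</math>. The construction (a piecewise-linear interpolant realized by ReLU units) and names such as <code>relu_interpolant</code> are illustrative choices, not the optimal constructions from the works cited above.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch only: a one-hidden-layer ReLU network whose N hidden units
# each place one "kink" at a grid point reproduces the piecewise-linear
# interpolant of a univariate target on [0, 1]. Increasing the width N drives
# the sup-norm error toward zero; this is not an optimal construction, just a
# demonstration of the width/accuracy trade-off.

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(f, N):
    """Width-N network  x -> b + sum_i c_i * relu(x - x_i)  that interpolates
    f at N + 1 equispaced breakpoints in [0, 1]."""
    xs = np.linspace(0.0, 1.0, N + 1)      # breakpoints x_0, ..., x_N
    ys = f(xs)
    slopes = np.diff(ys) / np.diff(xs)     # slope of the interpolant on each piece
    # Output weights: the first unit carries the initial slope, each later unit
    # adds the change of slope at its breakpoint.
    c = np.concatenate(([slopes[0]], np.diff(slopes)))
    b = ys[0]                              # output bias = value at x = 0
    return lambda x: relu(x[:, None] - xs[None, :N]) @ c + b

if __name__ == "__main__":
    f = lambda x: np.sin(2 * np.pi * x)    # smooth univariate target
    grid = np.linspace(0.0, 1.0, 10001)
    for N in (4, 16, 64, 256):
        err = np.max(np.abs(relu_interpolant(f, N)(grid) - f(grid)))
        print(f"width N = {N:4d}   sup-norm error = {err:.2e}")
</syntaxhighlight>

For this smooth target the measured error decays roughly like <math>N^{-2}</math>, consistent with the idea that smoother target functions can be approximated to a given error with far fewer units.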