Revision as of 03:10, 20 April 2025 by JJMC89 bot III (Bots, Administrators; 4,308,426 edits): m Moving Category:Theorems in analysis to Category:Theorems in mathematical analysis per Wikipedia:Categories for discussion/Log/2025 April 12#Category:Theorems in analysis
Revision as of 05:12, 2 June 2025 by OAbot (Bots; 643,717 edits): m Open access bot: arxiv updated in citation with #oabot.
Line 111: Remark: If the activation is replaced by leaky-ReLU, and the input is restricted to a compact domain, then the exact minimum width is<ref name=":1" /> <math>d_m = \max\{n, m, 2\}</math>. ''Quantitative refinement:'' In the case where <math>f:[0, 1]^n \rightarrow \mathbb{R} </math> (i.e. <math> m = 1 </math>) and <math>\sigma</math> is the [[Rectifier (neural networks)\|ReLU activation function]], the exact depth and width for a ReLU network to achieve <math>\varepsilon</math> error are also known.<ref>{{cite journal \|last1=Shen \|first1=Zuowei \|last2=Yang \|first2=Haizhao \|last3=Zhang \|first3=Shijun \|title=Optimal approximation rate of ReLU networks in terms of width and depth \|journal=Journal de Mathématiques Pures et Appliquées \|date=January 2022 \|volume=157 \|pages=101–135 \|doi=10.1016/j.matpur.2021.07.009 \|arxiv=2103.00502 \|s2cid = 232075797 }}</ref> If, moreover, the target function <math>f</math> is smooth, then the required number of layers and their width can be exponentially smaller.<ref>{{cite journal \|last1=Lu \|first1=Jianfeng \|last2=Shen \|first2=Zuowei \|last3=Yang \|first3=Haizhao \|last4=Zhang \|first4=Shijun \|title=Deep Network Approximation for Smooth Functions \|journal = SIAM Journal on Mathematical Analysis \|date=January 2021 \|volume=53 \|issue=5 \|pages=5465–5506 \|doi=10.1137/20M134695X \|arxiv=2001.03040 \|s2cid=210116459 }}</ref> Even if <math>f</math> is not smooth, the curse of dimensionality can be broken if <math>f</math> admits additional "compositional structure".<ref>{{Cite journal \|last1=Juditsky \|first1=Anatoli B. \|last2=Lepski \|first2=Oleg V. \|last3=Tsybakov \|first3=Alexandre B. 
\|date=2009-06-01 \|title=Nonparametric estimation of composite functions \|journal=The Annals of Statistics \|volume=37 \|issue=3 \|doi=10.1214/08-aos611 \|s2cid=2471890 \|issn=0090-5364\|doi-access=free \|arxiv=0906.0865 }}</ref><ref>{{Cite journal \|last1=Poggio \|first1=Tomaso \|last2=Mhaskar \|first2=Hrushikesh \|last3=Rosasco \|first3=Lorenzo \|last4=Miranda \|first4=Brando \|last5=Liao \|first5=Qianli \|date=2017-03-14 \|title=Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review \|journal=International Journal of Automation and Computing \|volume=14 \|issue=5 \|pages=503–519 \|doi=10.1007/s11633-017-1054-2 \|s2cid=15562587 \|issn=1476-8186\|doi-access=free \|arxiv=1611.00740 }}</ref> }}
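
As a small illustration of the leaky-ReLU remark, the width formula d_m = max{n, m, 2} can be evaluated directly. The sketch below is not taken from the cited source: the function name `leaky_relu_min_width` is hypothetical, and it assumes n and m are the (positive) input and output dimensions of the target function on a compact domain.

```python
def leaky_relu_min_width(n: int, m: int) -> int:
    """Evaluate d_m = max{n, m, 2} from the remark above.

    n: input dimension, m: output dimension (both assumed >= 1).
    The name and signature are illustrative, not part of any library.
    """
    if n < 1 or m < 1:
        raise ValueError("dimensions n and m must be positive integers")
    return max(n, m, 2)


if __name__ == "__main__":
    # A few (n, m) pairs and the width the formula yields for each.
    for n, m in [(1, 1), (3, 5), (10, 1)]:
        print(f"n={n}, m={m}: d_m={leaky_relu_min_width(n, m)}")
```

Under this formula a map from a 3-dimensional input to a 5-dimensional output gives width 5, while a scalar-to-scalar map still gives width 2.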

Universal approximation theorem: Difference between revisions