Activation function: Difference between revisions

The most common activation functions can be divided into three categories: [[ridge function]]s, [[radial function]]s and [[fold function]]s.
 
An activation function <math>f</math> is '''saturating''' if <math>\lim_{|v|\to \infty} |\nabla f(v)| = 0</math>; it is '''nonsaturating''' if <math>\lim_{|v|\to \infty} |\nabla f(v)| \neq 0</math>. Nonsaturating activation functions, such as [[ReLU]], may be preferable to saturating ones, because they are less likely to suffer from the [[vanishing gradient problem]].<ref>{{Cite journal |last1=Krizhevsky |first1=Alex |last2=Sutskever |first2=Ilya |last3=Hinton |first3=Geoffrey E. |date=24 May 2017 |title=ImageNet classification with deep convolutional neural networks |journal=Communications of the ACM |volume=60 |issue=6 |pages=84–90 |doi=10.1145/3065386 |s2cid=195908774 |issn=0001-0782|doi-access=free }}</ref>
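The distinction above can be illustrated numerically: a sketch (the function names here are illustrative, not from any particular library) comparing the gradient of the sigmoid, which saturates, against that of ReLU, which does not, at a large input value.

```python
import math

def sigmoid_grad(v):
    # derivative of the logistic sigmoid: s(v) * (1 - s(v))
    s = 1.0 / (1.0 + math.exp(-v))
    return s * (1.0 - s)

def relu_grad(v):
    # derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if v > 0 else 0.0

# For large |v|, the sigmoid's gradient is close to 0 (saturating),
# while ReLU's gradient stays at 1 (nonsaturating).
print(sigmoid_grad(10.0))  # a small value near 0
print(relu_grad(10.0))     # 1.0
```

In a deep network, gradients like the sigmoid's are multiplied layer by layer during backpropagation, so values near zero shrink the overall gradient exponentially; this is the mechanism behind the vanishing gradient problem mentioned above.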
 
=== Ridge activation functions ===