{{About||the formalism used to approximate the influence of an extracellular electrical field on neurons|activating function|a linear system’s transfer function|transfer function}}
{{Machine learning}}
{{Use dmy dates|date=August 2025}}
[[File:Logistic-curve.svg|thumb|Logistic activation function]]
The '''activation function''' of a node in an [[artificial neural network]] is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is ''nonlinear''.<ref>{{Cite web|url=http://didattica.cs.unicam.it/lib/exe/fetch.php?media=didattica:magistrale:kebi:ay_1718:ke-11_neural_networks.pdf|title=Neural Networks, p. 7|last=Hinkelmann|first=Knut|website=University of Applied Sciences Northwestern Switzerland|access-date=6 October 2018}}</ref>
Modern activation functions include the logistic ([[Sigmoid function|sigmoid]]) function used in the 2012 [[speech recognition]] model developed by [[Geoffrey Hinton|Hinton]] et al.;<ref>{{Cite journal |last1=Hinton |first1=Geoffrey |last2=Deng |first2=Li |last3=Yu |first3=Dong |last4=Dahl |first4=George |last5=Mohamed |first5=Abdel-rahman |last6=Jaitly |first6=Navdeep |last7=Senior |first7=Andrew |last8=Vanhoucke |first8=Vincent |last9=Nguyen |first9=Patrick |last10=Sainath |first10=Tara|author10-link= Tara Sainath |last11=Kingsbury |first11=Brian |year=2012 |title=Deep Neural Networks for Acoustic Modeling in Speech Recognition |journal=IEEE Signal Processing Magazine |volume=29 |issue=6 |pages=82–97 |doi=10.1109/MSP.2012.2205597|s2cid=206485943 }}</ref> and the [[ReLU]] used in the 2012 [[AlexNet]] computer vision model.<ref>{{Cite journal |last1=Krizhevsky |first1=Alex |last2=Sutskever |first2=Ilya |last3=Hinton |first3=Geoffrey E.}}</ref>
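The point above — that a nonlinear activation lets a network with only a few nodes solve nontrivial problems — can be illustrated with XOR, which no linear model can represent. The following Python sketch (not part of the article; the weights are hand-picked for illustration) computes XOR with just two hidden ReLU units:

```python
def relu(z):
    # Rectified linear unit: max(0, z)
    return max(0.0, z)

def xor_net(x1, x2):
    # Two hidden ReLU units with hand-picked weights.
    # A purely linear network cannot represent XOR,
    # but this tiny nonlinear one can.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # outputs 0, 1, 1, 0 for the four inputs
```

Replacing `relu` with the identity collapses the network to a linear map, which cannot fit all four XOR cases.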
==Comparison of activation functions==
The most common activation functions can be divided into three categories: [[ridge function]]s, [[radial function]]s and [[fold function]]s.
An activation function <math>f</math> is '''saturating''' if <math>\lim_{|v|\to \infty} |\nabla f(v)| = 0</math>. It is '''nonsaturating''' if <math>\lim_{|v|\to \infty} |\nabla f(v)| \neq 0</math>. Non-saturating activation functions, such as [[ReLU]], may be better than saturating activation functions, because they are less likely to suffer from the [[vanishing gradient problem]].<ref>{{Cite journal |last1=Krizhevsky |first1=Alex |last2=Sutskever |first2=Ilya |last3=Hinton |first3=Geoffrey E.}}</ref>
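The distinction is easy to check numerically. A minimal Python sketch (illustrative, not from the article) compares the gradient of the saturating sigmoid with that of the nonsaturating ReLU at a large input:

```python
import math

def sigmoid_grad(x):
    # Derivative of the logistic sigmoid: sigma(x) * (1 - sigma(x))
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 0 for negative inputs, 1 for positive ones
    return 1.0 if x > 0 else 0.0

# The sigmoid gradient vanishes as |x| grows (saturating),
# while the ReLU gradient stays at 1 on the positive half-line.
print(sigmoid_grad(10.0))  # ~4.54e-05
print(relu_grad(10.0))     # 1.0
```

In a deep stack of layers, products of many near-zero sigmoid gradients shrink exponentially, which is the mechanism behind the vanishing gradient problem.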
=== Ridge activation functions ===
* [[logistic function|Logistic]] activation: <math>\phi(\mathbf v) = (1+\exp(-a-\mathbf v'\mathbf b))^{-1}</math>.
In [[Biological neural network|biologically inspired neural networks]], the activation function is usually an abstraction representing the rate of [[action potential]] firing in the cell.<ref>{{Cite journal|last1=Hodgkin|first1=A. L.|last2=Huxley|first2=A. F.}}</ref>
The function looks like <math>\phi(\mathbf v)=U(a + \mathbf v'\mathbf b)</math>, where <math>U</math> is the [[Heaviside step function]].
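Both ridge activations above act on the same scalar linear form <math>a + \mathbf v'\mathbf b</math>. A small Python sketch (illustrative names, not from the article):

```python
import numpy as np

def logistic_ridge(v, a, b):
    # phi(v) = (1 + exp(-(a + v.b)))^-1 : logistic applied to a linear form
    return 1.0 / (1.0 + np.exp(-(a + v @ b)))

def heaviside_ridge(v, a, b):
    # phi(v) = U(a + v.b), with U the Heaviside step function
    # (np.heaviside's second argument sets the value at the discontinuity)
    return np.heaviside(a + v @ b, 0.5)

v = np.array([0.5, -1.0])
b = np.array([2.0, 1.0])
print(logistic_ridge(v, 0.0, b))   # a + v.b = 0, so the output is 0.5
print(heaviside_ridge(v, 1.0, b))  # a + v.b = 1 > 0, so the output is 1.0
```

Any other scalar nonlinearity substituted for the logistic or Heaviside function yields another ridge activation, since only the composition with the linear form changes.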
Periodic functions can serve as activation functions. Usually the [[Sine wave|sinusoid]] is used, as any periodic function is decomposable into sinusoids by the [[Fourier transform]].<ref>{{Cite journal |last1=Sitzmann |first1=Vincent |last2=Martel |first2=Julien |last3=Bergman |first3=Alexander |last4=Lindell |first4=David |last5=Wetzstein |first5=Gordon |date=2020 |title=Implicit Neural Representations with Periodic Activation Functions |url=https://proceedings.neurips.cc/paper/2020/hash/53c04118df112c13a8c34b38343b9c10-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=33 |pages=7462–7473|arxiv=2006.09661 }}</ref>
Quadratic activation maps <math>x \mapsto x^2</math>.<ref>{{Citation |last=Flake |first=Gary William |title=Square Unit Augmented Radially Extended Multilayer Perceptrons |date=1998 |work=Neural Networks: Tricks of the Trade |series=Lecture Notes in Computer Science |volume=1524 |pages=145–163 |editor-last=Orr |editor-first=Genevieve B. |url=https://link.springer.com/chapter/10.1007/3-540-49430-8_8 |access-date=5 October 2024}}</ref>
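Both of these activations are one-line functions. A Python sketch (illustrative; the frequency scale `w0` follows the input-scaling idea in Sitzmann et al. 2020, whose suggested default is 30):

```python
import math

def sinusoid(x, w0=1.0):
    # Periodic activation; SIREN-style networks scale the
    # input by a frequency factor w0 before the sine.
    return math.sin(w0 * x)

def quadratic(x):
    # Quadratic activation: x -> x^2
    return x * x

print(quadratic(3.0))         # 9.0
print(sinusoid(math.pi / 2))  # 1.0
```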
=== Folding activation functions ===
| <math>C^\infty</math>
|-
| [[Rectifier (neural networks)#ELU|Exponential linear unit (ELU)]]<ref>{{Cite arXiv|last1=Clevert|first1=Djork-Arné|last2=Unterthiner|first2=Thomas|last3=Hochreiter|first3=Sepp}}</ref>
| [[File:Activation elu.svg]]
| <math>\begin{cases}
\alpha\left(e^x - 1\right) & \text{if } x \leq 0 \\
x & \text{if } x > 0
\end{cases}</math>
|-
| Scaled exponential linear unit (SELU)<ref>{{Cite journal |last1=Klambauer |first1=Günter |last2=Unterthiner |first2=Thomas |last3=Mayr |first3=Andreas |last4=Hochreiter |first4=Sepp |date=8 June 2017}}</ref>
| [[File:Activation selu.png]]
| <math>\lambda \begin{cases}
\alpha(e^x - 1) & \text{if } x < 0 \\
x & \text{if } x \geq 0
\end{cases}</math>
| <math>C^0</math>
|-
| Parametric rectified linear unit (PReLU)<ref>{{Cite arXiv |last1=He |first1=Kaiming |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |date=6 February 2015}}</ref>
| [[File:Activation prelu.svg]]
| <math>\begin{cases}
\alpha x & \text{if } x < 0 \\
x & \text{if } x \geq 0
\end{cases}</math>
| <math>C^\infty</math>
|-
|Exponential Linear Sigmoid SquasHing (ELiSH)<ref>{{Citation |last1=Basirat |first1=Mina |title=The Quest for the Golden Activation Function |date=2 August 2018}}</ref>
|[[File:Elish_activation_function.png|thumb|The ELiSH activation function plotted over the range [−3, 3]; it attains a minimum value of approximately −0.172 at x ≈ −0.881]]
|<math>\begin{cases}
\frac{x}{1 + e^{-x}} & \text{if } x \geq 0 \\
\frac{e^x - 1}{1 + e^{-x}} & \text{if } x < 0
\end{cases}</math>
|}
== Further reading ==
* {{Citation |last1=Kunc |first1=Vladimír |title=Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks}}
* {{cite arXiv |last1=Nwankpa |first1=Chigozie |title=Activation Functions: Comparison of trends in Practice and Research for Deep Learning |date=8 November 2018}}
* {{cite journal |last1=Dubey |first1=Shiv Ram |last2=Singh |first2=Satish Kumar |last3=Chaudhuri |first3=Bidyut Baran |year=2022 |title=Activation functions in deep learning: A comprehensive survey and benchmark |journal=Neurocomputing |publisher=Elsevier BV |volume=503 |pages=92–108 |doi=10.1016/j.neucom.2022.06.111 |issn=0925-2312 |doi-access=free|arxiv=2109.14545 }}