Activation function

The '''activation function''' of a node in an [[artificial neural network]] is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is ''nonlinear''.<ref>{{Cite web|url=http://didattica.cs.unicam.it/lib/exe/fetch.php?media=didattica:magistrale:kebi:ay_1718:ke-11_neural_networks.pdf|title=Neural Networks, p. 7|last=Hinkelmann|first=Knut|website=University of Applied Sciences Northwestern Switzerland|access-date=2018-10-06|archive-date=2018-10-06|archive-url=https://web.archive.org/web/20181006235506/http://didattica.cs.unicam.it/lib/exe/fetch.php?media=didattica:magistrale:kebi:ay_1718:ke-11_neural_networks.pdf|url-status=dead}}</ref>
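
The following is a minimal illustrative sketch (not taken from the cited source) of how a single node computes its output: the inputs are combined as a weighted sum (commonly with a bias term), and the activation function is applied to that sum. The function and variable names, the choice of ReLU as the nonlinearity, and the example values are assumptions made purely for illustration.

<syntaxhighlight lang="python">
def relu(x):
    """Rectified linear unit, a common nonlinear activation."""
    return max(0.0, x)

def node_output(inputs, weights, bias, activation=relu):
    """Output of one node: the activation applied to the weighted sum of its inputs."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)

# Example: a node with two inputs
print(node_output([1.0, 2.0], [0.5, 0.25], bias=-0.75))  # relu(0.5*1.0 + 0.25*2.0 - 0.75) = 0.25
</syntaxhighlight>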
 
Modern activation functions include the logistic ([[Sigmoid function|sigmoid]]) function used in the 2012 [[speech recognition]] model developed by [[Geoffrey Hinton|Hinton]] et al.;<ref>{{Cite journal |last1=Hinton |first1=Geoffrey |last2=Deng |first2=Li |last3=Yu |first3=Dong |last4=Dahl |first4=George |last5=Mohamed |first5=Abdel-rahman |last6=Jaitly |first6=Navdeep |last7=Senior |first7=Andrew |last8=Vanhoucke |first8=Vincent |last9=Nguyen |first9=Patrick |last10=Sainath |first10=Tara |author10-link=Tara Sainath |last11=Kingsbury |first11=Brian |year=2012 |title=Deep Neural Networks for Acoustic Modeling in Speech Recognition |journal=IEEE Signal Processing Magazine |volume=29 |issue=6 |pages=82–97 |doi=10.1109/MSP.2012.2205597 |s2cid=206485943}}</ref> the [[ReLU]] used in the 2012 [[AlexNet]] computer vision model<ref>{{Cite journal |last1=Krizhevsky |first1=Alex |last2=Sutskever |first2=Ilya |last3=Hinton |first3=Geoffrey E. |date=2017-05-24 |title=ImageNet classification with deep convolutional neural networks |url=https://dl.acm.org/doi/10.1145/3065386 |journal=Communications of the ACM |language=en |volume=60 |issue=6 |pages=84–90 |doi=10.1145/3065386 |issn=0001-0782}}</ref><ref>{{Cite journal |last1=Al-johania |first1=Norah |last2=Elrefaei |first2=Lamiaa |date=2019-06-30 |title=Dorsal Hand Vein Recognition by Convolutional Neural Networks: Feature Learning and Transfer Learning Approaches |url=http://www.inass.org/2019/2019063019.pdf |journal=International Journal of Intelligent Engineering and Systems |volume=12 |issue=3 |pages=178–191 |doi=10.22266/ijies2019.0630.19}}</ref> and in the 2015 [[Residual neural network|ResNet]] model; and the smooth version of the ReLU, the [[ReLU#Gaussian-error linear unit (GELU)|GELU]], which was used in the 2018 [[BERT (language model)|BERT]] model.<ref name="ReferenceA">{{Cite arXiv |eprint=1606.08415 |title=Gaussian Error Linear Units (GELUs) |last1=Hendrycks |first1=Dan |last2=Gimpel |first2=Kevin |year=2016 |class=cs.LG}}</ref>
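
For reference, the functions named above are commonly defined as follows, where <math>\Phi</math> denotes the cumulative distribution function of the standard normal distribution:

<math>\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \operatorname{ReLU}(x) = \max(0, x), \qquad \operatorname{GELU}(x) = x\,\Phi(x).</math>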
 
==Comparison of activation functions==