Activation function: Difference between revisions

Line 168:
| <math>(-\lambda\alpha,\infty)</math>
| <math>C^0</math>
|-
| Swish gated linear unit (SwiGLU)<ref>{{Cite arXiv |last1=Shazeer |first1=Noam |date=2020 |title=GLU Variants Improve Transformer |eprint=2002.05202 |class=cs.LG}}</ref>
|
| <math>\operatorname{swish}_\beta(W_1 x) \odot (W_2 x)</math>, where <math>\operatorname{swish}_\beta(x) = x\,\sigma(\beta x)</math>, <math>\sigma</math> is the sigmoid function, <math>W_1</math> and <math>W_2</math> are learned linear transformations, and <math>\odot</math> denotes element-wise multiplication. SwiGLU is a gated linear unit (GLU) in which the sigmoid gate is replaced by the swish function.
| By the [[product rule]]: the derivative of the swish factor, <math>\operatorname{swish}_\beta'(x) = \sigma(\beta x) + \beta x\,\sigma(\beta x)\bigl(1 - \sigma(\beta x)\bigr)</math>, times the linear factor, plus the swish factor times the derivative of the linear factor.
| <math>(-\infty, \infty)</math>
| <math>C^\infty</math>
|-
| Leaky rectified linear unit (Leaky ReLU)<ref>{{cite journal |last1=Maas |first1=Andrew L. |last2=Hannun |first2=Awni Y. |last3=Ng |first3=Andrew Y. |s2cid=16489696 |title=Rectifier nonlinearities improve neural network acoustic models |journal=Proc. ICML |date=June 2013 |volume=30 |issue=1}}</ref>
Line 189 ⟶ 182:
| <math>C^0</math>
|-
| Parametric rectified linear unit (PReLU)<ref>{{Cite arXiv |last1=He |first1=Kaiming |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |date=2015-02-06 |title=Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification |eprint=1502.01852 |class=cs.CV}}</ref>
| [[File:Activation prelu.svg]]
| <math>\begin{cases}