Split "modern activation functions" into its own paragraph and reordered the functions, putting GELU after ReLU. |
→Table of activation functions: swiglu |
||
Line 126:
&\frac{1}{2} x \left(1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right) \\
{}={} &x\Phi(x)
\end{align}</math>
| <math>\Phi(x) + x\phi(x)</math>
| <math>(-0.17\ldots, \infty)</math>
| <math>C^\infty</math>
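A minimal numeric sketch of the GELU entry above, assuming the standard-normal CDF Φ and PDF φ used in the table; it checks the derivative Φ(x) + xφ(x) against a finite difference. Function and variable names are illustrative only.
<syntaxhighlight lang="python">
# Sketch of GELU(x) = x * Phi(x) and its derivative Phi(x) + x * phi(x).
# Uses only the Python standard library.
import math

def Phi(x):          # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):          # standard normal PDF
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gelu(x):         # GELU(x) = (1/2) x (1 + erf(x / sqrt(2))) = x * Phi(x)
    return x * Phi(x)

def gelu_grad(x):    # d/dx [x * Phi(x)] = Phi(x) + x * phi(x)
    return Phi(x) + x * phi(x)

# finite-difference check of the derivative at a sample point
h, x0 = 1e-6, 0.3
assert abs((gelu(x0 + h) - gelu(x0 - h)) / (2 * h) - gelu_grad(x0)) < 1e-6
</syntaxhighlight>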
Line 169:
| <math>C^0</math>
|-
|Swish plus GLU (SwiGLU)
|
|
|
|
|
|-
| Leaky rectified linear unit (Leaky ReLU)<ref>{{cite journal |last1=Maas |first1=Andrew L. |last2=Hannun |first2=Awni Y. |last3=Ng |first3=Andrew Y. |s2cid=16489696 |title=Rectifier nonlinearities improve neural network acoustic models |journal=Proc. ICML |date=June 2013 |volume=30 |issue=1}}</ref>
| [[File:Activation prelu.svg]]
| <math>\begin{cases}
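The SwiGLU row added in the hunk above is left with empty cells. As a hedged sketch only, the code below follows the commonly used GLU-variant form Swish(xW + b) ⊗ (xV + c); the weight matrices W, V and biases b, c are illustrative assumptions, since the visible diff gives no formula for this row.
<syntaxhighlight lang="python">
# Hedged sketch of SwiGLU as a gated linear unit variant:
# swiglu(x) = swish(x @ W + b) * (x @ V + c), with element-wise gating.
import numpy as np

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))   # x * sigmoid(beta * x)

def swiglu(x, W, V, b, c):
    return swish(x @ W + b) * (x @ V + c)  # gate one projection with the other

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))            # batch of 4, width 8 (illustrative)
W = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 16))
b, c = np.zeros(16), np.zeros(16)
print(swiglu(x, W, V, b, c).shape)         # (4, 16)
</syntaxhighlight>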
Line 201 ⟶ 208:
where
<math>g_{\lambda, \sigma, \mu, \beta}(x) = \frac{ (x - \lambda) {1}_{ \{ x \geqslant \lambda \} } }{ 1 + e^{- \sgn(x-\mu) \left( \frac{\vert x-\mu \vert}{\sigma} \right)^\beta } } </math>
<ref name="refrepsu1">
{{Citation
|first1=Abdourrahmane M.|last1=Atto|first2=Sylvie|last2=Galichet|first3=Dominique|last3=Pastor|first4=Nicolas|last4=Méger
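A minimal sketch of the shrinkage function g_{λ,σ,μ,β} defined in the hunk above, transcribed directly from that formula; the parameter values in the usage lines are illustrative only.
<syntaxhighlight lang="python">
# Sketch of g_{lambda, sigma, mu, beta}(x) =
# (x - lam) * 1{x >= lam} / (1 + exp(-sgn(x - mu) * (|x - mu| / sigma)**beta))
import math

def g(x, lam, sigma, mu, beta):
    numerator = (x - lam) if x >= lam else 0.0             # (x - lam) * indicator
    z = math.copysign((abs(x - mu) / sigma) ** beta, x - mu)  # sgn(x-mu)*(|x-mu|/sigma)^beta
    return numerator / (1.0 + math.exp(-z))

# illustrative parameters: below the threshold lam the output is exactly 0
print(g(-1.0, lam=0.5, sigma=1.0, mu=0.0, beta=2.0))  # 0.0
print(g( 2.0, lam=0.5, sigma=1.0, mu=0.0, beta=2.0))  # shrunk value of x - lam
</syntaxhighlight>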
Line 213 ⟶ 220:
| <math>C^0</math>
|-
| Sigmoid linear unit (SiLU,<ref name="ReferenceA" /> Sigmoid shrinkage,<ref name="refssbs1">
{{Citation
|first1=Abdourrahmane M.|last1=Atto|first2=Dominique|last2=Pastor|first3=Grégoire|last3=Mercier
|
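The row above names the sigmoid linear unit (SiLU); its formula cells fall outside the visible diff context. As a hedged sketch, the code below assumes the standard definition SiLU(x) = x·sigmoid(x) and checks its derivative by finite differences.
<syntaxhighlight lang="python">
# Hedged sketch of SiLU, assuming silu(x) = x * sigmoid(x).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    return x * sigmoid(x)

def silu_grad(x):
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))       # d/dx [x * sigmoid(x)]

# finite-difference check of the derivative at a sample point
h, x0 = 1e-6, -0.7
assert abs((silu(x0 + h) - silu(x0 - h)) / (2 * h) - silu_grad(x0)) < 1e-6
</syntaxhighlight>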