|Swish plus GLU (SwiGLU)
|
| <math>W_1 x \cdot \operatorname{swish}_\beta(W_2 x)</math>, where <math>\operatorname{swish}_\beta(x) = x\,\sigma(\beta x)</math>, <math>\sigma(x)</math> is the sigmoid function, <math>W_1, W_2</math> are linear transformations, and the product is taken element-wise. It replaces the sigmoid gate of the gated linear unit <math>\operatorname{GLU}(x) = W_1 x \cdot \sigma(W_2 x)</math> with the swish function (see the sketch after the table notes).
| Obtained from the product rule: the derivative of the swish factor, <math>\operatorname{swish}_\beta'(x) = \sigma(\beta x) + \beta x\,\sigma(\beta x)(1 - \sigma(\beta x))</math>, multiplied by the linear factor, plus the swish factor multiplied by the derivative of the linear factor. The full expression is rarely written out explicitly.
| <math>(-\infty, \infty)</math>
| <math>C^\infty</math>
|-
| Leaky rectified linear unit (Leaky ReLU)<ref>{{cite journal |last1=Maas |first1=Andrew L. |last2=Hannun |first2=Awni Y. |last3=Ng |first3=Andrew Y. |s2cid=16489696 |title=Rectifier nonlinearities improve neural network acoustic models |journal=Proc. ICML |date=June 2013 |volume=30 |issue=1}}</ref>
:{{note|kronecker_delta}} Here, <math>\delta_{ij}</math> is the [[Kronecker delta]].
:{{note|j}} For instance, <math>j</math> could iterate over the kernels of the previous neural network layer while <math>i</math> iterates over the kernels of the current layer.
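The following is a minimal NumPy sketch of the SwiGLU entry above, intended only as an illustration: the weight matrices <code>W1</code> and <code>W2</code>, their shapes, and <code>beta</code> are arbitrary placeholders rather than values taken from any particular model.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swiglu(x, W1, W2, beta=1.0):
    """SwiGLU as in the table above: the value branch W1 @ x is gated
    element-wise by the swish of the gate branch W2 @ x."""
    value = W1 @ x                                 # linear value branch
    gate = W2 @ x                                  # linear gate branch
    return value * (gate * sigmoid(beta * gate))   # swish_beta(gate) = gate * sigmoid(beta * gate)

# Illustrative use: a 4-dimensional input mapped to 3 output units.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((3, 4))
W2 = rng.standard_normal((3, 4))
print(swiglu(x, W1, W2))
</syntaxhighlight>

In practice the derivative described in the table is not coded by hand; automatic differentiation applies the product rule to these factors.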
===Quantum activation functions===