=== Special cases ===
The original paper<ref name=":0" /> recommended applying BatchNorm only after a linear transform, not after a nonlinear activation. That is, one computes <math>\phi(\mathrm{BN}(Wx + b))</math> rather than <math>\mathrm{BN}(\phi(Wx + b))</math>, where <math>\phi</math> is the activation function.
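The recommended ordering can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation; the layer sizes, the name <code>batch_norm</code>, and the use of ReLU as <math>\phi</math> are assumptions, and the learned scale/shift parameters are omitted for brevity.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension (axis 0).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))   # batch of 32 examples, 16 features
W = rng.normal(size=(16, 8))
b = rng.normal(size=8)

relu = lambda z: np.maximum(z, 0.0)

# Recommended: normalize the linear pre-activation, then apply the nonlinearity.
recommended = relu(batch_norm(x @ W + b))

# Not recommended: normalizing after the nonlinear activation.
not_recommended = batch_norm(relu(x @ W + b))
```

In the recommended form the input to the nonlinearity has zero mean and unit variance over the batch, which is the property the original paper argues stabilizes training.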
For [[Convolutional neural network|convolutional neural networks]] (CNN), BatchNorm must preserve the translation invariance of these models, meaning that all outputs produced by the same kernel must be normalized with the same statistics.
Concretely, suppose we have a 2-dimensional convolutional layer defined by<math display="block">x^{(l)}_{h, w, c} = \sum_{h', w', c'} K^{(l)}_{h'-h, w'-w, c, c'} x_{h', w', c'}^{(l-1)} + b^{(l)}_c</math>where
* <math>b^{(l)}_c</math> is the bias term for the <math>c</math>-th channel of the <math>l</math>-th layer.
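The layer equation above can be written out directly as nested loops. The following NumPy sketch is illustrative only (the function name, array shapes, and "valid" boundary handling are assumptions not stated in the text); it indexes the kernel by the offsets <math>h'-h, w'-w</math> exactly as in the formula.

```python
import numpy as np

def conv2d(x, K, b):
    """Direct evaluation of the layer equation (a cross-correlation).

    x: input of shape (H_in, W_in, C_in)
    K: kernel of shape (kH, kW, C_out, C_in), indexed as K[h'-h, w'-w, c, c']
    b: bias of shape (C_out,), one term per output channel
    """
    kH, kW, C_out, C_in = K.shape
    H_out = x.shape[0] - kH + 1   # "valid" output size (no padding)
    W_out = x.shape[1] - kW + 1
    out = np.zeros((H_out, W_out, C_out))
    for h in range(H_out):
        for w in range(W_out):
            for c in range(C_out):
                # h' runs over h..h+kH-1, so h'-h indexes the kernel rows;
                # the sum over h', w', c' collapses to one patch product.
                patch = x[h:h+kH, w:w+kW, :]            # (kH, kW, C_in)
                out[h, w, c] = np.sum(K[:, :, c, :] * patch) + b[c]
    return out
```

Because the same kernel <math>K</math> is applied at every position, shifting the input shifts the output by the same amount; this is the translation equivariance that spatial BatchNorm must respect.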
In order to preserve translation invariance, BatchNorm treats all outputs of the same kernel within a batch as additional data points of that batch. That is, it is applied once per ''kernel'' <math>c</math> (equivalently, once per channel <math>c</math>), not once per ''activation'' <math>x^{(l)}_{h, w, c}</math>:<math display="block">
\begin{aligned}
\mu^{(l)}_c &= \frac{1}{BHW} \sum_{b=1}^B \sum_{h=1}^H \sum_{w=1}^W x^{(l)}_{(b), h, w, c} \\
\left(\sigma^{(l)}_c\right)^2 &= \frac{1}{BHW} \sum_{b=1}^B \sum_{h=1}^H \sum_{w=1}^W \left(x^{(l)}_{(b), h, w, c} - \mu^{(l)}_c\right)^2
\end{aligned}
</math>where <math>B</math> is the batch size and <math>H, W</math> are the height and width of the feature map, so each channel <math>c</math> is normalized using a single mean and variance pooled over all <math>BHW</math> positions.
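The per-channel pooling of statistics above amounts to averaging over the batch and both spatial axes at once. A minimal NumPy sketch (the function name and the <code>(B, H, W, C)</code> layout are assumptions; the learned per-channel scale and shift are omitted):

```python
import numpy as np

def spatial_batch_norm(x, eps=1e-5):
    """Per-channel BatchNorm for a (B, H, W, C) tensor.

    Statistics are pooled over the batch AND spatial axes, so every
    output of the same kernel (channel) shares one mean and variance.
    """
    mu = x.mean(axis=(0, 1, 2))    # shape (C,): one mean per channel
    var = x.var(axis=(0, 1, 2))    # shape (C,): one variance per channel
    return (x - mu) / np.sqrt(var + eps)
```

Normalizing per channel rather than per activation keeps the operation translation-invariant: a shifted input produces the same pooled statistics (up to boundary effects) and hence a correspondingly shifted output.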