Capsule neural network: Difference between revisions

The pose vector <math display="inline">\mathbf{u}_{i}</math> is rotated and translated by a matrix <math display="inline">\mathbf{W}_{ij}</math> into a vector <math display="inline">\mathbf{\hat{u}}_{j|i}</math> that predicts the output of the parent capsule.
 
:<math display="block">\mathbf{\hat{u}}_{j|i} = \mathbf{W}_{ij} \mathbf{u}_{i}</math>
 
A capsule <math display="inline">j</math> in the next higher level is fed the total input <math display="inline">\mathbf{s}_{j}</math>, the sum of the predictions from all capsules in the lower layer, each weighted by a coupling coefficient <math display="inline">c_{ij}</math>:
 
:<math display="block">\mathbf{s}_{j} = \sum_{i}{c_{ij} \mathbf{\hat{u}}_{j|i}}</math>
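In NumPy terms, the prediction vectors and the weighted sum can be sketched as follows (the capsule counts and pose dimensions are illustrative assumptions, not values from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

n_lower, n_upper = 6, 3   # illustrative capsule counts per layer
d_in, d_out = 8, 16       # illustrative pose dimensions

u = rng.standard_normal((n_lower, d_in))                  # poses u_i
W = rng.standard_normal((n_lower, n_upper, d_out, d_in))  # transforms W_ij

# Prediction vectors: u_hat[i, j] = W_ij @ u_i
u_hat = np.einsum('ijkl,il->ijk', W, u)

# Uniform coupling coefficients c_ij (each row sums to one)
c = np.full((n_lower, n_upper), 1.0 / n_upper)

# Total input to each parent capsule: s_j = sum_i c_ij * u_hat[i, j]
s = np.einsum('ij,ijk->jk', c, u_hat)
```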
 
==== Procedure softmax ====
The coupling coefficients from a capsule <math display="inline">i</math> in layer <math display="inline">l</math> to all capsules in layer <math display="inline">l+1</math> sum to one, and are defined by a "[[Softmax function|routing softmax]]". The initial [[logit]]s <math display="inline">b_{ij}</math> are the prior [[Log probability|log probabilities]] for the routing, i.e. the [[prior probability]] that capsule <math display="inline">i</math> in layer <math display="inline">l</math> should couple to capsule <math display="inline">j</math> in layer <math display="inline">l+1</math>. The coupling coefficients are normalized as follows:<ref name=":1"/>
 
:<math display="block">\begin{array}{lcl}
1: \mathbf{procedure}~ \mathrm{softmax} ( \mathbf{b}, i ) \\
2: \quad \triangleright \mbox{arguments: logit matrix } \mathbf{b} \mbox{, capsule index } i \\
3: \quad c_{ij} \gets \dfrac{\exp \left ( b_{ij} \right )}{\sum_{k}{\exp \left ( b_{ik} \right )}} \quad \mbox{for all } j \\
4: \quad \mathbf{return}~ \mathbf{c}_{i}
\end{array}</math>
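The routing softmax can be sketched in NumPy; each lower capsule's logits are normalized over the parent capsules, so each row of coupling coefficients sums to one (the logit-matrix shape used below is an assumption for illustration):

```python
import numpy as np

def routing_softmax(b):
    """Coupling coefficients c_ij = exp(b_ij) / sum_k exp(b_ik).

    b: logit matrix of shape (n_lower, n_upper); the softmax runs
    over the parent-capsule axis, so each row of c sums to one.
    """
    e = np.exp(b - b.max(axis=1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=1, keepdims=True)
```

With all logits initialized to zero, as in the routing procedure below, every lower capsule couples uniformly to all parents.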
Because the length of an output vector represents a probability, it should lie between zero and one; to ensure this, a squashing function is applied:<ref name=":1"/>
 
:<math display="block">\begin{array}{lcl}
1: \mathbf{procedure}~ \mathrm{squash} ( \mathbf{a} ) \\
2: \quad \triangleright \mbox{argument: vector } \mathbf{a} \\
3: \quad \mathbf{return}~ \dfrac{\| \mathbf{a} \|^{2}}{1 + \| \mathbf{a} \|^{2}} \cdot \dfrac{\mathbf{a}}{\| \mathbf{a} \|}
\end{array}</math>
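A minimal NumPy sketch of the squashing function (the small `eps` term, an addition here, avoids division by zero for the zero vector):

```python
import numpy as np

def squash(a, eps=1e-9):
    """Squashing nonlinearity: short vectors shrink toward length zero,
    long vectors approach (but never reach) unit length; the direction
    of the vector is preserved.
    """
    norm_sq = np.sum(a * a, axis=-1, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * a / norm
```

For example, a vector of length 5 is squashed to length 25/26 ≈ 0.96, while a vector of length 0.1 shrinks to length ≈ 0.01.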
One approach to routing is the following:<ref name=":1"/>
 
:<math display="block">\begin{array}{lcl}
~~1: \mathbf{procedure}~ \mathrm{routing} ( \mathbf{\hat{u}}_{j|i}, r, l ) \\
~~2: \quad \triangleright \mbox{arguments: prediction vectors, iteration count, layer index} \\
~~3: \quad \mbox{for all capsules } i \mbox{ in layer } l \mbox{ and } j \mbox{ in layer } ( l + 1 ) \mbox{: } b_{ij} \gets 0 \\
~~4: \quad \mathbf{for}~ r ~\mbox{iterations} \\
~~5: \quad \quad \mbox{for all capsules } i \mbox{ in layer } l \mbox{: } \mathbf{c}_{i} \gets \mathrm{softmax} ( \mathbf{b}, i ) \\
~~6: \quad \quad \mbox{for all capsules } j \mbox{ in layer } ( l + 1 ) \mbox{: } \mathbf{s}_{j} \gets \sum_{i}{c_{ij} \mathbf{\hat{u}}_{j|i}} \\
~~7: \quad \quad \mbox{for all capsules } j \mbox{ in layer } ( l + 1 ) \mbox{: } \mathbf{v}_{j} \gets \mathrm{squash} ( \mathbf{s}_{j} ) \\
~~8: \quad \quad \mbox{for all } i, j \mbox{: } b_{ij} \gets b_{ij} + \mathbf{\hat{u}}_{j|i} \cdot \mathbf{v}_{j} \\
~~9: \quad \mathbf{return}~ \mathbf{v}_{j}
\end{array}</math>
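Routing by agreement, as described in Sabour et al.'s paper (ref. ":1"), can be sketched in NumPy as follows; the array shapes are illustrative assumptions:

```python
import numpy as np

def squash(a, eps=1e-9):
    # Shrinks short vectors toward zero, long vectors toward unit length.
    norm_sq = np.sum(a * a, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * a / np.sqrt(norm_sq + eps)

def routing(u_hat, r):
    """Dynamic routing by agreement.

    u_hat: prediction vectors, shape (n_lower, n_upper, d_out)
    r: number of routing iterations
    Returns the parent capsule outputs v, shape (n_upper, d_out).
    """
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))           # initial routing logits
    for _ in range(r):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)   # routing softmax over parents
        s = np.einsum('ij,ijk->jk', c, u_hat)  # weighted sum of predictions
        v = squash(s)                          # parent capsule outputs
        b = b + np.einsum('ijk,jk->ij', u_hat, v)  # agreement update
    return v
```

Each iteration increases the logit <code>b[i, j]</code> by the scalar product between capsule <code>i</code>'s prediction and parent <code>j</code>'s output, so predictions that agree with a parent's output come to dominate its input.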
=== Margin loss ===
The length of the instantiation vector represents the probability that a capsule's entity is present in the scene. A top-level capsule has a long vector if and only if its associated entity is present. To allow for multiple entities, a separate [[Hinge loss|margin loss]] is computed for each capsule. Downweighting the loss for absent entities stops the learning from shrinking activity vector lengths for all entities. The total loss is the sum of the losses of all entities.<ref name=":1"/> In Hinton's example the loss function is:<ref name=":1"/>
:<math display="block">\begin{align}
L_{k} & = \underbrace{T_{k} ~ { \max \left ( 0, m^{+} - \| \mathbf{v}_{k} \| \right )}^{2}}_\mbox{class present}
+ \underbrace{\lambda \left ( 1 - T_{k} \right ) ~ { \max \left ( 0, \| \mathbf{v}_{k} \| - m^{-} \right )}^{2}}_\mbox{class not present}
\end{align}</math>
where <math display="inline">T_{k} = 1</math> iff an entity of class <math display="inline">k</math> is present, <math display="inline">m^{+}</math> and <math display="inline">m^{-}</math> are the margins, and <math display="inline">\lambda</math> downweights the loss for absent classes.<ref name=":1"/>
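The margin loss can be sketched in NumPy; the default values m⁺ = 0.9, m⁻ = 0.1 and λ = 0.5 below are the ones used in Sabour et al.'s experiments (ref. ":1"):

```python
import numpy as np

def margin_loss(v, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Total margin loss, summed over classes.

    v: output vectors, shape (n_classes, d)
    T: 0/1 labels, T[k] = 1 iff an entity of class k is present
    """
    lengths = np.linalg.norm(v, axis=-1)
    # Penalize present classes whose vector is shorter than m_pos ...
    present = T * np.maximum(0.0, m_pos - lengths) ** 2
    # ... and absent classes whose vector is longer than m_neg,
    # downweighted by lam so absent entities don't dominate early training.
    absent = lam * (1 - T) * np.maximum(0.0, lengths - m_neg) ** 2
    return np.sum(present + absent)
```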