Capsule neural network: Difference between revisions

In Hinton's original idea, one minicolumn would represent and detect one multidimensional entity.<ref>{{Citation|last=Meher Vamsi|title=Geoffrey Hinton Capsule theory|date=2017-11-15|url=https://www.youtube.com/watch?v=6S1_WqE55UQ|accessdate=2017-12-06}}</ref><ref group="note" name=":0" />
 
== Equivariance ==
An [[Invariant (mathematics)|invariant]] is an object property that does not change as a result of some transformation. For example, the area of a circle does not change if the circle is shifted to the left.
Invariance means that a representation does not vary when the input is transformed; equivariance means that a transformation of the input produces a corresponding transformation of the representation.
 
Informally, an [[Equivariant map|equivariant]] is a property that changes predictably under transformation. For example, the center of a circle moves by the same amount as the circle when shifted.<ref>{{Cite web|url=https://jhui.github.io/2017/11/14/Matrix-Capsules-with-EM-routing-Capsule-Network/|title=“Understanding Matrix capsules with EM Routing (Based on Hinton's Capsule Networks)”|website=jhui.github.io|access-date=2017-12-31}}</ref>
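The circle example above can be made concrete with a small numerical sketch (an illustration added here, not taken from the cited sources): sample points on a circle, translate them, and compare an invariant property (the enclosed area) with an equivariant one (the center).

```python
import numpy as np

def circle_points(center, radius, n=1000):
    """Points sampled uniformly along a circle of the given center and radius."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return center + radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

def polygon_area(pts):
    """Shoelace formula: area enclosed by an ordered loop of points."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

pts = circle_points(np.array([0.0, 0.0]), radius=2.0)
shift = np.array([3.0, -1.0])
shifted = pts + shift                        # translate the whole circle

# Invariant: the area is (numerically) unchanged by the translation.
assert np.isclose(polygon_area(pts), polygon_area(shifted))
# Equivariant: the center moves by exactly the same amount as the circle.
assert np.allclose(shifted.mean(axis=0), pts.mean(axis=0) + shift)
```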
From the perspective of computer vision, invariance means that an object is recognized regardless of transformations such as translation, 2-D and 3-D rotation, viewpoint change and scaling; a shifted, rotated or rescaled object is still identified as the same object. Invariance is therefore important for object recognition. However, when the task becomes more demanding, for example determining how far an object has moved, by how many degrees it has rotated, or by what ratio it has shrunk, invariance alone is insufficient; such questions require equivariance.
 
A nonequivariant is a property whose value does not change predictably under a transformation. For example, transforming a circle into an ellipse means that its perimeter can no longer be computed as π times the diameter.
Equivariance&nbsp;is the detection of objects that can transform to each other.<ref>{{Cite web|url=https://jhui.github.io/2017/11/14/Matrix-Capsules-with-EM-routing-Capsule-Network/|title=“Understanding Matrix capsules with EM Routing (Based on Hinton's Capsule Networks)”|website=jhui.github.io|access-date=2017-12-31}}</ref>
 
In computer vision, the class of an object is expected to be an invariant over many transformations. I.e., a cat is still a cat if it is shifted, turned upside down or shrunken in size. However, many other properties are instead equivariant. The volume of a cat changes when it is scaled.
 
Equivariant properties such as a spatial relationship are captured in a ''pose'', data that describes an object's [[Translation (geometry)|translation]], [[Rotation (mathematics)|rotation]], scale and reflection. Translation is a change in ___location in one or more dimensions. Rotation is a change in orientation. Scale is a change in size. Reflection is a mirror image.<ref name=":1" />
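The four pose components listed above can be combined into a single transformation matrix. The following sketch (illustrative only; nothing here is specific to capsule networks) builds a 2-D pose in homogeneous coordinates from a translation, a rotation angle, a scale factor and an optional reflection, and applies it to a point.

```python
import numpy as np

def pose_matrix(tx, ty, theta, scale, reflect=False):
    """3x3 homogeneous matrix combining translation, rotation, scale, reflection."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])          # rotation by theta
    if reflect:
        R = R @ np.diag([1.0, -1.0])         # mirror across the x-axis first
    M = np.eye(3)
    M[:2, :2] = scale * R                    # scale * rotation (* reflection)
    M[:2, 2] = [tx, ty]                      # translation
    return M

# Rotate 90 degrees, double in size, then shift by (2, 3).
M = pose_matrix(tx=2.0, ty=3.0, theta=np.pi / 2, scale=2.0)
p = np.array([1.0, 0.0, 1.0])                # the point (1, 0), homogeneous
posed = M @ p
print(posed[:2])                             # approximately [2., 5.]
```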
 
[[Unsupervised learning|Unsupervised]] capsnets learn a global [[Affine space|linear manifold]] between an object and its pose as a matrix of weights. In other words, capsnets can identify an object independent of its pose, rather than having to learn to recognize the object while including its spatial relationships as part of the object. In capsnets, the pose can incorporate properties other than spatial relationships, e.g., color (cats can be of various colors).
 
The&nbsp;''translation invariance''&nbsp;is thus encapsulated in the matrix of weights, rather than in neural activity (recognition), making the network&nbsp;''translation equivariant''. Multiplying the object by the manifold thereby poses the object (for an object, in space).<ref>{{Cite web|url=https://kndrck.co/posts/capsule_networks_explained/|title=Capsule Networks Explained|last=Tan|first=Kendrick|date=November 10, 2017|website=kndrck.co|language=en|archive-url=|archive-date=|dead-url=|access-date=2017-12-26}}</ref>
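The idea of "multiplying the object by the manifold" can be sketched numerically. In matrix-capsule formulations, a lower-level capsule's pose matrix is multiplied by a learned transformation matrix to predict ("vote for") the pose of a higher-level capsule. The names, shapes and the fixed toy weight below are illustrative assumptions, not the article's notation.

```python
import numpy as np

# Pose of a detected part: identity orientation, positioned at (1, 2, 0).
child_pose = np.eye(4)
child_pose[:3, 3] = [1.0, 2.0, 0.0]

# Learned part-to-whole transform (here a fixed toy offset instead of
# trained weights): the whole sits 2 units below the part along y.
W = np.eye(4)
W[:3, 3] = [0.0, -2.0, 0.0]

vote = child_pose @ W                        # predicted pose of the whole
print(vote[:3, 3])                           # the vote places the whole at (1, 0, 0)

# Translation equivariance: shifting the part shifts the vote by the same amount,
# because the learned transform W is independent of where the part appears.
shift = np.eye(4)
shift[:3, 3] = [5.0, 0.0, 0.0]
assert np.allclose((shift @ child_pose @ W)[:3, 3], vote[:3, 3] + [5.0, 0.0, 0.0])
```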
 
== Pooling ==