In Hinton's original idea one minicolumn would represent and detect one multidimensional entity.<ref>{{Citation|last=Meher Vamsi|title=Geoffrey Hinton Capsule theory|date=2017-11-15|url=https://www.youtube.com/watch?v=6S1_WqE55UQ|accessdate=2017-12-06}}</ref><ref group="note" name=":0" />
== Equivariance ==
An [[Invariant (mathematics)|invariant]] is an object property that does not change as a result of some transformation. For example, the area of a circle does not change if the circle is shifted to the left.
Informally, an [[Equivariant map|equivariant]] is a property that changes predictably under transformation. For example, the center of a circle moves by the same amount as the circle when shifted.<ref>{{Cite web|url=https://jhui.github.io/2017/11/14/Matrix-Capsules-with-EM-routing-Capsule-Network/|title=Understanding Matrix capsules with EM Routing (Based on Hinton's Capsule Networks)|website=jhui.github.io|access-date=2017-12-31}}</ref>
A nonequivariant is a property whose value does not change predictably under a transformation. For example, transforming a circle into an ellipse means that its perimeter can no longer be computed as π times the diameter.
In computer vision, the class of an object is expected to be invariant over many transformations: a cat is still a cat whether it is shifted, turned upside down or shrunk in size. Many other properties are instead equivariant; the volume of a cat, for example, changes when it is scaled.
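The distinction can be illustrated with a minimal NumPy sketch (the image, shift amounts and "bright pixel" object are illustrative assumptions, not from the article): translating an image moves the location of a feature by the same amount (equivariant), while the feature's value stays the same (invariant).

```python
import numpy as np

# Toy image: a single bright pixel stands in for "the cat".
img = np.zeros((8, 8))
img[2, 3] = 1.0

# Translate the image; a circular shift keeps the example self-contained.
shifted = np.roll(img, shift=(1, 2), axis=(0, 1))

# Equivariant property: the pixel's location moves with the image.
loc = np.unravel_index(np.argmax(img), img.shape)                  # (2, 3)
loc_shifted = np.unravel_index(np.argmax(shifted), shifted.shape)  # (3, 5)

# Invariant property: the maximum brightness itself is unchanged.
peak, peak_shifted = img.max(), shifted.max()
```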
Equivariant properties such as a spatial relationship are captured in a ''pose'', data that describes an object's [[Translation (geometry)|translation]], [[Rotation (mathematics)|rotation]], scale and reflection. Translation is a change in ___location in one or more dimensions. Rotation is a change in orientation. Scale is a change in size. Reflection is a mirror image.<ref name=":1" />

[[Unsupervised learning|Unsupervised]] capsnets learn a global [[Affine space|linear manifold]] between an object and its pose as a matrix of weights. In other words, capsnets can identify an object independently of its pose, rather than having to learn to recognize the object together with its spatial relationships. As such, ''translation invariance'' is encapsulated in the matrix of weights rather than in neural activity, making the network ''translation equivariant''. Multiplying the object by the manifold poses the object.<ref>{{Cite web|url=https://kndrck.co/posts/capsule_networks_explained/|title=Capsule Networks Explained|last=Tan|first=Kendrick|date=November 10, 2017|website=kndrck.co|language=en|access-date=2017-12-26}}</ref> In capsnets, the pose can incorporate properties other than spatial relationships, e.g., color (cats can be of various colors).
== Pooling ==