The idea is to add structures called capsules to a [[convolutional neural network]] (CNN), and to reuse output from several of those capsules to form more stable (with respect to various perturbations) representations for higher-order capsules.<ref>{{Cite journal|last=Hinton|first=Geoffrey E.|last2=Krizhevsky|first2=Alex|last3=Wang|first3=Sida D.|date=2011-06-14|title=Transforming Auto-Encoders|url=https://link.springer.com/chapter/10.1007/978-3-642-21735-7_6|journal=Artificial Neural Networks and Machine Learning – ICANN 2011|volume=6791|series=Lecture Notes in Computer Science|language=en|publisher=Springer, Berlin, Heidelberg|pages=44–51|doi=10.1007/978-3-642-21735-7_6|isbn=9783642217340}}</ref> The output is a vector consisting of the [[Realization (probability)|probability of an observation]] and a [[Pose (computer vision)|pose for that observation]]. This vector is similar to what is produced, for example, when performing ''classification with localization'' in CNNs.
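A minimal sketch of how a single capsule's output can encode both quantities in one vector, using the "squashing" nonlinearity introduced by Sabour, Frosst, and Hinton (2017): the vector's length is shrunk into [0, 1) so it can be read as the probability that an entity is present, while its direction (the pose) is preserved. The function and variable names here are illustrative, not part of any particular library.

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squashing nonlinearity for a capsule's raw output vector s.

    Maps the vector's length into [0, 1) -- interpretable as the
    probability that the entity the capsule detects is present --
    while leaving its direction (the pose) unchanged.
    """
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# Hypothetical raw capsule output; direction encodes the pose.
raw = np.array([3.0, 4.0])          # length 5.0
v = squash(raw)
prob = np.linalg.norm(v)            # length of v: probability-like, < 1
```

For the example above, `prob` is 25/26 (about 0.96), since a raw length of 5 gives a squashed length of 5²/(1 + 5²); the direction of `v` is identical to that of `raw`.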
Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched). For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they have linear effects at the part/object level.
{{TOC limit|3}}