Capsule neural network

A '''capsule neural network''' ('''CapsNet''') is a type of [[artificial neural network]] (ANN) that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization.<ref name=":1" />
 
The idea is to add structures called "capsules" to a [[convolutional neural network]] (CNN), and to reuse output from several of those capsules to form more stable (with respect to various perturbations) representations for higher capsules.<ref>{{Cite book|last1=Hinton|first1=Geoffrey E.|last2=Krizhevsky|first2=Alex|last3=Wang|first3=Sida D.|date=2011-06-14|title=Transforming Auto-Encoders|journal=Artificial Neural Networks and Machine Learning – ICANN 2011|volume=6791|series=Lecture Notes in Computer Science|language=en|publisher=Springer, Berlin, Heidelberg|pages=44–51|doi=10.1007/978-3-642-21735-7_6|isbn=9783642217340|citeseerx=10.1.1.220.5099|s2cid=6138085 }}</ref> The output is a vector consisting of the [[Realization (probability)|probability of an observation]] and a [[Pose (computer vision)|pose for that observation]]. This is similar to the output produced, for example, when doing ''[[classification with localization]]'' in CNNs.
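For illustration, one common convention (used, for example, in the dynamic-routing formulation of capsules) is to encode the probability of an observation as the length of the capsule's output vector and its pose as the vector's orientation. The following minimal NumPy sketch shows that convention; the 8-dimensional vector is purely illustrative and is not taken from the cited implementations.

<syntaxhighlight lang="python">
import numpy as np

def squash(s, eps=1e-8):
    """Squashing non-linearity: shrinks short vectors toward length 0 and long
    vectors toward length 1 while preserving their orientation."""
    norm = np.linalg.norm(s)
    return (norm ** 2 / (1.0 + norm ** 2)) * (s / (norm + eps))

# A hypothetical 8-dimensional capsule output: its orientation encodes the
# pose of the detected entity, its length the probability that it is present.
pose = squash(np.array([0.2, -0.5, 1.3, 0.0, 0.7, -0.1, 0.4, 0.9]))
probability = np.linalg.norm(pose)   # a value in [0, 1)
</syntaxhighlight>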
 
Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched). For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they have linear effects at the part/object level.<ref name=":16">{{cite web|url=http://www.cedar.buffalo.edu/~srihari/CSE676/9.12%20CapsuleNets.pdf|title=Capsule Nets|last=Srihari|first=Sargur|publisher=[[University of Buffalo]]|access-date=2017-12-07}}</ref> This can be compared to inverting the rendering of an object of multiple parts.<ref name=":0">{{Cite book|url=http://papers.nips.cc/paper/1710-learning-to-parse-images.pdf|title=Advances in Neural Information Processing Systems 12|last1=Hinton|first1=Geoffrey E|last2=Ghahramani|first2=Zoubin|last3=Teh|first3=Yee Whye|date=2000|publisher=MIT Press|editor-last=Solla|editor-first=S. A.|editor-link=Sara Solla|pages=463–469|editor-last2=Leen|editor-first2=T. K.|editor-last3=Müller|editor-first3=K.}}</ref>
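For example, if each part's pose is written as a transformation matrix, a change of viewpoint acts on every part through the same matrix multiplication, which is a linear operation on the poses even though its effect on pixel intensities is not. The following NumPy sketch uses illustrative 2-D poses; the specific numbers and part names are assumptions for the example, not values from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def pose_matrix(theta, tx, ty):
    """2-D pose as a homogeneous transformation (rotation plus translation)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0,  1]])

# Hypothetical poses of two parts of a face, relative to the camera.
eye   = pose_matrix(0.0, -1.0, 2.0)
mouth = pose_matrix(0.0,  0.0, 0.5)

# A viewpoint change (rotate by 0.3 rad, shift by (2, -1)) acts on every
# part's pose by the same matrix multiplication, i.e. linearly, even though
# its effect on the raw pixel intensities is highly non-linear.
viewpoint = pose_matrix(0.3, 2.0, -1.0)
eye_new, mouth_new = viewpoint @ eye, viewpoint @ mouth
</syntaxhighlight>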
Capsnets are hierarchical, in that each lower-level capsule contributes significantly to only one higher-level capsule.<ref name=":1"/>
 
However, replicating learned knowledge remains valuable. To achieve this, a capsnet's lower layers are [[convolution]]al, including hidden capsule layers. Higher layers thus cover larger regions, while retaining information about the precise position of each object within the region. For low-level capsules, ___location information is "place-coded" according to which capsule is active. Higher up, more and more of the positional information is [[Neural coding|rate-coded]] in the capsule's output vector. This shift from place-coding to rate-coding, combined with the fact that higher-level capsules represent more complex objects with more degrees of freedom, suggests that capsule dimensionality increases with level.<ref name=":1"/>
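The contrast between the two codings can be sketched with toy NumPy arrays. The arrays below are purely illustrative; a real capsnet would produce the higher-level capsule through its routing procedure rather than the shortcut used here.

<syntaxhighlight lang="python">
import numpy as np

# Place-coding: a 6x6 grid of convolutional low-level capsules with short
# (4-D) pose vectors; position is signalled by WHICH grid cell is active.
low_level = np.zeros((6, 6, 4))
low_level[2, 5] = [0.9, 0.1, -0.3, 0.4]   # only the capsule at cell (2, 5) fires

# Rate-coding: one higher-level capsule covers the whole region, and its
# longer (here 8-D) output vector carries the position in its components.
row, col = map(int, np.argwhere(low_level.any(axis=-1))[0])
high_level = np.concatenate(([row / 6.0, col / 6.0],
                             low_level[row, col],
                             [0.0, 0.0]))
</syntaxhighlight>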
 
== Human vision ==