{{Short description|Type of artificial neural network}}
A '''capsule neural network''' ('''CapsNet''') is a type of [[artificial neural network]] (ANN) designed to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization.<ref name=":1" />
 
The idea is to add structures called "capsules" to a [[convolutional neural network]] (CNN), and to reuse output from several of those capsules to form more stable (with respect to various perturbations) representations for higher capsules.<ref>{{Cite book|last1=Hinton|first1=Geoffrey E.|last2=Krizhevsky|first2=Alex|last3=Wang|first3=Sida D.|title=Artificial Neural Networks and Machine Learning – ICANN 2011|chapter=Transforming Auto-Encoders|date=2011-06-14|volume=6791|series=Lecture Notes in Computer Science|language=en|publisher=Springer, Berlin, Heidelberg|pages=44–51|doi=10.1007/978-3-642-21735-7_6|isbn=9783642217340|citeseerx=10.1.1.220.5099|s2cid=6138085}}</ref> The output is a vector consisting of the [[Realization (probability)|probability of an observation]] and a [[Pose (computer vision)|pose for that observation]]. This is similar to what is done, for example, when performing ''[[classification with localization]]'' in CNNs.
 
Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched). For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they have linear effects at the part/object level.<ref name=":16">{{cite web|url=http://www.cedar.buffalo.edu/~srihari/CSE676/9.12%20CapsuleNets.pdf|title=Capsule Nets|last=Srihari|first=Sargur|publisher=[[University of Buffalo]]|access-date=2017-12-07}}</ref> This can be compared to inverting the rendering of an object composed of multiple parts.<ref name=":0">{{Cite book|url=http://papers.nips.cc/paper/1710-learning-to-parse-images.pdf|title=Advances in Neural Information Processing Systems 12|last1=Hinton|first1=Geoffrey E|last2=Ghahramani|first2=Zoubin|last3=Teh|first3=Yee Whye|date=2000|publisher=MIT Press|editor-last=Solla|editor-first=S. A.|editor-link=Sara Solla|pages=463–469|editor-last2=Leen|editor-first2=T. K.|editor-last3=Müller|editor-first3=K.}}</ref>
 
== Pooling ==
Capsnets reject the [[convolutional neural network#Pooling layer|pooling layer]] strategy of conventional CNNs that reduces the amount of detail to be processed at the next higher layer. Pooling allows a degree of translational invariance (it can recognize the same object in a somewhat different ___location) and allows a larger number of feature types to be represented. Capsnet proponents argue that pooling:<ref name=":1"/>
* violates biological shape perception in that it has no intrinsic coordinate frame;
* provides invariance (discarding positional information) instead of equivariance (disentangling that information), as the sketch below illustrates.
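
A minimal numpy sketch of this objection (the array values and the capsule-style pairing are illustrative, not from the cited sources): max pooling maps a shifted input to the same output, discarding where the feature occurred, whereas keeping a pose value alongside the activation changes predictably with the shift.

<syntaxhighlight lang="python">
import numpy as np

def max_pool_1d(x, size=2):
    # Non-overlapping 1-D max pooling: keep the strongest activation
    # in each window, discard where in the window it occurred.
    return x.reshape(-1, size).max(axis=1)

a = np.array([0.0, 0.9, 0.0, 0.0])  # feature detected at position 1
b = np.array([0.9, 0.0, 0.0, 0.0])  # same feature at position 0

print(max_pool_1d(a))  # [0.9 0. ]
print(max_pool_1d(b))  # [0.9 0. ]  -- identical outputs: invariance

# A capsule-style output instead pairs the activation with a pose
# (here simply the position), so a shift changes the representation
# in a predictable way: equivariance.
print(a.max(), a.argmax())  # 0.9 1
print(b.max(), b.argmax())  # 0.9 0
</syntaxhighlight>
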
The pose vector <math display="inline">\mathbf{u}_{i}</math> is rotated and translated by a matrix <math display="inline">\mathbf{W}_{ij}</math> into a vector <math display="inline">\mathbf{\hat{u}}_{j|i}</math> that predicts the output of the parent capsule.
 
:<math display="block">\mathbf{\hat{u}}_{j|i} = \mathbf{W}_{ij} \mathbf{u}_{i}</math>
 
Capsules <math display="inline">s_{j}</math> in the next higher level are fed the sum of the predictions from all capsules in the lower layer, each with a coupling coefficient <math display="inline">c_{ij}</math>
 
:<math display="block">s_{j} = \sum{c_{ij} \mathbf{\hat{u}}_{j|i}}</math>
 
==== Procedure softmax ====
The coupling coefficients from a capsule <math display="inline">i</math> in layer <math display="inline">l</math> to all capsules in layer <math display="inline">l+1</math> sum to one, and are defined by a "[[Softmax function|routing softmax]]". The initial [[logit]]s <math display="inline">b_{ij}</math> are the prior [[Log probability|log probabilities]] for the routing: the [[prior probability]] that capsule <math display="inline">i</math> in layer <math display="inline">l</math> should connect to capsule <math display="inline">j</math> in layer <math display="inline">l+1</math>. The coupling coefficients are normalized as follows:<ref name=":1"/>
 
:<math display="block">\begin{array}{lcl}
1: \mathbf{procedure}~ \mathrm{softmax} ( \mathbf{b}, i ) \\
2: \quad \triangleright \mbox{argument matrix} \\
Line 85 ⟶ 86:
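
A short numpy sketch of this normalization, assuming the logits are held in a matrix <code>b</code> of shape (child capsules, parent capsules); the softmax runs over the parent axis, so each child's coupling coefficients sum to one:

<syntaxhighlight lang="python">
import numpy as np

def routing_softmax(b):
    # Softmax over the parent-capsule axis j: for each child capsule i,
    # the coefficients c_ij are positive and sum to 1.
    e = np.exp(b - b.max(axis=1, keepdims=True))  # shift for stability
    return e / e.sum(axis=1, keepdims=True)

b = np.zeros((6, 3))       # initial logits b_ij = 0
c = routing_softmax(b)
print(c[0])                # [0.333... 0.333... 0.333...]
print(c.sum(axis=1))       # all ones
</syntaxhighlight>
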
 
==== Procedure squash ====
Because the length of an output vector represents a probability, it should lie between zero and one; to enforce this, a squashing function is applied:<ref name=":1"/>
 
:<math display="block">\begin{array}{lcl}
1: \mathbf{procedure}~ \mathrm{squash} ( \mathbf{a} ) \\
2: \quad \triangleright \mbox{argument vector} \\
Line 100 ⟶ 101:
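
A direct numpy translation of this nonlinearity: the vector's direction is preserved while its length is mapped into [0, 1).

<syntaxhighlight lang="python">
import numpy as np

def squash(a, eps=1e-9):
    # v = (|a|^2 / (1 + |a|^2)) * (a / |a|): short vectors shrink
    # toward zero length, long vectors approach (but never reach) 1.
    norm = np.linalg.norm(a)
    return (norm ** 2 / (1.0 + norm ** 2)) * (a / (norm + eps))

print(np.linalg.norm(squash(np.array([0.1, 0.0]))))   # ~0.0099
print(np.linalg.norm(squash(np.array([10.0, 0.0]))))  # ~0.9901
</syntaxhighlight>
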
==== Procedure routing ====
One approach to routing is the following:<ref name=":1"/>
 
:<math display="block">\begin{array}{lcl}
~~1: \mathbf{procedure}~ \mathrm{routing} ( \mathbf{\hat{u}}_{j|i}, r, l ) \\
~~2: \quad \triangleright \mbox{argument vector} \\
Line 132 ⟶ 133:
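
A compact numpy sketch of this routing-by-agreement loop under the same illustrative shapes as above; <code>u_hat</code> holds the prediction vectors <math display="inline">\mathbf{\hat{u}}_{j|i}</math>, and the softmax and squash steps are inlined in vectorized form:

<syntaxhighlight lang="python">
import numpy as np

def squash_rows(s, eps=1e-9):
    # Apply the squashing function to each row vector of s.
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    return (norm ** 2 / (1.0 + norm ** 2)) * (s / (norm + eps))

def routing(u_hat, r=3):
    # u_hat: predictions, shape (n_child, n_parent, d_out).
    n_child, n_parent, _ = u_hat.shape
    b = np.zeros((n_child, n_parent))           # logits b_ij = 0
    for _ in range(r):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)    # routing softmax
        s = np.einsum('ij,ija->ja', c, u_hat)   # s_j = sum_i c_ij u_hat
        v = squash_rows(s)                      # parent outputs v_j
        b += np.einsum('ija,ja->ij', u_hat, v)  # agreement u_hat . v_j
    return v

rng = np.random.default_rng(0)
v = routing(rng.normal(size=(6, 3, 16)))
print(v.shape)  # (3, 16)
</syntaxhighlight>
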
=== Margin loss ===
The length of the instantiation vector represents the probability that a capsule's entity is present in the scene. A top-level capsule has a long vector if and only if its associated entity is present. To allow for multiple entities, a separate [[Hinge loss|margin loss]] is computed for each capsule. Downweighting the loss for absent entities stops the learning from shrinking activity vector lengths for all entities. The total loss is the sum of the losses of all entities.<ref name=":1"/> In Hinton's example the loss function is:<ref name=":1"/>
:<math display="block">\begin{align}
L_{k} & = \underbrace{T_{k} ~ { \max \left ( 0, m^{+} - \| \mathbf{v}_{k} \| \right )}^{2}}_\mbox{class present}
+ \underbrace{\lambda \left ( 1 - T_{k} \right ) ~ { \max \left ( 0, \| \mathbf{v}_{k} \| - m^{-} \right )}^{2}}_\mbox{class not present}
Line 156 ⟶ 157:
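
A numpy sketch of this loss with the constants reported in the cited paper (<math display="inline">m^{+} = 0.9</math>, <math display="inline">m^{-} = 0.1</math>, <math display="inline">\lambda = 0.5</math>); <code>v_norms</code> holds the lengths of the top-level capsule vectors and <code>T</code> is the per-class presence indicator:

<syntaxhighlight lang="python">
import numpy as np

def margin_loss(v_norms, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    # Hinge-style loss on capsule lengths: present classes are pushed
    # above m_pos, absent classes below m_neg (down-weighted by lam).
    present = T * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norms - m_neg) ** 2
    return (present + absent).sum()  # total loss: sum over classes

v_norms = np.array([0.95, 0.20, 0.05])  # capsule lengths, 3 classes
T = np.array([1.0, 0.0, 0.0])           # only class 0 is present
print(margin_loss(v_norms, T))          # 0.005: near-ideal prediction
</syntaxhighlight>
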
Capsnets are hierarchical, in that each lower-level capsule contributes significantly to only one higher-level capsule.<ref name=":1"/>
 
However, replicating learned knowledge remains valuable. To achieve this, a capsnet's lower layers are [[convolution]]al, including hidden capsule layers. Higher layers thus cover larger regions, while retaining information about the precise position of each object within the region. For low-level capsules, ___location information is "place-coded" according to which capsule is active. Higher up, more and more of the positional information is [[Neural coding|rate-coded]] in the capsule's output vector. This shift from place-coding to rate-coding, combined with the fact that higher-level capsules represent more complex objects with more degrees of freedom, suggests that capsule dimensionality increases with level.<ref name=":1"/>
 
== Human vision ==
Human vision examines a sequence of focal points (directed by [[saccade]]s), processing only a fraction of the scene at its highest resolution. Capsnets build on inspirations from [[cortical minicolumn]]s (also called cortical microcolumns) in the [[cerebral cortex]]. A minicolumn is a structure containing 80–120 neurons, with a diameter of about 28–40&nbsp;μm, spanning all layers in the cerebral cortex. All neurons in the larger minicolumns have the same [[receptive field]], and they output their activations as [[action potential]]s or spikes.<ref name=":1"/> Neurons within the microcolumn receive common inputs, have common outputs, are interconnected and may constitute a fundamental computational unit of the [[cerebral cortex]].<ref>{{Cite web|url=http://www.physics.drexel.edu/~ccruz/micros/research.html|title=Microcolumns in the Brain|website=www.physics.drexel.edu|access-date=2017-12-31|archive-date=2018-05-27|archive-url=https://web.archive.org/web/20180527140322/http://www.physics.drexel.edu/%7Eccruz/micros/research.html|url-status=dead}}</ref>
 
Capsnets explore the intuition that the human visual system creates a [[Parse tree|tree]]-like structure for each focal point and coordinates these trees to recognize objects. However, with capsnets each tree is "carved" from a fixed network (by adjusting coefficients) rather than assembled on the fly.<ref name=":1"/>
==References==
{{reflist|2|refs=
<ref name=":1">{{Cite arxivarXiv|last1=Sabour|first1=Sara|last2=Frosst|first2=Nicholas|last3=Hinton|first3=Geoffrey E.|date=2017-10-26|title=Dynamic Routing Between Capsules|eprint=1710.09829|class=cs.CV}}</ref>
}}
 
== External links ==
* {{Citation|title=Capsules Network Implementation in PyTorch, fixing several bugs in previous implementations |date=2018-04-16|url=https://github.com/manuelsh/capsule-networks-pytorch|access-date=2018-04-16}}
* {{Citation|title=Pytorch code: Capsule Routing via Variational Bayes | date=February 2020|url=https://github.com/fabio-deep/Variational-Capsule-Routing|access-date=2020-10-23}}
* {{Citation|title=A PyTorch implementation of the NIPS 2017 paper "Dynamic Routing Between Capsules"|date=2017-12-08|url=https://github.com/gram-ai/capsule-networks|publisher=Gram.AI|access-date=2017-12-08}}
* {{YouTube|id=rTawFwUvnLE|title=What's wrong with convolutional neural nets}}
* {{Cite web|url=http://www.cedar.buffalo.edu/~srihari/CSE676|title=Deep Learning|website=www.cedar.buffalo.edu|access-date=2017-12-07}}
*{{Cite web|url=https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc|title=Understanding Capsule Networks — AI's Alluring New Architecture|last=Bourdakos|first=Nick|date=2018-02-12|website=freeCodeCamp.org|access-date=2019-04-23}}
*{{Cite arXiv|last1=Dai|first1=Jifeng|last2=Qi|first2=Haozhi|last3=Xiong|first3=Yuwen|last4=Li|first4=Yi|last5=Zhang|first5=Guodong|last6=Hu|first6=Han|last7=Wei|first7=Yichen|date=2017-03-17|title=Deformable Convolutional Networks|eprint=1703.06211|class=cs.CV}}
*{{Cite arXiv|last1=De Brabandere|first1=Bert|last2=Jia|first2=Xu|last3=Tuytelaars|first3=Tinne|last4=Van Gool|first4=Luc|date=2016-05-31|title=Dynamic Filter Networks|eprint=1605.09673|class=cs.LG}}
* {{Citation|last=Guo|first=Xifeng|title=CapsNet-Keras: A Keras implementation of CapsNet in NIPS2017 paper "Dynamic Routing Between Capsules". Now test error = 0.34%.|date=2017-12-08|url=https://github.com/XifengGuo/CapsNet-Keras|access-date=2017-12-08}}
* {{Cite web|url=https://openreview.net/pdf?id=HJWLfGWRb|title=MATRIX CAPSULES WITH EM ROUTING|last1=Hinton|first1=Geoffrey|last2=Sabour|first2=Sara|last3=Frosst|first3=Nicholas|date=November 2017}}
* {{YouTube|id=x5Vxk9twXlE|title=Hinton and Google Brain - Capsule Networks}}
* {{Citation|last=Liao|first=Huadong|title=CapsNet-Tensorflow: A Tensorflow implementation of CapsNet(Capsules Net) in Hinton's paper Dynamic Routing Between Capsules|date=2017-12-08|url=https://github.com/naturomics/CapsNet-Tensorflow|access-date=2017-12-08}}
*{{Cite web|first=Fangyu|last=Cai|date=2020-12-18|title='We Can Do It' — Geoffrey Hinton and UBC, UT, Google & UVic Team Propose Unsupervised Capsule...|url=https://medium.com/syncedreview/we-can-do-it-geoffrey-hinton-and-ubc-ut-google-uvic-team-propose-unsupervised-capsule-c1f2edb6b1e9|access-date=2021-01-18|website=Medium|language=en}}
* {{cite arXiv|last1=Sun|first1=Weiwei|last2=Tagliasacchi|first2=Andrea|last3=Deng|first3=Boyang|last4=Sabour|first4=Sara|last5=Yazdani|first5=Soroosh|last6=Hinton|first6=Geoffrey|last7=Yi|first7=Kwang Moo|date=2020-12-08|title=Canonical Capsules: Unsupervised Capsules in Canonical Pose|class=cs.CV|eprint=2012.04718}}
 
[[Category:Artificial neural networks]]