Graph neural network: Difference between revisions

=== Specify graph ===
[[File:Scene graph example.png|thumb|529x529px|An example of scene graph.]]
After a graph structure is found in the given data, the type of the graph should also be specified. A graph can be simply categorized as [[Directed graph|directed]]/[[Undirected graph|undirected]] or [[Homogeneous graph|homogeneous]]/[[Heterogeneous graph|heterogeneous]]. Note that in a heterogeneous graph, edges may differ from one another in their properties. For example, each edge in a [[scene graph]]<ref>{{Cite journal|last=Johnson|first=Justin|last2=Krishna|first2=Ranjay|last3=Stark|first3=Michael|last4=Li|first4=Li-Jia|last5=Shamma|first5=David A.|last6=Bernstein|first6=Michael S.|last7=Fei-Fei|first7=Li|title=Image retrieval using scene graphs|url=https://ieeexplore.ieee.org/document/7298990|journal=2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)|pages=3668–3678|doi=10.1109/CVPR.2015.7298990}}</ref> carries a different meaning, representing the relation between its two nodes. Sometimes the data's nodes can be merged to obtain graphs of different resolutions, so the graph structure may change dynamically during the learning process. For example, when a [[point cloud]] is regarded as a graph, it is usually a dynamic graph<ref name=":1">{{Cite journal|last=Wang|first=Yue|last2=Sun|first2=Yongbin|last3=Liu|first3=Z.|last4=Sarma|first4=Sanjay E.|last5=Bronstein|first5=M.|last6=Solomon|first6=J.|date=2019|title=Dynamic Graph CNN for Learning on Point Clouds|url=https://www.semanticscholar.org/paper/Dynamic-Graph-CNN-for-Learning-on-Point-Clouds-Wang-Sun/e1799aaf23c12af6932dc0ef3dfb1638f01413d1|journal=ACM Trans. Graph.|doi=10.1145/3326362}}</ref><ref name=":2">{{Cite journal|last=Thomas|first=Hugues|last2=Qi|first2=Charles R.|last3=Deschaud|first3=Jean-Emmanuel|last4=Marcotegui|first4=Beatriz|last5=Goulette|first5=François|last6=Guibas|first6=Leonidas|title=KPConv: Flexible and Deformable Convolution for Point Clouds|url=https://ieeexplore.ieee.org/document/9010002|journal=2019 IEEE/CVF International Conference on Computer Vision (ICCV)|pages=6410–6419|doi=10.1109/ICCV.2019.00651}}</ref><ref name=":3">{{Cite journal|last=Lin|first=Zhi-Hao|last2=Huang|first2=Sheng-Yu|last3=Wang|first3=Yu-Chiang Frank|title=Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis|url=https://ieeexplore.ieee.org/document/9156514|journal=2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)|pages=1797–1806|doi=10.1109/CVPR42600.2020.00187}}</ref><ref name=":4">{{Cite journal|last=Lin|first=Zhi-Hao|last2=Huang|first2=Sheng Yu|last3=Wang|first3=Yu-Chiang Frank|date=2021|title=Learning of 3D Graph Convolution Networks for Point Cloud Analysis|url=https://ieeexplore.ieee.org/abstract/document/9355025|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|pages=1–1|doi=10.1109/TPAMI.2021.3059758|issn=1939-3539}}</ref>.
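The directed/undirected and homogeneous/heterogeneous distinction can be made concrete with a minimal NumPy sketch; the adjacency-matrix representation and the example relation labels here are illustrative, not taken from the cited papers:

```python
import numpy as np

# A small directed graph with 4 nodes; A[i, j] = 1 means an edge i -> j.
A = np.zeros((4, 4), dtype=int)
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
for i, j in edges:
    A[i, j] = 1

# An undirected view is obtained by symmetrizing the adjacency matrix.
A_undirected = ((A + A.T) > 0).astype(int)

# In a heterogeneous graph each edge additionally carries a type label,
# e.g. a semantic relation in a scene graph (labels here are made up).
edge_types = {(0, 1): "on", (1, 2): "left_of", (2, 0): "behind", (2, 3): "on"}

assert A[0, 1] == 1 and A[1, 0] == 0   # directed: edge exists one way only
assert A_undirected[1, 0] == 1         # symmetrized view has both directions
```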
 
=== Design loss function ===
 
==== Spatial approaches ====
Spatial approaches directly design the convolution operation on the graph based on the graph topology (hence '''spatial'''), making these methods more flexible than spectral approaches. Since the number of neighbors generally differs between the nodes of a graph, designing an efficient way to define receptive fields and feature propagation is the prime challenge of such approaches. Unlike spectral approaches, which are severely affected by the global graph structure, spatial approaches mostly focus on local relations between nodes and on edge properties; global properties can be captured by properly applying pooling mechanisms between convolution layers.
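The local aggregation that spatial approaches build on can be sketched in a few lines of NumPy: each node averages its neighbors' features and applies a shared linear map, with no reference to any global spectral decomposition. The function name and mean aggregation are illustrative assumptions, not a specific published layer:

```python
import numpy as np

def spatial_conv(X, A, W):
    """One spatial graph-convolution step: each node averages its
    neighbors' features (a purely local operation defined by the graph
    topology) and applies a shared learnable linear map W."""
    deg = A.sum(axis=1, keepdims=True)   # number of neighbors per node
    deg[deg == 0] = 1                    # avoid division by zero for isolated nodes
    H = (A @ X) / deg                    # mean over each node's neighborhood
    return np.maximum(H @ W, 0)          # ReLU nonlinearity

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # 3-node undirected graph
X = rng.normal(size=(3, 4))              # node features
W = rng.normal(size=(4, 2))              # learnable weights
out = spatial_conv(X, A, W)
assert out.shape == (3, 2)
```

Note that the aggregation handles any neighborhood size, which is exactly the flexibility the paragraph above attributes to spatial methods.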
 
===== GAT<ref>{{Cite journal|last=Veličković|first=Petar|last2=Cucurull|first2=Guillem|last3=Casanova|first3=Arantxa|last4=Romero|first4=Adriana|last5=Liò|first5=Pietro|last6=Bengio|first6=Yoshua|date=2018-02-04|title=Graph Attention Networks|url=http://arxiv.org/abs/1710.10903|journal=International Conference on Learning Representations (ICLR), 2018}}</ref> and GaAN<ref>{{Cite journal|last=Zhang|first=Jiani|last2=Shi|first2=Xingjian|last3=Xie|first3=Junyuan|last4=Ma|first4=Hao|last5=King|first5=Irwin|last6=Yeung|first6=Dit-Yan|date=2018-03-20|title=GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs|url=http://arxiv.org/abs/1803.07294|journal=arXiv:1803.07294 [cs]}}</ref> =====
[[File:Graph attention network.png|thumb|310x310px|An illustration of k-head-attention-based GNN (GAT here). This example is when k=3.]]
[[Attention (machine learning)|Attentional networks]] have already gained great success in multiple deep learning areas, especially in work on sequential data. As the node features of a graph can be represented as an unordered data sequence, the graph attention network (GAT) and the gated attention network (GaAN) exploit the fact that a multi-head attention model can automatically learn the importance of each neighbor with respect to the different heads.
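A single attention head of a GAT-style layer can be sketched as follows: attention logits are computed from concatenated transformed features of each node pair, masked to the neighborhood, and softmax-normalized. This is a minimal NumPy sketch of the mechanism (a full GAT runs k such heads and concatenates or averages them); the variable names and the small test graph are illustrative:

```python
import numpy as np

def gat_layer(X, A, W, a, alpha=0.2):
    """One attention head of a GAT-style layer (sketch).
    Logit e_ij = LeakyReLU(a^T [W h_i || W h_j]); attention weights are
    softmax-normalized over each node's neighborhood only."""
    H = X @ W                                        # transformed features (N, F')
    N = H.shape[0]
    e = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            z = np.concatenate([H[i], H[j]]) @ a
            e[i, j] = z if z > 0 else alpha * z      # LeakyReLU
    e = np.where(A > 0, e, -np.inf)                  # mask non-neighbors
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)       # softmax per neighborhood
    return att @ H                                   # attention-weighted sum

rng = np.random.default_rng(0)
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)   # graph with self-loops
X = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))                # attention vector over [h_i || h_j]
out = gat_layer(X, A, W, a)
assert out.shape == (3, 2)
```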
 
===== GCN for scene graph<ref>{{Cite journal|last=Johnson|first=Justin|last2=Gupta|first2=Agrim|last3=Fei-Fei|first3=Li|date=2018-04-04|title=Image Generation from Scene Graphs|url=https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/0764.pdf|journal=CVPR (2018)}}</ref><ref>{{Cite journal|last=Dhamo|first=Helisa|last2=Farshad|first2=Azade|last3=Laina|first3=Iro|last4=Navab|first4=Nassir|last5=Hager|first5=Gregory D.|last6=Tombari|first6=Federico|last7=Rupprecht|first7=Christian|date=2020-04-07|title=Semantic Image Manipulation Using Scene Graphs|url=https://openaccess.thecvf.com/content_CVPR_2020/papers/Dhamo_Semantic_Image_Manipulation_Using_Scene_Graphs_CVPR_2020_paper.pdf|journal=CVPR (2020)}}</ref><ref>{{Cite journal|last=Wald|first=Johanna|last2=Dhamo|first2=Helisa|last3=Navab|first3=Nassir|last4=Tombari|first4=Federico|date=2020|title=Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions|url=https://openaccess.thecvf.com/content_CVPR_2020/html/Wald_Learning_3D_Semantic_Scene_Graphs_From_3D_Indoor_Reconstructions_CVPR_2020_paper.html|journal=CVPR (2020)|pages=3961–3970}}</ref> =====
[[Scene graph|Scene graphs]] have distinct edge features that indicate the semantic relations between neighboring nodes; therefore, when designing convolution operations on such a structure, both node features and edge features are updated. GCNs for scene graphs usually regard the features of two neighboring nodes and the edge between them as a triplet, and update the edge feature by passing this triplet through [[Multilayer perceptron|MLPs]]. Node features are updated similarly to a standard GCN, except that not only the neighboring nodes' features but also the edge features are considered.
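The triplet-based update described above can be sketched as follows; the single-linear-layer "MLPs", the mean aggregation, and all names and shapes here are simplifying assumptions for illustration, not the exact architecture of the cited papers:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

d = 4                                   # feature dimension (illustrative)
rng = np.random.default_rng(1)
W_edge = rng.normal(size=(3 * d, d))    # "MLP" over a (subject, edge, object) triplet
W_node = rng.normal(size=(2 * d, d))    # "MLP" over (neighbor, edge) pairs

def update_edge(h_s, h_e, h_o):
    """Update an edge feature from its (subject, edge, object) triplet."""
    return relu(np.concatenate([h_s, h_e, h_o]) @ W_edge)

def update_node(neighbor_edge_pairs):
    """Update a node from its neighbors' features together with the
    connecting edge features (GCN-like aggregation, but edge-aware)."""
    msgs = [relu(np.concatenate([h_j, h_e]) @ W_node)
            for h_j, h_e in neighbor_edge_pairs]
    return np.mean(msgs, axis=0)

h_s, h_e, h_o = rng.normal(size=(3, d))
new_edge = update_edge(h_s, h_e, h_o)
new_node = update_node([(h_o, new_edge)])
assert new_edge.shape == (d,) and new_node.shape == (d,)
```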
 
===== GCN for point cloud analysis<ref name=":1" /><ref name=":2" /><ref name=":3" /><ref name=":4" /><ref name=":5">{{Cite journal|last=Shen|first=Yiru|last2=Feng|first2=Chen|last3=Yang|first3=Yaoqing|last4=Tian|first4=Dong|date=2018|title=Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling|url=https://openaccess.thecvf.com/content_cvpr_2018/html/Shen_Mining_Point_Cloud_CVPR_2018_paper.html|journal=CVPR (2018)|pages=4548–4557}}</ref> =====
[[Point cloud|Point clouds]] are sets of points lying in 3D space with no edges between them, so the original format of such data is not a graph. However, a graph structure can be dynamically constructed from a point cloud by connecting points that satisfy a given relation (usually [[K-nearest neighbors algorithm|kNN]], or a distance smaller than some threshold), and the constructed graph can change dynamically when sub-sampling or pooling methods are applied.
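The kNN-based graph construction can be sketched in NumPy as follows; this is a minimal brute-force version (real pipelines use spatial data structures such as k-d trees), and the function name is illustrative:

```python
import numpy as np

def knn_graph(points, k):
    """Connect each point to its k nearest neighbors (excluding itself),
    yielding a directed adjacency matrix for the point cloud."""
    # pairwise squared Euclidean distances, shape (N, N)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # forbid self-edges
    idx = np.argsort(d2, axis=1)[:, :k]     # indices of the k closest points
    N = points.shape[0]
    A = np.zeros((N, N), dtype=int)
    A[np.arange(N)[:, None], idx] = 1
    return A

pts = np.random.default_rng(2).normal(size=(6, 3))   # a toy 3D point cloud
A = knn_graph(pts, k=2)
assert A.sum(axis=1).tolist() == [2] * 6   # each point gets exactly k=2 neighbors
# Re-running knn_graph on a sub-sampled or pooled point set yields a new
# adjacency matrix, which is what makes the graph "dynamic".
```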
 
====== KC-Net<ref name=":5" /> ======
 
== References ==