Graph neural network: Difference between revisions

Content deleted Content added
No edit summary
Iking5 (talk | contribs)
Creating the graph neural network page
Line 1:
A graph neural network (GNN) is a class of [[Neural network|neural networks]] for processing data represented by [[Graph (abstract data type)|graph data structures]]<ref>{{Cite journal|last=Scarselli|first=Franco|last2=Gori|first2=Marco|last3=Tsoi|first3=Ah Chung|last4=Hagenbuchner|first4=Markus|last5=Monfardini|first5=Gabriele|date=2009|title=The Graph Neural Network Model|url=https://ieeexplore.ieee.org/abstract/document/4700287|journal=IEEE Transactions on Neural Networks|volume=20|issue=1|pages=61–80|doi=10.1109/TNN.2008.2005605|issn=1941-0093}}</ref>. They were popularized by their use in [[supervised learning]] on properties of various molecules<ref>{{Cite journal|last=Gilmer|first=Justin|last2=Schoenholz|first2=Samuel S.|last3=Riley|first3=Patrick F.|last4=Vinyals|first4=Oriol|last5=Dahl|first5=George E.|date=2017-07-17|title=Neural Message Passing for Quantum Chemistry|url=http://proceedings.mlr.press/v70/gilmer17a.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1263–1272}}</ref>.
{{AfC submission|t||ts=20210628052201|u=Sheng-Yu PJ Huang|ns=118|demo=}}<!-- Important, do not remove this line before article has been created. -->
[[File:Line graph construction (original).png|thumb|Graph with 5 nodes.]]
In [[deep learning]], a '''graph neural network (GNN)''' is a subarea of [[neural network]], which is designed to process [[Graph theory|graph]] structured data or data that is able to be formulated as a graph potentially<ref name=":0">{{Cite journal|date=2020-01-01|title=Graph neural networks: A review of methods and applications|url=https://www.sciencedirect.com/science/article/pii/S2666651021000012|journal=AI Open|language=en|volume=1|pages=57–81|doi=10.1016/j.aiopen.2021.01.001|issn=2666-6510}}</ref><ref>{{Cite journal|last=Zhang|first=Si|last2=Tong|first2=Hanghang|last3=Xu|first3=Jiejun|last4=Maciejewski|first4=Ross|date=2019-11-10|title=Graph convolutional networks: a comprehensive review|url=https://doi.org/10.1186/s40649-019-0069-y|journal=Computational Social Networks|volume=6|issue=1|pages=11|doi=10.1186/s40649-019-0069-y|issn=2197-4314}}</ref> (e.g. [[social network]], [[polygon mesh]], [[point cloud]]). Since graph data is [[Non-Euclidean geometry|non-Euclidean]], relations between data points cannot be easily represented by their ordering when recording them, and hence [[Convolutional neural network|standard CNN]] is not able to be directly applied to graph data. On the other hand, GNN not only be able to applied to non-Euclidean data but also Euclidean data such as sentences, images or videos since such data can be represented as graph data if organized properly.
 
Since their inception, several variants of the simple message passing neural network (MPNN) framework have been proposed<ref>{{Cite journal|last=Kipf|first=Thomas N|last2=Welling|first2=Max|date=2016|title=Semi-supervised classification with graph convolutional networks|url=https://ieeexplore.ieee.org/abstract/document/4700287|journal=International Conference on Learning Representations|volume=5|via=arXiv}}</ref><ref>{{Cite journal|last=Defferrard|first=Michaël|last2=Bresson|first2=Xavier|last3=Vandergheynst|first3=Pierre|date=2017-02-05|title=Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering|url=http://arxiv.org/abs/1606.09375|journal=Neural Information Processing Systems|volume=30}}</ref><ref>{{Cite journal|last=Hamilton|first=William|last2=Ying|first2=Rex|last3=Leskovec|first3=Jure|date=2017|title=Inductive Representation Learning on Large Graphs|url=https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf|journal=Neural Information Processing Systems|volume=31|via=Stanford}}</ref><ref>{{Cite journal|last=Veličković|first=Petar|last2=Cucurull|first2=Guillem|last3=Casanova|first3=Arantxa|last4=Romero|first4=Adriana|last5=Liò|first5=Pietro|last6=Bengio|first6=Yoshua|date=2018-02-04|title=Graph Attention Networks|url=http://arxiv.org/abs/1710.10903|journal=International Conference on Learning Representations|volume=6}}</ref>. These models optimize GNNs for use on larger graphs and apply them to domains such as [[Social network|social networks]], [[Citation graph|citation networks]], and online communities<ref>{{Cite web|title=Stanford Large Network Dataset Collection|url=https://snap.stanford.edu/data/|access-date=2021-07-05|website=snap.stanford.edu}}</ref>.
== Pipeline of a GNN model<ref name=":0" /> ==
The design pipeline for a GNN model can be generally derived as four steps: find graph structure, specify graph, design loss functions, and build model .
[[File:Pipe-line of GNN.png|thumb|558x558px|The general design pipeline for a GNN model.]]
 
It has been mathematically proven that GNNs are a weak form of the Weisfeiler-Lehman graph isomorphism test<ref>{{Cite journal|last=Douglas|first=B. L.|date=2011-01-27|title=The Weisfeiler-Lehman Method and Graph Isomorphism Testing|url=http://arxiv.org/abs/1101.5211|journal=arXiv:1101.5211 [math]}}</ref>, so any GNN model is at least as powerful as this test<ref>{{Cite journal|last=Xu|first=Keyulu|last2=Hu|first2=Weihua|last3=Leskovec|first3=Jure|last4=Jegelka|first4=Stefanie|date=2019-02-22|title=How Powerful are Graph Neural Networks?|url=http://arxiv.org/abs/1810.00826|journal=International Conference on Learning Representations|volume=7}}</ref>. There is now growing interest in uniting GNNs with other so-called "geometric deep learning models"<ref>{{Cite journal|last=Bronstein|first=Michael M.|last2=Bruna|first2=Joan|last3=LeCun|first3=Yann|last4=Szlam|first4=Arthur|last5=Vandergheynst|first5=Pierre|date=2017|title=Geometric Deep Learning: Going beyond Euclidean data|url=https://ieeexplore.ieee.org/document/7974879/|journal=IEEE Signal Processing Magazine|volume=34|issue=4|pages=18–42|doi=10.1109/MSP.2017.2693418|issn=1053-5888}}</ref> to better understand how and why these models work.
=== Find graph structure ===
In [[graph theory]], a graph is denoted as <math>G=(V,E)</math>, where:
 
<references />
* <math>V</math>, a [[Set (mathematics)|set]] of ''vertices'' (also called ''nodes'' or ''points'');
* <math>E</math>, a [[Set (mathematics)|set]] of ''edges'' (either directed or undirected, also called ''links'' or ''lines'');
*<math>A</math>, the given graph's [[adjacency matrix]];
*<math>D</math>, the given graph's [[degree matrix]].
 
[[Category:Artificial intelligence]]
If the input data is already in graph structure, then this step is done. Otherwise, you need to observe the data first and reorganize it to be a graph according to your requirement, while not destroying the data's property (so that your model won't face the [[Garbage in, garbage out|"garbage in, garbage out"]] problem).
[[Category:Artificial neural networks]]
 
[[Category:Graph algorithms]]
=== Specify graph ===
[[File:Scene graph example.png|thumb|529x529px|An example of scene graph.]]
After a graph structure is found in the given data, the type of this graph should also be specified. A graph can be simply categorize as [[Directed graph|directed]]/[[Undirected graph|undirected]] or [[Homogeneous graph|homogeneous]]/[[Heterogeneous graph|heterogeneous]]. Note that for heterogeneous graphs, each edge may differ to the others by its property. For example, each edge in a [[scene graph]]<ref>{{Cite journal|last=Johnson|first=Justin|last2=Krishna|first2=Ranjay|last3=Stark|first3=Michael|last4=Li|first4=Li-Jia|last5=Shamma|first5=David A.|last6=Bernstein|first6=Michael S.|last7=Fei-Fei|first7=Li|title=Image retrieval using scene graphs|url=https://ieeexplore.ieee.org/document/7298990|journal=2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)|pages=3668–3678|doi=10.1109/CVPR.2015.7298990}}</ref> has different meaning to represent the relation between nodes. Sometimes the data's nodes can be merged to obtain graphs of different resolutions, and hence the graph structure may dynamically changed during the learning process. For example, when regarding [[point cloud]] as a graph, it is mostly a dynamic graph<ref name=":1">{{Cite journal|last=Wang|first=Yue|last2=Sun|first2=Yongbin|last3=Liu|first3=Z.|last4=Sarma|first4=Sanjay E.|last5=Bronstein|first5=M.|last6=Solomon|first6=J.|date=2019|title=Dynamic Graph CNN for Learning on Point Clouds|url=https://www.semanticscholar.org/paper/Dynamic-Graph-CNN-for-Learning-on-Point-Clouds-Wang-Sun/e1799aaf23c12af6932dc0ef3dfb1638f01413d1|journal=ACM Trans. Graph.|doi=10.1145/3326362}}</ref><ref name=":2">{{Cite journal|last=Thomas|first=Hugues|last2=Qi|first2=Charles R.|last3=Deschaud|first3=Jean-Emmanuel|last4=Marcotegui|first4=Beatriz|last5=Goulette|first5=François|last6=Guibas|first6=Leonidas|title=KPConv: Flexible and Deformable Convolution for Point Clouds|url=https://ieeexplore.ieee.org/document/9010002|journal=2019 IEEE/CVF International Conference on Computer Vision (ICCV)|pages=6410–6419|doi=10.1109/ICCV.2019.00651}}</ref><ref name=":3">{{Cite journal|last=Lin|first=Zhi-Hao|last2=Huang|first2=Sheng-Yu|last3=Wang|first3=Yu-Chiang Frank|title=Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis|url=https://ieeexplore.ieee.org/document/9156514|journal=2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)|pages=1797–1806|doi=10.1109/CVPR42600.2020.00187}}</ref><ref name=":4">{{Cite journal|last=Lin|first=Zhi-Hao|last2=Huang|first2=Sheng Yu|last3=Wang|first3=Yu-Chiang Frank|date=2021|title=Learning of 3D Graph Convolution Networks for Point Cloud Analysis|url=https://ieeexplore.ieee.org/abstract/document/9355025|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|pages=1–1|doi=10.1109/TPAMI.2021.3059758|issn=1939-3539}}</ref>.
 
=== Design loss function ===
Base on the task you are dealing with, [[Loss function|loss functions]] have to be chosen wisely. For example, for a [[Supervised learning|supervised]] node-level classification task, [[Cross entropy|cross-entropy]] might be a reasonable choice.
 
=== Build model ===
 
* '''Propagation module''': updating information carried by nodes and/or edges by some aggregation methods.
* '''Sampling module''': when a graph is too large, sampling modules are needed for computation preservation.
* '''Pooling module''': when higher level information (sub-graph-level, graph-level) is needed for the task, pooling modules are needed to aggregate low-level information and provide hierarchical propagation.
 
== Architecture ==
 
=== Convolution based methods ===
The main idea of this type of methods is to generalize [[Convolutional neural network|standard CNN]] or [[Attention (machine learning)|attentional methods]] to graph structured data, and that is, to define some [[receptive fields]] with respect to given nodes and propagate information within. Based on the operation ___domain, we can further divide these methods into two groups, '''spectral approaches''' and '''spatial approaches'''.
 
==== Spectral approaches ====
For spectral approaches, a graph signal <math>x</math> is first transformed to spectral ___domain by the [[graph Fourier Transform]] <math>\mathcal{F}</math>, then the convolution operation can be conducted. After that, we can transform the result back to the original ___domain (spatial ___domain) by the [[Graph Fourier Transform|inverse graph Fourier Transform]] <math>\mathcal{F^{-1}}</math>. <math>\mathcal{F}</math> and <math>\mathcal{F^{-1}}</math> are defined as:
 
* <math>\mathcal{F}(x) = U^{T}x</math>
* <math>\mathcal{F^{-1}}(x) = Ux</math>,
 
where <math>U</math> is the matrix of eigenvectors of the [[Graph Laplacian|symmetric normalized graph Laplacian]] <math>L^\text{sym}</math>. <math>L^\text{sym}</math> is denoted as:
 
<math>L^\text{sym} := D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}} = U\Lambda U^{T}</math> ,
 
where <math>L</math> is the [[Graph Laplacian|original graph Laplacian]], <math>I</math> is an [[identity matrix]], and <math>\Lambda</math> is a diagonal matrix.
 
Therefore, based on the [[Graph Fourier Transform|convolution's property]]<ref>{{Cite book|last=Mallat|first=Stephane|url=https://books.google.com.tw/books?hl=zh-TW&lr=&id=hbVOfWQNtB8C&oi=fnd&pg=PP1&ots=qunXivn446&sig=-CBvvP9HIfpAFLnQU_d5R0RKLpg&redir_esc=y#v=onepage&q&f=false|title=A Wavelet Tour of Signal Processing|date=1999-09-14|publisher=Elsevier|isbn=978-0-08-052083-4|language=en}}</ref>, the convolution of the signal <math>x</math> and a [[Kernel|learnable kernel function]] <math>g</math> is defined as:
 
<math>g\star x = \mathcal{F^{-1}}(\mathcal{F}(g)\odot \mathcal{F}(x)) = U(U^Tg\odot U^Tx)</math>,
 
and if we set the learnable kernel function to be a diagonal one <math>g_{\theta}</math>, this operation is further simplified to:
 
<math>g_{\theta}\star x = Ug_{\theta}U^Tx</math>.
 
===== GCN<ref>{{Cite journal|last=Kipf|first=Thomas N.|last2=Welling|first2=Max|date=2017-02-22|title=Semi-Supervised Classification with Graph Convolutional Networks|url=http://arxiv.org/abs/1609.02907|journal=International Conference on Learning Representations (ICLR), 2017}}</ref> =====
Given a input graph with <math>N</math> nodes and the node feature matrix <math>X</math>, GCN simplified the convolution layer to <math>H^{(l+1)} = \sigma (\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)})</math>, where
 
* <math>\tilde{A} = A + I</math>;
* <math>\tilde{D}_{ii} = \textstyle \sum_{j} \displaystyle\tilde{A}_{ij} </math>;
* <math>H^{(l)}\in\mathbb{R}^{N\times D}</math> is the input [[Feature (machine learning)|feature map]] at <math>l</math>-th layer, <math>H^{(0)} = X</math>;
* <math>H^{(l+1)}\in\mathbb{R}^{N\times D'}</math> is the output [[Feature (machine learning)|feature map]] at <math>l</math>-th layer;
* <math>W^{(l)}\in\mathbb{R}^{D\times D'}</math>is the trainable weight matrix at <math>l</math>-th layer;
* <math>\sigma (\centerdot) </math> is the [[activation function]].
 
Since both <math>\tilde{A}</math> and <math>\tilde{D}</math> are able to be pre-calculated, this graph convolution method can be easily accelerated by GPU implementation. '''Note that this method only suits for undirected graphs with no edge features.'''
 
==== Spatial approaches ====
Spatial approaches directly design convolution operation on the graph based on the graph topology (hence called '''spatial'''), making these methods more flexible compared with spectral approaches. Since the size of neighbors is mostly different for each node within a graph, designing an efficient way to define receptive fields and feature propagation is the prime challenge of such approaches. Unlike spectral approaches that severely affected by the global graph structure, spatial approaches mostly focus on local relations between nodes and edge properties, and the global properties can be found by apply pooling mechanisms between convolution layers properly.[[File:Graph attention network.png|thumb|310x310px|An illustration of k-head-attention-based GNN (GAT here). This example is when k=3.]]
===== GAT<ref>{{Cite journal|last=Veličković|first=Petar|last2=Cucurull|first2=Guillem|last3=Casanova|first3=Arantxa|last4=Romero|first4=Adriana|last5=Liò|first5=Pietro|last6=Bengio|first6=Yoshua|date=2018-02-04|title=Graph Attention Networks|url=http://arxiv.org/abs/1710.10903|journal=International Conference on Learning Representations (ICLR), 2018}}</ref> and GaAN<ref>{{Cite journal|last=Zhang|first=Jiani|last2=Shi|first2=Xingjian|last3=Xie|first3=Junyuan|last4=Ma|first4=Hao|last5=King|first5=Irwin|last6=Yeung|first6=Dit-Yan|date=2018-03-20|title=GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs|url=http://arxiv.org/abs/1803.07294|journal=arXiv:1803.07294 [cs]}}</ref> =====
[[Attention (machine learning)|Attentional networks]] have already gain huge success in multiple deep learning areas, especially sequenced data related works. As nodes features of a graph can be represented as a unordered dat sequence, the graph attentional network (GAT) and the gated attention network (GaAN) make use of the benefit that multi-head attention model can automatically learn the importance of each neighbor with respect to different heads.
 
===== GCN for scene graph<ref>{{Cite journal|last=Johnson|first=Justin|last2=Gupta|first2=Agrim|last3=Fei-Fei|first3=Li|date=2018-04-04|title=Image Generation from Scene Graphs|url=https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/0764.pdf|journal=CVPR (2018)}}</ref><ref>{{Cite journal|last=Dhamo|first=Helisa|last2=Farshad|first2=Azade|last3=Laina|first3=Iro|last4=Navab|first4=Nassir|last5=Hager|first5=Gregory D.|last6=Tombari|first6=Federico|last7=Rupprecht|first7=Christian|date=2020-04-07|title=Semantic Image Manipulation Using Scene Graphs|url=https://openaccess.thecvf.com/content_CVPR_2020/papers/Dhamo_Semantic_Image_Manipulation_Using_Scene_Graphs_CVPR_2020_paper.pdf|journal=CVPR (2020)}}</ref><ref>{{Cite journal|last=Wald|first=Johanna|last2=Dhamo|first2=Helisa|last3=Navab|first3=Nassir|last4=Tombari|first4=Federico|date=2020|title=Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions|url=https://openaccess.thecvf.com/content_CVPR_2020/html/Wald_Learning_3D_Semantic_Scene_Graphs_From_3D_Indoor_Reconstructions_CVPR_2020_paper.html|journal=CVPR (2020)|pages=3961–3970}}</ref> =====
[[Scene graph|Scene graphs]] have different edge features which indicate the semantic relations between neighboring nodes, and therefore when designing convolution operations on such structure, both node features and edge features are updated. GCN for scene graph usually regard the features of two neighboring nodes and the edge between as a triplet, and update the edge feature by passing this triplet through [[Multilayer perceptron|MLPs]]. As for node features, the updating method is similar to GCN, instead not only considering neighbor points' feature but also edge features.
 
===== GCN for point cloud analysis<ref name=":1" /><ref name=":2" /><ref name=":3" /><ref name=":4" /><ref name=":5">{{Cite journal|last=Shen|first=Yiru|last2=Feng|first2=Chen|last3=Yang|first3=Yaoqing|last4=Tian|first4=Dong|date=2018|title=Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling|url=https://openaccess.thecvf.com/content_cvpr_2018/html/Shen_Mining_Point_Cloud_CVPR_2018_paper.html|journal=CVPR (2018)|pages=4548–4557}}</ref> =====
[[Point cloud|Point clouds]] are some point sets lies in 3D space with no edges between each point, so the original format of such data is not a graph. However, we can dynamically construct graph structures from point clouds by connecting points which satisfy given relation (mostly [[K-nearest neighbors algorithm|kNN]] or distance smaller than some thresholds), and the constructed graph can also be dynamically changed when sub-sampling or pooling methods are applied.
 
====== KC-Net<ref name=":5" /> ======
The graph in KC-Net is constructed by kNN. They design a 3D kernel which composed of several learnable 3D points, and the convolution of a given point as center is operated by calculate the similarity between each pair of "'''one kernel point and relative position of one neighboring point of the given center point'''". KC-Net also provides a graph max-pooling module to better capture higher level features of point clouds.
 
====== DGCNN<ref name=":1" /> ======
Convolution method used is DGCNN is using two sets of learnable weights to aggregate feature of a given center point and feature difference between this point and each of its neighbors separately. Although not using pooling module to obtain multi-scale feature, DGCNN dynamically re-define the graph by changing the neighbor number and considering distances in feature space instead of the original 3D space. This idea helps DGCNN better captures semantic information.
 
====== KP-Conv<ref name=":2" /> ======
The neighbor points of a given center point of KP-Conv is chosen by a given radius, all the points inside would regard as the center point's neighbor. While the key idea is similar to KC-Net, kernel points of KP-Conv can be either deformable or stable based on the current task. To explore geometry in different scale, KP-Conv provides pooling module by regional sampling.
 
====== 3D-GCN<ref name=":3" /><ref name=":4" /> ======
[[File:Illustration of 3D-GCN's receptive field and kernel.png|thumb|390x390px|Illustration of 3D-GCN's receptive field and kernel]]
Graphs are constructed by kNN. 3D-GCN designs deformable 3D kernels as each kernel has one center kernel point <math>k_C\in \mathbb{R}^3</math> and several support points <math>k_1, k_2, ... k_S\in \mathbb{R}^3</math>. Given a data point <math>p_n\in \mathbb{R}^3</math>and its neighbor points <math>p_1, p_2, ... p_M \in \mathbb{R}^3</math>, the convolution is operated by taking the direction vector of the center point to each neighbor <math>p_m - p_n</math> and the direction vector of the center kernel point to each support <math>k_s - k_C = k_s</math>(since <math>k_C</math> is set to be <math>(0,0,0)</math>), calculate their [[cosine similarity]], and then map this similarity to feature space by another learnable parameters <math>\mathcal{w}</math>. Since the convolution is calculated by cosine similarity instead of exact coordinate, 3D-GCN better captures an 3D object's geometry instead of ___location, and is totally '''shift and scale invariant.''' Similar to KC-Net, 3D-GCN also design a graph max-pooling to explore multi-resolution information, while preserving the largest activation.
 
=== Recurrent based methods<ref>{{Cite journal|last=Li|first=Yujia|last2=Tarlow|first2=Daniel|last3=Brockschmidt|first3=Marc|last4=Zemel|first4=Richard|date=2017-09-22|title=Gated Graph Sequence Neural Networks|url=http://arxiv.org/abs/1511.05493|journal=arXiv:1511.05493 [cs, stat]}}</ref><ref>{{Cite journal|last=Tai|first=Kai Sheng|last2=Socher|first2=Richard|last3=Manning|first3=Christopher D.|title=Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks|url=https://aclanthology.org/P15-1150|journal=|language=en-us|pages=1556–1566|doi=10.3115/v1/P15-1150}}</ref><ref>{{Cite journal|last=Zayats|first=Victoria|last2=Ostendorf|first2=Mari|date=2018|title=Conversation Modeling on Reddit Using a Graph-Structured LSTM|url=https://aclanthology.org/Q18-1009|journal=Transactions of the Association for Computational Linguistics|language=en-us|volume=6|pages=121–132|doi=10.1162/tacl_a_00009}}</ref> ===
Different from convolution based methods which learns different weight at each layer, recurrent based methods tend to use same shared module and recursively update node or edge information, borrowing some [[Recurrent neural network|RNN]] approaches such as GRU, LSTM, etc.
 
== References ==
<!-- Inline citations added to your article will automatically display here. See en.wikipedia.org/wiki/WP:REFB for instructions on how to add citations. -->
{{reflist}}
 
{{AfC submission|||ts=20210702083833|u=Sheng-Yu PJ Huang|ns=118}}