Graph neural network

In deep learning, a graph neural network (GNN) is a subarea of neural network, which is designed to process graph structured data or data that is able to be formulated as a graph potentially^[1]^[2] (e.g. social network, polygon mesh, point cloud). Since graph data is non-Euclidean, relations between data points cannot be easily represented by their ordering when recording them, and hence standard CNN is not able to be directly applied to graph data. On the other hand, GNN not only be able to applied to non-Euclidean data but also Euclidean data such as sentences, images or videos since such data can be represented as graph data if organized properly.

Pipeline of a GNN model^[1]

The design pipeline for a GNN model can be generally derived as four steps: find graph structure, specify graph, design loss functions, and build model .

The general design pipeline for a GNN model.

Find graph structure

In graph theory, a graph is denoted as $G=(V,E)$ , where:

$V$ , a set of vertices (also called nodes or points);
$E$ , a set of edges (either directed or undirected, also called links or lines);
$A$ , the given graph's adjacency matrix;
$D$ , the given graph's degree matrix.

If the input data is already in graph structure, then this step is done. Otherwise, you need to observe the data first and reorganize it to be a graph according to your requirement, while not destroying the data's property (so that your model won't face the "garbage in, garbage out" problem).

Specify graph

An example of scene graph.

After a graph structure is found in the given data, the type of this graph should also be specified. A graph can be simply categorize as directed/undirected or homogeneous/heterogeneous. Note that for heterogeneous graphs, each edge may differ to the others by its property. For example, each edge in a scene graph^[3] has different meaning to represent the relation between nodes. Sometimes the data's nodes can be merged to obtain graphs of different resolutions, and hence the graph structure may dynamically changed during the learning process. For example, when regarding point cloud as a graph, it is mostly a dynamic graph^[4]^[5]^[6]^[7].

Design loss function

Base on the task you are dealing with, loss functions have to be chosen wisely. For example, for a supervised node-level classification task, cross-entropy might be a reasonable choice.

Build model

Propagation module: updating information carried by nodes and/or edges by some aggregation methods.
Sampling module: when a graph is too large, sampling modules are needed for computation preservation.
Pooling module: when higher level information (sub-graph-level, graph-level) is needed for the task, pooling modules are needed to aggregate low-level information and provide hierarchical propagation.

Architecture

Convolution based methods

The main idea of this type of methods is to generalize standard CNN or attentional methods to graph structured data, and that is, to define some receptive fields with respect to given nodes and propagate information within. Based on the operation ___domain, we can further divide these methods into two groups, spectral approaches and spatial approaches.

Spectral approaches

For spectral approaches, a graph signal $x$ is first transformed to spectral ___domain by the graph Fourier Transform ${\mathcal {F}}$ , then the convolution operation can be conducted. After that, we can transform the result back to the original ___domain (spatial ___domain) by the inverse graph Fourier Transform ${\mathcal {F^{-1}}}$ . ${\mathcal {F}}$ and ${\mathcal {F^{-1}}}$ are defined as:

${\mathcal {F}}(x)=U^{T}x$
${\mathcal {F^{-1}}}(x)=Ux$ ,

where $U$ is the matrix of eigenvectors of the symmetric normalized graph Laplacian $L^{\text{sym}}$ . $L^{\text{sym}}$ is denoted as:

$L^{\text{sym}}:=D^{-{\frac {1}{2}}}LD^{-{\frac {1}{2}}}=I-D^{-{\frac {1}{2}}}AD^{-{\frac {1}{2}}}=U\Lambda U^{T}$ ,

where $L$ is the original graph Laplacian, $I$ is an identity matrix, and $\Lambda$ is a diagonal matrix.

Therefore, based on the convolution's property^[8], the convolution of the signal $x$ and a learnable kernel function $g$ is defined as:

$g\star x={\mathcal {F^{-1}}}({\mathcal {F}}(g)\odot {\mathcal {F}}(x))=U(U^{T}g\odot U^{T}x)$ ,

and if we set the learnable kernel function to be a diagonal one $g_{\theta }$ , this operation is further simplified to:

$g_{\theta }\star x=Ug_{\theta }U^{T}x$ .

GCN^[9]

Given a input graph with $N$ nodes and the node feature matrix $X$ , GCN simplified the convolution layer to $H^{(l+1)}=\sigma ({\tilde {D}}^{-1/2}{\tilde {A}}{\tilde {D}}^{-1/2}H^{(l)}W^{(l)})$ , where

${\tilde {A}}=A+I$ ;
${\tilde {D}}_{ii}=\textstyle \sum _{j}\displaystyle {\tilde {A}}_{ij}$ ;
$H^{(l)}\in \mathbb {R} ^{N\times D}$ is the input feature map at $l$ -th layer, $H^{(0)}=X$ ;
$H^{(l+1)}\in \mathbb {R} ^{N\times D'}$ is the output feature map at $l$ -th layer;
$W^{(l)}\in \mathbb {R} ^{D\times D'}$ is the trainable weight matrix at $l$ -th layer;
$\sigma (\centerdot )$ is the activation function.

Since both ${\tilde {A}}$ and ${\tilde {D}}$ are able to be pre-calculated, this graph convolution method can be easily accelerated by GPU implementation. Note that this method only suits for undirected graphs with no edge features.

Spatial approaches

Spatial approaches directly design convolution operation on the graph based on the graph topology (hence called spatial), making these methods more flexible compared with spectral approaches. Since the size of neighbors is mostly different for each node within a graph, designing an efficient way to define receptive fields and feature propagation is the prime challenge of such approaches. Unlike spectral approaches that severely affected by the global graph structure, spatial approaches mostly focus on local relations between nodes and edge properties, and the global properties can be found by apply pooling mechanisms between convolution layers properly.

An illustration of k-head-attention-based GNN (GAT here). This example is when k=3.

GAT^[10] and GaAN^[11]

Attentional networks have already gain huge success in multiple deep learning areas, especially sequenced data related works. As nodes features of a graph can be represented as a unordered dat sequence, the graph attentional network (GAT) and the gated attention network (GaAN) make use of the benefit that multi-head attention model can automatically learn the importance of each neighbor with respect to different heads.

GCN for scene graph^[12]^[13]^[14]

Scene graphs have different edge features which indicate the semantic relations between neighboring nodes, and therefore when designing convolution operations on such structure, both node features and edge features are updated. GCN for scene graph usually regard the features of two neighboring nodes and the edge between as a triplet, and update the edge feature by passing this triplet through MLPs. As for node features, the updating method is similar to GCN, instead not only considering neighbor points' feature but also edge features.

GCN for point cloud analysis^[4]^[5]^[6]^[7]^[15]

Point clouds are some point sets lies in 3D space with no edges between each point, so the original format of such data is not a graph. However, we can dynamically construct graph structures from point clouds by connecting points which satisfy given relation (mostly kNN or distance smaller than some thresholds), and the constructed graph can also be dynamically changed when sub-sampling or pooling methods are applied.

KC-Net^[15]

References

^ ^a ^b "Graph neural networks: A review of methods and applications". AI Open. 1: 57–81. 2020-01-01. doi:10.1016/j.aiopen.2021.01.001. ISSN 2666-6510.
^ Zhang, Si; Tong, Hanghang; Xu, Jiejun; Maciejewski, Ross (2019-11-10). "Graph convolutional networks: a comprehensive review". Computational Social Networks. 6 (1): 11. doi:10.1186/s40649-019-0069-y. ISSN 2197-4314.{{cite journal}}: CS1 maint: unflagged free DOI (link)
^ Johnson, Justin; Krishna, Ranjay; Stark, Michael; Li, Li-Jia; Shamma, David A.; Bernstein, Michael S.; Fei-Fei, Li. "Image retrieval using scene graphs". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 3668–3678. doi:10.1109/CVPR.2015.7298990.
^ ^a ^b Wang, Yue; Sun, Yongbin; Liu, Z.; Sarma, Sanjay E.; Bronstein, M.; Solomon, J. (2019). "Dynamic Graph CNN for Learning on Point Clouds". ACM Trans. Graph. doi:10.1145/3326362.
^ ^a ^b Thomas, Hugues; Qi, Charles R.; Deschaud, Jean-Emmanuel; Marcotegui, Beatriz; Goulette, François; Guibas, Leonidas. "KPConv: Flexible and Deformable Convolution for Point Clouds". 2019 IEEE/CVF International Conference on Computer Vision (ICCV): 6410–6419. doi:10.1109/ICCV.2019.00651.
^ ^a ^b Lin, Zhi-Hao; Huang, Sheng-Yu; Wang, Yu-Chiang Frank. "Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis". 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 1797–1806. doi:10.1109/CVPR42600.2020.00187.
^ ^a ^b Lin, Zhi-Hao; Huang, Sheng Yu; Wang, Yu-Chiang Frank (2021). "Learning of 3D Graph Convolution Networks for Point Cloud Analysis". IEEE Transactions on Pattern Analysis and Machine Intelligence: 1–1. doi:10.1109/TPAMI.2021.3059758. ISSN 1939-3539.
^ Mallat, Stephane (1999-09-14). A Wavelet Tour of Signal Processing. Elsevier. ISBN 978-0-08-052083-4.
^ Kipf, Thomas N.; Welling, Max (2017-02-22). "Semi-Supervised Classification with Graph Convolutional Networks". International Conference on Learning Representations (ICLR), 2017.
^ Veličković, Petar; Cucurull, Guillem; Casanova, Arantxa; Romero, Adriana; Liò, Pietro; Bengio, Yoshua (2018-02-04). "Graph Attention Networks". International Conference on Learning Representations (ICLR), 2018.
^ Zhang, Jiani; Shi, Xingjian; Xie, Junyuan; Ma, Hao; King, Irwin; Yeung, Dit-Yan (2018-03-20). "GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs". arXiv:1803.07294 [cs].
^ Johnson, Justin; Gupta, Agrim; Fei-Fei, Li (2018-04-04). "Image Generation from Scene Graphs" (PDF). CVPR (2018).
^ Dhamo, Helisa; Farshad, Azade; Laina, Iro; Navab, Nassir; Hager, Gregory D.; Tombari, Federico; Rupprecht, Christian (2020-04-07). "Semantic Image Manipulation Using Scene Graphs" (PDF). CVPR (2020).
^ Wald, Johanna; Dhamo, Helisa; Navab, Nassir; Tombari, Federico (2020). "Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions". CVPR (2020): 3961–3970.
^ ^a ^b Shen, Yiru; Feng, Chen; Yang, Yaoqing; Tian, Dong (2018). "Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling". CVPR (2018): 4548–4557.

[:0-1] "Graph neural networks: A review of methods and applications". AI Open. 1: 57–81. 2020-01-01. doi:10.1016/j.aiopen.2021.01.001. ISSN 2666-6510.

[2] Zhang, Si; Tong, Hanghang; Xu, Jiejun; Maciejewski, Ross (2019-11-10). "Graph convolutional networks: a comprehensive review". Computational Social Networks. 6 (1): 11. doi:10.1186/s40649-019-0069-y. ISSN 2197-4314.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[3] Johnson, Justin; Krishna, Ranjay; Stark, Michael; Li, Li-Jia; Shamma, David A.; Bernstein, Michael S.; Fei-Fei, Li. "Image retrieval using scene graphs". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 3668–3678. doi:10.1109/CVPR.2015.7298990.

[:1-4] Wang, Yue; Sun, Yongbin; Liu, Z.; Sarma, Sanjay E.; Bronstein, M.; Solomon, J. (2019). "Dynamic Graph CNN for Learning on Point Clouds". ACM Trans. Graph. doi:10.1145/3326362.

[:2-5] Thomas, Hugues; Qi, Charles R.; Deschaud, Jean-Emmanuel; Marcotegui, Beatriz; Goulette, François; Guibas, Leonidas. "KPConv: Flexible and Deformable Convolution for Point Clouds". 2019 IEEE/CVF International Conference on Computer Vision (ICCV): 6410–6419. doi:10.1109/ICCV.2019.00651.

[:3-6] Lin, Zhi-Hao; Huang, Sheng-Yu; Wang, Yu-Chiang Frank. "Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis". 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 1797–1806. doi:10.1109/CVPR42600.2020.00187.

[:4-7] Lin, Zhi-Hao; Huang, Sheng Yu; Wang, Yu-Chiang Frank (2021). "Learning of 3D Graph Convolution Networks for Point Cloud Analysis". IEEE Transactions on Pattern Analysis and Machine Intelligence: 1–1. doi:10.1109/TPAMI.2021.3059758. ISSN 1939-3539.

[8] Mallat, Stephane (1999-09-14). A Wavelet Tour of Signal Processing. Elsevier. ISBN 978-0-08-052083-4.

[9] Kipf, Thomas N.; Welling, Max (2017-02-22). "Semi-Supervised Classification with Graph Convolutional Networks". International Conference on Learning Representations (ICLR), 2017.

[10] Veličković, Petar; Cucurull, Guillem; Casanova, Arantxa; Romero, Adriana; Liò, Pietro; Bengio, Yoshua (2018-02-04). "Graph Attention Networks". International Conference on Learning Representations (ICLR), 2018.

[11] Zhang, Jiani; Shi, Xingjian; Xie, Junyuan; Ma, Hao; King, Irwin; Yeung, Dit-Yan (2018-03-20). "GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs". arXiv:1803.07294 [cs].

[12] Johnson, Justin; Gupta, Agrim; Fei-Fei, Li (2018-04-04). "Image Generation from Scene Graphs" (PDF). CVPR (2018).

[13] Dhamo, Helisa; Farshad, Azade; Laina, Iro; Navab, Nassir; Hager, Gregory D.; Tombari, Federico; Rupprecht, Christian (2020-04-07). "Semantic Image Manipulation Using Scene Graphs" (PDF). CVPR (2020).

[14] Wald, Johanna; Dhamo, Helisa; Navab, Nassir; Tombari, Federico (2020). "Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions". CVPR (2020): 3961–3970.

[:5-15] Shen, Yiru; Feng, Chen; Yang, Yaoqing; Tian, Dong (2018). "Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling". CVPR (2018): 4548–4557.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]