Graph neural network: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 08:00, 2 July 2021 edit Sheng-Yu PJ Huang (talk \| contribs) 20 edits mNo edit summary Tag: Visual edit ← Previous edit		Latest revision as of 06:58, 26 August 2025 edit undo Citation bot (talk \| contribs) Bots 5,866,970 edits Add: article-number, bibcode. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 69/990
(206 intermediate revisions by 85 users not shown)
Line 1: {{Short description\|Class of artificial neural networks}} ~~{{AfC submission\|t\|\|ts=20210628052201\|u=Sheng-Yu PJ Huang\|ns=118\|demo=}}<!-- Important, do not remove this line before article has been created. -->~~ {{Machine learning\|Artificial neural network}} ~~[[File:Line graph construction (original).png\|thumb\|Graph with 5 nodes.]]~~ {{Use dmy dates\|date=July 2025}} In [[deep learning]], a '''graph neural network (GNN)''' is a subarea of [[neural network]], which is designed to process [[Graph theory\|graph]] structured data or data that is able to be formulated as a graph potentially<ref name=":0">{{Cite journal\|date=2020-01-01\|title=Graph neural networks: A review of methods and applications\|url=https://www.sciencedirect.com/science/article/pii/S2666651021000012\|journal=AI Open\|language=en\|volume=1\|pages=57–81\|doi=10.1016/j.aiopen.2021.01.001\|issn=2666-6510}}</ref><ref>{{Cite journal\|last=Zhang\|first=Si\|last2=Tong\|first2=Hanghang\|last3=Xu\|first3=Jiejun\|last4=Maciejewski\|first4=Ross\|date=2019-11-10\|title=Graph convolutional networks: a comprehensive review\|url=https://doi.org/10.1186/s40649-019-0069-y\|journal=Computational Social Networks\|volume=6\|issue=1\|pages=11\|doi=10.1186/s40649-019-0069-y\|issn=2197-4314}}</ref> (e.g. [[social network]], [[polygon mesh]], [[point cloud]]). Since graph data is [[Non-Euclidean geometry\|non-Euclidean]], relations between data points cannot be easily represented by their ordering when recording them, and hence [[Convolutional neural network\|standard CNN]] is not able to be directly applied to graph data. On the other hand, GNN not only be able to applied to non-Euclidean data but also Euclidean data such as sentences, images or videos since such data can be represented as graph data if organized properly. '''Graph neural networks''' ('''GNN''') are specialized [[artificial neural network]]s that are designed for tasks whose inputs are [[Graph (abstract data type)\|graphs]].<ref name="wucuipeizhao2022" /><ref name="scarselli2009" /><ref name="micheli2009" /><ref name="sanchez2021" /><ref name="daigavane2021" /> ~~== Pipeline of a GNN model<ref name=":0" /> ==~~ ~~The design pipeline for a GNN model can be generally derived as four steps: find graph structure, specify graph, design loss functions, and build model .~~ ~~[[File:Pipe-line of GNN.png\|thumb\|558x558px\|The general design pipeline for a GNN model.]]~~ One prominent example is molecular drug design.<ref>{{Cite journal \|last1=Stokes \|first1=Jonathan M. \|last2=Yang \|first2=Kevin \|last3=Swanson \|first3=Kyle \|last4=Jin \|first4=Wengong \|last5=Cubillos-Ruiz \|first5=Andres \|last6=Donghia \|first6=Nina M. \|last7=MacNair \|first7=Craig R. \|last8=French \|first8=Shawn \|last9=Carfrae \|first9=Lindsey A. \|last10=Bloom-Ackermann \|first10=Zohar \|last11=Tran \|first11=Victoria M. \|last12=Chiappino-Pepe \|first12=Anush \|last13=Badran \|first13=Ahmed H. \|last14=Andrews \|first14=Ian W. \|last15=Chory \|first15=Emma J. \|date=20 February 2020 \|title=A Deep Learning Approach to Antibiotic Discovery \|journal=Cell \|volume=180 \|issue=4 \|pages=688–702.e13 \|doi=10.1016/j.cell.2020.01.021 \|issn=1097-4172 \|pmc=8349178 \|pmid=32084340}}</ref><ref>{{cite arXiv\|last1=Yang \|first1=Kevin \|title=Analyzing Learned Molecular Representations for Property Prediction \|date=20 November 2019 \|eprint=1904.01561 \|last2=Swanson \|first2=Kyle \|last3=Jin \|first3=Wengong \|last4=Coley \|first4=Connor \|last5=Eiden \|first5=Philipp \|last6=Gao \|first6=Hua \|last7=Guzman-Perez \|first7=Angel \|last8=Hopper \|first8=Timothy \|last9=Kelley \|first9=Brian\|class=cs.LG }}</ref><ref>{{Cite journal \|last=Marchant \|first=Jo \|date=20 February 2020 \|title=Powerful antibiotics discovered using AI \|url=https://www.nature.com/articles/d41586-020-00018-3 \|journal=Nature \|language=en \|doi=10.1038/d41586-020-00018-3\|pmid=33603175 \|url-access=subscription }}</ref> Each input sample is a graph representation of a molecule, where atoms form the nodes and chemical bonds between atoms form the edges. In addition to the graph representation, the input also includes known chemical properties for each of the atoms. Dataset samples may thus differ in length, reflecting the varying numbers of atoms in molecules, and the varying number of bonds between them. The task is to predict the efficacy of a given molecule for a specific medical application, like eliminating [[Escherichia coli\|''E. coli'']] bacteria. ~~=== Find graph structure ===~~ ~~In [[graph theory]], a graph is denoted as <math>G=(V,E)</math>, where:~~ The key design element of GNNs is the use of ''pairwise message passing'', such that graph nodes iteratively update their representations by exchanging information with their neighbors. Several GNN architectures have been proposed,<ref name="scarselli2009" /><ref name="micheli2009" /><ref name="kipf2016" /><ref name="hamilton2017" /><ref name="velickovic2018" /> which implement different flavors of message passing,<ref name="bronstein2021" /><ref name="hajij2022" /> started by recursive<ref name="scarselli2009" /> or convolutional constructive<ref name="micheli2009" /> approaches. {{As of\|2022}}, it is an open question whether it is possible to define GNN architectures "going beyond" message passing, or instead every GNN can be built on message passing over suitably defined graphs.<ref name="velickovic2022" /> * <math>V</math>, a [[Set (mathematics)\|set]] of ''vertices'' (also called ''nodes'' or ''points''); * <math>E</math>, a [[Set (mathematics)\|set]] of ''edges'' (either directed or undirected, also called ''links'' or ''lines''); <math>A</math>, the given graph's [[adjacency matrix]]; <math>D</math>, the given graph's [[degree matrix]]. [[File:GNN building blocks.png\|thumb\|Basic building blocks of a graph neural network (GNN). <math>(1)</math> Permutation equivariant layer. <math>(2)</math> Local pooling layer. <math>(3)</math> Global pooling (or readout) layer. Colors indicate [[Feature (machine learning)\|features]].]] If the input data is already in graph structure, then this step is done. Otherwise, you need to observe the data first and reorganize it to be a graph according to your requirement, while not destroying the data's property (so that your model won't face the [[Garbage in, garbage out\|"garbage in, garbage out"]] problem). In the more general subject of "geometric [[deep learning]]", certain existing neural network architectures can be interpreted as GNNs operating on suitably defined graphs.<ref name=bronstein2021 /> A [[convolutional neural network]] layer, in the context of [[computer vision]], can be considered a GNN applied to graphs whose nodes are [[pixel]]s and only adjacent pixels are connected by edges in the graph. A [[Transformer (machine learning model)\|transformer]] layer, in [[natural language processing]], can be considered a GNN applied to [[complete graph]]s whose nodes are [[words]] or tokens in a passage of [[natural language]] text. ~~=== Specify graph ===~~ ~~[[File:Scene graph example.png\|thumb\|529x529px\|An example of scene graph.]]~~ After a graph structure is found in the given data, the type of this graph should also be specified. A graph can be simply categorize as [[Directed graph\|directed]]/[[Undirected graph\|undirected]] or [[Homogeneous graph\|homogeneous]]/[[Heterogeneous graph\|heterogeneous]]. Note that for heterogeneous graphs, each edge may differ to the others by its property. For example, each edge in a [[scene graph]]<ref>{{Cite journal\|last=Johnson\|first=Justin\|last2=Krishna\|first2=Ranjay\|last3=Stark\|first3=Michael\|last4=Li\|first4=Li-Jia\|last5=Shamma\|first5=David A.\|last6=Bernstein\|first6=Michael S.\|last7=Fei-Fei\|first7=Li\|title=Image retrieval using scene graphs\|url=https://ieeexplore.ieee.org/document/7298990\|journal=2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)\|pages=3668–3678\|doi=10.1109/CVPR.2015.7298990}}</ref> has different meaning to represent the relation between nodes. Sometimes the data's nodes can be merged to obtain graphs of different resolutions, and hence the graph structure may dynamically changed during the learning process. For example, when regarding [[point cloud]] as a graph, it is mostly a dynamic graph<ref name=":1">{{Cite journal\|last=Wang\|first=Yue\|last2=Sun\|first2=Yongbin\|last3=Liu\|first3=Z.\|last4=Sarma\|first4=Sanjay E.\|last5=Bronstein\|first5=M.\|last6=Solomon\|first6=J.\|date=2019\|title=Dynamic Graph CNN for Learning on Point Clouds\|url=https://www.semanticscholar.org/paper/Dynamic-Graph-CNN-for-Learning-on-Point-Clouds-Wang-Sun/e1799aaf23c12af6932dc0ef3dfb1638f01413d1\|journal=ACM Trans. Graph.\|doi=10.1145/3326362}}</ref><ref name=":2">{{Cite journal\|last=Thomas\|first=Hugues\|last2=Qi\|first2=Charles R.\|last3=Deschaud\|first3=Jean-Emmanuel\|last4=Marcotegui\|first4=Beatriz\|last5=Goulette\|first5=François\|last6=Guibas\|first6=Leonidas\|title=KPConv: Flexible and Deformable Convolution for Point Clouds\|url=https://ieeexplore.ieee.org/document/9010002\|journal=2019 IEEE/CVF International Conference on Computer Vision (ICCV)\|pages=6410–6419\|doi=10.1109/ICCV.2019.00651}}</ref><ref name=":3">{{Cite journal\|last=Lin\|first=Zhi-Hao\|last2=Huang\|first2=Sheng-Yu\|last3=Wang\|first3=Yu-Chiang Frank\|title=Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis\|url=https://ieeexplore.ieee.org/document/9156514\|journal=2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\|pages=1797–1806\|doi=10.1109/CVPR42600.2020.00187}}</ref><ref name=":4">{{Cite journal\|last=Lin\|first=Zhi-Hao\|last2=Huang\|first2=Sheng Yu\|last3=Wang\|first3=Yu-Chiang Frank\|date=2021\|title=Learning of 3D Graph Convolution Networks for Point Cloud Analysis\|url=https://ieeexplore.ieee.org/abstract/document/9355025\|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence\|pages=1–1\|doi=10.1109/TPAMI.2021.3059758\|issn=1939-3539}}</ref>. Relevant application domains for GNNs include [[Natural Language Processing\|natural language processing]],<ref name="wuchen2023" /> [[social networks]],<ref name="ying2018" /> [[Citation graph\|citation networks]],<ref name="stanforddata" /> [[molecular biology]],<ref>{{cite journal \|last1=Zhang \|first1=Weihang \|last2=Cui \|first2=Yang \|last3=Liu \|first3=Bowen \|last4=Loza \|first4=Martin \|last5=Park \|first5=Sung-Joon \|last6=Nakai \|first6=Kenta \|date=5 April 2024 \|title=HyGAnno: Hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data \|url=https://academic.oup.com/bib/article/25/3/bbae152/7641197 \|journal=Briefings in Bioinformatics \|volume=25 \|issue=3 \|pages=bbae152 \|doi=10.1093/bib/bbae152\|pmid=38581422 \|pmc=10998639 }}</ref> chemistry,<ref name="gilmer2017" /><ref>{{Cite journal \|last1=Coley \|first1=Connor W. \|last2=Jin \|first2=Wengong \|last3=Rogers \|first3=Luke \|last4=Jamison \|first4=Timothy F. \|last5=Jaakkola \|first5=Tommi S. \|last6=Green \|first6=William H. \|last7=Barzilay \|first7=Regina \|last8=Jensen \|first8=Klavs F. \|date=2 January 2019 \|title=A graph-convolutional neural network model for the prediction of chemical reactivity \|journal=Chemical Science \|language=en \|volume=10 \|issue=2 \|pages=370–377 \|doi=10.1039/C8SC04228D \|pmid=30746086 \|pmc=6335848 \|issn=2041-6539\|doi-access=free }}</ref> [[physics]]<ref name=qasim2019 /> and [[NP-hard]] [[combinatorial optimization]] problems.<ref name="li2018" /> ~~=== Design loss function ===~~ Base on the task you are dealing with, [[Loss function\|loss functions]] have to be chosen wisely. For example, for a [[Supervised learning\|supervised]] node-level classification task, [[Cross entropy\|cross-entropy]] might be a reasonable choice. [[Open source]] [[Library (computing)\|libraries]] implementing GNNs include PyTorch Geometric<ref name=fey2019 /> ([[PyTorch]]), TensorFlow GNN<ref name=tfgnn2022 /> ([[TensorFlow]]), Deep Graph Library<ref>{{Cite web \|last= \|title=Deep Graph Library (DGL) \|url=https://www.dgl.ai/ \|access-date=12 September 2024 \|website=}}</ref> (framework agnostic), jraph<ref name=jraph2022/> ([[Google JAX]]), and GraphNeuralNetworks.jl<ref name=Lucibello2021GNN/>/GeometricFlux.jl<ref>{{Citation \|title=FluxML/GeometricFlux.jl \|date=31 January 2024 \|url=https://github.com/FluxML/GeometricFlux.jl \|access-date=3 February 2024 \|publisher=FluxML}}</ref> ([[Julia (programming language)\|Julia]], [[Flux (machine-learning framework)\|Flux]]). ~~=== Build model ===~~ * '''Propagation module''': updating information carried by nodes and/or edges by some aggregation methods. * '''Sampling module''': when a graph is too large, sampling modules are needed for computation preservation. * '''Pooling module''': when higher level information (sub-graph-level, graph-level) is needed for the task, pooling modules are needed to aggregate low-level information and provide hierarchical propagation. == Architecture == The architecture of a generic GNN implements the following fundamental [[Layer (deep learning)\|layers]]:<ref name=bronstein2021 /> ~~=== Convolution based methods ===~~ # <em>Permutation equivariant</em>: a permutation equivariant layer [[Map (mathematics)\|maps]] a representation of a graph into an updated representation of the same graph. In the literature, permutation equivariant layers are implemented via pairwise message passing between graph nodes.<ref name="bronstein2021" /><ref name="velickovic2022" /> Intuitively, in a message passing layer, nodes <em>update</em> their representations by <em>aggregating</em> the <em>messages</em> received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop. The main idea of this type of methods is to generalize [[Convolutional neural network\|standard CNN]] or [[Attention (machine learning)\|attentional methods]] to graph structured data, and that is, to define some [[receptive fields]] with respect to given nodes and propagate information within. Based on the operation ___domain, we can further divide these methods into two groups, '''spectral approaches''' and '''spatial approaches'''. # <em>Local pooling</em>: a local [[pooling layer]] coarsens the graph via [[Downsampling (signal processing)\|downsampling]]. Local pooling is used to increase the receptive field of a GNN, in a similar fashion to pooling layers in [[convolutional neural network]]s. Examples include [[Nearest neighbor graph\|k-nearest neighbours pooling]], top-k pooling,<ref name="gao2019" /> and self-attention pooling.<ref name="lee2019" /> # <em>Global pooling</em>: a global pooling layer, also known as ''readout'' layer, provides fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output.<ref name="lui2022" /> Examples include element-wise sum, mean or maximum. It has been demonstrated that GNNs cannot be more expressive than the [[Weisfeiler Leman graph isomorphism test\|Weisfeiler–Leman Graph Isomorphism Test]].<ref name="douglas2011" /><ref name="xu2019" /> In practice, this means that there exist different graph structures (e.g., [[molecules]] with the same [[atoms]] but different [[Chemical bond\|bonds]]) that cannot be distinguished by GNNs. More powerful GNNs operating on higher-dimension geometries such as [[simplicial complex]]es can be designed.<ref name=bronstein2021-2 /><ref name=grady2011discrete /><ref name=hajij2022 /> {{As of\|2022}}, whether or not future architectures will overcome the message passing primitive is an open research question.<ref name=velickovic2022 /> [[File:GNN representational limits.png\|thumb\|[[Graph isomorphism\|Non-isomorphic]] graphs that cannot be distinguished by a GNN due to the limitations of the Weisfeiler-Lehman Graph Isomorphism Test. Colors indicate node [[Feature (machine learning)\|features]].]] == Message passing layers == [[File:Message Passing Neural Network.png\|thumb\|Node representation update in a Message Passing Neural Network (MPNN) layer. Node <math>\mathbf{x}_0</math> receives messages sent by all of its immediate neighbours <math>\mathbf{x}_1</math> to <math>\mathbf{x}_4</math>. Messages are computing via the message function <math>\psi</math>, which accounts for the features of both senders and receiver.]] Message passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph. Formally, they can be expressed as message passing neural networks (MPNNs).<ref name="bronstein2021" /> Let <math>G = (V,E)</math> be a [[Graph (discrete mathematics)\|graph]], where <math>V</math> is the node set and <math>E</math> is the edge set. Let <math>N_u</math> be the [[Neighbourhood (graph theory)\|neighbourhood]] of some node <math>u \in V</math>. Additionally, let <math>\mathbf{x}_u</math> be the [[Feature (machine learning)\|features]] of node <math>u \in V</math>, and <math>\mathbf{e}_{uv}</math> be the features of edge <math>(u, v) \in E</math>. An MPNN [[Layer (deep learning)\|layer]] can be expressed as follows:<ref name="bronstein2021" /> :<math>\mathbf{h}_u = \phi \left( \mathbf{x}_u, \bigoplus_{v \in N_u} \psi(\mathbf{x}_u, \mathbf{x}_v, \mathbf{e}_{uv}) \right)</math> where <math>\phi</math> and <math>\psi</math> are [[differentiable functions]] (e.g., [[artificial neural network]]s), and <math>\bigoplus</math> is a [[permutation]] [[Invariant (mathematics)\|invariant]] [[aggregation operator]] that can accept an arbitrary number of inputs (e.g., element-wise sum, mean, or max). In particular, <math>\phi</math> and <math>\psi</math> are referred to as ''update'' and ''message'' functions, respectively. Intuitively, in an MPNN computational block, graph nodes ''update'' their representations by ''aggregating'' the ''messages'' received from their neighbours. The outputs of one or more MPNN layers are node representations <math>\mathbf{h}_u</math> for each node <math>u \in V</math> in the graph. Node representations can be employed for any downstream task, such as node/graph [[Statistical classification\|classification]] or edge prediction. Graph nodes in an MPNN update their representation aggregating information from their immediate neighbours. As such, stacking <math>n</math> MPNN layers means that one node will be able to communicate with nodes that are at most <math>n</math> "hops" away. In principle, to ensure that every node receives information from every other node, one would need to stack a number of MPNN layers equal to the graph [[Diameter (graph theory)\|diameter]]. However, stacking many MPNN layers may cause issues such as oversmoothing<ref name=chen2021 /> and oversquashing.<ref name=alon2021 /> Oversmoothing refers to the issue of node representations becoming indistinguishable. Oversquashing refers to the bottleneck that is created by squeezing long-range dependencies into fixed-size representations. Countermeasures such as skip connections<ref name=hamilton2017 /><ref name=xu2021 /> (as in [[residual neural network]]s), gated update rules<ref name=li2016 /> and jumping knowledge<ref name=xu2018 /> can mitigate oversmoothing. Modifying the final layer to be a fully-adjacent layer, i.e., by considering the graph as a [[complete graph]], can mitigate oversquashing in problems where long-range dependencies are required.<ref name=alon2021 /> Other "flavours" of MPNN have been developed in the literature,<ref name=bronstein2021 /> such as graph convolutional networks<ref name=kipf2016 /> and graph attention networks,<ref name=velickovic2018 /> whose definitions can be expressed in terms of the MPNN formalism. === Graph convolutional network === The graph convolutional network (GCN) was first introduced by [[Thomas Kipf]] and [[Max Welling]] in 2017.<ref name=kipf2016 /> A GCN layer defines a [[Order of approximation\|first-order approximation]] of a localized spectral [[Filter (signal processing)\|filter]] on graphs. GCNs can be understood as a generalization of [[convolutional neural network]]s to graph-structured data. The formal expression of a GCN layer reads as follows: :<math>\mathbf{H} = \sigma\left(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{X} \mathbf{\Theta}\right)</math> where <math>\mathbf{H}</math> is the matrix of node representations <math>\mathbf{h}_u</math>, <math>\mathbf{X}</math> is the matrix of node features <math>\mathbf{x}_u</math>, <math>\sigma(\cdot)</math> is an [[activation function]] (e.g., [[ReLU]]), <math>\tilde{\mathbf{A}}</math> is the graph [[adjacency matrix]] with the addition of self-loops, <math>\tilde{\mathbf{D}}</math> is the graph [[degree matrix]] with the addition of self-loops, and <math>\mathbf{\Theta}</math> is a matrix of trainable parameters. In particular, let <math>\mathbf{A}</math> be the graph adjacency matrix: then, one can define <math>\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}</math> and <math>\tilde{\mathbf{D}}_{ii} = \sum_{j \in V} \tilde{A}_{ij}</math>, where <math>\mathbf{I}</math> denotes the [[identity matrix]]. This normalization ensures that the [[eigenvalue]]s of <math>\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}</math> are bounded in the range <math>[0, 1]</math>, avoiding [[Numerical stability\|numerical instabilities]] and [[Vanishing gradient problem\|exploding/vanishing gradients]]. A limitation of GCNs is that they do not allow multidimensional edge features <math>\mathbf{e}_{uv}</math>.<ref name=kipf2016 /> It is however possible to associate scalar weights <math>w_{uv}</math> to each edge by imposing <math>A_{uv} = w_{uv}</math>, i.e., by setting each nonzero entry in the adjacency matrix equal to the weight of the corresponding edge. === Graph attention network === The graph attention network (GAT) was introduced by [[Petar Veličković]] et al. in 2018.<ref name=velickovic2018 /> Graph attention network is a combination of a GNN and an attention layer. The implementation of attention layer in graphical neural networks helps provide attention or focus to the important information from the data instead of focusing on the whole data. A multi-head GAT layer can be expressed as follows: :<math> \mathbf{h}_u = \overset{K}{\underset{k=1}{\Big\Vert}} \sigma \left(\sum_{v \in N_u} \alpha_{uv} \mathbf{W}^k \mathbf{x}_v\right) </math> where <math>K</math> is the number of [[Attention (machine learning)\|attention]] heads, <math>\Big\Vert</math> denotes [[Concatenation\|vector concatenation]], <math>\sigma(\cdot)</math> is an [[activation function]] (e.g., [[ReLU]]), <math>\alpha_{ij}</math> are attention coefficients, and <math>W^k</math> is a matrix of trainable parameters for the <math>k</math>-th attention head. For the final GAT layer, the outputs from each attention head are averaged before the application of the activation function. Formally, the final GAT layer can be written as: :<math> \mathbf{h}_u = \sigma \left(\frac{1}{K}\sum_{k=1}^K \sum_{v \in N_u} \alpha_{uv} \mathbf{W}^k \mathbf{x}_v\right) </math> [[Attention (machine learning)\|Attention]] in Machine Learning is a technique that mimics [[Attention\|cognitive attention]]. In the context of learning on graphs, the attention coefficient <math>\alpha_{uv}</math> measures ''how important'' is node <math>u \in V</math> to node <math>v \in V</math>. Normalized attention coefficients are computed as follows: :<math>\alpha_{uv} = \frac{\exp(\text{LeakyReLU}\left(\mathbf{a}^T [\mathbf{W} \mathbf{x}_u \Vert \mathbf{W} \mathbf{x}_v \Vert \mathbf{e}_{uv}]\right))}{\sum_{z \in N_u}\exp(\text{LeakyReLU}\left(\mathbf{a}^T [\mathbf{W} \mathbf{x}_u \Vert \mathbf{W} \mathbf{x}_z \Vert \mathbf{e}_{uz}]\right))}</math> where <math>\mathbf{a}</math> is a vector of learnable weights, <math>\cdot^T</math> indicates [[Transpose\|transposition]], <math>\mathbf{e}_{uv}</math> are the edge features (if present), and <math>\text{LeakyReLU}</math> is a [[Rectifier (neural networks)\|modified ReLU]] activation function. Attention coefficients are normalized to make them easily comparable across different nodes.<ref name=velickovic2018 /> A GCN can be seen as a special case of a GAT where attention coefficients are not learnable, but fixed and equal to the edge weights <math>w_{uv}</math>. === Gated graph sequence neural network === The gated graph sequence neural network (GGS-NN) was introduced by [[Yujia Li]] et al. in 2015.<ref name=li2016 /> The GGS-NN extends the GNN formulation by Scarselli et al.<ref name=scarselli2009 /> to output sequences. The message passing framework is implemented as an update rule to a [[gated recurrent unit]] (GRU) cell. A GGS-NN can be expressed as follows: :<math>\mathbf{h}_u^{(0)} = \mathbf{x}_u \, \Vert \, \mathbf{0}</math> :<math>\mathbf{m}_u^{(l+1)} = \sum_{v \in N_u} \mathbf{\Theta} \mathbf{h}_v</math> :<math>\mathbf{h}_u^{(l+1)} = \text{GRU}(\mathbf{m}_u^{(l+1)}, \mathbf{h}_u^{(l)})</math> where <math>\Vert</math> denotes [[Concatenation\|vector concatenation]], <math>\mathbf{0}</math> is a vector of zeros, <math>\mathbf{\Theta}</math> is a matrix of learnable parameters, <math>\text{GRU}</math> is a GRU cell, and <math>l</math> denotes the sequence index. In a GGS-NN, the node representations are regarded as the hidden states of a GRU cell. The initial node features <math>\mathbf{x}_u^{(0)}</math> are [[Data structure alignment\|zero-padded]] up to the hidden state dimension of the GRU cell. The same GRU cell is used for updating representations for each node. == Local pooling layers == Local pooling layers coarsen the graph via downsampling. Subsequently, several learnable local pooling strategies that have been proposed are presented.<ref name=lui2022 /> For each case, the input is the initial graph represented by a matrix <math>\mathbf{X}</math> of node features, and the graph adjacency matrix <math>\mathbf{A}</math>. The output is the new matrix <math>\mathbf{X}'</math>of node features, and the new graph adjacency matrix <math>\mathbf{A}'</math>. === Top-k pooling === We first set <math>\mathbf{y} = \frac{\mathbf{X}\mathbf{p}}{\Vert\mathbf{p}\Vert}</math> where <math>\mathbf{p}</math> is a learnable [[Projection (mathematics)\|projection]] vector. The projection vector <math>\mathbf{p}</math> computes a scalar projection value for each graph node. The top-k pooling layer <ref name="gao2019" /> can then be formalised as follows: :<math>\mathbf{X}' = (\mathbf{X} \odot \text{sigmoid}(\mathbf{y}))_{\mathbf{i}}</math> :<math>\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}</math> where <math>\mathbf{i} = \text{top}_k(\mathbf{y})</math> is the subset of nodes with the top-k highest projection scores, <math>\odot</math> denotes element-wise [[matrix multiplication]], and <math>\text{sigmoid}(\cdot)</math> is the [[sigmoid function]]. In other words, the nodes with the top-k highest projection scores are retained in the new adjacency matrix <math>\mathbf{A}'</math>. The <math>\text{sigmoid}(\cdot)</math> operation makes the projection vector <math>\mathbf{p}</math> trainable by [[backpropagation]], which otherwise would produce discrete outputs.<ref name="gao2019" /> === Self-attention pooling === We first set :<math>\mathbf{y} = \text{GNN}(\mathbf{X}, \mathbf{A})</math> where <math>\text{GNN}</math> is a generic permutation equivariant GNN layer (e.g., GCN, GAT, MPNN). The Self-attention pooling layer<ref name="lee2019" /> can then be formalised as follows: :<math>\mathbf{X}' = (\mathbf{X} \odot \mathbf{y})_{\mathbf{i}}</math> ~~==== Spectral approaches ====~~ For spectral approaches, a graph signal <math>x</math> is first transformed to spectral ___domain by the [[graph Fourier Transform]] <math>\mathcal{F}</math>, then the convolution operation can be conducted. After that, we can transform the result back to the original ___domain (spatial ___domain) by the [[Graph Fourier Transform\|inverse graph Fourier Transform]] <math>\mathcal{F^{-1}}</math>. <math>\mathcal{F}</math> and <math>\mathcal{F^{-1}}</math> are defined as: * :<math>\~~mathcal~~mathbf{FA}~~(x)~~' = U^\mathbf{TA}_{\mathbf{i}, \mathbf{i}}x</math> * <math>\mathcal{F^{-1}}(x) = Ux</math>, where <math>U\mathbf{i} = \text{top}_k(\mathbf{y})</math> is the ~~matrix~~subset of ~~eigenvectors~~nodes ofwith the ~~[[Graph~~top-k ~~Laplacian\|symmetric~~highest ~~normalized~~projection ~~graph Laplacian]]~~scores, <math>L^\~~text{sym}~~odot</math>. ~~<math>L^\text{sym}</math>~~denotes is[[Hadamard ~~denoted~~product ~~as:~~(matrices)\|element-wise matrix multiplication]]. The self-attention pooling layer can be seen as an extension of the top-k pooling layer. Differently from top-k pooling, the self-attention scores computed in self-attention pooling account both for the graph features and the graph topology. ~~<math>L^\text{sym} := D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}} = U\Lambda U^{T}</math> ,~~ == Heterophilic Graph Learning == ~~where <math>L</math> is the [[Graph Laplacian\|original graph Laplacian]], <math>I</math> is an [[identity matrix]], and <math>\Lambda</math> is a diagonal matrix.~~ [[Homophily]] principle, i.e., nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks.<ref name=":0">{{cite arXiv \|eprint=2407.09618 \|last1=Luan \|first1=Sitao \|last2=Hua \|first2=Chenqing \|last3=Lu \|first3=Qincheng \|last4=Ma \|first4=Liheng \|last5=Wu \|first5=Lirong \|last6=Wang \|first6=Xinyu \|last7=Xu \|first7=Minkai \|last8=Chang \|first8=Xiao-Wen \|last9=Precup \|first9=Doina \|last10=Ying \|first10=Rex \|last11=Li \|first11=Stan Z. \|last12=Tang \|first12=Jian \|last13=Wolf \|first13=Guy \|last14=Jegelka \|first14=Stefanie \|title=The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges \|date=2024 \|class=cs.LG }}</ref> However, recent work has identified a non-trivial set of datasets where GNN’s performance compared to the NN’s is not satisfactory.<ref>{{Cite book \|last1=Luan \|first1=Sitao \|last2=Hua \|first2=Chenqing \|last3=Lu \|first3=Qincheng \|last4=Zhu \|first4=Jiaqi \|last5=Chang \|first5=Xiao-Wen \|last6=Precup \|first6=Doina \|chapter=When do We Need Graph Neural Networks for Node Classification? \|date=2024 \|editor-last=Cherifi \|editor-first=Hocine \|editor2-last=Rocha \|editor2-first=Luis M. \|editor3-last=Cherifi \|editor3-first=Chantal \|editor4-last=Donduran \|editor4-first=Murat \|title=Complex Networks & Their Applications XII \|chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-53468-3_4 \|series=Studies in Computational Intelligence \|volume=1141 \|language=en \|___location=Cham \|publisher=Springer Nature Switzerland \|pages=37–48\|doi=10.1007/978-3-031-53468-3_4 \|isbn=978-3-031-53467-6 }}</ref> [[Heterophily]], i.e., low homophily, has been considered the main cause of this empirical observation.<ref name=":1">{{Cite journal \|last1=Luan \|first1=Sitao \|last2=Hua \|first2=Chenqing \|last3=Lu \|first3=Qincheng \|last4=Zhu \|first4=Jiaqi \|last5=Zhao \|first5=Mingde \|last6=Zhang \|first6=Shuyuan \|last7=Chang \|first7=Xiao-Wen \|last8=Precup \|first8=Doina \|date=6 December 2022 \|title=Revisiting Heterophily For Graph Neural Networks \|url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/092359ce5cf60a80e882378944bf1be4-Abstract-Conference.html \|journal=Advances in Neural Information Processing Systems \|language=en \|volume=35 \|pages=1362–1375\|arxiv=2210.07606 }}</ref> People have begun to revisit and re-evaluate most existing graph models in the heterophily scenario across various kinds of graphs, e.g., [[Heterogeneous network\|heterogeneous graphs]], [[Temporal network\|temporal graphs]] and [[hypergraph]]s. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem, e.g. [[Fraud detection\|graph fraud/anomaly detection]], [[Adversarial attack\|graph adversarial attacks and robustness]], privacy, [[federated learning]] and [[Point cloud\|point cloud segmentation]], [[cluster analysis\|graph clustering]], [[recommender system]]s, [[generative model]]s, [[link prediction]], [[Graph isomorphism\|graph classification]] and [[Graph coloring\|coloring]], etc. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue in graph learning.<ref name=":0" /><ref name=":1" /><ref>{{Cite journal \|last1=Luan \|first1=Sitao \|last2=Hua \|first2=Chenqing \|last3=Xu \|first3=Minkai \|last4=Lu \|first4=Qincheng \|last5=Zhu \|first5=Jiaqi \|last6=Chang \|first6=Xiao-Wen \|last7=Fu \|first7=Jie \|last8=Leskovec \|first8=Jure \|last9=Precup \|first9=Doina \|date=15 December 2023 \|title=When Do Graph Neural Networks Help with Node Classification? Investigating the Homophily Principle on Node Distinguishability \|url=https://proceedings.neurips.cc/paper_files/paper/2023/hash/5ba11de4c74548071899cf41dec078bf-Abstract-Conference.html \|journal=Advances in Neural Information Processing Systems \|language=en \|volume=36 \|pages=28748–28760}}</ref> == Applications == Therefore, based on the [[Graph Fourier Transform\|convolution's property]]<ref>{{Cite book\|last=Mallat\|first=Stephane\|url=https://books.google.com.tw/books?hl=zh-TW&lr=&id=hbVOfWQNtB8C&oi=fnd&pg=PP1&ots=qunXivn446&sig=-CBvvP9HIfpAFLnQU_d5R0RKLpg&redir_esc=y#v=onepage&q&f=false\|title=A Wavelet Tour of Signal Processing\|date=1999-09-14\|publisher=Elsevier\|isbn=978-0-08-052083-4\|language=en}}</ref>, the convolution of the signal <math>x</math> and a [[Kernel\|learnable kernel function]] <math>g</math> is defined as: === Protein folding === ~~<math>g\star x = \mathcal{F^{-1}}(\mathcal{F}(g)\odot \mathcal{F}(x)) = U(U^Tg\odot U^Tx)</math>,~~ {{See also\|AlphaFold}} Graph neural networks are one of the main building blocks of [[AlphaFold]], an artificial intelligence program developed by [[Google]]'s [[DeepMind]] for solving the [[protein folding]] problem in [[biology]]. AlphaFold achieved first place in several [[CASP]] competitions.<ref name=guardian2018 /><ref name=mit2020 /><ref name=xu2018 /> ~~and if we set the learnable kernel function to be a diagonal one <math>g_{\theta}</math>, this operation is further simplified to:~~ === Social networks === ~~<math>g_{\theta}\star x = Ug_{\theta}U^Tx</math>.~~ {{See also\|Recommender system}} [[Social networks]] are a major application ___domain for GNNs due to their natural representation as [[social graph]]s. GNNs are used to develop recommender systems based on both [[social relations]] and item relations.<ref name=fan2019 /><ref name=ying2018/> === Combinatorial optimization === ===== GCN<ref>{{Cite journal\|last=Kipf\|first=Thomas N.\|last2=Welling\|first2=Max\|date=2017-02-22\|title=Semi-Supervised Classification with Graph Convolutional Networks\|url=http://arxiv.org/abs/1609.02907\|journal=International Conference on Learning Representations (ICLR), 2017}}</ref> ===== {{See also\|Combinatorial optimization}} Given a input graph with <math>N</math> nodes and the node feature matrix <math>X</math>, GCN simplified the convolution layer to <math>H^{(l+1)} = \sigma (\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)})</math>, where GNNs are used as fundamental building blocks for several combinatorial optimization algorithms.<ref name=cappart2021 /> Examples include computing [[Shortest path problem\|shortest paths]] or [[Eulerian path\|Eulerian circuits]] for a given graph,<ref name=li2016/> deriving [[Placement (electronic design automation)\|chip placements]] superior or competitive to handcrafted human solutions,<ref name=mirhoseini2021 /> and improving expert-designed branching rules in [[branch and bound]].<ref name=gasse2019 /> === Cyber security === * <math>\tilde{A} = A + I</math>; {{See also\|Intrusion detection system}} * <math>\tilde{D}_{ii} = \textstyle \sum_{j} \displaystyle\tilde{A}_{ij} </math>; When viewed as a graph, a network of computers can be analyzed with GNNs for anomaly detection. Anomalies within provenance graphs often correlate to malicious activity within the network. GNNs have been used to identify these anomalies on individual nodes<ref>{{Cite journal \|last1=Wang \|first1=Su \|last2=Wang \|first2=Zhiliang \|last3=Zhou \|first3=Tao \|last4=Sun \|first4=Hongbin \|last5=Yin \|first5=Xia \|last6=Han \|first6=Dongqi \|last7=Zhang \|first7=Han \|last8=Shi \|first8=Xingang \|last9=Yang \|first9=Jiahai \|date=2022 \|title=Threatrace: Detecting and Tracing Host-Based Threats in Node Level Through Provenance Graph Learning \|url=https://ieeexplore.ieee.org/document/9899459/;jsessionid=NzAXdLahhjEX-xmrFzOROk4qxoaz40aJFvKcZRgjck8-zCOucJi7!380715771 \|journal=IEEE Transactions on Information Forensics and Security \|volume=17 \|pages=3972–3987 \|doi=10.1109/TIFS.2022.3208815 \|issn=1556-6021\|arxiv=2111.04333 \|bibcode=2022ITIF...17.3972W \|s2cid=243847506 }}</ref> and within paths<ref>{{Cite journal \|last1=Wang \|first1=Qi \|last2=Hassan \|first2=Wajih Ul \|last3=Li \|first3=Ding \|last4=Jee \|first4=Kangkook \|last5=Yu \|first5=Xiao \|date=2020 \|title=You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis. \|journal=Network and Distributed Systems Security Symposium\|doi=10.14722/ndss.2020.24167 \|isbn=978-1-891562-61-7 \|s2cid=211267791 \|doi-access=free }}</ref> to detect malicious processes, or on the edge level<ref>{{Cite journal \|last1=King \|first1=Isaiah J. \|last2=Huang \|first2=H. Howie \|date=2022 \|title=Euler: Detecting Network Lateral Movement via Scalable Temporal Link Prediction \|url=https://www.ndss-symposium.org/wp-content/uploads/2022-107A-paper.pdf \|journal=In Proceedings of the 29th Network and Distributed Systems Security Symposium\|doi=10.14722/ndss.2022.24107 \|s2cid=248221601 }}</ref> to detect [[Network Lateral Movement\|lateral movement]]. * <math>H^{(l)}\in\mathbb{R}^{N\times D}</math> is the input [[Feature (machine learning)\|feature map]] at <math>l</math>-th layer, <math>H^{(0)} = X</math>; * <math>H^{(l+1)}\in\mathbb{R}^{N\times D'}</math> is the output [[Feature (machine learning)\|feature map]] at <math>l</math>-th layer; * <math>W^{(l)}\in\mathbb{R}^{D\times D'}</math>is the trainable weight matrix at <math>l</math>-th layer; * <math>\sigma (\centerdot) </math> is the [[activation function]]. === Water distribution networks === Since both <math>\tilde{A}</math> and <math>\tilde{D}</math> are able to be pre-calculated, this graph convolution method can be easily accelerated by GPU implementation. '''Note that this method only suits for undirected graphs with no edge features.''' {{See also\|Water distribution system}} Water distribution systems can be modelled as graphs, being then a straightforward application of GNN. This kind of algorithm has been applied to water demand forecasting,<ref>{{cite journal \|url=https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2022WR032299\|title=Graph Convolutional Recurrent Neural Networks for Water Demand Forecasting\|last=Zanfei\|first=Ariele \|display-authors=etal \|date=2022\|journal=Water Resources Research\|volume=58 \|issue=7 \|article-number=e2022WR032299 \|publisher=AGU\|doi=10.1029/2022WR032299 \|bibcode=2022WRR....5832299Z \|access-date=11 June 2024}}</ref> interconnecting District Measuring Areas to improve the forecasting capacity. Other application of this algorithm on water distribution modelling is the development of metamodels.<ref>{{cite journal \|url=https://www.sciencedirect.com/science/article/abs/pii/S0043135423007005\|title=Shall we always use hydraulic models? A graph neural network metamodel for water system calibration and uncertainty assessment\|last=Zanfei\|first=Ariele \|journal=Water Research \|display-authors=etal \|date=2023\|volume=242 \|article-number=120264 \|doi=10.1016/j.watres.2023.120264 \|pmid=37393807 \|bibcode=2023WatRe.24220264Z \|access-date=11 June 2024\|url-access=subscription }}</ref> ~~==== Spatial approaches ====~~ Spatial approaches directly design convolution operation on the graph based on the graph topology (hence called '''spatial'''), making these methods more flexible compared with spectral approaches. Since the size of neighbors is mostly different for each node within a graph, designing an efficient way to define receptive fields and feature propagation is the prime challenge of such approaches. Unlike spectral approaches that severely affected by the global graph structure, spatial approaches mostly focus on local relations between nodes and edge properties, and the global properties can be found by apply pooling mechanisms between convolution layers properly.[[File:Graph attention network.png\|thumb\|310x310px\|An illustration of k-head-attention-based GNN (GAT here). This example is when k=3.]] ===== GAT<ref>{{Cite journal\|last=Veličković\|first=Petar\|last2=Cucurull\|first2=Guillem\|last3=Casanova\|first3=Arantxa\|last4=Romero\|first4=Adriana\|last5=Liò\|first5=Pietro\|last6=Bengio\|first6=Yoshua\|date=2018-02-04\|title=Graph Attention Networks\|url=http://arxiv.org/abs/1710.10903\|journal=International Conference on Learning Representations (ICLR), 2018}}</ref> and GaAN<ref>{{Cite journal\|last=Zhang\|first=Jiani\|last2=Shi\|first2=Xingjian\|last3=Xie\|first3=Junyuan\|last4=Ma\|first4=Hao\|last5=King\|first5=Irwin\|last6=Yeung\|first6=Dit-Yan\|date=2018-03-20\|title=GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs\|url=http://arxiv.org/abs/1803.07294\|journal=arXiv:1803.07294 [cs]}}</ref> ===== [[Attention (machine learning)\|Attentional networks]] have already gain huge success in multiple deep learning areas, especially sequenced data related works. As nodes features of a graph can be represented as a unordered dat sequence, the graph attentional network (GAT) and the gated attention network (GaAN) make use of the benefit that multi-head attention model can automatically learn the importance of each neighbor with respect to different heads. === Computer Vision === ===== GCN for scene graph<ref>{{Cite journal\|last=Johnson\|first=Justin\|last2=Gupta\|first2=Agrim\|last3=Fei-Fei\|first3=Li\|date=2018-04-04\|title=Image Generation from Scene Graphs\|url=https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/0764.pdf\|journal=CVPR (2018)}}</ref><ref>{{Cite journal\|last=Dhamo\|first=Helisa\|last2=Farshad\|first2=Azade\|last3=Laina\|first3=Iro\|last4=Navab\|first4=Nassir\|last5=Hager\|first5=Gregory D.\|last6=Tombari\|first6=Federico\|last7=Rupprecht\|first7=Christian\|date=2020-04-07\|title=Semantic Image Manipulation Using Scene Graphs\|url=https://openaccess.thecvf.com/content_CVPR_2020/papers/Dhamo_Semantic_Image_Manipulation_Using_Scene_Graphs_CVPR_2020_paper.pdf\|journal=CVPR (2020)}}</ref><ref>{{Cite journal\|last=Wald\|first=Johanna\|last2=Dhamo\|first2=Helisa\|last3=Navab\|first3=Nassir\|last4=Tombari\|first4=Federico\|date=2020\|title=Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions\|url=https://openaccess.thecvf.com/content_CVPR_2020/html/Wald_Learning_3D_Semantic_Scene_Graphs_From_3D_Indoor_Reconstructions_CVPR_2020_paper.html\|journal=CVPR (2020)\|pages=3961–3970}}</ref> ===== {{See also\|Computer vision}} [[Scene graph\|Scene graphs]] have different edge features which indicate the semantic relations between neighboring nodes, and therefore when designing convolution operations on such structure, both node features and edge features are updated. GCN for scene graph usually regard the features of two neighboring nodes and the edge between as a triplet, and update the edge feature by passing this triplet through [[Multilayer perceptron\|MLPs]]. As for node features, the updating method is similar to GCN, instead not only considering neighbor points' feature but also edge features. To represent an image as a graph structure, the image is first divided into multiple patches, each of which is treated as a node in the graph. Edges are then formed by connecting each node to its nearest neighbors based on spatial or feature similarity. This graph-based representation enables the application of graph learning models to visual tasks. The relational structure helps to enhance feature extraction and improve performance on image understanding.<ref>{{cite arXiv \|eprint=2206.00272 \|last1=Han \|first1=Kai \|last2=Wang \|first2=Yunhe \|last3=Guo \|first3=Jianyuan \|last4=Tang \|first4=Yehui \|last5=Wu \|first5=Enhua \|title=Vision GNN: An Image is Worth Graph of Nodes \|date=2022 \|class=cs.CV }}</ref> ===== GCN for point cloud analysis<ref name=":1" /><ref name=":2" /><ref name=":3" /><ref name=":4" /><ref name=":5">{{Cite journal\|last=Shen\|first=Yiru\|last2=Feng\|first2=Chen\|last3=Yang\|first3=Yaoqing\|last4=Tian\|first4=Dong\|date=2018\|title=Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling\|url=https://openaccess.thecvf.com/content_cvpr_2018/html/Shen_Mining_Point_Cloud_CVPR_2018_paper.html\|journal=CVPR (2018)\|pages=4548–4557}}</ref> ===== [[Point cloud\|Point clouds]] are some point sets lies in 3D space with no edges between each point, so the original format of such data is not a graph. However, we can dynamically construct graph structures from point clouds by connecting points which satisfy given relation (mostly [[K-nearest neighbors algorithm\|kNN]] or distance smaller than some thresholds), and the constructed graph can also be dynamically changed when sub-sampling or pooling methods are applied. ~~===~~=== ~~KC-Net<ref~~Text ~~name=":5"~~and />NLP ~~===~~=== {{See also\|Natural language processing}} The graph in KC-Net is constructed by kNN. They design a 3D kernel which composed of several learnable 3D points, and the convolution of a given point as center is operated by calculate the similarity between each pair of "'''one kernel point and relative position of one neighboring point of the given center point'''". KC-Net also provides a graph max-pooling module to better capture higher level features of point clouds. Graph-based representation of text helps to capture deeper semantic relationships between words. Many studies have used graph networks to enhance performance in various text processing tasks such as text classification, question answering, Neural Machine Translation (NMT), event extraction, fact verification, etc.<ref>{{Cite journal \|last1=Zhou \|first1=Jie \|last2=Cui \|first2=Ganqu \|last3=Hu \|first3=Shengding \|last4=Zhang \|first4=Zhengyan \|last5=Yang \|first5=Cheng \|last6=Liu \|first6=Zhiyuan \|last7=Wang \|first7=Lifeng \|last8=Li \|first8=Changcheng \|last9=Sun \|first9=Maosong \|date=1 January 2020 \|title=Graph neural networks: A review of methods and applications \|journal=AI Open \|volume=1 \|pages=57–81 \|doi=10.1016/j.aiopen.2021.01.001 \|issn=2666-6510\|doi-access=free }}</ref> ~~====== DGCNN<ref name=":1" /> ======~~ Convolution method used is DGCNN is using two sets of learnable weights to aggregate feature of a given center point and feature difference between this point and each of its neighbors separately. Although not using pooling module to obtain multi-scale feature, DGCNN dynamically re-define the graph by changing the neighbor number and considering distances in feature space instead of the original 3D space. This idea helps DGCNN better captures semantic information. ==References== ~~====== KP-Conv<ref name=":2" /> ======~~ {{Reflist\|refs= The neighbor points of a given center point of KP-Conv is chosen by a given radius, all the points inside would regard as the center point's neighbor. While the key idea is similar to KC-Net, kernel points of KP-Conv can be either deformable or stable based on the current task. To explore geometry in different scale, KP-Conv provides pooling module by regional sampling. <ref name="wuchen2023">{{Cite journal\|last1=Wu\|first1=Lingfei\|last2=Chen\|first2=Yu\|last3=Shen \|first3=Kai\|last4=Guo\|first4=Xiaojie\|last5=Gao\|first5=Hanning\|last6=Li\|first6=Shucheng\|last7=Pei\|first7=Jian\|last8=Long\|first8=Bo\|date=2023\|title=Graph Neural Networks for Natural Language Processing: A Survey \|url=https://www.nowpublishers.com/article/Details/MAL-096\|journal=Foundations and Trends in Machine Learning\|volume=16\|issue=2\|pages=119–328\|doi=10.1561/2200000096 \|pmid=19068426\|s2cid=206756462\|issn=1941-0093\|arxiv=2106.06090}}</ref> <ref name="wucuipeizhao2022">{{Cite journal\|last1=Wu\|first1=Lingfei\|last2=Cui\|first2=Peng\|last3=Pei \|first3=Jian\|last4=Zhao\|first4=Liang\|date=2022\|title=Graph Neural Networks: Foundations, Frontiers, and Applications\|url=https://graph-neural-networks.github.io/\|journal=Springer Singapore\|pages=725\|url-access=<!--WP:URLACCESS-->}}</ref> <ref name="scarselli2009">{{Cite journal\|last1=Scarselli\|first1=Franco\|last2=Gori\|first2=Marco\|last3=Tsoi \|first3=Ah Chung\|last4=Hagenbuchner\|first4=Markus\|last5=Monfardini\|first5=Gabriele\|date=2009\|title=The Graph Neural Network Model\|journal=IEEE Transactions on Neural Networks\|volume=20\|issue=1\|pages=61–80\|doi=10.1109/TNN.2008.2005605 \|pmid=19068426\|bibcode=2009ITNN...20...61S \|s2cid=206756462\|issn=1941-0093}}</ref> <ref name="micheli2009">{{Cite journal\|last1=Micheli\|first1=Alessio\|title=Neural Network for Graphs: A Contextual Constructive Approach\|journal=IEEE Transactions on Neural Networks\|year=2009 \|volume=20\|issue=3\|pages=498–511\|doi=10.1109/TNN.2008.2010350 \|pmid=19193509\|bibcode=2009ITNN...20..498M \|s2cid=17486263\|issn=1045-9227}}</ref> <ref name="sanchez2021">{{Cite journal\|last1=Sanchez-Lengeling\|first1=Benjamin\|last2=Reif\|first2=Emily \|last3=Pearce\|first3=Adam\|last4=Wiltschko\|first4=Alex\|date=2 September 2021\|title=A Gentle Introduction to Graph Neural Networks\|url=https://distill.pub/2021/gnn-intro\|journal=Distill\|volume=6\|issue=9\|pages=e33 \|doi=10.23915/distill.00033\|issn=2476-0757\|doi-access=free}}</ref> <ref name="daigavane2021">{{Cite journal\|last1=Daigavane\|first1=Ameya\|last2=Ravindran\|first2=Balaraman \|last3=Aggarwal\|first3=Gaurav\|date=2 September 2021\|title=Understanding Convolutions on Graphs \|url=https://distill.pub/2021/understanding-gnns\|journal=Distill\|volume=6\|issue=9\|pages=e32 \|doi=10.23915/distill.00032\|s2cid=239678898\|issn=2476-0757\|doi-access=free}}</ref> <ref name="gilmer2017">{{Cite journal\|last1=Gilmer\|first1=Justin\|last2=Schoenholz\|first2=Samuel S. \|last3=Riley\|first3=Patrick F.\|last4=Vinyals\|first4=Oriol\|last5=Dahl\|first5=George E.\|date=17 July 2017\|title=Neural Message Passing for Quantum Chemistry\|url=http://proceedings.mlr.press/v70/gilmer17a.html \|journal=Proceedings of Machine Learning Research\|language=en\|pages=1263–1272\|arxiv=1704.01212}}</ref> <ref name="kipf2016">{{Cite journal\|last1=Kipf\|first1=Thomas N\|last2=Welling\|first2=Max\|date=2016 \|title=Semi-supervised classification with graph convolutional networks\|journal=IEEE Transactions on Neural Networks \|volume=5\|issue=1\|pages=61–80 \|doi=10.1109/TNN.2008.2005605\|pmid=19068426\|arxiv=1609.02907\|bibcode=2009ITNN...20...61S \|s2cid=206756462}}</ref> <ref name="hamilton2017">{{Cite journal\|last1=Hamilton\|first1=William\|last2=Ying\|first2=Rex \|last3=Leskovec\|first3=Jure\|date=2017\|title=Inductive Representation Learning on Large Graphs\|url=https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf\|journal=Neural Information Processing Systems\|volume=31\|arxiv=1706.02216\|via=Stanford}}</ref> <ref name="velickovic2018">{{Cite arXiv\|last1=Veličković\|first1=Petar\|last2=Cucurull\|first2=Guillem \|last3=Casanova\|first3=Arantxa\|last4=Romero\|first4=Adriana\|last5=Liò\|first5=Pietro\|last6=Bengio \|first6=Yoshua\|date=4 February 2018 \|title=Graph Attention Networks\|eprint=1710.10903 \|class=stat.ML}}</ref> <ref name=stanforddata>{{Cite web\|title=Stanford Large Network Dataset Collection \|url=https://snap.stanford.edu/data/\|access-date=5 July 2021\|website=snap.stanford.edu}}</ref> <ref name="li2018">{{cite book \|last1=Li \|first1=Zhuwen \|last2=Chen \|first2=Qifeng \|last3=Koltun \|first3=Vladlen \|title=Neural Information Processing \|chapter=Text Simplification with Self-Attention-Based Pointer-Generator Networks \|series=Lecture Notes in Computer Science \|date=2018 \|volume=31 \|pages=537–546 \|doi=10.1007/978-3-030-04221-9_48 \|arxiv=1810.10659 \|isbn=978-3-030-04220-2 }}</ref> <ref name="bronstein2021">{{cite arXiv \|last1=Bronstein \|first1=Michael M. \|last2=Bruna \|first2=Joan \|last3=Cohen \|first3=Taco \|last4=Veličković \|first4=Petar \|title=Geometric Deep Learning: Grids, Groups, Graphs Geodesics and Gauges \|date=4 May 2021 \|class=cs.LG \|eprint=2104.13478}}</ref> <ref name=douglas2011>{{cite arXiv\|last=Douglas\|first=B. L.\|date=27 January 2011\|title=The Weisfeiler–Lehman Method and Graph Isomorphism Testing\|class=math.CO\|eprint=1101.5211}}</ref> <ref name=xu2019>{{Cite arXiv\|last1=Xu\|first1=Keyulu\|last2=Hu\|first2=Weihua\|last3=Leskovec\|first3=Jure \|last4=Jegelka\|first4=Stefanie\|author4-link=Stefanie Jegelka\|date=22 February 2019\|title=How Powerful are Graph Neural Networks? \|eprint=1810.00826 \|class=cs.LG}}</ref> <ref name=velickovic2022>{{cite arXiv \|last1=Veličković \|first1=Petar \|title=Message passing all the way up \|year=2022 \|class=cs.LG \|eprint=2202.11097}}</ref> <ref name=qasim2019>{{cite journal \|last1=Qasim \|first1=Shah Rukh \|last2=Kieseler \|first2=Jan \|last3=Iiyama \|first3=Yutaro \|last4=Pierini \|first4=Maurizio Pierini \|title=Learning representations of irregular particle-detector geometry with distance-weighted graph networks \|journal=The European Physical Journal C \|date=2019 \|volume=79 \|issue=7 \|page=608 \|doi=10.1140/epjc/s10052-019-7113-9\|s2cid=88518244 \|doi-access=free \|arxiv=1902.07987 \|bibcode=2019EPJC...79..608Q }}</ref> <ref name=lui2022>{{cite arXiv \|last1=Liu \|first1=Chuang \|last2=Zhan \|first2=Yibing \|last3=Li \|first3=Chang \|last4=Du \|first4=Bo \|last5=Wu \|first5=Jia \|last6=Hu \|first6=Wenbin \|last7=Liu \|first7=Tongliang \|last8=Tao \|first8=Dacheng \|title=Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities \|date=2022 \|class=cs.LG \|eprint=2204.07321}}</ref> <ref name=hajij2022>{{cite arXiv \|last1=Hajij \|first1=M. \|last2=Zamzmi \|first2=G. \|last3=Papamarkou \|first3=T. \|last4=Miolane \|first4=N. \|last5=Guzmán-Sáenz \|first5=A. \|last6=Ramamurthy \|first6=K. N. \|last7=Schaub \|first7=M. T. \|title=Topological deep learning: Going beyond graph data \|date=2022 \|class=cs.LG \|eprint=2206.00606}}</ref> <ref name=bronstein2021-2>{{cite arXiv \|last1=Bodnar \|first1=Christian \|last2=Frasca \|first2=Fabrizio \|last3=Guang Wang \|first3=Yu \|last4=Otter \|first4=Nina \|last5=Montúfar \|first5=Guido \|last6=Liò \|first6=Pietro \|last7=Bronstein \|first7=Michael \|title=Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks \|date=2021 \|class=cs.LG \|eprint=2103.03212}}</ref> <ref name=gao2019>{{cite arXiv \|last1=Gao \|first1=Hongyang \|last2=Ji \|first2=Shuiwang Ji \|title=Graph U-Nets \|date=2019 \|class=cs.LG \|eprint=1905.05178}}</ref> <ref name=lee2019>{{cite arXiv \|last1=Lee \|first1=Junhyun \|last2=Lee \|first2=Inyeop \|last3=Kang \|first3=Jaewoo \|title=Self-Attention Graph Pooling \|date=2019 \|class=cs.LG \|eprint=1904.08082}}</ref> <ref name=alon2021>{{cite arXiv \|last1=Alon \|first1=Uri \|last2=Yahav \|first2=Eran \|title=On the Bottleneck of Graph Neural Networks and its Practical Implications \|date=2021 \|class=cs.LG \|eprint=2006.05205}}</ref> <ref name=chen2021>{{cite journal \|last1=Chen \|first1=Deli \|last2=Lin \|first2=Yankai \|last3=Li \|first3=Wei \|last4=Li \|first4=Peng \|last5=Zhou \|first5=Jie \|last6=Sun \|first6=Xu \|title=Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View \|journal=Proceedings of the AAAI Conference on Artificial Intelligence \|date=2020 \|volume=34 \|issue=4 \|pages=3438–3445 \|doi=10.1609/aaai.v34i04.5747 \|s2cid=202539008 \|arxiv=1909.03211}}</ref> <ref name=guardian2018>{{cite news\|last1=Sample\|first1=Ian\|date=2 December 2018\|title=Google's DeepMind predicts 3D shapes of proteins\|work=The Guardian\|url=https://www.theguardian.com/science/2018/dec/02/google-deepminds-ai-program-alphafold-predicts-3d-shapes-of-proteins\|access-date=30 November 2020}}</ref> <ref name=mit2020>{{cite web \|title=DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology \|url=https://www.technologyreview.com/2020/11/30/1012712/deepmind-protein-folding-ai-solved-biology-science-drugs-disease/ \|website=MIT Technology Review \|access-date=30 November 2020 \|language=en}}</ref> <ref name=fan2019>{{cite book \|last1=Fan \|first1=Wenqi \|last2=Ma \|first2=Yao \|last3=Li \|first3=Qing \|last4=He \|first4=Yuan \|last5=Zhao \|first5=Eric \|last6=Tang \|first6=Jiliang \|last7=Yin \|first7=Dawei \|title=Graph Neural Networks for Social Recommendation \|date=2019 \|pages=417–426 \|doi=10.1145/3308558.3313488 \|hdl=10397/81232 \|isbn=9781450366748 \|s2cid=67769538 \|arxiv=1902.07243}}</ref> <ref name=ying2018>{{cite book \|last1=Ying \|first1=Rex \|last2=He \|first2=Ruining \|last3=Chen \|first3=Kaifeng \|last4=Eksombatchai \|first4=Pong \|last5=Hamilton \|first5=William L. \|last6=Leskovec \|first6=Jure \|title=Graph Convolutional Neural Networks for Web-Scale Recommender Systems \|date=2018 \|pages=974–983 \|doi=10.1145/3219819.3219890 \|isbn=9781450355520 \|s2cid=46949657 \|arxiv=1806.01973}}</ref> <ref name=gasse2019>{{cite arXiv \|last1=Gasse \|first1=Maxime \|last2=Chételat \|first2=Didier \|last3=Ferroni \|first3=Nicola \|last4=Charlin \|first4=Laurent \|last5=Lodi \|first5=Andrea \|title=Exact Combinatorial Optimization with Graph Convolutional Neural Networks \|date=2019 \|class=cs.LG \|eprint=1906.01629}}</ref> <ref name=cappart2021>{{cite arXiv \|last1=Cappart \|first1=Quentin \|last2=Chételat \|first2=Didier \|last3=Khalil \|first3=Elias \|last4=Lodi \|first4=Andrea \|last5=Morris \|first5=Christopher \|last6=Veličković \|first6=Petar \|title=Combinatorial optimization and reasoning with graph neural networks \|year=2021 \|class=cs.LG \|eprint=2102.09544}}</ref> <ref name=mirhoseini2021>{{cite journal \|last1=Mirhoseini \|first1=Azalia \|last2=Goldie \|first2=Anna \|last3=Yazgan \|first3=Mustafa \|last4=Jiang \|first4=Joe Wenjie \|last5=Songhori \|first5=Ebrahim \|last6=Wang \|first6=Shen \|last7=Lee \|first7=Young-Joon \|last8=Johnson \|first8=Eric \|last9=Pathak \|first9=Omkar \|last10=Nazi \|first10=Azade \|last11=Pak \|first11=Jiwoo \|last12=Tong \|first12=Andy \|last13=Srinivasa \|first13=Kavya \|last14=Hang \|first14=William \|last15=Tuncer \|first15=Emre \|last16=Le \|first16=Quoc V. \|last17=Laudon \|first17=James \|last18=Ho \|first18=Richard \|last19=Carpenter \|first19=Roger \|last20=Dean \|first20=Jeff \|title=A graph placement methodology for fast chip design \|journal=Nature \|date=2021 \|volume=594 \|issue=7862 \|pages=207–212 \|doi=10.1038/s41586-021-03544-w\|pmid=34108699 \|bibcode=2021Natur.594..207M \|s2cid=235395490 }}</ref> <ref name=fey2019>{{cite arXiv \|last1=Matthias \|first1=Fey \|last2=Lenssen \|first2=Jan E. \|title=Fast Graph Representation Learning with PyTorch Geometric \|date=2019 \|class=cs.LG \|eprint=1903.02428}}</ref> <ref name=tfgnn2022>{{cite web \|title=Tensorflow GNN \|website=[[GitHub]] \|url=https://github.com/tensorflow/gnn \|access-date=30 June 2022}}</ref> <ref name=jraph2022>{{cite web \|title=jraph \|website=[[GitHub]] \|url=https://github.com/deepmind/jraph \|access-date=30 June 2022}}</ref> <ref name=xu2021>{{cite arXiv \|last1=Xu \|first1=Keyulu \|last2=Zhang \|first2=Mozhi \|last3=Jegelka \|first3=Stephanie\|author3-link=Stefanie Jegelka \|last4=Kawaguchi \|first4=Kenji \|title=Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth \|date=2021 \|class=cs.LG \|eprint=2105.04550}}</ref> <ref name=li2016>{{cite arXiv \|last1=Li \|first1=Yujia \|last2=Tarlow \|first2=Daniel \|last3=Brockschmidt \|first3=Mark \|last4=Zemel \|first4=Richard \|title=Gated Graph Sequence Neural Networks \|date=2016 \|class=cs.LG \|eprint=1511.05493 }}</ref> <ref name=grady2011discrete>{{cite book \|last1=Grady \|first1=Leo \|last2=Polimeni \|first2=Jonathan \|title=Discrete Calculus: Applied Analysis on Graphs for Computational Science \|url=http://leogrady.net/wp-content/uploads/2017/01/grady2010discrete.pdf \|date=2011 \|publisher=Springer }}</ref> <ref name=xu2018>{{cite arXiv \|last1=Xu \|first1=Keyulu \|last2=Li \|first2=Chengtao \|last3=Tian \|first3=Yonglong \|last4=Sonobe \|first4=Tomohiro \|last5=Kawarabayashi \|first5=Ken-ichi \|last6=Jegelka \|first6=Stefanie\|author6-link=Stefanie Jegelka \|title=Representation Learning on Graphs with Jumping Knowledge Networks \|date=2018 \|class=cs.LG \|eprint=1806.03536}}</ref> <ref name=Lucibello2021GNN>{{cite web \|last=Lucibello \|first=Carlo \|title=GraphNeuralNetworks.jl \|website=[[GitHub]] \|url=https://github.com/CarloLucibello/GraphNeuralNetworks.jl \|year=2021 \|access-date=21 September 2023}}</ref> }} == External links == ~~====== 3D-GCN<ref name=":3" /><ref name=":4" /> ======~~ * [https://distill.pub/2021/gnn-intro/ A Gentle Introduction to Graph Neural Networks] ~~[[File:Illustration of 3D-GCN's receptive field and kernel.png\|thumb\|390x390px\|Illustration of 3D-GCN's receptive field and kernel]]~~ Graphs are constructed by kNN. 3D-GCN designs deformable 3D kernels as each kernel has one center kernel point <math>k_C\in \mathbb{R}^3</math> and several support points <math>k_1, k_2, ... k_S\in \mathbb{R}^3</math>. Given a data point <math>p_n\in \mathbb{R}^3</math>and its neighbor points <math>p_1, p_2, ... p_M \in \mathbb{R}^3</math>, the convolution is operated by taking the direction vector of the center point to each neighbor <math>p_m - p_n</math> and the direction vector of the center kernel point to each support <math>k_s - k_C = k_s</math>(since <math>k_C</math> is set to be <math>(0,0,0)</math>), calculate their [[cosine similarity]], and then map this similarity to feature space by another learnable parameters <math>\mathcal{w}</math>. Since the convolution is calculated by cosine similarity instead of exact coordinate, 3D-GCN better captures an 3D object's geometry instead of ___location, and is totally '''shift and scale invariant.''' Similar to KC-Net, 3D-GCN also design a graph max-pooling to explore multi-resolution information, while preserving the largest activation. {{Artificial intelligence navbox}} ~~=== Recurrent based methods ===~~ [[Category:Neural network architectures]] ~~== References ==~~ [[Category:Semisupervised learning]] ~~<!-- Inline citations added to your article will automatically display here. See en.wikipedia.org/wiki/WP:REFB for instructions on how to add citations. -->~~ [[Category:Supervised learning]] ~~{{reflist}}~~ [[Category:Unsupervised learning]] [[Category:Artificial neural networks]] [[Category:Graph algorithms]] [[Category:2009 in artificial intelligence]]