Graph neural network: Difference between revisions

Content deleted Content added
Corrected the claim about GNNs w.r.t. WL isomorphism.
Line 1:
{{Machine learning|Artificial neural network}}
A '''graph neural network (GNN)''' is a class of [[neural network]] for processing data best represented by [[Graph (abstract data type)|graph data structures]].<ref>{{Cite journal|last1=Scarselli|first1=Franco|last2=Gori|first2=Marco|last3=Tsoi|first3=Ah Chung|last4=Hagenbuchner|first4=Markus|last5=Monfardini|first5=Gabriele|date=2009|title=The Graph Neural Network Model|url=https://ieeexplore.ieee.org/document/4700287|journal=IEEE Transactions on Neural Networks|volume=20|issue=1|pages=61–80|doi=10.1109/TNN.2008.2005605|pmid=19068426|s2cid=206756462|issn=1941-0093}}</ref><ref>{{Cite journal|last1=Sanchez-Lengeling|first1=Benjamin|last2=Reif|first2=Emily|last3=Pearce|first3=Adam|last4=Wiltschko|first4=Alex|date=2021-09-02|title=A Gentle Introduction to Graph Neural Networks|url=https://distill.pub/2021/gnn-intro|journal=Distill|language=en|volume=6|issue=9|pages=e33|doi=10.23915/distill.00033|issn=2476-0757|doi-access=free}}</ref><ref>{{Cite journal|last=Daigavane|first=Ameya|last2=Ravindran|first2=Balaraman|last3=Aggarwal|first3=Gaurav|date=2021-09-02|title=Understanding Convolutions on Graphs|url=https://distill.pub/2021/understanding-gnns|journal=Distill|language=en|volume=6|issue=9|pages=e32|doi=10.23915/distill.00032|issn=2476-0757}}</ref> They were popularized by their use in [[supervised learning]] on properties of various molecules.<ref>{{Cite journal|last1=Gilmer|first1=Justin|last2=Schoenholz|first2=Samuel S.|last3=Riley|first3=Patrick F.|last4=Vinyals|first4=Oriol|last5=Dahl|first5=George E.|date=2017-07-17|title=Neural Message Passing for Quantum Chemistry|url=http://proceedings.mlr.press/v70/gilmer17a.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1263–1272|arxiv=1704.01212}}</ref>
 
A '''Graph Neural Network (GNN)''' is a class of [[artificial neural network]]s for processing data that can be represented as [[Graph (abstract data type)|graphs]]<ref name="scarselli2008 /><ref name="sanchez2021" /><ref name="daigavane2021" />.
Since their inception, variants of the message passing neural network (MPNN) framework have been proposed.<ref>{{Cite journal|last1=Kipf|first1=Thomas N|last2=Welling|first2=Max|date=2016|title=Semi-supervised classification with graph convolutional networks|url=https://ieeexplore.ieee.org/document/4700287|journal=International Conference on Learning Representations|volume=5|issue=1|pages=61–80|doi=10.1109/TNN.2008.2005605|pmid=19068426|arxiv=1609.02907|s2cid=206756462}}</ref><ref>{{Cite journal|last1=Defferrard|first1=Michaël|last2=Bresson|first2=Xavier|last3=Vandergheynst|first3=Pierre|date=2017-02-05|title=Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering|arxiv=1606.09375|journal=Neural Information Processing Systems|volume=30}}</ref><ref>{{Cite journal|last1=Hamilton|first1=William|last2=Ying|first2=Rex|last3=Leskovec|first3=Jure|date=2017|title=Inductive Representation Learning on Large Graphs|url=https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf|journal=Neural Information Processing Systems|volume=31|arxiv=1706.02216|via=Stanford}}</ref><ref>{{Cite journal|last1=Veličković|first1=Petar|last2=Cucurull|first2=Guillem|last3=Casanova|first3=Arantxa|last4=Romero|first4=Adriana|last5=Liò|first5=Pietro|last6=Bengio|first6=Yoshua|date=2018-02-04|title=Graph Attention Networks|arxiv=1710.10903|journal=International Conference on Learning Representations|volume=6}}</ref> These models optimize GNNs for use on larger graphs and apply them to domains such as [[social network]]s, [[Citation graph|citation networks]], and online communities.<ref>{{Cite web|title=Stanford Large Network Dataset Collection|url=https://snap.stanford.edu/data/|access-date=2021-07-05|website=snap.stanford.edu}}</ref> GNNs have also been relatively successful in various [[NP-hard]] combinatorial problems, [[automated planning]] and [[path-planning]] areas due to the inherent graph structure of data. <ref>{{cite journal |last1=Li |first1=Zhuwen |last2=Chen |first2=Qifeng |last3=Koltun |first3=Vladlen |title=Combinatorial optimization with graph convolutional networks and guided tree search |journal=Neural Information Processing Systems |date=2018 |volume=31 |page=537--546 |url=https://arxiv.org/abs/1810.10659}}</ref> <ref>{{cite journal |last1=Ma |first1=Tengfei |last2=Ferber |first2=Patrick |last3=Huo |first3=Siyu |last4=Chen |first4=Jie |last5=Katz |first5=Michael |title=Online planner selection with graph neural networks and adaptive scheduling |journal=Proceedings of the AAAI Conference on Artificial Intelligence |date=2020 |volume=34 |page=5077--5084 |url=https://ojs.aaai.org/index.php/AAAI/article/download/5949/5805}}</ref> <ref>{{cite journal |last1=Osanlou |first1=Kevin |last2=Bursuc |first2=Andrei |last3=Guettier |first3=Christophe |last4=Cazenave |first4=Tristan |last5=Jacopin |first5=Eric |title=Optimal Solving of Constrained Path-Planning Problems with Graph Convolutional Networks and Optimized Tree Search |journal=2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) |date=2019 |page=3519--3525 |doi=10.1109/IROS40897.2019.8968113 |url=https://arxiv.org/abs/2108.01036}}</ref>
 
[[File:GNN building blocks.png|thumb|Basic building blocks of a Graph Neural Network (GNN). <math>(1)</math> Permutation equivariant layer. <math>(2)</math> Local pooling layer. <math>(3)</math> Global pooling (or readout) layer. Colors indicate [[Feature (machine learning)|features]].]]
GNNs are a weak form of the [[Boris Weisfeiler|Weisfeiler]]–Lehman graph isomorphism test,<ref>{{cite arXiv|last=Douglas|first=B. L.|date=2011-01-27|title=The Weisfeiler–Lehman Method and Graph Isomorphism Testing|class=math.CO|eprint=1101.5211}}</ref> so a GNN model can be at least as powerful as this test if it meets certain conditions.<ref>{{Cite journal|last1=Xu|first1=Keyulu|last2=Hu|first2=Weihua|last3=Leskovec|first3=Jure|last4=Jegelka|first4=Stefanie|date=2019-02-22|title=How Powerful are Graph Neural Networks?|arxiv=1810.00826|journal=International Conference on Learning Representations|volume=7}}</ref> Researchers are attempting to unite GNNs with other "geometric deep learning models"<ref>{{Cite journal|last1=Bronstein|first1=Michael M.|last2=Bruna|first2=Joan|last3=LeCun|first3=Yann|last4=Szlam|first4=Arthur|last5=Vandergheynst|first5=Pierre|date=2017|title=Geometric Deep Learning: Going beyond Euclidean data|url=https://ieeexplore.ieee.org/document/7974879|journal=IEEE Signal Processing Magazine|volume=34|issue=4|pages=18–42|doi=10.1109/MSP.2017.2693418|issn=1053-5888|arxiv=1611.08097|bibcode=2017ISPM...34...18B|s2cid=15195762}}</ref> to better understand how and why these models work.
 
In the more general subject of "Geometric [[Deep Learning]]", existing neural network architectures can be interpreted as GNNs operating on suitably defined graphs<ref name=bronstein2021 />. [[Convolutional Neural Network]]s, in the context of [[computer vision]], can be seen as a GNN applied to graphs structured as grids of [[pixel]]s. [[Transformer (machine learning model) | Transformers]], in the context of [[natural language processing]], can be seen as GNNs applied to [[complete graph]]s whose nodes are [[words]] in a [[Sentence (linguistics) | sentence]].
In the case of the absence of a known graph structure for example a k-[[nearest neighbor graph]] can be heuristically induced.
 
The key design element of GNNs is the use of ''pairwise message passing'', such that graph nodes iteratively update their representations by exchanging information with their neighbors. Since their inception, several different GNN architectures have been proposed<ref name="scarselli2008" /><ref name="kipf2016" /><ref name="hamilton2017" /><ref name="velickovic2018" />, which implement different flavors of message passing<ref name=bronstein2021 />. {{As of|2022}}, whether is it possible to define GNN architectures "going beyond" message passing, or if every GNN can be built on message passing over suitably defined graphs, is an open research question<ref name=velickovic2022 />.
GNNs can be understood as a generalization of [[convolutional neural network]]s (which are used on 2-dimensional black and white image data and 3-dimensional color image data) to graph-structured data.
 
Relevant application domains for GNNs include [[social network]]s<ref name="ying2018" />,
[[Citation graph|citation networks]]<ref name="stanforddata" />,
[[molecular biology]]<ref name="gilmer2017" />,
[[physics]]<ref name=qasim2019 /> and
[[NP-hard]] [[combinatorial optimization]] problems<ref name="li2018" />.
 
Several [[open source]] [[Library (computing)|libraries]] implementing Graph Neural Networks are available, such as PyTorch Geometric <ref name=fey2019 /> ([[PyTorch]]), Tensorflow GNN<ref name=tfgnn2022 /> ([[Tensorflow]]), and jGraph<ref name=jraph2022/> ([[Google JAX]]).
 
== Architecture ==
 
The architecture of a generic GNN implements the following fundamental [[Layer (deep learning)|layers]]<ref name=bronstein2021 />:
* <em>Permutation Equivariant</em>: a permutation equivariant layer [[Map (mathematics)|maps]] a representation of a graph into an updated representation of the same graph. In literature, permutation equivariant layers are implemented via pairwise message passing between graph nodes<ref name=bronstein2021 /><ref name=velickovic2022 />. Intuitively, in a message passing layer, nodes ''update'' their representations by ''aggregating'' the ''messages'' received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop.
 
* <em>Local Pooling</em>: a local pooling layer coarsens the graph via downsampling. Local pooling is used to increase the receptive field of a GNN, in a similar fashion to pooling layers in [[Convolutional Neural Network]]s. Examples include [[Nearest neighbor graph|k-nearest neighbours pooling]], top-k pooling<ref name=gao2019 />, and self-attention pooling<ref name=lee2019 />.
 
* <em>Global Pooling</em>: a global pooling layer, also known as ''readout'' layer, provides fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output<ref name=lui2022 />. Examples include element-wise sum, mean or maximum.
 
It has been demonstrated that GNNs cannot be more expressive than the Weisfeiler–Lehman Graph Isomorphism Test<ref name="douglas2011" /><ref name="xu2019" />. In practice, this means that there exist different graph structures (e.g., [[molecules]] with the same [[atoms]] but different [[bonds]]) that cannot be distinguished by GNNs. More powerful GNNs operating on higher-dimension geometries such as [[Simplicial complex|simplicial complexes]] can be designed<ref name=bronstein2021-2 />. {{As of|2022}}, whether or not future architectures will overcome the message passing primitive is an open research question<ref name=velickovic2022 />.
 
[[File:GNN representational limits.png|thumb|[[Graph isomorphism|Non-isomorphic]] graphs that cannot be distinguished by a GNN due to the limitations of the Weisfeiler-Lehman Graph Isomorphism Test. Colors indicate node [[Feature (machine learning)|features]].]]
 
== Message Passing Layers ==
[[File:Message Passing Neural Network.png|thumb|Node representation update in a Message Passing Neural Network (MPNN) layer. Node <math>\mathbf{x}_0</math> receives messages sent by all of his immediate neighbours <math>\mathbf{x}_1</math> to <math>\mathbf{x}_4</math>. Messages are computing via the message function <math>\phi</math>, which accounts for the features of both senders and receiver.]]
 
Message Passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph. Formally, they can be expressed as Message Passing Neural Networks (MPNNs)<ref name="bronstein2021" />.
 
Let <math>G = (V,E)</math> be a [[Graph (discrete mathematics)|graph]], where <math>V</math> is the node set and <math>E</math> is the edge set. Let <math>N_u</math> [[Neighbourhood (graph theory)|neighbourhood]] of node <math>u \in V</math>. Additionally, let <math>\mathbf{x}_u</math> be the [[Feature (machine learning)|features]] of node <math>u \in V</math>, and <math>\mathbf{e}_{u, v}</math> be the features of edge <math>(u, v) \in E</math>. An MPNN [[Layer_(deep learning)|layer]] can be expressed as follows<ref name="bronstein2021" />:
 
:<math>\mathbf{h}_u = \phi \left( \mathbf{x_u}, \bigoplus_{v \in N_u} \psi(\mathbf{x}_u, \mathbf{x}_v, \mathbf{e}_{uv}) \right)</math>
 
Where <math>\phi</math> and <math>\psi</math> are [[differentiable functions]] (e.g., [[artificial neural network]]s), and <math>\bigoplus</math> is a [[permutation]] [[Invariant (mathematics)|invariant]] aggregation operator that can accept an arbitrary number of inputs (e.g, element-wise sum, mean, or max). In particular, <math>\phi</math> and <math>\psi</math> are referred to as ''update'' and ''message'' functions, respectively. Intuitively, in an MPNN computational block, graph nodes ''update'' their representations by ''aggregating'' the ''messages'' received from their neighbours.
 
The outputs of one or more MPNN layers are node representations <math>\mathbf{h}_u</math> for each node <math>u \in V</math> in the graph. Node representations can be employed for any downstream task, such as node/graph [[Statistical classification|classification]] or edge prediction.
 
Graph nodes in an MPNN update their representation aggregating information from their immediate neighbours. As such, stacking <math>n</math> MPNN layers means that one node will be able to communicate with nodes that are at most <math>n</math> "hops" away. In principle, to ensure that every node receives information from every other node, one would need to stack a number of MPNN layers equal to the graph [[Distance (graph theory) | diameter]]. However, stacking many MPNN layers may cause issues such as oversmoothing<ref name=chen2021 /> and oversquashing<ref name=alon2021 />. Oversmoothing refers to the issue of node representations becoming indistinguishable. Oversquashing refers to the bottleneck that is created by squeezing long-range dependencies into fixed-size representations.
 
Other "flavours" of MPNN have been developed in the literature<ref name=bronstein2021 />, such as Graph Convolutional Networks<ref name=kipf2016 /> and Graph Attention Networks<ref name=velickovic2018 />, whose definitions can be expressed in terms of the MPNN formalism.
 
=== Graph Convolutional Network ===
The Graph Convolutional Network (GCN) was first introduced by [[Thomas Kipf]] and [[Max Welling]] in 2017<ref name=kipf2016 />.
 
A GCN layer defines a [[Order of approximation|first-order approximation]] of a localized spectral [[Filter (signal processing)|filter]] on graphs. GCNs can be understood as a generalization of [[Convolutional Neural Network]]s to graph-structured data.
 
The formal expression of a GCN layer reads as follows:
 
:<math>\mathbf{H} = \sigma\left(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{X} \mathbf{\Theta}\right)</math>
 
Where <math>\mathbf{H}</math> is the matrix of node representations <math>\mathbf{h}_u</math>, <math>\mathbf{X}</math> is the matrix of node features <math>\mathbf{x}_u</math>, <math>\sigma(\cdot)</math> is an [[activation function]] (e.g., [[ReLU]]), <math>\tilde{\mathbf{A}}</math> is the graph [[adjacency matrix]] with the addition of self-loops,<math>\tilde{\mathbf{D}}</math> is the normalized [[degree matrix]], and <math>\mathbf{\Theta}</math> is a matrix of trainable parameters.
 
In particular, let <math>\mathbf{A}</math> be the graph adjacency matrix: then, one can define <math>\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}</math> and <math>\tilde{\mathbf{D}_{ii}} = \sum_{j \in V} \tilde{A}_{ij}</math>, where <math>\mathbf{I}</math> denotes the [[identity matrix]]. This normalization ensures that the [[eigenvalue]]s of <math>\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}</math> are bounded in the range <math>[0, 1]</math>, avoiding [[Numerical stability|numerical instabilities]] and [[Vanishing gradient problem|exploding/vanishing gradients]].
 
A limitation of GCNs is that they do not allow multidimensional edge features <math>\mathbf{e}_{uv}</math><ref name=kipf2016 />. It is however possible to associate scalar weights <math>w_{uv}</math> to each edge by imposing <math>A_{uv} = w_{uv}</math>, i.e., by setting each nonzero entry in the adjacency matrix equal to the weight of the corresponding edge.
 
=== Graph Attention Network ===
The Graph Attention Network (GAT) was introduced by [[Petar Veličković]] et al. in 2018<ref name=velickovic2018 />.
 
A multi-head GAT layer can be expressed as follows:
 
:<math> \mathbf{h}_u = \overset{K}{\underset{k=1}{\Big\Vert}} \sigma \left(\sum_{v \in N_u} \alpha_{ij} \mathbf{W}^k \mathbf{x}_v\right) </math>
 
Where <math>K</math> is the number of [[Attention (machine learning)|attention]] heads, <math>\Big\Vert</math> denotes [[Concatenation| vector concatenation]], <math>\sigma(\cdot)</math> is an [[activation function]] (e.g., [[ReLU]]), <math>\alpha_{ij}</math> are attention coefficients, and <math>W^k</math> is a matrix of trainable parameters for the <math>k</math>-th attention head.
 
For the final GAT layer, the outputs from each attention head are averaged before the application of the activation function. Formally, the final GAT layer can be written as:
 
:<math> \mathbf{h}_u = \sigma \left(\frac{1}{K}\sum_{k=1}^K \sum_{v \in N_u} \alpha_{uv} \mathbf{W}^k \mathbf{x}_v\right) </math>
 
[[Attention (machine learning)|Attention]] in Machine Learning is a technique that mimics [[Attention|cognitive attention]]. In the context of learning on graphs, the attention coefficient <math>\alpha_{uv}</math> measures ''how important'' is node <math>u \in V</math> to node <math>v \in V</math>.
 
Normalized attention coefficients are computed as follows:
 
:<math>\alpha_{uv} = \frac{\exp(\text{LeakyReLU}\left(\mathbf{a}^T [\mathbf{W} \mathbf{h}_u \Vert \mathbf{W} \mathbf{h}_v \Vert \mathbf{e}_{uv}]\right))}{\sum_{z \in N_u}\exp(\text{LeakyReLU}\left(\mathbf{a}^T [\mathbf{W} \mathbf{h}_u \Vert \mathbf{W} \mathbf{h}_z \Vert \mathbf{e}_{uz}]\right))}</math>
 
Where <math>a</math> is a vector of learnable weights, <math>\cdot^T</math> indicates [[Transpose | transposition]], and <math>\text{LeakyReLU}</math> is a [[Rectifier (neural networks)|modified ReLU]] activation function.
 
A GCN can be seen as a special case of a GAT where attention coefficients are not learnable, but fixed and equal to the edge weights <math>w_{uv}</math>.
 
== Local Pooling Layers ==
Local pooling layers coarsen the graph via downsampling. In literature, several learnable local pooling strategies have been proposed<ref name=lui2022 />.
 
=== Top-k Pooling ===
The Top-k Pooling layer <ref name=gao2019 /> can be formalised as follows:
:<math>\mathbf{y} = \frac{\mathbf{X}\mathbf{p}}{\Vert\mathbf{p}\Vert}</math>
 
:<math>\mathbf{i} = \text{top}_k(\mathbf{y})</math>
 
:<math>\mathbf{X}' = (\mathbf{X} \odot \text{sigmoid}(\mathbf{y}))_{\mathbf{i}}</math>
 
:<math>\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{I}}</math>
 
Where <math>\mathbf{X}</math> is the matrix of node features, <math>\mathbf{A}</math> is the graph adjacency matrix, <math>\odot</math> denotes element-wise [[matrix multiplication]], and <math>\mathbf{p}</math> is a learnable [[Projection (mathematics)|projection]] vector. The projection vector <math>\mathbf{p}</math> computes a scalar projection value for each graph node. The nodes with the top-k highest projection scores are retained in the new adjacency matrix <math>\mathbf{A}'</math>. The <math>\text{sigmoid}(\cdot)</math> operation makes the projection vector <math>\mathbf{p}</math> trainable by [[backpropagation]], which otherwise would produce discrete outputs<ref name=gao2019 />.
 
=== Self-Attention Pooling ===
The Self-Attention Pooling layer <ref name=lee2019 /> can be formalised as follows:
 
:<math>\mathbf{y} = \text{GNN}(\mathbf{X}, \mathbf{A})</math>
 
:<math>\mathbf{i} = \text{top}_k(\mathbf{y})</math>
 
:<math>\mathbf{X}' = (\mathbf{X} \odot \mathbf{y})_{\mathbf{i}}</math>
 
:<math>\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}</math>
 
Where <math>\mathbf{X}</math> is the matrix of node features, <math>\mathbf{A}</math> is the graph adjacency matrix, <math>\odot</math> denotes element-wise [[matrix multiplication]], and <math>\text{GNN}</math> is a generic permutation equivariant GNN layer (e.g., GCN, GAT, MPNN).
 
The Self-Attention Pooling layer can be seen as an extension of the Top-k Pooling layer. Differently from Top-k Pooling, the self-attention scores computed in Self-Attention Pooling account both for the graph features and the graph topology.
 
== Applications ==
 
=== Protein Folding ===
{{See Also|AlphaFold}}
 
Graph Neural Networks are one of the main building blocks of [[AlphaFold]], an artificial intelligence program developed by [[Google]'s [[DeepMind]] for solving the [[protein folding]] problem in [[biology]]. AlphaFold achieved first place in several [[CASP]] competitions<ref name=guardian2018 /><ref name=mit2020 />.
 
=== Social Networks ===
{{See Also|Recommender system}}
[[Social networks]] are a major application ___domain for GNNs due to their natural representation as [[social graph]]s. GNNs are used to develop recommender systems based on both [[social relations]] and item relations<ref name=fan2019 /><ref name=ying2018/>.
 
=== Combinatorial Optimization ===
{{See Also|Combinatorial Optimization}}
GNNs are used as fundamental building blocks for several combinatorial optimization algorithm <ref name=cappart2021 />. Examples include computing superior or competitive [[Placement (electronic design automation)|chip placements]] with handcrafted human solutions<ref name=mirhoseini2021 />, and improving expert-designed bracing rules in [[Branch and Bound]]<ref name=gasse2019 />.
 
==References==
{{Reflist|refs=
<references />
<ref name="scarselli2008">{{Cite journal|last1=Scarselli|first1=Franco|last2=Gori|first2=Marco|last3=Tsoi|first3=Ah Chung|last4=Hagenbuchner|first4=Markus|last5=Monfardini|first5=Gabriele|date=2009|title=The Graph Neural Network Model|url=https://ieeexplore.ieee.org/document/4700287|journal=IEEE Transactions on Neural Networks|volume=20|issue=1|pages=61–80|doi=10.1109/TNN.2008.2005605|pmid=19068426|s2cid=206756462|issn=1941-0093}}</ref>
<ref name="sanchez2021">{{Cite journal|last1=Sanchez-Lengeling|first1=Benjamin|last2=Reif|first2=Emily|last3=Pearce|first3=Adam|last4=Wiltschko|first4=Alex|date=2021-09-02|title=A Gentle Introduction to Graph Neural Networks|url=https://distill.pub/2021/gnn-intro|journal=Distill|language=en|volume=6|issue=9|pages=e33|doi=10.23915/distill.00033|issn=2476-0757|doi-access=free}}</ref>
<ref name="daigavane2021>{{Cite journal|last=Daigavane|first=Ameya|last2=Ravindran|first2=Balaraman|last3=Aggarwal|first3=Gaurav|date=2021-09-02|title=Understanding Convolutions on Graphs|url=https://distill.pub/2021/understanding-gnns|journal=Distill|language=en|volume=6|issue=9|pages=e32|doi=10.23915/distill.00032|issn=2476-0757}}</ref>
<ref name="gilmer2017">{{Cite journal|last1=Gilmer|first1=Justin|last2=Schoenholz|first2=Samuel S.|last3=Riley|first3=Patrick F.|last4=Vinyals|first4=Oriol|last5=Dahl|first5=George E.|date=2017-07-17|title=Neural Message Passing for Quantum Chemistry|url=http://proceedings.mlr.press/v70/gilmer17a.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1263–1272|arxiv=1704.01212}}</ref>
<ref name="kipf2016">{{Cite journal|last1=Kipf|first1=Thomas N|last2=Welling|first2=Max|date=2016|title=Semi-supervised classification with graph convolutional networks|url=https://ieeexplore.ieee.org/document/4700287|journal=International Conference on Learning Representations|volume=5|issue=1|pages=61–80|doi=10.1109/TNN.2008.2005605|pmid=19068426|arxiv=1609.02907|s2cid=206756462}}</ref>
<ref name="hamilton2017">{{Cite journal|last1=Hamilton|first1=William|last2=Ying|first2=Rex|last3=Leskovec|first3=Jure|date=2017|title=Inductive Representation Learning on Large Graphs|url=https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf|journal=Neural Information Processing Systems|volume=31|arxiv=1706.02216|via=Stanford}}</ref>
<ref name="velickovic2018">{{Cite journal|last1=Veličković|first1=Petar|last2=Cucurull|first2=Guillem|last3=Casanova|first3=Arantxa|last4=Romero|first4=Adriana|last5=Liò|first5=Pietro|last6=Bengio|first6=Yoshua|date=2018-02-04|title=Graph Attention Networks|arxiv=1710.10903|journal=International Conference on Learning Representations|volume=6}}</ref>
<ref name=stanforddata>{{Cite web|title=Stanford Large Network Dataset Collection|url=https://snap.stanford.edu/data/|access-date=2021-07-05|website=snap.stanford.edu}}</ref>
<ref name="li2018">{{cite journal |last1=Li |first1=Zhuwen |last2=Chen |first2=Qifeng |last3=Koltun |first3=Vladlen |title=Combinatorial optimization with graph convolutional networks and guided tree search |journal=Neural Information Processing Systems |date=2018 |volume=31 |page=537--546 |url=https://arxiv.org/abs/1810.10659}}</ref>
<ref name="bronstein2021">{{cite journal |last1=Bronstein |first1=Michael M. |last2=Bruna |first2=Joan |last3=Cohen |first3=Taco |last4=Veličković |first4=Petar |title=Geometric Deep Learning: Grids, Groups, Graphs Geodesics and Gauges |journal=arXiv |date=May 4, 2021 |url=https://arxiv.org/abs/2104.13478}}</ref>.
<ref name=douglas2011>{{cite arXiv|last=Douglas|first=B. L.|date=2011-01-27|title=The Weisfeiler–Lehman Method and Graph Isomorphism Testing|class=math.CO|eprint=1101.5211}}</ref>.
<ref name=xu2019>{{Cite journal|last1=Xu|first1=Keyulu|last2=Hu|first2=Weihua|last3=Leskovec|first3=Jure|last4=Jegelka|first4=Stefanie|date=2019-02-22|title=How Powerful are Graph Neural Networks?|arxiv=1810.00826|journal=International Conference on Learning Representations|volume=7}}</ref>
<ref name=velickovic2022>{{cite journal |last1=Veličković |first1=Petar |title=Message passing all the way up |journal=arXiv |doi=10.48550/ARXIV.2202.11097 |url=https://arxiv.org/abs/2202.11097}}</ref>
<ref name=qasim2019>{{cite journal |last1=Qasim |first1=Shah Rukh |last2=Kieseler |first2=Jan |last3=Iiyama |first3=Yutaro |last4=Pierini |first4=Maurizio Pierini |title=Learning representations of irregular particle-detector geometry with distance-weighted graph networks |journal=The European Physical Journal C |date=2019 |volume=79 |issue=7 |doi=10.1140/epjc/s10052-019-7113-9}}</ref>
<ref name=lui2022>{{cite journal |last1=Liu |first1=Chuang |last2=Zhan |first2=Yibing |last3=Li |first3=Chang |last4=Du |first4=Bo |last5=Wu |first5=Jia |last6=Hu |first6=Wenbin |last7=Liu |first7=Tongliang |last8=Tao |first8=Dacheng |title=Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities |journal=arXiv |date=2022 |doi=10.48550/arXiv.2204.07321 |url=https://arxiv.org/abs/2204.07321}}</ref>
<ref name=bronstein2021-2>{{cite journal |last1=Bodnar |first1=Christian |last2=Frasca |first2=Fabrizio |last3=Guang Wang |first3=Yu |last4=Otter |first4=Nina |last5=Montúfar |first5=Guido |last6=Liò |first6=Pietro |last7=Bronstein |first7=Michael |title=Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks |journal=arXiv |date=2021 |doi=10.48550/arXiv.2103.03212 |url=https://arxiv.org/abs/2103.03212}}</ref>
<ref name=gao2019>{{cite journal |last1=Gao |first1=Hongyang |last2=Ji |first2=Shuiwang Ji |title=Graph U-Nets |journal=International Conference on Machine Learning |date=2019 |doi=10.48550/arXiv.1905.05178 |url=https://arxiv.org/abs/1905.05178}}</ref>
<ref name=lee2019>{{cite journal |last1=Lee |first1=Junhyun |last2=Lee |first2=Inyeop |last3=Kang |first3=Jaewoo |title=Self-Attention Graph Pooling |journal=International Conference of Machine Learning |date=2019 |doi=10.48550/arXiv.1904.08082 |url=https://arxiv.org/abs/1904.08082}}</ref>
<ref name=alon2021>{{cite journal |last1=Alon |first1=Uri |last2=Yahav |first2=Eran |title=On the Bottleneck of Graph Neural Networks and its Practical Implications |journal=International Conference on Learning Representations |date=2021 |doi=10.48550/arXiv.2006.05205 |url=https://arxiv.org/abs/2006.05205}}</ref>
<ref name=chen2021>{{cite journal |last1=Chen |first1=Deli |last2=Lin |first2=Yankai |last3=Li |first3=Wei |last4=Li |first4=Peng |last5=Zhou |first5=Jie |last6=Sun |first6=Xu |title=Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View |journal=AAAI Conference on Artificial Intelligence |date=2020 |doi=10.48550/arXiv.1909.03211 |url=https://arxiv.org/abs/1909.03211}}</ref>
<ref name=guardian2018>{{cite news|last1=Sample|first1=Ian|date=2 December 2018|title=Google's DeepMind predicts 3D shapes of proteins|work=The Guardian|url=https://www.theguardian.com/science/2018/dec/02/google-deepminds-ai-program-alphafold-predicts-3d-shapes-of-proteins|access-date=30 November 2020}}</ref>
<ref name=mit2020>{{cite web |title=DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology |url=https://www.technologyreview.com/2020/11/30/1012712/deepmind-protein-folding-ai-solved-biology-science-drugs-disease/ |website=MIT Technology Review |access-date=30 November 2020 |language=en}}</ref>
<ref name=fan2019>{{cite journal |last1=Fan |first1=Wenqi |last2=Ma |first2=Yao |last3=Li |first3=Qing |last4=He |first4=Yuan |last5=Zhao |first5=Eric |last6=Tang |first6=Jiliang |last7=Yin |first7=Dawei |title=Graph Neural Networks for Social Recommendation |journal=The Web Conference |date=2019 |doi=10.48550/arXiv.1902.07243 |url=https://arxiv.org/abs/1902.07243}}</ref>
<ref name=ying2018>{{cite journal |last1=Ying |first1=Rex |last2=He |first2=Ruining |last3=Chen |first3=Kaifeng |last4=Eksombatchai |first4=Pong |last5=Hamilton |first5=William L. |last6=Leskovec |first6=Jure |title=Graph Convolutional Neural Networks for Web-Scale Recommender Systems |journal=ACM SIGKDD Conference on Knowledge Discovery and Data Mining |date=2018 |doi=10.48550/arXiv.1806.01973 |url=https://arxiv.org/abs/1806.01973}}</ref>
<ref name=gasse2019>{{cite journal |last1=Gasse |first1=Maxime |last2=Chételat |first2=Didier |last3=Ferroni |first3=Nicola |last4=Charlin |first4=Laurent |last5=Lodi |first5=Andrea |title=Exact Combinatorial Optimization with Graph Convolutional Neural Networks |journal=Conference and Workshop on Neural Information Processing Systems |date=2019 |doi=10.48550/arXiv.1906.01629 |url=https://arxiv.org/abs/1906.01629}}</ref>
<ref name=cappart2021>{{cite journal |last1=Cappart |first1=Quentin |last2=Chételat |first2=Didier |last3=Khalil |first3=Elias |last4=Lodi |first4=Andrea |last5=Morris |first5=Christopher |last6=Veličković |first6=Petar |title=Combinatorial optimization and reasoning with graph neural networks |journal=arXiv |doi=10.48550/arXiv.2102.09544 |url=https://arxiv.org/abs/2102.09544}}</ref>
<ref name=mirhoseini2021>{{cite journal |last1=Mirhoseini |first1=Azalia |last2=Goldie |first2=Anna |last3=Yazgan |first3=Mustafa |last4=Jiang |first4=Joe Wenjie |last5=Songhori |first5=Ebrahim |last6=Wang |first6=Shen |last7=Lee |first7=Young-Joon |last8=Johnson |first8=Eric |last9=Pathak |first9=Omkar |last10=Nazi |first10=Azade |last11=Pak |first11=Jiwoo |last12=Tong |first12=Andy |last13=Srinivasa |first13=Kavya |last14=Hang |first14=William |last15=Tuncer |first15=Emre |last16=Le |first16=Quoc V. |last17=Laudon |first17=James |last18=Ho |first18=Richard |last19=Carpenter |first19=Roger |last20=Dean |first20=Jeff |title=A graph placement methodology for fast chip design |journal=Nature |date=2021 |volume=594 |pages=207-212 |doi=10.1038/s41586-021-03544-w}}</ref>
<ref name=fey2019>{{cite journal |last1=Matthias |first1=Fey |last2=Lenssen |first2=Jan E. |title=Fast Graph Representation Learning with PyTorch Geometric |journal=ICLR Workshop on Representation Learning on Graphs and Manifolds |date=2019 |doi=10.48550/arXiv.1903.02428 |url=https://arxiv.org/abs/1903.02428}}</ref>
<ref name=tfgnn2022>{{cite web |title=Tensorflow GNN |url=https://github.com/tensorflow/gnn |access-date=30 June 2022}}</ref>
<ref name=jraph2022>{{cite web |title=jraph |url=https://github.com/deepmind/jraph |access-date=30 June 2022}}</ref>
}}
 
[[Category:Machine learning]]
[[Category:Semisupervised learning]]
[[Category:Supervised learning]]
[[Category:Unsupervised learning]]
[[Category:Artificial neural networks]]
[[Category:Graph algorithms]]