Neural operators: Difference between revisions

{{Short description|Machine learning framework}}
{{Orphan|date=January 2024}}
 
'''Neural operators''' are a class of [[deep learning]] architectures designed to learn maps between infinite-dimensional [[function space]]s.<ref name="patel1">{{cite arXiv |last1=Patel |first1=Ravi G. |last2=Desjardins |first2=Olivier |title=Nonlinear integro-differential operator regression with neural networks |date=2018 |class=cs.LG |eprint=1810.08552}}</ref> Neural operators extend traditional [[artificial neural network]]s, which typically learn mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn [[Operator (mathematics)|operators]] between function spaces; they can receive input functions, and the output function can be evaluated at any discretization.<ref name="NO journal">{{cite journal |last1=Kovachki |first1=Nikola |last2=Li |first2=Zongyi |last3=Liu |first3=Burigede |last4=Azizzadenesheli |first4=Kamyar |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Learning maps between function spaces |journal=Journal of Machine Learning Research |date=2021 |volume=24 |pages=1–97 |arxiv=2108.08481 |url=https://www.jmlr.org/papers/volume24/21-1524/21-1524.pdf}}</ref>
 
The primary application of neural operators is in learning surrogate maps for the solution operators of [[partial differential equation]]s (PDEs),<ref name="NO journal" /> which are critical tools in modeling the natural environment.<ref name="Evans">{{cite book |author-link=Lawrence C. Evans |first=L. C. |last=Evans |title=Partial Differential Equations |publisher=American Mathematical Society |location=Providence |year=1998 |isbn=0-8218-0772-2 }}</ref><ref>X, S. (2023, September 6). How AI models are transforming weather forecasting: A showcase of data-driven systems. Phys.org. https://phys.org/news/2023-09-ai-weather-showcase-data-driven.html </ref> Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs<ref>Kadri Umay, Y. O. (2023, September 20). Microsoft and Accenture partner to tackle methane emissions with AI technology. Microsoft Azure Blog. https://azure.microsoft.com/en-us/blog/microsoft-and-accenture-partner-to-tackle-methane-emissions-with-ai-technology/ </ref> compared to existing machine learning methodologies while being significantly faster than numerical solvers.<ref name="patel2">{{cite journal |last1=Patel |first1=Ravi G. |last2=Trask |first2=Nathaniel A. |last3=Wood |first3=Mitchell A. |last4=Cyr |first4=Eric C. |title=A physics-informed operator regression framework for extracting data-driven continuum models |journal=Computer Methods in Applied Mechanics and Engineering |date=January 2021 |volume=373 |pages=113500 |doi=10.1016/j.cma.2020.113500|arxiv=2009.11992 }}</ref><ref name="FNO">{{cite arXiv |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Fourier neural operator for parametric partial differential equations |date=2020 |class=cs.LG |eprint=2010.08895 }}</ref><ref>Hao, K. (2021, October 20). AI has cracked a key mathematical puzzle for understanding our world. MIT Technology Review. https://www.technologyreview.com/2020/10/30/1011435/ai-fourier-neural-network-cracks-navier-stokes-and-partial-differential-equations/ </ref><ref>Ananthaswamy, A. (2021, September 10). Latest neural nets solve world’s hardest equations faster than ever before. Quanta Magazine. https://www.quantamagazine.org/latest-neural-nets-solve-worlds-hardest-equations-faster-than-ever-before-20210419/ </ref> Neural operators have also been applied to various scientific and engineering disciplines such as turbulent flow modeling, computational mechanics, graph-structured data,<ref>Sharma, A., Singh, S. & Ratna, S. Graph Neural Network Operators: a Review. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-16440-4
</ref> and the geosciences.<ref> Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, Sally M. Benson,
U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow,
Advances in Water Resources,
https://doi.org/10.1016/j.advwatres.2022.104180.
(https://www.sciencedirect.com/science/article/pii/S0309170822000562)
</ref> In particular, they have been applied to learning stress-strain fields in materials, classifying complex data like spatial transcriptomics, predicting multiphase flow in porous media,<ref> Choubineh A, Chen J, Wood DA, Coenen F, Ma F. Fourier Neural Operator for Fluid Flow in Small-Shape 2D Simulated Porous Media Dataset. Algorithms. 2023; 16(1):24. https://doi.org/10.3390/a16010024
</ref> and climate modeling through long-term weather forecasting<ref> Yang, Q., Hernandez-Garcia, A., Harder, P., Ramesh, V., Sattegeri, P., Szwarcman, D., ... & Rolnick, D. (2023). Fourier Neural Operators for Arbitrary Resolution Climate Data Downscaling. arXiv preprint arXiv:2305.14452.</ref> and carbon dioxide migration simulations. Finally, the operator learning paradigm allows learning maps between function spaces, and is different from parallel ideas of learning maps from finite-dimensional spaces to function spaces,<ref name="meshfreeflownet">{{cite journal | vauthors=((Esmaeilzadeh, S., Azizzadenesheli, K., Kashinath, K., Mustafa, M., Tchelepi, H. A., Marcus, P., Prabhat, M., Anandkumar, A., others)) | title=Meshfreeflownet: A physics-constrained deep continuous space-time super-resolution framework | pages=1–15 | publisher=IEEE | date=19 October 2020| arxiv=2005.01463 }}</ref><ref name="deeponet">{{cite journal | vauthors=((Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G. E.)) | title=Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators | volume=3 | issue=3 | pages=218–229 | publisher=Nature Publishing Group UK London | date=19 October 2021}}</ref> and subsumes these settings when limited to fixed input resolution.
 
== Operator learning ==
Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, [[Abstract differential equation|one can cast the problem]] of solving partial differential equations as identifying a map between function spaces, such as from an initial condition to a time-evolved state. In other PDEs this map takes an input coefficient function and outputs a solution function. Operator learning is a [[machine learning]] paradigm to learn solution operators mapping the input function to the output function.
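
As a hedged illustration of a solution operator mapping an initial condition to a time-evolved state, the sketch below uses the one-dimensional heat equation <math>u_t = \nu u_{xx}</math> on a periodic unit interval, where each Fourier mode of the initial condition decays independently. The choice of PDE, the function names <code>heat_solution_operator</code> and <code>sample_pairs</code>, and the parameter values are assumptions made for this example and do not come from the cited works. Pairs produced this way are the kind of input-output function data an operator-learning method would be trained on.

```python
import numpy as np

def heat_solution_operator(a, t=0.1, nu=0.01):
    """Map samples of an initial condition a(x) on a uniform periodic grid
    over [0, 1) to samples of the heat-equation solution u(x, t)."""
    n = a.shape[-1]
    # Angular wavenumbers 2*pi*k for a periodic domain of length 1.
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / n)
    a_hat = np.fft.rfft(a)
    # Each Fourier mode decays by exp(-nu * k^2 * t).
    u_hat = a_hat * np.exp(-nu * k**2 * t)
    return np.fft.irfft(u_hat, n=n)

def sample_pairs(num_pairs, n, seed=0):
    """Draw random smooth initial conditions (hypothetical data model:
    random combinations of four sine modes) and their evolved states."""
    rng = np.random.default_rng(seed)
    x = np.arange(n) / n
    modes = np.arange(1, 5)
    coeffs = rng.normal(size=(num_pairs, modes.size))
    a = coeffs @ np.sin(2.0 * np.pi * modes[:, None] * x[None, :])
    u = heat_solution_operator(a)
    return x, a, u

x, a, u = sample_pairs(num_pairs=8, n=64)
print(a.shape, u.shape)
```

Because the map is defined on functions rather than on a particular grid, the same routine can be called with any number of sample points <code>n</code>.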
 
Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces operator learning to finite-dimensional function learning and has some limitations, such as an inability to generalize to discretizations beyond the grid used in training.
 
== Definition and formulation ==
Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating [[linear map]]s and non-linearities. Since neural operators act on and output functions, they have instead been formulated as a sequence of alternating linear [[integral operators]] on function spaces and point-wise non-linearities.<ref name="patel1" /><ref name="NO journal" /> Using an analogous architecture to finite-dimensional neural networks, similar [[universal approximation theorem]]s have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a [[Compact space|compact]] set.<ref name="NO journal"/>
 
Neural operators seek to approximate some operator <math>\mathcal{G} : \mathcal{A} \to \mathcal{U}</math> between function spaces <math>\mathcal{A}</math> and <math>\mathcal{U}</math> by building a parametric map <math>\mathcal{G}_\phi : \mathcal{A} \to \mathcal{U}</math>. Such parametric maps <math>\mathcal{G}_\phi</math> can generally be defined in the form
<math>\mathcal{G}_\phi := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},</math>
 
where <math>\mathcal{P}, \mathcal{Q}</math> are the lifting (lifting the codomain of the input function to a higher dimensional space) and projection (projecting the codomain of the intermediate function to the output dimension) operators, respectively. These operators act pointwise on functions and are typically parametrized as [[multilayer perceptron]]s. <math>\sigma</math> is a pointwise nonlinearity, such as a [[Rectifier (neural networks)|rectified linear unit (ReLU)]], or a [[Rectifier (neural networks)#Other non-linear variants|Gaussian error linear unit (GeLU)]]. Each layer <math>t=1, \dots, T</math> has a respective local operator <math>W_t</math> (usually parameterized by a pointwise neural network), a kernel integral operator <math>\mathcal{K}_t</math>, and a bias function <math>b_t</math>. Given some intermediate functional representation <math>v_t</math> with domain <math>D</math> in the <math>t</math>-th hidden layer, a kernel integral operator <math>\mathcal{K}_\phi</math> is defined as
 
<math>(\mathcal{K}_\phi v_t)(x) := \int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)dy, </math>
<math>v_{t+1}(x) \approx \sigma\left(\sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j} + W_t(v_t(x)) + b_t(x)\right).</math>
 
The above approximation, along with parametrizing <math>\kappa_\phi</math> as an implicit neural network, results in the graph neural operator (GNO).<ref name="Graph NO">{{cite arXiv |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Graph kernel network for partial differential equations |date=2020 |class=cs.LG |eprint=2003.03485 }}</ref>
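
The Riemann-sum form of a single layer can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited papers: the kernel is simplified to depend only on <math>(x, y)</math> and is parametrized by a small hypothetical two-layer network, the bias is a constant vector rather than a function, the domain is <math>[0, 1]</math> sampled on a uniform midpoint grid so that <math>\Delta_{y_j} = 1/n</math>, and all names (<code>init_params</code>, <code>kernel</code>, <code>kernel_integral_layer</code>) are invented for this example.

```python
import numpy as np

def gelu(z):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def init_params(d, hidden=16, seed=0):
    """Random parameters for a toy layer with d channels (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    return {
        "A1": rng.normal(scale=0.5, size=(2, hidden)),      # kernel net, layer 1
        "A2": rng.normal(scale=0.5, size=(hidden, d * d)),  # kernel net, layer 2
        "W": rng.normal(scale=0.5, size=(d, d)),            # local linear operator
        "b": np.zeros(d),                                   # constant bias (simplification)
    }

def kernel(params, x, y):
    """Evaluate the matrix-valued kernel kappa(x_i, y_j); shape (n_x, n_y, d, d)."""
    d = params["W"].shape[0]
    xy = np.stack(np.broadcast_arrays(x[:, None], y[None, :]), axis=-1)
    h = np.tanh(xy @ params["A1"])
    return (h @ params["A2"]).reshape(x.size, y.size, d, d)

def kernel_integral_layer(params, x, v):
    """One layer: sigma(sum_j kappa(x, y_j) v(y_j) dy + W v(x) + b).

    x: (n,) midpoints of a uniform grid on [0, 1]; v: (n, d) samples of v_t.
    """
    n = x.size
    dy = np.full(n, 1.0 / n)                          # quadrature weights
    K = kernel(params, x, x)                          # (n, n, d, d)
    integral = np.einsum("ijab,jb,j->ia", K, v, dy)   # Riemann sum over y_j
    local = v @ params["W"].T                         # pointwise term W v(x)
    return gelu(integral + local + params["b"])

params = init_params(d=2)
x = (np.arange(64) + 0.5) / 64
v = np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=-1)
print(kernel_integral_layer(params, x, v).shape)
```

Because the sum is weighted by <math>\Delta_{y_j}</math>, the same parameters can be applied to samples of the input function on a finer or coarser grid, and the outputs should agree up to quadrature error.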
 
There have been various parameterizations of neural operators for different applications.<ref name="patel2" /><ref name="FNO" /><ref name="Graph NO" /> These typically differ in their parameterization of <math>\kappa</math>. The most popular instantiation is the Fourier neural operator (FNO). FNO takes <math>\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x-y)</math> and by applying the [[convolution theorem]], arrives at the following parameterization of the kernel integral operator:
where <math>\|\cdot \|_\mathcal{U}</math> is a norm on the output function space <math>\mathcal{U}</math>. Neural operators can be trained directly using [[backpropagation]] and [[gradient descent]]-based methods.<ref name="patel1" />
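
As a rough sketch of training with a data loss and gradient descent, the example below takes <math>\|\cdot\|_\mathcal{U}</math> to be the <math>L^2</math> norm on the unit interval, approximated by a Riemann sum on a uniform grid. The model is a deliberately trivial one-parameter operator <math>\mathcal{G}_\theta(a) = \theta a</math> with a hand-derived gradient, chosen only so that the loop is self-contained; a real neural operator would have many parameters and would rely on automatic differentiation. All function names, the toy target operator <math>u = 2a</math>, and the step size are assumptions of this example.

```python
import numpy as np

def l2_norm_sq(f, dx):
    """Riemann-sum approximation of the squared L2 norm on a uniform grid."""
    return np.sum(f**2, axis=-1) * dx

def data_loss(theta, a, u, dx):
    """Sum over samples of ||u_i - G_theta(a_i)||^2 for the toy model G_theta(a) = theta * a."""
    return np.sum(l2_norm_sq(u - theta * a, dx))

def data_loss_grad(theta, a, u, dx):
    """Analytic derivative of data_loss with respect to theta."""
    return -2.0 * np.sum((u - theta * a) * a) * dx

def train(a, u, dx, lr=0.1, steps=200):
    """Plain gradient descent starting from theta = 0."""
    theta = 0.0
    for _ in range(steps):
        theta -= lr * data_loss_grad(theta, a, u, dx)
    return theta

n = 64
x = np.arange(n) / n
# Four input functions and the outputs of the (assumed) target operator u = 2a.
a = np.sin(2.0 * np.pi * np.arange(1, 5)[:, None] * x[None, :])
u = 2.0 * a
theta = train(a, u, dx=1.0 / n)
print(theta, data_loss(theta, a, u, dx=1.0 / n))
```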
 
Another training paradigm is associated with physics-informed machine learning. In particular, [[physics-informed neural networks]] (PINNs) use complete physical laws to fit neural networks to solutions of PDEs. Extensions of this paradigm to operator learning are broadly called physics-informed neural operators (PINO),<ref name="PINO">{{cite arXiv |last1=Li |first1=Zongyi |last2=Zheng |first2=Hongkai |last3=Kovachki |first3=Nikola |last4=Jin |first4=David |last5=Chen |first5=Haoxuan |last6=Liu |first6=Burigede |last7=Azizzadenesheli |first7=Kamyar |last8=Anandkumar |first8=Anima |title=Physics-Informed Neural Operator for Learning Partial Differential Equations |date=2021 |class=cs.LG |eprint=2111.03794 }}</ref> where loss functions can include full physics equations or partial physical laws. As opposed to standard PINNs, the PINO paradigm incorporates a data loss (as defined above) in addition to the physics loss <math>\mathcal{L}_{PDE}(a, \mathcal{G}_\theta (a))</math>. The physics loss <math>\mathcal{L}_{PDE}(a, \mathcal{G}_\theta (a))</math> quantifies how much the predicted solution <math>\mathcal{G}_\theta (a)</math> violates the PDE for the input <math>a</math>.
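
A physics loss of this kind can be illustrated with a simple residual computation. The sketch below assumes, purely for the sake of example, the one-dimensional Poisson problem <math>-u'' = a</math> on <math>[0, 1]</math>, a central finite-difference approximation of the second derivative, and an equal weighting of the two loss terms; none of these choices are taken from the cited PINO paper, and the function names are hypothetical.

```python
import numpy as np

def physics_loss(a, u_pred, dx):
    """Squared L2 norm of the residual of -u'' = a at interior grid points."""
    # Central finite-difference approximation of the second derivative.
    u_xx = (u_pred[..., 2:] - 2.0 * u_pred[..., 1:-1] + u_pred[..., :-2]) / dx**2
    residual = -u_xx - a[..., 1:-1]
    return np.sum(residual**2, axis=-1) * dx

def data_loss(u_true, u_pred, dx):
    """Riemann-sum approximation of the squared L2 distance to reference data."""
    return np.sum((u_true - u_pred) ** 2, axis=-1) * dx

def pino_loss(a, u_true, u_pred, dx, weight=1.0):
    """Combined objective: data loss plus weighted physics loss (weight is an assumption)."""
    return data_loss(u_true, u_pred, dx) + weight * physics_loss(a, u_pred, dx)

x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
u_exact = np.sin(np.pi * x)            # solves -u'' = a for the a below
a = np.pi**2 * np.sin(np.pi * x)
print(physics_loss(a, u_exact, dx))            # close to zero
print(physics_loss(a, np.zeros_like(x), dx))   # large for a wrong prediction
```

The physics term needs only the input <math>a</math> and the prediction, so it can be evaluated on inputs for which no reference solution is available.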
 
== References ==