Neural operators

Latest revision as of 14:56, 22 August 2025 (Shayanss); earlier revision as of 14:13, 8 December 2023 (Stevenliuyi).
{{Short description|Machine learning framework}}
'''Neural operators''' are a class of [[deep learning]] architectures designed to learn maps between infinite-dimensional [[function space]]s.<ref name="patel1">{{cite arXiv |last1=Patel |first1=Ravi G. |last2=Desjardins |first2=Olivier |title=Nonlinear integro-differential operator regression with neural networks |date=2018 |class=cs.LG |eprint=1810.08552}}</ref> Neural operators represent an extension of traditional [[artificial neural network]]s, marking a departure from the typical focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn [[Operator (mathematics)|operators]] between function spaces: they take functions as input, and their output functions can be evaluated at any discretization.<ref name="NO journal">{{cite journal |last1=Kovachki |first1=Nikola |last2=Li |first2=Zongyi |last3=Liu |first3=Burigede |last4=Azizzadenesheli |first4=Kamyar |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Learning maps between function spaces |journal=Journal of Machine Learning Research |date=2023 |volume=24 |pages=1–97 |arxiv=2108.08481 |url=https://www.jmlr.org/papers/volume24/21-1524/21-1524.pdf}}</ref><ref name="NO Nature">{{cite journal |last1=Azizzadenesheli |first1=Kamyar |last2=Kovachki |first2=Nikola |last3=Li |first3=Zongyi |last4=Liu-Schiaffini |first4=Miguel |last5=Kossaifi |first5=Jean |last6=Anandkumar |first6=Anima |title=Neural operators for accelerating scientific simulations and design |journal=Nature Reviews Physics |date=2024 |volume=6 |pages=320–328 |arxiv=2309.15325 |url=https://www.nature.com/articles/s42254-024-00712-5}}</ref>

The primary application of neural operators is in learning surrogate maps for the solution operators of [[partial differential equation]]s (PDEs),<ref name="NO journal" /><ref name="NO Nature" /> which are critical tools in modeling the natural environment.<ref name="Evans">{{cite book |author-link=Lawrence C. Evans |first=L. C. |last=Evans |title=Partial Differential Equations |publisher=American Mathematical Society |___location=Providence |year=1998 |isbn=0-8218-0772-2 }}</ref><ref>{{cite press release |title=How AI models are transforming weather forecasting: A showcase of data-driven systems |url=https://phys.org/news/2023-09-ai-weather-showcase-data-driven.html |work=phys.org |publisher=European Centre for Medium-Range Weather Forecasts |date=6 September 2023 }}</ref> Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs<ref>{{cite news |last1=Russ |first1=Dan |last2=Abinader |first2=Sacha |title=Microsoft and Accenture partner to tackle methane emissions with AI technology |url=https://azure.microsoft.com/en-us/blog/microsoft-and-accenture-partner-to-tackle-methane-emissions-with-ai-technology/ |work=Microsoft Azure Blog |date=23 August 2023 }}</ref><ref>{{Citation |last1=Li |first1=Zijie |last2=Meidani |first2=Kazem |last3=Farimani |first3=Amir Barati |title=Transformer for Partial Differential Equations' Operator Learning |date=2023-04-27 |url=http://arxiv.org/abs/2205.13671 |access-date=2025-06-23 |arxiv=2205.13671 }}</ref> compared to existing machine learning methodologies, while being significantly faster than numerical solvers.<ref name="FNO">{{cite arXiv |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Fourier neural operator for parametric partial differential equations |date=2020 |class=cs.LG |eprint=2010.08895 }}</ref><ref>{{cite news |last1=Hao |first1=Karen |title=AI has cracked a key mathematical puzzle for understanding our world |url=https://www.technologyreview.com/2020/10/30/1011435/ai-fourier-neural-network-cracks-navier-stokes-and-partial-differential-equations/ |work=MIT Technology Review |date=30 October 2020 }}</ref><ref>{{cite news |last1=Ananthaswamy |first1=Anil |title=Latest Neural Nets Solve World's Hardest Equations Faster Than Ever Before |url=https://www.quantamagazine.org/latest-neural-nets-solve-worlds-hardest-equations-faster-than-ever-before-20210419/ |work=Quanta Magazine |date=19 April 2021 }}</ref>

Neural operators have also been applied in various scientific and engineering disciplines, such as turbulent flow modeling, computational mechanics, graph-structured data,<ref>{{cite journal |last1=Sharma |first1=Anuj |last2=Singh |first2=Sukhdeep |last3=Ratna |first3=S. |title=Graph Neural Network Operators: a Review |journal=Multimedia Tools and Applications |date=15 August 2023 |volume=83 |issue=8 |pages=23413–23436 |doi=10.1007/s11042-023-16440-4 }}</ref> and the geosciences.<ref>{{cite journal |last1=Wen |first1=Gege |last2=Li |first2=Zongyi |last3=Azizzadenesheli |first3=Kamyar |last4=Anandkumar |first4=Anima |last5=Benson |first5=Sally M. |title=U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow |journal=Advances in Water Resources |date=May 2022 |volume=163 |pages=104180 |doi=10.1016/j.advwatres.2022.104180 |arxiv=2109.03697 |bibcode=2022AdWR..16304180W }}</ref> In particular, they have been applied to learning stress-strain fields in materials, classifying complex data such as spatial transcriptomics, predicting multiphase flow in porous media,<ref>{{cite journal |last1=Choubineh |first1=Abouzar |last2=Chen |first2=Jie |last3=Wood |first3=David A. |last4=Coenen |first4=Frans |last5=Ma |first5=Fei |title=Fourier Neural Operator for Fluid Flow in Small-Shape 2D Simulated Porous Media Dataset |journal=Algorithms |date=2023 |volume=16 |issue=1 |pages=24 |doi=10.3390/a16010024 |doi-access=free }}</ref> climate data downscaling,<ref>Yang, Q., Hernandez-Garcia, A., Harder, P., Ramesh, V., Sattigeri, P., Szwarcman, D., ... & Rolnick, D. (2023). Fourier Neural Operators for Arbitrary Resolution Climate Data Downscaling. arXiv preprint arXiv:2305.14452.</ref> and carbon dioxide migration simulations. Finally, operator learning, which learns maps between function spaces, differs from parallel approaches that learn maps from finite-dimensional spaces to function spaces,<ref name="meshfreeflownet">{{cite book |doi=10.1109/SC41405.2020.00013 |chapter=MESHFREEFLOWNET: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework |title=SC20: International Conference for High Performance Computing, Networking, Storage and Analysis |date=2020 |last1=Jiang |first1=Chiyu "Max" |last2=Esmaeilzadeh |first2=Soheil |last3=Azizzadenesheli |first3=Kamyar |last4=Kashinath |first4=Karthik |last5=Mustafa |first5=Mustafa |last6=Tchelepi |first6=Hamdi A. |last7=Marcus |first7=Philip |author8=Prabhat |last9=Anandkumar |first9=Anima |pages=1–15 |isbn=978-1-7281-9998-6 |url=https://resolver.caltech.edu/CaltechAUTHORS:20200526-153937049 }}</ref><ref name="deeponet">{{cite journal |last1=Lu |first1=Lu |last2=Jin |first2=Pengzhan |last3=Pang |first3=Guofei |last4=Zhang |first4=Zhongqiang |last5=Karniadakis |first5=George Em |title=Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators |journal=Nature Machine Intelligence |date=18 March 2021 |volume=3 |issue=3 |pages=218–229 |doi=10.1038/s42256-021-00302-5 |arxiv=1910.03193 }}</ref> and subsumes these settings as special cases when restricted to a fixed input resolution.

== Operator learning ==
Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, [[Abstract differential equation|one can cast the problem]] of solving partial differential equations as identifying a map between function spaces, such as from an initial condition to a time-evolved state. For other PDEs, the map takes an input coefficient function and outputs a solution function. Operator learning is a [[machine learning]] paradigm for learning such solution operators, which map the input function to the output function.

Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This reduces operator learning to finite-dimensional function learning and has limitations, such as poor generalization to discretizations other than the grid used in training.
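As a concrete illustration of a solution operator that maps an initial condition to a time-evolved state, the following minimal NumPy sketch evaluates the exact solution operator of the one-dimensional heat equation <math>u_t = \nu u_{xx}</math> on the periodic ___domain <math>[0, 1)</math>. The equation, the diffusivity <code>nu</code>, the final time <code>t</code>, and the function name <code>heat_solution_operator</code> are illustrative assumptions rather than part of any cited method. Because the operator is defined on functions rather than on a particular grid, the same map can be applied to samples of the initial condition at any resolution, which is the property a neural operator approximating it aims to retain.

```python
import numpy as np

def heat_solution_operator(a_vals, t, nu=0.01):
    """Map samples of the initial condition a(x) on the uniform grid x_j = j/n of [0, 1)
    to samples of u(x, t), where u solves the periodic heat equation u_t = nu * u_xx."""
    n = a_vals.shape[0]
    k = np.fft.rfftfreq(n, d=1.0 / n)                   # integer wavenumbers 0, 1, ..., n // 2
    decay = np.exp(-nu * (2.0 * np.pi * k) ** 2 * t)    # each Fourier mode decays independently
    return np.fft.irfft(np.fft.rfft(a_vals) * decay, n=n)

# The same operator applied to the same initial condition sampled at two resolutions
t, nu = 0.5, 0.01
for n in (32, 128):
    x = np.arange(n) / n
    u = heat_solution_operator(np.sin(2.0 * np.pi * x), t, nu)
    exact = np.exp(-nu * (2.0 * np.pi) ** 2 * t) * np.sin(2.0 * np.pi * x)
    print(n, np.max(np.abs(u - exact)))   # matches the exact solution up to rounding error
```

On both grids the result matches the exact solution <math>e^{-4\pi^2 \nu t}\sin(2\pi x)</math> up to floating-point rounding. By contrast, a standard neural network with a fixed-size input layer trained on 32-point samples cannot directly accept the 128-point input.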

== Definition and formulation ==
Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating [[linear map]]s and non-linearities. Since neural operators act on and output functions, they have instead been formulated as a sequence of alternating linear [[integral operators]] on function spaces and pointwise non-linearities.<ref name="NO journal" /> Using an architecture analogous to that of finite-dimensional neural networks, similar [[universal approximation theorem]]s have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a [[Compact space|compact]] set.<ref name="NO journal"/>

Neural operators seek to approximate some operator <math>\mathcal{G} : \mathcal{A} \to \mathcal{U}</math> between function spaces <math>\mathcal{A}</math> and <math>\mathcal{U}</math> by building a parametric map <math>\mathcal{G}_\phi : \mathcal{A} \to \mathcal{U}</math>. Such parametric maps <math>\mathcal{G}_\phi</math> can generally be defined in the form
<math>\mathcal{G}_\phi := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},</math>
where <math>\mathcal{P}</math> and <math>\mathcal{Q}</math> are the lifting operator (which lifts the codomain of the input function to a higher-dimensional space) and the projection operator (which projects the codomain of the intermediate function to the output dimension), respectively. These operators act pointwise on functions and are typically parametrized as [[multilayer perceptron]]s.
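The pointwise action of the lifting and projection operators can be illustrated with a minimal NumPy sketch, assuming one-hidden-layer perceptrons with tanh activations and random, untrained weights. The names <code>init_mlp</code>, <code>pointwise_mlp</code> and <code>lift_then_project</code> and the channel sizes are hypothetical choices. Since each operator transforms the value <math>f(x)</math> at every point independently, it applies unchanged to a function sampled on a grid of any size.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a one-hidden-layer perceptron (illustrative, untrained)."""
    return (rng.normal(size=(d_hidden, d_in)), np.zeros(d_hidden),
            rng.normal(size=(d_out, d_hidden)), np.zeros(d_out))

def pointwise_mlp(params, f_vals):
    """Apply the same MLP to the value f(x_j) at every sample point x_j.

    f_vals has shape (n, d_in) and the result has shape (n, d_out). No information
    is exchanged between points, so n can be any grid size.
    """
    W1, b1, W2, b2 = params
    return np.tanh(f_vals @ W1.T + b1) @ W2.T + b2

rng = np.random.default_rng(0)
P = init_mlp(rng, d_in=1, d_hidden=16, d_out=8)   # lifting: 1 input channel -> 8 channels
Q = init_mlp(rng, d_in=8, d_hidden=16, d_out=1)   # projection: 8 channels -> 1 output channel

def lift_then_project(n):
    """Lift and immediately project a sampled input (the T hidden layers are omitted)."""
    x = np.arange(n) / n
    a = np.sin(2.0 * np.pi * x)[:, None]   # input function a sampled at n points, shape (n, 1)
    v = pointwise_mlp(P, a)                # lifted representation, shape (n, 8)
    return v, pointwise_mlp(Q, v)          # projected output, shape (n, 1)

v_coarse, u_coarse = lift_then_project(64)
v_fine, u_fine = lift_then_project(256)
# Every 4th point of the 256-point grid coincides with the 64-point grid,
# and the pointwise maps give the same values there.
```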
In the expression for <math>\mathcal{G}_\phi</math>, <math>\sigma</math> is a pointwise nonlinearity, such as a [[Rectifier (neural networks)|rectified linear unit (ReLU)]] or a [[Rectifier (neural networks)#Other non-linear variants|Gaussian error linear unit (GELU)]]. Each layer <math>t=1, \dots, T</math> has a respective local operator <math>W_t</math> (usually parameterized by a pointwise neural network), a kernel integral operator <math>\mathcal{K}_t</math>, and a bias function <math>b_t</math>. Given some intermediate functional representation <math>v_t</math> with ___domain <math>D</math> in the <math>t</math>-th hidden layer, a kernel integral operator <math>\mathcal{K}_\phi</math> is defined as
<math>(\mathcal{K}_\phi v_t)(x) := \int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)\,dy,</math>
where the kernel <math>\kappa_\phi</math> is parametrized by <math>\phi</math>. In practice, <math>v_t</math> is often available only at a finite set of points <math>\{y_j\}_{j=1}^n \subset D</math>. Approximating the integral by a quadrature rule such as a Riemann sum, with <math>\Delta_{y_j}</math> the quadrature weight (sub-area volume) associated with <math>y_j</math>, gives the layer update
<math>v_{t+1}(x) \approx \sigma\left(\sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j} + W_t(v_t(x)) + b_t(x)\right).</math>
The above approximation, along with parametrizing <math>\kappa_\phi</math> as an implicit neural network, results in the graph neural operator (GNO).<ref name="Graph NO">{{cite arXiv |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Graph kernel network for partial differential equations |date=2020 |class=cs.LG |eprint=2003.03485 }}</ref>

There have been various parameterizations of neural operators for different applications.<ref name="FNO" /><ref name="Graph NO" /> These typically differ in their parameterization of <math>\kappa</math>. A widely used instantiation is the Fourier neural operator (FNO).
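The quadrature-based layer update described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the GNO implementation: the ___domain is <math>D = [0, 1)</math> with a uniform grid and equal weights <math>\Delta_{y_j} = 1/n</math>; the kernel is a hypothetical fixed form <math>\kappa_\phi(x, y) = \exp(-(x-y)^2 / (2\ell^2))\,A</math> that depends only on <math>x</math> and <math>y</math> (a GNO would instead parametrize <math>\kappa_\phi</math> with a neural network); <math>W_t</math> is a matrix applied pointwise; <math>b_t</math> is constant in <math>x</math>; and <math>\sigma</math> is the tanh approximation of GELU. The output is computed at the sample points themselves.

```python
import numpy as np

def gelu(z):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def kernel_integral_layer(y, v, A, W, b, length_scale=0.1):
    """v_{t+1}(y_i) = sigma(sum_j kappa(y_i, y_j) v(y_j) Delta_j + W v(y_i) + b).

    y: (n,) uniform sample points in D = [0, 1); v: (n, d_in) values of v_t at y.
    Hypothetical kernel: kappa(x, y) = exp(-(x - y)^2 / (2 l^2)) * A, A of shape (d_out, d_in).
    Returns v_{t+1} at the same points y, shape (n, d_out).
    """
    n = y.shape[0]
    delta = np.full(n, 1.0 / n)                                             # Riemann-sum weights
    s = np.exp(-(y[:, None] - y[None, :]) ** 2 / (2.0 * length_scale**2))  # scalar part of kappa, (n, n)
    integral = (s * delta[None, :]) @ (v @ A.T)   # sum_j kappa(y_i, y_j) v(y_j) Delta_j
    local = v @ W.T                               # local operator W_t applied pointwise
    return gelu(integral + local + b)             # bias b_t assumed constant in x

rng = np.random.default_rng(0)
d = 3
A, W, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)

def layer_on_grid(n):
    """Sample a fixed 3-channel input function on n points and apply the layer."""
    y = np.arange(n) / n
    v = np.stack([np.sin(2 * np.pi * y), np.cos(2 * np.pi * y), y * (1 - y)], axis=1)
    return y, kernel_integral_layer(y, v, A, W, b)

y1, out1 = layer_on_grid(100)
y2, out2 = layer_on_grid(200)
print(np.max(np.abs(out1[50] - out2[100])))   # x = 0.5 lies on both grids; outputs nearly coincide
```

Refining the grid changes only the quadrature, so the layer output at a shared point converges as <math>n</math> grows instead of depending on a fixed input size.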
FNO takes <math>\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x-y)</math> and, by applying the [[convolution theorem]], arrives at the following parameterization of the kernel integral operator:
<math>(\mathcal{K}_\phi v_t)(x) = \mathcal{F}^{-1} (R_\phi \cdot (\mathcal{F}v_t))(x),</math>
where <math>\mathcal{F}</math> and <math>\mathcal{F}^{-1}</math> denote the Fourier transform and its inverse, and <math>R_\phi</math> is the Fourier transform of <math>\kappa_\phi</math>, parametrized directly in Fourier space and typically truncated to a finite number of low-frequency modes.

Given a dataset of <math>N</math> input–output pairs <math>\{(a_i, u_i)\}_{i=1}^N</math>, neural operators are typically trained by minimizing (a discretization of) a loss such as
<math>\mathcal{L}_\mathcal{U}(\{(a_i, u_i)\}_{i=1}^N) := \sum_{i=1}^N \|u_i - \mathcal{G}_\phi (a_i) \|_\mathcal{U}^2</math>, where <math>\|\cdot \|_\mathcal{U}</math> is a norm on the output function space <math>\mathcal{U}</math>. Neural operators can be trained directly using [[backpropagation]] and [[gradient descent]]-based methods.

Another training paradigm is associated with physics-informed machine learning. In particular, [[physics-informed neural networks]] (PINNs) use the governing physical laws, such as the PDE itself, to fit neural networks to solutions of PDEs. Extensions of this paradigm to operator learning are broadly called physics-informed neural operators (PINO),<ref name="PINO">{{cite arXiv |last1=Li |first1=Zongyi |last2=Zheng |first2=Hongkai |last3=Kovachki |first3=Nikola |last4=Jin |first4=David |last5=Chen |first5=Haoxuan |last6=Liu |first6=Burigede |last7=Azizzadenesheli |first7=Kamyar |last8=Anandkumar |first8=Anima |title=Physics-Informed Neural Operator for Learning Partial Differential Equations |date=2021 |class=cs.LG |eprint=2111.03794 }}</ref> where loss functions can include full physics equations or partial physical laws. As opposed to standard PINNs, the PINO paradigm incorporates a data loss (as defined above) in addition to the physics loss <math>\mathcal{L}_{PDE}(a, \mathcal{G}_\phi (a))</math>. The physics loss quantifies how much the predicted solution <math>\mathcal{G}_\phi (a)</math> violates the PDE for the input <math>a</math>.
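The Fourier parameterization of the kernel integral operator and the data loss can be sketched together in NumPy, assuming a one-dimensional periodic ___domain <math>[0, 1)</math> sampled on a uniform grid, with the discrete FFT standing in for <math>\mathcal{F}</math> and the <math>L^2</math> norm approximated by a Riemann sum. The function names and the choice of weights are hypothetical. Here <math>R_\phi</math> is set by hand to the Fourier multiplier of the periodic heat equation <math>u_t = \nu u_{xx}</math>, whose solution operator has exactly this convolutional form, so the loss against the exact solutions is zero up to rounding at each tested resolution. In an actual FNO, <math>R_\phi</math> would be learned and combined with <math>W_t</math>, <math>b_t</math> and <math>\sigma</math> as in the layer definition.

```python
import numpy as np

def spectral_conv_1d(v, R):
    """Fourier kernel integral (K v)(x) = F^{-1}(R . (F v))(x) on a uniform periodic grid.

    v: (n, d_in) real samples of v_t; R: (modes, d_out, d_in) complex weights, one
    channel-mixing matrix per retained low-frequency mode (higher modes are zeroed).
    """
    n = v.shape[0]
    modes, d_out, _ = R.shape
    v_hat = np.fft.rfft(v, axis=0)                                   # (n // 2 + 1, d_in)
    out_hat = np.zeros((n // 2 + 1, d_out), dtype=complex)
    out_hat[:modes] = np.einsum("koi,ki->ko", R, v_hat[:modes])      # mode-wise multiplication
    return np.fft.irfft(out_hat, n=n, axis=0)                        # back to physical space

def data_loss(u_true, u_pred):
    """sum_i ||u_i - G(a_i)||^2 with the L^2(0, 1) norm approximated by a Riemann sum.

    u_true, u_pred: (N, n, d) batches of N output functions sampled at n points.
    """
    n = u_true.shape[1]
    return float(np.sum((u_true - u_pred) ** 2) / n)

# Assumed example: hand-set R equal to the heat-equation multiplier exp(-nu (2 pi k)^2 t)
nu, t, modes = 0.01, 0.5, 16
k = np.arange(modes)
R = np.exp(-nu * (2 * np.pi * k) ** 2 * t).astype(complex).reshape(modes, 1, 1)

for n in (64, 256):
    x = np.arange(n) / n
    a = np.stack([np.sin(2 * np.pi * x), np.cos(6 * np.pi * x)])[..., None]      # (N=2, n, 1)
    u_true = np.stack([np.exp(-nu * 4 * np.pi**2 * t) * np.sin(2 * np.pi * x),
                       np.exp(-nu * 36 * np.pi**2 * t) * np.cos(6 * np.pi * x)])[..., None]
    u_pred = np.stack([spectral_conv_1d(a_i, R) for a_i in a])
    print(n, data_loss(u_true, u_pred))   # close to zero at both resolutions
```

The same weight tensor <code>R</code> is used on both grids: it acts on Fourier modes rather than on grid points, which is what allows an FNO trained at one resolution to be evaluated at another.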

== See also ==
* [[Neural network (machine learning)|Neural network]]
* [[Physics-informed neural networks]]
* [[Neural field]]

== References ==