'''Neural operators''' are a class of [[Deep learning|deep learning]] architectures designed to learn maps between infinite-dimensional [[Function space|function spaces]]. Neural operators represent an extension of traditional [[Artificial neural network|artificial neural networks]], marking a departure from the typical focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn [[Operator (mathematics)|operators]] between function spaces; they can receive input functions, and the output function can be evaluated at any discretization.<ref name="NO journal">{{cite journal |last1=Kovachki |first1=Nikola |last2=Li |first2=Zongyi |last3=Liu |first3=Burigede |last4=Azizzadenesheli |first4=Kamyar |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Learning maps between function spaces |journal=Journal of Machine Learning Research |volume=24 |pages=1–97 |url=https://www.jmlr.org/papers/volume24/21-1524/21-1524.pdf}}</ref>
 
The primary application of neural operators is in learning surrogate maps for the solution operators of [[Partial differential equation|partial differential equations]] (PDEs),<ref name="NO journal" /> which are critical tools in modeling the natural environment.<ref name="Evans">{{cite book |author-link=Lawrence C. Evans |first=L. C. |last=Evans |title=Partial Differential Equations |publisher=American Mathematical Society |___location=Providence |year=1998 |isbn=0-8218-0772-2}}</ref> Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs compared to existing machine learning methodologies while being significantly faster than numerical solvers.<ref name="FNO">{{cite journal |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Fourier neural operator for parametric partial differential equations |journal=arXiv preprint arXiv:2010.08895 |date=2020 |url=https://arxiv.org/pdf/2010.08895.pdf}}</ref> The operator learning paradigm allows learning maps between function spaces, and is different from parallel ideas of learning maps from finite-dimensional spaces to function spaces,<ref name="meshfreeflownet">{{cite journal |vauthors=((Esmaeilzadeh, S., Azizzadenesheli, K., Kashinath, K., Mustafa, M., Tchelepi, H. A., Marcus, P., Prabhat, M., Anandkumar, A., others)) |title=Meshfreeflownet: A physics-constrained deep continuous space-time super-resolution framework |pages=1–15 |publisher=IEEE |date=19 October 2020}}</ref><ref name="deeponet">{{cite journal |vauthors=((Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G. E.)) |title=Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators |journal=Nature Machine Intelligence |volume=3 |issue=3 |pages=218–229 |publisher=Nature Publishing Group UK London |date=19 October 2021}}</ref> and subsumes these settings when limited to fixed input resolution.
 
== Definition and formulation ==
Architecturally, neural operators are similar to feed-forward neural networks in the sense that they consist of alternating [[Linear map|linear maps]] and non-linearities. Since neural operators act on and output functions, they are instead formulated as a sequence of alternating linear [[Integral operators|integral operators]] on function spaces and point-wise non-linearities.<ref name="NO journal" /> Using an architecture analogous to finite-dimensional neural networks, similar [[Universal approximation theorem|universal approximation theorems]] have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a [[Compact space|compact]] set.<ref name="NO journal" />
 
Neural operators seek to approximate some operator <math>\mathcal{G} : \mathcal{A} \to \mathcal{U}</math> between function spaces <math>\mathcal{A}</math> and <math>\mathcal{U}</math> by building a parametric map <math>\mathcal{G}_\phi : \mathcal{A} \to \mathcal{U}</math>. Such parametric maps <math>\mathcal{G}_\phi</math> can generally be defined in the form
 
<math>\mathcal{G}_\phi := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},</math>
 
where <math>\mathcal{P}, \mathcal{Q}</math> are the lifting (lifting the codomain of the input function to a higher-dimensional space) and projection (projecting the codomain of the intermediate function to the output codomain) operators, respectively. These operators act pointwise on functions and are typically parametrized as [[Multilayer perceptron|multilayer perceptrons]]. <math>\sigma</math> is a pointwise nonlinearity, such as a [[Rectifier (neural networks)|rectified linear unit (ReLU)]] or a [[Rectifier (neural networks)#Other_non-linear_variants|Gaussian error linear unit (GeLU)]]. Each layer <math>t=1, \dots, T</math> has a respective local operator <math>W_t</math> (usually parameterized by a pointwise neural network), a kernel integral operator <math>\mathcal{K}_t</math>, and a bias function <math>b_t</math>. Given some intermediate functional representation <math>v_t</math> with ___domain <math>D</math> in the <math>t</math>-th hidden layer, a kernel integral operator <math>\mathcal{K}_\phi</math> is defined as
 
<math>(\mathcal{K}_\phi v_t)(x) := \int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)dy, </math>
 
where the kernel <math>\kappa_\phi</math> is a learnable implicit neural network, parametrized by <math>\phi</math>.
 
In practice, one is often given the input function to the neural operator at a specific discretization. For instance, consider the setting where one is given the evaluation of <math>v_t</math> at <math>n</math> points <math>\{y_j\}_{j=1}^n</math>. Borrowing from [[Nyström method|Nyström integral approximation methods]] such as [[Riemann sum|Riemann sum integration]] and [[Gaussian quadrature|Gaussian quadrature]], the above integral operation can be computed as follows:
 
<math>\int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)dy \approx \sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j}, </math>
 
where <math>\Delta_{y_j}</math> is the sub-area volume or quadrature weight associated with the point <math>y_j</math>. Thus, a simplified layer can be computed as
 
<math>v_{t+1}(x) \approx \sigma\left(\sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j} + W_t(v_t(x)) + b_t(x)\right).</math>
 
The above approximation, along with parametrizing <math>\kappa_\phi</math> as an implicit neural network, results in the graph neural operator (GNO).<ref name="Graph NO">{{cite journal |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Graph kernel network for partial differential equations |journal=arXiv preprint arXiv:2003.03485 |date=2020 |url=https://arxiv.org/pdf/2003.03485.pdf}}</ref>
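As a rough illustration of the discretized layer above, the following NumPy sketch evaluates one kernel-integral layer on a uniform one-dimensional grid. The small random network standing in for <math>\kappa_\phi</math>, its width, and the choice of <math>\sigma = \tanh</math> are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network standing in for the learnable kernel kappa_phi.
# Input: (x, y, v(x), v(y)) concatenated; output: a scalar kernel value.
W1 = rng.standard_normal((4, 16)) * 0.1
W2 = rng.standard_normal((16, 1)) * 0.1

def kappa(x, y, vx, vy):
    h = np.tanh(np.array([x, y, vx, vy]) @ W1)
    return (h @ W2)[0]

def kernel_layer(v, xs, W_t=0.5, b_t=0.0):
    """One neural-operator layer via a Riemann-sum kernel integral."""
    dy = xs[1] - xs[0]                       # uniform quadrature weight Delta_{y_j}
    out = np.empty_like(v)
    for i, x in enumerate(xs):
        integral = sum(kappa(x, y, v[i], v[j]) * v[j] * dy
                       for j, y in enumerate(xs))
        out[i] = np.tanh(integral + W_t * v[i] + b_t)  # sigma = tanh (illustrative)
    return out

xs = np.linspace(0.0, 1.0, 32)               # discretization of the ___domain D
v0 = np.sin(2 * np.pi * xs)                  # input function sampled on the grid
v1 = kernel_layer(v0, xs)                    # next functional representation
```

Note that the output grid need not match the input grid: `kernel_layer` can evaluate the integral at any query point <math>x</math>, which is what makes the layer discretization-independent.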
 
There have been various parameterizations of neural operators for different applications.<ref name="FNO" /><ref name="Graph NO" /> These typically differ in their parameterization of <math>\kappa_\phi</math>. The most popular instantiation is the Fourier neural operator (FNO). FNO takes <math>\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x-y)</math> and, by applying the [[Convolution theorem|convolution theorem]], arrives at the following parameterization of the kernel integral operator:
 
<math>(\mathcal{K}_\phi v_t)(x) = \mathcal{F}^{-1} (R_\phi \cdot (\mathcal{F}v_t))(x), </math>
 
where <math>\mathcal{F}</math> represents the Fourier transform and <math>R_\phi</math> represents the Fourier transform of some periodic function <math>\kappa_\phi</math>. That is, FNO parameterizes the kernel integration directly in Fourier space, using a prescribed number of Fourier modes. When the grid at which the input function is presented is uniform, the Fourier transform can be approximated using the [[Discrete Fourier transform|discrete Fourier transform (DFT)]] with frequencies below some specified threshold. The discrete Fourier transform can be computed using a [[Fast Fourier transform|fast Fourier transform (FFT)]] implementation.
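A minimal sketch of such a Fourier layer, assuming a uniform one-dimensional grid, a real FFT, and an illustrative set of complex spectral weights standing in for <math>R_\phi</math>:

```python
import numpy as np

rng = np.random.default_rng(0)
n, modes = 64, 8                      # grid size; number of retained Fourier modes

# Learnable spectral weights R_phi: one complex multiplier per kept mode
# (illustrative random initialization).
R = rng.standard_normal(modes) + 1j * rng.standard_normal(modes)

def fourier_layer(v):
    """(K_phi v)(x) = F^{-1}(R . F v)(x), truncated to low frequencies."""
    v_hat = np.fft.rfft(v)                 # DFT of the sampled function
    out_hat = np.zeros_like(v_hat)
    out_hat[:modes] = R * v_hat[:modes]    # multiply kept modes, zero the rest
    return np.fft.irfft(out_hat, n=n)      # back to physical space

xs = np.linspace(0.0, 1.0, n, endpoint=False)
v = np.cos(2 * np.pi * xs)                 # input function sampled on the grid
u = fourier_layer(v)
```

Because the learned weights live in Fourier space rather than on the grid, the same `R` can be reused at a finer or coarser uniform discretization by padding or truncating the spectrum.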
 
== Training ==
Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained with a loss given by some [[Lp norm]] or [[Sobolev norm]]. In particular, for a dataset <math>\{(a_i, u_i)\}_{i=1}^N</math> of size <math>N</math>, neural operators minimize (a discretization of)
 
<math>\mathcal{L}_\mathcal{U}(\{(a_i, u_i)\}_{i=1}^N) := \sum_{i=1}^N \|u_i - \mathcal{G}_\phi (a_i) \|_\mathcal{U}^2</math>,
 
where <math>\|\cdot \|_\mathcal{U}</math> is a norm on the output function space <math>\mathcal{U}</math>. Neural operators can be trained directly using [[Backpropagation|backpropagation]] and [[Gradient descent|gradient descent]]-based methods.
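A toy sketch of this training setup, assuming a uniform grid discretization of the squared norm and a deliberately simple one-parameter surrogate map in place of a full neural operator:

```python
import numpy as np

rng = np.random.default_rng(0)
n_grid, n_samples = 32, 10
dx = 1.0 / n_grid                     # uniform quadrature weight on [0, 1]

def l2_norm_sq(f):
    """Discretized squared L2 norm of a function sampled on the grid."""
    return np.sum(f ** 2) * dx

# Toy dataset: input functions a_i and target outputs u_i sampled on the grid.
a = rng.standard_normal((n_samples, n_grid))
u = 2.0 * a                           # ground-truth operator: pointwise doubling

def surrogate(a_i, phi):
    return phi * a_i                  # one-parameter stand-in for a neural operator

def empirical_loss(phi):
    return sum(l2_norm_sq(u_i - surrogate(a_i, phi))
               for a_i, u_i in zip(a, u))

# A few steps of gradient descent on the parameter, using the analytic gradient
# of the discretized loss.
phi, lr = 0.0, 0.01
for _ in range(200):
    grad = sum(-2.0 * np.sum((u_i - phi * a_i) * a_i) * dx
               for a_i, u_i in zip(a, u))
    phi -= lr * grad
```

In practice the scalar parameter is replaced by the full set of operator weights, and the analytic gradient by automatic differentiation, but the discretized-norm loss has the same form.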
 
Another training paradigm is associated with physics-informed machine learning. In particular, [[Physics-informed neural networks|physics-informed neural networks]] (PINNs) use complete physics laws to fit neural networks to solutions of PDEs. Extensions of this paradigm to operator learning are broadly called physics-informed neural operators (PINO),<ref name="PINO">{{cite journal |last1=Li |first1=Zongyi |last2=Zheng |first2=Hongkai |last3=Kovachki |first3=Nikola |last4=Jin |first4=David |last5=Chen |first5=Haoxuan |last6=Liu |first6=Burigede |last7=Azizzadenesheli |first7=Kamyar |last8=Anandkumar |first8=Anima |title=Physics-Informed Neural Operator for Learning Partial Differential Equations |journal=arXiv preprint arXiv:2111.03794 |date=2021 |url=https://arxiv.org/abs/2111.03794}}</ref> where loss functions can include full physics equations or partial physical laws. As opposed to standard PINNs, the PINO paradigm incorporates a data loss (as defined above) in addition to the physics loss <math>\mathcal{L}_{PDE}(a, \mathcal{G}_\phi (a))</math>, which quantifies how much the predicted solution <math>\mathcal{G}_\phi (a)</math> violates the PDE for the input <math>a</math>.
 
== References ==