<math>\mathcal{G}_\theta := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},</math>
 
where <math>\mathcal{P}, \mathcal{Q}</math> are the lifting (lifting the codomain of the input function to a higher-dimensional space) and projection (projecting the codomain of the intermediate function to the output codomain) operators, respectively. These operators act pointwise on functions and are typically parametrized as a [[Multilayer perceptron|multilayer perceptron]]. <math>\sigma</math> is a pointwise nonlinearity, such as a [[Rectifier (neural networks)|rectified linear unit (ReLU)]] or a [[Rectifier (neural networks)#Other_non-linear_variants|Gaussian error linear unit (GeLU)]]. Each layer <math>i=1, \dots, T</math> has a respective local operator <math>W_i</math> (usually parameterized by a pointwise neural network) and a bias function <math>b_i</math>. Given some intermediate functional representation <math>v_t</math> with ___domain <math>D</math> in a hidden layer, a kernel integral operator <math>\mathcal{K}_\phi</math> is defined as
 
<math>(\mathcal{K}_\phi v_t)(x) = \int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)dy, </math>
 
where the integral kernel <math>\kappa_\phi</math> can be instantiated in many ways; it is commonly parametrized as a learnable implicit neural network with parameters <math>\phi</math>.
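
As an illustration, the composition above can be sketched in a few lines of PyTorch. This is a minimal sketch, not the interface of any particular library; the class name, the use of <code>nn.Linear</code> for <math>\mathcal{P}</math>, <math>\mathcal{Q}</math>, and <math>W_i</math>, and the folding of the bias <math>b_i</math> into the linear layers are illustrative assumptions:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class NeuralOperator(nn.Module):
    """Sketch of G_theta = Q o sigma(W_T + K_T + b_T) o ... o sigma(W_1 + K_1 + b_1) o P."""

    def __init__(self, in_channels, hidden_channels, out_channels, kernel_layers):
        super().__init__()
        # P: pointwise lifting of the input codomain to a higher-dimensional space
        self.lifting = nn.Linear(in_channels, hidden_channels)
        # the kernel integral operators K_1, ..., K_T (e.g., the layer sketched below)
        self.kernel_layers = nn.ModuleList(kernel_layers)
        # W_i: pointwise local operators; the bias b_i is folded into each nn.Linear
        self.local_ops = nn.ModuleList(
            [nn.Linear(hidden_channels, hidden_channels) for _ in kernel_layers]
        )
        # Q: pointwise projection to the output codomain
        self.projection = nn.Linear(hidden_channels, out_channels)

    def forward(self, v):
        # v: (n_points, in_channels), the input function sampled on the ___domain D
        v = self.lifting(v)
        for K, W in zip(self.kernel_layers, self.local_ops):
            v = torch.relu(W(v) + K(v))  # sigma(W_i v_t + K_i v_t + b_i)
        return self.projection(v)
</syntaxhighlight>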
 
In practice, the input function is often available only at a certain resolution for each data point. Consider the setting where <math>v_t</math> is evaluated at <math>n</math> points <math>\{y_j\}_{j=1}^n</math>. Borrowing from [[Nyström method|Nyström integral approximation methods]] such as [[Riemann sum|Riemann sum integration]] and [[Gaussian quadrature]], the above integral operation can be approximated as follows:
 
<math>\int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)dy\approx \sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j}, </math>
where <math>\Delta_{y_j}</math> is the sub-area volume or quadrature weight associated with the point <math>y_j</math>. A simplified layer can therefore be computed as
 
<math>v_{t+1}(x) \approx \sigma\left(\sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j} + W_t(v_t(x)) + b_t(x)\right).</math>
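
The discretized kernel integral can be sketched as follows. This is an illustrative implementation under stated assumptions: the discretization (points and quadrature weights) is fixed at construction, and <math>\kappa_\phi</math> is a small MLP producing a matrix per point pair; none of this is the API of any particular library:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class KernelIntegralLayer(nn.Module):
    """Nystrom-type approximation of the kernel integral operator:
    (K v)(x_i) ~= sum_j kappa_phi(x_i, y_j, v(x_i), v(y_j)) v(y_j) Delta_{y_j}.
    """

    def __init__(self, points, weights, channels, hidden=64):
        super().__init__()
        # fixed discretization: points y_j of D and quadrature weights Delta_{y_j}
        self.register_buffer("points", points)    # (n, coord_dim)
        self.register_buffer("weights", weights)  # (n,)
        coord_dim = points.shape[-1]
        # kappa_phi: an MLP producing a (channels x channels) matrix per point pair
        self.kappa = nn.Sequential(
            nn.Linear(2 * (coord_dim + channels), hidden),
            nn.GELU(),
            nn.Linear(hidden, channels * channels),
        )

    def forward(self, v):
        # v: (n, channels), values of v_t at the points y_j
        n, c = v.shape
        feats = torch.cat([self.points, v], dim=-1)   # (x, v(x)) for every point
        pairs = torch.cat(
            [feats.unsqueeze(1).expand(n, n, -1),     # (x_i, v(x_i))
             feats.unsqueeze(0).expand(n, n, -1)],    # (y_j, v(y_j))
            dim=-1,
        )
        k = self.kappa(pairs).view(n, n, c, c)        # kappa_phi(x_i, y_j, ...)
        # quadrature sum over j: sum_j k[i, j] v[j] * weights[j]
        return torch.einsum("ijcd,jd,j->ic", k, v, self.weights)
</syntaxhighlight>

For instance, on a uniform grid of <math>n</math> points over <math>[0,1]</math>, the Riemann-sum weights would be <code>weights = torch.full((n,), 1.0 / n)</code>.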
 
Many variants of this architecture have been developed in prior work, some of which are supported in the [https://neuraloperator.github.io/neuraloperator/dev/index.html neural operator library]. The above approximation, together with an implicit neural network parameterization of <math>\kappa_\phi</math>, results in the graph neural operator (GNO).<ref name="Graph NO">{{cite journal |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Graph kernel network for partial differential equations |journal=arXiv preprint arXiv:2003.03485 |date=2020 |url=https://arxiv.org/pdf/2003.03485.pdf}}</ref>
 
The varying parameterizations of neural operators typically differ in their parameterization of <math>\kappa</math>, and various choices have been developed for different applications.<ref name="FNO" /><ref name="Graph NO" /> The most popular instantiation is the Fourier neural operator (FNO). FNO takes <math>\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x-y)</math> and, by applying the [[convolution theorem]], arrives at the following parameterization of the kernel integration:
 
<math>(\mathcal{K}_\phi v_t)(x) = \mathcal{F}^{-1} (R_\phi \cdot (\mathcal{F}v_t))(x), </math>
 
where <math>\mathcal{F}</math> represents the Fourier transform and <math>R_\phi</math> represents the Fourier transform of some periodic function <math>\kappa</math>. That is, FNO parameterizes the kernel integration directly in Fourier space, using a fixed number of Fourier modes. When the input function is given on a uniform grid, the Fourier transform can be approximated by the [[Discrete Fourier transform|discrete Fourier transform (DFT)]], truncated to frequencies below some specified threshold. The DFT can be computed with a [[Fast Fourier transform|fast Fourier transform (FFT)]] implementation, making the FNO among the fastest and most sample-efficient neural operator architectures.
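
In code, this Fourier layer is often implemented as a "spectral convolution". The following one-dimensional sketch is in the spirit of published FNO implementations, though the class name, initialization scale, and use of the real FFT are illustrative choices rather than a canonical interface:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Fourier layer (K v)(x) = F^{-1}(R_phi . (F v))(x), keeping `modes` frequencies."""

    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes  # number of retained Fourier modes (must be <= n // 2 + 1)
        # R_phi: learnable complex weights applied to the retained modes
        scale = 1.0 / (channels * channels)
        self.weights = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat)
        )

    def forward(self, v):
        # v: (batch, channels, n), function values on a uniform grid
        v_hat = torch.fft.rfft(v)                      # F v via the FFT
        out_hat = torch.zeros_like(v_hat)
        # multiply the lowest `modes` frequencies by R_phi; higher ones are truncated
        out_hat[..., : self.modes] = torch.einsum(
            "bim,iom->bom", v_hat[..., : self.modes], self.weights
        )
        return torch.fft.irfft(out_hat, n=v.size(-1))  # F^{-1}, back to real space
</syntaxhighlight>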
 
== Training ==
Training a neural operator amounts to minimizing, over the parameters <math>\theta</math>, the discrepancy between predicted and true output functions on a dataset of input–output pairs <math>\{(a_i, u_i)\}_{i=1}^N</math>, for instance the empirical risk

<math>\frac{1}{N}\sum_{i=1}^N \|u_i - \mathcal{G}_\theta(a_i)\|_\mathcal{U}</math>
 
in some norm <math>\|\cdot \|_\mathcal{U}.</math> Neural operators can be trained directly using [[Backpropagation|backpropagation]] and [[Gradient descent|gradient descent]]-based methods.
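
For example, a single optimization step might look as follows. This is a sketch only: <code>model</code> stands for any neural operator, and the mean squared error over grid points is used as a simple discretization of the norm:

<syntaxhighlight lang="python">
import torch

def train_step(model, optimizer, a_batch, u_batch):
    """One gradient step on the empirical risk (illustrative sketch)."""
    optimizer.zero_grad()
    u_pred = model(a_batch)                 # G_theta(a)
    # mean squared error over grid points: a discretization of ||u - G_theta(a)||
    loss = torch.mean((u_pred - u_batch) ** 2)
    loss.backward()                         # backpropagation through the operator
    optimizer.step()                        # gradient-descent update
    return loss.item()
</syntaxhighlight>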
 
In modeling natural phenomena, the governing physical laws are often expressed as partial differential equations (PDEs).<ref name="Evans">{{cite book |author-link=Lawrence C. Evans |first=L. C. |last=Evans |title=Partial Differential Equations |publisher=American Mathematical Society |___location=Providence |year=1998 |isbn=0-8218-0772-2}}</ref> Based on this idea, [[Physics-informed neural networks|physics-informed neural networks]] utilize complete physics laws to fit neural networks to solutions of PDEs. The general extension of this idea to operator learning is the physics-informed neural operator (PINO) paradigm,<ref name="PINO">{{cite journal |last1=Li |first1=Zongyi |last2=Zheng |first2=Hongkai |last3=Kovachki |first3=Nikola |last4=Jin |first4=David |last5=Chen |first5=Haoxuan |last6=Liu |first6=Burigede |last7=Azizzadenesheli |first7=Kamyar |last8=Anandkumar |first8=Anima |title=Physics-Informed Neural Operator for Learning Partial Differential Equations |journal=arXiv preprint arXiv:2111.03794 |date=2021 |url=https://arxiv.org/abs/2111.03794}}</ref> where supervision can also be channeled through physics equations, allowing learning from partially available physics. PINO is mainly a supervised learning setting suitable for cases where partial data or partial physics is available. In short, in addition to the data loss mentioned above, PINO uses a physics loss <math>\mathcal{L}_{PDE}(a, \mathcal{G}_\theta (a))</math> for further training; this loss quantifies how much the predicted solution <math>\mathcal{G}_\theta (a)</math> violates the governing PDE for the input <math>a</math>.
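
Schematically, the PINO objective combines the two losses, as in the sketch below. The function <code>pde_residual</code>, which evaluates how strongly a candidate solution violates the governing PDE (e.g., via finite differences or automatic differentiation), is an assumed user-supplied ingredient, and the weighting scheme is illustrative:

<syntaxhighlight lang="python">
import torch

def pino_loss(model, a, u_true, pde_residual, w_data=1.0, w_pde=1.0):
    """Weighted sum of the data loss and the physics loss L_PDE(a, G_theta(a))."""
    u_pred = model(a)
    data_loss = torch.mean((u_pred - u_true) ** 2)           # supervised data term
    physics_loss = torch.mean(pde_residual(a, u_pred) ** 2)  # PDE violation term
    return w_data * data_loss + w_pde * physics_loss
</syntaxhighlight>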
 
== References ==