<math>\mathcal{G}_\theta := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},</math>
 
where <math>\mathcal{P}, \mathcal{Q}</math> are the lifting (lifting the codomain of the input function to a higher-dimensional space) and projection (projecting the codomain of the intermediate function to the output codomain) operators, respectively. These operators act pointwise on functions and are typically parametrized as a [[Multilayer perceptron|multilayer perceptron]]. <math>\sigma</math> is a pointwise nonlinearity, such as a [[Rectifier (neural networks)|rectified linear unit (ReLU)]] or a [[Rectifier (neural networks)#Other_non-linear_variants|Gaussian error linear unit (GeLU)]]. Each layer <math>i=1, \dots, T</math> has a respective local operator <math>W_i</math> (usually parameterized by a pointwise neural network) and a bias function <math>b_i</math>. Given some intermediate functional representation <math>v_t</math> with ___domain <math>D</math> in a hidden layer, a kernel integral operator <math>\mathcal{K}_\phi</math> is defined as
 
<math>(\mathcal{K}_\phi v_t)(x) = \int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)dy, </math>
 
where the integral kernel <math>\kappa_\phi</math> can be instantiated in many ways; it is commonly parametrized as a learnable implicit neural network with parameters <math>\phi</math>.
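
As an illustration, the composition above can be sketched in a few lines of PyTorch. This is a minimal sketch, not the interface of any particular library; the class name, the use of <code>nn.Linear</code> for <math>\mathcal{P}</math>, <math>\mathcal{Q}</math>, and <math>W_i</math>, and the folding of the bias <math>b_i</math> into the linear layers are illustrative assumptions:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class NeuralOperator(nn.Module):
    """Sketch of G_theta = Q o sigma(W_T + K_T + b_T) o ... o sigma(W_1 + K_1 + b_1) o P."""

    def __init__(self, in_channels, hidden_channels, out_channels, kernel_layers):
        super().__init__()
        # P: pointwise lifting of the input codomain to a higher-dimensional space
        self.lifting = nn.Linear(in_channels, hidden_channels)
        # the kernel integral operators K_1, ..., K_T (e.g., the layer sketched below)
        self.kernel_layers = nn.ModuleList(kernel_layers)
        # W_i: pointwise local operators; the bias b_i is folded into each nn.Linear
        self.local_ops = nn.ModuleList(
            [nn.Linear(hidden_channels, hidden_channels) for _ in kernel_layers]
        )
        # Q: pointwise projection to the output codomain
        self.projection = nn.Linear(hidden_channels, out_channels)

    def forward(self, v):
        # v: (n_points, in_channels), the input function sampled on the ___domain D
        v = self.lifting(v)
        for K, W in zip(self.kernel_layers, self.local_ops):
            v = torch.relu(W(v) + K(v))  # sigma(W_i v_t + K_i v_t + b_i)
        return self.projection(v)
</syntaxhighlight>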
 
In practice, the input function is often available only at a certain resolution for each data point. Consider the setting where <math>v_t</math> is evaluated at <math>n</math> points <math>\{y_j\}_{j=1}^n</math>. Borrowing from [[Nyström method|Nyström integral approximation methods]] such as [[Riemann sum|Riemann sum integration]] and [[Gaussian quadrature]], the above integral operation can be approximated as follows:
 
<math>\int_D \kappa_\phi(x, y, v_t(x), v_t(y))v_t(y)dy\approx \sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j}, </math>
where <math>\Delta_{y_j}</math> is the sub-area volume or quadrature weight associated with the point <math>y_j</math>. A simplified layer can therefore be computed as
 
<math>v_{t+1}(x) \approx \sigma\left(\sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j))v_t(y_j)\Delta_{y_j} + W_t(v_t(x)) + b_t(x)\right).</math>
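
The discretized kernel integral can be sketched as follows. This is an illustrative implementation under stated assumptions: the discretization (points and quadrature weights) is fixed at construction, and <math>\kappa_\phi</math> is a small MLP producing a matrix per point pair; none of this is the API of any particular library:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class KernelIntegralLayer(nn.Module):
    """Nystrom-type approximation of the kernel integral operator:
    (K v)(x_i) ~= sum_j kappa_phi(x_i, y_j, v(x_i), v(y_j)) v(y_j) Delta_{y_j}.
    """

    def __init__(self, points, weights, channels, hidden=64):
        super().__init__()
        # fixed discretization: points y_j of D and quadrature weights Delta_{y_j}
        self.register_buffer("points", points)    # (n, coord_dim)
        self.register_buffer("weights", weights)  # (n,)
        coord_dim = points.shape[-1]
        # kappa_phi: an MLP producing a (channels x channels) matrix per point pair
        self.kappa = nn.Sequential(
            nn.Linear(2 * (coord_dim + channels), hidden),
            nn.GELU(),
            nn.Linear(hidden, channels * channels),
        )

    def forward(self, v):
        # v: (n, channels), values of v_t at the points y_j
        n, c = v.shape
        feats = torch.cat([self.points, v], dim=-1)   # (x, v(x)) for every point
        pairs = torch.cat(
            [feats.unsqueeze(1).expand(n, n, -1),     # (x_i, v(x_i))
             feats.unsqueeze(0).expand(n, n, -1)],    # (y_j, v(y_j))
            dim=-1,
        )
        k = self.kappa(pairs).view(n, n, c, c)        # kappa_phi(x_i, y_j, ...)
        # quadrature sum over j: sum_j k[i, j] v[j] * weights[j]
        return torch.einsum("ijcd,jd,j->ic", k, v, self.weights)
</syntaxhighlight>

For instance, on a uniform grid of <math>n</math> points over <math>[0,1]</math>, the Riemann-sum weights would be <code>weights = torch.full((n,), 1.0 / n)</code>.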
 
Many variants of this architecture have been developed in prior work, some of which are supported in the [https://neuraloperator.github.io/neuraloperator/dev/index.html neural operator library]. The above approximation, together with an implicit neural network parameterization of <math>\kappa_\phi</math>, results in the graph neural operator (GNO).<ref name="Graph NO">{{cite journal |last1=Li |first1=Zongyi |last2=Kovachki |first2=Nikola |last3=Azizzadenesheli |first3=Kamyar |last4=Liu |first4=Burigede |last5=Bhattacharya |first5=Kaushik |last6=Stuart |first6=Andrew |last7=Anandkumar |first7=Anima |title=Neural operator: Graph kernel network for partial differential equations |journal=arXiv preprint arXiv:2003.03485 |date=2020 |url=https://arxiv.org/pdf/2003.03485.pdf}}</ref>
 
The varying parameterizations of neural operators typically differ in their parameterization of <math>\kappa</math>, and various choices have been developed for different applications.<ref name="FNO" /><ref name="Graph NO" /> The most popular instantiation is the Fourier neural operator (FNO). FNO takes <math>\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x-y)</math> and, by applying the [[convolution theorem]], arrives at the following parameterization of the kernel integration:
 
<math>(\mathcal{K}_\phi v_t)(x) = \mathcal{F}^{-1} (R_\phi \cdot (\mathcal{F}v_t))(x), </math>
 
where <math>\mathcal{F}</math> represents the Fourier transform and <math>R_\phi</math> represents the Fourier transform of some periodic function <math>\kappa</math>. That is, FNO parameterizes the kernel integration directly in Fourier space, using a fixed number of Fourier modes. When the input function is given on a uniform grid, the Fourier transform can be approximated by the [[Discrete Fourier transform|discrete Fourier transform (DFT)]], truncated to frequencies below some specified threshold. The DFT can be computed with a [[Fast Fourier transform|fast Fourier transform (FFT)]] implementation, making the FNO among the fastest and most sample-efficient neural operator architectures.
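
In code, this Fourier layer is often implemented as a "spectral convolution". The following one-dimensional sketch is in the spirit of published FNO implementations, though the class name, initialization scale, and use of the real FFT are illustrative choices rather than a canonical interface:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Fourier layer (K v)(x) = F^{-1}(R_phi . (F v))(x), keeping `modes` frequencies."""

    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes  # number of retained Fourier modes (must be <= n // 2 + 1)
        # R_phi: learnable complex weights applied to the retained modes
        scale = 1.0 / (channels * channels)
        self.weights = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat)
        )

    def forward(self, v):
        # v: (batch, channels, n), function values on a uniform grid
        v_hat = torch.fft.rfft(v)                      # F v via the FFT
        out_hat = torch.zeros_like(v_hat)
        # multiply the lowest `modes` frequencies by R_phi; higher ones are truncated
        out_hat[..., : self.modes] = torch.einsum(
            "bim,iom->bom", v_hat[..., : self.modes], self.weights
        )
        return torch.fft.irfft(out_hat, n=v.size(-1))  # F^{-1}, back to real space
</syntaxhighlight>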
 
== Training ==
Training a neural operator amounts to minimizing, over the parameters <math>\theta</math>, the discrepancy between predicted and true output functions on a dataset of input–output pairs <math>\{(a_i, u_i)\}_{i=1}^N</math>, for instance the empirical risk

<math>\frac{1}{N}\sum_{i=1}^N \|u_i - \mathcal{G}_\theta(a_i)\|_\mathcal{U}</math>
 
in some norm <math>\|\cdot \|_\mathcal{U}.</math> Neural operators can be trained directly using [[Backpropagation|backpropagation]] and [[Gradient descent|gradient descent]]-based methods.
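
For example, a single optimization step might look as follows. This is a sketch only: <code>model</code> stands for any neural operator, and the mean squared error over grid points is used as a simple discretization of the norm:

<syntaxhighlight lang="python">
import torch

def train_step(model, optimizer, a_batch, u_batch):
    """One gradient step on the empirical risk (illustrative sketch)."""
    optimizer.zero_grad()
    u_pred = model(a_batch)                 # G_theta(a)
    # mean squared error over grid points: a discretization of ||u - G_theta(a)||
    loss = torch.mean((u_pred - u_batch) ** 2)
    loss.backward()                         # backpropagation through the operator
    optimizer.step()                        # gradient-descent update
    return loss.item()
</syntaxhighlight>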
 
In modeling natural phenomena, the governing physical laws are often expressed as partial differential equations (PDEs).<ref name="Evans">{{cite book |author-link=Lawrence C. Evans |first=L. C. |last=Evans |title=Partial Differential Equations |publisher=American Mathematical Society |___location=Providence |year=1998 |isbn=0-8218-0772-2}}</ref> Based on this idea, [[Physics-informed neural networks|physics-informed neural networks]] utilize complete physics laws to fit neural networks to solutions of PDEs. The general extension of this idea to operator learning is the physics-informed neural operator (PINO) paradigm,<ref name="PINO">{{cite journal |last1=Li |first1=Zongyi |last2=Zheng |first2=Hongkai |last3=Kovachki |first3=Nikola |last4=Jin |first4=David |last5=Chen |first5=Haoxuan |last6=Liu |first6=Burigede |last7=Azizzadenesheli |first7=Kamyar |last8=Anandkumar |first8=Anima |title=Physics-Informed Neural Operator for Learning Partial Differential Equations |journal=arXiv preprint arXiv:2111.03794 |date=2021 |url=https://arxiv.org/abs/2111.03794}}</ref> where supervision can also be channeled through physics equations, allowing learning from partially available physics. PINO is mainly a supervised learning setting suitable for cases where partial data or partial physics is available. In short, in addition to the data loss mentioned above, PINO uses a physics loss <math>\mathcal{L}_{PDE}(a, \mathcal{G}_\theta (a))</math> for further training; this loss quantifies how much the predicted solution <math>\mathcal{G}_\theta (a)</math> violates the governing PDE for the input <math>a</math>.
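
Schematically, the PINO objective combines the two losses, as in the sketch below. The function <code>pde_residual</code>, which evaluates how strongly a candidate solution violates the governing PDE (e.g., via finite differences or automatic differentiation), is an assumed user-supplied ingredient, and the weighting scheme is illustrative:

<syntaxhighlight lang="python">
import torch

def pino_loss(model, a, u_true, pde_residual, w_data=1.0, w_pde=1.0):
    """Weighted sum of the data loss and the physics loss L_PDE(a, G_theta(a))."""
    u_pred = model(a)
    data_loss = torch.mean((u_pred - u_true) ** 2)           # supervised data term
    physics_loss = torch.mean(pde_residual(a, u_pred) ** 2)  # PDE violation term
    return w_data * data_loss + w_pde * physics_loss
</syntaxhighlight>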
 
== References ==