Radial basis function kernel: Difference between revisions

Line 39:
=
\exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right)
\left(a^{(0)}_{\ell_0},a^{(1)}_1,\dots,a^{(1)}_{\ell_1},\dots,a^{(j)}_1,\dots,a^{(j)}_{\ell_j},\dots \right )
</math>
where <math>\ell_j=\tbinom {k+j-1}{j}</math>,
:<math>
a^{(j)}_{\ell}=\frac{x_1^{n_1}\cdots x_k^{n_k} }{\sqrt{n_1! \cdots n_k! }} \quad|\quad n_1+n_2+\dots+n_k = j \wedge 1\leq \ell\leq \ell_j
</math>
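The expansion can be checked numerically. The following is a minimal sketch, assuming the unit-width kernel <math>\exp\left(-\tfrac{1}{2}\|\mathbf{x}-\mathbf{x'}\|^2\right)</math> implied by the prefactor above; the function name <code>truncated_feature_map</code> and the cutoff <code>max_degree</code> are illustrative and not part of the article. It enumerates the exponent tuples <math>(n_1,\dots,n_k)</math> of each degree <math>j</math> and compares the truncated inner product with the exact kernel value.

```python
import itertools
import math

import numpy as np


def truncated_feature_map(x, max_degree):
    """Features exp(-||x||^2 / 2) * a^{(j)}_ell for j = 0..max_degree (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    k = x.shape[0]
    feats = []
    for j in range(max_degree + 1):
        # Each multiset of j coordinate indices gives one exponent tuple
        # (n_1, ..., n_k) with n_1 + ... + n_k = j.
        for combo in itertools.combinations_with_replacement(range(k), j):
            n = [combo.count(i) for i in range(k)]
            monomial = math.prod(x[i] ** n[i] for i in range(k))
            norm = math.sqrt(math.prod(math.factorial(c) for c in n))
            feats.append(monomial / norm)
    return math.exp(-0.5 * float(x @ x)) * np.array(feats)


x = np.array([0.3, -0.2, 0.5])
y = np.array([0.1, 0.4, -0.3])

# Truncating at degree 8 leaves only a tiny remainder for these small inputs.
approx = float(truncated_feature_map(x, 8) @ truncated_feature_map(y, 8))
exact = math.exp(-0.5 * float(np.sum((x - y) ** 2)))
print(approx, exact)
```

The number of features of degree <math>j</math> produced by the inner loop equals <math>\tbinom{k+j-1}{j}</math>, so the truncated map grows quickly with the dimension <math>k</math>, which is one motivation for the approximations discussed next.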
==Approximations==
Line 58:
One way to construct such a ''z'' is to randomly sample from the [[Fourier transformation|Fourier transform]] of the kernel<ref>{{Cite journal |last1=Rahimi |first1=Ali |last2=Recht |first2=Benjamin |date=2007 |title=Random Features for Large-Scale Kernel Machines |url=https://proceedings.neurips.cc/paper/2007/hash/013a006f03dbc5392effeb8f18fda755-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=20}}</ref><math display="block">\varphi(x) = \frac{1}{\sqrt D}[\cos\langle w_1, x\rangle, \sin\langle w_1, x\rangle, \cdots, \cos\langle w_D, x\rangle, \sin\langle w_D, x\rangle]^T</math>where <math>w_1, \dots, w_D</math> are independent samples from the normal distribution <math>N(0, \sigma^{-2} I)</math>.
 
'''Theorem:''' <math>\operatorname E[\langle \varphi(x), \varphi(y)\rangle] = e^{-\frac{\|x-y\|^2}{2\sigma^2}}.</math>
 
'''Proof:''' It suffices to prove the case of <math>D=1</math>, since for general <math>D</math> the inner product is the average of <math>D</math> independent, identically distributed terms of this form. Use the trigonometric identity <math>\cos(a-b) = \cos(a)\cos(b) + \sin(a)\sin(b)</math> and the spherical symmetry of the Gaussian distribution, then evaluate the integral <math>\int_{-\infty}^{\infty} \frac{\cos (k x) e^{-x^2 / 2}}{\sqrt{2 \pi}} \, dx=e^{-k^2 / 2}</math>.
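
Spelled out for <math>D=1</math>, the steps of this argument can be sketched as follows. Here <math>w</math> denotes the single sampled frequency, and <math>t</math> is a standard normal variable introduced only for illustration, with <math>k = \|x-y\|/\sigma</math>:
<math display="block">
\begin{aligned}
\langle \varphi(x), \varphi(y)\rangle &= \cos\langle w, x\rangle\cos\langle w, y\rangle + \sin\langle w, x\rangle\sin\langle w, y\rangle = \cos\langle w, x-y\rangle, \\
\langle w, x-y\rangle &\sim N\left(0, \sigma^{-2}\|x-y\|^2\right), \quad \text{so } \langle w, x-y\rangle \text{ has the same distribution as } k t, \\
\operatorname E[\cos(k t)] &= \int_{-\infty}^{\infty} \frac{\cos (k t) e^{-t^2 / 2}}{\sqrt{2 \pi}} \, dt = e^{-k^2/2} = e^{-\frac{\|x-y\|^2}{2\sigma^2}}.
\end{aligned}
</math>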
 
'''Theorem:''' <math>\operatorname{Var}[\langle \varphi(x), \varphi(y)\rangle] = O(D^{-1})</math>. (Appendix A.2<ref>{{Cite arXiv |last1=Peng |first1=Hao |last2=Pappas |first2=Nikolaos |last3=Yogatama |first3=Dani |last4=Schwartz |first4=Roy |last5=Smith |first5=Noah A. |last6=Kong |first6=Lingpeng |date=2021-03-19 |title=Random Feature Attention |class=cs.CL |eprint=2103.02143 }}</ref>).
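
Both statements can be illustrated empirically. The following is a minimal NumPy sketch, not the implementation from the cited papers; the function names, the test points and the choice <code>sigma = 1.5</code> are assumptions made for illustration. It draws the frequencies <math>w_i</math> from <math>N(0, \sigma^{-2} I)</math>, forms the cosine and sine features, and estimates the mean and variance of <math>\langle \varphi(x), \varphi(y)\rangle</math> over repeated draws for two values of <math>D</math>.

```python
import numpy as np


def random_fourier_features(X, W):
    """Map rows of X (n, d) to 2D features using the frequency rows of W (D, d)."""
    D = W.shape[0]
    proj = X @ W.T  # (n, D) matrix of inner products <w_i, x>
    # Cosines and sines are grouped rather than interleaved; the inner
    # product between two feature vectors is unaffected by this ordering.
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(D)


def rbf_kernel(x, y, sigma):
    """Exact kernel value exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))


def estimate_stats(x, y, sigma, D, trials, rng):
    """Empirical mean and variance of <phi(x), phi(y)> over independent draws of W."""
    vals = np.empty(trials)
    for t in range(trials):
        # w_1, ..., w_D sampled independently from N(0, sigma^{-2} I).
        W = rng.normal(scale=1.0 / sigma, size=(D, x.shape[0]))
        Z = random_fourier_features(np.stack([x, y]), W)
        vals[t] = Z[0] @ Z[1]
    return vals.mean(), vals.var()


rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 0.3])
y = np.array([0.1, 0.2, -0.4])
sigma = 1.5  # illustrative kernel width

exact = rbf_kernel(x, y, sigma)
mean_small, var_small = estimate_stats(x, y, sigma, D=10, trials=2000, rng=rng)
mean_large, var_large = estimate_stats(x, y, sigma, D=100, trials=2000, rng=rng)
print(exact, mean_small, mean_large)
print(var_small, var_large)
```

Under these assumptions the empirical means should lie close to the exact kernel value, and increasing <math>D</math> tenfold should shrink the empirical variance by roughly a factor of ten, in line with the <math>O(D^{-1})</math> bound.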
 
=== Nyström method ===
Line 73:
* [[Radial basis function]]
* [[Radial basis function network]]
* [[Obst kernel network]]
 
==References==