Radial basis function kernel

{{Short description|Machine learning kernel function}}
In [[machine learning]], the '''[[radial basis function]] kernel''', or '''RBF kernel''', is a popular [[Positive-definite kernel|kernel function]] used in various [[kernel method|kernelized]] learning algorithms. In particular, it is commonly used in [[support vector machine]] [[statistical classification|classification]].<ref name="Chang2010">{{cite journal | last1 = Chang | first1 = Yin-Wen | last2 = Hsieh | first2 = Cho-Jui | last3 = Chang | first3 = Kai-Wei | last4 = Ringgaard | first4 = Michael | last5 = Lin | first5 = Chih-Jen | year = 2010 | title = Training and testing low-degree polynomial data mappings via linear SVM | url = https://jmlr.org/papers/v11/chang10a.html | journal = Journal of Machine Learning Research | volume = 11 | pages = 1471–1490 }}</ref>
 
The RBF kernel on two samples <math>\mathbf{x},\mathbf{x'}\in \mathbb{R}^{k}</math>, represented as feature vectors in some ''input space'', is defined as<ref name="primer">Jean-Philippe Vert, Koji Tsuda, and Bernhard Schölkopf (2004). [https://cbio.ensmp.fr/~jvert/publi/04kmcbbook/kernelprimer.pdf "A primer on kernel methods".] ''Kernel Methods in Computational Biology''.</ref>
 
:<math>K(\mathbf{x}, \mathbf{x'}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x'}\|^2}{2\sigma^2}\right)</math>
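A minimal sketch of this formula in Python (the bandwidth <code>sigma</code> and the example vectors below are illustrative choices, not part of the definition):
<syntaxhighlight lang="python">
import numpy as np

def rbf_kernel(x, x_prime, sigma=1.0):
    """Evaluate exp(-||x - x'||^2 / (2 * sigma^2)) for two vectors of equal length."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

# Identical samples give 1; the value decays towards 0 as the distance grows.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # exp(-25/2), roughly 3.7e-6
</syntaxhighlight>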
==Approximations==
Because support vector machines and other models employing the [[kernel trick]] do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been introduced.<ref>Andreas Müller (2012). [https://peekaboo-vision.blogspot.de/2012/12/kernel-approximations-for-efficient.html Kernel Approximations for Efficient SVMs (and other feature extraction methods)].</ref>
Typically, these take the form of a function ''z'' that maps a single vector to a vector of higher dimensionality, approximating the kernel in the sense that <math>\langle z(\mathbf{x}), z(\mathbf{x'})\rangle \approx K(\mathbf{x}, \mathbf{x'})</math>.
 
 
=== Fourier random features ===
{{Main|Random Fourier feature}}
 
 
One way to construct such a ''z'' is to randomly sample from the [[Fourier transformation]] of the kernel<ref>{{Cite journal |last1=Rahimi |first1=Ali |last2=Recht |first2=Benjamin |date=2007 |title=Random Features for Large-Scale Kernel Machines |url=https://proceedings.neurips.cc/paper/2007/hash/013a006f03dbc5392effeb8f18fda755-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=20}}</ref><math display="block">\varphi(x) = \frac{1}{\sqrt D}[\cos\langle w_1, x\rangle, \sin\langle w_1, x\rangle, \ldots, \cos\langle w_D, x\rangle, \sin\langle w_D, x\rangle]^T</math>where <math>w_1, ..., w_D</math> are independent samples from the normal distribution <math>N(0, \sigma^{-2} I)</math>.
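A short sketch of this construction (assuming NumPy; <code>D</code>, <code>sigma</code> and the test vectors are illustrative, and the cosine and sine coordinates are grouped rather than interleaved, which leaves the inner product unchanged):
<syntaxhighlight lang="python">
import numpy as np

def random_fourier_features(X, D=100, sigma=1.0, seed=None):
    """Map each row of X (n x k) to the 2D-dimensional feature vector phi above."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(D, X.shape[1]))  # rows are w_1, ..., w_D ~ N(0, sigma^-2 I)
    proj = X @ W.T                                           # entries <w_j, x_i>
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

# The inner product of two feature vectors approximates the RBF kernel value.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
phi = random_fourier_features(X, D=5000, sigma=1.0, seed=1)
approx = phi[0] @ phi[1]
exact = np.exp(-np.sum((X[0] - X[1]) ** 2) / 2.0)
print(approx, exact)  # the two values should be close for large D
</syntaxhighlight>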
 
'''Theorem:''' <math> \operatorname E[\langle \varphi(x), \varphi(y)\rangle] = e^{-\frac{\|x-y\|^2}{2\sigma^2}}. </math>
 
'''Proof:''' It suffices to prove the case <math>D=1</math>. Use the trigonometric identity <math>\cos(a-b) = \cos(a)\cos(b) + \sin(a)\sin(b)</math> and the spherical symmetry of the [[Gaussian distribution]], then evaluate the integral
 
: <math>\int_{-\infty}^\infty \frac{\cos (k x) e^{-x^2 / 2}}{\sqrt{2 \pi}} d x=e^{-k^2 / 2}. </math>
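Written out for <math>D=1</math>, the chain of steps sketched above reads (with <math>k = \|x-y\|/\sigma</math> in the integral):
: <math>\langle \varphi(x), \varphi(y)\rangle = \cos\langle w, x\rangle \cos\langle w, y\rangle + \sin\langle w, x\rangle \sin\langle w, y\rangle = \cos\langle w, x-y\rangle,</math>
and since <math>w \sim N(0, \sigma^{-2} I)</math>, the scalar <math>\langle w, x-y\rangle</math> is distributed as <math>\|x-y\|/\sigma</math> times a standard normal variable, so its expected cosine is <math>e^{-\|x-y\|^2/(2\sigma^2)}</math>.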
 
=== Nyström method ===
Another approach uses the [[Nyström method]] to approximate the [[eigendecomposition]] of the [[Gramian matrix|Gram matrix]] ''K'', using only a random sample of the training set.<ref>{{cite journal |author1=C.K.I. Williams |author2=M. Seeger |title=Using the Nyström method to speed up kernel machines |journal=Advances in Neural Information Processing Systems |year=2001 |volume=13 |url=https://papers.nips.cc/paper/1866-using-the-nystrom-method-to-speed-up-kernel-machines}}</ref>
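A minimal sketch of the idea (the landmark count <code>m</code>, the bandwidth, and the helper below are illustrative assumptions, not a reference implementation):
<syntaxhighlight lang="python">
import numpy as np

def nystrom_features(X, m=50, sigma=1.0, seed=None):
    """Approximate RBF-kernel feature vectors from a random sample of m landmark points.

    The eigendecomposition of the small m x m Gram block K_mm stands in for that of
    the full n x n Gram matrix; the returned Z satisfies Z @ Z.T ~ K.
    """
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(X.shape[0], size=m, replace=False)]

    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * sigma ** 2))

    K_mm = rbf(landmarks, landmarks)          # m x m Gram matrix on the sample
    K_nm = rbf(X, landmarks)                  # cross-kernel, n x m
    eigvals, eigvecs = np.linalg.eigh(K_mm)
    eigvals = np.maximum(eigvals, 1e-12)      # guard against round-off
    return K_nm @ eigvecs / np.sqrt(eigvals)  # n x m feature matrix

X = np.random.default_rng(0).normal(size=(200, 5))
Z = nystrom_features(X, m=50, sigma=1.0, seed=1)
K_approx = Z @ Z.T                            # approximates the full Gram matrix
</syntaxhighlight>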
 
==See also==