Radial basis function kernel


In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function. It is the most widely used kernel function in support vector machine classification.[1]

The RBF kernel on two samples x and x', represented as feature vectors in some input space, is defined as[2]

K(x, x') = exp(−||x − x'||² / (2σ²))

||x − x'||² may be recognized as the squared Euclidean distance between the two feature vectors. σ is a free parameter. An equivalent, but simpler, definition involves a parameter γ = 1/(2σ²):

K(x, x') = exp(−γ ||x − x'||²)
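The two parameterizations above can be sketched in NumPy (a minimal illustration, not from the cited sources; the function names are made up for this example):

```python
import numpy as np

def rbf_kernel(x, x_prime, sigma=1.0):
    """RBF kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = np.sum((x - x_prime) ** 2)
    return np.exp(-sq_dist / (2 * sigma ** 2))

def rbf_kernel_gamma(x, x_prime, gamma=0.5):
    """Equivalent form K(x, x') = exp(-gamma ||x - x'||^2), gamma = 1/(2 sigma^2)."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])
# Squared distance is (1-2)^2 + (2-0)^2 = 5, so K = exp(-2.5) for sigma = 1.
print(rbf_kernel(x, y, sigma=1.0))        # exp(-2.5) ≈ 0.0821
print(rbf_kernel_gamma(x, y, gamma=0.5))  # same value: gamma = 1/(2·1^2) = 0.5
```

With γ = 1/(2σ²) the two functions return identical values, which is why the γ form is often preferred as the simpler parameterization.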

Since the value of the RBF kernel decreases with distance and ranges between zero (in the limit) and one (when x = x'), it has a ready interpretation as a similarity measure.[2] The feature space of the kernel has an infinite number of dimensions; for σ = 1, its expansion is:[3]

exp(−(1/2)||x − x'||²) = Σ_{j=0}^{∞} ((xᵀx')^j / j!) exp(−(1/2)||x||²) exp(−(1/2)||x'||²)
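The expansion follows from ||x − x'||² = ||x||² + ||x'||² − 2xᵀx' and the Taylor series of exp(xᵀx'); a short numerical check (an illustrative sketch, with hypothetical function names) confirms that a truncated series matches the kernel:

```python
import numpy as np
from math import factorial

def rbf(x, xp):
    """RBF kernel with sigma = 1."""
    return np.exp(-0.5 * np.sum((x - xp) ** 2))

def rbf_series(x, xp, terms=20):
    """Truncated feature-space expansion: the Taylor series of exp(x.x')
    times the two Gaussian normalization factors."""
    dot = float(x @ xp)
    series = sum(dot ** j / factorial(j) for j in range(terms))
    return series * np.exp(-0.5 * x @ x) * np.exp(-0.5 * xp @ xp)

x = np.array([0.5, -0.3])
xp = np.array([0.2, 0.8])
print(abs(rbf(x, xp) - rbf_series(x, xp)))  # negligible: series has converged
```

Each term of the series corresponds to a (rescaled) polynomial feature of degree j, which is why the feature space is infinite-dimensional.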

Approximations

Because support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been devised.[4] Typically, these take the form z(x), i.e. a function transforming a single vector independently of other vectors (e.g. the support vectors in an SVM), such that

z(x)ᵀ z(x') ≈ φ(x)ᵀ φ(x') = K(x, x')

where φ is the implicit mapping embedded in the RBF kernel.

One way to construct such a z is to randomly sample from the Fourier transform of the kernel.[5]
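The random-features idea can be sketched as follows: the Fourier transform of the RBF kernel is a Gaussian distribution, so one draws random frequencies from that Gaussian and random phases, and maps each input through cosines. This is a minimal NumPy sketch of that construction (names and the specific check are illustrative, not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, D=20000, sigma=1.0, rng=rng):
    """Random Fourier features for K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)).

    Frequencies W are sampled from the kernel's Fourier transform (a Gaussian
    with standard deviation 1/sigma); z(x) = sqrt(2/D) cos(Wx + b) then
    satisfies z(x).z(x') ~= K(x, x') in expectation.
    """
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))  # spectral samples
    b = rng.uniform(0, 2 * np.pi, size=D)           # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Compare the approximate Gram matrix against the exact RBF kernel.
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X)
approx = Z @ Z.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
exact = np.exp(-sq_dists / 2.0)
print(np.max(np.abs(approx - exact)))  # Monte Carlo error, shrinks as 1/sqrt(D)
```

The payoff is that a linear model trained on z(X) behaves approximately like a kernelized model, at a cost that is linear in the number of samples.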


References

  1. ^ Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard and Chih-Jen Lin (2010). Training and testing low-degree polynomial data mappings via linear SVM. J. Machine Learning Research 11:1471–1490.
  2. ^ a b Vert, Jean-Philippe, Koji Tsuda, and Bernhard Schölkopf (2004). "A primer on kernel methods." Kernel Methods in Computational Biology.
  3. ^ Shashua, Amnon (2009). "Introduction to Machine Learning: Class Notes 67577". arXiv:0904.3664 [cs.LG].
  4. ^ Andreas Müller (2012). Kernel Approximations for Efficient SVMs (and other feature extraction methods).
  5. ^ Ali Rahimi and Benjamin Recht (2007). Random features for large-scale kernel machines. Neural Information Processing Systems.