Radial basis function kernel

In [[machine learning]], the ('''Gaussian''') '''[[radial basis function]] kernel''', or '''RBF kernel''', is a popular [[Positive-definite kernel|kernel function]]. In particular, it is commonly used in [[support vector machine]] [[statistical classification|classification]].<ref name="Chang2010">Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard and Chih-Jen Lin (2010). ''Training and testing low-degree polynomial data mappings via linear SVM''. J. Machine Learning Research '''11''':1471–1490.</ref>
 
The RBF kernel on two samples '''x''' and '''x'''', represented as feature vectors in some ''input space'', is defined as<ref name="primer">Vert, Jean-Philippe, Koji Tsuda, and Bernhard Schölkopf (2004). "A primer on kernel methods." Kernel Methods in Computational Biology.</ref>
 
:<math>K(\mathbf{x}, \mathbf{x'}) = \exp\left(-\frac{||\mathbf{x} - \mathbf{x'}||_2^2}{2\sigma^2}\right)</math>
 
<math>\textstyle||\mathbf{x} - \mathbf{x'}||_2^2</math> may be recognized as the [[Euclidean_distance#Squared_Euclidean_distance|squared Euclidean distance]] between the two feature vectors, and <math>\sigma</math> is a free parameter. An equivalent definition involves a parameter <math>\textstyle\gamma = \tfrac{1}{2\sigma^2}</math>:
 
:<math>K(\mathbf{x}, \mathbf{x'}) = \exp(-\gamma||\mathbf{x} - \mathbf{x'}||_2^2)</math>
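The definition above translates directly into code. The following is a minimal NumPy sketch (the function name is illustrative, not from any particular library):

```python
import numpy as np

def rbf_kernel(x, xp, sigma=1.0):
    """Gaussian RBF kernel K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))."""
    sq_dist = np.sum((x - xp) ** 2)  # squared Euclidean distance
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
xp = np.array([1.0, 2.0])
print(rbf_kernel(x, xp))   # identical inputs -> 1.0

far = np.array([100.0, -50.0])
print(rbf_kernel(x, far))  # large distance -> value underflows to 0.0
```

As the two calls show, the kernel attains its maximum of 1 when the inputs coincide and decays toward 0 as they move apart.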
 
Since the value of the RBF kernel decreases with distance and ranges between zero (in the limit) and one (when '''x''' = '''x''''), it has a ready interpretation as a [[similarity measure]].<ref name="primer"/>
The [[feature space]] of the kernel has an infinite number of dimensions; for <math>\sigma = 1</math>, its expansion is:<ref>{{cite arXiv
|last=Shashua
|first=Amnon
|eprint=0904.3664
|title=Introduction to Machine Learning: Class Notes 67577
|class=cs.LG
|year=2009
}}</ref>

:<math>\exp\left(-\tfrac{1}{2}||\mathbf{x} - \mathbf{x'}||_2^2\right) = \sum_{j=0}^\infty \frac{(\mathbf{x}^\top \mathbf{x'})^j}{j!} \exp\left(-\tfrac{1}{2}||\mathbf{x}||_2^2\right) \exp\left(-\tfrac{1}{2}||\mathbf{x'}||_2^2\right)</math>
 
==Approximations==
Because support vector machines and other models employing the [[kernel trick]] do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been devised.<ref>Andreas Müller (2012). [http://peekaboo-vision.blogspot.de/2012/12/kernel-approximations-for-efficient.html Kernel Approximations for Efficient SVMs (and other feature extraction methods)].</ref>
Typically, these take the form of a function ''z'' that maps a single vector to a vector of higher dimensionality, approximating the kernel:
 
:<math>\langle z(\mathbf{x}), z(\mathbf{x'})\rangle \approx \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'})\rangle = K(\mathbf{x}, \mathbf{x'})</math>
 
where <math>\textstyle\varphi</math> is the implicit mapping embedded in the RBF kernel.
 
One way to construct such a ''z'' is to randomly sample frequencies from the [[Fourier transform]] of the kernel.<ref>Ali Rahimi and Benjamin Recht (2007). Random features for large-scale kernel machines. Neural Information Processing Systems.</ref> Another approach uses the [[Nyström method]] to approximate the [[eigendecomposition]] of the [[Gramian matrix|Gram matrix]] ''K'', using only a random sample of the training set.<ref>{{cite journal |authors=Williams, C.K.I. and Seeger, M. |title=Using the Nyström method to speed up kernel machines |journal=Advances in Neural Information Processing Systems |year=2001}}</ref>
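The random-sampling approach of Rahimi and Recht can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names and the choice of feature dimension <code>D</code> are assumptions. Frequencies are drawn from the Gaussian spectral density of the RBF kernel, and cosine features with random phases are averaged so that the inner product of two feature vectors approximates the kernel value:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x, y, sigma=1.0):
    """Exact Gaussian RBF kernel value for two vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def random_fourier_features(X, D=5000, sigma=1.0, rng=rng):
    """Map each row of X to a D-dimensional vector z(x) so that
    z(x) . z(x') approximates the RBF kernel (random Fourier features)."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's Fourier transform: N(0, I / sigma^2)
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phase offsets
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(2, 5))
Z = random_fourier_features(X)
approx = Z[0] @ Z[1]            # explicit inner product of feature vectors
exact = rbf_kernel(X[0], X[1])  # implicit kernel evaluation
print(approx, exact)            # the two values should agree closely
```

The approximation error shrinks as the number of sampled frequencies <code>D</code> grows (roughly as <math>1/\sqrt{D}</math>), which is the trade-off these methods exploit: a finite explicit feature map in exchange for a small, controllable error.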