Radial basis function kernel

In [[machine learning]], the ('''Gaussian''') '''[[radial basis function]] kernel''', or '''RBF kernel''', is a popular [[Positive-definite kernel|kernel function]]. In particular, it is commonly used in [[support vector machine]] [[statistical classification|classification]].<ref name="Chang2010">Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard and Chih-Jen Lin (2010). ''Training and testing low-degree polynomial data mappings via linear SVM''. J. Machine Learning Research '''11''':1471–1490.</ref>
 
The RBF kernel on two samples '''x''' and '''x'''', represented as feature vectors in some ''input space'', is defined as<ref name="primer">Vert, Jean-Philippe, Koji Tsuda, and Bernhard Schölkopf (2004). "A primer on kernel methods." Kernel Methods in Computational Biology.</ref>
 
:<math>K(\mathbf{x}, \mathbf{x'}) = \exp\left(-\frac{||\mathbf{x} - \mathbf{x'}||_2^2}{2\sigma^2}\right)</math>
 
<math>\textstyle||\mathbf{x} - \mathbf{x'}||_2^2</math> may be recognized as the [[Euclidean_distance#Squared_Euclidean_distance|squared Euclidean distance]] between the two feature vectors, and <math>\sigma</math> is a free parameter. An equivalent definition involves a parameter <math>\textstyle\gamma = \tfrac{1}{2\sigma^2}</math>:
 
:<math>K(\mathbf{x}, \mathbf{x'}) = \exp(-\gamma||\mathbf{x} - \mathbf{x'}||_2^2)</math>
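The definition above translates directly into code. The following is a minimal NumPy sketch (the function name is illustrative, not from any particular library):

```python
import numpy as np

def rbf_kernel(x, xp, sigma=1.0):
    """Gaussian RBF kernel K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))."""
    sq_dist = np.sum((x - xp) ** 2)  # squared Euclidean distance
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
xp = np.array([1.0, 2.0])
print(rbf_kernel(x, xp))   # identical inputs -> 1.0

far = np.array([100.0, -50.0])
print(rbf_kernel(x, far))  # large distance -> value underflows to 0.0
```

As the two calls show, the kernel attains its maximum of 1 when the inputs coincide and decays toward 0 as they move apart.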
 
Since the value of the RBF kernel decreases with distance and ranges between zero (in the limit) and one (when '''x''' = '''x''''), it has a ready interpretation as a [[similarity measure]].<ref name="primer"/>
The [[feature space]] of the kernel has an infinite number of dimensions; for <math>\sigma = 1</math>, its expansion is:<ref>{{cite arXiv
|last=Shashua
|first=Amnon
|eprint=0904.3664
|title=Introduction to Machine Learning: Class Notes 67577
|class=cs.LG
|year=2009
}}</ref>

:<math>\exp\left(-\tfrac{1}{2}||\mathbf{x} - \mathbf{x'}||_2^2\right) = \sum_{j=0}^\infty \frac{(\mathbf{x}^\top \mathbf{x'})^j}{j!} \exp\left(-\tfrac{1}{2}||\mathbf{x}||_2^2\right) \exp\left(-\tfrac{1}{2}||\mathbf{x'}||_2^2\right)</math>
 
==Approximations==
Because support vector machines and other models employing the [[kernel trick]] do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been devised.<ref>Andreas Müller (2012). [http://peekaboo-vision.blogspot.de/2012/12/kernel-approximations-for-efficient.html Kernel Approximations for Efficient SVMs (and other feature extraction methods)].</ref>
Typically, these take the form of a function ''z'' that maps a single vector to a vector of higher dimensionality, approximating the kernel:
 
:<math>\langle z(\mathbf{x}), z(\mathbf{x'})\rangle \approx \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'})\rangle = K(\mathbf{x}, \mathbf{x'})</math>
 
where <math>\textstyle\varphi</math> is the implicit mapping embedded in the RBF kernel.
 
One way to construct such a ''z'' is to randomly sample frequencies from the [[Fourier transform]] of the kernel.<ref>Ali Rahimi and Benjamin Recht (2007). Random features for large-scale kernel machines. Neural Information Processing Systems.</ref> Another approach uses the [[Nyström method]] to approximate the [[eigendecomposition]] of the [[Gramian matrix|Gram matrix]] ''K'', using only a random sample of the training set.<ref>{{cite journal |authors=Williams, C.K.I. and Seeger, M. |title=Using the Nyström method to speed up kernel machines |journal=Advances in Neural Information Processing Systems |year=2001}}</ref>
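The random-sampling approach of Rahimi and Recht can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names and the choice of feature dimension <code>D</code> are assumptions. Frequencies are drawn from the Gaussian spectral density of the RBF kernel, and cosine features with random phases are averaged so that the inner product of two feature vectors approximates the kernel value:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x, y, sigma=1.0):
    """Exact Gaussian RBF kernel value for two vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def random_fourier_features(X, D=5000, sigma=1.0, rng=rng):
    """Map each row of X to a D-dimensional vector z(x) so that
    z(x) . z(x') approximates the RBF kernel (random Fourier features)."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's Fourier transform: N(0, I / sigma^2)
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phase offsets
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(2, 5))
Z = random_fourier_features(X)
approx = Z[0] @ Z[1]            # explicit inner product of feature vectors
exact = rbf_kernel(X[0], X[1])  # implicit kernel evaluation
print(approx, exact)            # the two values should agree closely
```

The approximation error shrinks as the number of sampled frequencies <code>D</code> grows (roughly as <math>1/\sqrt{D}</math>), which is the trade-off these methods exploit: a finite explicit feature map in exchange for a small, controllable error.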