Kernel embedding of distributions: Difference between revisions

* Let <math>\mathcal{X}</math> be a compact metric space and <math>C(\mathcal{X})</math> the set of [[Function_space#Functional_analysis|continuous functions]] on <math>\mathcal{X}</math>. The reproducing kernel <math>k:\mathcal{X}\times \mathcal{X} \rightarrow \mathbb{R} </math> is called '''universal''' if and only if the RKHS <math>\mathcal{H}</math> of <math>k</math> is [[Dense set|dense]] in <math>C(\mathcal{X})</math> with respect to the supremum norm, i.e., for any <math>g \in C(\mathcal{X})</math> and all <math>\varepsilon > 0</math> there exists an <math> f \in \mathcal{H}</math> such that <math> \| f-g\|_{\infty} \leq \varepsilon</math>.<ref>{{cite book |last1=Steinwart |first1=Ingo |last2=Christmann |first2=Andreas |title=Support Vector Machines |publisher=Springer |location=New York |year=2008 |isbn=978-0-387-77241-7 }}</ref> All universal kernels defined on a compact space are characteristic kernels, but the converse does not hold in general.<ref>{{Cite journal|last1=Sriperumbudur|first1= B. K.|last2=Fukumizu|first2=K.|last3=Lanckriet|first3=G.R.G.|title=Universality, Characteristic Kernels and RKHS Embedding of Measures|year= 2011 |journal=Journal of Machine Learning Research|volume=12|number=70}}</ref>
 
* Let <math>k</math> be a continuous [[Translational symmetry|translation-invariant]] kernel <math>k(x, x') = h(x-x')</math> with <math>x \in \mathbb{R}^{b}</math>. Then [[Bochner's theorem]] guarantees the existence of a unique finite Borel measure <math>\mu</math> (called the [[Spectral_theory_of_ordinary_differential_equations#Spectral measure|spectral measure]]) on <math>\mathbb{R}^{b}</math> such that
::<math>h(t) = \int_{\mathbb{R}^{b}} e^{-i\langle t, \omega \rangle} d\mu(\omega), \quad \forall t \in \mathbb{R}^{b}.</math>
:For <math>k</math> to be universal it suffices that the continuous part of <math>\mu</math> in its unique [[Lebesgue's decomposition theorem|Lebesgue decomposition]] <math>\mu = \mu_c + \mu_s</math> is non-zero. Furthermore, if
::<math>d\mu_c(\omega) = s(\omega)d\omega,</math>
:then <math>s</math> is the [[Spectral density|spectral density]] of frequencies <math>\omega</math> in <math>\mathbb{R}^{b}</math> and <math>h</math> is the [[Fourier transform]] of <math>s</math>. If the [[Support (mathematics)|support]] of <math>\mu</math> is all of <math>\mathbb{R}^{b}</math>, then <math>k</math> is a characteristic kernel as well.<ref>{{Citation |last=Liang|first=Percy| year=2016|title=CS229T/STAT231: Statistical Learning Theory | series = Stanford lecture notes | url=https://web.stanford.edu/class/cs229t/notes.pdf}}</ref><ref>{{cite conference|last1=Sriperumbudur|first1= B. K.|last2=Fukumizu|first2=K.|last3=Lanckriet|first3=G.R.G.|title=On the relation between universality, characteristic kernels and RKHS embedding of measures|url= https://proceedings.mlr.press/v9/sriperumbudur10a.html|year=2010 | conference=Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics|location=Italy}}</ref><ref>{{cite journal | last1 = Micchelli|first1=C.A.| last2 = Xu|first2=Y.| last3 = Zhang|first3=H.| title = Universal Kernels | journal = Journal of Machine Learning Research | year = 2006 | volume = 7 | number = 95| pages = 2651–2667| url= http://jmlr.org/papers/v7/micchelli06a.html }}</ref>
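
:As an illustration of the Bochner representation, the following sketch numerically checks it for the Gaussian kernel <math>h(t) = \exp(-\|t\|^2/(2\sigma^2))</math>, whose spectral measure (normalised so that <math>h(0) = 1</math>) is the Gaussian distribution <math>N(0, \sigma^{-2} I)</math>. Its density is strictly positive, so the support is all of <math>\mathbb{R}^{b}</math>. The function names, the bandwidth <math>\sigma</math>, the sample size and the test point are assumptions chosen for illustration and are not taken from the cited sources; the integral is approximated by plain Monte Carlo averaging.

```python
import numpy as np


def gaussian_h(t, sigma=1.0):
    """Closed form h(t) = exp(-||t||^2 / (2 sigma^2)) of the Gaussian kernel."""
    t = np.asarray(t, dtype=float)
    return float(np.exp(-np.dot(t, t) / (2.0 * sigma**2)))


def bochner_estimate(t, sigma=1.0, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the Bochner integral of exp(-i<t, w>) d mu(w).

    Assumes mu = N(0, I / sigma^2), the normalised spectral measure of the
    Gaussian kernel. The imaginary part of the integrand averages to zero
    by symmetry of mu, so only the cosine term is kept.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    # Draw frequencies w_1, ..., w_n from the spectral measure.
    omega = rng.normal(scale=1.0 / sigma, size=(n_samples, t.shape[0]))
    return float(np.cos(omega @ t).mean())


# Hypothetical test point in R^3 and bandwidth.
t = np.array([0.5, -1.0, 0.25])
exact = gaussian_h(t, sigma=1.5)
approx = bochner_estimate(t, sigma=1.5)
print(f"closed form: {exact:.4f}, Monte Carlo: {approx:.4f}")
```

:The two values should agree up to Monte Carlo error, which shrinks at rate <math>1/\sqrt{n}</math> in the number of sampled frequencies. The same sampling idea underlies random Fourier feature approximations of translation-invariant kernels.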