Reproducing kernel Hilbert space: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 00:07, 20 November 2023 edit 2603:8000:ac00:b47d:5da8:307b:44e7:7957 (talk) No edit summary ← Previous edit		Latest revision as of 07:22, 17 August 2025 edit undo DrinksOrCoffee (talk \| contribs) Extended confirmed users 1,466 edits Link suggestions feature: 2 links added. Tags: Visual edit Mobile edit Mobile web edit Advanced mobile edit Newcomer task Suggested: add links
(36 intermediate revisions by 24 users not shown)
Line 2: [[File:Different Views on RKHS.png\|thumb\|right\|Figure illustrates related but varying approaches to viewing RKHS]] In [[functional analysis]] ~~(a branch of [[mathematics]])~~, a '''reproducing kernel Hilbert space''' ('''RKHS''') is a [[Hilbert space]] of functions in which point evaluation is a continuous ~~linear~~ [[~~Functional~~linear ~~(mathematics)\|~~functional]]. ~~Roughly speaking~~Specifically, ~~this~~a ~~means~~Hilbert ~~that if two functions~~space <math>fH</math> ~~and~~of functions from a set <math>gX</math> ~~in the RKHS are close in norm, i.e.,~~(to <math>\~~\|f-g\\|~~mathbb{R}</math> ~~is small, then~~or <math>f\mathbb{C}</math>) ~~and~~is an RKHS if the point-evaluation functional <math>gL_x:H\to\mathbb{C}</math> ~~are also pointwise close, i.e.~~, <math>\|fL_x(xf)-g=f(x)\|</math>, is ~~small~~continuous for ~~all~~every <math>x\in X</math>. ~~The~~Equivalently, ~~converse~~<math>H</math> ~~does~~is ~~not~~an ~~need~~RKHS toif bethere ~~true.~~exists ~~Informally,~~a ~~this~~function ~~can~~<math>K_x be\in ~~shown~~H</math> bysuch ~~looking~~that, atfor ~~the [[Uniform norm\|supremum norm]]: the sequence of functions~~all <math>f \~~sin^n~~in ~~(x)~~H</math>,<math ~~converges~~display="block">\langle ~~pointwise~~f, ~~but~~K_x ~~does not~~\rangle ~~converge~~= ~~[[Uniform Convergence\|uniformly]] i.e~~f(x).</math>The ~~does~~function ~~not~~<math>K_x</math> ~~converge~~is ~~with~~then ~~respect to~~called the ~~supremum~~''reproducing ~~norm.~~kernel'', ~~(This is not a~~and ~~counterexample~~it ~~because~~reproduces the ~~supremum~~value ~~norm~~of ~~does~~<math>f</math> ~~not~~at ~~arise~~<math>x</math> ~~from~~via ~~any~~the [[inner product~~]] due to not satisfying the [[Polarization identity\|parallelogram law]]~~.) An immediate consequence of this property is that convergence in norm implies [[uniform convergence]] on any subset of <math>X</math> on which <math>\\|K_x\\|</math> is bounded. However, the converse does not necessarily hold. Often the set <math>X</math> carries a topology, and <math>\\|K_x\\|</math> depends continuously on <math>x\in X</math>, in which case: convergence in norm implies uniform convergence on compact subsets of <math>X</math>. By the above definition, an example of a Hilbert space of functions that is not a RKHS are the functions in <math>C^\infty[a,b]</math> with a uniform bound <math>\|f\| \leq M</math>, and with the usual inner product. This since a sufficiently narrow bump function <math>f</math> peaking at <math>x</math> can have arbitrarily small <math>\int \|f\|^2</math> while <math>f(x)</math> remains bounded below. See <ref> Z. Pasternak-Winiarski, "On weights which admit reproducing kernel of Bergman type", ''International Journal of Mathematics and Mathematical Sciences'', vol. 15, Issue 1, 1992. </ref> <ref> T. Ł. Żynda, "On weights which admit reproducing kernel of Szeg¨o type", ''Journal of Contemporary Mathematical Analysis'' (Armenian Academy of Sciences), 55, 2020. </ref><ref>Alpay, D., and T. M. Mills. "A family of Hilbert spaces which are not reproducing kernel Hilbert spaces." J. Anal. Appl. 1.2 (2003): 107–111.</ref>▼ ▲ByIt ~~the~~is ~~above~~not ~~definition,~~entirely anstraightforward to ~~example~~construct natural examples of a Hilbert space of ~~functions~~which ~~that is~~are not aan RKHS ~~are the functions~~ in ~~<math>C^\infty[~~a~~,b]</math>~~ ~~with a uniform bound~~non-trivial fashion.<~~math~~ref>~~\|f\|~~Alpay, ~~\leq M</math>~~D., and ~~with~~T. ~~the~~M. ~~usual inner product~~Mills. ~~This~~"A ~~since~~family aof ~~sufficiently~~Hilbert ~~narrow~~spaces ~~bump~~which ~~function~~are ~~<math>f</math>~~not ~~peaking~~reproducing atkernel ~~<math>x</math>~~Hilbert ~~can~~spaces." ~~have~~J. ~~arbitrarily~~Anal. ~~small~~Appl. ~~<math>\int \|f\|^~~1.2~~</math> while~~ ~~<math>f~~(x2003): 107–111.</~~math~~ref> ~~remains~~Some ~~bounded~~examples, ~~below.~~however, ~~See~~have been found.<ref> Z. Pasternak-Winiarski, "On weights which admit reproducing kernel of Bergman type", ''International Journal of Mathematics and Mathematical Sciences'', vol. 15, Issue 1, 1992. </ref> <ref> T. Ł. Żynda, "On weights which admit reproducing kernel of ~~Szeg¨o~~Szegő type", ''Journal of Contemporary Mathematical Analysis'' (Armenian Academy of Sciences), 55, 2020. ~~</ref><ref>Alpay, D., and T. M. Mills. "A family of Hilbert spaces which are not reproducing kernel Hilbert spaces." J. Anal. Appl. 1.2 (2003): 107–111.~~</ref> [[Square-integrable function\|''L''<sup>2</sup> spaces]] are not Hilbert spaces of functions (and hence not RKHSs), but rather Hilbert spaces of equivalence classes of functions (for example, the functions <math>f</math> and <math>g</math> defined by <math>f(x)=0</math> and <math>g(x)=1_{\mathbb{Q}}</math> are equivalent in ''L''<sup>2</sup>). However, there are RKHSs in which the norm is an ''L''<sup>2</sup>-norm, such as the space of band-limited functions (see the example below).▼ ▲While, formally, [[Square-integrable function\|''L''<sup>2</sup> spaces]] are ~~not~~defined as Hilbert spaces of equivalence classes of functions, ~~(and~~this ~~hence~~definition ~~not~~can ~~RKHSs),~~trivially be extended ~~but~~to ~~rather~~a Hilbert ~~spaces~~space of ~~equivalence~~functions ~~classes~~by ofchoosing ~~functions~~a (total) function as a representative for ~~example~~each equivalence class. However, ~~the~~no ~~functions~~choice ~~<math>f</math>~~of ~~and~~representatives ~~<math>g</math>~~can ~~defined~~make bythis ~~<math>f(x)=0</math>~~space ~~and~~an RKHS (<math>~~g(x)=1_{\mathbb{Q}}~~K_0</math> ~~are~~would ~~equivalent~~need into ~~''L''<sup>2</sup>~~be the non-existent Dirac delta function). However, there are RKHSs in which the norm is an ''L''<sup>2</sup>-norm, such as the space of band-limited functions (see the example below). An RKHS is associated with a kernel that reproduces every function in the space in the sense that for every <math>x</math> in the set on which the functions are defined, "evaluation at <math>x</math>" can be performed by taking an inner product with a function determined by the kernel. Such a ''reproducing kernel'' exists if and only if every evaluation functional is continuous. The reproducing kernel was first introduced in the 1907 work of [[Stanisław Zaremba (mathematician)\|Stanisław Zaremba]]{{fact\|date=June 2025}} concerning [[boundary value problem]]s for [[Harmonic function\|harmonic]] and [[Biharmonic equation\|biharmonic functions]]. [[James Mercer (mathematician)\|James Mercer]] simultaneously examined [[Positive-definite kernel\|functions]] which satisfy the reproducing property in the theory of [[integral equation]]s. The idea of the reproducing kernel remained untouched for nearly twenty years until it appeared in the dissertations of [[Gábor Szegő]], [[Stefan Bergman]], and [[Salomon Bochner]]. The subject was eventually systematically developed in the early 1950s by [[Nachman Aronszajn]] and Stefan Bergman.<ref>Okutmustur</ref> These spaces have wide applications, including [[complex analysis]], [[harmonic analysis]], and [[quantum mechanics]]. Reproducing kernel Hilbert spaces are particularly important in the field of [[statistical learning theory]] because of the celebrated [[representer theorem]] which states that every function in an RKHS that minimises an empirical risk functional can be written as a [[linear combination]] of the kernel function evaluated at the training points. This is a practically useful result as it effectively simplifies the [[empirical risk minimization]] problem from an infinite dimensional to a finite dimensional optimization problem. Line 25 ⟶ 27: {{NumBlk\|:\|<math> \|L_x(f)\| := \|f(x)\| \le M_x\, \\|f\\|_H \qquad \forall f \in H. \,</math>\|{{EquationRef\|1}}}} Although <math>M_x<\infty</math> is assumed for all <math>x \in X</math>, it might still be the case that <math display="inline">\sup_x M_x = \infty</math>. While property ({{EquationNote\|1}}) is the weakest condition that ensures both the existence of an inner product and the evaluation of every function in <math>H</math> at every point in the ___domain, it does not lend itself to easy application in practice. A more intuitive definition of the RKHS can be obtained by observing that this property guarantees that the evaluation functional can be represented by taking the inner product of <math> f </math> with a function <math> K_x </math> in <math>H</math>. This function is the so-called '''reproducing kernel'''{{Citation needed\|date=September 2022}} for the Hilbert space <math>H</math> from which the RKHS takes its name. More formally, the [[Riesz representation theorem]] implies that for all <math>x</math> in <math>X</math> there exists a unique element <math> K_x </math> of <math>H</math> with the reproducing property, Line 35 ⟶ 37: where <math>K_y\in H</math> is the element in <math>H</math> associated to <math>L_y</math>. This allows us to define the reproducing kernel of <math>H</math> as a function <math> K: X \times X \to \mathbb{R} </math> (or <math>\mathbb{C}</math> in the complex case) by :<math> K(x,y) = \langle K_x,\ K_y \rangle_H. </math> Line 48 ⟶ 50: for every <math> n \in \mathbb{N}, x_1, \dots, x_n \in X, \text{ and } c_1, \dots, c_n \in \mathbb{R}. </math><ref>Durrett</ref> The Moore–Aronszajn theorem (see below) is a sort of converse to this: if a function <math>K</math> satisfies these conditions then there is a Hilbert space of functions on <math>X</math> for which it is a reproducing kernel. ==~~Example~~Examples== The simplest example of a reproducing kernel Hilbert space is the space <math>L^2(X,\mu)</math> where <math>X</math> is a set and <math>\mu</math> is the [[counting measure]] on <math>X</math>. For <math>x\in X</math>, the reproducing kernel <math>K_x</math> is the [[indicator function]] of the one point set <math>\{x\}\subset X</math>. The space of [[Bandlimiting\|bandlimited]] [[continuous function]]s <math>H</math> is a RKHS, as we now show. Formally, fix some [[cutoff frequency]] <math> 0<a < \infty </math> and define the Hilbert space▼ ▲~~The~~Nontrivial reproducing kernel Hilbert spaces often involve [[analytic function]]s, as we now illustrate by example. Consider the Hilbert space of [[Bandlimiting\|bandlimited]] [[continuous function]]s <math>H</math> ~~is a RKHS, as we now show~~. ~~Formally, fix~~Fix some [[cutoff frequency]] <math> 0<a < \infty </math> and define the Hilbert space ~~:<math> H = \{ f \in C(\mathbb{R}) \mid \operatorname{supp}(F) \subset [-a,a] \} </math>~~ :<math> H = \{ f \in L^2(\mathbb{R}) \mid \operatorname{supp}(F) \subset [-a,a] \} </math> where <math>CL^2(\mathbb{R})</math> is the set of ~~continuous~~ square integrable functions, and <math display="inline"> F(\omega) = \int_{-\infty}^\infty f(t) e^{-i\omega t} \, dt </math> is the [[Fourier transform]] of <math> f</math>. As the inner product ~~of this Hilbert space~~, we use : <math>\langle f, g\rangle_{L^2} = \int_{-\infty}^\infty f(x) \cdot \overline{g(x)} \, dx.</math> Since this is a closed subspace of <math>L^2(\mathbb R)</math>, it is a Hilbert space. Moreover, the elements of <math>H</math> are smooth functions on <math>\mathbb R</math> that tend to zero at infinity, essentially by the [[Riemann-Lebesgue lemma]]. In fact, the elements of <math>H</math> are the restrictions to <math>\mathbb R</math> of entire [[holomorphic function]]s, by the [[Paley–Wiener theorem]]. From the [[Fourier inversion theorem]], we have Line 64 ⟶ 70: :<math> \|f(x)\| \le \frac{1}{2 \pi} \sqrt{ 2a\int_{-a}^a 2a \|F(\omega)\|^2 \, d\omega} =\frac{1 \sqrt{2a} }{2\pi}\sqrt{~~\frac{a}{2}~~\int_{-\infty}^\infty \|F(\omega)\|^2 \, d\omega} = \sqrt{\frac{a}{\pi}} \\|f\\|_{L^2}. </math> Line 97 ⟶ 103: :'''Theorem'''. Suppose ''K'' is a symmetric, [[positive definite kernel]] on a set ''X''. Then there is a unique Hilbert space of functions on ''X'' for which ''K'' is a reproducing kernel. '''Proof'''. For all ''x'' in ''X'', define ''K<sub>x</sub>'' = ''K''(''x'', ⋅ ). Let ''H''<sub>0</sub> be the [[linear span]] of {''K<sub>x</sub>'' : ''x'' ∈ ''X''}. Define an inner product on ''H''<sub>0</sub> by :<math> \left\langle \sum_{j=1}^n b_j K_{y_j}, \sum_{i=1}^m a_i K_{x_i} \right \rangle_{H_0} = \sum_{i=1}^m \sum_{j=1}^n {a_i} b_j K(y_j, x_i),</math> Line 128 ⟶ 134: We may characterize a symmetric positive definite kernel <math>K</math> via the integral operator using [[Mercer's theorem]] and obtain an additional view of the RKHS. Let <math>X</math> be a compact space equipped with a strictly positive finite [[Borel measure]] <math>\mu</math> and <math>K: X \times X \to \R</math> a continuous, symmetric, and positive definite function. Define the integral operator <math>T_K: L_2(X) \to L_2(X)</math> as :<math> [T_K f](\cdot) =\int_X K({}\cdot{},t) f(t)\, d\mu(t) </math> where <math>L_2(X)</math> is the space of square integrable functions with respect to <math> \mu </math>. Line 134 ⟶ 140: Mercer's theorem states that the spectral decomposition of the integral operator <math>T_K</math> of <math>K</math> yields a series representation of <math>K</math> in terms of the eigenvalues and eigenfunctions of <math> T_K </math>. This then implies that <math>K</math> is a reproducing kernel so that the corresponding RKHS can be defined in terms of these eigenvalues and eigenfunctions. We provide the details below. Under these assumptions <math>T_K</math> is a compact, continuous, self-adjoint, and positive operator. The [[spectral theorem]] for self-adjoint operators implies that there is an at most countable decreasing sequence <math>(\sigma_i)_i_{i \geq 0} </math> such that <math display="inline">\lim_{i \to \infty}\sigma_i = 0</math> and <math>T_K\varphi_i(x) = \sigma_i\varphi_i(x)</math>, where the <math>\{\varphi_i\}</math> form an orthonormal basis of <math>L_2(X)</math>. By the positivity of <math>T_K, \sigma_i > 0</math> for all <math>i.</math> One can also show that <math>T_K </math> maps continuously into the space of continuous functions <math>C(X)</math> and therefore we may choose continuous functions as the eigenvectors, that is, <math>\varphi_i \in C(X)</math> for all <math>i.</math> Then by Mercer's theorem <math> K </math> may be written in terms of the eigenvalues and continuous eigenfunctions as Line 153 ⟶ 159: :<math> \left\langle f,g \right\rangle_H = \sum_{i=1}^\infty \frac{\left\langle f,\varphi_i \right\rangle_{L_2}\left\langle g,\varphi_i \right\rangle_{L_2}}{\sigma_i}. </math> This representation of the RKHS has application in probability and statistics, for example to the [[Karhunen–Loève theorem\|~~Karhunen-Loève~~Karhunen–Loève representation]] for stochastic processes and [[kernel PCA]]. ==Feature maps== Line 180 ⟶ 186: ==Properties== ~~The following~~Useful properties of RKHSs ~~may be useful to readers.~~: * Let <math>(X_i)_{i=1}^p</math> be a sequence of sets and <math>(K_i)_{i=1}^p</math> be a collection of corresponding positive definite functions on <math> (X_i)_{i=1}^p.</math> It then follows that Line 210 ⟶ 216: * '''Laplacian kernel''': ::<math> K(x,y) = e^{-\frac{\\|x - y\\|}{\sigma}}, \qquad \sigma > 0 </math> :The squared norm of a function <math>f</math> in the RKHS <math>H</math> with this kernel is:<ref>Berlinet, Alain and Thomas, Christine. ''[https://books.google.com/books?id=bX3TBwAAQBAJ&dq=%22Reproducing+kernel+Hilbert+spaces+in+Probability+and+Statistics%22&pg=PP11 Reproducing kernel Hilbert spaces in Probability and Statistics]'', Kluwer Academic Publishers, 2004</ref><ref>Thomas-Agnan C . Computing a family of reproducing kernels for statistical applications. Numerical Algorithms, 13, pp. 21-32 (1996)</ref> ::<math>\\|f\\|_H^2=\int_{\mathbb R}\Big( \frac1{\sigma} f(x)^2 + \sigma f'(x)^2\Big) \mathrm d x.</math> Line 221 ⟶ 227: In this case, ''H'' is isomorphic to <math>\Complex^n</math>. The case of <math>X= \mathbb{D}</math> (where <math>\mathbb{D}</math> denotes the [[unit disc]]) is more sophisticated. Here the [[Bergman space]] [[HA square\|<math>HA^2(\mathbb{D})</math>]] is the space of [[square-integrable function\|square-integrable]] [[holomorphic function]]s on <math>\mathbb{D}</math>. It can be shown that the reproducing kernel for <math>HA^2(\mathbb{D})</math> is :<math>K(x,y)=\frac{1}{\pi}\frac{1}{(1-x\overline{y})^2}.</math> Line 344 ⟶ 350: \|mr=290013 \|doi-access=free}} Okutmustur, Baver. “Reproducing Kernel Hilbert Spaces,” M.S. dissertation, Bilkent University, ~~http~~https://~~www~~users.~~thesis.bilkent~~metu.edu.tr/~~0002953~~baver/MS.Thesis.pdf, August 2005. Paulsen, Vern. “An introduction to the theory of reproducing kernel Hilbert spaces,” ~~http~~https://~~www~~citeseerx.~~math~~ist.uhpsu.edu/~~~vern/rkhs.~~document?repid=rep1&type=pdf&doi=440218056738e05b5ab43679f932a9f33fccee87. {{cite journal \|first1=Ingo \|last1=Steinwart Line 353 ⟶ 359: \|volume=35 \|issue=3\|year=2012 \|pages=363–417 \|mr=2914365 \|doi=10.1007/s00365-012-9153-3 \|s2cid=253885172 }} * Rosasco, Lorenzo and Poggio, Thomas. "A Regularization Tour of Machine Learning – MIT 9.520 Lecture Notes" Manuscript, Dec. 2014. * [[Grace Wahba\|Wahba, Grace]], ''Spline Models for Observational Data'', [http://www.siam.org/books/ SIAM], 1990.