Reproducing kernel Hilbert space: Difference between revisions

Content deleted Content added
mNo edit summary
Link suggestions feature: 2 links added.
 
(40 intermediate revisions by 26 users not shown)
Line 2:
[[File:Different Views on RKHS.png|thumb|right|Figure illustrates related but varying approaches to viewing RKHS]]
 
In [[functional analysis]] (a branch of [[mathematics]]), a '''reproducing kernel Hilbert space''' ('''RKHS''') is a [[Hilbert space]] of functions in which point evaluation is a continuous linear [[Functionallinear (mathematics)|functional]]. Roughly speakingSpecifically, thisa meansHilbert that if two functionsspace <math>fH</math> andof functions from a set <math>gX</math> in the RKHS are close in norm, i.e.,(to <math>\|f-g\|mathbb{R}</math> is small, thenor <math>f\mathbb{C}</math>) andis an RKHS if the point-evaluation functional <math>gL_x:H\to\mathbb{C}</math> are also pointwise close, i.e., <math>|fL_x(xf)-g=f(x)|</math>, is smallcontinuous for allevery <math>x\in X</math>. TheEquivalently, converse<math>H</math> doesis notan needRKHS toif bethere true.exists Informally,a thisfunction can<math>K_x be\in shownH</math> bysuch lookingthat, atfor the [[Uniform norm|supremum norm]]: the sequence of functionsall <math>f \sin^nin (x)H</math>,<math convergesdisplay="block">\langle pointwisef, butK_x does not\rangle converge= [[Uniform Convergence|uniformly]] i.ef(x).</math>The doesfunction not<math>K_x</math> convergeis withthen respect tocalled the supremum''reproducing norm.kernel'', (This is not aand counterexampleit becausereproduces the supremumvalue normof does<math>f</math> notat arise<math>x</math> fromvia anythe [[inner product]] due to not satisfying the [[Polarization identity|parallelogram law]].)
 
An immediate consequence of this property is that convergence in norm implies [[uniform convergence]] on any subset of <math>X</math> on which <math>\|K_x\|</math> is bounded. However, the converse does not necessarily hold. Often the set <math>X</math> carries a topology, and <math>\|K_x\|</math> depends continuously on <math>x\in X</math>, in which case: convergence in norm implies uniform convergence on compact subsets of <math>X</math>.
It is not entirely straightforward to construct a Hilbert space of functions which is not an RKHS.<ref>Alpay, D., and T. M. Mills. "A family of Hilbert spaces which are not reproducing kernel Hilbert spaces." J. Anal. Appl. 1.2 (2003): 107–111.</ref> Some examples, however, have been found.<ref> Z. Pasternak-Winiarski, "On weights which admit reproducing kernel of Bergman type", ''International Journal of Mathematics and Mathematical Sciences'', vol. 15, Issue 1, 1992. </ref><ref> T. Ł. Żynda, "On weights which admit reproducing kernel of Szeg¨o type", ''Journal of Contemporary Mathematical Analysis'' (Armenian Academy of Sciences), 55, 2020. </ref>
 
It is not entirely straightforward to construct natural examples of a Hilbert space of functions which isare not an RKHS in a non-trivial fashion.<ref>Alpay, D., and T. M. Mills. "A family of Hilbert spaces which are not reproducing kernel Hilbert spaces." J. Anal. Appl. 1.2 (2003): 107–111.</ref> Some examples, however, have been found.<ref> Z. Pasternak-Winiarski, "On weights which admit reproducing kernel of Bergman type", ''International Journal of Mathematics and Mathematical Sciences'', vol. 15, Issue 1, 1992. </ref><ref> T. Ł. Żynda, "On weights which admit reproducing kernel of Szeg¨oSzegő type", ''Journal of Contemporary Mathematical Analysis'' (Armenian Academy of Sciences), 55, 2020. </ref>
[[Square-integrable function|''L''<sup>2</sup> spaces]] are not Hilbert spaces of functions (and hence not RKHSs), but rather Hilbert spaces of equivalence classes of functions (for example, the functions <math>f</math> and <math>g</math> defined by <math>f(x)=0</math> and <math>g(x)=1_{\mathbb{Q}}</math> are equivalent in ''L''<sup>2</sup>). However, there are RKHSs in which the norm is an ''L''<sup>2</sup>-norm, such as the space of band-limited functions (see the example below).
 
While, formally, [[Square-integrable function|''L''<sup>2</sup> spaces]] are notdefined as Hilbert spaces of equivalence classes of functions, (andthis hencedefinition notcan RKHSs),trivially be extended butto rathera Hilbert spacesspace of equivalencefunctions classesby ofchoosing functionsa (total) function as a representative for exampleeach equivalence class. However, theno functionschoice <math>f</math>of andrepresentatives <math>g</math>can definedmake bythis <math>f(x)=0</math>space andan RKHS (<math>g(x)=1_{\mathbb{Q}}K_0</math> arewould equivalentneed into ''L''<sup>2</sup>be the non-existent Dirac delta function). However, there are RKHSs in which the norm is an ''L''<sup>2</sup>-norm, such as the space of band-limited functions (see the example below).
 
An RKHS is associated with a kernel that reproduces every function in the space in the sense that for every <math>x</math> in the set on which the functions are defined, "evaluation at <math>x</math>" can be performed by taking an inner product with a function determined by the kernel. Such a ''reproducing kernel'' exists if and only if every evaluation functional is continuous.
 
The reproducing kernel was first introduced in the 1907 work of [[Stanisław Zaremba (mathematician)|Stanisław Zaremba]]{{fact|date=June 2025}} concerning [[boundary value problem]]s for [[Harmonic function|harmonic]] and [[Biharmonic equation|biharmonic functions]]. [[James Mercer (mathematician)|James Mercer]] simultaneously examined [[Positive-definite kernel|functions]] which satisfy the reproducing property in the theory of [[integral equation]]s. The idea of the reproducing kernel remained untouched for nearly twenty years until it appeared in the dissertations of [[Gábor Szegő]], [[Stefan Bergman]], and [[Salomon Bochner]]. The subject was eventually systematically developed in the early 1950s by [[Nachman Aronszajn]] and Stefan Bergman.<ref>Okutmustur</ref>
 
These spaces have wide applications, including [[complex analysis]], [[harmonic analysis]], and [[quantum mechanics]]. Reproducing kernel Hilbert spaces are particularly important in the field of [[statistical learning theory]] because of the celebrated [[representer theorem]] which states that every function in an RKHS that minimises an empirical risk functional can be written as a [[linear combination]] of the kernel function evaluated at the training points. This is a practically useful result as it effectively simplifies the [[empirical risk minimization]] problem from an infinite dimensional to a finite dimensional optimization problem.
Line 25 ⟶ 27:
{{NumBlk|:|<math> |L_x(f)| := |f(x)| \le M_x\, \|f\|_H \qquad \forall f \in H. \,</math>|{{EquationRef|1}}}}
 
Although <math>M_x<\infty</math> is assumed for all <math>x \in X</math>, it might still be the case that <math display="inline">\sup_x M_x = \infty</math>.
 
While property ({{EquationNote|1}}) is the weakest condition that ensures both the existence of an inner product and the evaluation of every function in <math>H</math> at every point in the ___domain, it does not lend itself to easy application in practice. A more intuitive definition of the RKHS can be obtained by observing that this property guarantees that the evaluation functional can be represented by taking the inner product of <math> f </math> with a function <math> K_x </math> in <math>H</math>. This function is the so-called '''reproducing kernel'''{{Citation needed|date=September 2022}} for the Hilbert space <math>H</math> from which the RKHS takes its name. More formally, the [[Riesz representation theorem]] implies that for all <math>x</math> in <math>X</math> there exists a unique element <math> K_x </math> of <math>H</math> with the reproducing property,
Line 35 ⟶ 37:
where <math>K_y\in H</math> is the element in <math>H</math> associated to <math>L_y</math>.
 
This allows us to define the reproducing kernel of <math>H</math> as a function <math> K: X \times X \to \mathbb{R} </math> (or <math>\mathbb{C}</math> in the complex case) by
 
:<math> K(x,y) = \langle K_x,\ K_y \rangle_H. </math>
Line 48 ⟶ 50:
for every <math> n \in \mathbb{N}, x_1, \dots, x_n \in X, \text{ and } c_1, \dots, c_n \in \mathbb{R}. </math><ref>Durrett</ref> The Moore–Aronszajn theorem (see below) is a sort of converse to this: if a function <math>K</math> satisfies these conditions then there is a Hilbert space of functions on <math>X</math> for which it is a reproducing kernel.
 
==ExampleExamples==
The simplest example of a reproducing kernel Hilbert space is the space <math>L^2(X,\mu)</math> where <math>X</math> is a set and <math>\mu</math> is the [[counting measure]] on <math>X</math>. For <math>x\in X</math>, the reproducing kernel <math>K_x</math> is the [[indicator function]] of the one point set <math>\{x\}\subset X</math>.
The space of [[Bandlimiting|bandlimited]] [[continuous function]]s <math>H</math> is a RKHS, as we now show. Formally, fix some [[cutoff frequency]] <math> 0<a < \infty </math> and define the Hilbert space
 
TheNontrivial reproducing kernel Hilbert spaces often involve [[analytic function]]s, as we now illustrate by example. Consider the Hilbert space of [[Bandlimiting|bandlimited]] [[continuous function]]s <math>H</math> is a RKHS, as we now show. Formally, fixFix some [[cutoff frequency]] <math> 0<a < \infty </math> and define the Hilbert space
:<math> H = \{ f \in C(\mathbb{R}) \mid \operatorname{supp}(F) \subset [-a,a] \} </math>
 
:<math> H = \{ f \in L^2(\mathbb{R}) \mid \operatorname{supp}(F) \subset [-a,a] \} </math>

where <math>CL^2(\mathbb{R})</math> is the set of continuous square integrable functions, and <math display="inline"> F(\omega) = \int_{-\infty}^\infty f(t) e^{-i\omega t} \, dt </math> is the [[Fourier transform]] of <math> f</math>. As the inner product of this Hilbert space, we use
 
: <math>\langle f, g\rangle_{L^2} = \int_{-\infty}^\infty f(x) \cdot \overline{g(x)} \, dx.</math>
 
Since this is a closed subspace of <math>L^2(\mathbb R)</math>, it is a Hilbert space. Moreover, the elements of <math>H</math> are smooth functions on <math>\mathbb R</math> that tend to zero at infinity, essentially by the [[Riemann-Lebesgue lemma]]. In fact, the elements of <math>H</math> are the restrictions to <math>\mathbb R</math> of entire [[holomorphic function]]s, by the [[Paley–Wiener theorem]].
 
From the [[Fourier inversion theorem]], we have
Line 64 ⟶ 70:
 
:<math> |f(x)| \le
\frac{1}{2 \pi} \sqrt{ 2a\int_{-a}^a 2a |F(\omega)|^2 \, d\omega}
=\frac{1 \sqrt{2a} }{2\pi}\sqrt{\frac{a}{2}\int_{-\infty}^\infty |F(\omega)|^2 \, d\omega}
= \sqrt{\frac{a}{\pi}} \|f\|_{L^2}. </math>
 
Line 97 ⟶ 103:
:'''Theorem'''. Suppose ''K'' is a symmetric, [[positive definite kernel]] on a set ''X''. Then there is a unique Hilbert space of functions on ''X'' for which ''K'' is a reproducing kernel.
 
'''Proof'''. For all ''x'' in ''X'', define ''K<sub>x</sub>'' = ''K''(''x'', ⋅ ). Let ''H''<sub>0</sub> be the [[linear span]] of {''K<sub>x</sub>'' : ''x'' ∈ ''X''}. Define an inner product on ''H''<sub>0</sub> by
 
:<math> \left\langle \sum_{j=1}^n b_j K_{y_j}, \sum_{i=1}^m a_i K_{x_i} \right \rangle_{H_0} = \sum_{i=1}^m \sum_{j=1}^n {a_i} b_j K(y_j, x_i),</math>
Line 128 ⟶ 134:
We may characterize a symmetric positive definite kernel <math>K</math> via the integral operator using [[Mercer's theorem]] and obtain an additional view of the RKHS. Let <math>X</math> be a compact space equipped with a strictly positive finite [[Borel measure]] <math>\mu</math> and <math>K: X \times X \to \R</math> a continuous, symmetric, and positive definite function. Define the integral operator <math>T_K: L_2(X) \to L_2(X)</math> as
 
:<math> [T_K f](\cdot) =\int_X K({}\cdot{},t) f(t)\, d\mu(t) </math>
 
where <math>L_2(X)</math> is the space of square integrable functions with respect to <math> \mu </math>.
Line 134 ⟶ 140:
Mercer's theorem states that the spectral decomposition of the integral operator <math>T_K</math> of <math>K</math> yields a series representation of <math>K</math> in terms of the eigenvalues and eigenfunctions of <math> T_K </math>. This then implies that <math>K</math> is a reproducing kernel so that the corresponding RKHS can be defined in terms of these eigenvalues and eigenfunctions. We provide the details below.
 
Under these assumptions <math>T_K</math> is a compact, continuous, self-adjoint, and positive operator. The [[spectral theorem]] for self-adjoint operators implies that there is an at most countable decreasing sequence <math>(\sigma_i)_i_{i \geq 0} </math> such that <math display="inline">\lim_{i \to \infty}\sigma_i = 0</math> and
<math>T_K\varphi_i(x) = \sigma_i\varphi_i(x)</math>, where the <math>\{\varphi_i\}</math> form an orthonormal basis of <math>L_2(X)</math>. By the positivity of <math>T_K, \sigma_i > 0</math> for all <math>i.</math> One can also show that <math>T_K </math> maps continuously into the space of continuous functions <math>C(X)</math> and therefore we may choose continuous functions as the eigenvectors, that is, <math>\varphi_i \in C(X)</math> for all <math>i.</math> Then by Mercer's theorem <math> K </math> may be written in terms of the eigenvalues and continuous eigenfunctions as
 
Line 153 ⟶ 159:
:<math> \left\langle f,g \right\rangle_H = \sum_{i=1}^\infty \frac{\left\langle f,\varphi_i \right\rangle_{L_2}\left\langle g,\varphi_i \right\rangle_{L_2}}{\sigma_i}. </math>
 
This representation of the RKHS has application in probability and statistics, for example to the [[Karhunen–Loève theorem|Karhunen-LoèveKarhunen–Loève representation]] for stochastic processes and [[kernel PCA]].
 
==Feature maps==
Line 180 ⟶ 186:
==Properties==
 
The followingUseful properties of RKHSs may be useful to readers.:
 
* Let <math>(X_i)_{i=1}^p</math> be a sequence of sets and <math>(K_i)_{i=1}^p</math> be a collection of corresponding positive definite functions on <math> (X_i)_{i=1}^p.</math> It then follows that
Line 210 ⟶ 216:
* '''Laplacian kernel''':
*::<math> K(x,y) = e^{-\frac{\|x - y\|}{\sigma}}, \qquad \sigma > 0 </math>
*:The squared norm of a function <math>f</math> in the RKHS <math>H</math> with this kernel is:<ref>Berlinet, Alain and Thomas, Christine. ''[https://books.google.com/books?id=bX3TBwAAQBAJ&dq=%22Reproducing+kernel+Hilbert+spaces+in+Probability+and+Statistics%22&pg=PP11 Reproducing kernel Hilbert spaces in Probability and Statistics]'', Kluwer Academic Publishers, 2004</ref><ref>Thomas-Agnan C . Computing a family of reproducing kernels for statistical applications. Numerical Algorithms, 13, pp. 21-32 (1996)</ref>
*::<math>\|f\|_H^2=\int_{\mathbb R}\Big( \frac1{\sigma} f(x)^2 + \sigma f'(x)^2\Big) \mathrm d x.</math>
 
Line 221 ⟶ 227:
In this case, ''H'' is isomorphic to <math>\Complex^n</math>.
 
The case of <math>X= \mathbb{D}</math> (where <math>\mathbb{D}</math> denotes the [[unit disc]]) is more sophisticated. Here the [[Bergman space]] [[HA square|<math>HA^2(\mathbb{D})</math>]] is the space of [[square-integrable function|square-integrable]] [[holomorphic function]]s on <math>\mathbb{D}</math>. It can be shown that the reproducing kernel for <math>HA^2(\mathbb{D})</math> is
 
:<math>K(x,y)=\frac{1}{\pi}\frac{1}{(1-x\overline{y})^2}.</math>
Line 267 ⟶ 273:
We lastly remark that the above theory can be further extended to spaces of functions with values in function spaces but obtaining kernels for these spaces is a more difficult task.<ref>Rosasco</ref>
 
== Connection between RKHSRKHSs withand the ReLU function ==
The [[Rectifier (neural networks)|ReLU function]] is commonly defined as <math>f(x)=\max \{0, x\}</math> and is a mainstay in the architecture of neural networks where it is used as an activation function. One can construct a ReLU-like nonlinear function using the theory of reproducing kernel Hilbert spaces. Below, we derive this construction and show how it implies the representation power of neural networks with ReLU activations.
 
Line 344 ⟶ 350:
|mr=290013
|doi-access=free}}
*Okutmustur, Baver. “Reproducing Kernel Hilbert Spaces,” M.S. dissertation, Bilkent University, httphttps://wwwusers.thesis.bilkentmetu.edu.tr/0002953baver/MS.Thesis.pdf, August 2005.
*Paulsen, Vern. “An introduction to the theory of reproducing kernel Hilbert spaces,” httphttps://wwwciteseerx.mathist.uhpsu.edu/~vern/rkhs.document?repid=rep1&type=pdf&doi=440218056738e05b5ab43679f932a9f33fccee87.
* {{cite journal
|first1=Ingo |last1=Steinwart
Line 353 ⟶ 359:
|volume=35 |issue=3|year=2012 |pages=363–417
|mr=2914365 |doi=10.1007/s00365-012-9153-3
|s2cid=253885172
}}
* Rosasco, Lorenzo and Poggio, Thomas. "A Regularization Tour of Machine Learning – MIT 9.520 Lecture Notes" Manuscript, Dec. 2014.
* [[Grace Wahba|Wahba, Grace]], ''Spline Models for Observational Data'', [http://www.siam.org/books/ SIAM], 1990.