Content deleted Content added
No edit summary |
Link suggestions feature: 2 links added. Tags: Visual edit Mobile edit Mobile web edit Advanced mobile edit Newcomer task Suggested: add links |
||
(36 intermediate revisions by 24 users not shown) | |||
Line 2:
[[File:Different Views on RKHS.png|thumb|right|Figure illustrates related but varying approaches to viewing RKHS]]
In [[functional analysis]]
An immediate consequence of this property is that convergence in norm implies [[uniform convergence]] on any subset of <math>X</math> on which <math>\|K_x\|</math> is bounded. However, the converse does not necessarily hold. Often the set <math>X</math> carries a topology, and <math>\|K_x\|</math> depends continuously on <math>x\in X</math>, in which case: convergence in norm implies uniform convergence on compact subsets of <math>X</math>.
By the above definition, an example of a Hilbert space of functions that is not a RKHS are the functions in <math>C^\infty[a,b]</math> with a uniform bound <math>|f| \leq M</math>, and with the usual inner product. This since a sufficiently narrow bump function <math>f</math> peaking at <math>x</math> can have arbitrarily small <math>\int |f|^2</math> while <math>f(x)</math> remains bounded below. See <ref> Z. Pasternak-Winiarski, "On weights which admit reproducing kernel of Bergman type", ''International Journal of Mathematics and Mathematical Sciences'', vol. 15, Issue 1, 1992. </ref> <ref> T. Ł. Żynda, "On weights which admit reproducing kernel of Szeg¨o type", ''Journal of Contemporary Mathematical Analysis'' (Armenian Academy of Sciences), 55, 2020. </ref><ref>Alpay, D., and T. M. Mills. "A family of Hilbert spaces which are not reproducing kernel Hilbert spaces." J. Anal. Appl. 1.2 (2003): 107–111.</ref>▼
▲
[[Square-integrable function|''L''<sup>2</sup> spaces]] are not Hilbert spaces of functions (and hence not RKHSs), but rather Hilbert spaces of equivalence classes of functions (for example, the functions <math>f</math> and <math>g</math> defined by <math>f(x)=0</math> and <math>g(x)=1_{\mathbb{Q}}</math> are equivalent in ''L''<sup>2</sup>). However, there are RKHSs in which the norm is an ''L''<sup>2</sup>-norm, such as the space of band-limited functions (see the example below).▼
▲While, formally, [[Square-integrable function|''L''<sup>2</sup> spaces]] are
An RKHS is associated with a kernel that reproduces every function in the space in the sense that for every <math>x</math> in the set on which the functions are defined, "evaluation at <math>x</math>" can be performed by taking an inner product with a function determined by the kernel. Such a ''reproducing kernel'' exists if and only if every evaluation functional is continuous.
The reproducing kernel was first introduced in the 1907 work of [[Stanisław Zaremba (mathematician)|Stanisław Zaremba]]{{fact|date=June 2025}} concerning [[boundary value problem]]s for [[Harmonic function|harmonic]] and [[Biharmonic equation|biharmonic functions]]. [[James Mercer (mathematician)|James Mercer]] simultaneously examined [[Positive-definite kernel|functions]] which satisfy the reproducing property in the theory of [[integral equation]]s. The idea of the reproducing kernel remained untouched for nearly twenty years until it appeared in the dissertations of [[Gábor Szegő]], [[Stefan Bergman]], and [[Salomon Bochner]]. The subject was eventually systematically developed in the early 1950s by [[Nachman Aronszajn]] and Stefan Bergman.<ref>Okutmustur</ref>
These spaces have wide applications, including [[complex analysis]], [[harmonic analysis]], and [[quantum mechanics]]. Reproducing kernel Hilbert spaces are particularly important in the field of [[statistical learning theory]] because of the celebrated [[representer theorem]] which states that every function in an RKHS that minimises an empirical risk functional can be written as a [[linear combination]] of the kernel function evaluated at the training points. This is a practically useful result as it effectively simplifies the [[empirical risk minimization]] problem from an infinite dimensional to a finite dimensional optimization problem.
Line 25 ⟶ 27:
{{NumBlk|:|<math> |L_x(f)| := |f(x)| \le M_x\, \|f\|_H \qquad \forall f \in H. \,</math>|{{EquationRef|1}}}}
Although <math>M_x<\infty</math> is assumed for all <math>x \in X</math>, it might still be the case that <math display="inline">\sup_x M_x = \infty</math>.
While property ({{EquationNote|1}}) is the weakest condition that ensures both the existence of an inner product and the evaluation of every function in <math>H</math> at every point in the ___domain, it does not lend itself to easy application in practice. A more intuitive definition of the RKHS can be obtained by observing that this property guarantees that the evaluation functional can be represented by taking the inner product of <math> f </math> with a function <math> K_x </math> in <math>H</math>. This function is the so-called '''reproducing kernel'''{{Citation needed|date=September 2022}} for the Hilbert space <math>H</math> from which the RKHS takes its name. More formally, the [[Riesz representation theorem]] implies that for all <math>x</math> in <math>X</math> there exists a unique element <math> K_x </math> of <math>H</math> with the reproducing property,
Line 35 ⟶ 37:
where <math>K_y\in H</math> is the element in <math>H</math> associated to <math>L_y</math>.
This allows us to define the reproducing kernel of <math>H</math> as a function <math> K: X \times X \to \mathbb{R} </math> (or <math>\mathbb{C}</math> in the complex case) by
:<math> K(x,y) = \langle K_x,\ K_y \rangle_H. </math>
Line 48 ⟶ 50:
for every <math> n \in \mathbb{N}, x_1, \dots, x_n \in X, \text{ and } c_1, \dots, c_n \in \mathbb{R}. </math><ref>Durrett</ref> The Moore–Aronszajn theorem (see below) is a sort of converse to this: if a function <math>K</math> satisfies these conditions then there is a Hilbert space of functions on <math>X</math> for which it is a reproducing kernel.
==
The simplest example of a reproducing kernel Hilbert space is the space <math>L^2(X,\mu)</math> where <math>X</math> is a set and <math>\mu</math> is the [[counting measure]] on <math>X</math>. For <math>x\in X</math>, the reproducing kernel <math>K_x</math> is the [[indicator function]] of the one point set <math>\{x\}\subset X</math>.
The space of [[Bandlimiting|bandlimited]] [[continuous function]]s <math>H</math> is a RKHS, as we now show. Formally, fix some [[cutoff frequency]] <math> 0<a < \infty </math> and define the Hilbert space▼
▲
:<math> H = \{ f \in L^2(\mathbb{R}) \mid \operatorname{supp}(F) \subset [-a,a] \} </math>
where <math> : <math>\langle f, g\rangle_{L^2} = \int_{-\infty}^\infty f(x) \cdot \overline{g(x)} \, dx.</math>
Since this is a closed subspace of <math>L^2(\mathbb R)</math>, it is a Hilbert space. Moreover, the elements of <math>H</math> are smooth functions on <math>\mathbb R</math> that tend to zero at infinity, essentially by the [[Riemann-Lebesgue lemma]]. In fact, the elements of <math>H</math> are the restrictions to <math>\mathbb R</math> of entire [[holomorphic function]]s, by the [[Paley–Wiener theorem]].
From the [[Fourier inversion theorem]], we have
Line 64 ⟶ 70:
:<math> |f(x)| \le
\frac{1}{2 \pi} \sqrt{ 2a\int_{-a}^a
=\frac{
= \sqrt{\frac{a}{\pi}} \|f\|_{L^2}. </math>
Line 97 ⟶ 103:
:'''Theorem'''. Suppose ''K'' is a symmetric, [[positive definite kernel]] on a set ''X''. Then there is a unique Hilbert space of functions on ''X'' for which ''K'' is a reproducing kernel.
'''Proof'''. For all ''x'' in ''X'', define ''K<sub>x</sub>'' = ''K''(''x'', ⋅ ). Let ''H''<sub>0</sub> be the [[linear span]] of {''K<sub>x</sub>'' : ''x'' ∈ ''X''}. Define an inner product on ''H''<sub>0</sub> by
:<math> \left\langle \sum_{j=1}^n b_j K_{y_j}, \sum_{i=1}^m a_i K_{x_i} \right \rangle_{H_0} = \sum_{i=1}^m \sum_{j=1}^n {a_i} b_j K(y_j, x_i),</math>
Line 128 ⟶ 134:
We may characterize a symmetric positive definite kernel <math>K</math> via the integral operator using [[Mercer's theorem]] and obtain an additional view of the RKHS. Let <math>X</math> be a compact space equipped with a strictly positive finite [[Borel measure]] <math>\mu</math> and <math>K: X \times X \to \R</math> a continuous, symmetric, and positive definite function. Define the integral operator <math>T_K: L_2(X) \to L_2(X)</math> as
:<math> [T_K f](\cdot) =\int_X K({}\cdot{},t) f(t)\, d\mu(t) </math>
where <math>L_2(X)</math> is the space of square integrable functions with respect to <math> \mu </math>.
Line 134 ⟶ 140:
Mercer's theorem states that the spectral decomposition of the integral operator <math>T_K</math> of <math>K</math> yields a series representation of <math>K</math> in terms of the eigenvalues and eigenfunctions of <math> T_K </math>. This then implies that <math>K</math> is a reproducing kernel so that the corresponding RKHS can be defined in terms of these eigenvalues and eigenfunctions. We provide the details below.
Under these assumptions <math>T_K</math> is a compact, continuous, self-adjoint, and positive operator. The [[spectral theorem]] for self-adjoint operators implies that there is an at most countable decreasing sequence <math>(\sigma_i)
<math>T_K\varphi_i(x) = \sigma_i\varphi_i(x)</math>, where the <math>\{\varphi_i\}</math> form an orthonormal basis of <math>L_2(X)</math>. By the positivity of <math>T_K, \sigma_i > 0</math> for all <math>i.</math> One can also show that <math>T_K </math> maps continuously into the space of continuous functions <math>C(X)</math> and therefore we may choose continuous functions as the eigenvectors, that is, <math>\varphi_i \in C(X)</math> for all <math>i.</math> Then by Mercer's theorem <math> K </math> may be written in terms of the eigenvalues and continuous eigenfunctions as
Line 153 ⟶ 159:
:<math> \left\langle f,g \right\rangle_H = \sum_{i=1}^\infty \frac{\left\langle f,\varphi_i \right\rangle_{L_2}\left\langle g,\varphi_i \right\rangle_{L_2}}{\sigma_i}. </math>
This representation of the RKHS has application in probability and statistics, for example to the [[Karhunen–Loève theorem|
==Feature maps==
Line 180 ⟶ 186:
==Properties==
* Let <math>(X_i)_{i=1}^p</math> be a sequence of sets and <math>(K_i)_{i=1}^p</math> be a collection of corresponding positive definite functions on <math> (X_i)_{i=1}^p.</math> It then follows that
Line 210 ⟶ 216:
* '''Laplacian kernel''':
*::<math> K(x,y) = e^{-\frac{\|x - y\|}{\sigma}}, \qquad \sigma > 0 </math>
*:The squared norm of a function <math>f</math> in the RKHS <math>H</math> with this kernel is:<ref>Berlinet, Alain and Thomas, Christine. ''[https://books.google.com/books?id=bX3TBwAAQBAJ&dq=%22Reproducing+kernel+Hilbert+spaces+in+Probability+and+Statistics%22&pg=PP11 Reproducing kernel Hilbert spaces in Probability and Statistics]'', Kluwer Academic Publishers, 2004</ref><ref>Thomas-Agnan C . Computing a family of reproducing kernels for statistical applications. Numerical Algorithms, 13, pp. 21-32 (1996)</ref>
*::<math>\|f\|_H^2=\int_{\mathbb R}\Big( \frac1{\sigma} f(x)^2 + \sigma f'(x)^2\Big) \mathrm d x.</math>
Line 221 ⟶ 227:
In this case, ''H'' is isomorphic to <math>\Complex^n</math>.
The case of <math>X= \mathbb{D}</math> (where <math>\mathbb{D}</math> denotes the [[unit disc]]) is more sophisticated. Here the [[Bergman space]] [[
:<math>K(x,y)=\frac{1}{\pi}\frac{1}{(1-x\overline{y})^2}.</math>
Line 344 ⟶ 350:
|mr=290013
|doi-access=free}}
*Okutmustur, Baver. “Reproducing Kernel Hilbert Spaces,” M.S. dissertation, Bilkent University,
*Paulsen, Vern. “An introduction to the theory of reproducing kernel Hilbert spaces,”
* {{cite journal
|first1=Ingo |last1=Steinwart
Line 353 ⟶ 359:
|volume=35 |issue=3|year=2012 |pages=363–417
|mr=2914365 |doi=10.1007/s00365-012-9153-3
|s2cid=253885172
}}
* Rosasco, Lorenzo and Poggio, Thomas. "A Regularization Tour of Machine Learning – MIT 9.520 Lecture Notes" Manuscript, Dec. 2014.
* [[Grace Wahba|Wahba, Grace]], ''Spline Models for Observational Data'', [http://www.siam.org/books/ SIAM], 1990.
|