Reproducing kernel Hilbert space: Difference between revisions

Content deleted Content added
Stated uniform convergence more sharply
Link suggestions feature: 2 links added.
 
(15 intermediate revisions by 8 users not shown)
Line 2:
[[File:Different Views on RKHS.png|thumb|right|Figure illustrates related but varying approaches to viewing RKHS]]
 
In [[functional analysis]], a '''reproducing kernel Hilbert space''' ('''RKHS''') is a [[Hilbert space]] of functions in which point evaluation is a continuous linear [[functionallinear (mathematics)|functional]]. Specifically, a Hilbert space <math>H</math> of functions from a set <math>X</math> (to <math>\mathbb{R}</math> or <math>\mathbb{C}</math> ) is an RKHS if, forthe eachpoint-evaluation functional <math>L_x:H\to\mathbb{C}</math>, <math>L_x(f)=f(x)</math>, is continuous for every <math>x\in X</math>. Equivalently, <math>H</math> is an RKHS if there exists a function <math>K_x \in H</math> such that, for all <math>f \in H</math>,<math display="block">\langle f, K_x \rangle = f(x).</math>The function <math>K_x</math> is then called the ''reproducing kernel'', and it reproduces the value of <math>f</math> at <math>x</math> via the inner product.
 
An immediate consequence of this property is that convergence in norm implies [[uniform convergence]] on any subset of <math>X</math> on which <math>\|K_x\|</math> is bounded. However, the converse does not necessarily hold. Often the set <math>X</math> carries a topology, and <math>\|K_x\|</math> depends continuously on <math>x\in X</math>, in which case: convergence in norm implies uniform convergence on compact subsets of <math>X</math>.
<math>\langle f, K_x \rangle = f(x).</math>
 
The function <math>K_x</math> is called the reproducing kernel, and it reproduces the value of <math>f</math> at <math>x</math> via the inner product.
 
An immediate consequence of this property is that convergence in norm implies uniform convergence on any subset of <math>X</math> on which <math>\|K_x\|</math> is bounded. However, the converse does not necessarily hold.
 
For example, consider the sequence of functions <math>\sin^{2n}(x)</math>. These functions converge pointwise to 0 as <math>n \to \infty</math> , but they do not converge uniformly (i.e., they do not converge with respect to the supremum norm). This illustrates that pointwise convergence does not imply convergence in norm. It is important to note that the supremum norm does not arise from any inner product, as it does not satisfy the [[Polarization identity|parallelogram law]].
 
It is not entirely straightforward to construct natural examples of a Hilbert space which are not an RKHS in a non-trivial fashion.<ref>Alpay, D., and T. M. Mills. "A family of Hilbert spaces which are not reproducing kernel Hilbert spaces." J. Anal. Appl. 1.2 (2003): 107–111.</ref> Some examples, however, have been found.<ref> Z. Pasternak-Winiarski, "On weights which admit reproducing kernel of Bergman type", ''International Journal of Mathematics and Mathematical Sciences'', vol. 15, Issue 1, 1992. </ref><ref> T. Ł. Żynda, "On weights which admit reproducing kernel of Szegő type", ''Journal of Contemporary Mathematical Analysis'' (Armenian Academy of Sciences), 55, 2020. </ref>
 
While, formally, [[Square-integrable function|''L''<sup>2</sup> spaces]] is usuallyare defined as a Hilbert spacespaces whose elements areof equivalence classes of functions, itthis definition can trivially be triviallyextended redefined asto a Hilbert space of functions by using choice to selectchoosing a (total) function as a representative for each equivalence class. However, no choice of representatives can make this space an RKHS (<math>K_0</math> would need to be the non-existent Dirac delta function). However, there are RKHSs in which the norm is an ''L''<sup>2</sup>-norm, such as the space of band-limited functions (see the example below).
 
An RKHS is associated with a kernel that reproduces every function in the space in the sense that for every <math>x</math> in the set on which the functions are defined, "evaluation at <math>x</math>" can be performed by taking an inner product with a function determined by the kernel. Such a ''reproducing kernel'' exists if and only if every evaluation functional is continuous.
 
The reproducing kernel was first introduced in the 1907 work of [[Stanisław Zaremba (mathematician)|Stanisław Zaremba]]{{fact|date=June 2025}} concerning [[boundary value problem]]s for [[Harmonic function|harmonic]] and [[Biharmonic equation|biharmonic functions]]. [[James Mercer (mathematician)|James Mercer]] simultaneously examined [[Positive-definite kernel|functions]] which satisfy the reproducing property in the theory of [[integral equation]]s. The idea of the reproducing kernel remained untouched for nearly twenty years until it appeared in the dissertations of [[Gábor Szegő]], [[Stefan Bergman]], and [[Salomon Bochner]]. The subject was eventually systematically developed in the early 1950s by [[Nachman Aronszajn]] and Stefan Bergman.<ref>Okutmustur</ref>
 
These spaces have wide applications, including [[complex analysis]], [[harmonic analysis]], and [[quantum mechanics]]. Reproducing kernel Hilbert spaces are particularly important in the field of [[statistical learning theory]] because of the celebrated [[representer theorem]] which states that every function in an RKHS that minimises an empirical risk functional can be written as a [[linear combination]] of the kernel function evaluated at the training points. This is a practically useful result as it effectively simplifies the [[empirical risk minimization]] problem from an infinite dimensional to a finite dimensional optimization problem.
Line 109 ⟶ 103:
:'''Theorem'''. Suppose ''K'' is a symmetric, [[positive definite kernel]] on a set ''X''. Then there is a unique Hilbert space of functions on ''X'' for which ''K'' is a reproducing kernel.
 
'''Proof'''. For all ''x'' in ''X'', define ''K<sub>x</sub>'' = ''K''(''x'', ⋅ ). Let ''H''<sub>0</sub> be the [[linear span]] of {''K<sub>x</sub>'' : ''x'' ∈ ''X''}. Define an inner product on ''H''<sub>0</sub> by
 
:<math> \left\langle \sum_{j=1}^n b_j K_{y_j}, \sum_{i=1}^m a_i K_{x_i} \right \rangle_{H_0} = \sum_{i=1}^m \sum_{j=1}^n {a_i} b_j K(y_j, x_i),</math>
Line 140 ⟶ 134:
We may characterize a symmetric positive definite kernel <math>K</math> via the integral operator using [[Mercer's theorem]] and obtain an additional view of the RKHS. Let <math>X</math> be a compact space equipped with a strictly positive finite [[Borel measure]] <math>\mu</math> and <math>K: X \times X \to \R</math> a continuous, symmetric, and positive definite function. Define the integral operator <math>T_K: L_2(X) \to L_2(X)</math> as
 
:<math> [T_K f](\cdot) =\int_X K({}\cdot{},t) f(t)\, d\mu(t) </math>
 
where <math>L_2(X)</math> is the space of square integrable functions with respect to <math> \mu </math>.
Line 146 ⟶ 140:
Mercer's theorem states that the spectral decomposition of the integral operator <math>T_K</math> of <math>K</math> yields a series representation of <math>K</math> in terms of the eigenvalues and eigenfunctions of <math> T_K </math>. This then implies that <math>K</math> is a reproducing kernel so that the corresponding RKHS can be defined in terms of these eigenvalues and eigenfunctions. We provide the details below.
 
Under these assumptions <math>T_K</math> is a compact, continuous, self-adjoint, and positive operator. The [[spectral theorem]] for self-adjoint operators implies that there is an at most countable decreasing sequence <math>(\sigma_i)_i_{i \geq 0} </math> such that <math display="inline">\lim_{i \to \infty}\sigma_i = 0</math> and
<math>T_K\varphi_i(x) = \sigma_i\varphi_i(x)</math>, where the <math>\{\varphi_i\}</math> form an orthonormal basis of <math>L_2(X)</math>. By the positivity of <math>T_K, \sigma_i > 0</math> for all <math>i.</math> One can also show that <math>T_K </math> maps continuously into the space of continuous functions <math>C(X)</math> and therefore we may choose continuous functions as the eigenvectors, that is, <math>\varphi_i \in C(X)</math> for all <math>i.</math> Then by Mercer's theorem <math> K </math> may be written in terms of the eigenvalues and continuous eigenfunctions as
 
Line 165 ⟶ 159:
:<math> \left\langle f,g \right\rangle_H = \sum_{i=1}^\infty \frac{\left\langle f,\varphi_i \right\rangle_{L_2}\left\langle g,\varphi_i \right\rangle_{L_2}}{\sigma_i}. </math>
 
This representation of the RKHS has application in probability and statistics, for example to the [[Karhunen–Loève theorem|Karhunen-LoèveKarhunen–Loève representation]] for stochastic processes and [[kernel PCA]].
 
==Feature maps==
Line 233 ⟶ 227:
In this case, ''H'' is isomorphic to <math>\Complex^n</math>.
 
The case of <math>X= \mathbb{D}</math> (where <math>\mathbb{D}</math> denotes the [[unit disc]]) is more sophisticated. Here the [[Bergman space]] [[HA square|<math>HA^2(\mathbb{D})</math>]] is the space of [[square-integrable function|square-integrable]] [[holomorphic function]]s on <math>\mathbb{D}</math>. It can be shown that the reproducing kernel for <math>HA^2(\mathbb{D})</math> is
 
:<math>K(x,y)=\frac{1}{\pi}\frac{1}{(1-x\overline{y})^2}.</math>