: <math>f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n V(f(x_i), y_i) + \lambda \|f\|_{\mathcal{H}}^2 \right\},</math>
where <math>\mathcal{H}</math> is a [[hypothesis space]]<ref>A hypothesis space is the set of functions used to model the data in a machine-learning problem. Each function corresponds to a hypothesis about the structure of the data. Typically the functions in a hypothesis space form a [[Hilbert space]] of functions with norm formed from the loss function.</ref> of functions, <math>V \colon \mathbf Y \times \mathbf Y \to \mathbb R</math> is the loss function, <math>\|\cdot\|_\mathcal H</math> is a [[norm (mathematics)|norm]] on <math>\mathcal{H}</math>, and <math>\lambda \in \mathbb R</math> is the [[regularization parameter]].<ref>For insight on choosing the parameter, see, e.g., {{cite journal |last=Wahba |first=Grace |author2=Yonghua Wang |title=When is the optimal regularization parameter insensitive to the choice of the loss function |journal=Communications in Statistics – Theory and Methods |year=1990 |volume=19 |issue=5 |pages=1685–1700 |doi=10.1080/03610929008830285 }}</ref>
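For example, choosing the square loss <math>V(f(x), y) = (f(x) - y)^2</math> specializes the problem to regularized least squares:
: <math>f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n (f(x_i) - y_i)^2 + \lambda \|f\|_{\mathcal{H}}^2 \right\}.</math>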
When <math>\mathcal{H}</math> is a [[reproducing kernel Hilbert space]], there exists a [[kernel function]] <math>K \colon \mathbf X \times \mathbf X \to \mathbb R</math> whose evaluations at the training points form an <math>n \times n</math> [[symmetric]] [[Positive-definite kernel|positive-definite]] [[matrix (mathematics)|matrix]] <math>\mathbf K</math> with entries <math>\mathbf K_{ij} = K(x_i, x_j)</math>. By the [[representer theorem]],<ref>{{citation
| last1 = Schölkopf | first1 = Bernhard
| last2 = Herbrich | first2 = Ralf
| last3 = Smola | first3 = Alexander J.
| editor1-last = Helmbold | editor1-first = David P.
| editor2-last = Williamson | editor2-first = Robert C.
| contribution = A generalized representer theorem
| doi = 10.1007/3-540-44581-1_27
| pages = 416–426
| publisher = Springer
| series = Lecture Notes in Computer Science
| title = Computational Learning Theory, 14th Annual Conference on Computational Learning Theory, COLT 2001 and 5th European Conference on Computational Learning Theory, EuroCOLT 2001, Amsterdam, The Netherlands, July 16–19, 2001, Proceedings
| volume = 2111
| year = 2001}}</ref>
the minimizer of the regularized problem can be written as
: <math>f(x) = \sum_{j=1}^n c_j K(x, x_j)</math>
for some coefficients <math>c = (c_1, \ldots, c_n)^T \in \mathbb{R}^n</math>, so that
: <math>f(x_i) = \sum_{j=1}^n c_j \mathbf K_{ij} \quad \text{and} \quad \|f\|^2_{\mathcal H} = \langle f, f\rangle_\mathcal H = \sum_{i=1}^n \sum_{j=1}^n c_i c_j K(x_i, x_j) = c^T \mathbf K c.</math>
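Substituting this expansion reduces the infinite-dimensional minimization over <math>\mathcal H</math> to a finite-dimensional one over <math>c</math>. The following is a minimal illustrative sketch, not drawn from the cited sources, assuming the square loss and a Gaussian (RBF) kernel: under those assumptions the first-order optimality condition gives the linear system <math>(\mathbf K + \lambda n \mathbf I)c = y</math>, which can be solved directly. The kernel choice, data, and function names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2).
    sq_dists = ((A**2).sum(axis=1)[:, None]
                + (B**2).sum(axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-gamma * sq_dists)

def fit_coefficients(X, y, lam, gamma=1.0):
    # Solve (K + lam * n * I) c = y, the first-order condition for the
    # square loss, where K is the symmetric positive-definite Gram matrix.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, c, X_new, gamma=1.0):
    # Evaluate f(x) = sum_j c_j K(x, x_j) at the new points.
    return rbf_kernel(X_new, X_train, gamma) @ c

# Toy usage: fit noisy samples of sin(x) and evaluate at two new points.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
c = fit_coefficients(X, y, lam=0.01, gamma=0.5)
print(predict(X, c, np.array([[0.0], [1.5]]), gamma=0.5))
</syntaxhighlight>

Any other symmetric positive-definite kernel could be substituted for the Gaussian kernel without changing the structure of the computation.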