Regularization perspectives on support vector machines
where <math>\mathcal{H}</math> is a [[hypothesis space]]<ref>A hypothesis space is the set of functions used to model the data in a machine learning problem. Each function corresponds to a hypothesis about the structure of the data. Typically the functions in a hypothesis space form a [[Hilbert space]] of functions with norm formed from the loss function.</ref> of functions, <math>V:\mathbf Y \times \mathbf Y \to \mathbb R</math> is the loss function, <math>||\cdot||_\mathcal H</math> is a [[norm (mathematics)|norm]] on the hypothesis space of functions, and <math>\lambda\in\mathbb R</math> is the [[regularization parameter]].<ref>For insight on choosing the parameter, see, e.g., {{cite journal|last=Wahba|first=Grace|author2=Yonghua Wang |title=When is the optimal regularization parameter insensitive to the choice of the loss function|journal=Communications in Statistics - Theory and Methods|year=1990|volume=19|issue=5|pages=1685–1700|doi=10.1080/03610929008830285|url=http://www.tandfonline.com/doi/abs/10.1080/03610929008830285}}</ref>
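As a concrete illustration of this objective, the following is a minimal numerical sketch, assuming the hinge loss <math>V(y,f(x)) = \max(0, 1 - yf(x))</math> for labels <math>y \in \{-1,+1\}</math> and an empirical-mean data-fit term; the function names are illustrative rather than taken from any library.

<syntaxhighlight lang="python">
import numpy as np

def hinge_loss(y, fx):
    """Hinge loss V(y, f(x)) = max(0, 1 - y*f(x)) for labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * fx)

def regularized_risk(y, fx, f_norm_sq, lam):
    """Tikhonov objective: mean empirical loss plus lam * ||f||^2_H."""
    return hinge_loss(y, fx).mean() + lam * f_norm_sq
</syntaxhighlight>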
 
When <math>\mathcal{H}</math> is a [[reproducing kernel Hilbert space]], there exists a [[kernel function]] <math>K: \mathbf X \times \mathbf X \to \mathbb R</math> whose evaluations on the training data form an <math>n\times n</math> [[symmetric]] [[Positive-definite kernel|positive definite]] [[matrix (mathematics)|matrix]] <math>\mathbf K</math> with entries <math>\mathbf K_{ij} = K(x_i,x_j)</math>. By the [[representer theorem]],<ref>See {{cite journal|last=Schölkopf|first=Bernhard|author2=Ralf Herbrich|author3=Alex Smola|title=A Generalized Representer Theorem|journal=Computational Learning Theory: Lecture Notes in Computer Science|year=2001|volume=2111|pages=416–426|doi=10.1007/3-540-44581-1_27|url=http://www.springerlink.com/content/v1tvba62hd4837h9/?MUD=MP}}</ref> the minimizer can be written as <math>f(x_i) = \sum_{j=1}^n c_j \mathbf K_{ij}</math> for some <math>c \in \mathbb R^n</math>, and <math> \|f\|^2_{\mathcal H} = \langle f,f\rangle_{\mathcal H} = \sum_{i=1}^n\sum_{j=1}^n c_ic_jK(x_i,x_j) = c^T\mathbf K c.</math>
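The sketch below illustrates these computations: it forms the Gram matrix <math>\mathbf K</math>, solves for the coefficients <math>c</math>, and evaluates <math>f(x_i)</math> and <math>c^T\mathbf K c</math>. The Gaussian RBF kernel is an assumed example, and the square loss is substituted for the hinge loss so that <math>c</math> has a closed form; with the hinge loss, <math>c</math> would instead come from the usual SVM optimization.

<syntaxhighlight lang="python">
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2), one common choice."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy data: n points in R^2 with labels y in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0] + X[:, 1])

K = rbf_kernel(X, X)   # n x n symmetric positive-definite Gram matrix
lam, n = 0.1, len(y)

# With the square loss (used here only because it admits a closed form),
# minimizing (1/n)||y - Kc||^2 + lam * c^T K c over c reduces to solving
# the linear system (K + lam * n * I) c = y.
c = np.linalg.solve(K + lam * n * np.eye(n), y)

f_at_train = K @ c     # f(x_i) = sum_j c_j K_ij
f_norm_sq = c @ K @ c  # ||f||^2_H = c^T K c
</syntaxhighlight>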
 
==Special properties of the hinge loss==
==References==
{{Reflist}}
 
*{{cite journal|last=Evgeniou|first=Theodoros|author2=Massimiliano Pontil|author3=Tomaso Poggio|title=Regularization Networks and Support Vector Machines|journal=Advances in Computational Mathematics|year=2000|volume=13|issue=1|pages=1–50|doi=10.1023/A:1018946025316|url=http://cbcl.mit.edu/projects/cbcl/publications/ps/evgeniou-reviewall.pdf}}
 
*{{cite web|last=Joachims|first=Thorsten|title=SVMlight|url=http://svmlight.joachims.org/}}