Regularization perspectives on support vector machines

{{context|date=May 2012}}
'''Regularization perspectives on support vector machines''' provide a way of interpreting [[Support vector machine]]s (SVMs) in the context of other machine learning algorithms. SVM algorithms categorize [[multidimensional]] data, with the goal of fitting the [[training set]] data well while also avoiding [[overfitting]], so that the solution [[generalizes]] to new data points. [[Regularization]] algorithms also aim to fit training set data and avoid overfitting. They do this by choosing a fitting function that has low error on the training set but is also not too complicated, where complicated functions are functions with high [[norm]]s in some [[function space]]. Specifically, [[Tikhonov regularization]] algorithms choose a function that minimizes the sum of the training set error and the function's norm. The training set error can be calculated with different [[loss function]]s. For example, [[regularized least squares]] is a special case of Tikhonov regularization using the [[squared error loss]].
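In symbols, a minimal sketch of this setup (with training data <math>(x_i, y_i)</math> for <math>i = 1, \ldots, n</math>, a hypothesis space <math>\mathcal{H}</math>, a loss function <math>V</math>, and a regularization parameter <math>\lambda > 0</math>; this notation is assumed here for illustration rather than fixed by the sources below) is the minimization problem

:<math>\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^n V\bigl(y_i, f(x_i)\bigr) + \lambda \|f\|_{\mathcal{H}}^2,</math>

where the first term measures training set error and the second term penalizes functions with a large norm. Choosing <math>V</math> to be the squared error loss <math>V(y, f(x)) = (y - f(x))^2</math> recovers regularized least squares.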

Regularization perspectives on support vector machines interpret SVM as a special case of [[Tikhonov regularization]], specifically Tikhonov regularization with the [[hinge loss]] as the [[loss function]]. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to [[generalize]] without [[overfitting]]. SVM was first proposed in 1995 by [[Corinna Cortes]] and [[Vladimir Vapnik]], and framed geometrically as a method for finding [[hyperplane]]s that can separate [[multidimensional]] data into two categories.<ref>{{cite journal|last=Cortes|first=Corinna|coauthors=Vladimir Vapnik|title=Support-Vector Networks|journal=Machine Learning|year=1995|volume=20|pages=273-297|doi=10.1007/BF00994018|url=http://www.springerlink.com/content/k238jx04hm87j80g/?MUD=MP}}</ref> This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other [[machine learning]] techniques for avoiding overfitting, such as [[regularization]], [[early stopping]], [[sparsity]] and [[Bayesian inference]]. However, once it was discovered that SVM is also a [[special case]] of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms.<ref>{{cite web|last=Rosasco|first=Lorenzo|title=Regularized Least-Squares and Support Vector Machines|url=http://www.mit.edu/~9.520/spring12/slides/class06/class06_RLSSVM.pdf}}
</ref><ref>{{cite book|last=Rifkin|first=Ryan|title=Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning|year=2002|publisher=MIT (PhD thesis)|url=http://web.mit.edu/~9.520/www/Papers/thesis-rifkin.pdf}}
</ref><ref>{{cite journal|last=Lee|first=Yoonkyung|coauthors=Grace Wahba|title=Multicategory Support Vector Machines|journal=Journal of the American Statistical Association|year=2004|volume=99|issue=465|pages=67-81|doi=10.1198/016214504000000098|url=http://www.tandfonline.com/doi/abs/10.1198/016214504000000098}}</ref>
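
As a sketch of this connection (using the same notation as above, with labels <math>y_i \in \{-1, +1\}</math>), the SVM estimator can be written as Tikhonov regularization with the hinge loss:

:<math>\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^n \bigl(1 - y_i f(x_i)\bigr)_+ + \lambda \|f\|_{\mathcal{H}}^2,</math>

where <math>(s)_+ = \max(s, 0)</math>, so the hinge loss <math>(1 - y f(x))_+</math> is zero when <math>y f(x) \ge 1</math> and grows linearly otherwise.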