Regularization perspectives on support vector machines

{{context|date=May 2012}}
'''Regularization perspectives on support vector machines''' provide a way of interpreting [[Support vector machine]]s (SVMs) in the context of other machine learning algorithms. SVM algorithms categorize [[multidimensional]] data, with the goal of fitting the [[training set]] data well while also avoiding [[overfitting]], so that the solution [[generalizes]] to new data points. [[Regularization]] algorithms also aim to fit training set data and avoid overfitting. They do this by choosing a fitting function that has low error on the training set but is also not too complicated, where complicated functions are functions with high [[norm]]s in some [[function space]]. Specifically, [[Tikhonov regularization]] algorithms choose a function that minimizes the sum of the training set error and the function's norm. The training set error can be calculated with different [[loss function]]s. For example, [[regularized least squares]] is a special case of Tikhonov regularization using the [[squared error loss]].
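In symbols, a minimal sketch of this setup (with training data <math>(x_i, y_i)</math> for <math>i = 1, \ldots, n</math>, a hypothesis space <math>\mathcal{H}</math>, a loss function <math>V</math>, and a regularization parameter <math>\lambda > 0</math>; this notation is assumed here for illustration rather than fixed by the sources below) is the minimization problem

:<math>\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^n V\bigl(y_i, f(x_i)\bigr) + \lambda \|f\|_{\mathcal{H}}^2,</math>

where the first term measures training set error and the second term penalizes functions with a large norm. Choosing <math>V</math> to be the squared error loss <math>V(y, f(x)) = (y - f(x))^2</math> recovers regularized least squares.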

Regularization perspectives on support vector machines interpret SVM as a special case of [[Tikhonov regularization]], specifically Tikhonov regularization with the [[hinge loss]] as the [[loss function]]. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to [[generalize]] without [[overfitting]]. SVM was first proposed in 1995 by [[Corinna Cortes]] and [[Vladimir Vapnik]], and framed geometrically as a method for finding [[hyperplane]]s that can separate [[multidimensional]] data into two categories.<ref>{{cite journal|last=Cortes|first=Corinna|coauthors=Vladimir Vapnik|title=Support-Vector Networks|journal=Machine Learning|year=1995|volume=20|pages=273-297|doi=10.1007/BF00994018|url=http://www.springerlink.com/content/k238jx04hm87j80g/?MUD=MP}}</ref> This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other [[machine learning]] techniques for avoiding overfitting, such as [[regularization]], [[early stopping]], [[sparsity]] and [[Bayesian inference]]. However, once it was discovered that SVM is also a [[special case]] of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms.<ref>{{cite web|last=Rosasco|first=Lorenzo|title=Regularized Least-Squares and Support Vector Machines|url=http://www.mit.edu/~9.520/spring12/slides/class06/class06_RLSSVM.pdf}}
</ref><ref>{{cite book|last=Rifkin|first=Ryan|title=Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning|year=2002|publisher=MIT (PhD thesis)|url=http://web.mit.edu/~9.520/www/Papers/thesis-rifkin.pdf}}
</ref><ref>{{cite journal|last=Lee|first=Yoonkyung|coauthors=Grace Wahba|title=Multicategory Support Vector Machines|journal=Journal of the American Statistical Association|year=2004|volume=99|issue=465|pages=67-81|doi=10.1198/016214504000000098|url=http://www.tandfonline.com/doi/abs/10.1198/016214504000000098}}</ref>
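
As a sketch of this connection (using the same notation as above, with labels <math>y_i \in \{-1, +1\}</math>), the SVM estimator can be written as Tikhonov regularization with the hinge loss:

:<math>\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^n \bigl(1 - y_i f(x_i)\bigr)_+ + \lambda \|f\|_{\mathcal{H}}^2,</math>

where <math>(s)_+ = \max(s, 0)</math>, so the hinge loss <math>(1 - y f(x))_+</math> is zero when <math>y f(x) \ge 1</math> and grows linearly otherwise.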