Regularization perspectives on support vector machines

Within [[mathematical analysis]], '''Regularization perspectives on support-vector machines''' provide a way of interpreting [[support-vector machine]]s (SVMs) in the context of other machine-learning algorithms. SVM algorithms categorize [[multidimensional]] data, with the goal of fitting the [[training set]] data well, but also avoiding [[overfitting]], so that the solution [[generalize]]s to new data points. [[Regularization (mathematics)|Regularization]] algorithms also aim to fit training set data and avoid overfitting. They do this by choosing a fitting function that has low error on the training set, but also is not too complicated, where complicated functions are functions with high [[norm (mathematics)|norm]]s in some [[function space]]. Specifically, [[Tikhonov regularization]] algorithms choose a function that minimizes the sum of training-set error plus the function's norm. The training-set error can be calculated with different [[loss function]]s. For example, [[regularized least squares]] is a special case of Tikhonov regularization using the [[squared error loss]] as the loss function.<ref name="rosasco1">{{cite web |last=Rosasco |first=Lorenzo |title=Regularized Least-Squares and Support Vector Machines |url=https://www.mit.edu/~9.520/spring12/slides/class06/class06_RLSSVM.pdf}}</ref>
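
In symbols, using the standard notation of statistical learning theory (not drawn verbatim from the cited sources), a Tikhonov regularization problem selects the estimator
<math display="block">\hat f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{n}\sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \lambda \|f\|_{\mathcal{H}}^{2},</math>
where <math>V</math> is a loss function measuring the training-set error, <math>\|f\|_{\mathcal{H}}</math> is the norm of <math>f</math> in a hypothesis space <math>\mathcal{H}</math> (typically a [[reproducing kernel Hilbert space]]), and <math>\lambda > 0</math> controls the trade-off between fitting the data and keeping the function simple. Regularized least squares corresponds to the choice <math>V(y, f(x)) = (y - f(x))^2</math>.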
 
Regularization perspectives on support-vector machines interpret SVM as a special case of Tikhonov regularization, specifically Tikhonov regularization with the [[hinge loss]] for a loss function. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to [[generalize]] without [[overfitting]]. SVM was first proposed in 1995 by [[Corinna Cortes]] and [[Vladimir Vapnik]], and framed geometrically as a method for finding [[hyperplane]]s that can separate [[multidimensional]] data into two categories.<ref>{{cite journal |last=Cortes |first=Corinna |author2=Vladimir Vapnik |title=Support-Vector Networks |journal=Machine Learning |year=1995 |volume=20 |issue=3 |pages=273–297 |doi=10.1007/BF00994018 |doi-access=free }}</ref> This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other [[machine learning|machine-learning]] techniques for avoiding overfitting, like [[regularization (mathematics)|regularization]], [[early stopping]], [[sparsity]] and [[Bayesian inference]]. However, once it was discovered that SVM is also a [[special case]] of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms.<ref name="rosasco1"/><ref>{{cite book |last=Rifkin |first=Ryan |title=Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning |year=2002 |publisher=MIT (PhD thesis) |url=http://web.mit.edu/~9.520/www/Papers/thesis-rifkin.pdf}}</ref><ref name="Lee 2012 67–81">{{cite journal |last1=Lee |first1=Yoonkyung |author1-link=Yoonkyung Lee |first2=Grace |last2=Wahba |author2-link=Grace Wahba |title=Multicategory Support Vector Machines |journal=Journal of the American Statistical Association |year=2004 |volume=99 |issue=465 |pages=67–81 |doi=10.1198/016214504000000098 }}</ref> This has enabled detailed comparisons between SVM and other forms of Tikhonov regularization, and has provided theoretical grounding for why it is beneficial to use SVM's loss function, the hinge loss.<ref name="Rosasco 2004 1063–1076">{{cite journal |authors=Rosasco L., De Vito E., Caponnetto A., Piana M., Verri A. |title=Are Loss Functions All the Same? |journal=Neural Computation |date=May 2004 |volume=16 |issue=5 |series=5 |pages=1063–1076 |doi=10.1162/089976604773135104 |pmid=15070510 |citeseerx=10.1.1.109.6786 }}</ref>
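
Concretely, and again using standard notation rather than notation taken directly from the cited sources, interpreting SVM as Tikhonov regularization amounts to choosing the hinge loss <math>V(y, f(x)) = \max\bigl(0, 1 - y f(x)\bigr)</math> for labels <math>y \in \{-1, +1\}</math>, so that the SVM classifier solves
<math display="block">\hat f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0, 1 - y_i f(x_i)\bigr) + \lambda \|f\|_{\mathcal{H}}^{2},</math>
with new points classified according to the sign of <math>\hat f</math>.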