Sequential minimal optimization: Difference between revisions

Content deleted Content added
fixed overcounting
Citation bot (talk | contribs)
m Alter: journal, isbn, pages. Add: citeseerx. | You can use this bot yourself. Report bugs here. | Headbomb
Line 23:
| url = http://dspace.mit.edu/handle/1721.1/17549
| title = Everything Old is New Again: a Fresh Look at Historical Approaches in Machine Learning
| journal = Ph.D. thesisThesis
| pages = 18
}}</ref>
Line 55:
 
== Related Work ==
The first approach to splitting large SVM learning problems into a series of smaller optimization tasks was proposed by [[Bernhard E Boser]], [[Isabelle M Guyon]], [[Vladimir Vapnik]].<ref name="ReferenceA">{{Cite book | doi = 10.1145/130385.130401| chapter = A training algorithm for optimal margin classifiers| title = Proceedings of the fifth annual workshop on Computational learning theory - COLT '92| pages = 144| year = 1992| last1 = Boser | first1 = B. E. | last2 = Guyon | first2 = I. M. | last3 = Vapnik | first3 = V. N. | isbn = 089791497X978-0897914970| citeseerx = 10.1.1.21.3818}}</ref> It is known as the "chunking algorithm". The algorithm starts with a random subset of the data, solves this problem, and iteratively adds examples which violate the optimality conditions. One disadvantage of this algorithm is that it is necessary to solve QP-problems scaling with the number of SVs. On real world sparse data sets, SMO can be more than 1000 times faster than the chunking algorithm.<ref name = "Platt"/>
 
In 1997, [[E. Osuna]], [[R. Freund]], and [[F. Girosi]] proved a theorem which suggests a whole new set of QP algorithms for SVMs.<ref>{{Cite book | doi = 10.1109/NNSP.1997.622408| chapter = An improved training algorithm for support vector machines| title = Neural Networks for Signal Processing [1997] VII. Proceedings of the 1997 IEEE Workshop| pages = 276 - 285276–285| year = 1997| last1 = Osuna | first1 = E. | last2 = Freund | first2 = R. | last3 = Girosi | first3 = F. | isbn = 978-0-7803-4256-9| citeseerx = 10.1.1.392.7405}}</ref> By the virtue of this theorem a large QP problem can be broken down into a series of smaller QP sub-problems. A sequence of QP sub-problems that always add at least one violator of the [[Karush–Kuhn–Tucker conditions|Karush–Kuhn–Tucker (KKT) conditions]] is guaranteed to converge. The chunking algorithm obeys the conditions of the theorem, and hence will converge.<ref name = "Platt"/> The SMO algorithm can be considered a special case of the Osuna algorithm, where the size of the optimization is two and both Lagrange multipliers are replaced at every step with new multipliers that are chosen via good heuristics.<ref name = "Platt"/>
 
The SMO algorithm is closely related to a family of optimization algorithms called [[Bregman method]]s or row-action methods. These methods solve convex programming problems with linear constraints. They are iterative methods where each step projects the current primal point onto each constraint.<ref name = "Platt"/>