When all the Lagrange multipliers satisfy the KKT conditions (within a user-defined tolerance), the problem has been solved. Although this algorithm is guaranteed to converge, heuristics are used to choose the pair of multipliers so as to accelerate the rate of convergence. This is critical for large data sets since there are <math>n(n-1)/2</math> possible choices for <math>\alpha_i</math> and <math>\alpha_j</math>.
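The KKT check and a common second-choice heuristic can be sketched as follows. This is a minimal illustration, not Platt's exact pseudocode: the helper names `violates_kkt` and `pick_second` are invented for this sketch, and `f_i` stands for the current decision value <math>f(x_i)</math>.

```python
def violates_kkt(alpha_i, y_i, f_i, C, tol):
    """Return True if one multiplier violates the KKT conditions
    beyond tolerance tol (illustrative helper, not Platt's code)."""
    r = y_i * f_i - 1.0
    # alpha_i == 0      requires y_i * f(x_i) >= 1
    # 0 < alpha_i < C   requires y_i * f(x_i) == 1
    # alpha_i == C      requires y_i * f(x_i) <= 1
    return (r < -tol and alpha_i < C) or (r > tol and alpha_i > 0)

def pick_second(i, errors, candidates):
    """Second-choice heuristic: pick j maximizing |E_i - E_j|,
    which tends to produce the largest step on alpha_j."""
    return max(candidates, key=lambda j: abs(errors[i] - errors[j]))
```

In a full implementation the outer loop would sweep over all multipliers (or only the non-bound ones) looking for a violator `i`, then call the second-choice heuristic to select its partner `j`.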
== Related work ==
The first approach to splitting large SVM learning problems into a series of smaller optimization tasks was proposed by [[Bernhard Boser]], [[Isabelle Guyon]], and [[Vladimir Vapnik]].<ref name="ReferenceA">{{Cite book | doi = 10.1145/130385.130401| chapter = A training algorithm for optimal margin classifiers| title = Proceedings of the fifth annual workshop on Computational learning theory - COLT '92| pages = 144| year = 1992| last1 = Boser | first1 = B. E. | last2 = Guyon | first2 = I. M. | last3 = Vapnik | first3 = V. N. | isbn = 978-0897914970| citeseerx = 10.1.1.21.3818| s2cid = 207165665}}</ref> It is known as the "chunking algorithm". The algorithm starts with a random subset of the data, solves this problem, and iteratively adds examples that violate the optimality conditions. One disadvantage of this algorithm is that it must solve QP sub-problems whose size scales with the number of support vectors. On real-world sparse data sets, SMO can be more than 1000 times faster than the chunking algorithm.<ref name = "Platt"/>
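The control flow described above (solve on a working set, then add violators) can be sketched schematically. The QP solver and the optimality test are supplied as placeholder callables, and the initial chunk is chosen deterministically for simplicity; this is a sketch of the general pattern, not Boser et al.'s exact procedure.

```python
def chunking(X, y, solve_qp, violates, chunk_size=2, max_iter=100):
    """Sketch of the chunking loop: solve the QP restricted to a
    working set, then add examples that violate optimality.
    solve_qp and violates are caller-supplied placeholders."""
    n = len(y)
    work = list(range(min(chunk_size, n)))  # initial chunk
    model = None
    for _ in range(max_iter):
        model = solve_qp([(X[i], y[i]) for i in work])
        violators = [i for i in range(n)
                     if i not in work and violates(model, X[i], y[i])]
        if not violators:
            break                # all examples satisfy optimality
        work.extend(violators)   # grow the working set and re-solve
    return model, sorted(work)
```

Note that the working set can grow up to the full set of support vectors, which is exactly the scaling drawback mentioned above; SMO avoids it by fixing the sub-problem size at two multipliers.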