The Ho–Kashyap algorithm is an iterative method in machine learning for finding a linear decision boundary that separates two linearly separable classes. It was developed by Yu-Chi Ho and Rangasami L. Kashyap in 1965,[1][2] and is usually presented as a problem in linear programming.

Setup


Given a training set consisting of samples from two classes, the Ho–Kashyap algorithm seeks to find a weight vector $a$ and a margin vector $b$ such that $Y a = b$, where $Y$ is the augmented data matrix with samples from both classes (with appropriate sign conventions, e.g., samples from class 2 are negated), $a$ is the weight vector to be determined, and $b > 0$ is a positive margin vector.

The algorithm minimizes the criterion function $J(a, b) = \|Y a - b\|^2$, subject to the constraint that $b > 0$ (element-wise).

Given a problem of linearly separating two classes, we consider a dataset of $N$ elements $(x_i, y_i)$, where $y_i \in \{-1, +1\}$. Linearly separating them by a perceptron is equivalent to finding a weight vector $w$ and bias $w_0$ for the perceptron such that $y_i (w^\top x_i + w_0) > 0$ for all $i$. Stacking the rows $y_i \, [x_i^\top, 1]$ yields the augmented matrix $Y$ above, so this condition reads $Y a > 0$ with $a = [w^\top, w_0]^\top$.
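
For concreteness, the following sketch shows one way to construct the sign-normalized augmented matrix $Y$ from labelled samples, assuming NumPy and the $\pm 1$ labelling convention above; the function and variable names are illustrative, not taken from the original papers.

import numpy as np

def build_augmented_matrix(X, y):
    """Build the Ho-Kashyap data matrix Y.

    X is an (N, d) array of samples and y an (N,) array of labels in {-1, +1}.
    Each row of Y is the augmented sample [x_i, 1] multiplied by its label,
    so linear separability is equivalent to Y @ a > 0 for some vector a.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append the bias feature
    return y[:, None] * X_aug                          # negate rows of class -1

# Example: two points from each class in the plane.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
Y = build_augmented_matrix(X, y)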

Algorithm


The idea of the Ho–Kashyap rule is as follows:

  • Given any $b$, the corresponding $a$ is known: it is simply $a = Y^+ b$, where $Y^+$ denotes the Moore–Penrose pseudoinverse of $Y$; this is the least-squares minimizer of $\|Y a - b\|^2$ for that fixed $b$ (illustrated in the sketch after this list).
  • Therefore, it only remains to find $b$ by gradient descent on the criterion.
  • However, the gradient descent step may sometimes decrease some of the coordinates of $b$, which may cause some coordinates of $b$ to become negative, which is undesirable. Therefore, whenever some coordinates of $b$ would have decreased, those coordinates are left unchanged instead; the coordinates of $b$ that would have increased are increased without issue.
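
As a minimal illustration of the closed-form step in the first bullet (assuming NumPy; the toy matrix reuses the example from the Setup section):

import numpy as np

# For a fixed positive margin vector b, the weight vector minimizing
# ||Y a - b||^2 is obtained from the Moore-Penrose pseudoinverse of Y.
Y = np.array([[ 2.0,  1.0,  1.0],
              [ 1.5,  2.0,  1.0],
              [ 1.0,  0.5, -1.0],
              [ 2.0,  1.5, -1.0]])   # augmented, sign-normalized data matrix
b = np.ones(Y.shape[0])              # any positive margin vector

a = np.linalg.pinv(Y) @ b            # least-squares solution for this b
squared_error = np.linalg.norm(Y @ a - b) ** 2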

Formally, the algorithm is as follows (a NumPy sketch follows the listing):

  1. Initialization: Set $b_0$ to an arbitrary positive vector, typically $b_0 = \mathbf{1}$ (a vector of ones). Set the iteration counter $k = 0$. Set $a_0 = Y^+ b_0$.
  2. Loop until convergence, or until the iteration counter exceeds some maximum $k_{\max}$.
    1. Error calculation: Compute the error vector $e_k = Y a_k - b_k$.
    2. Margin update: Update the margin vector $b_{k+1} = b_k + \eta \, (e_k + |e_k|)$, where $\eta > 0$ is a positive learning rate parameter, and $|\cdot|$ denotes the element-wise absolute value.
    3. Weight calculation: Compute the weight vector using the pseudoinverse: $a_{k+1} = Y^+ b_{k+1}$.
    4. Convergence check: If $\|e_k\| \le \epsilon$ for some predetermined threshold $\epsilon$ (close to zero), then return $(a_{k+1}, b_{k+1})$.
    5. If $e_k \le 0$ (all components non-positive), return "Samples are not separable."
  3. Return "Algorithm failed to converge in time."
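
A compact NumPy sketch of the procedure above; the function name, default learning rate, and stopping thresholds are illustrative choices rather than values prescribed by the original papers.

import numpy as np

def ho_kashyap(Y, eta=0.5, eps=1e-6, max_iter=10_000):
    """Run the Ho-Kashyap iteration on the sign-normalized data matrix Y.

    Returns (a, b, status), where status is one of "converged",
    "not separable", or "max iterations reached".
    """
    Y = np.asarray(Y, dtype=float)
    Y_pinv = np.linalg.pinv(Y)           # computed once and reused
    b = np.ones(Y.shape[0])              # arbitrary positive margin vector
    a = Y_pinv @ b                       # initial weight vector

    for _ in range(max_iter):
        e = Y @ a - b                    # error vector
        b = b + eta * (e + np.abs(e))    # raise margins only where e > 0
        a = Y_pinv @ b                   # least-squares weights for the new b
        if np.linalg.norm(e) <= eps:
            return a, b, "converged"
        if np.all(e <= 0):               # no positive component left
            return a, b, "not separable"
    return a, b, "max iterations reached"

On the toy matrix Y built in the Setup sketch, ho_kashyap(Y) should return a weight vector with Y @ a > 0 and the status "converged".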

Properties

  • If the training data is linearly separable, the algorithm converges to a solution (where $Y a > 0$) in a finite number of iterations.
  • If the data is not linearly separable, the algorithm may or may not ever reach the point where $e_k \le 0$. However, if it does happen that $e_k \le 0$ (with $e_k \ne 0$) at some iteration, this proves non-separability.
  • The convergence rate depends on the choice of the learning rate parameter $\eta$ and the degree of linear separability of the data. Both behaviors are demonstrated in the sketch after this list.
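
The two cases above can be demonstrated with the ho_kashyap sketch from the Algorithm section; the data sets below are made-up examples, with an XOR labelling as the classic non-separable case.

import numpy as np

# Separable case: two well-separated point clouds.
X_sep = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]])
y_sep = np.array([1.0, 1.0, -1.0, -1.0])
Y_sep = y_sep[:, None] * np.hstack([X_sep, np.ones((4, 1))])
_, _, status_sep = ho_kashyap(Y_sep)    # expected: "converged"

# Non-separable case: XOR labelling, which no hyperplane can split.
X_xor = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y_xor = np.array([1.0, 1.0, -1.0, -1.0])
Y_xor = y_xor[:, None] * np.hstack([X_xor, np.ones((4, 1))])
_, _, status_xor = ho_kashyap(Y_xor)    # expected: "not separable"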

Relationship to other algorithms

  • Perceptron algorithm: Both seek linear separators. The perceptron updates weights incrementally based on individual misclassified samples, while Ho–Kashyap is a batch method that processes all samples to compute the pseudoinverse and updates based on an overall error vector.
  • Linear discriminant analysis (LDA): LDA assumes underlying Gaussian distributions with equal covariances for the classes and derives the decision boundary from these statistical assumptions. Ho–Kashyap makes no explicit distributional assumptions and instead tries to solve a system of linear inequalities directly.
  • Support vector machines (SVM): For linearly separable data, SVMs aim to find the maximum-margin hyperplane. The Ho–Kashyap algorithm finds a separating hyperplane but not necessarily the one with the maximum margin. If the data is not separable, soft-margin SVMs allow for some misclassifications by optimizing a trade-off between margin size and misclassification penalty, while Ho–Kashyap provides a least-squares solution.

Variants


The modified Ho–Kashyap algorithm replaces the pseudoinverse-based weight calculation step $a_{k+1} = Y^+ b_{k+1}$ with an alternative update rule.

Kernel Ho–Kashyap algorithm: Applies kernel methods (the "kernel trick") to the Ho–Kashyap framework to enable non-linear classification by implicitly mapping data to a higher-dimensional feature space.[3]
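
The formulation in [3] additionally incorporates regularization ("generalization control"). As a rough, simplified sketch only: one way to kernelize the procedure is to write the decision function as $f(x) = \sum_j \beta_j k(x_j, x) + w_0$, so that the conditions $y_i f(x_i) > 0$ become a system of linear inequalities in $(\beta, w_0)$, on which the ordinary Ho–Kashyap iteration can be run. The kernel choice and names below are assumptions made for the illustration, not the exact method of [3].

import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian RBF kernel matrix between two sample sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_ho_kashyap(X, y, gamma=1.0, eta=0.5, eps=1e-6, max_iter=10_000):
    """Simplified kernel Ho-Kashyap sketch.

    Z_ij = y_i * k(x_i, x_j), plus a bias column y_i, plays the role of the
    data matrix Y; the usual iteration is run for the coefficients c = [beta, w0].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(X, X, gamma)
    Z = y[:, None] * np.hstack([K, np.ones((len(y), 1))])
    Z_pinv = np.linalg.pinv(Z)
    b = np.ones(len(y))
    c = Z_pinv @ b
    for _ in range(max_iter):
        e = Z @ c - b
        b = b + eta * (e + np.abs(e))
        c = Z_pinv @ b
        if np.linalg.norm(e) <= eps:
            break
    beta, w0 = c[:-1], c[-1]
    return beta, w0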


References

  1. Ho, Y-C.; Kashyap, R. L. (October 1965). "An Algorithm for Linear Inequalities and its Applications". IEEE Transactions on Electronic Computers. EC-14 (5): 683–688. doi:10.1109/PGEC.1965.264207. ISSN 0367-7508.
  2. Ho, Yu-Chi; Kashyap, R. L. (February 1966). "A Class of Iterative Procedures for Linear Inequalities". SIAM Journal on Control. 4 (1): 112–115. doi:10.1137/0304010. ISSN 0036-1402.
  3. Łęski, Jacek (2004). "Kernel Ho–Kashyap classifier with generalization control". International Journal of Applied Mathematics and Computer Science. 14 (1): 53–61.
  • Duda, R. O.; Hart, P. E.; Stork, D. G. (2001). "5.9. The Ho–Kashyap Procedures". Pattern Classification (2nd ed.). Wiley-Interscience. ISBN 978-0-471-05669-0.
  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. ISBN 978-0-387-31073-2.
  • Theodoridis, S.; Koutroumbas, K. (2008). Pattern Recognition (4th ed.). Academic Press. ISBN 978-1-59749-272-0.