Ho–Kashyap rule

{{Use dmy dates|date=May 2024}}
 
The Ho–Kashyap algorithm is an iterative method in [[machine learning]] for finding a [[linear classifier|linear decision boundary]] that separates two [[Linear separability|linearly separable]] classes. It was developed by [[Yu-Chi Ho]] and [[Rangasami L. Kashyap]] in 1965,<ref>{{Cite journal |last=Ho |first=Y-C. |last2=Kashyap |first2=R. L. |date=October 1965 |title=An Algorithm for Linear Inequalities and its Applications |url=http://ieeexplore.ieee.org/document/4038553/ |journal=IEEE Transactions on Electronic Computers |volume=EC-14 |issue=5 |pages=683–688 |doi=10.1109/PGEC.1965.264207 |issn=0367-7508}}</ref><ref>{{Cite journal |last=Ho |first=Yu-Chi |last2=Kashyap |first2=R. L. |date=February 1966 |title=A Class of Iterative Procedures for Linear Inequalities |url=http://epubs.siam.org/doi/10.1137/0304010 |journal=SIAM Journal on Control |language=en |volume=4 |issue=1 |pages=112–115 |doi=10.1137/0304010 |issn=0036-1402}}</ref> and is usually presented as a problem in [[linear programming]].
 
== Setup ==
Given a training set consisting of samples from two classes, the Ho–Kashyap algorithm seeks to find a weight vector <math>\mathbf{w}</math> and a margin vector <math>\mathbf{b}</math> such that:
<math display="block"> \mathbf{Yw} = \mathbf{b} </math>
where <math>\mathbf{Y}</math> is the augmented data matrix with samples from both classes (with appropriate sign conventions, e.g., samples from class 2 are negated), <math>\mathbf{w}</math> is the weight vector to be determined, and <math>\mathbf{b}</math> is a positive margin vector.
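For illustration, assuming the common convention of augmenting each sample with a leading 1 for the bias term, two samples <math>\mathbf{x}_1, \mathbf{x}_2</math> from class 1 and one sample <math>\mathbf{x}_3</math> from class 2 give
<math display="block"> \mathbf{Y} = \begin{bmatrix} 1 & \mathbf{x}_1^\mathsf{T} \\ 1 & \mathbf{x}_2^\mathsf{T} \\ -1 & -\mathbf{x}_3^\mathsf{T} \end{bmatrix}, </math>
so that each row of <math>\mathbf{Yw} = \mathbf{b}</math> requires the corresponding sample to lie on the positive side of the hyperplane with margin <math>b_i > 0</math>.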
 
== Algorithm ==
The idea of the Ho–Kashyap rule is as follows:
 
* Given any <math>\mathbf{b}</math>, the corresponding <math>\mathbf{w}</math> is known: It is simply <math>\mathbf{w} = \mathbf{Y}^+ \mathbf{b}</math>, where <math>\mathbf{Y}^+</math> denotes the [[Moore–Penrose inverse|Moore–Penrose pseudoinverse]] of <math>\mathbf{Y}</math>.
* Therefore, it only remains to find <math>\mathbf{b}</math> by gradient descent.
* However, gradient descent may sometimes ''decrease'' some coordinates of <math>\mathbf{b}</math>, which could make them negative, and that is undesirable. Therefore, coordinates of <math>\mathbf{b}</math> that would decrease are left unchanged, while coordinates that would increase are updated as usual (see the sketch below).
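A minimal [[NumPy]] sketch of this iteration is given below. The learning rate <math>\eta</math>, the initial margin vector, and the stopping test are illustrative choices rather than part of the original formulation; the margin update <math>\mathbf{b}(k+1) = \mathbf{b}(k) + 2\eta\,\mathbf{e}^+(k)</math>, where <math>\mathbf{e}^+(k)</math> is the positive part of the error <math>\mathbf{e}(k) = \mathbf{Y}\mathbf{w}(k) - \mathbf{b}(k)</math>, follows the standard textbook presentation (e.g. Duda, Hart & Stork 2001).
<syntaxhighlight lang="python">
import numpy as np

def ho_kashyap(Y, eta=0.5, b_init=1.0, max_iter=1000, tol=1e-6):
    """Illustrative Ho–Kashyap iteration.

    Y is the sign-adjusted, augmented data matrix (one row per sample).
    Returns a weight vector w and a margin vector b with Y w ≈ b > 0
    when the classes are linearly separable.
    """
    n_samples = Y.shape[0]
    b = np.full(n_samples, b_init)      # positive initial margin vector
    Y_pinv = np.linalg.pinv(Y)          # Moore–Penrose pseudoinverse of Y
    w = Y_pinv @ b                      # w is determined by b
    for _ in range(max_iter):
        e = Y @ w - b                   # error vector e(k)
        e_plus = 0.5 * (e + np.abs(e))  # keep only the positive components
        b = b + 2 * eta * e_plus        # b never decreases in any coordinate
        w = Y_pinv @ b                  # recompute w from the updated b
        if np.abs(e).max() < tol:       # stop once Y w ≈ b
            break
    return w, b
</syntaxhighlight>
The factor 2 in the margin update comes from the gradient of <math>\|\mathbf{Yw} - \mathbf{b}\|^2</math> with respect to <math>\mathbf{b}</math>; some presentations absorb it into the learning rate.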
 
== Relationship to other algorithms ==
* [[Perceptron]] algorithm: Both seek linear separators. The perceptron updates weights incrementally based on individual misclassified samples, while Ho–Kashyap is a batch method that processes all samples to compute the pseudoinverse and updates based on an overall error vector.
* [[Linear discriminant analysis]] (LDA): LDA assumes underlying Gaussian distributions with equal covariances for the classes and derives the decision boundary from these statistical assumptions. Ho–Kashyap makes no explicit distributional assumptions and instead tries to solve a system of linear inequalities directly.
* [[Support vector machine]]s (SVM): For linearly separable data, SVMs aim to find the maximum-margin hyperplane. The Ho–Kashyap algorithm finds ''a'' separating hyperplane but not necessarily the one with the maximum margin. If the data is not separable, soft-margin SVMs allow for some misclassifications by optimizing a trade-off between margin size and misclassification penalty, while Ho–Kashyap provides a least-squares solution.
 
== Variants ==
The modified Ho–Kashyap algorithm changes the weight calculation step from <math>\mathbf{w}(k+1) = \mathbf{Y}^+ \mathbf{b}(k+1)</math> to <math>\mathbf{w}(k+1) = \mathbf{w}(k) + \eta_k \mathbf{Y}^+ |\mathbf{e}(k)|</math>, where <math>\mathbf{e}(k)</math> is the error vector and <math>\eta_k</math> is a step size.
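Expressed against a sketch like the one above, and assuming the margin update for <math>\mathbf{b}</math> is kept unchanged, one modified iteration could look as follows; the step size <math>\eta_k</math> is illustrative.
<syntaxhighlight lang="python">
import numpy as np

def modified_ho_kashyap_step(Y, Y_pinv, w, b, eta_k):
    """One illustrative iteration of the modified rule: b is updated as in the
    standard algorithm (an assumption here), while w is moved incrementally
    instead of being recomputed as Y⁺ b."""
    e = Y @ w - b                               # error vector e(k)
    e_plus = 0.5 * (e + np.abs(e))              # positive part of the error
    b_next = b + 2 * eta_k * e_plus             # margin update, assumed unchanged
    w_next = w + eta_k * (Y_pinv @ np.abs(e))   # w(k+1) = w(k) + η_k Y⁺ |e(k)|
    return w_next, b_next
</syntaxhighlight>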
 
Kernel Ho–Kashyap algorithm: Applies [[kernel method]]s (the "[[kernel trick]]") to the Ho–Kashyap framework to enable non-linear classification by implicitly mapping data to a higher-dimensional feature space.<ref>{{cite journal |last=Łęski |first=Jacek |date=2004 |title=Kernel Ho–Kashyap classifier with generalization control |journal=International Journal of Applied Mathematics and Computer Science |volume=14 |issue=1 |pages=53–61}}</ref>
 
== See also ==
== References ==
<references />
* {{cite book |last1=Duda |first1=R. O. |last2=Hart |first2=P. E. |last3=Stork |first3=D. G. |year=2001 |title=Pattern Classification |edition=2nd |chapter=5.9. The Ho–Kashyap Procedures |publisher=Wiley-Interscience |isbn=978-0-471-05669-0}}
* {{cite book |last1=Bishop |first1=C. M. |year=2006 |title=Pattern Recognition and Machine Learning |publisher=Springer |isbn=978-0-387-31073-2}}
* {{cite book |last1=Theodoridis |first1=S. |last2=Koutroumbas |first2=K. |year=2008 |title=Pattern Recognition |edition=4th |publisher=Academic Press |isbn=978-1-59749-272-0}}