{{short description|Non-parametric classification method}}
{{About|supervised classification/regression, but NOT a clustering algorithm|algorithms for finding nearest neighbors|Nearest neighbor search}} {{Distinguish|k-means clustering}} {{DISPLAYTITLE:''k''-nearest neighbors algorithm}}
In [[statistics]], the '''''k''-nearest neighbors algorithm''' ('''''k''-NN''') is a [[Non-parametric statistics|non-parametric]] [[supervised learning]] method first developed by [[Evelyn Fix]] and [[Joseph Lawson Hodges Jr.|Joseph Hodges]] in 1951,<ref>{{Cite report | last1=Fix | first1=Evelyn | last2=Hodges | first2=Joseph L. | title=Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties | issue=Report Number 4, Project Number 21-49-004 | year=1951 | url=https://apps.dtic.mil/dtic/tr/fulltext/u2/a800276.pdf | archive-url=https://web.archive.org/web/20200926212807/https://apps.dtic.mil/dtic/tr/fulltext/u2/a800276.pdf | url-status=live | archive-date=September 26, 2020 | publisher=USAF School of Aviation Medicine, Randolph Field, Texas}}</ref> and later expanded by [[Thomas M. Cover|Thomas Cover]].<ref name=":1" /> It is used for [[statistical classification|classification]] and [[regression analysis|regression]]. In both cases, the input consists of the ''k'' closest training examples in a [[data set]]. The output depends on whether ''k''-NN is used for classification or regression:
* In ''k''-NN classification, the output is a class membership: an object is assigned to the class most common among its ''k'' nearest neighbors (if ''k'' = 1, the object is simply assigned to the class of its single nearest neighbor).
* In ''k''-NN regression, the output is a property value for the object, computed as the average of the values of its ''k'' nearest neighbors.
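A minimal sketch of both modes, using only the Python standard library and a brute-force Euclidean-distance search on hypothetical toy data (the function and variable names are illustrative, not part of any particular library):

<syntaxhighlight lang="python">
# Illustrative sketch: brute-force k-NN classification by plurality vote
# and k-NN regression by averaging, on small hypothetical toy data.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k, regression=False):
    # Take the k training examples closest to the query point.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: euclidean(pair[0], query))[:k]
    labels = [y for _, y in neighbors]
    if regression:
        # Regression: average of the k nearest neighbors' values.
        return sum(labels) / k
    # Classification: plurality vote among the k nearest neighbors.
    return Counter(labels).most_common(1)[0][0]

# Toy example with two classes in a 2-D feature space.
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, (1.1, 0.9), k=3))                       # -> "A"

# Regression on hypothetical numeric targets for the same points.
values = [100.0, 110.0, 300.0, 290.0]
print(knn_predict(X, values, (1.1, 0.9), k=2, regression=True))  # -> 105.0
</syntaxhighlight>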
[[Feature extraction]] and dimension reduction can be combined in one step using [[Principal Component Analysis|principal component analysis]] (PCA), [[linear discriminant analysis]] (LDA), or [[Canonical correlation|canonical correlation analysis]] (CCA) techniques as a pre-processing step, followed by ''k''-NN classification on [[Feature (machine learning)|feature vectors]] in the reduced-dimension space. This process is also called low-dimensional [[embedding]].<ref>{{citation |last1=Shaw |first1=Blake |last2=Jebara |first2=Tony |title=Structure preserving embedding |work=Proceedings of the 26th Annual International Conference on Machine Learning |year=2009 |pages=1–8 | publication-date=June 2009 |url=http://www.cs.columbia.edu/~jebara/papers/spe-icml09.pdf |doi=10.1145/1553374.1553494 |isbn=9781605585161 |s2cid=8522279 }}</ref>
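A sketch of this two-step process, assuming the scikit-learn library and its bundled Iris data set (the choice of 2 components and 5 neighbors is illustrative):

<syntaxhighlight lang="python">
# Illustrative sketch: PCA as a pre-processing step that projects the
# feature vectors into a lower-dimensional space, followed by k-NN
# classification in that space.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce the 4-dimensional feature vectors to a 2-dimensional embedding,
# then classify each query by its 5 nearest neighbors in that embedding.
model = make_pipeline(PCA(n_components=2), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # classification accuracy on the test split
</syntaxhighlight>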
For very-high-dimensional datasets (e.g. when performing a similarity search on live video streams, DNA data or high-dimensional [[time series]]), running a fast '''approximate''' ''k''-NN search using [[Locality Sensitive Hashing|locality sensitive hashing]], "random projections",<ref>{{cite book|doi=10.1145/502512.502546|chapter=Random projection in dimensionality reduction |title=Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '01 |year=2001 |last1=Bingham |first1=Ella |last2=Mannila |first2=Heikki |pages=245–250 |isbn=158113391X |s2cid=1854295 }}</ref> "sketches", or other high-dimensional similarity-search techniques may be the only feasible option.
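A minimal sketch of the random-projection idea, assuming scikit-learn and synthetic data (the dimensions, component count, and seed are arbitrary): distances between points are approximately preserved in the projected space, so a neighbor search there approximates the search in the original space at much lower cost.

<syntaxhighlight lang="python">
# Illustrative sketch with hypothetical random data: a Gaussian random
# projection compresses high-dimensional vectors into far fewer dimensions,
# after which a nearest-neighbor query approximates the original-space result.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 1000))            # 2,000 points, 1,000 dimensions

projector = GaussianRandomProjection(n_components=64, random_state=0)
X_low = projector.fit_transform(X)           # project down to 64 dimensions

# Search for the 5 nearest neighbors of the first point in the reduced space.
nn = NearestNeighbors(n_neighbors=5).fit(X_low)
distances, indices = nn.kneighbors(X_low[:1])
print(indices)
</syntaxhighlight>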
== Decision boundary ==