<ref>Beyer, Kevin, et al. [https://minds.wisconsin.edu/bitstream/handle/1793/60174/TR1377.pdf?sequence=1 "When is 'nearest neighbor' meaningful?"]. Database Theory—ICDT'99, 1999, pp. 217–235.</ref>
 
The curse of dimensionality in the ''k''-NN context means that [[Euclidean distance]] becomes unhelpful in high dimensions because all vectors are almost equidistant from the query vector (imagine many points lying roughly on a circle with the query point at the center: the distance from the query to every data point in the search space is nearly the same).<ref>{{Cite journal|last=Angiulli|first=Fabrizio|date=2018|title=On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness|url=http://jmlr.csail.mit.edu/papers/v18/17-151.html|journal=Journal of Machine Learning Research (JMLR)|volume=18|issue=170|pages=1–60}}</ref>
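This concentration of distances can be illustrated with a short NumPy sketch; the uniform sampling over the unit hypercube and the sample sizes below are illustrative assumptions, not part of the ''k''-NN algorithm itself. As the dimension grows, the ratio between the farthest and the nearest Euclidean distance from a query point tends toward 1.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Compare the spread of distances in low- vs. high-dimensional space.
# Points and the query are drawn uniformly from the unit hypercube
# (an illustrative choice for this demonstration only).
for dim in (2, 10, 100, 1000):
    points = rng.random((10_000, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    # As dim grows, the ratio approaches 1: all points look almost equidistant.
    print(f"dim={dim:5d}  farthest/nearest distance ratio = "
          f"{dists.max() / dists.min():.2f}")
</syntaxhighlight>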
 
[[Feature extraction]] and dimension reduction can be combined in one step using [[Principal Component Analysis|principal component analysis]] (PCA), [[linear discriminant analysis]] (LDA), or [[Canonical correlation|canonical correlation analysis]] (CCA) techniques as a pre-processing step, followed by clustering by ''k''-NN on [[Feature (machine learning)|feature vectors]] in reduced-dimension space. In [[machine learning]], this process is also called low-dimensional [[embedding]].<ref>Shaw, Blake, and Tony Jebara. "Structure preserving embedding". Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.</ref>
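A minimal sketch of this pre-processing pattern, assuming the scikit-learn library and its bundled digits dataset (both illustrative choices, not prescribed above): PCA projects the 64-dimensional feature vectors to a lower-dimensional space before the ''k''-NN step is applied.

<syntaxhighlight lang="python">
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# 64-dimensional feature vectors (8x8 pixel images) and their class labels.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce the dimension with PCA, then run k-NN on the projected feature vectors.
# n_components and n_neighbors are illustrative values.
model = make_pipeline(PCA(n_components=16), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
</syntaxhighlight>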