The naive version of the algorithm is easy to implement by computing the distances from the test example to all stored examples, but it is computationally intensive for large training sets. Using an approximate [[nearest neighbor search]] algorithm makes ''k-''NN computationally tractable even for large data sets. Many nearest neighbor search algorithms have been proposed over the years; these generally seek to reduce the number of distance evaluations actually performed.
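A minimal brute-force sketch in Python (function and variable names here are illustrative, not taken from any cited implementation) makes the cost explicit: every query computes a distance to every stored example before voting among the ''k'' closest.

<syntaxhighlight lang="python">
from collections import Counter
import math

def knn_classify(train, query, k):
    """Brute-force k-NN: compare the query against every stored example.

    train is a list of (feature_vector, label) pairs.  Each query costs one
    distance evaluation per training example, which is what makes the naive
    version expensive on large training sets."""
    distances = [(math.dist(features, query), label) for features, label in train]
    k_nearest = sorted(distances)[:k]              # the k closest stored examples
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]              # majority label among the k

# Example with two classes in the plane.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
print(knn_classify(train, (0.2, 0.1), k=3))        # -> "A"
</syntaxhighlight>

An exact or approximate nearest-neighbor index would replace the full scan over <code>distances</code> with a search over a much smaller candidate set.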
''k-''NN has some strong [[consistency (statistics)|consistency]] results. As the amount of data approaches infinity, the two-class ''k-''NN algorithm is guaranteed to yield an error rate no worse than twice the [[Bayes error rate]] (the minimum achievable error rate given the distribution of the data).<ref name=":1">{{cite journal |url=http://ssg.mit.edu/cal/abs/2000_spring/np_dens/classification/cover67.pdf |doi=10.1109/TIT.1967.1053964 |author-link1=Thomas M. Cover |last1=Cover |first1=Thomas M. |author-link2=Peter E. Hart |last2=Hart |first2=Peter E. |title=Nearest neighbor pattern classification |journal=IEEE Transactions on Information Theory |year=1967 |volume=13 |issue=1 |pages=21–27 }}</ref>
For multi-class ''k-''NN classification, [[Thomas M. Cover|Cover]] and [[Peter E. Hart|Hart]] (1967) prove an upper bound error rate of
{{block indent | em = 1.5 | text = <math>R^* \le R_{k\mathrm{NN}} \le R^*\left(2 - \frac{M R^*}{M-1}\right)</math>}}
where <math>R^*</math> is the Bayes error rate (the minimum achievable error rate), <math>R_{k\mathrm{NN}}</math> is the ''k-''NN error rate, and ''M'' is the number of classes in the problem.
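Setting ''M'' = 2 recovers the two-class statement above, since the bound becomes <math>R^*\left(2 - \tfrac{2R^*}{2-1}\right) = 2R^*(1 - R^*) \le 2R^*</math>, i.e. at most twice the Bayes error rate.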
The reduced set ''U'' is then used instead of ''X'' for classification. The examples that are not prototypes are called "absorbed" points.
It is efficient to scan the training examples in order of decreasing border ratio.<ref name="MirkesKnn">Mirkes, Evgeny M.; [http://www.math.le.ac.uk/people/ag153/homepage/KNN/KNN3.html ''KNN and Potential Energy: applet''] {{Webarchive|url=https://web.archive.org/web/20120119201715/http://www.math.le.ac.uk/people/ag153/homepage/KNN/KNN3.html |date=2012-01-19 }}, University of Leicester, 2011</ref> The border ratio of a training example ''x'' is defined as
{{block indent | em = 1.5 | text = {{math|1= ''a''(''x'') = {{sfrac|{{norm|''x'-y''}}|{{norm|''x-y''}}}} }} }}
where the denominator {{math|{{norm|''x-y''}}}} is the distance from ''x'' to its closest example ''y'' with a different label, and the numerator {{math|{{norm|''x'-y''}}}} is the distance from ''y'' to its closest example ''x′'' with the same label as ''x''. Because the numerator never exceeds the denominator, the border ratio lies in [0, 1]; values close to 1 mark examples near the class border.
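The following Python sketch illustrates these definitions; the helper names and the selection rule shown (seed ''U'' with the most border-like example, then keep any example the current prototypes misclassify) are illustrative assumptions rather than the exact procedure of the cited applet.

<syntaxhighlight lang="python">
import math

def border_ratio(x, label, examples):
    """a(x) = ||x' - y|| / ||x - y||, where y is the closest example to x with a
    different label and x' is the closest example to y carrying x's label."""
    # y: nearest example to x having a different label (x's nearest "enemy").
    y = min((e for e, lab in examples if lab != label),
            key=lambda e: math.dist(x, e))
    # x': nearest example to y having the same label as x (possibly x itself).
    x_prime = min((e for e, lab in examples if lab == label),
                  key=lambda e: math.dist(y, e))
    return math.dist(x_prime, y) / math.dist(x, y)

def condensed_nearest_neighbors(examples):
    """Select prototypes U from the training set, scanning examples in order of
    decreasing border ratio; examples the current prototypes already classify
    correctly are "absorbed" (left out of U)."""
    ordered = sorted(examples,
                     key=lambda ex: border_ratio(ex[0], ex[1], examples),
                     reverse=True)
    prototypes = [ordered[0]]          # seed U with the most border-like example
    for features, label in ordered[1:]:
        nearest = min(prototypes, key=lambda p: math.dist(features, p[0]))
        if nearest[1] != label:        # misclassified by 1-NN on U: keep it
            prototypes.append((features, label))
    return prototypes

# Example: two well-separated classes in the plane; U keeps one border point per class.
X = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((1.0, 1.0), "B"), ((0.8, 0.9), "B")]
U = condensed_nearest_neighbors(X)     # U is then used in place of X for classification
</syntaxhighlight>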