K-nearest neighbors algorithm
* In ''k-NN regression'', the output is the property value for the object. This value is the average of the values of the ''k'' nearest neighbors (as sketched below). If ''k'' = 1, then the output is simply the value of that single nearest neighbor.
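
A minimal sketch of this averaging step, assuming Euclidean distance and NumPy arrays; the function and variable names here are illustrative, not from any particular library:

<syntaxhighlight lang="python">
import numpy as np

def knn_regress(X_train, y_train, x_query, k):
    # Distance from the query point to every stored training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # The prediction is the average of the neighbors' property values;
    # with k = 1 this reduces to the single nearest neighbor's value.
    return y_train[nearest].mean()

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_regress(X_train, y_train, np.array([1.5]), k=2))  # 2.5
</syntaxhighlight>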
 
''k''-NN is a type of [[classification]] where the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance for classification, if the features represent different physical units or come in vastly different scales, then feature-wise [[Normalization (statistics)|normalization]] of the training data can improve its accuracy dramatically.<ref>{{Cite book|last=Hastie|first=Trevor|title=The Elements of Statistical Learning: Data Mining, Inference, and Prediction|date=2001|publisher=Springer|others=Tibshirani, Robert; Friedman, Jerome H.|isbn=0-387-95284-5|___location=New York|oclc=46809224}}</ref>
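
One common normalization scheme (among others, such as min-max rescaling) is to standardize each feature to zero mean and unit variance, so that a feature measured on a large scale does not dominate the distance. A sketch under that assumption, with illustrative names:

<syntaxhighlight lang="python">
import numpy as np

def standardize(X_train, X_query):
    # Per-feature mean and standard deviation, estimated from the
    # training data only and applied to both training and query points.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unchanged
    return (X_train - mu) / sigma, (X_query - mu) / sigma

# Heights in metres and weights in kilograms: without scaling, the
# weight axis dominates any Euclidean distance between points.
X_train = np.array([[1.60, 60.0], [1.80, 90.0], [1.75, 75.0]])
X_query = np.array([[1.70, 70.0]])
X_train_s, X_query_s = standardize(X_train, X_query)
</syntaxhighlight>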
 
For both classification and regression, a useful technique can be to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists of giving each neighbor a weight of 1/''d'', where ''d'' is the distance to the neighbor.<ref>This scheme is a generalization of linear interpolation.</ref>
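
A sketch of the 1/''d'' weighting applied to classification, assuming Euclidean distance and a vote in which each neighbor's weight is the inverse of its distance; names are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def weighted_knn_classify(X_train, y_train, x_query, k):
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Weight each neighbor by 1/d; the small epsilon guards against
    # division by zero when the query coincides with a training point.
    weights = 1.0 / (dists[nearest] + 1e-12)
    labels = y_train[nearest]
    # Sum the weights per class; the class with the largest total wins.
    classes = np.unique(labels)
    totals = [weights[labels == c].sum() for c in classes]
    return classes[int(np.argmax(totals))]

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y_train = np.array([0, 0, 1, 1])
print(weighted_knn_classify(X_train, y_train, np.array([0.2, 0.8]), k=3))  # 1
</syntaxhighlight>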
 
The neighbors are taken from a set of objects for which the class (for ''k''-NN classification) or the object property value (for ''k''-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
 
A peculiarity (sometimes even a disadvantage) of the ''k''-NN algorithm is its sensitivity to the local structure of the data.
 
==Statistical setting==