K-nearest neighbors algorithm
* In ''k-NN regression'', the output is the property value for the object. This value is the average of the values of the ''k'' nearest neighbors (as sketched below). If ''k'' = 1, then the output is simply the value of that single nearest neighbor.
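
A minimal sketch of this averaging step, assuming Euclidean distance and NumPy arrays; the function and variable names here are illustrative, not from any particular library:

<syntaxhighlight lang="python">
import numpy as np

def knn_regress(X_train, y_train, x_query, k):
    # Distance from the query point to every stored training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # The prediction is the average of the neighbors' property values;
    # with k = 1 this reduces to the single nearest neighbor's value.
    return y_train[nearest].mean()

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_regress(X_train, y_train, np.array([1.5]), k=2))  # 2.5
</syntaxhighlight>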
 
''k''-NN is a type of [[classification]] where the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance for classification, if the features represent different physical units or come in vastly different scales, then feature-wise [[Normalization (statistics)|normalization]] of the training data can improve its accuracy dramatically.<ref>{{Cite book|last=Hastie|first=Trevor|title=The Elements of Statistical Learning: Data Mining, Inference, and Prediction|date=2001|publisher=Springer|others=Tibshirani, Robert; Friedman, Jerome H.|isbn=0-387-95284-5|___location=New York|oclc=46809224}}</ref>
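
One common normalization scheme (among others, such as min-max rescaling) is to standardize each feature to zero mean and unit variance, so that a feature measured on a large scale does not dominate the distance. A sketch under that assumption, with illustrative names:

<syntaxhighlight lang="python">
import numpy as np

def standardize(X_train, X_query):
    # Per-feature mean and standard deviation, estimated from the
    # training data only and applied to both training and query points.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unchanged
    return (X_train - mu) / sigma, (X_query - mu) / sigma

# Heights in metres and weights in kilograms: without scaling, the
# weight axis dominates any Euclidean distance between points.
X_train = np.array([[1.60, 60.0], [1.80, 90.0], [1.75, 75.0]])
X_query = np.array([[1.70, 70.0]])
X_train_s, X_query_s = standardize(X_train, X_query)
</syntaxhighlight>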
 
For both classification and regression, a useful technique can be to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists of giving each neighbor a weight of 1/''d'', where ''d'' is the distance to the neighbor.<ref>This scheme is a generalization of linear interpolation.</ref>
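
A sketch of the 1/''d'' weighting applied to classification, assuming Euclidean distance and a vote in which each neighbor's weight is the inverse of its distance; names are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def weighted_knn_classify(X_train, y_train, x_query, k):
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Weight each neighbor by 1/d; the small epsilon guards against
    # division by zero when the query coincides with a training point.
    weights = 1.0 / (dists[nearest] + 1e-12)
    labels = y_train[nearest]
    # Sum the weights per class; the class with the largest total wins.
    classes = np.unique(labels)
    totals = [weights[labels == c].sum() for c in classes]
    return classes[int(np.argmax(totals))]

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y_train = np.array([0, 0, 1, 1])
print(weighted_knn_classify(X_train, y_train, np.array([0.2, 0.8]), k=3))  # 1
</syntaxhighlight>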
 
The neighbors are taken from a set of objects for which the class (for ''k''-NN classification) or the object property value (for ''k''-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
 
A peculiarity (sometimes even a disadvantage) of the ''k''-NN algorithm is its sensitivity to the local structure of the data.
 
==Statistical setting==