Nearest centroid classifier

This is an old revision of this page, as edited by Velvel2 (talk | contribs) at 04:38, 7 March 2015 (image). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In machine learning, a nearest centroid or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation.

Rocchio Classification

When applied to text classification using tf*idf vectors to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.[2]

Algorithm

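  • Training procedure: given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.
  • Prediction function: the class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \lVert \vec{\mu}_\ell - \vec{x} \rVert$.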
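
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes that "closest" means Euclidean distance and that samples are equal-length numeric tuples. The function names fit_centroids and predict and the toy data are invented for this example.

```python
from collections import defaultdict
import math


def fit_centroids(samples, labels):
    """Return {label: centroid}, where each centroid is the per-class mean vector."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for j, value in enumerate(x):
            sums[y][j] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is nearest to x (Euclidean distance assumed)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Toy data, made up for illustration: two classes in the plane.
samples = [(0.0, 0.0), (2.0, 0.0), (4.0, 4.0), (6.0, 4.0)]
labels = ["a", "a", "b", "b"]

centroids = fit_centroids(samples, labels)
print(centroids)                        # {'a': [1.0, 0.0], 'b': [5.0, 4.0]}
print(predict(centroids, (1.5, 1.0)))   # a
```

Training is a single pass over the data, and prediction compares the observation against one vector per class rather than against every training sample. In this sketch, if two centroids are equally close, the class seen first during training is returned.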

References

  1. Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press.
  2. Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10). doi:10.1073/pnas.082099299.