* Clustering divides the results of a search for "cell" into groups like "biology," "battery," and "prison."
* [http://FirstGov.gov FirstGov.gov], the official Web portal for the U.S. government, uses document clustering to automatically organize its search results into categories. For example, a user who searches for “immigration” sees, next to the list of results, categories such as “Immigration Reform”, “Citizenship and Immigration Services”, “Employment”, and “Department of Homeland Security”.
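The "cell" example can be sketched with flat ''k''-means clustering. This is a minimal illustration under stated assumptions, not how any particular search engine works: the six snippets are hypothetical, each document is represented as a raw term-count vector, documents are compared by Euclidean distance, and the seed documents used as initial centroids are fixed by hand so the result is deterministic. Real systems usually apply tf-idf weighting and other initialization schemes. Note that ''k''-means only returns cluster numbers; attaching readable labels such as "biology" or "prison" is a separate step.

<syntaxhighlight lang="python">
import math


def vectorize(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for word in doc.split():
            vec[index[word]] += 1.0
        vectors.append(vec)
    return vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, seeds, max_iter=100):
    """Flat k-means; `seeds` are indices of documents used as initial centroids (an assumption)."""
    centroids = [list(vectors[i]) for i in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign every vector to its nearest centroid by Euclidean distance.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: euclidean(v, centroids[k]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of the documents assigned to it.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = mean(members)
    return assignment


# Hypothetical search-result snippets for the query "cell".
snippets = [
    "cell biology membrane nucleus",
    "cell nucleus division biology",
    "battery cell voltage charge",
    "lithium battery cell charge",
    "prison cell inmate guard",
    "prison guard cell block",
]
labels = kmeans(vectorize(snippets), seeds=[0, 2, 4])
clusters = {}
for i, label in enumerate(labels):
    clusters.setdefault(label, []).append(i)
print(sorted(clusters.values()))  # [[0, 1], [2, 3], [4, 5]]
</syntaxhighlight>

Because every snippet contains "cell", that shared term contributes nothing to the distances, and the groups form around the words that distinguish the senses.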
== Clustering versus classification ==
Clustering algorithms in computational text analysis group documents into subsets, or ''clusters'', with the goal of producing clusters that are internally coherent and clearly distinct from one another<ref>{{Cite web|url=http://nlp.stanford.edu/IR-book/|title=Introduction to Information Retrieval|website=nlp.stanford.edu|pages=349|access-date=2016-05-03}}</ref>. Classification, on the other hand, is a form of [[supervised learning]]: the categories are defined in advance, for example by a human coder using [[Inductive reasoning|inductive]], [[Deductive reasoning|deductive]], or [[Abductive reasoning|abductive]] reasoning, and documents are then assigned to those categories. Clustering involves no supervisory teacher imposing previously derived categories on the data; it relies only on a distance measure between documents, most commonly [[Euclidean distance]]<ref>{{Cite web|url=http://nlp.stanford.edu/IR-book/|title=Introduction to Information Retrieval|website=nlp.stanford.edu|pages=349-50|access-date=2016-05-03}}</ref>.
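To contrast with clustering, the sketch below shows a supervised classifier: the categories and labeled training examples are supplied up front, and a new document is assigned to the closest category. The technique used here, a nearest-centroid (Rocchio-style) classifier over term-count vectors with Euclidean distance, is an illustrative choice and not one prescribed by the sources; the training snippets, labels, and helper names are hypothetical.

<syntaxhighlight lang="python">
import math
from collections import Counter


def to_vector(text, vocab):
    """Term-count vector over a fixed vocabulary; unknown words are ignored."""
    counts = Counter(text.split())
    return [float(counts[w]) for w in vocab]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_nearest_centroid(labeled_docs, vocab):
    """Supervised step: the categories come from the given labels, not from the data."""
    by_label = {}
    for text, label in labeled_docs:
        by_label.setdefault(label, []).append(to_vector(text, vocab))
    # One centroid per predefined category: the mean of its labeled examples.
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def classify(text, centroids, vocab):
    """Assign a new document to the category whose centroid is nearest."""
    vec = to_vector(text, vocab)
    return min(centroids, key=lambda label: euclidean(vec, centroids[label]))


# Hypothetical training data labeled in advance by a coder.
training = [
    ("cell biology membrane nucleus", "biology"),
    ("cell nucleus division biology", "biology"),
    ("battery cell voltage charge", "battery"),
    ("lithium battery cell charge", "battery"),
    ("prison cell inmate guard", "prison"),
    ("prison guard cell block", "prison"),
]
vocab = sorted({w for text, _ in training for w in text.split()})
centroids = train_nearest_centroid(training, vocab)
print(classify("cell membrane biology", centroids, vocab))  # biology
</syntaxhighlight>

The classifier can never produce a category that was not in its training labels, whereas a clustering algorithm given the same unlabeled snippets has to discover the groups from the distances alone.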
== References ==
{{reflist}}
Publications:
* Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. "Flat Clustering". In ''Introduction to Information Retrieval''. Cambridge University Press, 2008.
* Nicholas O. Andrews and Edward A. Fox. ''[http://eprints.cs.vt.edu/archive/00001000/01/docclust.pdf Recent Developments in Document Clustering]''. October 16, 2007.
* Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, and Dawid Weiss. "A survey of Web clustering engines". ''ACM Computing Surveys'', Volume 41, Issue 3 (July 2009), Article No. 17. {{ISSN|0360-0300}}.