Revision as of 04:18, 13 May 2016 by Egoodfellow (minor edit: →Overview)
Revision as of 10:51, 15 May 2016 by Rja112 (added information and reference to this article; Tag: Visual edit)
Line 28:
== Clustering v. Classifying ==
Clustering algorithms in computational text analysis group documents into subsets, or ''clusters'', with the goal of creating internally coherent clusters that are distinct from one another.<ref>{{Cite web\|url=http://nlp.stanford.edu/IR-book/\|title=Introduction to Information Retrieval\|website=nlp.stanford.edu\|pages=349\|access-date=2016-05-03}}</ref> Classification, on the other hand, is a form of [[supervised learning]] in which a human coder defines internally coherent categories based on either [[Inductive reasoning\|inductive]], [[Deductive reasoning\|deductive]], or [[Abductive reasoning\|abductive]] reasoning. Clustering, by contrast, relies on no supervisor imposing previously derived categories on the data; it depends only on a measure of distance between documents, the most common of which is [[Euclidean distance\|Euclidean]].<ref>{{Cite web\|url=http://nlp.stanford.edu/IR-book/\|title=Introduction to Information Retrieval\|website=nlp.stanford.edu\|pages=349–50\|access-date=2016-05-03}}</ref> Document clustering can be implemented with the k-means algorithm, which can make searching both unstructured and structured data faster.<ref>{{Cite journal\|last=Shewale\|first=\|date=April 2016\|title=DOCUMENT CLUSTERING USING K MEANS ALGORITHMS\|url=http://ijre.org/wp-content/uploads/2016/04/IJRE_DOCUMENT_CLUSTERING_USING_K_MEANS_ALGORITHMS_30431.pdf\|journal=International Journal of Research and Engineering\|doi=\|pmid=\|access-date=}}</ref>

== References ==

Document clustering: Difference between revisions