Document clustering: Difference between revisions

fix - layout / structure
m linking
Line 15:
These algorithms can be further classified as hard or soft clustering algorithms. Hard clustering computes a hard assignment: each document is a member of exactly one cluster. Soft clustering computes a soft assignment: each document's assignment is a distribution over all clusters, so a document has fractional membership in several clusters.<ref name="manning"/>{{rp|499}} [[Dimensionality reduction]] methods can be considered a subtype of soft clustering; for documents, these include [[latent semantic indexing]] ([[truncated singular value decomposition]] on term histograms)<ref>http://nlp.stanford.edu/IR-book/pdf/16flat.pdf</ref> and [[topic model]]s.
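
The difference between hard and soft assignment can be illustrated with a minimal sketch. The vectors, the three-word vocabulary, and the function names below are hypothetical, and the exponential weighting of squared distances is only one possible way to obtain a distribution over clusters; it is an assumption made for illustration, not a method taken from the cited source.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assign(doc, centroids):
    """Hard assignment: return the index of the single nearest centroid."""
    distances = [squared_distance(doc, c) for c in centroids]
    return distances.index(min(distances))


def soft_assign(doc, centroids, stiffness=1.0):
    """Soft assignment: return membership weights that sum to 1.

    Assumption: weights decay exponentially with squared distance.
    `stiffness` is a hypothetical parameter; larger values push the
    result closer to a hard assignment.
    """
    distances = [squared_distance(doc, c) for c in centroids]
    d_min = min(distances)  # subtract the minimum for numerical stability
    weights = [math.exp(-stiffness * (d - d_min)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Toy term-frequency vectors over a hypothetical three-word vocabulary.
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
doc = [0.6, 0.4, 0.3]

hard = hard_assign(doc, centroids)  # one cluster index
soft = soft_assign(doc, centroids)  # one weight per cluster
```

Here `hard` names exactly one cluster, while `soft` gives the same document a fractional membership in both clusters, with the larger weight on the nearer centroid.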
 
Other algorithms involve graph-based clustering, [[ontology (information science)|ontology]]-supported clustering and order-sensitive clustering.
 
Given a clustering, it can be beneficial to automatically derive human-readable labels for the clusters. [[Cluster labeling|Various methods]] exist for this purpose.
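
As a rough sketch of one simple labeling approach, each cluster can be labeled with the terms that occur most often in its documents. The function name, the tokenized toy documents, and the use of raw term frequency are assumptions made for illustration; practical methods typically also weigh how distinctive a term is for a cluster compared with the rest of the collection.

```python
from collections import Counter


def label_clusters(docs, assignments, top_n=2):
    """Label each cluster with its `top_n` most frequent terms.

    `docs` is a list of token lists and `assignments` gives the cluster
    index of each document (a hard assignment). Both are hypothetical
    inputs chosen for this example.
    """
    counts = {}
    for tokens, cluster in zip(docs, assignments):
        counts.setdefault(cluster, Counter()).update(tokens)
    return {
        cluster: [term for term, _ in counter.most_common(top_n)]
        for cluster, counter in counts.items()
    }


docs = [
    ["stock", "market", "stock"],
    ["market", "stock", "trade"],
    ["goal", "match", "goal"],
    ["match", "goal", "team"],
]
assignments = [0, 0, 1, 1]

labels = label_clusters(docs, assignments)
```

With this toy input, cluster 0 is labeled with "stock" and "market", and cluster 1 with "goal" and "match".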