Document clustering

2. [[Stemming]] and [[lemmatization]]
 
Different tokens may carry similar information (e.g. ''tokenization'' and ''tokenizing''). We can avoid computing similar information repeatedly by reducing all tokens to their base forms using stemming and lemmatization dictionaries.
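The idea of reducing variant tokens to a shared base form can be sketched with a toy suffix-stripping stemmer. This is purely illustrative: the suffix list and minimum stem length here are arbitrary assumptions, and real pipelines would use a proper stemmer (e.g. Porter or Snowball) or a dictionary-based lemmatizer instead.

```python
# Toy suffix-stripping stemmer (illustrative sketch, not Porter's algorithm).
# Longer suffixes are listed first so they are tried before their substrings.
SUFFIXES = ["ization", "izing", "ation", "ize", "ing", "ed", "s"]

def stem(token: str) -> str:
    """Strip the first (longest) matching suffix, keeping a stem of >= 3 chars."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

print(stem("tokenization"))  # token
print(stem("tokenizing"))    # token
print(stem("tokens"))        # token
```

Because ''tokenization'', ''tokenizing'', and ''tokens'' all map to the single base form ''token'', they contribute to one feature rather than three when documents are later vectorized for clustering.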
 
3. Removing [[stop words]] and [[punctuation]]