Content deleted Content added
expanding
few additions
Line 1:
{{Distinguish|word normalization|Unicode normalization}}
{{unreferenced|date=October 2007}}
Line 12 ⟶ 13:
* expanding abbreviations
* removing [[stopwords]] or "too common" words
* [[stemming|word normalization]] (also known as stemming)
* text [[canonicalization]] (replacing variant spellings and contractions with a single standard form, e.g. "co-operation" → "cooperation", "valour" → "valor", "should've" → "should have")
* collapsing repeated characters ("I looooove it!" → "I love it!")
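
Several of the steps listed above can be illustrated with a minimal Python sketch. The lookup tables (<code>ABBREVIATIONS</code>, <code>CANONICAL</code>, <code>STOPWORDS</code>) and the function names are hypothetical and chosen only for this example; a real system would use much larger, language-specific resources. Stemming is left out, and the rule of collapsing any run of three or more identical characters is an assumption that will not suit every text.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
CANONICAL = {
    "co-operation": "cooperation",
    "valour": "valor",
    "should've": "should have",
}
STOPWORDS = {"the", "a", "an", "of"}


def collapse_repeats(text):
    """Collapse any run of three or more identical characters to one."""
    return re.sub(r"(.)\1{2,}", r"\1", text)


def normalize(text):
    """Lowercase, collapse repeats, expand abbreviations,
    canonicalize spellings, and drop stopwords."""
    text = collapse_repeats(text.lower())
    tokens = []
    for token in text.split():
        token = ABBREVIATIONS.get(token, token)
        token = CANONICAL.get(token, token)
        # A replacement may contain several words ("should have").
        tokens.extend(token.split())
    return " ".join(t for t in tokens if t not in STOPWORDS)
```

For example, <code>collapse_repeats("I looooove it!")</code> yields <code>"I love it!"</code>, and <code>normalize("The valour of co-operation")</code> yields <code>"valor cooperation"</code>.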