Text normalization: Difference between revisions

* converting all letters to lower or upper case
* removing punctuation
* removing accent marks and other diacritics from letters
* expanding abbreviations
* removing [[stopwords]] or "too common" words
 
Although normalization may be done manually, as it usually is for ad hoc and personal documents, many [[programming language]]s provide mechanisms that support text normalization.
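
As an illustration, the operations listed above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive implementation: the function names <code>strip_diacritics</code> and <code>normalize</code>, the abbreviation table, and the stopword set are all hypothetical and chosen only for the example. A real application would need language-specific tables and would have to handle ambiguous abbreviations (for instance, "St." may mean "street" or "saint").

```python
import string
import unicodedata

# Hypothetical lookup tables, chosen only for illustration.
ABBREVIATIONS = {"dr": "doctor", "st": "street"}
STOPWORDS = {"the", "a", "an", "of", "on", "is"}


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text):
    """Return a list of normalized words from the input string."""
    text = text.lower()                      # convert to lower case
    text = strip_diacritics(text)            # remove accent marks
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Expand abbreviations, then drop stopwords.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return [word for word in words if word not in STOPWORDS]


print(normalize("Dr. Müller lives on Café St."))
# prints ['doctor', 'muller', 'lives', 'cafe', 'street']
```

In this sketch, punctuation is removed before abbreviations are expanded, so that "Dr." matches the table key <code>dr</code>. Only ASCII punctuation is handled, and diacritic removal relies on Unicode decomposition, so letters without a decomposed form (such as "ø") are left unchanged.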