Revision as of 04:07, 8 October 2012 by Rangi42 (→Applications)
Revision as of 23:19, 7 November 2012 by BG19bot (WP:CHECKWIKI error fix #86. External link with two brackets. Do general fixes and cleanup if needed using AWB (8512))
Line 17:
| accessdate = October 2, 2012 | url = http://mylanguages.org/samoan_numbers.php}}</ref>
* "vi" could be pronounced as "[[vi]]e," "[[Violet (name)|vee]]," or "[[Roman numerals|the sixth]]" depending on the surrounding words.<ref name="msdn">{{cite web | title = Text-to-Speech Engines Text Normalization | work = MSDN
Line 27:
== Techniques ==
For simple, context-independent normalization, such as removing non-[[alphanumeric]] characters or [[diacritical marks]], [[regular expressions]] suffice. For example, the [[sed]] script <tt>sed -E "s/\s+/ /g" ''inputfile''</tt> (extended regular expression syntax, as accepted by GNU sed) would normalize each run of [[whitespace character]]s into a single space.

More complex normalization requires correspondingly complicated algorithms, including [[domain knowledge]] of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text<ref name="tagging">Zhu, C.; Tang, J.; Li, H.; Ng, H.; Zhao, T. (2007). "A Unified Tagging Approach to Text Normalization." ''Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics''; 688–695. [[Digital object identifier|doi]]:[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.72.8138 10.1.1.72.8138].</ref> and as a special case of machine translation.<ref name="mt">Filip, G.; Krzysztof, J.; Agnieszka, W.; Mikołaj, W. (2006). [http://www.proceedings2006.imcsit.org/pliks/202.pdf "Text Normalization as a Special Case of Machine Translation."] ''Proceedings of the International Multiconference on Computer Science and Information Technology'' '''1'''; 51–56.</ref>
== References ==
Line 39:
[[Category:Natural language processing]]
{{compu-sci-stub}}
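The simple, context-independent normalizations named in the Techniques section (collapsing whitespace, removing diacritical marks, removing non-alphanumeric characters) can be sketched in Python with the standard <tt>re</tt> and <tt>unicodedata</tt> modules. This is an illustrative sketch only: the function names are hypothetical and do not come from the article or its sources, and treating the underscore as non-alphanumeric is an assumption made for the example.

```python
import re
import unicodedata


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one space (same idea as the sed script)."""
    return re.sub(r"\s+", " ", text)


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks that result."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_non_alphanumeric(text: str) -> str:
    """Delete characters that are neither alphanumeric nor whitespace.

    Assumption for this sketch: the underscore is also removed.
    """
    return re.sub(r"[^\w\s]|_", "", text)


def normalize_simple(text: str) -> str:
    """Hypothetical pipeline chaining the three context-independent steps."""
    return collapse_whitespace(remove_non_alphanumeric(strip_diacritics(text)))
```

None of these steps looks at surrounding words, so they cannot resolve context-dependent cases such as the pronunciation of "vi"; those require the more complex approaches described in the same section.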

Text normalization: Difference between revisions