Text normalization: Difference between revisions

Line 1:
{{Short description|Process of transforming text into a single canonical form}}
{{Use American English|date=March 2021}}
{{Use mdy dates|date=March 2021}}
Line 15:
Text normalization is frequently used when converting [[speech synthesis|text to speech]]. [[Number]]s, [[Calendar date|date]]s, [[acronym]]s, and [[abbreviation]]s are non-standard "words" that need to be pronounced differently depending on context.<ref name="sproate">Sproat, R.; Black, A.; Chen, S.; Kumar, S.; Ostendorf, M.; Richards, C. (2001). "Normalization of non-standard words." ''Computer Speech and Language'' '''15'''; 287–333. [[Digital object identifier|doi]]:[https://dx.doi.org/10.1006/csla.2001.0169 10.1006/csla.2001.0169].</ref> For example:
 
* "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan.<ref>{{cite web
| title = Samoan Numbers
| work = MyLanguages.org
| accessdate = October 2, 2012
| url = http://mylanguages.org/samoan_numbers.php}}</ref>
* "vi" could be pronounced as "[[Violet (name)|vie]]," "[[Vi (text editor)|vee]]," or "[[Roman numerals|the sixth]]" depending on the surrounding words.<ref name="msdn">{{cite web
| title = Text-to-Speech Engines Text Normalization
| work = MSDN
Line 36:
 
== See also ==
* {{annotated link|Automated paraphrasing}}
* {{annotated link|Canonicalization}}
* {{annotated link|Text simplification}}
* {{annotated link|Unicode equivalence}}
 
== References ==