Text normalization: Difference between revisions

Textual scholarship: expand a bit (per cited source)
See also: annlink
 
(6 intermediate revisions by 6 users not shown)
Line 1:
{{Short description|Process of transforming text into a single canonical form}}
{{Use American English|date=March 2021}}
{{Use mdy dates|date=March 2021}}
Line 13:
== Applications ==
 
Text normalization is frequently used when converting [[speech synthesis|text to speech]]. [[Number]]s, [[Calendar date|date]]s, [[acronym]]s, and [[abbreviation]]s are non-standard "words" that need to be pronounced differently depending on context.<ref name="sproate">Sproat, R.; Black, A.; Chen, S.; Kumar, S.; Ostendorf, M.; Richards, C. (2001). "Normalization of non-standard words." ''Computer Speech and Language'' '''15'''; 287–333. [[Digital object identifier|doi]]:[https://dx.doi.org/10.1006/csla.2001.0169 10.1006/csla.2001.0169].</ref> For example:
 
* "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan.<ref>{{cite web
Line 20:
| accessdate = October 2, 2012
| url = http://mylanguages.org/samoan_numbers.php}}</ref>
* "vi" could be pronounced as "[[Violet (name)|vie]]," "[[Vi (text editor)|vee]]," or "[[Roman numerals|the sixth]]" depending on the surrounding words.<ref name="msdn">{{cite web
| title = Text-to-Speech Engines Text Normalization
| work = MSDN
Line 36:
 
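The speech-synthesis examples under Applications can be illustrated with a minimal sketch, assuming English output only. The function names (<code>number_to_words</code>, <code>expand_dollars</code>, <code>pronounce_vi</code>) and the cue-word lists are hypothetical and chosen purely for illustration. They are not taken from the cited sources or from any particular text-to-speech engine, and a real system would rely on much richer context models.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def expand_dollars(text: str) -> str:
    """Rewrite whole-dollar amounts below 1000, e.g. "$200", as words.

    Amounts with more digits, thousands separators, or cents are left
    untouched rather than being expanded incorrectly.
    """
    def repl(match: re.Match) -> str:
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    return re.sub(r"\$(\d{1,3})(?![\d.,]?\d)", repl, text)


# Hypothetical cue words; a real engine would use far more context.
REGNAL_NAMES = {"henry", "edward", "george"}
EDITOR_CUES = {"editor", "command"}


def pronounce_vi(prev_word: str, next_word: str) -> str:
    """Pick a reading of the token "vi" from its neighbouring words."""
    if next_word.lower() in EDITOR_CUES:
        return "vee"        # the text editor
    if prev_word.lower() in REGNAL_NAMES:
        return "the sixth"  # Roman numeral after a regnal name
    return "vie"            # default: the given name


print(expand_dollars("It costs $200."))
print(pronounce_vi("Henry", "ruled"))
```

The same token is mapped to different spoken forms only by inspecting its neighbours, which is the context dependence the examples describe. Output for another language, such as Samoan, would need its own number grammar and currency words.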
== See also ==
* {{annotated link|Automated paraphrasing}}
* {{annotated link|Canonicalization}}
* {{annotated link|Text simplification}}
* {{annotated link|Unicode equivalence}}
 
== References ==