Text normalization: Difference between revisions

Revision as of 20:14, 20 November 2021 by 209.209.9.162 (talk): No edit summary. Tag: Reverted
Latest revision as of 14:00, 14 November 2024 by Kku (talk | contribs): →See also: annlink
(4 intermediate revisions by 4 users not shown)
Line 1:
{{~~short~~Short description|~~process~~Process of transforming text into a single canonical form}}
{{Use American English|date=March 2021}}
{{Use mdy dates|date=March 2021}}

Line 15:
Text normalization is frequently used when converting [[speech synthesis|text to speech]]. [[Number]]s, [[Calendar date|date]]s, [[acronym]]s, and [[abbreviation]]s are non-standard "words" that need to be pronounced differently depending on context.<ref name="sproate">Sproat, R.; Black, A.; Chen, S.; Kumar, S.; Ostendorf, M.; Richards, C. (2001). "Normalization of non-standard words." ''Computer Speech and Language'' '''15'''; 287–333. [[Digital object identifier|doi]]:[https://dx.doi.org/10.1006/csla.2001.0169 10.1006/csla.2001.0169].</ref> For example:
* "$~~100~~200" would be pronounced as "~~one~~two hundred dollars" in English, but as "lua selau tālā" in Samoan.<ref>{{cite web | title = Samoan Numbers | work = MyLanguages.org | accessdate = October 2, 2012 | url = http://mylanguages.org/samoan_numbers.php}}</ref>
* "vi" could be pronounced as "[[~~vi~~Violet (name)|vie]]~~e~~," "[[~~Violet~~Vi (~~name~~text editor)|vee]]," or "[[Roman numerals|the sixth]]" depending on the surrounding words.<ref name="msdn">{{cite web | title = Text-to-Speech Engines Text Normalization | work = MSDN

Line 36:
== See also ==
* ~~[[~~{{annotated link|Automated paraphrasing~~]]~~}}
* ~~[[~~{{annotated link|Canonicalization~~]]~~}}
* ~~[[~~{{annotated link|Text simplification~~]]~~}}
* ~~[[~~{{annotated link|Unicode equivalence~~]]~~}}
== References ==
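
The "$200" example in the Line 15 passage can be illustrated with a minimal Python sketch of one text-to-speech normalization rule. It is an assumption-laden illustration, not a method taken from the article or its sources: the function names `number_to_words` and `normalize_dollars` are hypothetical, only whole-dollar amounts from 0 to 999 are handled, and only English output is produced. Context-dependent cases such as "vi" would need information about the surrounding words and are not attempted here.

```python
import re

# English words for 0..19 and for the multiples of ten.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize_dollars(text: str) -> str:
    """Replace whole-dollar amounts such as "$200" with spoken words."""
    def expand(match: re.Match) -> str:
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    # Match "$" plus 1-3 digits, but skip longer numbers and amounts
    # with a decimal or thousands part (e.g. "$1000", "$2.50", "$1,000").
    return re.sub(r"\$(\d{1,3})(?!\d|[.,]\d)", expand, text)


print(normalize_dollars("The ticket costs $200."))
```

A rule like this covers only one class of non-standard word. Amounts outside the supported range are deliberately left unchanged rather than guessed at, and a different target language (such as the Samoan reading in the diff) would need its own number vocabulary.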