Text normalization: Difference between revisions

Revision as of 15:16, 18 April 2021 by GrindtXX (→Textual scholarship: expand a bit (per cited source))
Latest revision as of 14:00, 14 November 2024 by Kku (→See also: annlink)
(6 intermediate revisions by 6 users not shown)
Line 1:
{{Short description|Process of transforming text into a single canonical form}}
{{Use American English|date=March 2021}}
{{Use mdy dates|date=March 2021}}

Line 13:
== Applications==
Text normalization is frequently used when converting [[speech synthesis|text to speech]]. [[Number]]s, [[Calendar date|date]]s, [[acronym]]s, and [[abbreviation]]s are non-standard "words" that need to be pronounced differently depending on context.<ref name="sproate">Sproat, R.; Black, A.; Chen, S.; Kumar, S.; Ostendorf, M.; Richards, C. (2001). "Normalization of non-standard words." ''Computer Speech and Language'' '''15'''; 287–333. [[Digital object identifier|doi]]:[https://dx.doi.org/10.1006/csla.2001.0169 10.1006/csla.2001.0169].</ref> For example:
* "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan.<ref>{{cite web

Line 20:
| accessdate = October 2, 2012
| url = http://mylanguages.org/samoan_numbers.php}}</ref>
* "vi" could be pronounced as "[[Violet (name)|vie]]," "[[Vi (text editor)|vee]]," or "[[Roman numerals|the sixth]]" depending on the surrounding words.<ref name="msdn">{{cite web
| title = Text-to-Speech Engines Text Normalization
| work = MSDN

Line 36:
== See also ==
* {{annotated link|Automated paraphrasing}}
* {{annotated link|Canonicalization}}
* {{annotated link|Text simplification}}
* {{annotated link|Unicode equivalence}}

== References ==