Revision as of 21:07, 8 February 2022, by Spitzak (→Unicode introduced separators: remove duplicate links and useless html)
Revision as of 22:06, 8 February 2022, by Spitzak (→Language tags)
== Language tags ==
{{main|Tags (Unicode block)}}
Unicode previously included 128 characters, now deprecated, for language tags. These characters essentially mirrored the 128 ASCII characters but were used to identify the subsequent text as belonging to a particular language according to [[BCP 47]]. For example, to indicate subsequent text as the variant of English as written in the United States, the sequence {{unichar|E0001|LANGUAGE TAG}}, {{unichar|E0065|Tag Small Letter e}}, {{unichar|E006E|Tag Small Letter n}}, {{unichar|E002D|Tag Hyphen-minus}}, {{unichar|E0075|Tag Small Letter u}} and {{unichar|E0073|Tag Small Letter s}} would have been used.

These language tag characters would not be displayed themselves. However, they would provide information for text processing or even for the display of other characters. For example, the display of Unihan ideographs might have substituted different glyphs if the language tags indicated Korean than if the tags indicated Japanese. As another example, language tags might have influenced the display of the decimal digits 0 through 9, depending on the language in which they appeared.

The tag characters {{unichar|E0001|LANGUAGE TAG}} and {{unichar|E007F|CANCEL TAG}} were deprecated in Unicode 5.1 (2008) and should not be used for language information.<ref>{{cite document|url=http://tools.ietf.org/html/rfc6082|title=RFC6082: Deprecating Unicode Language Tag Characters: RFC 2482 is Historic |publisher=Internet Engineering Task Force (IETF)|date=November 2010|last1=Klensin |first1=John C. |last2=Presuhn |first2=Randy |last3=Whistler |first3=Ken |last4=Dürst |first4=Martin J. |last5=Adams |first5=Glenn }}</ref> The characters {{tt|U+E0020–U+E007E}} were also deprecated, but were restored with the release of Unicode 8.0 (2015). The change was made "to clear the way for the potential future use of tag characters for a purpose other than to represent language tags".<ref name="migration">{{cite web|url=http://unicode.org/versions/Unicode8.0.0/#Migration|title=Unicode 8.0.0, Implications for Migration |publisher=Unicode Consortium}}</ref> Unicode states that "the use of tag characters to represent language tags in a plain text stream is still a deprecated mechanism for conveying language information about text".<ref name="migration" />
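
The mirroring of ASCII described above can be illustrated with a short Python sketch. It assumes that each tag character sits at a fixed offset of 0xE0000 from its ASCII counterpart, which matches the code points in the United States English example. The function names <code>encode_language_tag</code> and <code>decode_language_tag</code> are hypothetical and chosen only for illustration. Because the mechanism is deprecated, this is a way to inspect legacy data, not a recommended way to mark language.

```python
TAG_BASE = 0xE0000            # assumed offset between an ASCII character and its tag character
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG (deprecated)


def encode_language_tag(bcp47: str) -> str:
    """Build the deprecated tag-character sequence for an ASCII language tag."""
    if not all(0x20 <= ord(c) <= 0x7E for c in bcp47):
        raise ValueError("language tag must consist of printable ASCII")
    # Shift every ASCII character into the Tags block and prefix LANGUAGE TAG.
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in bcp47)


def decode_language_tag(sequence: str) -> str:
    """Recover the ASCII language tag from a sequence made by encode_language_tag."""
    if not sequence.startswith(LANGUAGE_TAG):
        raise ValueError("sequence does not start with U+E0001 LANGUAGE TAG")
    return "".join(chr(ord(c) - TAG_BASE) for c in sequence[1:])


sequence = encode_language_tag("en-us")
code_points = [f"U+{ord(c):04X}" for c in sequence]
print(code_points)
print(decode_language_tag(sequence))
```

The printed code points can be compared against the sequence listed in the example above. The lowercase input <code>en-us</code> is used here only because the example uses small-letter tag characters.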

Unicode control characters: Difference between revisions