Unicode character property

 
==Bidirectional writing==
Six character properties pertain to bidirectional writing: ''Bidi_Class'', ''Bidi_Control'', ''Bidi_Mirrored'', ''Bidi_Mirroring_Glyph'', ''Bidi_Paired_Bracket'' and ''Bidi_Paired_Bracket_Type''.
 
One of Unicode's major features is its support for bidirectional (''Bidi'') text, in which right-to-left (R-to-L) and left-to-right (L-to-R) scripts are displayed together. The Unicode Bidirectional Algorithm, UAX #9,<ref name="UAX9">{{cite web|url=https://www.unicode.org/reports/tr9/|title=Unicode Standard Annex #9: Unicode Bidirectional Algorithm|work=The Unicode Standard|date=2024-09-02}}</ref> describes how to display text in which the script direction alternates. For example, it allows a Hebrew quotation to appear in an English text. A character's ''Bidi_Class'' property describes its behaviour in directional writing. To override the default direction, Unicode defines special ''formatting characters'' (the '''Bidi controls'''). These characters can enforce a direction and, by definition, affect only bidirectional layout.
 
Each code point has a '''Bidi_Class''' property, which defines how the algorithm treats that code point in bidirectional text:
 
{{Bidi Class (Unicode)}}
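The Bidi_Class value of a character can be looked up programmatically. As a minimal sketch, assuming Python's standard {{code|unicodedata}} module (whose {{code|bidirectional()}} function returns the Bidi_Class abbreviation), the following lists the class of a few sample characters. The helper name {{code|bidi_classes}} is illustrative only, and the values reflect the Unicode version bundled with the interpreter ({{code|unicodedata.unidata_version}}).

```python
import unicodedata


def bidi_classes(text):
    """Return (code point, character name, Bidi_Class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"),
         unicodedata.bidirectional(ch))
        for ch in text
    ]


# Latin letter, Hebrew letter, Arabic letter, European digit,
# Arabic-Indic digit, parenthesis, comma, space, RIGHT-TO-LEFT MARK.
sample = "A\u05d0\u06271\u0661(, \u200f"

for cp, name, bc in bidi_classes(sample):
    print(cp, name, bc)
```

Strong types (L, R, AL) come out for the letters, weak types (EN, AN, CS) for the digits and the comma, and neutral types (ON, WS) for the parenthesis and the space.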
 
In most cases, the algorithm can determine the direction of text from this property alone. For more complex situations, such as an English text containing a Hebrew quotation, Unicode provides additional controls. Twelve characters have the property '''{{code|1=Bidi_Control=Yes}}''': ALM, FSI, LRE, LRI, LRM, LRO, PDF, PDI, RLE, RLI, RLM and RLO, as named in the table. These invisible characters are used only by the algorithm and have no effect outside bidirectional formatting.<ref name="UAX9"/> Despite the name, they are formatting characters, not control characters, and have the General Category ''Other, format'' (Cf) in the Unicode definition.
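The General Category and Bidi_Class of the twelve Bidi controls can be checked in a few lines. Python's {{code|unicodedata}} module does not expose the Bidi_Control property itself, so this sketch hard-codes the twelve code points under the abbreviations used above; the table name {{code|BIDI_CONTROLS}} and the helper {{code|is_bidi_control}} are hypothetical names chosen for illustration.

```python
import unicodedata

# The twelve Bidi_Control=Yes code points, keyed by their abbreviations.
# Hard-coded here because unicodedata has no Bidi_Control lookup.
BIDI_CONTROLS = {
    "ALM": 0x061C, "LRM": 0x200E, "RLM": 0x200F,
    "LRE": 0x202A, "RLE": 0x202B, "PDF": 0x202C,
    "LRO": 0x202D, "RLO": 0x202E,
    "LRI": 0x2066, "RLI": 0x2067, "FSI": 0x2068, "PDI": 0x2069,
}

_CONTROL_CHARS = {chr(cp) for cp in BIDI_CONTROLS.values()}


def is_bidi_control(ch):
    """True if ch is one of the twelve Bidi control characters."""
    return ch in _CONTROL_CHARS


# Print name, General Category (gc) and Bidi_Class (bc), in code point order.
for abbr, cp in sorted(BIDI_CONTROLS.items(), key=lambda kv: kv[1]):
    ch = chr(cp)
    print(f"{abbr} U+{cp:04X} {unicodedata.name(ch)} "
          f"gc={unicodedata.category(ch)} bc={unicodedata.bidirectional(ch)}")
```

Every entry reports {{code|1=gc=Cf}}, matching the statement that these are formatting characters. Their Bidi_Class values differ: the marks ALM, LRM and RLM behave like strong letters (AL, L, R), while the embedding, override and isolate controls each have a dedicated class of their own.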
 
In outline, the algorithm identifies runs of characters that share the same strong directional type (R-to-L ''or'' L-to-R), taking into account any overrides set by the Bidi control characters. Weak types, such as numbers, are assigned a direction according to the surrounding strong types, as are neutral characters. Finally, each run is displayed in its resolved direction.
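The outline above can be illustrated with a deliberately simplified toy in Python. This is ''not'' the UAX #9 algorithm: it assumes a left-to-right paragraph, ignores explicit Bidi controls, embedding levels and the special handling of numbers, and treats every non-strong character as neutral. A neutral takes the direction of the strong characters on both sides when they agree, otherwise the paragraph direction; each right-to-left run is then reversed for display. The function names are hypothetical.

```python
import unicodedata


def strong_dir(ch):
    """Return 'L' or 'R' for strong characters, None for all others."""
    bc = unicodedata.bidirectional(ch)
    if bc == "L":
        return "L"
    if bc in ("R", "AL"):
        return "R"
    return None


def resolve_directions(text, base="L"):
    """Toy resolution: a non-strong character takes the direction of the
    nearest strong characters before and after it if they agree, otherwise
    the base (paragraph) direction. Numbers get no special treatment."""
    strong = [strong_dir(c) for c in text]
    resolved = []
    for i, d in enumerate(strong):
        if d is not None:
            resolved.append(d)
            continue
        before = next((s for s in reversed(strong[:i]) if s), base)
        after = next((s for s in strong[i + 1:] if s), base)
        resolved.append(before if before == after else base)
    return resolved


def visual_order(text):
    """Reverse each maximal right-to-left run (left-to-right paragraph only)."""
    dirs = resolve_directions(text, "L")
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and dirs[j] == dirs[i]:
            j += 1
        run = text[i:j]
        out.append(run[::-1] if dirs[i] == "R" else run)
        i = j
    return "".join(out)


# Hebrew letters are stored in logical order and shown reversed.
print(visual_order("abc \u05d0\u05d1\u05d2 def"))
```

Because digits are treated as neutrals, a number inside a right-to-left run would be reversed digit by digit. The real algorithm avoids this by resolving weak types separately and keeping numbers left-to-right.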
 
Two character properties are relevant to determining the mirror image of a glyph in bidirectional text. '''{{code|1=Bidi_Mirrored=Yes}}''' indicates that the glyph should be mirrored when the character appears in right-to-left text. The property '''{{code|1=Bidi_Mirroring_Glyph=U+''hhhh''}}''' can then point to the character with the mirrored glyph. For example, the parentheses {{char|(}} and {{char|)}} are mirrored this way. Shaping cursive scripts such as Arabic, and mirroring other glyphs that have a direction, are not part of the algorithm.
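The interaction of the two properties can be sketched in Python, roughly following the mirroring rule in UAX #9: mirror a character only if its resolved direction is right-to-left and it has Bidi_Mirrored=Yes. Python's {{code|unicodedata.mirrored()}} reports Bidi_Mirrored, but the standard library has no Bidi_Mirroring_Glyph lookup, so the sketch uses a small hand-copied subset of pairs ({{code|MIRROR_PAIRS}}, a hypothetical name) standing in for the full mapping in the Unicode data file BidiMirroring.txt.

```python
import unicodedata

# Hand-copied subset of Bidi_Mirroring_Glyph pairs (not the full mapping).
MIRROR_PAIRS = {
    "(": ")", ")": "(", "[": "]", "]": "[",
    "{": "}", "}": "{", "<": ">", ">": "<",
    "\u00ab": "\u00bb", "\u00bb": "\u00ab",  # guillemets
}


def displayed_char(ch, resolved_dir):
    """Return the character whose glyph is shown for ch.

    Mirror only when the resolved direction is 'R' and ch has
    Bidi_Mirrored=Yes; if no mirroring glyph is known, keep ch unchanged.
    """
    if resolved_dir == "R" and unicodedata.mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)
    return ch
```

A renderer following this rule would draw {{char|(}} with the glyph of {{char|)}} inside a right-to-left run and leave it unchanged in left-to-right text. Characters with Bidi_Mirrored=Yes but no Bidi_Mirroring_Glyph value fall through unchanged here; a real renderer would have to supply a mirrored glyph itself.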
<!-- Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type go here -->