Unicode control characters

Many '''[[Unicode]] control characters''' are used to control the interpretation or display of text, but these characters themselves have no visual or spatial representation. For example, the [[null character]] ({{unichar|0000|NULL|nlink=control characters}}) is used in C and related programming environments to mark the end of a string of characters. A program therefore needs only the starting memory address of a string (rather than a starting address and a length), since the string ends at the first null character the program reads.
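The sentinel idea can be illustrated with a short sketch. The helper <code>c_strlen</code> below is hypothetical: it imitates the behaviour of C's <code>strlen</code> in Python by counting bytes from a start offset until it reaches the first NUL byte, so only the start position is needed, not a stored length.

```python
def c_strlen(buf: bytes, start: int = 0) -> int:
    """Count bytes from `start` up to (not including) the first NUL (0x00).

    Hypothetical helper imitating C's strlen. Like the C original, it assumes
    a terminator exists; here a missing NUL raises IndexError instead of
    reading past the end of memory.
    """
    end = start
    while buf[end] != 0:  # keep reading until the NUL sentinel is found
        end += 1
    return end - start


# Two C-style strings stored back to back in one buffer.
memory = b"hello\x00world\x00"
print(c_strlen(memory))     # 5
print(c_strlen(memory, 6))  # 5
```

The second call shows why a single address suffices: starting at offset 6, the scan stops at the second terminator without any separately stored length.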
 
In the narrowest sense, a ''control code'' is a character with the [[Unicode character property#General Category|general category]] {{code|Cc}}, which comprises the [[C0 and C1 control codes]], a concept defined in [[ISO/IEC 2022]] and inherited by Unicode, with the most common set being defined in [[ISO/IEC 6429]]. Control codes are handled distinctly from ordinary Unicode characters, for example, by not being assigned character names (although they are assigned normative formal aliases).<ref name="aliases">{{cite web |url=https://www.unicode.org/Public/UCD/latest/ucd/NameAliases.txt |title=Name Aliases |work=Unicode Character Database |institution=[[Unicode Consortium]]}}</ref> In a broader sense, other non-printing format characters, such as those used in [[bidirectional text]], are also referred to as ''control characters'' by software;<ref>{{cite web |url=http://kvota.net/guadec/localised-desktop-talk/ |title=Towards a localised desktop |quotation=For some cases where automatic decision making doesn't work, you can manually add specific direction markers by right-clicking the text field, choosing "Insert Unicode control character" from the menu, and selecting appropriate direction mark. This would allow you, for instance, to start your RTL text with an otherwise LTR word (such as "GNOME"). |first=Danilo |last=Segan}}</ref> these are mostly assigned to the general category {{code|Cf}} (format), which is used for format effectors introduced and defined by Unicode itself.
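The distinction between {{code|Cc}} control codes and {{code|Cf}} format characters can be inspected with Python's standard <code>unicodedata</code> module. This is a sketch only: the helper <code>describe</code> is hypothetical, and the results depend on the Unicode version bundled with the Python interpreter. It shows that a {{code|Cc}} code point has no character name while a {{code|Cf}} character such as the left-to-right mark does, and that the formal alias of U+0000 can still be used to look it up.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX <category> <name>' for a single character (hypothetical helper)."""
    cp = f"U+{ord(ch):04X}"
    cat = unicodedata.category(ch)
    # Cc code points have no character name, so name() falls back to the default.
    name = unicodedata.name(ch, "<no name>")
    return f"{cp} {cat} {name}"


for ch in ["\u0000", "\u0085", "\u200E", "A"]:
    print(describe(ch))
# U+0000 Cc <no name>
# U+0085 Cc <no name>
# U+200E Cf LEFT-TO-RIGHT MARK
# U+0041 Lu LATIN CAPITAL LETTER A

# Formal aliases from NameAliases.txt are accepted by lookup() (Python 3.3+).
print(unicodedata.lookup("NULL") == "\u0000")  # True
```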
 
== Category "Cc" control codes (C0 and C1) ==
The control code ranges 0x00–0x1F ("C0") and 0x7F originate from the 1967 edition of [[US-ASCII]]. The standard [[ISO/IEC 2022]] (ECMA-35) defines extension methods for ASCII, including a secondary "C1" range of 8-bit control codes from 0x80 to 0x9F, equivalent to 7-bit sequences of {{ctrl|ESC}} with the bytes 0x40 through 0x5F. Collectively, codes in these ranges are known as the [[C0 and C1 control codes]]. Although ISO/IEC 2022 allows for the existence of multiple control code sets specifying differing interpretations of these control codes, their most common interpretation is specified in [[ISO/IEC 6429]] (ECMA-48).
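The equivalence between an 8-bit C1 code and its 7-bit {{ctrl|ESC}} sequence is a fixed offset of 0x40: the C1 byte minus 0x40 gives the byte that follows {{ctrl|ESC}}. The sketch below expresses that mapping in both directions; the function names are hypothetical. As an example, 0x9B (the ISO/IEC 6429 Control Sequence Introducer) corresponds to {{ctrl|ESC}} followed by 0x5B, the character "[".

```python
ESC = 0x1B


def c1_to_7bit(code: int) -> bytes:
    """Map an 8-bit C1 control (0x80-0x9F) to its 7-bit ESC sequence."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError(f"0x{code:02X} is not in the C1 range")
    return bytes([ESC, code - 0x40])  # second byte lands in 0x40-0x5F


def esc_to_c1(seq: bytes) -> int:
    """Inverse mapping: ESC followed by a byte in 0x40-0x5F gives a C1 code."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError("not an ESC sequence for a C1 control")
    return seq[1] + 0x40


print(c1_to_7bit(0x9B))        # b'\x1b['
print(hex(esc_to_c1(b"\x1b[")))  # 0x9b
```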
The [[ISO/IEC 8859]] series of encodings conforms to [[ISO/IEC 4873]] (ECMA-43) level 1, a subset of ISO/IEC 2022 designed for 8-bit character encodings, and therefore designates the range 0x80–0x9F for use by a C1 control code set such as ISO/IEC 6429. Unicode inherits its [[Basic Latin (Unicode block)|first]] and [[Latin-1 Supplement (Unicode block)|second]] blocks (comprising U+0000 through U+00FF) from ASCII and [[ISO/IEC 8859-1]], thus incorporating the C0 and C1 control code ranges (U+0000&ndash;U+001F, U+007F&ndash;U+009F). It does not assign normative names to these control codes, though it does assign them normative aliases.<ref name="aliases" />
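The claim that the {{code|Cc}} category consists exactly of the C0 range, U+007F and the C1 range can be checked against the Unicode Character Database shipped with Python. The sketch below assumes Python's <code>unicodedata</code> module; the predicate <code>is_c0_or_c1</code> is hypothetical. It also shows the consequence of inheriting the second block from ISO/IEC 8859-1: decoding bytes 0x80 through 0x9F as Latin-1 yields the C1 code points with the same values.

```python
import unicodedata


def is_c0_or_c1(cp: int) -> bool:
    """True for U+0000-U+001F (C0) and U+007F-U+009F (DEL plus C1)."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


# Compare the predicate with the general category of every code point.
mismatches = [
    cp for cp in range(0x110000)
    if (unicodedata.category(chr(cp)) == "Cc") != is_c0_or_c1(cp)
]
print(mismatches)  # []
print(sum(1 for cp in range(0x110000) if is_c0_or_c1(cp)))  # 65

# Latin-1 maps each byte to the code point with the same value,
# so byte 0x85 decodes to the C1 code point U+0085.
print(bytes([0x85]).decode("latin-1") == "\u0085")  # True
```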
 
Category "Cc" control codes can serve a variety of purposes, not limited to format effectors: for example, the default ASCII C0 set includes six format effectors ({{ctrl|BS}}, {{ctrl|HT}}, {{ctrl|LF}}, {{ctrl|VT}}, {{ctrl|FF}} and {{ctrl|CR}}), ten transmission controls, four device controls, four information separators and eight other control codes.<ref name="ir001">{{citation|mode=cs1 |author=ISO/TC 97/SC 2 |author-link=ISO/IEC JTC 1/SC 2#History |title=The set of control characters of the ISO 646 |date=1975 |publisher=ITSCJ/[[Information Processing Society of Japan|IPSJ]] |id=ISO-IR-1 |url=https://www.itscj.ipsj.or.jp/iso-ir/001.pdf}}</ref> Most of these characters play no explicit role in Unicode text handling, and are used only by higher-level protocols such as those used by [[terminal emulator]]s. Certain characters are commonly used for formatting or [[sentinel value|sentinel]] purposes:
* {{unichar|0000||note=NUL: NULL}} (used in [[null-terminated string]]s)
* {{unichar|0009||note=HT: HORIZONTAL TABULATION}} (inserted by the [[tab key]])