C0 and C1 control codes: Difference between revisions

{{anchor|C0}}C0 controls: LS1 and LS0 were swapped in the list (but the alternative names SO and SI were correct); these names do not appear in the reference for the table as a whole, so I have cited them individually to make it possible to confirm that they are the right way round now
Rescuing 2 sources and tagging 0 as dead.) #IABot (v2.0.9.5
 
(25 intermediate revisions by 10 users not shown)
Line 8:
 
=={{anchor|C0}}C0 controls==
[[ASCII]] defines 32 control characters, plus the DEL character at 7F<sub>HEX</sub> or 01111111<sub>BIN</sub> (needed to punch out all the holes on a paper tape and so erase it). This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have maintained their use: BEL, ESC, and the ''format effector''<ref>{{cite tech report|title=Standard ECMA-6 7-bit Coded Character Set|year=1965|url=https://www.ecma-international.org/wp-content/uploads/ECMA-6_5th_edition_march_1985.pdf|page=4}}</ref> (FE<sub>n</sub>) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL being the [[Null-terminated string|C string terminator]]. Some data transfer protocols such as [[ANPA-1312]], [[Kermit (protocol)|Kermit]], and [[XMODEM]] make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions; and some file formats use the "Information Separators" (IS<sub>n</sub>) such as the [[info (Unix)|Unix info]] format<ref>{{cite web |url=https://www2.pd.infn.it/TeX/doc/html/texinfo/info_2.html#SEC13 |title=Adding a new node to Info |work=Info: The online, menu-driven GNU documentation system |last=Fox |first=Brian |author-link=Brian Fox (computer programmer) |publisher=[[GNU Project]]}}</ref> and [[Python (programming language)|Python]]'s {{tt|splitlines}} string method.<ref>{{cite web |url=https://docs.python.org/3/library/stdtypes.html#str.splitlines |title=Built-in Types § str.splitlines |work=The Python Standard Library |publisher=[[Python Software Foundation]]}}</ref>
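The use of the Information Separators by Python's {{tt|splitlines}} can be illustrated with a short sketch. The variable names and sample strings below are illustrative assumptions only; the behaviour shown relies on the documented list of line boundaries for <code>str.splitlines</code>, which includes FS, GS and RS but not US.

```python
# Illustrative sketch: FS, GS and RS are treated as line boundaries by
# str.splitlines(), while US (0x1F) is not in the documented boundary list.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

# Hypothetical sample record using the four Information Separators.
record = "alpha" + FS + "beta" + GS + "gamma" + RS + "delta" + US + "epsilon"

fields = record.splitlines()
print(fields)  # ['alpha', 'beta', 'gamma', 'delta\x1fepsilon']
```

Text that uses these codes as data separators may therefore be split unexpectedly when it is processed line by line.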
 
The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).
 
Unicode provides [[Control Pictures]] that can replace C0 control characters to make them visible on screen. However, [[caret notation]] is used more often.
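Both conventions can be derived arithmetically from the code value. The following is a minimal sketch, assuming the usual layout of the Control Pictures block (U+2400 to U+241F for the C0 codes and U+2421 for DEL) and the usual caret rule of inverting bit 6; the function names <code>control_picture</code> and <code>caret_notation</code> are hypothetical.

```python
def control_picture(code: int) -> str:
    """Return the Unicode Control Picture for a C0 code or DEL."""
    if 0x00 <= code <= 0x1F:
        return chr(0x2400 + code)   # U+2400..U+241F mirror 0x00..0x1F
    if code == 0x7F:
        return "\u2421"             # SYMBOL FOR DELETE
    raise ValueError("not a C0 control code or DEL")


def caret_notation(code: int) -> str:
    """Return the caret notation, e.g. ^[ for ESC and ^? for DEL."""
    if 0x00 <= code <= 0x1F or code == 0x7F:
        return "^" + chr(code ^ 0x40)  # invert bit 6 to get a printable letter
    raise ValueError("not a C0 control code or DEL")


for code in (0x00, 0x07, 0x1B, 0x7F):
    print(f"{code:02X} {control_picture(code)} {caret_notation(code)}")
```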
 
{{anchor|ASCII}}
Line 23:
! {{vert header|Hexadecimal}}
! Abbreviations
! {{vert header|[[Control Pictures|Symbol]]}}
! Name
! {{vert header|[[Escape Sequences in C|C escape]]}}
Line 139:
 
Except for {{control code link|internal=1|SS2}} and {{control code link|internal=1|SS3}} in [[EUC-JP]] text, and {{control code link|internal=1|NEL}} in text transcoded from [[EBCDIC]], the 8-bit forms of these codes were almost never used. {{control code link|internal=1|CSI}}, {{control code link|internal=1|DCS}} and {{control code link|internal=1|OSC}} are used to control [[text terminal]]s and [[terminal emulator]]s, but almost always by using their 7-bit escape code representations. Nowadays if these codes are encountered it is far more likely they are intended to be printing characters from that position of [[Windows-1252]] or [[Mac OS Roman]].
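The relationship between the 8-bit forms and their 7-bit escape representations, and the ambiguity with Windows-1252, can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the helper name <code>c1_to_7bit</code> and the sample bytes are hypothetical, and the mapping used is the ESC-plus-(byte − 0x40) convention of ISO/IEC 2022.

```python
ESC = 0x1B


def c1_to_7bit(byte: int) -> bytes:
    """Map an 8-bit C1 code (0x80..0x9F) to its 7-bit ESC-prefixed form."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control code")
    return bytes([ESC, byte - 0x40])  # e.g. CSI 0x9B becomes ESC [


print(c1_to_7bit(0x9B))  # CSI -> b'\x1b['
print(c1_to_7bit(0x90))  # DCS -> b'\x1bP'
print(c1_to_7bit(0x9D))  # OSC -> b'\x1b]'

# The same bytes read as ISO 8859-1 yield C1 controls, but read as
# Windows-1252 they yield printing characters (here, curly quotes).
raw = b"\x93quoted\x94"
as_latin1 = raw.decode("latin-1")
as_cp1252 = raw.decode("cp1252")
print([f"U+{ord(c):04X}" for c in (as_latin1[0], as_cp1252[0])])
```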
 
Except for {{control code link|internal=1|NEL}}, Unicode does not provide a "control picture" for any of these. There is no well-known variation of caret notation for them either.
 
{| class="wikitable"
Line 225 ⟶ 227:
Several official and unofficial alternatives have been defined, but they are now largely obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC,<ref name="tc-c0"/><ref>{{cite iso-ir |number=104 |sponsor1=ISO/TC97/SC2/WG-7 |sponsor-link1=ISO/IEC JTC 1/SC 2#History |sponsor2=ECMA |sponsor-link2=Ecma International |title=Minimum C0 set for ISO 4873 |date=1985-08-01}}</ref> SP and DEL{{efn|[[ISO/IEC 4873]] extends this requirement to the C1 SS2 and SS3,{{refn|{{cite iso-ir |number=105 |title=Minimum C1 Set for ISO 4873 |date=1985-08-01 |sponsor1=ISO/TC97/SC2/WG-7 |sponsor-link1=ISO/IEC JTC 1/SC 2#History |sponsor2=ECMA |sponsor-link2=Ecma International}}}} although ISO/IEC 2022 itself does not.}} "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard.<ref>{{cite book |id=ECMA-35 |title=Character Code Structure and Extension Techniques |edition=6th |type=ECMA Standard |date=1994 |author=ECMA |author-link=Ecma International |url=https://www.ecma-international.org/wp-content/uploads/ECMA-35_6th_edition_december_1994.pdf#page=17 |page=7 |section=6.2: Fixed coded characters}}</ref> It also specifies that if a C0 set includes transmission control (TC<sub>n</sub>) codes, they must be encoded at their ASCII locations<ref name="tc-c0">{{cite book |id=ECMA-35 |title=Character Code Structure and Extension Techniques |edition=6th |type=ECMA Standard |date=1994 |author=ECMA |author-link=Ecma International |url=https://www.ecma-international.org/wp-content/uploads/ECMA-35_6th_edition_december_1994.pdf#page=21 |page=11 |section=6.4.2: Primary sets of coded control functions}}</ref> and cannot be put in a C1 set,<ref name="tc-c1">{{cite book |id=ECMA-35 |title=Character Code Structure and Extension Techniques |edition=6th |type=ECMA Standard |date=1994 |author=ECMA |author-link=Ecma International |url=https://www.ecma-international.org/wp-content/uploads/ECMA-35_6th_edition_december_1994.pdf#page=21 |page=11 |section=6.4.3: Supplementary sets of coded control functions}}</ref> and any new transmission controls must be in a C1 set.<ref name="tc-c0" />
 
=== Alternative C0 character sets ===
* [[ANPA-1312#C0 control codes|ANPA-1312]], a text markup language used for news transmission, replaces several C0 control characters.
* [[IPTC 7901#C0 control codes|IPTC 7901]], the newer international version of the above, has its own variations.
* [[Videotex character set#C0 control codes|Videotex]] has a completely different set.
* [[Teletext character set#Control characters|Teletext]] also defines a set similar to Videotex.
* [[ITU T.61|T.61]]/[[ITU T.51|T.51]],<ref name="T61C0">{{cite iso-ir |sponsor=ITU |sponsor-link=ITU |title=Teletex Primary Set of Control Functions |date=1985 |number=106}}</ref> and others<ref>{{cite iso-ir |sponsor=Úřad pro normalizaci a měřeni |title=The set of control characters of ISO 646, with EM replaced by SS2 |date=1987 |number=140}}</ref> replaced EM and GS with SS2 and SS3 so these functions could be used in a 7-bit environment without resorting to [[ANSI_escape_sequence#Fe_Escape_sequences|escape sequences]].
* Some sets replaced FS with SS2<ref>{{cite iso-ir |sponsor=ISO/TC 97/SC 2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |title=The set of control characters of ISO 646, with IS4 replaced by Single Shift for G2 (SS2) |date=1977 |number=36}}</ref> (the same as ANPA-1312).
* {{anchor|JIS C 6225|JIS X 0207}}The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources,<ref name="wg6">{{cite web |url=http://original-jpeg.org/Document%20archive/wg8/wg8n0604.pdf |id=ISO/TC97/SC2/WG6 N317.rev |title=Liaison statement to ISO/TC97/SC2/WG8 and ISO/TC97/SC18/WG8 |author=ISO/TC97/SC2/WG6 |author-link=ISO/IEC JTC 1/SC 2#History |archive-url=https://web.archive.org/web/20201026055422/http://original-jpeg.org/Document%20archive/wg8/wg8n0604.pdf |archive-date=2020-10-26 |url-status=dead}}</ref> replaced FS with CEX or "Control Extension",<ref>{{cite iso-ir |sponsor=ISO/TC 97/SC 2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |title=The C0 set of Control Characters of Japanese Standard JIS C 6225-1979 |date=1982 |number=74}}</ref> which introduces control sequences for vertical text behaviour, superscripts and subscripts<ref>{{citation|mode=cs1 |url=http://printronix.com/wp-content/uploads/manuals/PTX_PRM_OKI_N7_256482A.pdf#page=26 |title=OKI® Programmer's Reference Manual |page=26 |author=Printronix |year=2012}}</ref> and for transmitting [[Dynamically Redefined Character Set|custom character graphics]].<ref name="wg6" />
 
=== Alternative C1 character sets ===
* A specialized C1 control code set is registered for bibliographic use (including string collation), such as by [[MARC-8#C1 control codes|MARC-8]].<ref name="din31626">{{cite iso-ir |number=40 |title=Additional Control Codes for Bibliographic Use according to German Standard DIN 31626 |date=1979-07-15 |sponsor=DIN |sponsor-link=DIN}}</ref><ref name="iso6630-old">{{cite iso-ir |number=67 |title=Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 |date=1983-06-01 |sponsor=ISO/TC 46}}</ref><ref name="iso6630-1985">{{cite iso-ir |number=124 |title=Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 |date=1986-02-01 |sponsor=ISO/TC 46}}</ref>
* Various specialised C1 control code sets are registered for use by [[Videotex character set#C1 control codes|Videotex]] formats.<ref name="iso-ir" />
* The [[Stratus VOS]] operating system uses a C1 set called the ''NLS control set''.<ref>{{cite manual |chapter-url=https://stratadoc.stratus.com/vos/19.3.1/r212-00/wwhelp/wwhimpl/js/html/wwhelp.htm?href=ch4r212-00b.html |chapter=Overview of NLS Strings |title=National Language Support User's Guide (R212) |author=Stratus Technologies Ireland, Ltd.}}</ref> It includes SS1 (Single-Shift 1) through SS15 (Single-Shift 15) controls,<ref>{{cite manual |chapter-url=https://stratadoc.stratus.com/vos/19.3.1/r281-16/wwhelp/wwhimpl/js/html/wwhelp.htm?context=r281-16&file=appar281-16.html |chapter=The OpenVOS Internal Character Code Set |author=Stratus Technologies Ireland, Ltd. |title=OpenVOS System Administration: Administering and Customizing a System (R281)}}</ref> used to invoke individual characters from pre-defined supplementary character sets,<ref>{{cite manual |chapter-url=https://stratadoc.stratus.com/vos/19.3.1/r212-00/wwhelp/wwhimpl/js/html/wwhelp.htm?href=ch2r212-00c.html |chapter=The Supplementary Graphic Character Sets |title=National Language Support User's Guide (R212) |author=Stratus Technologies Ireland, Ltd.}}</ref> in a similar manner to the [[ISO/IEC_2022#Shift_functions|single-shift mechanism of ISO/IEC 2022]]. The only single-shift controls defined by ISO/IEC 2022 are SS2 and SS3; these are retained in the VOS set at their original code points and function the same way.
* [[EBCDIC#Definitions of non-ASCII EBCDIC controls|EBCDIC]] defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to [[Unicode]] (or to [[ISO 8859]]), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA).<ref name="utr16cdra">{{cite web |url=https://www.unicode.org/reports/tr16/tr16-6.html#Step%202 |title=3.3 Step 2: Byte Conversion |work=UTF-EBCDIC |id=Unicode Technical Report #16 |last1=Umamaheswaran |first1=V.S. |publisher=[[Unicode Consortium]] |date=1999-11-08 |quotation=The 64 control characters […], the ASCII DELETE character (U+007F)[…] are mapped respecting EBCDIC conventions, as defined in IBM Character Data Representation Architecture, CDRA, with one exception -- the pairing of EBCDIC Line Feed and New Line control characters are swapped from their CDRA default pairings to ISO/IEC 6429 Line Feed (U+000A) and Next Line (U+0085) control characters}}</ref><ref name="ms037">{{citation|mode=cs1 |url=https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT |title=cp037_IBMUSCanada to Unicode table |last1=Steele |first1=Shawn |publisher=[[Microsoft]]/[[Unicode Consortium]] |date=1996-04-24}}</ref> The New Line (NL) control does translate to the ISO/IEC 6429 {{Control code link|internal=1|NEL}} (though it is often swapped with LF, following the UNIX line-ending convention),<ref name="utr16cdra" /> but the remainder of the control codes do not correspond. For example, the EBCDIC control {{Control code link|SPS}} and the ECMA-48 control {{Control code link|internal=1|PLU}} are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the [[ISO-IR]] registry for ISO/IEC 2022.<ref name="iso-ir">{{citation |title=ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences |id=ISO-IR |publisher=ITSCJ/[[Information Processing Society of Japan|IPSJ]] |url=https://itscj.ipsj.or.jp/english/vbcqpr00000004qn-att/ISO-IR.pdf |access-date=2023-05-13 |archive-date=2023-05-12 |archive-url=https://web.archive.org/web/20230512232742/https://itscj.ipsj.or.jp/english/vbcqpr00000004qn-att/ISO-IR.pdf |url-status=dead }}</ref>
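
The EBCDIC mapping described in the last item can be observed with Python's built-in <code>cp037</code> codec. The sketch below assumes that this codec follows the Microsoft code page 037 table cited above; the byte values 0x15 (EBCDIC NL) and 0x25 (EBCDIC LF) are taken from that table, and the variable names are illustrative.

```python
# Sketch, assuming Python's cp037 codec follows the cited CP037 mapping table.
EBCDIC_NL = b"\x15"  # EBCDIC New Line
EBCDIC_LF = b"\x25"  # EBCDIC Line Feed

nl = EBCDIC_NL.decode("cp037")
lf = EBCDIC_LF.decode("cp037")
print(f"NL -> U+{ord(nl):04X}")  # maps into the C1 range (NEL)
print(f"LF -> U+{ord(lf):04X}")  # maps to the ASCII line feed

# List every EBCDIC byte that this codec maps into the C1 range U+0080..U+009F.
c1_sources = [
    b for b in range(256)
    if 0x80 <= ord(bytes([b]).decode("cp037")) <= 0x9F
]
print(len(c1_sources), [f"{b:02X}" for b in c1_sources])
```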
 
==Unicode==
Line 249 ⟶ 253:
 
==See also==
* [[Control Pictures]] - Unicode graphical representation characters for the C0 control codes
* [[ANSI escape code]]
 
Line 257 ⟶ 261:
==References==
{{reflist}}
 
==External links==
* The Unicode Standard
** [https://www.unicode.org/charts/PDF/U0000.pdf C0 Controls and Basic Latin]
Line 265 ⟶ 271:
* [http://www.w3.org/People/cmsmcq/2007/C1.xml ''De litteris regentibus C1 quaestiones septem'' or ''Are C1 characters legal in XHTML 1.0?'']
* [http://www.w3.org/International/questions/qa-controls W3C I18N FAQ: HTML, XHTML, XML and Control Codes]
* [https://itscj.ipsj.or.jp/english/vbcqpr00000004qn-att/ISO-IR.pdf International register of coded character sets to be used with escape sequences] {{Webarchive|url=https://web.archive.org/web/20230512232742/https://itscj.ipsj.or.jp/english/vbcqpr00000004qn-att/ISO-IR.pdf |date=2023-05-12 }}
 
{{character encoding}}