Revision as of 10:16, 21 December 2020 by HarJIT (previous revision: 07:46, 21 December 2020); edited section: Category "Cc" control codes (C0 and C1)
The [[ISO/IEC 8859]] series of encodings conforms to [[ISO/IEC 4873]] (ECMA-43) level 1, a subset of ISO/IEC 2022 designed for 8-bit character encodings, and therefore designates the range 0x80–0x9F for use by a C1 control code set such as that of ISO/IEC 6429. Unicode inherits its [[Basic Latin (Unicode block)|first]] and [[Latin-1 Supplement (Unicode block)|second]] blocks (comprising U+0000 through U+00FF) from ASCII and [[ISO/IEC 8859-1]], thus incorporating the C0 and C1 control code ranges (U+0000–U+001F and U+007F–U+009F). Category "Cc" control codes serve a variety of purposes, not limited to format effectors. For example, the default ASCII C0 set includes six format effectors ({{ctrl|BS}}, {{ctrl|HT}}, {{ctrl|LF}}, {{ctrl|VT}}, {{ctrl|FF}} and {{ctrl|CR}}), ten transmission controls, four device controls, four information separators and eight other control codes.<ref name="ir001">{{citation|mode=cs1 |author=ISO/TC 97/SC 2 |author-link=ISO/IEC JTC 1/SC 2#History |title=The set of control characters of the ISO 646 |date=1975 |publisher=ITSCJ/[[Information Processing Society of Japan|IPSJ]] |id=ISO-IR-1 |url=https://www.itscj.ipsj.or.jp/iso-ir/001.pdf}}</ref>

Most of these characters play no explicit role in Unicode text handling and are used only by higher-level protocols, such as those used by [[terminal emulator]]s. Certain characters, however, are commonly used for formatting or sentinel purposes.
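The following short Python sketch makes the functional grouping of the default ASCII C0 set concrete. It assumes the classic ISO 646 assignment: SOH to ACK, DLE and NAK to ETB as transmission controls, DC1 to DC4 as device controls, FS to US as information separators, and NUL, BEL, SO, SI, CAN, EM, SUB and ESC as the "other" codes. The group labels and the helpers `c0_group` and `in_control_ranges` are hypothetical names chosen for illustration, not part of any standard API. The final loop uses Python's standard `unicodedata.category` to confirm that, within U+0000–U+00FF, general category "Cc" covers exactly the C0 range, DEL and the C1 range.

```python
import unicodedata

# ASCII C0 control codes grouped by function, assuming the ISO 646
# control set (ISO-IR-1). Group labels are illustrative, not normative.
C0_GROUPS = {
    "format effector": {
        0x08: "BS", 0x09: "HT", 0x0A: "LF", 0x0B: "VT", 0x0C: "FF", 0x0D: "CR",
    },
    "transmission control": {
        0x01: "SOH", 0x02: "STX", 0x03: "ETX", 0x04: "EOT", 0x05: "ENQ",
        0x06: "ACK", 0x10: "DLE", 0x15: "NAK", 0x16: "SYN", 0x17: "ETB",
    },
    "device control": {0x11: "DC1", 0x12: "DC2", 0x13: "DC3", 0x14: "DC4"},
    "information separator": {0x1C: "FS", 0x1D: "GS", 0x1E: "RS", 0x1F: "US"},
    "other": {
        0x00: "NUL", 0x07: "BEL", 0x0E: "SO", 0x0F: "SI",
        0x18: "CAN", 0x19: "EM", 0x1A: "SUB", 0x1B: "ESC",
    },
}


def c0_group(code_point: int) -> str:
    """Return the functional group of an ASCII C0 control code (hypothetical helper)."""
    for group, members in C0_GROUPS.items():
        if code_point in members:
            return group
    raise ValueError(f"U+{code_point:04X} is not a C0 control code")


def in_control_ranges(code_point: int) -> bool:
    """True for C0 (U+0000-U+001F), DEL (U+007F) and C1 (U+0080-U+009F)."""
    return code_point <= 0x1F or 0x7F <= code_point <= 0x9F


if __name__ == "__main__":
    # Print the size of each functional group of the C0 set.
    for group, members in C0_GROUPS.items():
        print(f"{group}: {len(members)}")
    # Within U+0000-U+00FF, category Cc coincides with the C0, DEL and C1 ranges.
    for cp in range(0x100):
        assert (unicodedata.category(chr(cp)) == "Cc") == in_control_ranges(cp)
```

Running it prints the size of each group: 6, 10, 4, 4 and 8, for the 32 C0 codes in total. It finishes silently if the category check holds. Note that the group membership here is a sketch of the ISO 646 classification; the Unicode Standard itself does not define these groups.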
* {{unichar|0000||note=NUL: NULL}} (used in [[null-terminated string]]s)
* {{unichar|0009||note=HT: HORIZONTAL TABULATION}} (inserted by the [[tab key]])
* {{unichar|000A||note=LF: LINE FEED}} (used as a [[newline|line break]])
* {{unichar|000C||note=FF: FORM FEED}} (denotes a [[page break]] in a plain text file)
* {{unichar|000D||note=CR: CARRIAGE RETURN}} (used in some line-breaking conventions)
* {{unichar|0085||note=NEL: NEXT LINE}} (sometimes used as a line break in text transcoded from [[EBCDIC]])

Unicode specifies semantics only for U+0009–U+000D, U+001C–U+001F and U+0085. These are the ASCII format effectors except {{ctrl|BS}}, the ASCII information separators, and the C1 {{ctrl|NEL}}. The rest of the control characters are transparent to Unicode, and their meanings are left to higher-level protocols, although interpretation as defined in ISO/IEC 6429 is suggested as a default.<ref name="unicode-23-1">{{cite book |url=https://www.unicode.org/versions/Unicode12.0.0/ch23.pdf#page=3 |title=23.1: Control Codes |work=The Unicode Standard |edition=12.0.0 |date=2019 |author=Unicode Consortium |author-link=Unicode Consortium |isbn=978-1-936213-22-1 |pages=868–870}}</ref> Furthermore, certain specialised higher-level protocols, such as transcoded [[Teletext]], may assign a [[Teletext character set#Control characters|different interpretation]] to the entire C0 control code range.<ref>{{cite web |url=https://corp.unicode.org/pipermail/unicode/2020-October/009120.html |title=Teletext separated mosaic graphics |work=Unicode Mailing List Archive |last=Ewell |first=Doug |date=2020-10-16 |publisher=[[Unicode Consortium]] |quotation=I reiterate that it was UTC {{bracket|[[Unicode Technical Committee]]}} and Script Ad Hoc who provided the guidance to the group writing the [[Symbols for Legacy Computing]] proposal (and there is a second on the way) that 0x00 through 0x1F in the original teletext set should map to U+0000 through U+001F when converting to Unicode.}}</ref>

== Unicode introduced separators ==

Unicode control characters: Difference between revisions