The Basic Latin (or C0 Controls and Basic Latin) Unicode block is the first block of the Unicode standard, and the only block whose characters are each encoded in a single byte in UTF-8. The block contains all 128 characters of ASCII, the United States national variant of ISO/IEC 646: its control codes as well as its letters, digits, punctuation, and symbols.
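The one-byte property can be checked with Python's built-in UTF-8 codec. This is a small sketch, not part of any standard tooling: it confirms that every code point from U+0000 to U+007F encodes to a single byte equal to the code point itself, and that U+0080, the first code point after the block, already needs two bytes. The helper name `utf8_length` is illustrative only.

```python
# Check that every Basic Latin code point (U+0000..U+007F) encodes to one UTF-8 byte.
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses for a single code point."""
    return len(chr(code_point).encode("utf-8"))

basic_latin = range(0x00, 0x80)

# In this block, each code point is encoded as one byte whose value equals the code point.
all_single_byte = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in basic_latin)

print(all_single_byte)    # True
print(utf8_length(0x7F))  # 1 (DEL, the last code point of the block)
print(utf8_length(0x80))  # 2 (the first code point after the block)
```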
C0 Controls and Basic Latin | |
---|---|
Range | U+0000..U+007F (128 code points) |
Plane | BMP |
Scripts | Latin (52 characters), Common (76 characters) |
Major alphabets | English, French, Spanish, German, Vietnamese |
Symbol sets | Arabic numerals, Punctuation |
Assigned | 128 code points (33 control or format characters) |
Unused | 0 reserved code points |
Source standards | ISO/IEC 8859, ISO/IEC 646 |
Unicode version history | |
1.0.0 (1991) | 128 (+128) |
Note: [1][2] |
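The figures in the Assigned row can be cross-checked against the Unicode Character Database bundled with Python's standard `unicodedata` module. This sketch assumes that "Control or Format" corresponds to the general categories Cc and Cf; in this block only Cc actually occurs. The database version depends on the Python build, but the block itself has not changed since Unicode 1.0.0.

```python
import unicodedata
from collections import Counter

# Tally the Unicode general category of every code point in U+0000..U+007F.
categories = Counter(unicodedata.category(chr(cp)) for cp in range(0x80))

# Counter returns 0 for categories that never occur (here, Cf).
controls = categories["Cc"] + categories["Cf"]
graphic = sum(categories.values()) - controls

print(controls, graphic)                                       # 33 95
print(categories["Lu"], categories["Ll"], categories["Nd"])    # 26 26 10
```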
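The character U+005C (\) may be displayed as a yen sign (¥) in some Japanese fonts or as a won sign (₩) in some Korean fonts. These fonts draw U+005C, even in Unicode text such as UTF-8, with the glyph used by legacy national character sets that replaced the backslash with these currency signs.[3]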
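Because the substitution happens at the font level, the stored text is unaffected. The sketch below, using only Python's standard `unicodedata` module, shows that the byte 0x5C in UTF-8 always decodes to U+005C, while the yen and won signs are distinct characters outside this block that need multiple bytes in UTF-8.

```python
import unicodedata

# The byte 0x5C in UTF-8 text always decodes to U+005C,
# whatever glyph a particular font chooses to draw for it.
backslash = b"\x5c".decode("utf-8")
print(f"U+{ord(backslash):04X}", unicodedata.name(backslash))  # U+005C REVERSE SOLIDUS

# The yen and won signs are separate code points outside Basic Latin,
# so UTF-8 needs more than one byte for each of them.
for sign in ("\u00a5", "\u20a9"):
    print(f"U+{ord(sign):04X}", unicodedata.name(sign), sign.encode("utf-8"))
```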
The Basic Latin block has been included in its present form since version 1.0.0 of the Unicode Standard, without any addition to or alteration of its character repertoire.[2]
The following table shows the contents of the block:
Code | Result | Description | Acronym |
---|---|---|---|
C0 controls | |||
U+0000 | | Null character | NUL |
U+0001 | | Start of Heading | SOH |
U+0002 | | Start of Text | STX |
U+0003 | | End-of-text character | ETX |
U+0004 | | End-of-transmission character | EOT |
U+0005 | | Enquiry character | ENQ |
U+0006 | | Acknowledge character | ACK |
U+0007 | | Bell character | BEL |
U+0008 | | Backspace | BS |
U+0009 | | Horizontal tab | HT |
U+000A | | Line feed | LF |
U+000B | | Vertical tab | VT |
U+000C | | Form feed | FF |
U+000D | | Carriage return | CR |
U+000E | | Shift Out | SO |
U+000F | | Shift In | SI |
U+0010 | | Data Link Escape | DLE |
U+0011 | | Device Control 1 | DC1 |
U+0012 | | Device Control 2 | DC2 |
U+0013 | | Device Control 3 | DC3 |
U+0014 | | Device Control 4 | DC4 |
U+0015 | | Negative-acknowledge character | NAK |
U+0016 | | Synchronous Idle | SYN |
U+0017 | | End of Transmission Block | ETB |
U+0018 | | Cancel character | CAN |
U+0019 | | End of Medium | EM |
U+001A | | Substitute character | SUB |
U+001B | | Escape character | ESC |
U+001C | | File Separator | FS |
U+001D | | Group Separator | GS |
U+001E | | Record Separator | RS |
U+001F | | Unit Separator | US |
ASCII Punctuation and Symbols | |||
U+0020 | | Space | SP |
U+0021 | ! | Exclamation mark | |
U+0022 | " | Quotation mark | |
U+0023 | # | Number sign | |
U+0024 | $ | Dollar sign | |
U+0025 | % | Percent sign | |
U+0026 | & | Ampersand | |
U+0027 | ' | Apostrophe | |
U+0028 | ( | Left parenthesis | |
U+0029 | ) | Right parenthesis | |
U+002A | * | Asterisk | |
U+002B | + | Plus sign | |
U+002C | , | Comma | |
U+002D | - | Hyphen-minus | |
U+002E | . | Full stop | |
U+002F | / | Slash | |
ASCII Digits | |||
U+0030 | 0 | Digit Zero | |
U+0031 | 1 | Digit One | |
U+0032 | 2 | Digit Two | |
U+0033 | 3 | Digit Three | |
U+0034 | 4 | Digit Four | |
U+0035 | 5 | Digit Five | |
U+0036 | 6 | Digit Six | |
U+0037 | 7 | Digit Seven | |
U+0038 | 8 | Digit Eight | |
U+0039 | 9 | Digit Nine | |
ASCII Punctuation and Symbols | |||
U+003A | : | Colon | |
U+003B | ; | Semicolon | |
U+003C | < | Less-than sign | |
U+003D | = | Equal sign | |
U+003E | > | Greater-than sign | |
U+003F | ? | Question mark | |
U+0040 | @ | At sign | |
Uppercase Latin Alphabet | |||
U+0041 | A | Latin Capital Letter A | |
U+0042 | B | Latin Capital Letter B | |
U+0043 | C | Latin Capital Letter C | |
U+0044 | D | Latin Capital Letter D | |
U+0045 | E | Latin Capital Letter E | |
U+0046 | F | Latin Capital Letter F | |
U+0047 | G | Latin Capital Letter G | |
U+0048 | H | Latin Capital Letter H | |
U+0049 | I | Latin Capital Letter I | |
U+004A | J | Latin Capital Letter J | |
U+004B | K | Latin Capital Letter K | |
U+004C | L | Latin Capital Letter L | |
U+004D | M | Latin Capital Letter M | |
U+004E | N | Latin Capital Letter N | |
U+004F | O | Latin Capital Letter O | |
U+0050 | P | Latin Capital Letter P | |
U+0051 | Q | Latin Capital Letter Q | |
U+0052 | R | Latin Capital Letter R | |
U+0053 | S | Latin Capital Letter S | |
U+0054 | T | Latin Capital Letter T | |
U+0055 | U | Latin Capital Letter U | |
U+0056 | V | Latin Capital Letter V | |
U+0057 | W | Latin Capital Letter W | |
U+0058 | X | Latin Capital Letter X | |
U+0059 | Y | Latin Capital Letter Y | |
U+005A | Z | Latin Capital Letter Z | |
ASCII Punctuation and Symbols | |||
U+005B | [ | Left Square Bracket | |
U+005C | \ | Backslash | |
U+005D | ] | Right Square Bracket | |
U+005E | ^ | Circumflex accent | |
U+005F | _ | Low line | |
U+0060 | ` | Grave accent | |
Lowercase Latin Alphabet | |||
U+0061 | a | Latin Small Letter A | |
U+0062 | b | Latin Small Letter B | |
U+0063 | c | Latin Small Letter C | |
U+0064 | d | Latin Small Letter D | |
U+0065 | e | Latin Small Letter E | |
U+0066 | f | Latin Small Letter F | |
U+0067 | g | Latin Small Letter G | |
U+0068 | h | Latin Small Letter H | |
U+0069 | i | Latin Small Letter I | |
U+006A | j | Latin Small Letter J | |
U+006B | k | Latin Small Letter K | |
U+006C | l | Latin Small Letter L | |
U+006D | m | Latin Small Letter M | |
U+006E | n | Latin Small Letter N | |
U+006F | o | Latin Small Letter O | |
U+0070 | p | Latin Small Letter P | |
U+0071 | q | Latin Small Letter Q | |
U+0072 | r | Latin Small Letter R | |
U+0073 | s | Latin Small Letter S | |
U+0074 | t | Latin Small Letter T | |
U+0075 | u | Latin Small Letter U | |
U+0076 | v | Latin Small Letter V | |
U+0077 | w | Latin Small Letter W | |
U+0078 | x | Latin Small Letter X | |
U+0079 | y | Latin Small Letter Y | |
U+007A | z | Latin Small Letter Z | |
ASCII Punctuation and Symbols | |||
U+007B | { | Left Curly Bracket | |
U+007C | | | Vertical bar | |
U+007D | } | Right Curly Bracket | |
U+007E | ~ | Tilde | |
Control Character | |||
U+007F | | Delete | DEL |
Subheadings
The code chart for the C0 Controls and Basic Latin block groups its characters under six subheadings.[4]
C0 Controls
The C0 controls, referred to as "C0 ASCII control codes" in version 1.0, are inherited from ASCII and other 7- and 8-bit encoding schemes. Their alias names are taken from the ISO/IEC 6429:1992 standard.[4]
ASCII Punctuation and Symbols
This subheading covers the space character, standard punctuation marks, simple mathematical operators, and symbols such as the dollar sign, percent sign, ampersand, underscore (low line), and vertical bar (pipe).[4]
ASCII Digits
The ASCII Digits subheading contains the ten standard European digits, 0 through 9.[4]
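As the table shows, the digits occupy the contiguous range U+0030 to U+0039 in numeric order, so a digit's numeric value is simply its offset from U+0030. The helper `digit_value` below is a hypothetical illustration of that layout, restricted to ASCII digits.

```python
# Digits U+0030..U+0039 are contiguous and in numeric order,
# so a digit's value is its offset from U+0030 ('0').
def digit_value(ch: str) -> int:
    """Return the numeric value of a single ASCII digit character."""
    if len(ch) != 1 or not "0" <= ch <= "9":
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return ord(ch) - ord("0")

print([digit_value(c) for c in "2013"])  # [2, 0, 1, 3]
```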
Uppercase Latin Alphabet
The Uppercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in majuscule (uppercase) form.[4]
Lowercase Latin Alphabet
The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in minuscule (lowercase) form.[4]
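The two alphabets occupy parallel ranges: each capital letter (U+0041 to U+005A) sits exactly 0x20 below its lowercase counterpart (U+0061 to U+007A). The sketch below uses that offset for an ASCII-only lowercase conversion; `ascii_lower` is a hypothetical name, and the function deliberately leaves every character outside A to Z unchanged.

```python
# Each uppercase letter U+0041..U+005A sits exactly 0x20 below its lowercase
# counterpart U+0061..U+007A, so ASCII-only case conversion is one addition.
def ascii_lower(ch: str) -> str:
    cp = ord(ch)
    if 0x41 <= cp <= 0x5A:   # 'A'..'Z'
        return chr(cp + 0x20)
    return ch                 # everything else is left unchanged

print("".join(ascii_lower(c) for c in "Basic Latin"))  # basic latin

# Cross-check the offset against Python's own case mapping for all 26 letters.
assert all(chr(cp).lower() == chr(cp + 0x20) for cp in range(0x41, 0x5B))
```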
Control Character
The Control Character subheading contains the "Delete" character.[4]
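The six subheadings partition the block into contiguous ranges, with the punctuation and symbols group split across four of them. The sketch below encodes those ranges, read off the table of characters, in a hypothetical `subheading` helper; the name and data layout are illustrative and not part of any Unicode API.

```python
# Hypothetical helper: map a Basic Latin code point to the code-chart
# subheading it falls under, using the ranges listed in the character table.
SUBHEADINGS = [
    (0x00, 0x1F, "C0 Controls"),
    (0x20, 0x2F, "ASCII Punctuation and Symbols"),
    (0x30, 0x39, "ASCII Digits"),
    (0x3A, 0x40, "ASCII Punctuation and Symbols"),
    (0x41, 0x5A, "Uppercase Latin Alphabet"),
    (0x5B, 0x60, "ASCII Punctuation and Symbols"),
    (0x61, 0x7A, "Lowercase Latin Alphabet"),
    (0x7B, 0x7E, "ASCII Punctuation and Symbols"),
    (0x7F, 0x7F, "Control Character"),
]

def subheading(cp: int) -> str:
    """Return the subheading for a code point in U+0000..U+007F."""
    for low, high, name in SUBHEADINGS:
        if low <= cp <= high:
            return name
    raise ValueError(f"U+{cp:04X} is outside the Basic Latin block")

print(subheading(ord("@")))                        # ASCII Punctuation and Symbols
print(len({name for _, _, name in SUBHEADINGS}))  # 6
```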
References
- ^ "Unicode character database". The Unicode Standard. www.unicode.org. Retrieved 22 March 2013.
- ^ a b The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- ^ Sorting it all Out: "When is a backslash not a backslash?"
- ^ a b c d e f g "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 1 April 2013.