Basic Latin (Unicode block): Difference between revisions

Undid revision 1279543928 by 2600:6C5D:57F0:1F0:7509:4501:D19C:1EFB (talk) revert vandalism
 
(338 intermediate revisions by 79 users not shown)
Line 1:
{{Infobox Unicode block
|blockname = Basic Latin<br/>{{nobold|1=''or''}}<br/>C0 Controls and Basic Latin
The '''Basic Latin''' Unicode block (full name: '''C0 Controls and Basic Latin''') is the first block of the [[Unicode]] standard, and the only block whose characters are each encoded in one byte in [[UTF-8]]. The block contains all the letters and control codes of the [[ASCII]] encoding, which is a [[United States]] national variant of [[ISO/IEC 646]].
|rangestart = 0000
|rangeend = 007F
|script1 = {{nowrap|[[Latin script|Latin]] (52 characters)}}
|script2 = {{nowrap|[[Script (Unicode)#Special script property values|Common]] (76 characters)}}
|symbols = [[Arabic numerals]]<br />[[Punctuation]]
|alphabets = [[English language|English]]<br />[[French language|French]]<br />[[German language|German]]<br />[[Spanish language|Spanish]]<br />[[Vietnamese language|Vietnamese]]
|1_0_0 = 128
|controls = 33
|sources = [[ISO/IEC 8859]], [[ISO 646]]
|note = <ref>{{cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|accessdate=2023-07-26}}</ref><ref>{{cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|accessdate=2023-07-26}}</ref>
}}
 
The '''Basic Latin''' [[Unicode block]],<ref>{{cite web|url=https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt|title=Blocks.txt|accessdate=2023-03-23|publisher=The Unicode Consortium}}</ref> sometimes informally called '''C0 Controls and Basic Latin''',<ref>{{cite web|url=https://www.unicode.org/charts/PDF/U0000.pdf|title=C0 Controls and Basic Latin|work=The Unicode Standard, Version 15.0|publisher=[[Unicode Consortium|Unicode, Inc.]]|year=2022|access-date=March 22, 2023}}</ref> is the first block of the [[Unicode]] standard, and the only block whose characters are each encoded in one byte in [[UTF-8]]. The block contains all the [[ISO basic Latin alphabet|letters]] and [[ASCII control character|control codes]] of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters and includes the [[C0 controls]], ASCII [[punctuation]] and [[symbol]]s, [[ASCII]] [[numerical digit|digits]], both the [[uppercase]] and [[lowercase]] letters of the [[English alphabet]], and the Delete [[control character]].
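The one-byte property described above can be checked with a short Python sketch. The helper names <code>utf8_length</code> and <code>in_basic_latin</code> are illustrative assumptions and are not part of any standard library; only the range U+0000 to U+007F is taken from the text.

```python
# Range of the Basic Latin block, as stated in the article text.
BLOCK_START = 0x0000
BLOCK_END = 0x007F


def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses for a single code point."""
    return len(chr(code_point).encode("utf-8"))


def in_basic_latin(ch: str) -> bool:
    """Return True if the single character ch lies in the Basic Latin block."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


# Number of code points in the block.
block_size = BLOCK_END - BLOCK_START + 1

# Every code point in the block should need exactly one UTF-8 byte.
all_one_byte = all(
    utf8_length(cp) == 1 for cp in range(BLOCK_START, BLOCK_END + 1)
)

# The first code point after the block (U+0080) already needs two bytes.
first_outside = utf8_length(BLOCK_END + 1)

print(block_size, all_one_byte, first_outside)  # 128 True 2
```

The check on U+0080 illustrates why no other block shares this property: one-byte UTF-8 sequences cover exactly the 128 code points of this block.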
 
The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.<ref name=Unicode1.0>{{cite book|title=The Unicode Standard Version 1.0, Volume 1|year=1990|publisher=Addison-Wesley Publishing Company, Inc.|isbn=0-201-56788-1}}</ref> Its block name in Unicode 1.0 was '''ASCII'''.<ref>{{cite web |url=https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf |work=The Unicode Standard |version=version 1.0 |title=3.8: Block-by-Block Charts |publisher=[[Unicode Consortium]]}}</ref>
The following table shows the contents of the block:
 
==Table of characters==
{| class="wikitable collapsible"
!Code
!Result
!Description
!Acronym
|-
| colspan=4 | C0 controls
|-
| U+0000
Line 132 ⟶ 146:
| ETB
|-
| U+0018
|
| [[Cancel character]]
Line 172 ⟶ 186:
| US
|-
| colspan=4 | ASCII punctuation and symbols
|-
|U+0020
|&nbsp;
|[[Space (punctuation)|Space]]
|SP
|-
|U+0021
|!
|[[Exclamation mark]]
|
|-
|U+0022
|"
|[[Quotation mark]]
|
|-
|U+0023
|#
|[[Number sign]]
|
|-
|U+0024
Line 214 ⟶ 230:
|U+0028
|(
|[[Bracket#Parentheses ( )|Left parenthesis]]
|
|-
|U+0029
|)
|[[Bracket#Parentheses ( )|Right parenthesis]]
|
|-
Line 228 ⟶ 244:
|-
|U+002B
|<nowiki>+</nowiki>
|[[Plus sign]]
|
Line 238 ⟶ 254:
|-
|U+002D
|<nowiki>-</nowiki>
|[[Hyphen-minus]]
|
Line 244 ⟶ 260:
|U+002E
|.
|[[Full stop|Full stop '''''or''''' period]]
|
|-
|U+002F
|/
|[[Slash (punctuation)|Solidus '''''or''''' Slash]]
|
|-
| colspan=4 | ASCII digits
|-
|U+0030
Line 301 ⟶ 319:
|[[9 (number)|Digit Nine]]
|
|-
| colspan=4 | ASCII punctuation and symbols
|-
|U+003A
Line 334 ⟶ 354:
|U+0040
|@
|[[At sign|At sign '''''or''''' Commercial at]]
|
|-
| colspan=4 | Uppercase Latin alphabet
|-
|U+0041
Line 466 ⟶ 488:
|[[Z|Latin Capital letter Z]]
|
|-
| colspan=4 | ASCII punctuation and symbols
|-
|U+005B
|&#91;
|[[Bracket#Square brackets|Left Square Bracket]]
|
|-
|U+005C
|\
|[[Backslash]] {{ref label|backslash|A|A}}
|
|-
|U+005D
|&#93;
|[[Bracket#Square brackets|Right Square Bracket]]
|
|-
Line 496 ⟶ 520:
|[[Grave accent]]
|
|-
| colspan=4 | Lowercase Latin alphabet
|-
|U+0061
|a
|[[a|Latin Small Letter A]]
|
|-
|U+0062
|b
|[[b|Latin Small Letter B]]
|
|-
|U+0063
|c
|[[c|Latin Small Letter C]]
|
|-
|U+0064
|d
|[[d|Latin Small Letter D]]
|
|-
|U+0065
|e
|[[e|Latin Small Letter E]]
|
|-
|U+0066
|f
|[[f|Latin Small Letter F]]
|
|-
|U+0067
|g
|[[g|Latin Small Letter G]]
|
|-
|U+0068
|h
|[[h|Latin Small Letter H]]
|
|-
|U+0069
|i
|[[i|Latin Small Letter I]]
|
|-
|U+006A
|j
|[[j|Latin Small Letter J]]
|
|-
|U+006B
|k
|[[k|Latin Small Letter K]]
|
|-
|U+006C
|l
|[[l|Latin Small Letter L]]
|
|-
|U+006D
|m
|[[m|Latin Small Letter M]]
|
|-
|U+006E
|n
|[[n|Latin Small Letter N]]
|
|-
|U+006F
|o
|[[o|Latin Small Letter O]]
|
|-
|U+0070
|p
|[[p|Latin Small Letter P]]
|
|-
|U+0071
|q
|[[q|Latin Small Letter Q]]
|
|-
|U+0072
|r
|[[r|Latin Small Letter R]]
|
|-
|U+0073
|s
|[[s|Latin Small Letter S]]
|
|-
|U+0074
|t
|[[t|Latin Small Letter T]]
|
|-
|U+0075
|u
|[[u|Latin Small Letter U]]
|
|-
|U+0076
|v
|[[v|Latin Small Letter V]]
|
|-
|U+0077
|w
|[[w|Latin Small Letter W]]
|
|-
|U+0078
|x
|[[x|Latin Small Letter X]]
|
|-
|U+0079
|y
|[[y|Latin Small Letter Y]]
|
|-
|U+007A
|z
|[[z|Latin Small Letter Z]]
|
|-
| colspan=4 | ASCII punctuation and symbols
|-
|U+007B
|{
|[[Bracket#Curly brackets or braces .7B .7D|Left Curly Bracket]]
|
|-
|U+007C
|<nowiki>|</nowiki>
|[[Vertical bar]]
|
|-
|U+007D
|<nowiki>}</nowiki>
|[[Bracket#Curly brackets or braces .7B .7D|Right Curly Bracket]]
|
|-
Line 646 ⟶ 674:
|[[Tilde]]
|
|-
| colspan=4 | Control character
|-
| U+007F
|
| [[Delete character|Delete]]
| DEL
|}
:{{note label|backslash|A|A}} The character U+005C (\) may show up as a yen (¥) or won (₩) sign in Japanese or Korean fonts that treat Unicode text (especially [[UTF-8]]) as a legacy character set in which these signs replaced the backslash.<ref>{{Cite web
|title = When is a backslash not a backslash?
|work = Sorting it all Out
|author = Michael S. Kaplan
|publisher = Microsoft
|date = 2005-09-17
|url = http://blogs.msdn.com/b/michkap/archive/2005/09/17/469941.aspx
|url-status = dead
|archive-url = https://web.archive.org/web/20100612050134/http://blogs.msdn.com/b/michkap/archive/2005/09/17/469941.aspx
|archive-date = 2010-06-12
}} Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.html </ref>
 
==Subheadings==
The C0 Controls and Basic Latin block contains six subheadings.<ref name=charts>{{cite web|title=Unicode 6.2 code charts|url=https://www.unicode.org/Public/6.2.0/charts/CodeCharts.pdf|work=The Unicode Standard|accessdate=1 April 2013}}</ref>
 
===C0 controls===
The [[C0 and C1 control codes|C0 Controls]], referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the [[ISO/IEC 6429|ISO/IEC 6429:1992]] standard.<ref name=charts />
 
===ASCII punctuation and symbols===
This subheading refers to standard punctuation characters, simple [[Operation (mathematics)|mathematical operators]], and symbols like the dollar sign, percent, ampersand, underscore, and pipe.<ref name=charts />
 
===ASCII digits===
The ASCII Digits subheading contains the ten standard European digit characters, 0–9.<ref name=charts />
 
===Uppercase Latin alphabet===
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in [[majuscule]] (uppercase) form.<ref name=charts />
 
===Lowercase Latin alphabet===
The Lowercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in [[Lower case|minuscule]] (lowercase) form.<ref name=charts />
 
===Control character===
The Control Character subheading contains the [[Delete character|"Delete" character]].<ref name=charts />
 
==Number of symbols, letters and control codes==
The table below shows the number of [[Letter (alphabet)|letter]]s, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.
{| class="wikitable"
!Subheading!!Number of characters!!Range of characters
|-
|'''C0 controls'''||32 control codes|| U+0000 to U+001F
|-
|'''ASCII punctuation and symbols'''||33 punctuation marks and symbols||U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E
|-
|'''ASCII digits'''||10 digits||U+0030 to U+0039
|-
|'''Uppercase Latin Alphabet'''|| 26 unaccented Latin letters in the majuscule.||U+0041 to U+005A
|-
|'''Lowercase Latin Alphabet'''|| 26 unaccented Latin letters in the minuscule.||U+0061 to U+007A
|-
|'''Control character'''|| 1 control code (the "Delete" character)|| U+007F
|}
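
The counts in the table above can be reproduced by classifying each of the 128 code points by range. The following Python sketch is illustrative only: the function name <code>subheading</code> is an assumption, and the range boundaries are copied from the table.

```python
from collections import Counter


def subheading(cp: int) -> str:
    """Map a Basic Latin code point to its subheading, using the table's ranges."""
    if not 0x0000 <= cp <= 0x007F:
        raise ValueError("code point is outside the Basic Latin block")
    if cp <= 0x001F:
        return "C0 controls"
    if cp == 0x007F:
        return "Control character"
    if 0x0030 <= cp <= 0x0039:
        return "ASCII digits"
    if 0x0041 <= cp <= 0x005A:
        return "Uppercase Latin alphabet"
    if 0x0061 <= cp <= 0x007A:
        return "Lowercase Latin alphabet"
    # Everything left over falls in one of the four punctuation ranges.
    return "ASCII punctuation and symbols"


# Tally how many code points fall under each subheading.
counts = Counter(subheading(cp) for cp in range(0x80))

for name, count in counts.items():
    print(f"{name}: {count}")
```

Treating punctuation and symbols as the remainder avoids spelling out its four separate ranges; the six tallies sum to 128, the size of the block.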
 
==Chart==
{{Unicode chart C0 Controls and Basic Latin}}
 
==Variants==
Several of the characters are defined to render as a [[Variant form (Unicode)|standardized variant]] if followed by a variation selector.
 
A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0&#xfe00;).<ref>{{cite web|url=https://www.unicode.org/L2/L2015/15268-slashed-zero.pdf|title=L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set|date=2015-10-30|first1=Barbara|last1=Beeton|first2=Asmus|last2=Freytag|first3=Laurențiu|last3=Iancu|first4=Murray|last4=Sargent}}</ref><ref name="uts51"/>
 
Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create [[emoji]] variants.<ref>{{cite web|url=https://www.unicode.org/L2/L2011/11438-emoji-var.pdf|title=L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)|date=2011-12-22|first=Peter|last=Edberg}}</ref><ref>{{cite web|url=https://www.unicode.org/L2/L2015/15301-emoji-sequences.pdf|title=L2/15-301: A proposal for 278 standardized variation sequences for emoji|date=2015-11-01|first=Roozbeh|last=Pournader}}</ref><ref name="UTR51">{{Cite web|url=http://unicode.org/reports/tr51/|title=UTR #51: Unicode Emoji|publisher=Unicode Consortium|date=2023-09-05}}</ref><ref name="EmojiData">{{Cite web|url=https://unicode.org/Public/UNIDATA/emoji/emoji-data.txt|title=UCD: Emoji Data for UTR #51|publisher=Unicode Consortium|date=2023-02-01}}</ref>
They are [[keycap]] base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style".<ref name="uts51">{{Cite web|url=https://unicode.org/Public/UNIDATA/emoji/emoji-variation-sequences.txt|title=UTS #51 Emoji Variation Sequences | publisher=The Unicode Consortium}}</ref>
 
{| class="wikitable nounderlines" style="border-collapse:collapse;background:#FFFFFF;font-size:large;text-align:center"
|+style="font-size:small" | Emoji variation sequences
|-style="background:#F8F8F8;font-size:small"
| style="text-align:right" | U+ || 0023 || 002A || 0030 || 0031 || 0032 || 0033 || 0034 || 0035 || 0036 || 0037 || 0038 || 0039
|-
| style="background:#F8F8F8;font-size:small;text-align:left" | base || # || * || 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9
|-
| style="background:#F8F8F8;font-size:small;text-align:left" | base+VS15+keycap || #&#xfe0e;&#x20e3; || *&#xfe0e;&#x20e3; || 0&#xfe0e;&#x20e3; || 1&#xfe0e;&#x20e3; || 2&#xfe0e;&#x20e3; || 3&#xfe0e;&#x20e3; || 4&#xfe0e;&#x20e3; || 5&#xfe0e;&#x20e3; || 6&#xfe0e;&#x20e3; || 7&#xfe0e;&#x20e3; || 8&#xfe0e;&#x20e3; || 9&#xfe0e;&#x20e3;
|-
| style="background:#F8F8F8;font-size:small;text-align:left" | base+VS16+keycap || #&#xfe0f;&#x20e3; || *&#xfe0f;&#x20e3; || 0&#xfe0f;&#x20e3; || 1&#xfe0f;&#x20e3; || 2&#xfe0f;&#x20e3; || 3&#xfe0f;&#x20e3; || 4&#xfe0f;&#x20e3; || 5&#xfe0f;&#x20e3; || 6&#xfe0f;&#x20e3; || 7&#xfe0f;&#x20e3; || 8&#xfe0f;&#x20e3; || 9&#xfe0f;&#x20e3;
|}
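
The sequences in the table above are plain concatenations of code points, which the following Python sketch makes explicit. The function name <code>keycap</code> and the constant names are illustrative assumptions; the code points themselves are those given in this section. Whether a sequence is actually drawn as a keycap or a slashed zero depends on the font and rendering system.

```python
# Code points named in this section.
VS1 = "\uFE00"     # VARIATION SELECTOR-1
VS15 = "\uFE0E"    # VARIATION SELECTOR-15 (text presentation)
VS16 = "\uFE0F"    # VARIATION SELECTOR-16 (emoji presentation)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

# The twelve Basic Latin characters that serve as keycap bases.
KEYCAP_BASES = "#*0123456789"

# Zero with a short diagonal stroke: U+0030 followed by VS1.
SLASHED_ZERO = "0" + VS1


def keycap(base: str, emoji: bool = True) -> str:
    """Build base + VS16 + keycap (emoji) or base + VS15 + keycap (text)."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError("base must be one of the twelve keycap base characters")
    selector = VS16 if emoji else VS15
    return base + selector + KEYCAP


# Show the code points of the emoji-style number sign keycap.
sequence = keycap("#")
print(" ".join(f"U+{ord(ch):04X}" for ch in sequence))  # U+0023 U+FE0F U+20E3
```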
 
==History==
The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:
 
{{sticky header}}
{| class="wikitable sticky-header"
|-
! [[Unicode#Versions|Version]] !! {{nobr|Final code points<ref group=lower-alpha name=final/>}} !! Count !! [[Unicode Consortium|UTC]]&nbsp;ID !! [[International Committee for Information Technology Standards|L2]]&nbsp;ID !! [[ISO/IEC JTC 1/SC 2|WG2]]&nbsp;ID !! Document
|-
| rowspan="18" | 1.0.0 || rowspan="18" | U+0000..007F || rowspan="18" | 128 || || || || (to be determined)
|-
| {{nobr|[https://www.unicode.org/L2/L1999-UTC/u1999-013.htm UTC/1999-013]}} || || || {{Citation|title=Tildes and micro sign decompositions|date=1999-05-27|first=Kent|last=Karlsson}}
|-
| || {{nobr|[https://www.unicode.org/L2/L1999/99176.htm L2/99-176R]}} || || {{Citation|title=Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999|date=1999-11-04|first=Lisa|last=Moore|section=Micro Sign Case Mappings}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2004/04145-cstroke-note.pdf L2/04-145]}} || || {{Citation|title=C with stroke character examples from BAE report 1884 (Dorsey)|date=2004-04-30|first=David|last=Starner}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2004/04202-slash-c-feedback.txt L2/04-202]}} || || {{Citation|title=Slashed C Feedback|date=2004-06-07|first=Deborah|last=Anderson}}
|-
| || || [https://www.unicode.org/wg2/docs/n3046.pdf N3046] || {{Citation|title=Improving formal definition for control characters|date=2006-02-22|first=Michel|last=Suignard}}
|-
| || || {{nobr|[https://www.unicode.org/wg2/docs/n3103.pdf N3103 (pdf],}} [https://www.unicode.org/wg2/docs/n3103.doc doc]) || {{Citation|title=Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27|date=2006-08-25|first=V. S.|last=Umamaheswaran|section=M48.33}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2011/11043-modletcase.pdf L2/11-043]}} || || {{Citation|title=Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters|date=2011-02-02|first1=Asmus|last1=Freytag|first2=Kent|last2=Karlsson}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2011/11160-pri181.pdf L2/11-160]}} || || {{Citation|title=PRI #181 Changing General Category of Twelve Characters|date=2011-05-02}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2011/11261.htm L2/11-261R2]}} || || {{Citation|title=UTC #128 / L2 #225 Minutes|date=2011-08-16|first=Lisa|last=Moore|section=Consensus 128-C3|quote=Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL.}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2011/11438-emoji-var.pdf L2/11-438]<ref group=lower-alpha name=also10458/><ref group=lower-alpha name=emojidocs/>}} || [https://www.unicode.org/wg2/docs/n4182.pdf N4182] || {{Citation|title=Emoji Variation Sequences (Revision of L2/11-429)|date=2011-12-22|first=Peter|last=Edberg}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2015/15107.htm L2/15-107]}} || || {{Citation|title=UTC #143 Minutes|date=2015-05-12|first=Lisa|last=Moore|section=Consensus 143-C5|quote=Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0.}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2015/15268-slashed-zero.pdf L2/15-268]}} || || {{Citation|title=Proposal to Represent the Slashed Zero Variant of Empty Set|date=2015-10-30|first1=Barbara|last1=Beeton|first2=Asmus|last2=Freytag|first3=Laurențiu|last3=Iancu|first4=Murray|last4=Sargent}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2015/15301-emoji-sequences.pdf L2/15-301]<ref group=lower-alpha name=also15198/><ref group=lower-alpha name=emojidocs/>}} || || {{Citation|title=A proposal for 278 standardized variation sequences for emoji|date=2015-11-01|first=Roozbeh|last=Pournader}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2015/15254.htm L2/15-254]}} || || {{Citation|title=UTC #145 Minutes|date=2015-11-16|first=Lisa|last=Moore|section=B.12.1.2 Proposal to Represent the Slashed Zero Variant of Empty Set}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2017/17294-fullwidth-slashed-zero.pdf L2/17-294]}} || [https://www.unicode.org/wg2/docs/n4914-17294-fullwidth-slashed-zero.pdf N4914] || {{Citation|title=Proposal to add standardized variation sequence for U+FF10 FULLWIDTH DIGIT ZERO|date=2017-08-14|first=Ken|last=Lunde|author-link=Ken Lunde}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2022/22019-utc170-properties-recs.pdf L2/22-019]}} || || {{Citation|title=UTC #170 properties feedback & recommendations|date=2022-01-19|first1=Markus|last1=Scherer|display-authors=etal|section=F.2 F4: U+0019 in ISO vs. NameAliases.txt vs. chart/NamesList.txt}}
|-
| || {{nobr|[https://www.unicode.org/L2/L2022/22016.htm L2/22-016]}} || || {{Citation|title=UTC #170 Minutes|date=2022-04-21|first=Peter|last=Constable|section=Consensus 170-C24|quote=For U+0019, add a Name alias "EM" of type abbreviation, for Unicode version 15.0.}}
|- class="sortbottom"
| colspan="7" | {{reflist|group=lower-alpha|refs=
<ref name=final>Proposed code points and characters names may differ from final code points and names</ref>
<ref name=also10458>See also [https://www.unicode.org/L2/L2010/10458-emoji-var.pdf L2/10-458], [https://www.unicode.org/L2/L2011/11414-emoji-var-seq.pdf L2/11-414], [https://www.unicode.org/L2/L2011/11415-unified-emoji-ref.pdf L2/11-415], and [https://www.unicode.org/L2/L2011/11429-emoji-var-seq-list.pdf L2/11-429]</ref>
<ref name=emojidocs>Refer to the [[Miscellaneous Symbols and Pictographs#History|history section]] of the Miscellaneous Symbols and Pictographs block for additional emoji-related documents</ref>
<ref name=also15198>See also [https://www.unicode.org/L2/L2015/15198-varseq-text-emoji.pdf L2/15-198] and [https://www.unicode.org/L2/L2015/15275-more-var-seqs-for-text-vs-emoji.pdf L2/15-275]</ref>}}
|}
 
==See also==
{{portal|Internet|Language}}
*[[Latin script in Unicode]]
*[[Latin-1 Supplement]]
*[[Character encoding]]
*[[ISO/IEC 8859-1]]
*[[Latin script]]
*[[ISO basic Latin alphabet]]
 
==References==
Line 657 ⟶ 820:
 
==External links==
{{Spoken Wikipedia|date=2023-11-08|En-Basic Latin (Unicode block)-article.ogg}}
* [http://www.unicode.org/charts/PDF/U0000.pdf Unicode chart U0000 (pdf)]
{{sister project links|Unicode}}
 
{{Unicode navigation}}
 
{{authority control}}
[[Category:Unicode blocks]]
 
[[Category:Latin-script Unicode blocks]]
[[de:Unicode-Block Basis-Lateinisch]]