Halfwidth and Fullwidth Forms (Unicode block)

|1_1 = 7
|3_2 = 2
|note = <ref>{{cite web|url=https://www.unicode.org/versions/Unicode1.0.0/Notice.pdf|title=Unicode 1.0.1 Addendum|work=The Unicode Standard|date=1992-11-03|access-date=2016-07-09}}</ref><ref>{{cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|access-date=2023-07-26}}</ref><ref>{{cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|access-date=2023-07-26}}</ref>
}}
 
'''Halfwidth and Fullwidth Forms''' is a [[Unicode block]] at U+FF00&ndash;FFEF containing Latin, [[katakana]], and [[Hangul]] jamo characters. It is provided so that older East Asian encodings containing both [[Halfwidth and fullwidth forms|halfwidth and fullwidth]] characters can be converted to and from Unicode without loss. It is the second-to-last block of the [[Basic Multilingual Plane]], followed only by the short [[Specials (Unicode block)|Specials]] block at U+FFF0&ndash;FFFF. Its block name in Unicode 1.0 was '''Halfwidth and Fullwidth Variants'''.<ref>{{cite web |url=https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf |work=The Unicode Standard |version=version 1.0 |title=3.8: Block-by-Block Charts |publisher=[[Unicode Consortium]]}}</ref>
 
Range U+FF01&ndash;FF5E reproduces the characters of [[ASCII]] 21 to 7E as fullwidth forms. U+FF00 does not correspond to a fullwidth ASCII 20 (space character), since that role is already fulfilled by U+3000 "[[ideographic space]]".
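Because the fullwidth range mirrors ASCII 21 to 7E in order, the two are separated by a constant offset (U+FF01 minus U+0021, i.e. 0xFEE0). The following Python sketch illustrates that arithmetic; the function names <code>to_fullwidth</code> and <code>to_ascii</code> are hypothetical and not part of any standard library, and the space is mapped to U+3000 as described above.

```python
# Offset between ASCII 0x21..0x7E and fullwidth U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # 0xFEE0
IDEOGRAPHIC_SPACE = "\u3000"      # plays the role of a fullwidth space


def to_fullwidth(text):
    """Map printable ASCII to fullwidth forms; leave other characters alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == " ":
            out.append(IDEOGRAPHIC_SPACE)
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def to_ascii(text):
    """Inverse of to_fullwidth for U+FF01..U+FF5E and U+3000."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == IDEOGRAPHIC_SPACE:
            out.append(" ")
        elif 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

For example, <code>to_fullwidth("A1!")</code> yields the code points U+FF21, U+FF11, and U+FF01, and applying <code>to_ascii</code> to the result restores the original string.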
 
Range U+FF61&ndash;FF9F encodes halfwidth forms of [[katakana]] and related punctuation in a transposition of A1 to DF in the [[JIS X 0201]] encoding – see [[half-width kana]].
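The transposition is likewise a constant offset: byte A1 corresponds to U+FF61 and byte DF to U+FF9F, a difference of 0xFEC0. A minimal Python sketch of this correspondence follows; the helper names are hypothetical, and the code covers only the A1 to DF katakana range of JIS X 0201, not the rest of that encoding.

```python
# Offset between JIS X 0201 bytes 0xA1..0xDF and U+FF61..U+FF9F.
HALFWIDTH_KANA_OFFSET = 0xFF61 - 0xA1  # 0xFEC0


def jisx0201_kana_to_unicode(byte):
    """Return the halfwidth form for a JIS X 0201 byte in 0xA1..0xDF."""
    if not 0xA1 <= byte <= 0xDF:
        raise ValueError("byte is outside the JIS X 0201 katakana range")
    return chr(byte + HALFWIDTH_KANA_OFFSET)


def unicode_to_jisx0201_kana(char):
    """Return the JIS X 0201 byte for a character in U+FF61..U+FF9F."""
    cp = ord(char)
    if not 0xFF61 <= cp <= 0xFF9F:
        raise ValueError("character is outside U+FF61..U+FF9F")
    return cp - HALFWIDTH_KANA_OFFSET
```

Under this mapping, byte B1 becomes U+FF71 (halfwidth katakana letter A), and every byte in the range converts back to itself.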
 
The range U+FFA0&ndash;FFDC encodes halfwidth forms of [[Hangul Compatibility Jamo|compatibility jamo]] characters for [[Hangul]], in a transposition of their [[KS C 5601#1974|1974 standard]] layout. It is used in the mapping of some IBM encodings for Korean, such as IBM code page 933, which allows the use of the [[Shift Out and Shift In characters]] to shift to a double-byte character set.<ref name="ibm933">{{cite web|url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-933|title=ICU Demonstration - Converter Explorer|website=demo.icu-project.org|access-date=7 May 2018}}</ref> Since the double-byte character set could contain compatibility jamo, halfwidth variants are needed to provide round-trip compatibility.<ref name=hwfwblame>{{Cite web|url=https://harjit.moe/hwfwblame.html|title=Halfwidth and Fullwidth blame}}</ref><ref>{{Cite web|url=http://userguide.icu-project.org/conversion/data|title=Conversion Data - Old location of the ICU User Guide}}</ref>
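The link between a halfwidth jamo and its compatibility jamo counterpart is recorded in the Unicode Character Database as a <code>&lt;narrow&gt;</code> compatibility decomposition. The sketch below reads that mapping with Python's standard <code>unicodedata</code> module; the function name <code>narrow_counterpart</code> is hypothetical, and the result depends on the Unicode data bundled with the interpreter.

```python
import unicodedata


def narrow_counterpart(char):
    """Return the character that a halfwidth form decomposes to, or None.

    Relies on the <narrow> compatibility decomposition stored in the
    Unicode Character Database shipped with Python.
    """
    fields = unicodedata.decomposition(char).split()
    if len(fields) == 2 and fields[0] == "<narrow>":
        return chr(int(fields[1], 16))
    return None
```

For instance, U+FFA1 (halfwidth Hangul letter kiyeok) maps to U+3131 in the Hangul Compatibility Jamo block, while characters without a <code>&lt;narrow&gt;</code> decomposition return <code>None</code>.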
 
Range U+FFE0&ndash;FFEE includes fullwidth and halfwidth symbols.
 
==Block==
{{Unicode chart Halfwidth and Fullwidth Forms}}
 
The block has [[Variant form (Unicode)|variation sequences]] defined for East Asian punctuation positional variants.<ref>{{cite web|url=https://www.unicode.org/L2/L2017/17436r-sv-eastsian-punct.pdf|title=L2/17-436: Proposal to add standardized variation sequences for fullwidth East Asian punctuation|date=2018-01-21|first=Ken|last=Lunde}}</ref><ref name="stdvar">{{cite web|url=https://www.unicode.org/Public/UNIDATA/StandardizedVariants.txt|title=Unicode Character Database: Standardized Variation Sequences | publisher=The Unicode Consortium}}</ref> They use {{sc2|U+FE00 VARIATION SELECTOR-1}} (VS01) and {{sc2|U+FE01 VARIATION SELECTOR-2}} (VS02):
{| class="wikitable nounderlines" style="border-collapse:collapse;background:#FFFFFF;font-size:large;text-align:center"
|+style="font-size:small" | Variation sequences for punctuation alignment
|-style="background:#F8F8F8;font-size:small"
| style="text-align:right" | U+ || FF01 || FF0C || FF0E || FF1A || FF1B || FF1F || style="background:#F8F8F8;font-size:small;text-align:left" | Description
|-
| style="background:#F8F8F8;font-size:small;text-align:left" | base&nbsp;code&nbsp;point || ！ || ， || ． || ： || ； || ？ || style="font-size:small;text-align:left" |
|-
| style="background:#F8F8F8;font-size:small;text-align:left" | base + VS01 || ！&#xfe00; || ，&#xfe00; || ．&#xfe00; || ：&#xfe00; || ；&#xfe00; || ？&#xfe00; || style="font-size:small;text-align:left" | corner-justified form
|-
| style="background:#F8F8F8;font-size:small;text-align:left" | base + VS02 || ！&#xfe01; || ，&#xfe01; || ．&#xfe01; || ：&#xfe01; || ；&#xfe01; || ？&#xfe01; || style="font-size:small;text-align:left" | centered form
|}
 
An additional variation sequence is defined for a fullwidth [[slashed zero|zero with a short diagonal stroke]]: {{sc2|U+FF10 FULLWIDTH DIGIT ZERO}} followed by {{sc2|U+FE00 VARIATION SELECTOR-1}} (VS01) ({{not a typo|０&#xfe00;}}).<ref>{{cite web|url=https://www.unicode.org/L2/L2015/15268-slashed-zero.pdf|title=L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set|date=2015-10-30|first1=Barbara|last1=Beeton|first2=Asmus|last2=Freytag|first3=Laurențiu|last3=Iancu|first4=Murray|last4=Sargent}}</ref><ref name="stdvar"/>
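A variation sequence is simply the base character followed by the variation selector. As an illustration only, the Python sketch below assembles the sequences described in this section; the names <code>PUNCTUATION_BASES</code> and <code>build_variation_sequences</code> are hypothetical, and whether a sequence renders differently depends on font support.

```python
VS1 = "\ufe00"  # VARIATION SELECTOR-1
VS2 = "\ufe01"  # VARIATION SELECTOR-2

# Fullwidth punctuation with positional variants, as listed in the table above.
PUNCTUATION_BASES = [0xFF01, 0xFF0C, 0xFF0E, 0xFF1A, 0xFF1B, 0xFF1F]
FULLWIDTH_DIGIT_ZERO = 0xFF10


def build_variation_sequences():
    """Return a dict mapping (code point, description) to the sequence string."""
    sequences = {}
    for cp in PUNCTUATION_BASES:
        sequences[(cp, "corner-justified form")] = chr(cp) + VS1
        sequences[(cp, "centered form")] = chr(cp) + VS2
    # The slashed-zero variant uses VS1 only.
    sequences[(FULLWIDTH_DIGIT_ZERO, "short diagonal stroke")] = (
        chr(FULLWIDTH_DIGIT_ZERO) + VS1
    )
    return sequences
```

This yields thirteen sequences for the block: two for each of the six punctuation marks, plus one for the fullwidth zero.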
 
==History==
The following Unicode-related documents record the purpose and process of defining specific characters in the Halfwidth and Fullwidth Forms block:
 
{{sticky header}}
{| class="wikitable collapsible sticky-header"
|-
! [[Unicode#Versions|Version]] !! {{nobr|Final code points<ref group=lower-alpha name=final/>}} !! Count !! [[International Committee for Information Technology Standards|L2]]&nbsp;ID !! [[ISO/IEC JTC 1/SC 2|WG2]]&nbsp;ID !! Document
|-
| rowspan="9" | 1.0.0 || rowspan="9" width="180" | U+FF01..FF5E, FF61..FFBE, FFC2..FFC7, FFCA..FFCF, FFD2..FFD7, FFDA..FFDC, FFE0..FFE6 || rowspan="9" | 216 || || || (to be determined)
|-
| || {{nobr|[https://www.unicode.org/wg2/docs/n4403.pdf N4403 (pdf],}} [https://www.unicode.org/wg2/docs/n4403.doc doc]) || {{Citation|title=Unconfirmed minutes of WG 2 meeting 61, Holiday Inn, Vilnius, Lithuania; 2013-06-10/14|date=2014-01-28|first=V. S.|last=Umamaheswaran|section=Resolution M61.01}}
|-
| {{nobr|[https://www.unicode.org/L2/L2017/17056-sv-western-vs-eastasian.pdf L2/17-056]}} || || {{Citation|title=Proposal to add standardized variation sequences|date=2017-02-13|first=Ken|last=Lunde|author-link=Ken Lunde}}
|-
| {{nobr|[https://www.unicode.org/L2/L2017/17436r-sv-eastsian-punct.pdf L2/17-436]}} || || {{Citation|title=Proposal to add standardized variation sequences for fullwidth East Asian punctuation|date=2018-01-21|first=Ken|last=Lunde}}
|-
| {{nobr|[https://www.unicode.org/L2/L2018/18039-script-adhoc-rec.pdf L2/18-039]}} || || {{Citation|title=Recommendations to UTC #154 January 2018 on Script Proposals|date=2018-01-19|first1=Deborah|last1=Anderson|first2=Ken|last2=Whistler|first3=Roozbeh|last3=Pournader|first4=Lisa|last4=Moore|first5=Hai|last5=Liang|first6=Richard|last6=Cook|section=24. Fullwidth East Asian Punctuation}}
|-
| {{nobr|[https://www.unicode.org/L2/L2017/17362.htm L2/17-362]}} || || {{Citation|title=UTC #153 Minutes|date=2018-02-02|first=Lisa|last=Moore|section=B.4.1 New Proposal to add standardized variation sequence for U+FF10 FULL WIDTH DIGIT ZERO}}
|-
| {{nobr|[https://www.unicode.org/L2/L2018/18115.htm L2/18-115]}} || || {{Citation|title=UTC #155 Minutes|date=2018-05-09|first=Lisa|last=Moore|section=Consensus 154-C17|quote=Add 16 standardized variation sequences based on L2/17-436R, for Unicode 12.0.}}
|-
| {{nobr|[https://www.unicode.org/L2/L2019/19055-segment-fullwd-digits.txt L2/19-055]}} || || {{Citation|title=Proposed Changes in the Segmentation Property Values for Fullwidth Digits|date=2019-01-14|first=Laurențiu|last=Iancu}}
|-
| {{nobr|[https://www.unicode.org/L2/L2019/19008.htm L2/19-008]}} || || {{Citation|title=UTC #158 Minutes|date=2019-02-08|first=Lisa|last=Moore|section=B.11.11.1.2 Proposed changes in the segmentation property values for fullwidth digits}}
|-
| 1.1 || U+FFE8..FFEE || 7 || || || (to be determined)
|-
| rowspan="11" | 3.2 || rowspan="11" | U+FF5F..FF60 || rowspan="11" | 2 || {{nobr|[https://www.unicode.org/L2/L1999/99052.htm L2/99-052]}} || || {{Citation|title=The math pieces from the symbol font|date=1999-02-05|first=Asmus|last=Freytag}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01033-addbrackets.htm L2/01-033]}} || || {{Citation|title=Disunify braces/brackets for math, computing science, and Z notation from similar-looking CJK braces/brackets|date=2001-01-16|first1=Kent|last1=Karlsson|first2=Asmus|last2=Freytag}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01159-N2344-MathAdHoc.pdf L2/01-159]}} || [https://www.unicode.org/wg2/docs/n2344.pdf N2344] || {{Citation|title=Ad-hoc report on Mathematical Symbols|date=2001-04-03}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01157-N2345R-brackets.pdf L2/01-157]}} || [https://www.unicode.org/wg2/docs/n2345r.pdf N2345R] || {{Citation|title=Proposal to disunify certain fencing CJK punctuation marks from similar-looking Math fences|date=2001-04-04|first=Kent|last=Karlsson}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01168-hell.txt L2/01-168]}} || || {{Citation|title=Bracket Disunification & Normalization Hell|date=2001-04-10|first=Ken|last=Whistler}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01012.htm L2/01-012R]}} || || {{Citation|title=Minutes UTC #86 in Mountain View, Jan 2001|date=2001-05-21|first=Lisa|last=Moore|section=Disunifying Braces and Brackets}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01223.htm L2/01-223]}} || || {{Citation|title=Discussion of Issues Regarding Bracket Disunification|date=2001-05-23|first=Michel|last=Suignard}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01184.htm L2/01-184R]}} || || {{Citation|title=Minutes from the UTC/L2 meeting|date=2001-06-18|first=Lisa|last=Moore|section=Motion 87-M21|quote=Reverse the decision made in motion 86-M6 not to disunify brackets.}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01317-bracket.htm L2/01-317]}} || || {{Citation|title=Bracket Disunification & Normalization|date=2001-08-14|first=Michel|last=Suignard}}
|-
| {{nobr|[https://www.unicode.org/consortium/utc-minutes/UTC-088-200108.html L2/01-295R]}} || || {{Citation|title=Minutes from the UTC/L2 meeting #88|date=2001-11-06|first=Lisa|last=Moore|section=Bracket Disunification and Normalization}}
|-
| {{nobr|[https://www.unicode.org/L2/L2002/02154-n2403-minutes.pdf L2/02-154]}} || [https://www.unicode.org/wg2/docs/n2403.pdf N2403] || {{Citation|title=Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19|date=2002-04-22|first=V. S.|last=Umamaheswaran|section=Resolution M41.1}}
|- class="sortbottom"
| colspan="6" | {{reflist|group=lower-alpha|refs=<ref name=final>Proposed code points and character names may differ from final code points and names.</ref>}}
|}
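The counts in the table can be checked directly from the listed ranges. The following Python sketch does so; the dictionary name <code>RANGES_BY_VERSION</code> and the helper <code>count_code_points</code> are hypothetical, and the ranges are copied from the "Final code points" column above.

```python
# Inclusive code point ranges, taken from the "Final code points" column.
RANGES_BY_VERSION = {
    "1.0.0": [
        (0xFF01, 0xFF5E), (0xFF61, 0xFFBE), (0xFFC2, 0xFFC7),
        (0xFFCA, 0xFFCF), (0xFFD2, 0xFFD7), (0xFFDA, 0xFFDC),
        (0xFFE0, 0xFFE6),
    ],
    "1.1": [(0xFFE8, 0xFFEE)],
    "3.2": [(0xFF5F, 0xFF60)],
}


def count_code_points(ranges):
    """Count the code points covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


counts = {
    version: count_code_points(ranges)
    for version, ranges in RANGES_BY_VERSION.items()
}
total = sum(counts.values())
```

The computed values are 216, 7, and 2 for versions 1.0.0, 1.1, and 3.2 respectively, for a total of 225 assigned code points.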
 
== See also ==
 
* [[CJK Symbols and Punctuation (Unicode block)]]
* [[Hangul Jamo (Unicode block)]]
* [[Katakana (Unicode block)]]
* [[Latin script in Unicode]]
* [[Enclosed Alphanumerics]]: bullet-point sequences, some of which appear as fullwidth (e.g. ⒈, ⓵, ⑴, ⒜, ⓐ)
 
== References ==
{{Reflist}}
 
{{Unicode navigation}}
{{writingsystem-stub}}
 
[[Category:Unicode blocks]]
[[Category:Latin-script Unicode blocks]]
[[Category:Kana]]
[[Category:Hangul jamo|*Halfwidth]]