Halfwidth and fullwidth forms: Difference between revisions

Liliana-60 (talk | contribs)
Undid revision 156347834 by 4.229.39.224 (talk)
Undid revision 1295152914 by 27.55.70.178 (talk) Unexplained removal of content
 
(192 intermediate revisions by more than 100 users not shown)
{{Short description|Alternative width characters in East Asian typography}}
'''Halfwidth and Fullwidth Forms''' is the name of [[Unicode]] block U+FF00–FFEF, the last of the [[Basic Multilingual Plane]] excepting the short "[[Unicode Specials|Specials]]" block at U+FFF0–FFFF.
{{For|the Unicode chart block|Halfwidth and Fullwidth Forms (Unicode block)}}
[[File:Command Prompt on Windows XP (Korean).png|thumb|349px|A command prompt ([[cmd.exe]]) with Korean localisation, showing halfwidth and fullwidth characters]]
In [[CJK characters|CJK]] (Chinese, Japanese, and Korean) computing, [[graphic character]]s are traditionally classed into '''fullwidth'''{{efn|In [[Taiwan]] and [[Hong Kong]]: [[wikt:全形|全形]]; in CJK: [[wikt:全角|全角]].}} and '''halfwidth'''{{efn|In [[Taiwan]] and [[Hong Kong]]: [[wikt:半形|半形]]; in CJK: [[wikt:半角|半角]].}} characters. Unlike in a [[monospaced font]], where every character has the same width, a halfwidth character occupies half the width of a fullwidth character, hence the name.
 
''[[Halfwidth and Fullwidth Forms (Unicode block)|Halfwidth and Fullwidth Forms]]'' is also the name of a [[Unicode block]] U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can have lossless translation to and from Unicode.
U+FF01–FF5E reproduce the characters of [[ASCII]] 21 to 7E (hexadecimal) as [[fullwidth forms]] ([[zenkaku]]), that is, as [[monospace]] glyphs with the same width as a fullwidth [[Kanji]]. This is useful for typesetting Latin characters in a [[CJK]] environment. U+FF00 does not correspond to a fullwidth ASCII 20 (space character), since that role is already fulfilled by U+3000 "ideographic space".
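
Because the offset between the two ranges is constant (U+FF01 minus hexadecimal 21 gives FEE0), the mapping described above can be sketched in a few lines of Python. The names <code>to_fullwidth</code>, <code>to_halfwidth</code> and <code>FULLWIDTH_OFFSET</code> are hypothetical and chosen only for this illustration. The sketch handles only printable ASCII and the ideographic space, and passes everything else through unchanged:

```python
# Constant distance between ASCII 0x21..0x7E and U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # 0xFEE0
IDEOGRAPHIC_SPACE = "\u3000"


def to_fullwidth(text):
    """Replace printable ASCII with fullwidth forms; space becomes U+3000."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x20:
            out.append(IDEOGRAPHIC_SPACE)
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + FULLWIDTH_OFFSET))
        else:
            out.append(ch)  # leave all other characters untouched
    return "".join(out)


def to_halfwidth(text):
    """Inverse of to_fullwidth for the same limited set of characters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == IDEOGRAPHIC_SPACE:
            out.append(" ")
        elif 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

For the characters it covers, <code>to_halfwidth(to_fullwidth(s))</code> returns <code>s</code> unchanged. Halfwidth katakana and hangul do not follow a single arithmetic offset and are not handled here.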
 
==Rationale==
U+FF65–FFDC encode [[halfwidth forms]] ([[hankaku]]) of [[Katakana]] and [[Hangul]] characters. U+FFE0–FFEE are fullwidth and halfwidth symbols.
{{More citations needed section|date=April 2021}}
[[File:Alternative names of JIS X 0213.svg|thumb|Characters which appear in both [[JIS X 0201]] (single byte) and [[JIS X 0208]] / [[JIS X 0213]] (double byte) have both a halfwidth and a fullwidth form in [[Shift JIS]].|class=skin-invert]]
In the days of [[text mode]] computing, Western characters were normally laid out in a grid on the screen, often 80 columns by 24 or 25 lines. Each character was displayed as a small [[dot matrix]], often about 8 [[pixel]]s wide, and an [[SBCS]] (single-byte character set) was generally used to encode characters of Western languages.
 
For aesthetic reasons and readability, it is preferable for [[Chinese characters]] to be approximately square-shaped, and therefore twice as wide as these fixed-width SBCS characters. As these were typically encoded in a [[double-byte character set|DBCS]] (double-byte character set), this also meant that their width on screen in a [[duospaced font]] was proportional to their byte length. Some terminals and editing programs could not deal with double-byte characters starting at odd columns, only even ones (some could not even put double-byte and single-byte characters in the same line). The DBCS sets therefore generally also included Roman characters and digits, for use alongside the CJK characters in the same line.

On the other hand, early Japanese computing used a single-byte code page called [[JIS X 0201]] for [[katakana]]. These would be rendered at the same width as the other single-byte characters, making them [[half-width kana]] characters rather than normally proportioned kana. Although the JIS X 0201 standard itself did not specify half-width display for katakana, this became the visually distinguishing feature in [[Shift JIS]] between the single-byte JIS X 0201 and double-byte [[JIS X 0208]] katakana. Some IBM code pages used a similar treatment for [[Hangul#Letters|Korean jamo]],<ref name="ibm933">{{cite web |url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-933 |title=ICU Demonstration - Converter Explorer |website=demo.icu-project.org |access-date=7 May 2018}}</ref> based on the [[KS C 5601#1974|N-byte Hangul code]] and its [[EBCDIC]] translation.
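
The difference in byte length between the two kinds of katakana can be observed with Python's built-in <code>shift_jis</code> codec. This is only an illustrative sketch. The variable names are arbitrary, and it assumes the codec behaves as it does in current CPython releases:

```python
# Halfwidth katakana A (U+FF71) maps to the single-byte JIS X 0201 range.
halfwidth_a = "\uff71"
# Ordinary katakana A (U+30A2) maps to the double-byte JIS X 0208 range.
fullwidth_a = "\u30a2"

halfwidth_bytes = halfwidth_a.encode("shift_jis")
fullwidth_bytes = fullwidth_a.encode("shift_jis")

# Print the byte count and the hexadecimal byte values of each encoding.
print(len(halfwidth_bytes), halfwidth_bytes.hex())
print(len(fullwidth_bytes), fullwidth_bytes.hex())
```

The halfwidth form encodes to one byte and the ordinary form to two, matching the relationship between byte length and display width described above.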
 
==In Unicode==
{{see also|Halfwidth and Fullwidth Forms (Unicode block)}}
For compatibility with existing character sets that contained both half- and fullwidth versions of the same character, [[Unicode]] allocated a single block at U+FF00–FFEF containing the necessary "alternative width" characters. This includes a fullwidth version of all the [[ASCII]] characters and some non-ASCII punctuation such as the yen sign, halfwidth versions of katakana and [[hangul]], and halfwidth versions of some other symbols such as circles. Only characters needed for lossless round trip to existing character sets were allocated, rather than (for instance) making a fullwidth version of every Latin accented character.
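
The characters in this block carry compatibility mappings to their ordinary counterparts, so compatibility normalisation (NFKC) folds them back. A minimal sketch using Python's standard <code>unicodedata</code> module follows. The helper name <code>fold_width</code> is hypothetical. Note that NFKC also rewrites many compatibility characters unrelated to width, and that the fold discards the width distinction, so it is not a round trip:

```python
import unicodedata


def fold_width(text):
    """Fold alternative-width characters to their ordinary counterparts.

    Uses NFKC, which also affects other compatibility characters,
    so this is broader than a pure width conversion.
    """
    return unicodedata.normalize("NFKC", text)


print(fold_width("\uff21\uff22\uff23"))  # fullwidth Latin A, B, C
print(fold_width("\uff76\uff85"))        # halfwidth katakana KA, NA
print(fold_width("\uffe5"))              # fullwidth yen sign
```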
 
Unicode assigns ''every'' code point an "East Asian width" [[Unicode character property|property]]. This may be:<ref name="uax11">{{cite web |url=https://unicode.org/reports/tr11/ |title=Unicode® Standard Annex #11: East Asian Width |last1=Lunde |first1=Ken |author-link=Ken Lunde |publisher=[[Unicode Consortium]] |date=2019-01-25}}</ref>
 
{|class=wikitable
|+Unicode character properties based on width
|-
!scope="col"|Abbreviation
!scope="col"|Name
!scope="col"|Description
|-
!scope="row"|W
|Wide||Naturally wide character, e.g. [[Hiragana]].
|-
!scope="row"|Na
|Narrow||Naturally narrow character, e.g. [[ISO Basic Latin alphabet]].
|-
!scope="row"|F
|Fullwidth||Wide variant with [[NFKC|compatibility normalisation]] to naturally narrow character, e.g. fullwidth Latin script.
|-
!scope="row"|H
|Halfwidth||Narrow variant with [[NFKC|compatibility normalisation]] to naturally wide character, e.g. [[half-width kana]]. Includes U+20A9 ([[won sign|₩]]) as an exception.
|-
!scope="row"|A
|Ambiguous||Characters included in East Asian DBCS codes but also in European SBCS codes, e.g. [[Greek alphabet]]. Duospaced behaviour can consequently vary.
|-
!scope="row"|N
|Neutral||Characters which do not appear in East Asian DBCS codes, e.g. [[Devanagari]].
|}
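
The property values in the table above can be inspected with <code>unicodedata.east_asian_width</code> from the Python standard library. The sample characters and the helper name <code>width_classes</code> are choices made for this sketch only:

```python
import unicodedata

# One sample character per row of the table, plus the won sign exception.
SAMPLES = {
    "\u3042": "hiragana A",
    "A": "Latin capital A",
    "\uff21": "fullwidth Latin capital A",
    "\uff71": "halfwidth katakana A",
    "\u20a9": "won sign",
    "\u03b1": "Greek small alpha",
    "\u0905": "Devanagari A",
}


def width_classes(chars):
    """Return a mapping from each character to its East Asian width value."""
    return {ch: unicodedata.east_asian_width(ch) for ch in chars}


for ch, cls in width_classes(SAMPLES).items():
    print(f"U+{ord(ch):04X} {SAMPLES[ch]}: {cls}")
```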
 
[[Terminal emulator]]s can use this property to decide whether a character should occupy one or two "columns" when computing tab stops and cursor positions.
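
A simplified sketch of such a column calculation is shown below. The function <code>display_width</code> and its <code>ambiguous_wide</code> parameter are hypothetical. The sketch treats Wide and Fullwidth characters as two columns, skips combining marks, and does not attempt to handle control characters, zero-width characters or emoji sequences, which real terminal emulators must also consider:

```python
import unicodedata


def display_width(text, ambiguous_wide=False):
    """Estimate how many terminal columns a string occupies.

    ambiguous_wide: treat East Asian Ambiguous characters as two columns,
    as a terminal in a CJK locale might (an assumption of this sketch).
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the column of their base
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F") or (eaw == "A" and ambiguous_wide):
            width += 2
        else:
            width += 1
    return width
```

For example, three Latin letters occupy three columns, whereas three Chinese characters occupy six. A Greek letter counts as one column by default and as two when <code>ambiguous_wide</code> is set.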
 
==In OpenType==
[[OpenType]] defines the <code>fwid</code>, <code>halt</code>, <code>hwid</code>, and <code>vhal</code> feature tags, which select fullwidth or halfwidth forms of a character. [[CSS]] provides control over these features through the <code>font-variant-east-asian</code> and <code>font-feature-settings</code> properties.<ref>{{cite web |url=https://helpx.adobe.com/fonts/using/open-type-syntax.html |title=Syntax for OpenType features in CSS |publisher=[[Adobe Inc.|Adobe]] |access-date=2023-09-20}}</ref>
 
==See also==
* [[CJK Symbols and Punctuation (Unicode block)|East Asian punctuation]]
*[[CJK]]
* [[Em size]] – full width forms
*[[Han unification]]
* [[Enclosed Alphanumerics]] – bullet point sequences; some appear as fullwidth (e.g. ⒈, ⓵, ⑴, ⒜, ⓐ)
*[[Monospace]]
* [[Hangul Jamo (Unicode block)]]
* [[Katakana (Unicode block)]]
* [[Latin script in Unicode]]
 
==Notes==
{{Notelist}}
 
==References==
{{Reflist}}
 
==External links==
* [https://www.unicode.org/reports/tr11/tr11-31.html East Asian Width] Unicode Standard Annex #11
* http://www.alanwood.net/unicode/halfwidth_and_fullwidth_forms.html
* http://everything2.com/index.pl?node=Halfwidth%20and%20Fullwidth%20Forms
{{Unicode navigation}}
 
[[Category:East Asian typography]]
[[Category:Kana]]
[[Category:Hangul jamo|*Halfwidth]]