Japanese language and computers: Difference between revisions


Revision as of 10:28, 6 November 2022 by BIL (→Character encodings). Latest revision as of 14:16, 18 August 2025 by 210.17.224.90 (Question DTP software for vertical Japanese support).
(16 intermediate revisions by 14 users not shown)
Line 5:

==Character encodings==

There are several standard methods to [[character encoding\|encode]] Japanese characters for use on a computer, including [[JIS encoding\|JIS]], [[Shift-JIS]], [[Extended Unix Code\|EUC]], and [[Unicode]]. While mapping the set of [[kana]] is a simple matter, [[kanji]] has proven more difficult. Despite efforts, none of the encoding schemes has become the de facto standard, and multiple encoding standards were in use by the 2000s. As of 2017, the share of [[UTF-8]] traffic on the Internet had expanded to over 90% worldwide, and only 1.2% used Shift-JIS and EUC. Yet a few popular websites, including [[2channel]] and [[kakaku.com]], are still using Shift-JIS.<ref>{{Cite web\|url=https://internet.watch.impress.co.jp/docs/yajiuma/1086378.html\|title=【やじうまWatch】ウェブサイトにおける文字コードの割合、UTF-8が90％超え。Shift_JISやEUC-JPは？ - INTERNET Watch\|date=2017-10-17\|website=INTERNET Watch\|access-date=2019-05-11}}</ref>

Until the 2000s, most Japanese [[email]]s were in [[ISO-2022-JP]] ("JIS encoding"), [[web page]]s were in [[Shift-JIS]], and mobile phones in Japan usually used some form of [[Extended Unix Code]].<ref>{{Cite web\|url=http://ash.jp/code/code.htm\|title=文字コードについて\|date=2002\|publisher=ASH Corporation\|access-date=2019-05-14}}</ref> If a program fails to determine the encoding scheme employed, it can cause {{Nihongo3\|"misconverted garbled/garbage characters"\|文字化け\|''[[mojibake]]''\|literally "transformed characters"}} and thus unreadable text on computers.

[[File:PC-9801F Kanji ROM board.jpg\|thumb\|Kanji [[Read-only memory\|ROM]] card installed in a [[PC-9800 series\|PC-98]], which stored about 3000 glyphs and enabled quick display. 
It also had [[Random-access memory\|RAM]] to store gaiji.]]

[[File:Control panel of public background music system.jpg\|thumb\|Embedded devices are still using [[half-width kana]].]]

The first encoding to become widely used was [[JIS X 0201]], which is a [[ISO 646\|single-byte encoding]] that only covers standard 7-bit [[ASCII]] characters with [[Half-width kana\|half-width katakana]] extensions. This was widely used in systems that had neither the processing power nor the storage to handle kanji (including old embedded equipment such as cash registers), because kana-kanji conversion required a complicated process, and output in kanji required much memory and high resolution. This means that only katakana, not kanji, was supported using this technique. Some embedded displays still have this limitation.

Line 47: \| a2 \|}

This can happen, for example, in the [[C (programming language)\|C]] programming language when text strings contain Shift-JIS. It does not happen in HTML, since ASCII 0x00–0x3F (which includes ", %, & and some other escape characters and string separators) does not appear as the second byte in Shift-JIS, and the backslash is not an escape character there. But it can happen in [[JavaScript]], which can be embedded in HTML pages.

[[Extended Unix Code\|EUC]], on the other hand, is handled much better by parsers that have been written for 7-bit ASCII (and thus [[Extended Unix Code\|EUC]] encodings are used on UNIX, where much of the file-handling code was historically only written for English encodings). But EUC is not backwards compatible with JIS X 0201, the first main Japanese encoding.

Further complications arise because the original Internet e-mail standards only support 7-bit transfer protocols. 
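
The byte-level behaviour described above can be checked with a short Python sketch. It assumes that the problem referred to is a Shift-JIS second byte equal to 0x5C, the ASCII backslash. The sample character (katakana ソ) and the helper names are illustrative choices, not taken from the article; the codec names are those of Python's standard library.

```python
# Illustrative sketch only: SAMPLE and the helper names are assumptions
# made for this example, not part of any standard.

SAMPLE = "ソ"  # katakana "so"; its Shift-JIS second byte is 0x5C


def encoded_bytes(text: str, codec: str) -> bytes:
    """Encode `text` with one of Python's built-in Japanese codecs."""
    return text.encode(codec)


def has_ascii_backslash(data: bytes) -> bool:
    """True if any byte equals 0x5C, which an ASCII parser reads as a backslash."""
    return 0x5C in data


def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


sjis = encoded_bytes(SAMPLE, "shift_jis")   # second byte collides with ASCII
euc = encoded_bytes(SAMPLE, "euc_jp")       # both bytes have the high bit set
jis = encoded_bytes(SAMPLE, "iso2022_jp")   # escape sequences plus 7-bit bytes

if __name__ == "__main__":
    for name, data in [("Shift-JIS", sjis), ("EUC-JP", euc), ("ISO-2022-JP", jis)]:
        print(name, [hex(b) for b in data],
              "backslash byte:", has_ascii_backslash(data),
              "7-bit clean:", is_seven_bit(data))
```

In this sketch only the Shift-JIS form contains a byte that an ASCII-oriented parser could mistake for a backslash, and only the ISO-2022-JP form stays within 7 bits, which is the constraint imposed by the original e-mail transport.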
Thus {{IETF RFC\|1468}} ("[[ISO-2022-JP]]", often simply called [[JIS encoding]]) was developed for sending and receiving e-mails.

[[File:Japanese TV closed caption using gaiji.jpg\|thumb\|[[Gaiji]] are used in closed captions of Japanese TV broadcasting.]]

In [[character set]] standards such as [[JIS X 0208\|JIS]], not all required characters are included, so [[gaiji]] ({{lang\|ja\|外字}} "external characters") are sometimes used to supplement the character set. Gaiji may come in the form of external font packs, where normal characters have been replaced with new characters, or the new characters have been added to unused character positions. However, gaiji are not practical in [[Internet]] environments since the font set must be transferred with the text to use the gaiji. As a result, such characters are replaced with similar or simpler characters, or the text may need to be encoded using a larger character set (such as Unicode) that supports the required character.<ref>{{Cite web\|url=http://heicyann.com/pc/20160218a/\|title=住基ネット統一文字コードによる外字の統一について\|last=兵ちゃん\|date=2016-02-18\|access-date=2019-05-14\|archive-date=2020-08-02\|archive-url=https://web.archive.org/web/20200802022153/http://heicyann.com/pc/20160218a/\|url-status=dead}}</ref>

[[Unicode]] was intended to solve all encoding problems over all languages. The [[UTF-8]] encoding used to encode Unicode in web pages does not have the disadvantages that Shift-JIS has. Unicode is supported by international software, and it eliminates the need for gaiji. There are still controversies, however. For Japanese, the kanji characters have been [[Han unification\|unified]] with Chinese; that is, a character considered to be the same in both Japanese and Chinese is given a single code point, even if the appearance is actually somewhat different, with the precise appearance left to the use of a locale-appropriate font. 
This process, called [[Han unification]], has caused controversy.{{cn\|date=October 2020}} The previous encodings in Japan, [[Free area of the Republic of China\|Taiwan Area]], [[Mainland China]] and [[Korea]] each handled only one language, whereas Unicode is meant to handle all of them. The handling of kanji/Chinese characters has, however, been designed by a committee composed of representatives from all four countries/areas.{{cn\|date=October 2020}}

Line 61:

== Direction of text ==

[[File:LibreOffice Writer 6.2.3.2 vertical text.png\|thumb\|[[LibreOffice Writer]] supports a downward text option.]]

Japanese can be written in [[Horizontal and vertical writing in East Asian scripts\|two directions]]. ''Yokogaki'' style writes left-to-right, top-to-bottom, as with English. ''Tategaki'' style writes first top-to-bottom, and then moves right-to-left.

To compete with [[Ichitaro (word processor)\|Ichitaro]], Microsoft provided several updates for early Japanese versions of [[Microsoft Word]] including support for downward text, such as Word 5.0 Power Up Kit and Word 98.<ref>{{Cite journal\|year=1994\|title=ASCII EXPRESS : マイクロソフトが「Access」と「Word 5.0 Power Up Kit」を発売\|journal=[[ASCII (magazine)\|ASCII]]\|volume=18\|issue=1}}</ref><ref>{{Cite web\|archive-url=https://web.archive.org/web/20010801160800/http://www.microsoft.com/japan/office/previous/office97/\|title=Microsoft Office 97 Powered by Word 98 製品情報\|date=2001-08-01~~\|website=web.archive.org~~\|publisher=[[Microsoft]]\|url=http://www.microsoft.com/japan/office/previous/office97/\|archive-date=2001-08-01\|access-date=2019-05-14}}</ref>

[[QuarkXPress]] was the most popular DTP software in Japan in the 1990s, even though it had a long development cycle. 
However, because it lacked support for downward text, it was surpassed by [[Adobe InDesign]], which had strong support for downward text through several updates.<ref>{{Cite web\|url=https://www.edit-u.com/conte/dtp04.html\|title=DTPって何よ（4）［編集って何よ］\|last=エディット-U\|access-date=2019-05-14}}</ref><ref>{{Cite web\|url=https://news.mynavi.jp/article/QuarkXPress_top10-3/\|title=アンチQuarkユーザーが気になるQuarkXPress 8の機能トップ10(3) 縦書きの組版が面倒だったけどどうなのよ?\|date=2008-07-04\|website=MyNavi News\|access-date=2019-05-14}}</ref>

At present,{{when\|date=March 2019}} handling of downward text is incomplete. For example, [[HTML]] has no support for ''tategaki'', and Japanese users must use HTML tables to simulate it. However, [[Cascading Style Sheets\|CSS]] level 3 includes a property "<code>writing-mode</code>" which can render ''tategaki'' when given the value "<code>vertical-rl</code>" (i.e. top to bottom, right to left). Word processors and [[Desktop publishing\|DTP]] software{{which\|date=August 2025}} have more complete support for it.

== Historical development ==

The lack of proper Japanese character support on computers limited the influence of large American firms in the Japanese market during the 1980s. Japan, which was the world's second-largest market for computers after the [[United States]] at the time, was dominated by domestic hardware and software makers such as [[NEC]] and [[Fujitsu]].<ref>http://www.hardcoregaming101.net/JPNcomputers/PAC-111.PDF {{Bare URL PDF\|date=July 2025}}</ref><ref>{{cite web \| url=https://www.nytimes.com/1991/07/19/business/company-news-compaq-set-to-invade-japan-market.html \| title=COMPANY NEWS; Compaq Set to Invade Japan Market \| work=The New York Times \| date=19 July 1991 \| last1=Sanger \| first1=David E. 
}}</ref> [[Microsoft Windows 3.1]] offered improved Japanese language support, which played a part in reducing the grip of domestic PC makers throughout the 1990s.<ref>{{Cite web \|title=Windows 95 launches in Japan - UPI Archives \|url=https://www.upi.com/Archives/1995/11/23/Windows-95-launches-in-Japan/7028817102800/ \|access-date=2024-11-21 \|website=UPI \|language=en}}</ref>

== See also ==

Line 74 ⟶ 77:

*[[Japanese writing system]]
*[[Japanese language]]
*[[Chinese input methods for computers]]
*[[CJK characters]]
*[[Korean language and computers]]
*[[Vietnamese language and computers]]
*[[Ghost characters]] - Erroneous kanji

==References==

Line 92 ⟶ 97:

[[Category:Encodings of Japanese]]
[[Category:Natural language and computing]]
~~[[Category:Japanese-language computing\| ]]~~