Windows code page: Difference between revisions

Revision as of 04:42, 11 October 2024, edited by 14.203.10.202: →History: Fix plurality
Latest revision as of 07:11, 15 August 2025, edited by HarJIT: →EBCDIC code pages: Seems to have been placed in the wrong row (when compared with Microsoft documentation)
(13 intermediate revisions by 8 users not shown)
Line 59:

== History ==

Early computer systems had limited storage and restricted the number of [[bit]]s available to encode a [[character (computing)|character]]. Although earlier proprietary encodings used fewer bits, the [[ASCII|American Standard Code for Information Interchange]] (ASCII) settled on seven, which was sufficient to encode a 96-member subset of the characters used in the US. As eight-bit [[byte]]s came to predominate, Microsoft (and others) expanded the repertoire to 224 characters to handle a variety of other uses, such as box-drawing symbols. The need to provide [[precomposed character]]s for the Western European and South American markets required a different character set, so Microsoft established the principle of code pages, one for each alphabet. For the [[List of writing systems#Segmental script|segmental scripts]] used in most of Africa, the Americas, southern and south-east Asia, the Middle East and Europe, a character needs just one byte, but two or more bytes are needed for the [[ideographic]] sets used in the rest of the world. The code-page model was unable to handle this challenge. Since the late 1990s, software and systems have adopted [[Unicode]] as their preferred character encoding format: Unicode's code space has room for over a million characters. All current Microsoft products and [[application program interfaces]] use Unicode internally,{{cn|date=October 2020}} but some applications continue to use the default encoding{{clarify|date=October 2024}} of the computer's 'locale' when reading and writing text data to files or standard output.{{cn|date=October 2020}} Therefore, files may still be encountered that are legible and intelligible in one part of the world but unintelligible [[mojibake]] in another.
=== UTF-8, UTF-16 ===

Line 135 ⟶ 127:

* {{anchor|CP437}}[[Code page 437|437]] – IBM PC US, 8-bit [[SBCS]] [[extended ASCII]].<ref name="IBM_CP437">{{cite web|author=IBM|title=SBCS code page information document - CPGID 00437|url=http://www-01.ibm.com/software/globalization/cp/cp00437.html|access-date=2014-07-04|archive-date=2016-06-09|archive-url=https://web.archive.org/web/20160609084933/https://www-01.ibm.com/software/globalization/cp/cp00437.html|url-status=live}}</ref> Known as OEM-US, the encoding of the primary built-in font of VGA graphics cards.<!-- Windows 7 Ultimate --><!-- [[Windows 1.0|1.00]]-[[Windows ME|4.90]] -->
* [[Code page 708|708]] – Arabic, extended [[ISO 8859-6]] (ASMO 708)<!-- Windows 7 Ultimate -->
* 720 – Arabic, retaining box drawing characters in their usual locations<!-- Windows 7 Ultimate -->
* [[Code page 737|737]] – "MS-DOS Greek". Retains all box drawing characters. More popular than 869.<!-- Windows 7 Ultimate -->
* 775 – "MS-DOS Baltic Rim"<!-- Windows 7 Ultimate -->
* [[Code page 850|850]] – "MS-DOS Latin 1". Full (re-arranged) repertoire of [[ISO 8859-1]].<!-- Windows 7 Ultimate -->
* [[Code page 852|852]] – "MS-DOS Latin 2"<!-- Windows 7 Ultimate -->
* 855 – "MS-DOS Cyrillic". Mainly used for [[South Slavic languages]]. Includes (re-arranged) repertoire of [[ISO-8859-5]]. Not to be confused with cp866.<!-- Windows 7 Ultimate -->
* 857 – "MS-DOS Turkish"<!-- Windows 7 Ultimate -->
* [[Code page 858|858]] – Western European with euro sign<!-- Windows 7 Ultimate -->
* 860 – "MS-DOS Portuguese"<!-- Windows 7 Ultimate -->
* [[Code page 861|861]] – "MS-DOS Icelandic"<!-- Windows 7 Ultimate -->
* [[Code page 862|862]] – "MS-DOS Hebrew"<!-- Windows 7 Ultimate -->

Line 198 ⟶ 190:

|first=Alexandre
|last=Julliard
|date=11 March 2021
|publisher=[[Wine (software)|Wine Project]]
|access-date=2021-03-14

Line 261 ⟶ 254:

|[[Code page 20423|20423]]||423||EBCDIC Greek with Extended Latin<!-- Windows 7 Ultimate -->
|-
|[[Code page 20424|20424]]||424||EBCDIC Hebrew<!-- Windows 7 Ultimate -->
|-
|[[Code page 20833|20833]]||833||Korean EBCDIC for N-Byte Hangul; {{code|x-EBCDIC-KoreanExtended}}<!-- Windows 7 Ultimate -->
|-
|[[Code page 20838|20838]]||838||EBCDIC Thai<!-- Windows 7 Ultimate -->

Line 411 ⟶ 404:

* The use of code pages limits the set of characters that may be used.
* Characters expressed in an unsupported code page may be converted to question marks (?) or other [[replacement character]]s, or to a simpler version (such as removing accents from a letter). In either case, the original character may be lost.

* [[AppLocale]] – a utility to run non-Unicode (code page-based) applications in a locale of the user's choice.

==References==