Windows code page: Difference between revisions

History: Fix plurality
EBCDIC code pages: Seems to have been placed in the wrong row (when compared with Microsoft documentation)
 
(13 intermediate revisions by 8 users not shown)
Line 59:
 
== History ==
Early computer systems had limited storage and restricted the number of [[bit]]s available to encode a [[character (computing)|character]]. Although earlier proprietary encodings had fewer, the [[ASCII|American Standard Code for Information Interchange]] (ASCII) settled on seven bits: this was sufficient to encode a 96-member subset of the characters used in the US. As eight-bit [[byte]]s came to predominate, Microsoft (and others) expanded the repertoire to 224, to handle a variety of other uses such as box-drawing symbols. The need to provide [[precomposed character]]s for the Western European and South American markets required a different character set: Microsoft established the principle of code pages, one for each alphabet. For the [[List of writing systems#Segmental script|segmental scripts]] used in most of Africa, the Americas, southern and south-east Asia, the Middle East and Europe, a character needs just one byte, but two or more bytes are needed for the [[ideographic]] sets used in the rest of the world. The code-page model was unable to handle this challenge.
 
Since the late 1990s, software and systems have adopted [[Unicode]] as their preferred character encoding format: Unicode is designed to handle over a million characters. All current Microsoft products and [[application program interfaces]] use Unicode internally,{{cn|date=October 2020}} but some applications continue to use the default encoding{{clarify|date=October 2024}} of the computer's 'locale' when reading and writing text data to files or standard output.{{cn|date=October 2020}} Therefore, files may still be encountered that are legible and intelligible in one part of the world but unintelligible [[mojibake]] in another.
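
The effect of reading a file under the wrong locale encoding can be sketched in a few lines of Python. This is an illustration only: the sample word and the choice of code pages 1251 and 1252 are assumptions made for the example, and Python's built-in codecs stand in for whatever conversion a given application performs.

```python
# Illustrative sketch: the sample word and both code pages are arbitrary choices.
text = "Привет"

# An application in a Cyrillic locale writes the text using code page 1251.
raw = text.encode("cp1251")

# Reading the bytes back with the same code page recovers the original text.
same_locale = raw.decode("cp1251")

# Reading the same bytes with code page 1252 (Western European) produces
# mojibake: every byte is valid, but it maps to an unrelated Latin letter.
other_locale = raw.decode("cp1252")

print(same_locale)
print(other_locale)
```

The bytes are identical in both cases; only the code page assumed by the reader differs, which is why such files carry no indication that anything is wrong.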
 
=== UTF-8, UTF-16 ===
Line 135 ⟶ 127:
* {{anchor|CP437}}[[Code page 437|437]] – IBM PC US, 8-bit [[SBCS]] [[extended ASCII]].<ref name="IBM_CP437">{{cite web|author=IBM|title=SBCS code page information document - CPGID 00437|url=http://www-01.ibm.com/software/globalization/cp/cp00437.html|access-date=2014-07-04|archive-date=2016-06-09|archive-url=https://web.archive.org/web/20160609084933/https://www-01.ibm.com/software/globalization/cp/cp00437.html|url-status=live}}</ref> Known as OEM-US, the encoding of the primary built-in font of VGA graphics cards.<!-- Windows 7 Ultimate --><!-- [[Windows 1.0|1.00]]-[[Windows ME|4.90]] -->
* [[Code page 708|708]] – Arabic, extended [[ISO 8859-6]] (ASMO 708)<!-- Windows 7 Ultimate -->
* [[Code page 720|720]] – Arabic, retaining box drawing characters in their usual locations<!-- Windows 7 Ultimate -->
* [[Code page 737|737]] – "MS-DOS Greek". Retains all box drawing characters. More popular than 869.<!-- Windows 7 Ultimate -->
* [[Code page 775|775]] – "MS-DOS Baltic Rim"<!-- Windows 7 Ultimate -->
* [[Code page 850|850]] – "MS-DOS Latin 1". Full (re-arranged) repertoire of [[ISO 8859-1]].<!-- Windows 7 Ultimate -->
* [[Code page 852|852]] – "MS-DOS Latin 2"<!-- Windows 7 Ultimate -->
* [[Code page 855|855]] – "MS-DOS Cyrillic". Mainly used for [[South Slavic languages]]. Includes (re-arranged) repertoire of [[ISO-8859-5]]. Not to be confused with cp866.<!-- Windows 7 Ultimate -->
* [[Code page 857|857]] – "MS-DOS Turkish"<!-- Windows 7 Ultimate -->
* [[Code page 858|858]] – Western European with euro sign<!-- Windows 7 Ultimate -->
* [[Code page 860|860]] – "MS-DOS Portuguese"<!-- Windows 7 Ultimate -->
* [[Code page 861|861]] – "MS-DOS Icelandic"<!-- Windows 7 Ultimate -->
* [[Code page 862|862]] – "MS-DOS Hebrew"<!-- Windows 7 Ultimate -->
Line 198 ⟶ 190:
|first=Alexandre
|last=Julliard
|archive-date=11 March 2021
|publisher=[[Wine (software)|Wine Project]]
|access-date=2021-03-14
Line 261 ⟶ 254:
|[[Code page 20423|20423]]||423||EBCDIC Greek with Extended Latin<!-- Windows 7 Ultimate -->
|-
|[[Code page 20424|20424]]||424||EBCDIC Hebrew<!-- Windows 7 Ultimate -->
|-
|[[Code page 20833|20833]]||833||Korean EBCDIC for N-Byte Hangul; {{code|x-EBCDIC-KoreanExtended}}<!-- Windows 7 Ultimate -->
|-
|[[Code page 20838|20838]]||838||EBCDIC Thai<!-- Windows 7 Ultimate -->
Line 411 ⟶ 404:
* The use of code pages limits the set of characters that may be used.
* Characters expressed in an unsupported code page may be converted to question marks (?) or other [[replacement character]]s, or to a simpler version (such as removing accents from a letter). In either case, the original character may be lost.
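
Both kinds of loss described above can be illustrated with a minimal Python sketch. The helper names <code>to_code_page</code> and <code>strip_accents</code> are hypothetical, the sample name and code page 437 are arbitrary choices, and Python's codecs are used only as a stand-in: the conversion routines in Windows itself may substitute characters differently.

```python
import unicodedata


def to_code_page(text, code_page):
    """Encode text, substituting '?' for characters the code page lacks.

    Hypothetical helper for illustration; relies on Python's 'replace'
    error handler rather than any Windows API.
    """
    return text.encode(code_page, errors="replace")


def strip_accents(text):
    """Decompose accented letters and drop the combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


sample = "Dvořák"

# Code page 437 contains "á" but not "ř", so "ř" becomes a question mark.
lossy = to_code_page(sample, "cp437")

# Removing accents first yields a simpler version that encodes without gaps.
simplified = to_code_page(strip_accents(sample), "cp437")

print(lossy.decode("cp437"))
print(simplified.decode("cp437"))
```

In both results the original spelling cannot be recovered from the encoded bytes alone.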
 
== See also ==
* [[AppLocale]]&nbsp;– a utility to run non-Unicode (code page-based) applications in a locale of the user's choice.
 
==References==