Improve english; users'; use Unicode as touted |
intro edits; slight reorg; added section titles so TOC wouldn't be halfway down the page |
Line 1:
'''Code page''' is the traditional [[International Business Machines|IBM]] term used for a specific [[character encoding]] table: a mapping in which a sequence of [[bit]]s, usually a single octet representing integer values 0 through 255, is associated with a specific character. A few code pages use more than 8 bits per character and thus encode more than 256 characters.
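As a rough illustration of a code page that uses more than 8 bits per character, the following Python sketch uses the interpreter's built-in <code>cp932</code> codec. This Japanese code page is chosen here only as an example and is not named in the text above; the helper name <code>encoded_length</code> is hypothetical.

```python
def encoded_length(text: str, code_page: str) -> int:
    """Return how many bytes `text` occupies when encoded in `code_page`."""
    return len(text.encode(code_page))

# An ASCII letter still fits in a single octet.
latin_len = encoded_length("A", "cp932")

# HIRAGANA LETTER A (U+3042) lies outside the 256 single-byte slots,
# so this code page spends two bytes on it.
kana_len = encoded_length("\u3042", "cp932")

print(latin_len, kana_len)
```

A single-byte code page such as 437 would instead reject the second character, since it has no slot for it.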
To this day, PC hardware typically supports a single 8-bit code page by default, chosen for a particular regional market, and provides mechanisms for operating systems to switch to other code pages. However, operating system vendors now commonly provide their own character encoding and rendering systems that bypass the hardware code pages entirely. These alternative character encodings are sometimes called code pages as well.
Although IBM created and maintained a myriad of code pages, the term came to be associated primarily with character maps used by the [[IBM PC]] and compatible platforms. Typically, one code page for a particular regional market was supported in hardware, and mechanisms were available for operating systems to enable the use of other code pages.
== Relationship to ASCII ==
The basis of many code pages is [[ASCII]], originally a 7-bit code representing, at most, 128 characters. In the past, 8-bit representations of ASCII typically either set the top bit to zero, or used it as a [[parity bit]] in network data transmissions. When this bit was instead made available for representing character data, another 128 characters could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘[[Extended ASCII|extended character sets]]’; IBM merely referred to the variants as code pages, as it had always done for variants of [[EBCDIC]] encodings.
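The effect of the extra 128 values can be sketched in Python, assuming the interpreter's built-in <code>cp437</code>, <code>cp850</code> and <code>cp1252</code> codecs are acceptable stand-ins for the corresponding code pages. The byte values and the name <code>decode_with</code> are chosen purely for illustration.

```python
def decode_with(byte_value: int, code_page: str) -> str:
    """Decode a single octet under the given code page."""
    return bytes([byte_value]).decode(code_page)

CODE_PAGES = ("cp437", "cp850", "cp1252")

# 0x41 has the top bit clear, so it falls in the shared ASCII range.
low_results = {cp: decode_with(0x41, cp) for cp in CODE_PAGES}

# 0xE9 has the top bit set, so its meaning depends on the code page.
high_results = {cp: decode_with(0xE9, cp) for cp in CODE_PAGES}

for cp in CODE_PAGES:
    print(cp, repr(low_results[cp]), repr(high_results[cp]))
```

The same octet with the top bit set yields a different character under each of the three code pages, while the 7-bit value decodes to "A" everywhere.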
== Partial list of IBM code pages ==
Since the original IBM PC code page (number 437) was not really designed for international use, several incompatible variants emerged. Examples include:
Line 19 ⟶ 25:
* 1114 -- [[Taiwan]]
* 1252 -- Superset of [[ISO 8859-1]], used by [[Microsoft Windows]]
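The relationship between code page 1252 and ISO 8859-1 can be checked with a short Python sketch, assuming the built-in <code>cp1252</code> and <code>latin-1</code> codecs faithfully represent the two encodings. The variable names are illustrative only.

```python
# Bytes 0x00-0x7F and 0xA0-0xFF: compare the two encodings octet by octet.
shared_range = list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
agree_on_shared = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in shared_range
)

# Byte 0x80 falls in the range where the two differ: ISO 8859-1 places a
# control code there, while code page 1252 assigns a printable character.
cp1252_char = b"\x80".decode("cp1252")
latin1_char = b"\x80".decode("latin-1")

print(agree_on_shared, repr(cp1252_char), repr(latin1_char))
```

In this sketch the two encodings agree everywhere except 0x80 through 0x9F, which is where code page 1252 adds its extra printable characters.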
Other code pages of note are:
Line 27 ⟶ 31:
* 12000 -- [[Unicode]] (UTF-32) [[little-endian]]; 12001 is the [[big-endian]] counterpart
* 20000 -- CNS Taiwan, followed by other national character sets
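The little-endian and big-endian distinction in the list above can be illustrated in Python. This sketch assumes that identifiers 12000 and 12001 correspond to the two UTF-32 byte orders, as in Microsoft's code page numbering, and uses the built-in <code>utf-32-le</code> and <code>utf-32-be</code> codecs as stand-ins.

```python
text = "A"  # U+0041

# Little-endian: least significant byte first.
little = text.encode("utf-32-le")

# Big-endian: most significant byte first.
big = text.encode("utf-32-be")

print(little.hex(), big.hex())

# Decoding with the matching byte order recovers the original text.
round_trip_ok = (
    little.decode("utf-32-le") == text and big.decode("utf-32-be") == text
)
```

The two byte strings contain the same four octets in opposite order, which is why the byte order must be known, or signalled, before decoding.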
In modern applications, operating systems, and programming languages, the IBM code pages have been rendered obsolete by international standards, such as [[ISO 8859-1]] and [[Unicode]].
== Microsoft code pages ==