Code page


Code page is the traditional IBM term used for a specific character encoding table: a mapping in which a sequence of bits, usually a single octet representing integer values 0 through 255, is associated with a specific character.
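As a minimal illustration, assuming Python 3.9 or later, the sketch below decodes the same two octets under three code pages using Python's built-in `cp437`, `cp1252` and `latin-1` codecs. The helper name `code_points` is hypothetical.

```python
def code_points(raw: bytes, codepage: str) -> list[str]:
    """Decode raw octets with the given code page and list the resulting Unicode code points."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode(codepage)]


sample = bytes([0x82, 0xE9])
for cp in ("cp437", "cp1252", "latin-1"):
    print(cp, code_points(sample, cp))
# cp437 ['U+00E9', 'U+0398']
# cp1252 ['U+201A', 'U+00E9']
# latin-1 ['U+0082', 'U+00E9']
```

The octet 0x82 is "é" in code page 437 but a low quotation mark in windows-1252, so a byte stream cannot be interpreted without knowing which code page produced it.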

The basis of many code pages is ASCII, originally a 7-bit code representing at most 128 characters. 8-bit representations of ASCII typically either kept the top bit at zero or used it as a parity bit in network data transmission. When this bit was instead made available for character data, another 128 characters could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘extended character sets’; IBM simply referred to the variants as code pages, as it had always done for variants of its EBCDIC encodings.
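To make the role of the top bit concrete, here is a minimal Python sketch assuming even parity (odd parity was also used). The function names are hypothetical. It shows that one octet can be read either as a parity-protected 7-bit ASCII character or, when bit 7 carries data, as an extended character in a code page.

```python
def with_even_parity(code: int) -> int:
    """Store an even-parity bit in bit 7 of a 7-bit ASCII code."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    odd = bin(code).count("1") % 2
    # Set bit 7 only when needed to make the total number of 1 bits even.
    return code | 0x80 if odd else code


def strip_parity(octet: int) -> int:
    """Recover the 7-bit ASCII code by clearing bit 7."""
    return octet & 0x7F


c = with_even_parity(ord("C"))  # 'C' = 0x43 has three 1 bits, so bit 7 is set
print(hex(c))                   # 0xc3
print(chr(strip_parity(c)))     # C
# If bit 7 carries data instead, the same octet is a character of its own:
print(bytes([c]).decode("cp437") == "\u251c")  # True: box-drawing character in code page 437
```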

Although IBM created and maintained a myriad of code pages, the term came to be associated primarily with character maps used by the IBM PC and compatible platforms. Typically, one code page for a particular regional market is supported in hardware, but mechanisms were available for operating systems to enable the use of other code pages.

Since the original IBM PC code page (number 437) was not really designed for international use, several incompatible variants emerged. Examples include:

In modern applications, operating systems and programming languages, the IBM code pages have been rendered obsolete by international standards, such as ISO 8859-1 and Unicode.

Other code pages of note are:

Microsoft code pages

Microsoft defined a number of proprietary code page extensions that were subtly incompatible with those of other vendors.

The most notable of these is the windows-1252 code page, which places a range of typographical punctuation characters, the euro sign, and a few other special characters in positions 0x80 through 0x9F, which the ISO 8859-1 ("Latin-1") code page reserves for control characters.
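The overlap can be listed with a short Python sketch that relies only on the standard `cp1252` and `latin-1` codecs and the `unicodedata` module; the helper name `cp1252_extras` is hypothetical. For every byte from 0x80 to 0x9F it records the character windows-1252 assigns, while Python's `latin-1` codec maps the same bytes straight to U+0080 to U+009F, which Unicode classifies as control characters.

```python
import unicodedata


def cp1252_extras() -> dict[int, str]:
    """Map each byte in 0x80-0x9F to the character windows-1252 assigns to it."""
    extras = {}
    for b in range(0x80, 0xA0):
        try:
            extras[b] = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D unassigned.
            continue
    return extras


extras = cp1252_extras()
for b, ch in sorted(extras.items())[:5]:
    latin1 = bytes([b]).decode("latin-1")
    print(f"0x{b:02X}: cp1252 {unicodedata.name(ch)}, latin-1 category {unicodedata.category(latin1)}")
```

Of the 32 positions, 27 carry printable characters in windows-1252, including the euro sign at 0x80 and the curly double quotes at 0x93 and 0x94.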

Many Microsoft products produce characters in this range automatically, notably through ‘smart quotes’. Software from other vendors therefore has to choose between:

  • not interoperating with documents produced with Microsoft applications
  • mis-rendering the text in question
  • adding full support for the Microsoft code pages, in effect making Microsoft's implementation a de facto standard.
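The first and third options can be compared side by side in a short Python sketch (the `render` helper is hypothetical). Text containing smart quotes and a euro sign is saved as windows-1252, then read back once as strict ISO 8859-1 and once as windows-1252. Control characters are shown as `<XX>` so the mis-rendering is visible.

```python
def render(data: bytes, codepage: str) -> str:
    """Decode data, replacing C1 control characters with a visible <XX> marker."""
    text = data.decode(codepage)
    return "".join(f"<{ord(c):02X}>" if 0x80 <= ord(c) <= 0x9F else c for c in text)


original = "\u201cSmart quotes\u201d cost \u20ac5"
data = original.encode("cp1252")  # bytes as a windows-1252 application would save them

print(render(data, "latin-1"))             # <93>Smart quotes<94> cost <80>5
print(render(data, "cp1252") == original)  # True
```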

These code pages are often viewed as part of Microsoft’s embrace, extend and extinguish strategy towards open standards. Fortunately, the transition to full Unicode support now offers standards-based applications the possibility of full interoperability with the character repertoire of these documents without giving up standards compliance on output.
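A minimal sketch of that transition, assuming the legacy input is known to be windows-1252: decode it once into Unicode, then emit only a standard encoding such as UTF-8. The helper name `cp1252_to_utf8` is hypothetical.

```python
def cp1252_to_utf8(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode windows-1252 bytes as UTF-8, preserving every character."""
    # With errors="replace", the five unassigned bytes become U+FFFD instead of raising.
    return data.decode("cp1252", errors=errors).encode("utf-8")


legacy = b"\x93Smart quotes\x94 cost \x805"
utf8 = cp1252_to_utf8(legacy)
print(utf8.decode("utf-8") == "\u201cSmart quotes\u201d cost \u20ac5")  # True
```

The full character repertoire survives, but the output no longer depends on Microsoft's code page.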

See also