Code page is the traditional IBM term for a character encoding table: a mapping of binary integer values, often 0 through 255, to specific characters or glyphs.
Although many character sets are based on ASCII, ASCII is a seven-bit code; when it was stored in eight bits, the top bit was typically either set to zero or used as a parity bit. Using this bit for data instead doubled the size of the possible character set, allowing another 128 characters to be added. No standard existed for these ‘extended character sets’, and IBM referred to the variants as code pages (as it had always done for variants of EBCDIC encodings).
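The two uses of the eighth bit can be sketched in a few lines of Python. This is only an illustration: the function names `with_even_parity` and `is_extended` are hypothetical, and even parity is assumed here (odd parity was also used in practice).

```python
def with_even_parity(code: int) -> int:
    """Return an 8-bit value: a 7-bit ASCII code plus an even-parity top bit."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes occupy only 7 bits (0 through 127)")
    ones = bin(code).count("1")
    parity_bit = ones % 2  # 1 only when the 7-bit code has an odd number of 1-bits
    return (parity_bit << 7) | code


def is_extended(byte: int) -> bool:
    """True when the top bit carries data, i.e. the byte is outside ASCII."""
    return byte >= 0x80


print(f"{with_even_parity(ord('A')):08b}")  # 01000001 (two 1-bits, parity bit 0)
print(f"{with_even_parity(ord('C')):08b}")  # 11000011 (three 1-bits, parity bit 1)

# Using the top bit for data instead yields 128 additional byte values.
print(sum(is_extended(b) for b in range(256)))  # 128
```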
Although IBM maintained a myriad of code pages, in wider use the term has come to mean a character encoding for the IBM PC. Because the original IBM PC code page was not designed for international use, several incompatible variants emerged:
- 437 -- Original PC extended character set
- 737 -- Greek characters
- 850 -- Multilingual, most European languages
- 858 -- Multilingual with euro symbol
- 860 -- Portuguese
- 863 -- French Canadian
- 865 -- Nordic
- 868 -- Urdu
- 899 -- Symbol
- 904 -- Taiwan
- 1088 -- Revised Korean
- 1114 -- Taiwan
- 1252 -- Superset of ISO 8859-1, used by Microsoft Windows
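The incompatibility between the code pages listed above can be seen by decoding one byte value under several of them. The sketch below uses the `cp437`, `cp850`, `cp858`, and `cp1252` codecs from Python's standard library, on the assumption that they match the correspondingly numbered code pages; the helper name `decode_under` is hypothetical.

```python
import unicodedata


def decode_under(byte_value, code_pages):
    """Decode a single byte under each named code page; return {name: character}."""
    raw = bytes([byte_value])
    return {cp: raw.decode(cp) for cp in code_pages}


def describe(ch):
    """Format a character as its code point and Unicode name (avoids console encoding issues)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# The same byte, 0x9B, maps to a different character in each table.
for cp, ch in decode_under(0x9B, ["cp437", "cp850", "cp1252"]).items():
    print(cp, "->", describe(ch))

# Code pages 850 and 858 differ at position 0xD5, where 858 carries the euro sign.
for cp, ch in decode_under(0xD5, ["cp850", "cp858"]).items():
    print(cp, "->", describe(ch))
```

A file written under one of these code pages therefore cannot be read correctly unless the reader knows, or guesses, which table was in use.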
The IBM PC code pages have since been largely superseded by international standards, specifically ISO 8859-1 and Unicode.
Other code pages of note are:
- 10000 -- Macintosh Roman (followed by several other Mac character sets)
- 12000 -- Unicode (UTF-32) little-endian; 12001 -- Unicode (UTF-32) big-endian
- 20000 -- CNS Taiwan, followed by other national character sets
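The little-endian and big-endian Unicode entries above differ only in byte order. A minimal sketch, assuming Python's `utf-32-le` and `utf-32-be` codecs correspond to identifiers 12000 and 12001 (Microsoft documents those two identifiers as UTF-32); the helper name `encode_both` is hypothetical.

```python
def encode_both(text):
    """Return the (little-endian, big-endian) UTF-32 encodings of text, without a byte-order mark."""
    return text.encode("utf-32-le"), text.encode("utf-32-be")


little, big = encode_both("A")
print(little.hex())  # 41000000 (least significant byte first)
print(big.hex())     # 00000041 (most significant byte first)
```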
Microsoft code pages
Microsoft defined a number of proprietary code page extensions which were subtly incompatible with those of other vendors.
The most notable of these is the windows-1252 code page, which places a range of typographical punctuation characters in positions (0x80 through 0x9F) that are reserved for control characters in the ISO 8859-1 "Latin-1" code page.
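The difference can be illustrated by decoding the same bytes both ways. This sketch assumes Python's `cp1252` and `latin-1` codecs stand in for windows-1252 and ISO 8859-1; the sample bytes are made up for the example.

```python
import unicodedata

# Hypothetical sample: "Hello" wrapped in bytes 0x93 and 0x94.
raw = b"\x93Hello\x94"

as_cp1252 = raw.decode("cp1252")   # 0x93 and 0x94 become curly quotation marks
as_latin1 = raw.decode("latin-1")  # the same bytes become C1 control characters

for label, text in (("cp1252", as_cp1252), ("latin-1", as_latin1)):
    first = text[0]
    # Category "Pi" is initial-quote punctuation; "Cc" is a control character.
    print(label, f"U+{ord(first):04X}", unicodedata.category(first))
```

Under windows-1252 the first character is U+201C, a left double quotation mark; under Latin-1 it is U+0093, an invisible control character, which is why such text is commonly mis-rendered.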
Many Microsoft products produce characters in this range automatically, notably through the ‘smart quotes’ feature. Other software therefore has to choose between:
- not interoperating with documents produced with Microsoft applications;
- mis-rendering the text in question; or
- adding full support for the Microsoft code pages, in effect making Microsoft's implementation a de facto standard.
These code pages are often viewed as part of Microsoft’s embrace, extend and extinguish strategy towards open standards. Fortunately, the transition to full Unicode support now lets standards-based applications interoperate with the full character repertoire of these documents without giving up standards compliance on output.
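One way an application might take this route is to decode incoming windows-1252 text into Unicode and emit UTF-8. The following is a sketch under that assumption, not a prescribed method; the function name `cp1252_to_utf8` is hypothetical.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode windows-1252 bytes as UTF-8 by way of Unicode.

    Raises UnicodeDecodeError for the few byte values that Python's
    cp1252 codec leaves undefined (for example 0x81).
    """
    return data.decode("cp1252").encode("utf-8")


converted = cp1252_to_utf8(b"\x93ok\x94")
print(converted)  # b'\xe2\x80\x9cok\xe2\x80\x9d'
```

The quotation marks survive as ordinary Unicode characters (U+201C and U+201D), so the output needs no knowledge of the Microsoft code page to be read back.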