Windows code page: Difference between revisions

m Note that the issue applies to Java too
m Removed mention, it's not very relevant to the article. Left note at :Talk
Line 1:
Recent [[Microsoft]] products and APIs use [[Unicode]] internally, but many applications and APIs (including [[Java]]) continue to use the default [[character encodings|encoding]] of the computer's ''locale'' when reading and writing text data to files or standard output.
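
The locale dependence described above can be observed from a scripting language. The following is a minimal sketch in Python, not taken from any Microsoft documentation; the helper name <code>write_and_read_back</code> is hypothetical. It reports the locale's preferred encoding and shows that passing an explicit encoding makes the bytes written independent of the machine's settings.

```python
import locale
import os
import tempfile


def write_and_read_back(text, encoding=None):
    """Write text to a temporary file and return the raw bytes on disk.

    With encoding=None, many Python versions fall back to a
    locale-dependent default, which is the behaviour described above.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="" disables newline translation so only the encoding varies.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# The encoding the current locale prefers (for example "cp1252" on a
# Western European Windows installation; the value varies by machine).
default_encoding = locale.getpreferredencoding(False)

# An explicit encoding gives the same bytes on every machine.
utf8_bytes = write_and_read_back("é", encoding="utf-8")

print("locale default:", default_encoding)
print("explicit UTF-8:", utf8_bytes)
```

Naming the encoding explicitly whenever text crosses a file or stream boundary is one common way to avoid depending on the locale default.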
 
Initially, computer systems and system programming languages did not distinguish between characters and bytes, which has led to much subsequent confusion. Microsoft software and systems from the 1980s tend to use the IBM-derived OEM '''code pages'''. Software and systems from the 1990s (pre-Unicode [[Microsoft Windows]] applications) tend to use extended versions of the national or international standard sets, the so-called ANSI code pages. Since the late 1990s, software and systems have increasingly adopted more direct encodings of [[Unicode]], in particular UTF-8 and UTF-16; this trend has been reinforced by the widespread adoption of [[XML]], which provides a more adequate mechanism for labelling the encoding used.
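
The distinction between bytes and characters can be illustrated with a short sketch. It assumes Python's bundled codecs <code>cp437</code> (an IBM-derived OEM code page) and <code>cp1252</code> (a Western European ANSI code page) as representatives of the two families; the variable names are illustrative only.

```python
# A single byte has no fixed meaning until a code page is chosen.
raw = b"\xe9"

as_oem = raw.decode("cp437")    # IBM-derived OEM code page
as_ansi = raw.decode("cp1252")  # Windows "ANSI" code page

# Conversely, one character maps to different bytes in each encoding.
char = "é"
in_oem = char.encode("cp437")
in_ansi = char.encode("cp1252")
in_utf8 = char.encode("utf-8")
in_utf16 = char.encode("utf-16-le")

print(repr(as_oem), repr(as_ansi))        # 'Θ' 'é'
print(in_oem, in_ansi, in_utf8, in_utf16)
```

Text written under one code page and read under another therefore produces wrong characters without raising any error, which is why an explicit encoding label, as in XML, is useful.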
 
The [[original equipment manufacturer|OEM]] code pages are used in console windows and can be considered a holdover from [[DOS]] and the original [[IBM PC]] architecture. The [[ANSI]] code pages are used for non-[[Unicode]] applications using the Windows [[graphical user interface|GUI]]. Two single-byte, fixed-width code pages (874 for [[Thai language|Thai]] and 1258 for [[Vietnamese language|Vietnamese]]) and four multibyte [[CJK]] code pages ([[Shift JIS|932]], [[GBK|936]], [[Hangul|949]], [[Big5|950]]) are used as both ANSI and OEM code pages. Both ANSI and OEM code pages are [[extended ASCII]] code pages.
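
As a rough illustration of the code pages listed above, the following sketch uses Python's codecs of the same numbers, which are assumed here to match the Windows code pages closely enough for the purpose. It checks that ASCII text is encoded identically in all six, and that a sample character takes one byte in the single-byte Thai code page and two bytes in each multibyte CJK code page. The sample characters are arbitrary choices.

```python
# Code pages used as both ANSI and OEM code pages, per the text above.
ANSI_AND_OEM = ["cp874", "cp1258", "cp932", "cp936", "cp949", "cp950"]

# One arbitrary sample character for each script-specific code page.
SAMPLES = {
    "cp874": "ก",  # Thai
    "cp932": "あ",  # Japanese
    "cp936": "中",  # Simplified Chinese
    "cp949": "한",  # Korean
    "cp950": "中",  # Traditional Chinese
}


def encoded_length(ch, code_page):
    """Return the number of bytes one character occupies in a code page."""
    return len(ch.encode(code_page))


# Extended ASCII: the ASCII range is encoded the same way everywhere.
ascii_preserved = all("A".encode(cp) == b"A" for cp in ANSI_AND_OEM)

lengths = {cp: encoded_length(ch, cp) for cp, ch in SAMPLES.items()}

print("ASCII preserved:", ascii_preserved)
for cp, n in lengths.items():
    print(cp, n, "byte(s)")
```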