Windows code page: Difference between revisions

History: rewrote to be at least somewhere close to reality. Plenty of room for improvement.
History: tweak for clarity
Line 59:
 
== History ==
Early computer systems had limited storage and restricted the number of [[bit]]s available to encode a [[character (computing)|character]]. Although earlier proprietary encodings used fewer bits, the [[ASCII|American Standard Code for Information Interchange]] (ASCII) settled on seven bits: this was sufficient to encode a 96-member subset of the characters used in the US. As eight-bit [[byte]]s came to predominate, Microsoft (and others) expanded the repertoire to 224, to handle a variety of other uses such as box-drawing symbols. The need to provide [[precomposed character]]s for the Western European and South American markets required a different character set: Microsoft established the principle of code pages, one for each alphabet. For the [[List of writing systems#Segmental script|segmental scripts]] used in most of Africa, the Americas, southern and south-east Asia, the Middle East and Europe, a character needs just one byte, but two or more bytes are needed for the [[ideographic]] sets used in the rest of the world. The code-page model was unable to handle this challenge.
 
Since the late 1990s, software and systems have adopted [[Unicode]] as their preferred character encoding format: Unicode is designed to encode more than a million characters. All current Microsoft products and [[application program interfaces]] use Unicode internally,{{cn|date=October 2020}} but some applications continue to use the default encoding{{clarify|date=October 2024}} of the computer's 'locale' when reading and writing text data to files or standard output.{{cn|date=October 2020}} Therefore, files may still be encountered that are legible and intelligible in one part of the world but appear as unintelligible [[mojibake]] in another.
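
The effect described above can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that the text was written on a system whose default code page is Windows-1251 (Cyrillic) and read on one whose default is Windows-1252 (Western European); the sample word and the variable names are not taken from any Microsoft documentation.

```python
# Illustrative assumption: the writer's locale uses Windows-1251,
# the reader's locale uses Windows-1252.
text = "Привет"  # sample Russian word ("Hello")

# The writing application stores one byte per character using its code page.
raw = text.encode("cp1251")

# The reading application interprets the same bytes with a different code page.
garbled = raw.decode("cp1252")

print(len(raw))   # number of bytes stored, one per character
print(garbled)    # the same bytes shown as Western European letters

# Decoding with the code page that was used for writing recovers the text.
restored = raw.decode("cp1251")
print(restored == text)
```

The bytes in the file are identical in both cases; only the code page chosen by the reader differs, which is why the byte sequence alone does not reveal which interpretation is correct.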