Revision as of 10:19, 11 October 2024 by AnomieBOT: Dating maintenance tags: {{Clarify}}
Revision as of 15:49, 13 October 2024 by JMF: →History: rewrote to be at least somewhere close to reality. Plenty of room for improvement.
== History ==
Early computer systems had limited storage and restricted the number of [[bit]]s available to encode a [[character (computing)|character]]. Although earlier proprietary encodings used fewer bits, the [[ASCII|American Standard Code for Information Interchange]] (ASCII) settled on seven: enough to encode a minimal subset of the characters used in the US. As eight-bit [[byte]]s came to predominate, Microsoft (and others) expanded their repertoire to 224 characters to cover other uses such as box-drawing symbols. The need to provide [[precomposed character]]s for the Western European and South American markets required a different character set: Microsoft established the principle of code pages, one for each alphabet. For the [[List of writing systems#Segmental script|segmental scripts]] used in most of Africa, the Americas, southern and south-east Asia, the Middle East and Europe, a character needs just one byte, but two or more bytes are needed for the [[ideographic]] sets used in the rest of the world. The code-page model was unable to handle this challenge. Since the late 1990s, software and systems have adopted [[Unicode]] as their preferred character encoding format: Unicode is designed to handle over a million code points.

All current Microsoft products and [[application program interfaces]] use Unicode internally,{{cn|date=October 2020}} but some applications continue to use the default encoding{{clarify|date=October 2024}} of the computer's 'locale' when reading and writing text data to files or standard output.{{cn|date=October 2020}} Therefore, files may still be encountered that display correctly in one part of the world but appear as unintelligible [[mojibake]] in another.

=== UTF-8, UTF-16 ===

Windows code page: Difference between revisions