Revision as of 21:05, 27 July 2024 by Spitzak (→Processing time)		Revision as of 01:07, 5 August 2024 by Padgriffin (→Compatibility issues: Copyedit extremely confusing sentence)
A [[UTF-8]] file that contains only [[ASCII]] characters is identical to an ASCII file. Legacy programs can generally handle UTF-8 encoded files, even if they contain non-ASCII characters. For instance, the [[C (programming language)|C]] [[printf]] function can print a UTF-8 string because it treats only the ASCII '%' character as the start of a format specifier; all other bytes are copied to the output unchanged. [[UTF-16]] and [[UTF-32]] are incompatible with ASCII, so displaying, printing, or manipulating such files requires [[Unicode]]-aware programs, even when the file is known to contain only characters in the ASCII subset. Because these encodings produce many zero bytes (in UTF-16, for example, every ASCII-subset character is encoded with one zero byte), strings in them cannot be processed by common [[null-terminated string]] handling logic.{{efn|ASCII software that does ''not'' use null characters to terminate strings would handle UTF-16 and UTF-32 encoded files correctly (if such files contain only ASCII-subset characters, they appear as normal ASCII interleaved with [[null character]]s), but such software is not common.{{cn|date=July 2024}}}} Because this logic is so prevalent, UTF-16 text files are uncommon even on UTF-16-based systems such as [[Windows]] and [[Java (software platform)|Java]]. Instead, either older single-byte encodings such as ASCII or [[ISO-8859-1]] are used, forgoing Unicode support entirely, or UTF-8 is used.{{cn|date=July 2024}} One rare counter-example is the "strings" file introduced in [[Mac OS X Panther|Mac OS X 10.3 Panther]], which applications use to look up internationalized versions of messages. By default, this file is encoded in UTF-16, with "files encoded using UTF-8 ... not guaranteed to work."<ref>{{Cite web|url=https://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/StringsFiles.html|title=Apple Developer Connection: Internationalization Programming Topics: Strings Files}}</ref> [[XML]] is [[de facto|conventionally]] encoded as UTF-8,{{cn|date=July 2024}} and all XML processors must at least support UTF-8 and UTF-16.<ref>{{cite web |url=http://www.w3.org/TR/xml/#charencoding |title=Character Encoding in Entities

Comparison of Unicode encodings: Difference between revisions