Revision as of 08:51, 4 November 2021 by AnomieBOT (m Dating maintenance tags: {{Citation needed}}) → Revision as of 01:03, 12 November 2021 by 207.151.52.220 (→Character encodings)
Line 24:
== Character encodings ==
Null-terminated strings require that the encoding does not use a zero byte (0x00) anywhere; therefore it is not possible to store every possible [[ASCII]] or [[UTF-8]] string.<ref>{{cite web|title=UTF-8, a transformation format of ISO 10646|url=http://tools.ietf.org/html/rfc3629#section-3|access-date=19 September 2013}}</ref><ref><!-- This is the encoding table provided as a resource by the Unicode consortium: http://www.unicode.org/resources/utf8.html -->{{cite web|title=Unicode/UTF-8-character table|url=http://www.utf8-chartable.de/|access-date=13 September 2013}}</ref><ref>{{cite web|last=Kuhn|first=Markus|title=UTF-8 and Unicode FAQ|url=http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8|access-date=13 September 2013}}</ref> However, it is common to store the subset of ASCII or UTF-8 (every character except NUL) in null-terminated strings. Some systems use "[[modified UTF-8]]", which encodes NUL as two non-zero bytes (0xC0, 0x80) and thus allows all possible strings to be stored. This is not allowed by the UTF-8 standard, because it is an [[UTF-8#Overlong encodings|overlong encoding]], and it is seen as a security risk. A different byte that never occurs in UTF-8, such as 0xFE or 0xFF, may be used as the end-of-string marker instead.

[[UTF-16]] uses 2-byte integers, and since either byte may be zero (and in fact ''every other'' byte is, when representing ASCII text), it cannot be stored in a null-terminated byte string. However, some languages implement a string of 16-bit [[UTF-16]] characters, terminated by a 16-bit NUL.
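
The two workarounds described above can be illustrated with a minimal C sketch. It is not taken from any particular library: the function names <code>mutf8_encode_nuls</code> and <code>utf16_len</code> are hypothetical, and the first function only rewrites embedded zero bytes (it assumes the rest of the input is already valid UTF-8 and does not perform the other conversions that full modified UTF-8 implementations may apply).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes from `src` to `dst`, replacing each
 * 0x00 byte with the overlong pair 0xC0 0x80 used by "modified UTF-8", then
 * append a terminating NUL. Returns the number of bytes written, not counting
 * the terminator, or (size_t)-1 if `dst` (capacity `cap`) is too small. */
size_t mutf8_encode_nuls(const unsigned char *src, size_t len,
                         unsigned char *dst, size_t cap)
{
    size_t out = 0;

    if (cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < len; i++) {
        size_t need = (src[i] == 0x00) ? 2 : 1;

        /* Always keep one byte free for the terminator. */
        if (out + need >= cap)
            return (size_t)-1;

        if (src[i] == 0x00) {
            dst[out++] = 0xC0;
            dst[out++] = 0x80;
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = 0x00;
    return out;
}

/* Hypothetical helper: length, in 16-bit code units, of a string of UTF-16
 * code units terminated by a 16-bit NUL (0x0000). */
size_t utf16_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

For example, the three input bytes <code>'a', 0x00, 'b'</code> would be stored as the four bytes <code>'a', 0xC0, 0x80, 'b'</code> followed by the terminator, so <code>strlen</code> sees the whole string instead of stopping after <code>'a'</code>. Likewise, <code>utf16_len</code> counts whole 16-bit units, so the zero high bytes of ASCII characters do not end the string early.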

Null-terminated string: Difference between revisions