Revisions of 04:57 and 04:58, 30 April 2020, by 88.219.179.109
== Character encodings ==
Null-terminated strings require that the encoding not use a zero byte (0x00) for any character other than the terminator. Because [[ASCII]] and [[UTF-8]] both encode the NUL character as a single zero byte, it is not possible to store every possible ASCII or UTF-8 string.<ref>{{cite web|title=UTF-8, a transformation format of ISO 10646|url=http://tools.ietf.org/html/rfc3629#section-3|accessdate=19 September 2013}}</ref><ref><!-- This is the encoding table provided as a resource by the Unicode consortium: http://www.unicode.org/resources/utf8.html -->{{cite web|title=Unicode/UTF-8-character table|url=http://www.utf8-chartable.de/|accessdate=13 September 2013}}</ref><ref>{{cite web|last=Kuhn|first=Markus|title=UTF-8 and Unicode FAQ|url=http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8|accessdate=13 September 2013}}</ref> However, it is common to store ASCII or UTF-8 text in null-terminated strings, provided it contains no NUL character. Some systems use "[[modified UTF-8]]", which encodes the NUL character as the two non-zero bytes 0xC0 0x80 and thus allows all possible strings to be stored. The UTF-8 standard forbids this because it is an [[UTF-8#Overlong_encodings|overlong encoding]], and overlong encodings are considered a security risk: different components may interpret the same bytes differently, so a NUL encoded as 0xC0 0x80 might be treated as a string terminator during security validation but as an ordinary character when the string is used. Alternatively, a byte that never occurs in UTF-8, such as 0xFE or 0xFF, may be used as the terminator, although such a terminator is itself an invalid UTF-8 code unit.

[[UTF-16]] uses 2-byte code units, and because either byte of a code unit may be zero (indeed ''every other'' byte is zero when representing ASCII text), UTF-16 text cannot be stored in a null-terminated byte string. However, some languages implement strings of 16-bit UTF-16 code units terminated by a 16-bit NUL. (Again, the NUL character, which is encoded as a single zero code unit, is the only character that cannot be stored, because UTF-16 has no alternative encoding of zero.)
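The following Python sketch illustrates why an embedded NUL truncates a UTF-8 string and how the modified UTF-8 trick avoids that. It emulates a C-style scan with a hypothetical helper `c_strlen`, and models only the NUL rule of modified UTF-8 through the hypothetical helpers `encode_modified_utf8` and `decode_modified_utf8`. Real modified UTF-8 (for example Java's) also encodes supplementary characters differently, so this is an illustration under those assumptions, not a complete implementation.

```python
# Illustrative sketch only: c_strlen, encode_modified_utf8 and
# decode_modified_utf8 are hypothetical helpers, not a library API.
# Only the NUL rule of modified UTF-8 is modelled here.


def c_strlen(buf: bytes) -> int:
    """Bytes before the first 0x00, as a C-style scan would report.

    If no zero byte is present, the end of the buffer is used (real C
    code would keep reading past the buffer instead).
    """
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


def encode_modified_utf8(text: str) -> bytes:
    """Standard UTF-8, except that U+0000 becomes the pair 0xC0 0x80.

    In standard UTF-8 a 0x00 byte occurs only for U+0000, so replacing
    every zero byte replaces exactly the NUL characters.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_modified_utf8(buf: bytes) -> str:
    """Inverse of encode_modified_utf8 for bytes it produced.

    0xC0 never appears in standard UTF-8, so every 0xC0 0x80 pair came
    from an encoded NUL.
    """
    return buf.replace(b"\xc0\x80", b"\x00").decode("utf-8")


text = "a\x00b"
plain = text.encode("utf-8")
modified = encode_modified_utf8(text)

print(c_strlen(plain))     # 1: the scan stops at the embedded NUL
print(c_strlen(modified))  # 4: no zero byte, so the whole string is seen
print(decode_modified_utf8(modified) == text)  # True

# A strict UTF-8 decoder rejects 0xC0 0x80 because it is overlong.
try:
    modified.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict decoder rejected it:", err.reason)
```

The last step shows the interoperability problem in miniature: the modified form survives a zero-byte scan, but a conforming UTF-8 decoder refuses it.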
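A second Python sketch contrasts a byte-oriented NUL scan with a scan for a 16-bit zero code unit, as used by languages that terminate UTF-16 strings with a 16-bit NUL. The helpers `c_strlen` and `utf16_units_before_nul` are hypothetical, and little-endian byte order is assumed for illustration.

```python
# Illustrative sketch only: c_strlen and utf16_units_before_nul are
# hypothetical helpers. Little-endian UTF-16 is assumed.


def c_strlen(buf: bytes) -> int:
    """Bytes before the first 0x00 byte (end of buffer if none)."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


def utf16_units_before_nul(buf: bytes) -> int:
    """Count 16-bit code units before the first all-zero unit.

    A code unit with only one zero byte (e.g. U+0100, bytes 00 01)
    does not terminate the string. A trailing odd byte is ignored.
    """
    count = 0
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return count
        count += 1
    return count


terminator = b"\x00\x00"  # one 16-bit NUL code unit
ascii_text = "AB".encode("utf-16-le") + terminator  # 41 00 42 00 00 00

print(c_strlen(ascii_text))                # 1: byte scan stops inside 'A'
print(utf16_units_before_nul(ascii_text))  # 2: both code units are found

# U+0100 has a zero byte but a non-zero code unit, so it is kept.
print(utf16_units_before_nul("\u0100x".encode("utf-16-le") + terminator))  # 2

# An embedded NUL character is still cut off, as with byte strings.
print(utf16_units_before_nul("A\x00B".encode("utf-16-le") + terminator))  # 1
```

The count is in code units, not characters: a character outside the Basic Multilingual Plane occupies two code units (a surrogate pair), neither of which is zero.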

Null-terminated string