Revision as of 10:21, 12 November 2023 by Onel5969 (minor edit; summary: Disambiguating links to C string (link changed to C string handling) using DisamAssist.)
Revision as of 08:18, 10 January 2025 by Citation bot (summary: Add: date, authors 1-1. Removed URL that duplicated identifier. Suggested by Dominic3203. Category:String data structures. #UCB_Category 1/16)
Line 15:

This had some influence on CPU [[instruction set]] design. Some CPUs in the 1970s and 1980s, such as the [[Zilog Z80]] and the [[Digital Equipment Corporation\|DEC]] [[VAX]], had dedicated instructions for handling length-prefixed strings. However, as the null-terminated string gained traction, CPU designers began to take it into account, as seen for example in IBM's decision to add the "Logical String Assist" instructions to the [[IBM ES/9000 family\|ES/9000]] 520 in 1992 and the vector string instructions to the [[IBM z13 (microprocessor)\|IBM z13]] in 2015.<ref name=pop>[http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf IBM z/Architecture Principles of Operation]</ref>

[[FreeBSD]] developer [[Poul-Henning Kamp]], writing in ''[[ACM Queue]]'', referred to the victory of null-terminated strings over strings prefixed with a 2-byte (not one-byte) length as "the most expensive one-byte mistake" ever.<ref>{{citation \|last=Kamp \|first=Poul-Henning \|date=25 July 2011 \|title=The Most Expensive One-byte Mistake \|journal=ACM Queue \|volume=9 \|number=7 \|pages=40–43 \|doi=10.1145/2001562.2010365 \|s2cid=30282393 \|issn=1542-7730~~\|url=http://queue.acm.org/detail.cfm?id=2010365~~ \|doi-access=free }}</ref>

== Limitations ==

Line 27:

== Character encodings ==

Null-terminated strings require that the encoding does not use a zero byte (0x00) anywhere; therefore it is not possible to store every possible [[ASCII]] or [[UTF-8]] string.<ref>{{cite web\|title=UTF-8, a transformation format of ISO 10646\|date=November 2003 \|url=http://tools.ietf.org/html/rfc3629#section-3\|access-date=19 September 2013 \|last1=Yergeau \|first1=François }}</ref><ref><!-- This is the encoding table provided as a resource by the Unicode consortium: http://www.unicode.org/resources/utf8.html -->{{cite web\|title=Unicode/UTF-8-character table\|url=http://www.utf8-chartable.de/\|access-date=13 September 2013}}</ref><ref>{{cite web\|last=Kuhn\|first=Markus\|title=UTF-8 and Unicode FAQ\|url=http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8\|access-date=13 September 2013}}</ref> However, it is common to store the subset of ASCII or UTF-8 – every character except NUL – in null-terminated strings. Some systems use "[[modified UTF-8]]", which encodes NUL as two non-zero bytes (0xC0, 0x80) and thus allows all possible strings to be stored. This is not allowed by the UTF-8 standard, because it is an [[UTF-8#Overlong encodings\|overlong encoding]], and it is seen as a security risk. Alternatively, a byte that never occurs in UTF-8, such as 0xFE or 0xFF, may be used as the end-of-string marker.

[[UTF-16]] uses 2-byte integers, and because either byte may be zero (and in fact ''every other'' byte is, when representing ASCII text), UTF-16 text cannot be stored in a null-terminated byte string. However, some languages implement a string of 16-bit [[UTF-16]] characters, terminated by a 16-bit NUL (0x0000).
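
The two workarounds described above can be illustrated with a minimal C sketch. The function names <code>encode_modified_utf8</code> and <code>utf16_length</code> are hypothetical and are not taken from any library. The first helper covers only the NUL substitution; other conventions that some "modified UTF-8" implementations apply are outside its scope. The second helper assumes the string is held as an array of 16-bit code units in native byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `len` bytes of UTF-8 text to `out`, replacing
 * each 0x00 byte with the overlong two-byte sequence 0xC0 0x80, then append
 * a terminating 0x00. `out` must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t encode_modified_utf8(const unsigned char *in, size_t len,
                            unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0x00) {
            out[n++] = 0xC0;   /* embedded NUL becomes two non-zero bytes */
            out[n++] = 0x80;
        } else {
            out[n++] = in[i];  /* every other byte is copied unchanged */
        }
    }
    out[n] = 0x00;             /* the only zero byte is now the terminator */
    return n;
}

/* Hypothetical helper: count the 16-bit code units before the 16-bit
 * terminator 0x0000. Single zero bytes inside a code unit do not end
 * the string, because the comparison is made on whole code units. */
size_t utf16_length(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0x0000) {
        n++;
    }
    return n;
}
```

Under these assumptions, encoding the three bytes <code>'a', 0x00, 'b'</code> yields the four bytes <code>'a', 0xC0, 0x80, 'b'</code> followed by the terminator, so byte-oriented functions such as <code>strlen</code> see the whole string. Calling <code>strlen</code> on UTF-16 text, in contrast, would stop at the first zero byte, which is why a code-unit loop like <code>utf16_length</code> is needed.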

Null-terminated string: Difference between revisions