{{Short description|Data structure}}
{{Redirect|CString||C string (disambiguation)}}
{{see also|String (computer science)#Null-terminated}}
In [[computer programming]], a '''null-terminated string''' is a [[character string]] stored as an [[Array data structure|array]] containing the characters and terminated with a [[null character]] (<code>'\0'</code>, called NUL in [[ASCII]]). Alternative names are '''[[C string]]''', which refers to the [[C (programming language)|C programming language]], and '''ASCIIZ''' (although C strings do not imply the use of ASCII).
The length of a C string is found by searching for the (first) NUL byte. This can be slow, as it takes O(''n'') ([[linear time]]) with respect to the string length. It also means that a NUL cannot be inside the string, as the only NUL is the one marking the end.
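The linear scan can be illustrated with a minimal sketch. The function name <code>scan_length</code> is hypothetical; the standard library function <code>strlen</code> performs the same task, typically with a more optimized implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration: counts the bytes that precede
   the first NUL, which is what the standard strlen() reports. */
size_t scan_length(const char *s)
{
    assert(s != NULL);          /* s must point to a NUL-terminated array */
    size_t n = 0;
    while (s[n] != '\0') {      /* one comparison per character: O(n) */
        n++;
    }
    return n;
}
```

Given the literal <code>"ab\0cd"</code>, this function returns 2, because the scan stops at the first NUL and the bytes after it are never examined.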
== History ==
Null-terminated strings were produced by the <code>.ASCIZ</code> directive of the [[PDP-11]] [[assembly language]]s and the <code>ASCIZ</code> directive of the [[MACRO-10]] macro assembly language for the [[PDP-10]]. These predate the development of the C programming language, but other forms of strings were often used.
At the time C (and the languages that it was derived from) was developed, memory was extremely limited, so using only one byte of overhead to store the length of a string was attractive. The only popular alternative at that time,
This had some influence on CPU [[instruction set]] design. Some CPUs in the 1970s and 1980s, such as the [[Zilog Z80]] and the [[Digital Equipment Corporation|DEC]] [[VAX]], had dedicated instructions for handling length-prefixed strings. However, as the NUL-terminated string gained traction, CPU designers began to take it into account, as seen for example in IBM's decision to add the "Logical String Assist" instructions to the [[IBM ES/9000 family|ES/9000]] 520 in 1992.
[[FreeBSD]] developer [[Poul-Henning Kamp]], writing in ''[[ACM Queue]]'', would later refer to the victory of null-terminated strings over a 2-byte (not one-byte) length as "the most expensive one-byte mistake" ever.<ref>{{citation |last=Kamp |first=Poul-Henning |date=25 July 2011 |title=The Most Expensive One-byte Mistake |journal=ACM Queue |volume=9 |number=7 |issn=1542-7730 |accessdate=2 August 2011 |url=http://queue.acm.org/detail.cfm?id=2010365 }}</ref>
== Limitations ==
While simple to implement, this representation has been prone to errors and performance problems.
The inability to store a
The speed problems with finding the length can usually be mitigated by combining it with another operation that is O(''n'') anyway, such as in <code>[[strlcpy]]</code>. However, this does not always result in an intuitive [[API]].
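As a rough sketch of this idea, a copy routine can report the source length as a by-product of the pass it already makes over the data. The name <code>copy_and_measure</code> is hypothetical; its behaviour is modelled on the documented semantics of <code>strlcpy</code> (copy at most <code>size - 1</code> bytes, always terminate when <code>size</code> is non-zero, return the length of the source), but it is not the implementation used by any particular library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modelled on strlcpy(): copies src into a buffer of
   `size` bytes, truncating if necessary, and returns the full length of
   src so the caller can detect truncation without a second scan. */
size_t copy_and_measure(char *dst, const char *src, size_t size)
{
    assert(src != NULL);
    assert(dst != NULL || size == 0);

    size_t i = 0;
    /* Copy while there is room, leaving one byte for the terminator. */
    for (; src[i] != '\0' && i + 1 < size; i++) {
        dst[i] = src[i];
    }
    if (size > 0) {
        dst[i] = '\0';
    }
    /* If the copy stopped early, keep scanning to finish the measurement. */
    while (src[i] != '\0') {
        i++;
    }
    return i;
}
```

A return value greater than or equal to <code>size</code> indicates that the output was truncated. Callers must remember to perform this comparison, which is one way such an interface can be less intuitive than it first appears.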
== Character encodings ==
Null-terminated strings require that the encoding does not use a zero byte (0x00) anywhere
[[UTF-16]] uses 2-byte integers, and because either byte may be zero (in fact ''every other'' byte is zero when representing ASCII text), it cannot be stored in a null-terminated byte string. However, some languages implement a string of 16-bit [[UTF-16]] characters, terminated by a 16-bit NUL
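The difference between a byte-wise scan and a scan over 16-bit units can be sketched as follows. Both function names are hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts bytes before the first zero byte,
   as a byte-oriented string function would. */
size_t byte_length(const unsigned char *p)
{
    assert(p != NULL);
    size_t n = 0;
    while (p[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: counts 16-bit code units before a 16-bit zero. */
size_t utf16_unit_length(const uint16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

For the text "Hi" encoded as the little-endian UTF-16 bytes <code>48 00 69 00 00 00</code>, <code>byte_length</code> stops after a single byte, because the high byte of the first code unit is zero. Scanning the same text as the code units <code>0x0048, 0x0069, 0x0000</code> with <code>utf16_unit_length</code> yields 2.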
== Improvements ==
Many attempts to make C string handling less error prone have been made. One strategy is to add safer functions such as <code>[[strdup]]</code> and <code>[[strlcpy]]</code>, whilst [[C standard library#Buffer overflow vulnerabilities
Most modern libraries replace C strings with a structure containing a 32-bit or larger length value (far more than were ever considered for length-prefixed strings), and often add another pointer, a reference count, and even a NUL to speed up conversion back to a C string
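A minimal sketch of such a structure is shown below. The type <code>counted_string</code> and its functions are hypothetical and do not reproduce the layout of any specific library; the optional extra pointer and reference count are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type for illustration only. */
struct counted_string {
    size_t length;  /* number of stored bytes; embedded NULs are allowed */
    char  *data;    /* `length` bytes followed by one extra NUL */
};

/* Copies `length` bytes into a new buffer and appends a NUL.
   Returns 0 on success, -1 if the allocation is not possible. */
int counted_string_init(struct counted_string *cs,
                        const char *bytes, size_t length)
{
    assert(cs != NULL);
    assert(bytes != NULL || length == 0);
    if (length == SIZE_MAX) {
        return -1;              /* length + 1 would overflow */
    }
    cs->data = malloc(length + 1);
    if (cs->data == NULL) {
        return -1;
    }
    if (length > 0) {
        memcpy(cs->data, bytes, length);
    }
    cs->data[length] = '\0';    /* the extra NUL mentioned above */
    cs->length = length;
    return 0;
}

/* Conversion to a C string needs no copy because of the trailing NUL. */
const char *counted_string_c_str(const struct counted_string *cs)
{
    assert(cs != NULL);
    return cs->data;
}

void counted_string_free(struct counted_string *cs)
{
    assert(cs != NULL);
    free(cs->data);
    cs->data = NULL;
    cs->length = 0;
}
```

With this layout the length is available in constant time and the content may contain NUL bytes. A function expecting a C string will still stop at the first embedded NUL, so storing the three bytes <code>a</code>, NUL, <code>b</code> gives a <code>length</code> of 3 while <code>strlen</code> on the converted pointer reports 1.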
==See also==
*[[Empty string]]
*[[Sentinel value]]
==References==
{{CProLang}}
{{Data types}}
{{Use dmy dates|date=January 2011}}
[[Category:String data structures]]