String (computer science)

A string in computing is a datatype found in many programming languages, most commonly used to represent text. The exact semantics vary from language to language, but at its core a string is a finite ordered sequence of characters (a group of characters "strung" together, hence the name). The term may also refer to sequences of elements other than characters (such as strings of numbers or vectors). In the theory of computation, strings are often called words, and their letters are elements of an arbitrary finite set.

= Representations

A common representation is an array of characters. The length can be stored implicitly, by ending the string with a special terminating character (often NUL), which is the convention used by the programming language C. It can also be stored explicitly, for example by treating the first byte of the string as its length, a convention used in Pascal.

Here is a NUL-terminated string stored in a 10-byte buffer.

x x x x x x x x x x
F R A N K 0 k f f w
x x x x x x x x x x

The above example is how "FRANK" would look in a 10-byte buffer as a NUL-terminated string. The 0 stands for the NUL terminator; the characters after it are leftover buffer contents and do not form part of the string.
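The two length conventions described above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the function names `nul_terminated_length`, `length_prefixed_length` and `make_length_prefixed` are hypothetical, and the length-prefixed layout assumes a single length byte, so it cannot hold more than 255 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Implicit length: scan forward until the NUL terminator is found.
   Anything stored after the terminator is never looked at. */
size_t nul_terminated_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Explicit length: the first byte holds the character count
   (an assumed one-byte prefix, in the style attributed to Pascal). */
size_t length_prefixed_length(const unsigned char *s)
{
    return s[0];
}

/* Hypothetical helper: store `text` in `out` in length-prefixed form.
   `out` must have room for strlen(text) + 1 bytes. */
void make_length_prefixed(unsigned char *out, const char *text)
{
    size_t len = strlen(text);
    assert(len <= 255); /* one length byte counts at most 255 characters */
    out[0] = (unsigned char)len;
    memcpy(out + 1, text, len);
}
```

Given the buffer from the diagram, `char buf[10] = {'F','R','A','N','K','\0','k','f','f','w'};`, calling `nul_terminated_length(buf)` yields 5, because the scan stops at the terminator and ignores the four bytes after it. Reading the length of a length-prefixed string takes one step regardless of its size, while the NUL-terminated scan takes time proportional to the length.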

Other representations are possible. Using a tree or a list makes it easier to insert characters in the middle of the string, because the characters that follow do not have to be shifted.
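As a rough sketch of the list-based idea, the C code below stores one character per node of a singly linked list. The type `char_node` and the functions `insert_after`, `list_from_cstring` and `list_to_cstring` are hypothetical names chosen for illustration, and memory is never freed, to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* One character per node: an assumed layout, chosen for simplicity. */
struct char_node {
    char ch;
    struct char_node *next;
};

/* Insert `ch` directly after `node` without moving any other character.
   Returns the new node, or NULL if allocation fails. */
struct char_node *insert_after(struct char_node *node, char ch)
{
    assert(node != NULL);
    struct char_node *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) {
        return NULL;
    }
    fresh->ch = ch;
    fresh->next = node->next;
    node->next = fresh;
    return fresh;
}

/* Build a list from a NUL-terminated string. Returns NULL for an
   empty string; on allocation failure returns what was built so far. */
struct char_node *list_from_cstring(const char *s)
{
    struct char_node *head = NULL;
    struct char_node *tail = NULL;
    for (; *s != '\0'; s++) {
        struct char_node *fresh = malloc(sizeof *fresh);
        if (fresh == NULL) {
            break;
        }
        fresh->ch = *s;
        fresh->next = NULL;
        if (tail == NULL) {
            head = fresh;
        } else {
            tail->next = fresh;
        }
        tail = fresh;
    }
    return head;
}

/* Copy the list into `out` as a NUL-terminated string,
   writing at most `cap` bytes including the terminator. */
void list_to_cstring(const struct char_node *head, char *out, size_t cap)
{
    size_t i = 0;
    assert(cap > 0);
    for (; head != NULL && i + 1 < cap; head = head->next) {
        out[i++] = head->ch;
    }
    out[i] = '\0';
}
```

For example, building a list from "FRNK" and calling `insert_after` on the node holding 'R' with the character 'A' produces the string "FRANK". The insertion itself only rewires two pointers, but reaching the insertion point still requires walking the list from the head, and each character costs a full node of memory.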

= String Processing

Strings are such a useful datatype that several languages have been designed specifically to make string processing applications easy to write. Examples include:

Many UNIX utilities perform simple string manipulations and can be used to easily program some powerful string processing algorithms. Files and finite streams may be viewed as strings.
