Array (data structure)

This is an old revision of this page, as edited by Jumbuck (talk | contribs) at 12:31, 23 November 2004 (robot Adding:it). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer programming, an array, also known as a vector or list, is one of the simplest data structures. Arrays hold a fixed number of equally-sized data elements, generally of the same data type. Individual elements are accessed by index using a consecutive range of integers, as opposed to an associative array. Some arrays are multi-dimensional, meaning they are indexed by a fixed number of integers, for example by a tuple of two integers; one- and two-dimensional arrays are the most common.

Most programming languages have arrays as a built-in data type. Some programming languages (such as APL and J) generalize the available operations and functions to work transparently over arrays as well as scalars, providing a higher-level manipulation than most other languages, which require loops over all the individual members of the arrays.

Advantages and disadvantages

Arrays permit efficient (constant time, O(1)) random access but not efficient insertion and deletion of elements (which are O(n), where n is the size of the array). Linked lists have the opposite trade-off. Consequently, arrays are most appropriate for storing a fixed amount of data which will be accessed in an unpredictable fashion, and linked lists are best for a list of data which will be accessed sequentially and updated often with insertions or deletions.

Another advantage of arrays that has become very important on modern architectures is that iterating through an array has good locality of reference, and so is much faster than iterating through (say) a linked list of the same size, which tends to jump around in memory. However, an array can also be accessed in a random way, as is done with large hash tables, and in this case this is not a benefit.

Arrays also are among the most compact data structures; storing 100 integers in an array takes only 100 times the space required to store an integer, plus perhaps a few bytes of overhead for the whole array. Any pointer-based data structure, on the other hand, must keep its pointers somewhere, and these occupy additional space. This extra space becomes more significant as the data elements become smaller. For example, an array of characters usually takes up one byte per character, while on a 32-bit platform, which has 4-byte pointers, an aligned doubly-linked list takes up nine bytes per character. Conversely, for very large elements, the space difference becomes a negligible fraction of the total space.

A drawback of the simplicity of arrays is the possibility of referencing a non-existent element by using an index outside the range of valid indices. This is known as exceeding the array bounds. The result is, at best, a program working with incorrect data. At worst, the whole system can crash (it is important to note that in cases where data integrity is critical, such as online banking, a crash may be preferable to working with incorrect data). This problem is particularly felt in the C programming language, the powerful syntax of which is unfortunately prone to this kind of error. Some other languages as Java have built-in bounds checking, and will refuse to index an array outside of its permitted range - although there may be alternative access functions which bypass the checks. Other languages, such as C++, offer facilities to create a data type that functions like a standard array with bounds checking, or offer such a data structure in their standard library. The advantage of providing both checked and unchecked versions, whichever way round the problem is approached, is that you can choose the unchecked version in the rare applications where bounds checking would impose an unreasonable performance penalty, without sacrificing safety in cases where performance is less critical.

Uses

Although useful in their own right, arrays also form the basis for several more complex data structures, such as heaps, hash tables, and VLists, and can be used to represent strings, stacks and queues. They also play a more minor role in many other data structures. All of these applications benefit from the compactness and locality of arrays.

One of the disadvantages of an array is that it has a single fixed size, and although its size can be altered in many environments, this is an expensive operation. Growable arrays or dynamic arrays are arrays which automatically perform this resizing as late as possible, when the programmer attempts to add an element to the end of the array and there is no more space. To average the high cost of resizing over a long period of time (we say it is an amortized cost), they expand by a large amount, and when the programmer attempts to expand the array again, it just uses more of this reserved space.

In the C programming language, one-dimensional character arrays are used to store null-terminated strings, so called because the end of the string is indicated with a special reserved character called a null character ('\0') (see also C string).

Finally, in some applications where the data are the same or are missing for most values of the indexes, or for large ranges of indexes, space is saved by not storing an array at all, but having an associative array with integer keys. There are many specialized data structures specifically for this purpose, such as Patricia tries and Judy arrays. Example applications include address translation tables and routing tables.

Multi-dimensional arrays

Ordinary arrays are indexed by a single integer. Also useful, particularly in numerical and graphics applications, is the concept of a multi-dimensional array, in which we index into the array using an ordered list of integers, such as in a[3,4,5]. The number of integers in the list used to index into it is always the same and is referred to as the array's dimensionality, and the bounds on each of these are called the array's dimensions. An array with dimensionality k is often called k-dimensional. One-dimensional arrays correspond to the simple arrays discussed thus far; two-dimensional arrays are a particularly common representation for matrices. In practice, the dimensionality of an array rarely exceeds three.

Mapping a one-dimensional array into memory is obvious, since memory is logically itself a (very large) one-dimensional array. When we reach higher-dimensional arrays, however, the problem is no longer obvious. Suppose we want to represent this simple two-dimensional array:

 

A few common representations include:

  • Row-major order. Used most notably by statically-declared arrays in C. The elements of each row are stored in order.
1 2 3 4 5 6 7 8 9
  • Column-major order. Used most notably in Fortran. The elements of each column are stored in order.
1 4 7 2 5 8 3 6 9
  • Arrays of arrays. Multi-dimensional arrays are represented by one-dimensional arrays of pointers to other one-dimensional arrays. The subarrays can be either the rows or columns.

A two-dimensional array stored as a one-dimensional array of one-dimensional arrays.

The first two forms are more compact and have potentially better locality of reference, but are also more limiting; the arrays must be rectangular, meaning that no row can contain more elements than any other. Arrays of arrays, on the other hand, allow the creation of ragged arrays, in which the valid range of one index depends on the value of another, or in this case, simply that different rows can be different sizes. Arrays of arrays are also of value in programming languages that only supply one-dimensional arrays as primitives.

In many applications, such as numerical applications working with matrices, we iterate over rectangular two-dimensional arrays, especially two-dimensional arrays, in interesting ways. For example, computing an element of the matrix product AB involves iterating over a row of A and a column of B simultaneously. In mapping the individual array indexes into memory, we wish to exploit locality of reference as much as we can. A compiler can sometimes automatically choose the layout for an array so that sequentially accessed elements are stored sequentially in memory; in our example, it might choose row-major order for A, and column-major order for B. Even more exotic orderings can be used, for example if we iterate over the main diagonal of a matrix.

See also