Johnmorgan (talk | contribs) No edit summary |
Johnmorgan (talk | contribs) Moved links out of summary per the Guide to layout. |
{{context}}
'''Data Structure Alignment''' is the way
Although Data Structure Alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. Certain C and C++ implementations and assembly language allow at least partial control of data structure padding, which may be useful in certain special circumstances.
== Definitions ==
A [[computer memory|memory]] address ''a'' is said to be ''n''-byte aligned when ''n'' is a power of two and ''a'' is a multiple of ''n'' [[byte|bytes]]. In this context a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An ''n''-byte aligned address has at least log<sub>2</sub>(''n'') least-significant zero bits when expressed in [[Binary numeral system|binary]].
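The definition above can be checked with a single bit mask. The following is a minimal sketch in C; the helper names <code>is_power_of_two</code> and <code>is_aligned</code> are illustrative and not taken from any library.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n is a nonzero power of two. */
static bool is_power_of_two(uintptr_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* True when address a is n-byte aligned.
   For a power-of-two n, the low log2(n) bits of an aligned address are
   all zero, so masking with n - 1 tests exactly those bits. */
static bool is_aligned(uintptr_t a, uintptr_t n)
{
    return is_power_of_two(n) && (a & (n - 1)) == 0;
}
```

For instance, address 0x1004 is 4-byte aligned but not 8-byte aligned, and every address is 1-byte aligned.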
A memory access is said to be ''aligned'' when the [[Data (computing)|datum]] being accessed is ''n'' bytes long and the datum address is ''n''-byte aligned. When a memory access is not aligned, it is said to be ''misaligned''. Note that by definition byte memory accesses are always aligned.
A memory pointer that refers to primitive data that is ''n'' bytes long is said to be ''aligned'' if it is only allowed to contain addresses that are ''n''-byte aligned; otherwise it is said to be ''unaligned''. A memory pointer that refers to a data aggregate (a data structure or array) is ''aligned'' if (and only if) each primitive datum in the aggregate is aligned.
== Problems ==
A computer accesses memory a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses.
If the highest and lowest bytes in a datum are not within the same memory word, the computer must split the datum access into multiple memory accesses. This requires complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages, the processor must either verify that both pages are present before executing the instruction or be able to handle a [[translation lookaside buffer|TLB]] miss or a [[page fault]] on any memory access during the instruction execution.
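Whether an access must be split can be decided by comparing the word that holds the datum's lowest byte with the word that holds its highest byte. The sketch below assumes a hypothetical helper <code>crosses_word</code> and a caller-supplied word size; passing a page size instead of a word size gives the page-crossing case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when a datum of `size` bytes starting at `addr` touches more than
   one block of `word_size` bytes. Assumes size >= 1 and word_size >= 1. */
static bool crosses_word(uintptr_t addr, size_t size, size_t word_size)
{
    uintptr_t first = addr / word_size;              /* block of lowest byte  */
    uintptr_t last  = (addr + size - 1) / word_size; /* block of highest byte */
    return first != last;
}
```

With a 4-byte word, a 4-byte datum at address 0x1002 spans two words, while the same datum at 0x1004 stays within one.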
==Data Structure Padding==
Although the
Although C and C++ do not allow the compiler to reorder structure members to save space, other languages might. It is also possible to tell most C and C++ compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long.
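As an illustration of "pack(2)", the sketch below uses the <code>#pragma pack(push, 2)</code> form accepted by GCC, Clang and MSVC; other compilers may use a different syntax or none at all. The structure names are hypothetical, and the sizes in the comments assume a <code>uint32_t</code> that is naturally four-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads after `tag` so that `value` is
   aligned (typically offset 4, total size 8). */
struct natural {
    uint8_t  tag;
    uint32_t value;
};

/* pack(2): members larger than a byte are aligned to two bytes only, so
   one padding byte follows `tag` (offset 2, total size 6). */
#pragma pack(push, 2)
struct packed2 {
    uint8_t  tag;
    uint32_t value;
};
#pragma pack(pop)
```

Applying <code>offsetof</code> and <code>sizeof</code> to both structures shows the difference on a given compiler.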
Although "packed" structures are most frequently used to conserve memory space, they may also be used to format a data structure for transmission using a standard protocol. Since this depends upon the native byte ordering ([[endianness]]) of the processor matching the byte ordering of the protocol, this usage is not recommended.
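One common way to avoid the byte-order dependency is to write each field into the output buffer byte by byte. The sketch below is only an illustration: the <code>message</code> layout and the choice of big-endian ("network") order are assumptions, not part of any particular protocol.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message: a one-byte type followed by a 32-bit length. */
struct message {
    uint8_t  type;
    uint32_t length;
};

enum { MESSAGE_WIRE_SIZE = 5 };

/* Encode in big-endian order regardless of the host's byte order or of
   any padding the compiler inserted into struct message. */
static void message_encode(const struct message *m,
                           uint8_t out[MESSAGE_WIRE_SIZE])
{
    out[0] = m->type;
    out[1] = (uint8_t)(m->length >> 24);
    out[2] = (uint8_t)(m->length >> 16);
    out[3] = (uint8_t)(m->length >> 8);
    out[4] = (uint8_t)(m->length);
}

/* Decode the same five-byte wire format back into a structure. */
static struct message message_decode(const uint8_t in[MESSAGE_WIRE_SIZE])
{
    struct message m;
    m.type   = in[0];
    m.length = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
               ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
    return m;
}
```

Because only shifts and byte stores are used, the wire format is the same on little-endian and big-endian processors, and the in-memory structure does not need to be packed.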
One use for such "packed" structures is to conserve memory. For example, a structure containing a single byte and a four-byte integer would require three additional bytes of padding when the integer is four-byte aligned, giving 8 bytes in total against 5 bytes when packed. A large array of such structures would use 37.5% less memory if they were packed, although accessing each structure might take longer. This compromise may be considered a form of [[space-time tradeoff]].
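The 37.5% figure follows from (8 − 5) / 8. A minimal sketch of the arithmetic, assuming the <code>#pragma pack</code> syntax of GCC, Clang and MSVC and hypothetical structure names:

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: typically 1 + 3 (padding) + 4 = 8 bytes. */
struct padded {
    uint8_t  flag;
    uint32_t count;
};

/* Byte-packed layout: 1 + 4 = 5 bytes, no padding. */
#pragma pack(push, 1)
struct packed1 {
    uint8_t  flag;
    uint32_t count;
};
#pragma pack(pop)

/* Percentage of memory saved by the packed layout. */
static double packing_savings_percent(size_t padded_size, size_t packed_size)
{
    return 100.0 * (double)(padded_size - packed_size) / (double)padded_size;
}
```

Calling <code>packing_savings_percent(sizeof(struct padded), sizeof(struct packed1))</code> yields 37.5 when the padded structure occupies 8 bytes.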
==Unaligned Pointer Support==