Data structure alignment: Difference between revisions

Content deleted Content added
No edit summary
Moved links out of summary per the Guide to layout.
Line 1:
{{context}}
 
'''Data Structure Alignment''' is the way [[Data (computing)|data]] is arranged and accessed in [[computer memory]]. It consists of two separate but related issues: ''Data Alignment'' and ''Data Structure Padding''. Data Alignment is the offset of a particular datum in computer memory from boundaries that depend on the datum type and processor characteristics. Aligning data usually refers to allocating memory addresses for data such that each primitive datum is assigned a memory address that is a multiple of its size. Data Structure Padding is the insertion of unnamed members in a data structure to preserve the relative alignment of the other structure members.
 
Although Data Structure Alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. Certain C and C++ implementations and assembly language allow at least partial control of data structure padding, which may be useful in certain special circumstances.
Although many computer languages and computer language implementations handle data alignment automatically, others allow at least partial control of data structure padding. In certain circumstances this might allow a program to use less memory for storing data in return for slower access to it. This compromise may be considered a form of [[space-time tradeoff]].
 
== Definitions ==
A [[computer memory|memory]] address ''a'', is said to be ''n-byte aligned'' when ''n'' is a power of two and ''a'' is a multiple of ''n'' [[byte|bytes]]. In this context a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An ''n''-byte aligned address would have ''log<sub>2</sub> n'' least-significant zeros when expressed in [[Binary numeral system|binary]].
 
A memory access is said to be ''aligned'' when the [[Data (computing)|datum]] being accessed is ''n'' bytes long and the datum address is ''n''-byte aligned. When a memory access is not aligned, it is said to be ''misaligned''. Note that by definition byte memory accesses are always aligned.
 
A memory pointer that refers to primitive data that is ''n'' bytes long is said to be ''aligned'' if it is only allowed to contain addresses that are ''n''-byte aligned, otherwise it is said to be ''unaligned''. A memory pointer that refers to a data aggregate (a data structure or array) is ''aligned'' if (and only if) each primitive datum in the aggregate is aligned.
Line 15:
 
== Problems ==
A computer accesses memory a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses.
 
If the highest and lowest bytes in a datum are not within the same memory word the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a [[translation lookaside buffer|TLB]] miss or a [[page fault]] on any memory access during the instruction execution.
Line 22:
 
==Data Structure Padding==
Although the language translator ([[compiler]] (or [[interpreter (computing)|interpreter]]) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition the data structure as a whole may be padded with a final unnamed member. This allows each member of an entire array of structures to be properly aligned.
 
MembersPadding ofis only inserted when a datastructure member is followed by a member with a larger alignment requirement or at the end of the structure. canBy bechanging arrangedthe toordering of members in a structure it is possible minimizechange the amount of padding required to maintain alignment. For example, if members may beare sorted intoby ascending or descending alignment requirements. Althougha Cminimal andamount C++of dopadding notis allowrequired. theThe compilerminimal toamount doof suchpadding reordering, other languages might. Itrequired is alsoalways possibleless tothan tellthe mostlargest Calignment andin C++the compilersstructure. to "pack"Computing the membersmaximum amount of apadding structurerequired tois amore certaincomplicated, levelbut ofis alignment,always e.g.less "pack(2)"than meanssum alignof datathe membersalignment largerrequirements thanfor aall bytemembers tominus atwice two-bytethe boundarysum soof thatthe anyalignment paddingrequirements membersfor arethe atleast mostaligned onehalf bytethe longstructure members.
 
Although C and C++ do not allow the compiler to reorder structure members to save space, other languages might. It is also possible to tell most C and C++ compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long.
Although use of "packed" structures is most frequently used to conserve memory space, it may also be used to format a data structure for transmission using a standard protocol. Since this depends upon the native byte ordering ([[endinanness]]) for the processor matching the byte ordering of the protocol, this usage is not recommended.
 
One use for such "packed" structures is to conserve memory. For example, a structure containing a single byte and a four-byte integer would require three additional bytes of padding. A large array of such structures would use 37.5% less memory if they are packed, although accessing each structure might take longer. This compromise may be considered a form of [[space-time tradeoff]].
 
Although use of "packed" structures is most frequently used to conserve memory space, it may also be used to format a data structure for transmission using a standard protocol. Since this depends upon the native byte ordering ([[endinannessendianness]]) for the processor matching the byte ordering of the protocol, this usage is not recommended.
 
==Unaligned Pointer Support==