Data structure alignment: Difference between revisions

provide context in introduction
Added detail about Data Structure Padding and revised Problems.
Line 1:
{{context}}
 
'''Data structure alignment''' is the way [[Data (computing)|data]] is arranged and accessed in [[computer memory]]. It consists of two separate but related issues: ''data alignment'' and ''data structure padding''. Although many computer languages and computer language implementations handle data alignment automatically, others allow at least partial control of data structure padding. In certain circumstances this might allow a program to use less memory for storing data in return for slower access to it. This compromise may be considered a form of [[space-time tradeoff]].
 
== Definitions ==
Line 13:
 
== Problems ==
A computer accesses memory a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word.
 
If the highest and lowest bytes in a datum are not within the same memory word, the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory-management pages, the processor must either verify that both pages are present before executing the instruction or be able to handle a [[translation lookaside buffer|TLB]] miss or a [[page fault]] on any memory access during the instruction execution.
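
Whether an access stays within one memory word or one page can be checked with simple address arithmetic. The following C fragment is a minimal sketch only: the constants <code>WORD_SIZE</code> and <code>PAGE_SIZE</code> and the helper names <code>is_aligned</code> and <code>crosses_boundary</code> are assumptions made for illustration and do not describe any particular processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, for illustration only; real hardware varies. */
#define WORD_SIZE 8u     /* bytes per memory word (assumption) */
#define PAGE_SIZE 4096u  /* bytes per memory-management page (assumption) */

/* Returns true if addr is a multiple of align.
   align must be a non-zero power of two. */
static bool is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Returns true if the bytes [addr, addr + size) do not all lie in the
   same block of block_size bytes, i.e. the highest and lowest bytes of
   the datum fall in different words (or pages). size must be at least 1. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t block_size)
{
    assert(size >= 1 && block_size >= 1);
    uintptr_t first_block = addr / block_size;
    uintptr_t last_block = (addr + size - 1) / block_size;
    return first_block != last_block;
}
```

Under these assumed sizes, an eight-byte datum at address 16 lies in a single word, while the same datum at address 18 spans two words, and a four-byte datum at address 4094 spans two pages.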
 
When a single memory word is accessed, the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words: for example, the first word might be read by one device, both words written by another device, and then the second word read by the first device, so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify.
 
Some of the problems caused by misaligned access are:
* Extra logic on the CPU is required to support accesses which are not word-aligned.
* Accesses across [[cache-lines]] may require evicting two cache-lines.
* Accesses across page boundaries can incur two TLB misses and could even require swapping in both pages from disk.

==Data Structure Padding==
Although the language translator (compiler or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment, the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition, the data structure as a whole may be padded with a final unnamed member. This allows each member of an entire array of structures to be properly aligned.
 
Members of a data structure can be arranged to minimize the amount of padding. For example, members may be sorted by ascending or descending alignment requirement. Although C and C++ do not allow the compiler to do such reordering, other languages might. It is also possible to tell most C and C++ compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long.
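
The effect of padding, member ordering and packing can be observed with <code>sizeof</code> and <code>offsetof</code>. The following C fragment is a sketch under stated assumptions: the structure names <code>Mixed</code>, <code>Sorted</code> and <code>Packed2</code> are hypothetical, and <code>#pragma pack</code> is a compiler extension (accepted by common compilers such as GCC, Clang and MSVC) rather than part of standard C. The actual sizes depend on the compiler and target.

```c
#include <stddef.h>

/* Hypothetical structure whose declaration order forces padding:
   the compiler normally inserts unnamed bytes after tag so that
   count is aligned, and after flag so that arrays stay aligned. */
struct Mixed {
    char tag;
    int  count;
    char flag;
};

/* The same members sorted by descending alignment requirement,
   which normally leaves only trailing padding. */
struct Sorted {
    int  count;
    char tag;
    char flag;
};

/* The same members as Mixed, packed to a two-byte boundary.
   #pragma pack is a compiler extension, not standard C. */
#pragma pack(push, 2)
struct Packed2 {
    char tag;
    int  count;
    char flag;
};
#pragma pack(pop)

/* Bytes occupied by the named members alone; the difference between
   sizeof(struct ...) and this value is the padding that was inserted. */
#define MEMBER_BYTES (2 * sizeof(char) + sizeof(int))
```

On a typical implementation with a four-byte <code>int</code> aligned to four bytes, <code>sizeof(struct Mixed)</code> is usually 12 and <code>sizeof(struct Sorted)</code> is usually 8, while in <code>Packed2</code> the member <code>count</code> usually starts at offset 2 instead of offset 4. Other implementations may give different values.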
 
Although "packed" structures are most frequently used to conserve memory space, they may also be used to format a data structure for transmission using a standard protocol. Since this depends upon the native byte ordering ([[endianness]]) of the processor matching the byte ordering of the protocol, this usage is not recommended.
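
One way to avoid depending on the processor's byte ordering is to write each field into the transmission buffer byte by byte in the order the protocol requires. The following C fragment is an illustrative sketch only: the <code>Header</code> structure, its five-byte wire layout and the big-endian ordering are assumptions chosen for the example, not taken from any particular protocol.

```c
#include <stdint.h>

/* Hypothetical message header, used only for illustration. */
struct Header {
    uint8_t  type;
    uint32_t length;
};

/* Assumed wire layout: 1 byte of type, then 4 bytes of length,
   most significant byte first (big-endian). */
#define HEADER_WIRE_SIZE 5u

/* Writes the header into out using explicit shifts, so the result
   does not depend on the host's endianness or structure padding. */
static void header_encode(const struct Header *h, uint8_t out[HEADER_WIRE_SIZE])
{
    out[0] = h->type;
    out[1] = (uint8_t)(h->length >> 24);
    out[2] = (uint8_t)(h->length >> 16);
    out[3] = (uint8_t)(h->length >> 8);
    out[4] = (uint8_t)(h->length);
}

/* Reads a header back from the same assumed wire layout. */
static void header_decode(const uint8_t in[HEADER_WIRE_SIZE], struct Header *h)
{
    h->type = in[0];
    h->length = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16)
              | ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
}
```

Because the bytes are produced by shifts rather than by copying the structure's memory image, the in-memory layout of <code>struct Header</code> may contain any padding the compiler chooses.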
 
==Unaligned Pointer Support==