Data structure alignment: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 20:34, 9 September 2020 edit Pgan002 (talk \| contribs) Extended confirmed users 15,367 edits mNo edit summary Tag: Visual edit ← Previous edit		Latest revision as of 04:18, 21 August 2025 edit undo 67.164.20.69 (talk) typo
(47 intermediate revisions by 35 users not shown)
Line 1: {{Short description\|Way in which data is arranged and accessed in computer memory}} {{Redirect\|Memory layout\|broader usage\|memory management}} {{multiple issues\| {{original research\|date=January 2018}} Line 5 ⟶ 7: {{anchor\|1\|2\|4\|8\|16\|256\|4096\|Page\|Inpage}}<!-- parked anchors for common alignments/boundaries to improve incoming redirects --> {{Use dmy dates\|date=January 2020\|cs1-dates=y}} '''Data structure alignment''' is the way data is arranged and accessed in [[computer memory]]. It consists of three separate but related issues: '''data alignment''', '''data structure padding''', and '''packing'''. The [[~~Central~~central processing unit\|CPU]] in modern computer hardware performs reads and writes to memory most efficiently when the data is ''naturally aligned'', which generally means that the data's memory address is a multiple of the data size. For instance, in a 32-bit architecture, the data may be aligned if the data is stored in four consecutive bytes and the first byte lies on a 4-byte boundary. ''Data alignment'' is the aligning of elements according to their natural alignment. To ensure natural alignment, it may be necessary to insert some ''padding'' between structure elements or after the last element of a structure. For example, on a 32-bit machine, a data structure containing a 16-bit value followed by a 32-bit value could have 16 bits of ''padding'' between the 16-bit value and the 32-bit value to align the 32-bit value on a 32-bit boundary. Alternatively, one can ''pack'' the structure, omitting the padding, which may lead to slower access, but ~~uses~~saves ~~half~~16 asbits ~~much~~of memory. Although data structure alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. [[Fortran]], [[Ada (programming language)\|Ada]],<ref>{{cite book \| ~~sectionurl~~section-url = http://docs.adacore.com/gnat_rm-docs/html/gnat_rm/gnat_rm/representation_clauses_and_pragmas.html \| title = GNAT Reference Manual 7.4.0w documentation \| section = Ada Representation Clauses and Pragmas \| ~~accessdate~~access-date = 2015-08-30 }}</ref><ref>{{cite book \| title = SPARCompiler Ada Programmer's Guide \| section = F.8 Representation Clauses \| url = http://docs.oracle.com/cd/E19957-01/802-3641/802-3641.pdf \| ~~accessdate~~access-date = 2015-08-30 }}</ref> [[PL/I]],<ref>{{cite book \| url = http://www.bitsavers.org/pdf/ibm/360/pli/C28-6571-3_PL_I_Language_Specifications_Jul66.pdf \| title = IBM System/360 Operating System PL/I Language Specifications \| date = July 1966 \| pages = ~~55-56~~55–56 \| id = C28-6571-3 \| publisher = [[IBM]]}}</ref> [[Pascal (programming language)\|Pascal]],<ref>{{cite web Line 31 ⟶ 34: \| title = The Programming Language Pascal (Revised Report) \| author = Niklaus Wirth \| ~~authorlink~~author-link = Niklaus Wirth \| date = July 1973 \| page = 12 }}</ref> certain [[C (programming language)\|C]] and [[C++]] implementations, [[D (programming language)\|D]],<ref>{{cite web \| url = http://dlang.org/attribute.html#align \| title = Attributes -– D Programming Language: Align Attribute \| ~~accessdate~~access-date = 2012-04-13 }}</ref> [[Rust (programming language)\|Rust]],<ref>{{cite web \| url = https://doc.rust-lang.org/nomicon/other-reprs.html \| title = The Rustonomicon -– Alternative Representations \| ~~accessdate~~access-date = 2016-06-19 }}</ref> [[C Sharp (programming language)\|C#]],<ref>{{Cite web\|url=https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.layoutkind\|title=LayoutKind Enum (System.Runtime.InteropServices)\|website=docs.microsoft.com\|language=en-us\|access-date=2019-04-01}}</ref> and [[assembly language]] allow at least partial control of data structure padding, which may be useful in certain special circumstances. ==Definitions== A memory address ''a'' is said to be ''n-[[byte]] aligned'' when ''a'' is a multiple of ''n'' ~~[[byte]]s~~ (where ''n'' is a power of 2). In this context, a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An ''n''-byte aligned address would have a minimum of ''{{math\|log<sub>2</sub>(''n)'')}} least-significant zeros when expressed in [[binary numeral system\|binary]]. The alternate wording ''b-bit aligned'' designates a ''b/8 byte aligned'' address (ex. [[64-bit]] aligned is 8 bytes aligned). Line 58 ⟶ 61: ==Problems== The CPU accesses memory by a single [[memory word]] at a time. As long as the memory word size is at least as large as the largest [[primitive data type]] supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses. If the highest and lowest bytes in a datum are not within the same memory word, the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a [[translation lookaside buffer\|TLB]] miss or a [[page fault]] on any memory access during the instruction execution. Some processor designs deliberately avoid introducing such complexity, and instead yield alternative behavior in the event of a misaligned memory access. For example, implementations of the ARM architecture prior to the ARMv6 ISA require mandatory aligned memory access for all multi-byte load and store instructions.<ref>{{Cite web\|url=https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965\|title=The curious case of unaligned access on ARM\|last=Kurusa\|first=Levente\|date=2016-12-27\|website=Medium\|language=en\|access-date=2019-08-07}}</ref> Depending on which specific instruction was issued, the result of attempted misaligned access might be to round down the least significant bits of the offending address turning it into an aligned access (sometimes with additional caveats), or to throw an [[memory management unit\|MMU]] exception (if MMU hardware is present), or to silently yield other potentially unpredictable results. ~~Starting~~The ~~with~~ARMv6 ~~the~~and ~~ARMv6~~later ~~architecture,~~architectures support ~~was added to handle~~ unaligned access in many circumstances, but not necessarily all ~~circumstances~~. When a single memory word is accessed the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words, e.g. the first word might be read by one device, both words written by another device and then the second word read by the first device so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify. Line 69 ⟶ 72: Although the [[compiler]] (or [[interpreter (computing)\|interpreter]]) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition, the data structure as a whole may be padded with a final unnamed member. This allows each member of an [[array of structures]] to be properly aligned. Padding is only inserted when a [[record (computer science)\|structure]] member is followed by a member with a larger alignment requirement or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by descending alignment requirements a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members. Although C and C++ do not allow the compiler to reorder structure members to save space, other languages might. It is also possible to tell most C and C++ compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long. Likewise, in [[PL/I]] a structure may be declared <code>UNALIGNED</code> to eliminate all padding except around bit strings. One use for such "packed" structures is to conserve memory. For example, a structure containing a single byte (such as a char) and a four-byte integer (such as uint32_t) would require three additional bytes of padding. A large array of such structures would use 37.5% less memory if they are packed, although accessing each structure might take longer. This compromise may be considered a form of [[space–time tradeoff]]. Although use of "packed" structures is most frequently used to conserve [[memory space (computational resource)\|memory space]], it may also be used to format a data structure for transmission using a standard protocol. However, in this usage, care must also be taken to ensure that the values of the struct members are stored with the [[endianness]] required by the protocol (often [[network byte order]]), which may be different from the endianness used natively by the host machine. ===Computing padding=== The following formulas provide the number of padding bytes required to align the start of a data structure (where ''mod'' is the [[~~Modulo~~modulo ~~operation\|modulo~~operator]] ~~operator~~): padding = (align - (offset mod align)) mod align aligned = offset + padding Line 85 ⟶ 89: For example, the padding to add to offset 0x59d for a 4-byte aligned structure is 3. The structure will then start at 0x5a0, which is a multiple of 4. However, when the alignment of ''offset'' is already equal to that of ''align'', the second modulo in ''(align - (offset mod align)) mod align'' will return zero, therefore the original value is left unchanged. Since the alignment is by [[#Definitions\|definition]] a power of two,{{efn\|On modern computers where the target alignment is a power of two. This might not be true, for example, on a system using 9-bit bytes or 60-bit words.}} the modulo operation can be reduced to a [[bitwise ~~boolean~~ AND]] operation. The following formulas produce the ~~aligned~~correct ~~offset~~values (where ''&'' is a [[bitwise AND]] and ''~'' is a [[bitwise NOT]]) – providing the offset is unsigned or the system uses [[two's complement]] arithmetic:▼ ▲The following formulas produce the aligned offset (where ''&'' is a [[bitwise AND]] and ''~'' a [[bitwise NOT]]): padding = (align - (offset & (align - 1))) & (align - 1) = (-offset & (align - 1)) aligned = (offset + (align - 1)) & ~(align - 1) = (offset + (align - 1)) & -align ==Typical alignment of C structs on x86== [[Data structure]] members are stored sequentially in memory so that, in the structure below, the member Data1 will always precede Data2; and Data2 will always precede Data3: Line 110 ⟶ 114: The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from [[Microsoft]] ([[Visual C++]]), [[Borland]]/[[CodeGear]] ([[C++Builder]]), [[Digital Mars]] (DMC), and [[GNU]] ([[GNU Compiler Collection\|GCC]]) when compiling for 32-bit x86: * A '''char''' (one byte) will be 1-byte aligned. * A '''short''' (two bytes) will be 2-byte aligned. * An '''int''' (four bytes) will be 4-byte aligned. * A '''long''' (four bytes) will be 4-byte aligned. * A '''float''' (four bytes) will be 4-byte aligned. * A '''double''' (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with ''-malign-double'' compile time option). * A '''long long''' (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with ''-malign-double'' compile time option). * A '''long double''' (ten bytes with C++Builder and DMC, eight bytes with Visual C++, twelve bytes with GCC) will be 8-byte aligned with C++Builder, 2-byte aligned with DMC, 8-byte aligned with Visual C++, and 4-byte aligned with GCC. * Any '''pointer''' (four bytes) will be 4-byte aligned. (e.g.: char, int) The only notable differences in alignment for an [[LP64]] 64-bit system when compared to a 32-bit system are: * A '''long''' (eight bytes) will be 8-byte aligned. * A '''double''' (eight bytes) will be 8-byte aligned. * A '''long long''' (eight bytes) will be 8-byte aligned. * A '''long double''' (eight bytes with Visual C++, sixteen bytes with GCC) will be 8-byte aligned with Visual C++ and 16-byte aligned with GCC. * Any '''pointer''' (eight bytes) will be 8-byte aligned. Some data types are dependent on the implementation. Line 148 ⟶ 152: { char Data1; /* 1 byte / char Padding1[1]; / 1 byte for the following 'short' to be aligned on a 2 -byte boundary assuming that the address where structure begins is an even number / short Data2; / 2 bytes / Line 157 ⟶ 161: </syntaxhighlight> The compiled size of the structure is now 12 bytes. The compiled size of the structure is now 12 bytes. It is important to note that the last member is padded with the number of bytes required so that the total size of the structure should be a multiple of the largest alignment of any structure member (alignment(int) in this case, which = 4 on linux-32bit/gcc){{Citation needed\|reason=see Talk:Data structure alignment (least common multiple requires at least 2 args) and Talk:Data structure alignment (calculation of whole structure alignment and paddings).\|date=February 2011}}.▼ ▲The ~~compiled size of the structure is now 12 bytes. It is important to note that the~~ last member is padded with the number of bytes required so that the total size of the structure should be a multiple of the largest alignment of any structure member (~~alignment~~{{mono\|1=alignof(int)}} in this case, which = 4 on linux-32bit/gcc){{Citation needed\|reason=see Talk:Data structure alignment (least common multiple requires at least 2 args) and Talk:Data structure alignment (calculation of whole structure alignment and paddings).\|date=February 2011}}. In this case 3 bytes are added to the last member to pad the structure to the size of a 12 bytes (alignment(int) × 3).▼ ▲In this case 3 bytes are added to the last member to pad the structure to the size of a 12 bytes (~~alignment~~{{mono\|1=alignof(int) × 3}}). <syntaxhighlight lang="c"> Line 168 ⟶ 174: </syntaxhighlight> In this example the total size of the structure {{mono\|1=[[sizeof]](FinalPad) == 8}}, not 5 (so that the size is a multiple of 4 (~~alignment of~~ {{mono\|1=alignof(float)}})). <syntaxhighlight lang="c"> Line 177 ⟶ 183: </syntaxhighlight> In this example the total size of the structure {{mono\|1=[[sizeof]](FinalPadShort) == 6}}, not 5 (not 8 either) (so that the size is a multiple of 2 (~~alignment~~{{mono\|1=alignof(short) == 2}} on linux-32bit/gcc)). It is possible to change the alignment of structures to reduce the memory they require (or to conform to an existing format) by reordering structure members or changing the compiler's alignment (or “packing”) of structure members. Line 195 ⟶ 201: The alternative method of enforcing the {{mono\|1=MixedData}} structure to be aligned to a one byte boundary will cause the pre-processor to discard the pre-determined alignment of the structure members and thus no padding bytes would be inserted. While there is no standard way of defining the alignment of structure members (while C and C++ allow using the {{mono\|1=alignas}} specifier for this purpose it can be used only for specifying a stricter alignment), some compilers use {{mono\|1=#pragma}} directives to specify packing inside source files. Here is an example: <syntaxhighlight lang="c"> #pragma pack(push) /* push current alignment to stack / #pragma pack(1) / set alignment to 1 -byte boundary / struct MyPackedData Line 232 ⟶ 238: \| work = [[MSDN Library]] \| publisher = Microsoft \| ~~accessdate~~access-date = 2011-01-11 }}</ref> This leads to interoperability problems with library headers which use, for example, {{mono\|1=#pragma pack(8)}}, if the project packing is smaller than this. For this reason, setting the project packing to any value other than the default of 8 bytes would break the {{mono\|1=#pragma pack}} directives used in library headers and result in binary incompatibilities between structures. This limitation is not present when compiling for x86. ==Allocating memory aligned to cache lines== It would be beneficial to allocate memory aligned to [[cache line]]s. If an array is partitioned for more than one thread to operate on, having the sub-array boundaries unaligned to cache lines could lead to performance degradation. Here is an example to allocate memory (double array of size 10) aligned to cache of 64 bytes. <syntaxhighlight lang="c"> #include <stdlib.h> double foo(void) { //create array of size 10 double ~~var;//create~~ array ~~of size 10~~; okif (0 == posix_memalign((void)&~~var~~array, 64, 10sizeof(double));)▼ ~~int ok;~~ return ~~NULL~~array;▼ ▲ ok = posix_memalign((void*)&var, 64, 10sizeof(double)); ~~if (ok != 0)~~ ▲ return NULL; return ~~var~~NULL; } </syntaxhighlight> Line 255 ⟶ 258: Alignment concerns can affect areas much larger than a C structure when the purpose is the efficient mapping of that area through a hardware [[CPU cache#Address translation\|address translation]] mechanism (PCI remapping, operation of a [[memory management unit\|MMU]]). For instance, on a 32-bit operating system, a 4 [[~~kibibyte\|~~KiB]] (4096 ~~Bytes~~ bytes) page is not just an arbitrary 4 KiB chunk of data. Instead, it is usually a region of memory that's aligned on a 4 KiB boundary. This is because aligning a page on a page-sized boundary lets the hardware map a virtual address to a physical address by substituting the higher bits in the address, rather than doing complex arithmetic. Example: Assume that we have a TLB mapping of virtual address 0x2CFC7000 to physical address 0x12345000. (Note that both these addresses are aligned at 4 KiB boundaries.) Accessing data located at virtual address va=0x2CFC7ABC causes a TLB resolution of 0x2CFC7 to 0x12345 to issue a physical access to pa=0x12345ABC. Here, the 20/12-bit split luckily matches the hexadecimal representation split at 5/3 digits. The hardware can implement this translation by simply combining the first 20 bits of the physical address (0x12345) and the last 12 bits of the virtual address (0xABC). This is also referred to as virtually indexed (ABC) physically tagged (12345). A block of data of size 2<sup>(n+1)</sup> -− 1 always has one sub-block of size 2<sup>n</sup> aligned on 2<sup>n</sup> bytes. This is how a dynamic allocator that has no knowledge of alignment, can be used to provide aligned buffers, at the price of a factor two in space loss. <syntaxhighlight lang="c"> // Example: get 4096 bytes aligned on a 4096 -byte buffer with malloc() // unaligned pointer to large area Line 279 ⟶ 282: #define aligntonext(p, bits) alignto(((p) + (1 << bits) - 1), bits) </syntaxhighlight> ==Notes== {{notelist}} ==References== Line 285 ⟶ 291: ==Further reading== * {{cite book \|author-last1=Bryant \|author-first1=Randal E. \|author-link1=Randal Bryant \|author-last2=David \|author-first2=O'Hallaron \|title=Computer Systems: A Programmer's Perspective \|place=Upper Saddle River, New Jersey, USA \|publisher=[[Pearson Education]] \|date=2003 \|edition=2003 \|url=http://csapp.cs.cmu.edu/ \|isbn=0-13-034074-X}} * {{cite book \|title=8086 Family Utilities -– User's Guide for 8080/8085-Based Development Systems \|chapter=1. Introduction: Segment Alignment \|id=Order Number: 9800639-04 \|edition=A620/5821 6K DD \|version=Revision E \|date=May 1982 \|orig-year=1980, 1978 \|publisher=[[Intel Corporation]] \|___location=Santa Clara, California, USA \|pages=((1-6, 3-5)) \|url=http://bitsavers.trailing-edge.com/pdf/intel/ISIS_II/9800639-04E_8086_Famility_Utilities_Users_Guide_May82.pdf \|access-date=2020-02-29 \|url-status=live \|archive-url=https://web.archive.org/web/20200229032355/http://bitsavers.trailing-edge.com/pdf/intel/ISIS_II/9800639-04E_8086_Famility_Utilities_Users_Guide_May82.pdf \|archive-date=2020-02-29 \|quote=[…] A segment can have one (and in the case of the inpage attribute, two) of five alignment attributes: […] Byte, which means a segment can be located at any address. […] Word, which means a segment can only be located at an address that is a multiple of two, starting from address 0H. […] Paragraph, which means a segment can only be located at an address that is a multiple of 16, starting from address 0. […] Page, which means a segment can only be located at an address that is a multiple of 256, starting from address 0. […] Inpage, which means a segment can be located at whichever of the preceding attributes apply plus must be located so that it does not cross a page boundary […] The alignment codes are: […] B -– byte […] W -– word […] G -– paragraph […] xR -– inpage […] P -– page […] A -– absolute […] the x in the inpage alignment code can be any other alignment code. […] a segment can have the inpage attribute, meaning it must reside within a 256 byte page and can have the word attribute, meaning it must reside on an even numbered byte. […]}} ==External links== * [~~http~~https://~~www~~developer.ibm.com/~~developerworks/library~~articles/pa-dalign/ IBM ~~developerWorks~~Developer article on data alignment] * [https://fylux.github.io/2017/07/11/Memory_Alignment/ Article on data alignment and performance] * [~~http~~https://~~msdn2~~learn.microsoft.com/en-us/~~library~~previous-versions/ms253949(v=vs.~~aspx~~90) Microsoft ~~MSDN~~Learn article on data alignment] * [~~http~~https://www.codesynthesis.com/~boris/blog/2009/04/06/cxx-data-alignment-portability/ Article on data alignment and data portability] * [~~http~~https://www.eventhelix.com/~~RealtimeMantra~~embedded/byte-alignment-and-ordering/~~ByteAlignmentAndOrdering.htm~~ Byte Alignment and Ordering] * ~~[http~~{{webarchive\|url=https://web.archive.org/web/20181229171433/https://www.cpu2.net/stackalign.html \|title=Stack Alignment in 64-bit Calling Conventions]}} -– discusses stack alignment for x86-64 calling conventions * [http://www.catb.org/esr/structure-packing/ The Lost Art of Structure Packing by Eric S. Raymond] {{application binary interface}} {{DEFAULTSORT:Data Structure Alignment}} [[Category:Articles with example C code]] [[Category:Compiler construction]] [[Category:Composite data types]]