Data structure alignment: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 16:02, 20 September 2024 edit 2001:861:8c85:c680:c4f7:58ea:eb76:80ea (talk) →Problems ← Previous edit		Latest revision as of 04:18, 21 August 2025 edit undo 67.164.20.69 (talk) typo
(12 intermediate revisions by 8 users not shown)
Line 1: {{Short description\|Way in which data is arranged and accessed in computer memory}} {{Redirect\|Memory layout\|broader usage\|memory management}} {{multiple issues\| {{original research\|date=January 2018}} Line 9 ⟶ 10: '''Data structure alignment''' is the way data is arranged and accessed in [[computer memory]]. It consists of three separate but related issues: '''data alignment''', '''data structure padding''', and '''packing'''. The [[central processing unit\|CPU]] in modern computer hardware performs reads and writes to memory most efficiently when the data is ''naturally aligned'', which generally means that the data's memory address is a multiple of the data size. For instance, in a 32-bit architecture, the data may be aligned if the data is stored in four consecutive bytes and the first byte lies on a 4-byte boundary. ''Data alignment'' is the aligning of elements according to their natural alignment. To ensure natural alignment, it may be necessary to insert some ''padding'' between structure elements or after the last element of a structure. For example, on a 32-bit machine, a data structure containing a 16-bit value followed by a 32-bit value could have 16 bits of ''padding'' between the 16-bit value and the 32-bit value to align the 32-bit value on a 32-bit boundary. Alternatively, one can ''pack'' the structure, omitting the padding, which may lead to slower access, but ~~uses~~saves ~~three~~16 ~~quarters~~bits ~~as much~~of memory. Although data structure alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. [[Fortran]], [[Ada (programming language)\|Ada]],<ref>{{cite book Line 38 ⟶ 39: }}</ref> certain [[C (programming language)\|C]] and [[C++]] implementations, [[D (programming language)\|D]],<ref>{{cite web \| url = http://dlang.org/attribute.html#align \| title = Attributes -– D Programming Language: Align Attribute \| access-date = 2012-04-13 }}</ref> [[Rust (programming language)\|Rust]],<ref>{{cite web \| url = https://doc.rust-lang.org/nomicon/other-reprs.html \| title = The Rustonomicon -– Alternative Representations \| access-date = 2016-06-19 }}</ref> [[C Sharp (programming language)\|C#]],<ref>{{Cite web\|url=https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.layoutkind\|title=LayoutKind Enum (System.Runtime.InteropServices)\|website=docs.microsoft.com\|language=en-us\|access-date=2019-04-01}}</ref> and [[assembly language]] allow at least partial control of data structure padding, which may be useful in certain special circumstances. Line 64 ⟶ 65: If the highest and lowest bytes in a datum are not within the same memory word, the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a [[translation lookaside buffer\|TLB]] miss or a [[page fault]] on any memory access during the instruction execution. Some processor designs deliberately avoid introducing such complexity, and instead yield alternative behavior in the event of a misaligned memory access. For example, implementations of the ARM architecture prior to the ARMv6 ISA require mandatory aligned memory access for all multi-byte load and store instructions.<ref>{{Cite web\|url=https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965\|title=The curious case of unaligned access on ARM\|last=Kurusa\|first=Levente\|date=2016-12-27\|website=Medium\|language=en\|access-date=2019-08-07}}</ref> Depending on which specific instruction was issued, the result of attempted misaligned access might be to round down the least significant bits of the offending address turning it into an aligned access (sometimes with additional caveats), or to throw an [[~~Memory~~memory management unit\|MMU]] exception (if MMU hardware is present), or to silently yield other potentially unpredictable results. The ARMv6 and later architectures support unaligned access in many circumstances, but not necessarily all. When a single memory word is accessed the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words, e.g. the first word might be read by one device, both words written by another device and then the second word read by the first device so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify. Line 81 ⟶ 82: ===Computing padding=== The following formulas provide the number of padding bytes required to align the start of a data structure (where ''mod'' is the [[modulo operator]]): padding = (align - (offset mod align)) mod align aligned = offset + padding Line 89 ⟶ 91: Since the alignment is by [[#Definitions\|definition]] a power of two,{{efn\|On modern computers where the target alignment is a power of two. This might not be true, for example, on a system using 9-bit bytes or 60-bit words.}} the modulo operation can be reduced to a [[bitwise AND]] operation. The following formulas produce the correct values (where ''&'' is a bitwise AND and ''~'' is a [[bitwise NOT]]) --– providing the offset is unsigned or the system uses [[two's complement]] arithmetic: padding = (align - (offset & (align - 1))) & (align - 1) = -offset & (align - 1) Line 96 ⟶ 99: ==Typical alignment of C structs on x86== [[Data structure]] members are stored sequentially in memory so that, in the structure below, the member Data1 will always precede Data2; and Data2 will always precede Data3: Line 112 ⟶ 114: The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from [[Microsoft]] ([[Visual C++]]), [[Borland]]/[[CodeGear]] ([[C++Builder]]), [[Digital Mars]] (DMC), and [[GNU]] ([[GNU Compiler Collection\|GCC]]) when compiling for 32-bit x86: * A '''char''' (one byte) will be 1-byte aligned. * A '''short''' (two bytes) will be 2-byte aligned. * An '''int''' (four bytes) will be 4-byte aligned. * A '''long''' (four bytes) will be 4-byte aligned. * A '''float''' (four bytes) will be 4-byte aligned. * A '''double''' (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with ''-malign-double'' compile time option). * A '''long long''' (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with ''-malign-double'' compile time option). * A '''long double''' (ten bytes with C++Builder and DMC, eight bytes with Visual C++, twelve bytes with GCC) will be 8-byte aligned with C++Builder, 2-byte aligned with DMC, 8-byte aligned with Visual C++, and 4-byte aligned with GCC. * Any '''pointer''' (four bytes) will be 4-byte aligned. (e.g.: char, int) The only notable differences in alignment for an [[LP64]] 64-bit system when compared to a 32-bit system are: * A '''long''' (eight bytes) will be 8-byte aligned. * A '''double''' (eight bytes) will be 8-byte aligned. * A '''long long''' (eight bytes) will be 8-byte aligned. * A '''long double''' (eight bytes with Visual C++, sixteen bytes with GCC) will be 8-byte aligned with Visual C++ and 16-byte aligned with GCC. * Any '''pointer''' (eight bytes) will be 8-byte aligned. Some data types are dependent on the implementation. Line 150 ⟶ 152: { char Data1; /* 1 byte / char Padding1[1]; / 1 byte for the following 'short' to be aligned on a 2 -byte boundary assuming that the address where structure begins is an even number / short Data2; / 2 bytes / Line 203 ⟶ 205: <syntaxhighlight lang="c"> #pragma pack(push) / push current alignment to stack / #pragma pack(1) / set alignment to 1 -byte boundary / struct MyPackedData Line 241 ⟶ 243: ==Allocating memory aligned to cache lines== It would be beneficial to allocate memory aligned to [[cache line]]s. If an array is partitioned for more than one thread to operate on, having the sub-array boundaries unaligned to cache lines could lead to performance degradation. Here is an example to allocate memory (double array of size 10) aligned to cache of 64 bytes. <syntaxhighlight lang="c"> #include <stdlib.h> Line 255 ⟶ 258: Alignment concerns can affect areas much larger than a C structure when the purpose is the efficient mapping of that area through a hardware [[CPU cache#Address translation\|address translation]] mechanism (PCI remapping, operation of a [[memory management unit\|MMU]]). For instance, on a 32-bit operating system, a 4 [[KiB]] (4096 ~~Bytes~~ bytes) page is not just an arbitrary 4 KiB chunk of data. Instead, it is usually a region of memory that's aligned on a 4 KiB boundary. This is because aligning a page on a page-sized boundary lets the hardware map a virtual address to a physical address by substituting the higher bits in the address, rather than doing complex arithmetic. Example: Assume that we have a TLB mapping of virtual address 0x2CFC7000 to physical address 0x12345000. (Note that both these addresses are aligned at 4 KiB boundaries.) Accessing data located at virtual address va=0x2CFC7ABC causes a TLB resolution of 0x2CFC7 to 0x12345 to issue a physical access to pa=0x12345ABC. Here, the 20/12-bit split luckily matches the hexadecimal representation split at 5/3 digits. The hardware can implement this translation by simply combining the first 20 bits of the physical address (0x12345) and the last 12 bits of the virtual address (0xABC). This is also referred to as virtually indexed (ABC) physically tagged (12345). A block of data of size 2<sup>(n+1)</sup> -− 1 always has one sub-block of size 2<sup>n</sup> aligned on 2<sup>n</sup> bytes. This is how a dynamic allocator that has no knowledge of alignment, can be used to provide aligned buffers, at the price of a factor two in space loss. <syntaxhighlight lang="c"> // Example: get 4096 bytes aligned on a 4096 -byte buffer with malloc() // unaligned pointer to large area Line 288 ⟶ 291: ==Further reading== {{cite book \|author-last1=Bryant \|author-first1=Randal E. \|author-link1=Randal Bryant \|author-last2=David \|author-first2=O'Hallaron \|title=Computer Systems: A Programmer's Perspective \|place=Upper Saddle River, New Jersey, USA \|publisher=[[Pearson Education]] \|date=2003 \|edition=2003 \|url=http://csapp.cs.cmu.edu/ \|isbn=0-13-034074-X}} * {{cite book \|title=8086 Family Utilities -– User's Guide for 8080/8085-Based Development Systems \|chapter=1. Introduction: Segment Alignment \|id=Order Number: 9800639-04 \|edition=A620/5821 6K DD \|version=Revision E \|date=May 1982 \|orig-year=1980, 1978 \|publisher=[[Intel Corporation]] \|___location=Santa Clara, California, USA \|pages=((1-6, 3-5)) \|url=http://bitsavers.trailing-edge.com/pdf/intel/ISIS_II/9800639-04E_8086_Famility_Utilities_Users_Guide_May82.pdf \|access-date=2020-02-29 \|url-status=live \|archive-url=https://web.archive.org/web/20200229032355/http://bitsavers.trailing-edge.com/pdf/intel/ISIS_II/9800639-04E_8086_Famility_Utilities_Users_Guide_May82.pdf \|archive-date=2020-02-29 \|quote=[…] A segment can have one (and in the case of the inpage attribute, two) of five alignment attributes: […] Byte, which means a segment can be located at any address. […] Word, which means a segment can only be located at an address that is a multiple of two, starting from address 0H. […] Paragraph, which means a segment can only be located at an address that is a multiple of 16, starting from address 0. […] Page, which means a segment can only be located at an address that is a multiple of 256, starting from address 0. […] Inpage, which means a segment can be located at whichever of the preceding attributes apply plus must be located so that it does not cross a page boundary […] The alignment codes are: […] B -– byte […] W -– word […] G -– paragraph […] xR -– inpage […] P -– page […] A -– absolute […] the x in the inpage alignment code can be any other alignment code. […] a segment can have the inpage attribute, meaning it must reside within a 256 byte page and can have the word attribute, meaning it must reside on an even numbered byte. […]}} ==External links== * [https://developer.ibm.com/articles/pa-dalign/ IBM ~~developerWorks~~Developer article on data alignment] * [https://fylux.github.io/2017/07/11/Memory_Alignment/ Article on data alignment and performance] * [~~http~~https://~~msdn2~~learn.microsoft.com/en-us/~~library~~previous-versions/ms253949(v=vs.~~aspx~~90) Microsoft ~~MSDN~~Learn article on data alignment] * [~~http~~https://www.codesynthesis.com/~boris/blog/2009/04/06/cxx-data-alignment-portability/ Article on data alignment and data portability] * [~~http~~https://www.eventhelix.com/~~RealtimeMantra~~embedded/byte-alignment-and-ordering/~~ByteAlignmentAndOrdering.htm~~ Byte Alignment and Ordering] * ~~[http~~{{webarchive\|url=https://web.archive.org/web/20181229171433/https://www.cpu2.net/stackalign.html \|title=Stack Alignment in 64-bit Calling Conventions]}} -– discusses stack alignment for x86-64 calling conventions * [http://www.catb.org/esr/structure-packing/ The Lost Art of Structure Packing by Eric S. Raymond]