Data structure alignment: Difference between revisions

{{context}}
== Definition ==
 
An address ''a'' is said to be ''n''-byte aligned when ''a'' is a multiple of ''n'' bytes.
A common problem in computer programming is word alignment.
One way to increase performance, especially in [[RISC]] processors, which
are designed to maximize raw performance, is to require data to be loaded or
stored on a word boundary. So although memory is commonly addressed in
8-bit bytes, a 32-bit integer or 64-bit floating-point number
would be required to start on a 64-bit boundary on a 64-bit machine.
The processor could flag a fault if it were asked to load a number
that was not on such a boundary, or call a routine that would
work out which word or words contained the data and extract the equivalent
value.

This caused difficulty when the team from [[Mosaic Software]] ported
their [[Twin Spreadsheet]] to the [[68000]]-based [[Atari ST]]; the Intel [[8086]]
architecture had no such restrictions. It also
caused difficulties in porting Microsoft Office to Windows NT on [[MIPS]]
and [[PowerPC]] for [[NEC]] and [[IBM]]. Since the software was not written with such restrictions
in mind, designers had to set a bit in the O/S to enable non-aligned data.
However, since this bit was masked with other flags, it was impossible to
keep the O/S from faulting on non-aligned data when other modules used the other flags. This may have been a major factor in abandoning Windows NT on non-Intel processors, as they failed as platforms for hosting common Windows applications, and
one more reason for the baffling dominance of the x86 architecture over technologically elegant rivals.
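
The definition at the start of this section can be written as a one-line test. The following is a minimal sketch in C; the function name <code>is_aligned</code> is chosen here for illustration and does not come from any particular library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when address a is a multiple of n bytes.
   n must be non-zero. For a power-of-two n the same test is often
   written as (a & (n - 1)) == 0. */
bool is_aligned(uintptr_t a, uintptr_t n)
{
    assert(n != 0);
    return a % n == 0;
}
```

For example, address 12 is 4-byte aligned but not 8-byte aligned, so <code>is_aligned(12, 4)</code> is true and <code>is_aligned(12, 8)</code> is false.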
 
==Problems==
 
Because of the way [[computer memory]] works, it is highly desirable for all memory accesses to meet certain alignment requirements. As a rule of thumb, the alignment for a primitive data type should be the same as the size of the data to be accessed, rounded up to a power of two. This avoids crossing any [[word (computer science)|word]], [[cache-line]], or [[paging|page]] boundaries.
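
The rule of thumb above (size rounded up to a power of two) can be sketched as a small helper. The name <code>natural_alignment</code> is hypothetical, and the function only illustrates the rule; it does not report what any particular compiler actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rule-of-thumb alignment for a primitive of `size` bytes:
   the smallest power of two that is greater than or equal to size. */
size_t natural_alignment(size_t size)
{
    /* The upper bound keeps the shift below from overflowing. */
    assert(size > 0 && size <= SIZE_MAX / 2 + 1);

    size_t align = 1;
    while (align < size)
        align <<= 1;
    return align;
}
```

Under this rule a 4-byte value gets 4-byte alignment, and a 10-byte value would get 16-byte alignment.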
 
Some of the problems caused by unaligned access are:
* Extra transistors on the CPU are required to support accesses which are not word-aligned.
* Reads not aligned to the width of the memory bus require two reads.
* Writes not aligned to the width of the memory bus require two reads and two writes.
* Accesses across [[cache-lines]] require evicting two cache-lines.
* Accesses across page boundaries can incur two [[translation lookaside buffer|TLB]] misses and could even require swapping in both pages from disk.
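
Several of the problems listed above arise only when a single access spans two bus words, cache lines or pages. A minimal sketch of that test follows. The function name <code>crosses_boundary</code> is hypothetical, and the block sizes used in the examples after it (64-byte cache lines, 4096-byte pages) are assumptions that vary between machines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the `size` bytes starting at `addr` touch more than one
   block of `block` bytes. `block` must be a power of two and `size`
   must be non-zero. */
bool crosses_boundary(uintptr_t addr, size_t size, size_t block)
{
    assert(size > 0);
    assert(block != 0 && (block & (block - 1)) == 0);

    uintptr_t first = addr / block;              /* block holding the first byte */
    uintptr_t last  = (addr + size - 1) / block; /* block holding the last byte  */
    return first != last;
}
```

With 64-byte blocks, a 4-byte access at address 62 crosses a boundary while the same access at address 60 or 64 does not. A naturally aligned access of a power-of-two size never crosses a boundary of a larger power-of-two block.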
 
==Compatibility==
 
Because the main advantage of supporting unaligned access is that memory which would otherwise be lost as padding can be used, and because the cost of supporting it is so high, many modern [[computer architecture]] designs do not support it at all. Most of those that do require the [[operating system|OS]] to do most of the work. Older designs (notably [[x86]]) still need to implement full support in hardware.
 
Some programs which were written for an architecture that did not enforce alignment may use unaligned memory access (either intentionally or not). This can be a problem when porting such programs.
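
One common way to make such code portable is to copy the bytes into a properly aligned variable instead of dereferencing a misaligned pointer. The sketch below assumes a 32-bit value; the function name <code>load_u32_unaligned</code> is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address that may not be 4-byte aligned.
   memcpy is defined in terms of bytes, so the program never asks for a
   misaligned 32-bit load. The destination v is a normal, aligned object. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

A cast such as <code>*(const uint32_t *)p</code> on the same pointer would be the unaligned access that faults on strict architectures.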
 
==Typical alignment of C structs on x86==
 
[[Data structure]] members are stored sequentially in memory so that in the structure below the member Data1 will always precede Data2 and Data2 will always precede Data3:
};
 
If the type "short" is stored in two bytes of memory then each member of the data structure depicted above would be 2-byte aligned. Data1 would be at offset 0, Data2 at offset 2 and Data3 at offset 4. The size of this structure would be 6 bytes.
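
The offsets described above can be checked with <code>offsetof</code>. In the sketch below the struct name <code>ThreeShorts</code> is hypothetical; only the member names and the type "short" are taken from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; three members of type short as described above. */
struct ThreeShorts
{
    short Data1;
    short Data2;
    short Data3;
};

/* Checks that the members follow each other with no padding.
   With a 2-byte short the offsets are 0, 2 and 4 and the size is 6. */
void check_three_shorts_layout(void)
{
    assert(offsetof(struct ThreeShorts, Data1) == 0);
    assert(offsetof(struct ThreeShorts, Data2) == 1 * sizeof(short));
    assert(offsetof(struct ThreeShorts, Data3) == 2 * sizeof(short));
    assert(sizeof(struct ThreeShorts) == 3 * sizeof(short));
}
```

No padding is needed here because every member has the same size and therefore the same alignment.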
 
The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. As a rule of thumb an integral data member will align to a boundary equal to its own size. The following typical alignments are valid for compilers from [[Microsoft]], [[Borland]], and [[GNU]] when compiling for x86:
 
A '''char''' (one byte) will be 1-byte aligned.<br/>
A '''short''' (two bytes) will be 2-byte aligned.<br/>
An '''int''' (four bytes) will be 4-byte aligned.<br/>
A '''float''' (four bytes) will be 4-byte aligned.<br/>
A '''double''' (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux.
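
The alignment a compiler gives to a struct member can be observed by placing the member directly after a single char and reading its offset. The probe structs below are hypothetical names introduced for this sketch; the values obtained depend on the compiler and target, so only the char case is asserted in the code.

```c
#include <assert.h>
#include <stddef.h>

/* Each probe puts one char in front of the type under test, so the
   offset of `value` equals the alignment the compiler applies to it. */
struct ProbeChar   { char pad; char   value; };
struct ProbeShort  { char pad; short  value; };
struct ProbeInt    { char pad; int    value; };
struct ProbeFloat  { char pad; float  value; };
struct ProbeDouble { char pad; double value; };

/* A char has size 1, so it never needs padding in front of it. */
void check_char_alignment(void)
{
    assert(offsetof(struct ProbeChar, value) == 1);
}
```

For instance, <code>offsetof(struct ProbeDouble, value)</code> would be expected to give 8 or 4 depending on the platform, in line with the last entry of the list above.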
 
Here is a structure with members of various types, totaling '''8 bytes''' before compilation:
struct MixedData
{
char Data1;
short Data2;
int Data3;
char Data4;
};
 
struct MixedData (after compilation)
{
char Data1;
char Padding0[1];
short Data2;
int Data3;
char Data4;
char Padding1[3];
};
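
The padded layout shown above can be confirmed without writing the padding members by hand. The sketch below declares the unpadded <code>MixedData</code> and measures how many bytes the compiler added; the helper name <code>mixed_data_padding</code> is hypothetical, and the figures in the note that follows assume a 2-byte short and a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

struct MixedData
{
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

/* Total padding inserted by the compiler: the size of the struct
   minus the sum of the sizes of its members. */
size_t mixed_data_padding(void)
{
    size_t members = sizeof(char) + sizeof(short) + sizeof(int) + sizeof(char);
    assert(sizeof(struct MixedData) >= members);
    return sizeof(struct MixedData) - members;
}
```

Under those assumptions the members land at offsets 0, 2, 4 and 8, the struct occupies 12 bytes, and the helper reports 4 bytes of padding (1 after Data1 and 3 after Data4).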
 
This structure would have a compiled size of '''6 bytes'''. The above directives are available in compilers from [[Microsoft]], [[Borland]] and many others.
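
The directives referred to here are not reproduced in this section. One widely supported form is <code>#pragma pack</code>, which the following sketch uses under that assumption. The struct names are hypothetical and are not the structure discussed in the text; the sizes in the note that follows assume a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(push, 1)   /* save the current packing, then pack to 1-byte boundaries */
struct PackedExample    /* hypothetical struct used only for this illustration */
{
    char Data1;
    int  Data2;
    char Data3;
};
#pragma pack(pop)       /* restore the previous packing */

/* The same members with default alignment, for comparison. */
struct UnpackedExample
{
    char Data1;
    int  Data2;
    char Data3;
};

/* With 1-byte packing the struct is exactly as large as its members. */
void check_packed_size(void)
{
    assert(sizeof(struct PackedExample) == 2 * sizeof(char) + sizeof(int));
    assert(sizeof(struct PackedExample) <= sizeof(struct UnpackedExample));
}
```

Under that assumption the packed struct takes 6 bytes while the default layout takes 12. Data2 in the packed struct then sits at offset 1, so it is an unaligned member and is subject to the costs described earlier in the article.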
 
==External links==
*[http://msdn2.microsoft.com/en-us/library/ms253949.aspx MSDN Article on data alignment]
*[http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm Byte Alignment and Ordering]
*[http://developer.intel.com/design/itanium/manuals/245317.htm Intel® Itanium® Architecture Software Developer's Manual]
*[http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2 PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors]
 
[[Category:Computer programming]]