Revision as of 18:26, 23 October 2006 by ReyBrujo: Reverted to version as of 18:31, 6 October 2006, links added to promote a site are not good external links per 3rd guideline.
Revision as of 00:03, 28 October 2006 by 207.195.130.34: No edit summary
'''Data structure alignment''' is the way [[Data (computing)|data]] is arranged in [[physical memory]].

== Definitions ==

A [[computer memory]] address ''a'' is said to be ''n''-byte aligned when ''n'' is a power of two and ''a'' is a multiple of ''n'' [[byte|bytes]]. In this context a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An ''n''-byte aligned address has at least log<sub>2</sub>(''n'') least-significant zero bits when expressed in [[Binary numeral system|binary]].

A memory access is said to be ''aligned'' when the datum being accessed is ''n'' bytes long and the datum address is ''n''-byte aligned. When a memory access is not aligned, it is said to be ''misaligned''. Note that by definition byte memory accesses are always aligned.

A memory pointer that refers to primitive data that is ''n'' bytes long is said to be ''aligned'' if it is only allowed to contain addresses that are ''n''-byte aligned; otherwise it is said to be ''unaligned''. A memory pointer that refers to a data aggregate (a data structure or array) is ''aligned'' if (and only if) each primitive datum in the aggregate is aligned.

Note that the definitions above assume that the size of each primitive datum is a power of two bytes. When this is not the case (as with 80-bit floating-point on x86), the context determines whether the datum is considered aligned or not.

== Problems ==

A computer accesses memory a single memory word at a time. If the highest and lowest bytes in a datum are not within the same memory word, the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them.

Because of the way [[computer memory]] works, it is highly desirable for all memory accesses to meet certain alignment requirements.
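The definitions above reduce to simple bit tests. The following C sketch is illustrative only: the helper names <code>is_aligned</code> and <code>spans_words</code> are hypothetical, and it assumes that ''n'' and the memory word size are powers of two and that an address fits in a <code>uintptr_t</code>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when n is a non-zero power of two. */
static inline bool is_power_of_two(uintptr_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* True when address a is n-byte aligned, i.e. a is a multiple of n.
   For a power-of-two n this means the low log2(n) bits of a are zero. */
static inline bool is_aligned(uintptr_t a, uintptr_t n)
{
    assert(is_power_of_two(n));
    return (a & (n - 1)) == 0;
}

/* True when a datum of `size` bytes starting at address a has its lowest
   and highest bytes in different memory words of `word` bytes each. */
static inline bool spans_words(uintptr_t a, size_t size, uintptr_t word)
{
    assert(is_power_of_two(word) && size > 0);
    uintptr_t first = a & ~(word - 1);              /* word holding the lowest byte  */
    uintptr_t last  = (a + size - 1) & ~(word - 1); /* word holding the highest byte */
    return first != last;
}
```

For example, with 4-byte memory words, a 4-byte datum at address 0x1002 is misaligned and spans two words, while the same datum at address 0x1004 is aligned and lies within a single word.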
As a rule of thumb, the alignment for a primitive data type should be the same as the size of the data to be accessed, rounded up to a power of two. This avoids crossing any [[word (computer science)|word]], [[cache-line]], or [[paging|page]] boundaries. If the highest and lowest bytes are in different memory-management pages, the problems can be even worse, because accessing either page could result in a [[translation lookaside buffer|TLB]] miss or a [[page fault]].

As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word.

Some of the problems caused by misaligned access are:
* Extra logic on the CPU is required to support accesses which are not word-aligned.
* Reads not aligned to the width of the memory bus require two reads.
* Writes not aligned to the width of the memory bus require two reads and two writes.
* Accesses across [[cache-lines]] may require evicting two cache-lines.
* Accesses across page boundaries can incur two [[translation lookaside buffer|TLB]] misses and could even require swapping in both pages from disk.

==Compatibility==

The advantage of supporting unaligned access is that compilers are easier to write, because they do not need to align memory; the cost is slower access.

One way to increase performance in [[RISC]] processors, which are designed to maximize raw performance, is to require data to be loaded or stored on a word boundary. So although memory is commonly addressed in 8-bit bytes, a 32-bit integer or 64-bit floating-point number would be required to start on a 64-bit boundary on a 64-bit machine.
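To satisfy such a requirement, a compiler or allocator has to choose an alignment for each datum and round addresses up to it. The following is a minimal C sketch of the rule of thumb given earlier, not a description of any particular compiler: <code>natural_alignment</code> and <code>align_up</code> are hypothetical helper names, and alignments are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a datum size up to the next power of two, giving the alignment
   suggested by the rule of thumb (a 10-byte datum gets 16-byte alignment). */
static inline size_t natural_alignment(size_t size)
{
    /* The upper bound keeps the shift below from overflowing. */
    assert(size > 0 && size <= SIZE_MAX / 2 + 1);
    size_t align = 1;
    while (align < size)
        align <<= 1;
    return align;
}

/* Round address a up to the next multiple of align (a power of two).
   An address that is already aligned is returned unchanged. */
static inline uintptr_t align_up(uintptr_t a, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (a + (align - 1)) & ~(uintptr_t)(align - 1);
}
```

Under these assumptions, an 80-bit (10-byte) floating-point value would be given 16-byte alignment, and placing it at or after address 0x1001 would move it up to 0x1010.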
The processor could flag a fault if it were asked to load a number that was not on such a boundary, but this would result in a slower call to a routine that would need to work out which word or words contained the data and extract the equivalent value.

This caused difficulty when the team from [[Mosaic Software]] ported their [[Twin Spreadsheet]] to the [[68000]]-based [[Atari ST]]. The Intel [[8086]] architecture had no such restrictions. {{fact}} It would also cause difficulties in porting Microsoft Office to Windows NT on [[MIPS]], [[DEC Alpha|Alpha]] and [[PowerPC]] for [[NEC]], [[Digital Equipment Corporation|DEC]] and [[IBM]] respectively. Since the software was not written with such restrictions in mind, designers had to set a bit in the operating system to enable non-aligned data. However, since this bit was masked with other flags that were used elsewhere, it was impossible to keep the operating system in a state in which it would not fault on non-aligned data. These platforms ultimately failed as platforms for hosting Windows applications. {{fact}}

==Typical alignment of C structs on x86==
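The effect of member alignment on a structure's layout can be inspected directly. The following sketch assumes a C11 compiler; <code>struct sample</code> and <code>print_sample_layout</code> are made up for illustration, and the actual offsets and padding depend on the compiler and target ABI.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure mixing members of different sizes. */
struct sample {
    char   c;   /* 1 byte */
    int    i;   /* 4 bytes on x86 */
    short  s;   /* 2 bytes on x86 */
    double d;   /* 8 bytes; 4- or 8-byte aligned depending on the ABI */
};

/* The C standard guarantees that the first member starts at offset 0. */
static_assert(offsetof(struct sample, c) == 0, "first member starts at offset 0");

/* Print where the compiler placed each member. A gap between the end of
   one member and the offset of the next is padding added for alignment. */
static inline void print_sample_layout(void)
{
    printf("c: offset %zu, alignment %zu\n", offsetof(struct sample, c), alignof(char));
    printf("i: offset %zu, alignment %zu\n", offsetof(struct sample, i), alignof(int));
    printf("s: offset %zu, alignment %zu\n", offsetof(struct sample, s), alignof(short));
    printf("d: offset %zu, alignment %zu\n", offsetof(struct sample, d), alignof(double));
    printf("struct sample: size %zu, alignment %zu\n",
           sizeof(struct sample), alignof(struct sample));
}
```

Calling <code>print_sample_layout()</code> on a given compiler shows where padding was inserted; the structure's total size is always a multiple of its own alignment, so that every element of an array of such structures stays aligned.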

Data structure alignment: Difference between revisions