{{Short description|Programming situation where an instruction refers to a prior instruction's data}}
{{More citations needed|date=September 2024}}
A '''data dependency''' in [[computer science]] is a situation in which a [[program statement]] (instruction) refers to the data of a preceding statement. In [[compiler theory]], the technique used to discover data dependencies among statements (or instructions) is called [[dependence analysis]].
 
== Description ==
There are three types of dependencies: data, name, and control.
 
== Data dependencies ==
 
Assuming statements <math>S_1</math> and <math>S_2</math>, <math>S_2</math> depends on <math>S_1</math> if:

:<math>\left[I(S_1) \cap O(S_2)\right] \cup \left[O(S_1) \cap I(S_2)\right] \cup \left[O(S_1) \cap O(S_2)\right] \neq \varnothing</math>

where:

* <math>I(S_i)</math> is the set of memory locations read by {{nobr|1=<math>S_i</math>,}}
* <math>O(S_i)</math> is the set of memory locations written by {{nobr|1=<math>S_i</math>,}} and
* there is a feasible run-time execution path from <math>S_1</math> to {{nobr|1=<math>S_2</math>.}}
 
These conditions are called Bernstein's Conditions, named after Arthur J. Bernstein.<ref>{{cite journal|last=Bernstein|first=Arthur J.|title=Analysis of Programs for Parallel Processing|journal=IEEE Transactions on Electronic Computers|date=1 October 1966|volume=EC-15|issue=5|pages=757–763|doi=10.1109/PGEC.1966.264565}}</ref>
 
Three cases exist:
 
* Anti-dependence: <math>I(S_1) \cap O(S_2) \neq \varnothing</math>, <math>S_1 \rightarrow S_2</math> and <math>S_1</math> reads something before <math>S_2</math> overwrites it
* Flow (data) dependence: <math>O(S_1) \cap I(S_2) \neq \varnothing</math>, <math>S_1 \rightarrow S_2</math> and <math>S_1</math> writes something before it is read by <math>S_2</math>
* Output dependence: <math>O(S_1) \cap O(S_2) \neq \varnothing</math>, <math>S_1 \rightarrow S_2</math> and both write the same memory ___location.
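The three cases can be checked mechanically from the read and write sets. The following Python sketch (the `classify` function and the statement sets are illustrative, not part of any standard library) applies Bernstein's conditions to a pair of statements:

```python
# Classify the dependence between statements S1 and S2 (S1 before S2)
# from their read sets I(S) and write sets O(S), per Bernstein's conditions.
def classify(i1, o1, i2, o2):
    deps = []
    if o1 & i2:
        deps.append("flow")    # O(S1) ∩ I(S2): S1 writes what S2 reads
    if i1 & o2:
        deps.append("anti")    # I(S1) ∩ O(S2): S1 reads what S2 overwrites
    if o1 & o2:
        deps.append("output")  # O(S1) ∩ O(S2): both write the same ___location
    return deps

# S1: A = B + C reads {B, C} and writes {A};
# S2: D = A + B reads {A, B} and writes {D}.
print(classify({"B", "C"}, {"A"}, {"A", "B"}, {"D"}))  # ['flow']
```

If none of the three intersections is non-empty, the returned list is empty and the two statements are independent.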
 
=== Data hazards ===
Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in [[Race condition|race conditions]] (also termed race hazards). There are three situations in which a data hazard can occur:
 
# read after write (RAW), a ''true dependency''
# write after read (WAR), an ''anti-dependency''
# write after write (WAW), an ''output dependency''

Read after read (RAR) is not a hazard case, since neither instruction modifies the ___location.
 
Consider two instructions {{Mono|i1}} and {{Mono|i2}}, with {{Mono|i1}} occurring before {{Mono|i2}} in program order.
 
==== Read after write (RAW) ====
({{Mono|i2}} tries to read a source before {{Mono|i1}} writes to it) A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved. This can occur because even though an instruction is executed after a prior instruction, the prior instruction has been processed only partly through the pipeline.
 
===== Example =====
For example:
i1. '''R2''' <- R5 + R8
i2. R4 <- '''R2''' + R8
The first instruction calculates a value to be saved in register {{Mono|R2}}, and the second is going to use this value to compute a result for register {{Mono|R4}}. However, in a [[Pipeline (computing)|pipeline]], when operands are fetched for the second operation, the results from the first have not yet been saved, and hence a data hazard occurs.
 
A data dependency occurs with instruction {{Mono|i2}}, as it is dependent on the completion of instruction {{Mono|i1}}.
 
==== Write after read (WAR) ====
({{Mono|i2}} tries to write a destination before it is read by {{Mono|i1}}) A write after read (WAR) data hazard represents a problem with concurrent execution.
 
===== Example =====
For example:
i1. R4 <- R1 + '''R5'''
i2. '''R5''' <- R1 + R2
In any situation with a chance that {{Mono|i2}} may finish before {{Mono|i1}} (i.e., with concurrent execution), it must be ensured that the result of register {{Mono|R5}} is not stored before {{Mono|i1}} has had a chance to fetch the operands.
 
==== Write after write (WAW) ====
({{Mono|i2}} tries to write an operand before it is written by {{Mono|i1}}) A write after write (WAW) data hazard may occur in a [[Concurrent computing|concurrent execution]] environment.
 
===== Example =====
For example:
i1. '''R5''' <- R4 + R7
i2. '''R5''' <- R1 + R3
The write back (WB) of {{Mono|i2}} must be delayed until {{Mono|i1}} finishes executing.
 
=== True dependency (read-after-write) ===
 
A true dependency, also known as a '''flow''' '''dependency''' or '''data dependency''', occurs when an instruction depends on the result of a previous instruction. A violation of a true dependency leads to a '''read-after-write (RAW)''' hazard.
 
1. A = 3
2. B = A
3. C = B
 
Instruction 3 is truly dependent on instruction 2, as the final value of C depends on the instruction updating B. Instruction 2 is truly dependent on instruction 1, as the final value of B depends on the instruction updating A. Since instruction 3 is truly dependent upon instruction 2 and instruction 2 is truly dependent on instruction 1, instruction 3 is also truly dependent on instruction 1. [[Instruction level parallelism]] is therefore not an option in this example.<ref name="architecture">{{cite book | author=[[John L. Hennessy]]; [[David A. Patterson (scientist)|David A. Patterson]] | title=Computer Architecture: a quantitative approach (3rd ed.) | publisher=[[Morgan Kaufmann]] | year=2003 | isbn=1-55860-724-2}}</ref>
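The chain above can be sketched in Python; hoisting statement 3 above statement 2 violates the RAW dependency on B and reads a stale value (the initial value 0 for B is an assumption for illustration):

```python
# Program order: each statement reads the previous statement's result.
A = 3
B = A   # truly dependent on the statement writing A
C = B   # truly dependent on the statement writing B
assert C == 3

# Hoisting statement 3 above statement 2 violates the RAW dependency:
A = 3
B = 0   # assumed prior value of B, for illustration
C = B   # statement 3 executed early reads a stale B
B = A
assert C == 0  # wrong result: C should be 3
```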
 
=== Anti-dependency (write-after-read) ===
 
An anti-dependency, also known as write-after-read (WAR), occurs when an instruction requires a value that is later updated. A violation of an anti-dependency leads to a '''write-after-read (WAR)''' hazard.

In the following example, instruction 2 anti-depends on instruction 3: the ordering of these instructions cannot be changed, nor can they be executed in parallel, as this would affect the final value of A.
 
1. B = 3
2. A = B + 1
3. B = 7
 
Example:
MUL R3,R1,R2
ADD R2,R5,R6

There is an anti-dependence between these two instructions: the first instruction reads {{Mono|R2}}, and the second instruction then writes a new value to it.
 
An anti-dependency is an example of a ''name dependency''. That is, renaming of variables could remove the dependency, as in the next example:
1. B = 3
N. B2 = B
2. A = B2 + 1
3. B = 7
 
A new variable, B2, has been declared as a copy of B in a new instruction, instruction N. The anti-dependency between 2 and 3 has been removed, meaning that these instructions may now be executed in parallel.
 
Note that there is still a read-after-write dependency: instruction 2 is truly dependent on instruction N, which is truly dependent upon instruction 1. This dependency existed in the original version, where instruction 2 was truly dependent on instruction 1. This dependency cannot be safely removed.<ref name="architecture"/>
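The renaming transformation can be sketched in Python (B2 is the copy introduced by instruction N; the reordering shown is one that the renaming makes legal):

```python
# Renamed program: instruction 3 (B = 7) no longer conflicts with
# instruction 2, because instruction 2 reads the copy B2.
B = 3        # instruction 1
B2 = B       # instruction N: copy of B
B = 7        # instruction 3, now safe to run before instruction 2
A = B2 + 1   # instruction 2 reads the copy, not the new B
assert (A, B) == (4, 7)
```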
=== Output dependency (write-after-write) ===

An output dependency, also known as write-after-write (WAW), occurs when the ordering of instructions will affect the final output value of a variable. A violation of an output dependency leads to a '''write-after-write (WAW)''' hazard.

In the example below, there is an output dependency between instructions 3 and 1; changing the ordering of instructions in this example will change the final value of A, thus these instructions cannot be executed in parallel.
 
1. B = 3
2. A = B + 1
3. B = 7
 
As with anti-dependencies, output dependencies are ''name dependencies''. That is, they may be removed through renaming of variables, as in the below modification of the above example:
 
1. B2 = 3
2. A = B2 + 1
3. B = 7
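As a sketch (in Python, with the same instruction numbering as above), the renamed sequence leaves the final values unchanged while removing the write-write conflict on B:

```python
# Original order, for reference: B = 3; A = B + 1; B = 7 gives A == 4, B == 7.
# Renamed version: instruction 1 writes B2 instead, so instructions 1 and 3
# no longer write the same ___location and may be reordered or run in parallel.
B2 = 3       # instruction 1, renamed target
A = B2 + 1   # instruction 2
B = 7        # instruction 3
assert (A, B) == (4, 7)
```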
 
A commonly used naming convention for data dependencies is the following: Read-after-Write or RAW (flow dependency), Write-After-Read or WAR (anti-dependency), and Write-after-Write or WAW (output dependency).<ref name="architecture"/>

==Implications==
Conventional programs are written assuming the [[sequential execution model]]. Under this model, instructions execute one after the other, atomically (i.e., at any given point in time, only one instruction is executed) and in the order specified by the program.

However, dependencies among statements or instructions may hinder parallelism, i.e. the parallel execution of multiple instructions, either by a parallelizing compiler or by a processor exploiting [[instruction-level parallelism]]. Recklessly executing multiple instructions without considering related dependences risks producing wrong results, namely [[Hazard (computer architecture)|hazards]].
==Control Dependency==
An instruction B has a control dependency on a preceding instruction A if the outcome of A determines whether B should be executed or not. In the following example, instruction <math>S_2</math> has a control dependency on instruction <math>S_1</math>. However, <math>S_3</math> does not depend on <math>S_1</math> because <math>S_3</math> is always executed irrespective of the outcome of <math>S_1</math>.

S1. if (a == b)
S2.    a = a + b
S3. b = a + b

Intuitively, there is a control dependence between two statements A and B if
* B could possibly be executed after A, and
* the outcome of the execution of A determines whether B will be executed or not.

A typical example is the control dependences between the condition part of an if statement and the statements in its true/false bodies.

A formal definition of control dependence can be presented as follows:

A statement <math>S_2</math> is said to be control dependent on another statement <math>S_1</math> [[If and only if|iff]]
* there exists a path <math>P</math> from <math>S_1</math> to <math>S_2</math> such that every statement <math>S_i \neq S_1</math> within <math>P</math> will be followed by <math>S_2</math> in each possible path to the end of the program, and
* <math>S_1</math> will not necessarily be followed by <math>S_2</math>, i.e. there is an execution path from <math>S_1</math> to the end of the program that does not go through <math>S_2</math>.

Expressed with the help of (post-)dominance, the two conditions are equivalent to
* <math>S_2</math> post-dominates all <math>S_i</math>, and
* <math>S_2</math> does not post-dominate <math>S_1</math>.

=== Construction of Control Dependences ===
Control dependences are essentially the [[Dominator (graph theory)|dominance frontier]] in the reverse graph of the [[control flow graph]] (CFG).<ref>{{Cite journal|title = An Efficient Method of Computing Static Single Assignment Form|url = http://doi.acm.org/10.1145/75277.75280|publisher = ACM|journal = Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages|date = 1989-01-01|___location = New York, NY, USA|isbn = 0897912942|pages = 25–35|series = POPL '89|doi = 10.1145/75277.75280|first = R.|last = Cytron|first2 = J.|last2 = Ferrante|first3 = B. K.|last3 = Rosen|first4 = M. N.|last4 = Wegman|first5 = F. K.|last5 = Zadeck}}</ref> Thus, one way of constructing them would be to construct the post-dominance frontier of the CFG, and then reverse it to obtain a control dependence graph.

The following is pseudo-code for constructing the post-dominance frontier:

 for each X in a bottom-up traversal of the post-dominator tree do:
     PostDominanceFrontier(X) ← ∅
     for each Y ∈ Predecessors(X) do:
         if immediatePostDominator(Y) ≠ X:
             PostDominanceFrontier(X) ← PostDominanceFrontier(X) ∪ {Y}
     done
     for each Z ∈ Children(X) do:
         for each Y ∈ PostDominanceFrontier(Z) do:
             if immediatePostDominator(Y) ≠ X:
                 PostDominanceFrontier(X) ← PostDominanceFrontier(X) ∪ {Y}
         done
     done
 done

Here, Children(X) is the set of nodes in the CFG whose immediate post-dominator is X (i.e., the children of X in the post-dominator tree), and Predecessors(X) is the set of nodes in the CFG that directly precede X in the CFG.

Once the post-dominance frontier map is computed, reversing it will result in a map from the nodes in the CFG to the nodes that have a control dependence on them.

==Relevance in computing==
Data dependencies are relevant in various areas of computing, particularly in processor design, compiler construction, parallel computing, and concurrent programming.

=== Processor design ===
* [[Instruction pipelining]]: In pipelined processors, multiple instructions are executed in parallel in multiple pipeline stages. Data dependencies between [[Processor register|registers]] must therefore be respected and handled in the processor pipeline. Most relevant are true dependencies, which are resolved e.g. by [[Pipeline stall|stalling the pipeline]] or by [[operand forwarding]].
* [[Out-of-order execution]]: Modern processors often execute instructions out of their original order to improve performance. Name dependencies between registers must therefore be respected (in addition to data dependencies) and are resolved e.g. by [[register renaming]] or [[scoreboarding]]. Data dependencies are also relevant for memory accesses and must be respected by [[memory disambiguation]] techniques that execute [[Primary storage|memory]] access [[Instruction (computer science)|instructions]] (loads and stores) out of program order.

=== Compiler construction ===
Data dependencies are relevant for various [[compiler optimizations]], e.g.
* [[Instruction scheduling]]: Compilers must schedule instructions in a way that respects data dependencies. This is crucial in optimizing compilers that rearrange code for better performance.
* [[Loop transformation|Loop transformations]]: In optimizing loops, compilers need to consider data dependencies to apply transformations such as loop unrolling, fusion, or tiling without changing the semantics of the program.
* [[Code motion]]: When a compiler considers moving a piece of code, it must ensure that data dependencies are not violated.

==See also==
* [[Dependency analysis]]
* [[Control dependency]]
* [[Loop_dependence_analysis#Data_dependence|Loop dependence analysis § Data dependence]]
* [[Hazard (computer architecture)#Data hazards|Hazard (computer architecture) § Data hazards]]
 
 
==References==