Data dependency: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 02:26, 27 September 2023 edit Citation bot (talk \| contribs) Bots 5,869,347 edits Removed parameters. \| Use this bot. Report bugs. \| #UCB_CommandLine ← Previous edit		Latest revision as of 09:12, 21 March 2025 edit undo Rraimii (talk \| contribs) 2 edits For clarity, added reminder that the RAW dependency cannot be removed.
(19 intermediate revisions by 13 users not shown)
Line 1: {{Short description\|Programming situation where an instruction refers to a prior instruction's data}} {{More citations needed\|date=September 2024}} A '''data dependency''' in [[computer science]] is a situation in which a [[program statement]] (instruction) refers to the data of a preceding statement. In [[compiler theory]], the technique used to discover data dependencies among statements (or instructions) is called [[dependence analysis]]. == Description == ~~There are three types of dependencies: data, name, and control.~~ ~~== Data dependencies ==~~ Assuming statement <math>S_1</math> and <math>S_2</math>, <math>S_2</math> depends on <math>S_1</math> if: Line 15: * there is a feasible run-time execution path from <math>S_1</math> to {{nobr\|1=<math>S_2</math>.}} These conditions are called Bernstein's Conditions, named after Arthur J. Bernstein.<ref>{{cite journal\|last=Bernstein\|first=Arthur J.\|title=Analysis of Programs for Parallel Processing\|journal=IEEE Transactions on Electronic Computers\|date=1 October 1966\|volume=EC-15\|issue=5\|pages=757–763\|doi=10.1109/PGEC.1966.264565}}</ref> ~~This Condition is called Bernstein Condition, named by A. J. Bernstein.~~ Three cases exist: Line 23: * Output dependence: <math>O(S_1) \cap O(S_2) \neq \varnothing</math>, <math>S_1 \rightarrow S_2</math> and both write the same memory ___location. == Types == ~~=== Flow dependency (True dependency) ===~~ === Data hazards === ~~A Flow dependency, also known as a data dependency or true dependency or read-after-write (RAW), occurs when an instruction depends on the result of a previous instruction.~~ Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in [[Race condition\|race conditions]] (also termed race hazards). There are three situations in which a data hazard can occur: # read after write (RAW), a ''true dependency'' # write after read (WAR), an ''anti-dependency'' # write after write (WAW), an ''output dependency'' # read after read (RAR), a false dependency Read after read (RAR) is not a hazard case. Consider two instructions {{Mono\|i1}} and {{Mono\|i2}}, with {{Mono\|i1}} occurring before {{Mono\|i2}} in program order. ==== Read after write (RAW) ==== ({{Mono\|i2}} tries to read a source before {{Mono\|i1}} writes to it) A read after write (RAW) data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved. This can occur because even though an instruction is executed after a prior instruction, the prior instruction has been processed only partly through the pipeline. ===== Example ===== For example: i1. '''R2''' <- R5 + R8 i2. R4 <- '''R2''' + R8 The first instruction is calculating a value to be saved in register {{Mono\|R2}}, and the second is going to use this value to compute a result for register {{Mono\|R4}}. However, in a [[Pipeline (computing)\|pipeline]], when operands are fetched for the 2nd operation, the results from the first have not yet been saved, and hence a data dependency occurs. A data dependency occurs with instruction {{Mono\|i2}}, as it is dependent on the completion of instruction {{Mono\|i1}}. ==== Write after read (WAR) ==== ({{Mono\|i2}} tries to write a destination before it is read by {{Mono\|i1}}) A write after read (WAR) data hazard represents a problem with concurrent execution. ===== Example ===== For example: i1. R4 <- R1 + '''R5''' i2. '''R5''' <- R1 + R2 In any situation with a chance that {{Mono\|i2}} may finish before {{Mono\|i1}} (i.e., with concurrent execution), it must be ensured that the result of register {{Mono\|R5}} is not stored before {{Mono\|i1}} has had a chance to fetch the operands. ==== Write after write (WAW) ==== ({{Mono\|i2}} tries to write an operand before it is written by {{Mono\|i1}}) A write after write (WAW) data hazard may occur in a [[Concurrent computing\|concurrent execution]] environment. ===== Example ===== For example: i1. '''R5''' <- R4 + R7 i2. '''R5''' <- R1 + R3 The write back (WB) of {{Mono\|i2}} must be delayed until {{Mono\|i1}} finishes executing. === True dependency (read-after-write) === A true dependency, also known as a '''flow''' '''dependency''' or '''data dependency''', occurs when an instruction depends on the result of a previous instruction. A violation of a true dependency leads to a '''read-after-write (RAW)''' hazard. 1. A = 3 Line 31 ⟶ 74: 3. C = B Instruction 3 is truly dependent on instruction 2, as the final value of C depends on the instruction updating B. Instruction 2 is truly dependent on instruction 1, as the final value of B depends on the instruction updating A. Since instruction 3 is truly dependent upon instruction 2 and instruction 2 is truly dependent on instruction 1, instruction 3 is also truly dependent on instruction 1. [[Instruction level parallelism]] is therefore not an option in this example.<ref name="architecture">{{cite book \| author=[[John L. Hennessy]]; [[David A. Patterson (scientist)\|David A. Patterson]] \| title=Computer Architecture: a quantitative approach (3rd ed.) \| publisher=[[Morgan Kaufmann]] \| year=2003 \| isbn=1-55860-724-2}}</ref> <ref name="architecture">{{cite book \| author=[[John L. Hennessy]]; [[David A. Patterson (scientist)\|David A. Patterson]] \| title=Computer Architecture: a quantitative approach (3rd ed.) \| publisher=[[Morgan Kaufmann]] \| year=2003 \| isbn=1-55860-724-2}}</ref> === Anti-dependency (write-after-read) === An anti-dependency~~, also known as write-after-read (WAR),~~ occurs when an instruction requires a value that is later updated. A violation of an anti-dependency leads to a '''write-after-read (WAR)''' hazard. In the following example, instruction 2 anti-depends on instruction 3 — the ordering of these instructions cannot be changed, nor can they be executed in parallel (possibly changing the instruction ordering), as this would affect the final value of A. 1. B = 3 Line 42 ⟶ 86: 3. B = 7 Example : MUL R3,R1,R2 ADD R2,R5,R6 It is clear that there is anti-dependence between these 2 instructions. At first we read R2 then in second instruction we are Writing a new value for it. ~~value for it.~~ An anti-dependency is an example of a ''name dependency''. That is, renaming of variables could remove the dependency, as in the next example: Line 56 ⟶ 99: 3. B = 7 A new variable, B2, has been declared as a copy of B in a new instruction, instruction N. The anti-dependency between 2 and 3 has been removed, meaning that these instructions may now be executed in parallel. However, the modification has introduced a new dependency: instruction 2 is now truly dependent on instruction N, which is truly dependent upon instruction 1. As flow dependencies, these new dependencies are impossible to safely remove. ~~<ref name="architecture"/>~~ Note that there is still a read-after-write dependency: instruction 2 is truly dependent on instruction N, which is truly dependent upon instruction 1. This dependency existed in the original version, where instruction 2 was truly dependent on instruction 1. This dependency cannot be safely removed.<ref name="architecture"/> ~~=== Output dependency ===~~ === Output dependency (write-after-write) === An output dependency, also known as write-after-write (WAW), occurs when the ordering of instructions will affect the final output value of a variable. In the example below, there is an output dependency between instructions 3 and 1 — changing the ordering of instructions in this example will change the final value of A, thus these instructions cannot be executed in parallel. An output dependency occurs when the ordering of instructions will affect the final output value of a variable. A violation of an output dependency leads to an '''write-after-write (WAW)''' hazard. In the example below, there is an output dependency between instructions 3 and 1 — changing the ordering of instructions in this example will change the final value of A, thus these instructions cannot be executed in parallel. 1. B = 3 Line 67 ⟶ 113: 3. B = 7 As with anti-dependencies, output dependencies are ''name dependencies''. That is, they may be removed through renaming of variables, as in the below modification of the above example: 1. B2 = 3 Line 73 ⟶ 119: 3. B = 7 ==Implications== A commonly used naming convention for data dependencies is the following: Read-after-Write or RAW (flow dependency), Write-After-Read or WAR (anti-dependency), or Write-after-Write or WAW (output dependency). Conventional programs are written assuming the [[sequential execution model]]. Under this model, instructions execute one after the other, atomically (i.e., at any given point in time, only one instruction is executed) and in the order specified by the program. ~~<ref name="architecture"/>~~ However, dependencies among statements or instructions may hinder parallelism — parallel execution of multiple instructions, either by a parallelizing compiler or by a processor exploiting [[instruction-level parallelism]]. Recklessly executing multiple instructions without considering related dependences may cause danger of getting wrong results, namely [[Hazard (computer architecture)\|hazards]]. ~~==Control dependency==~~ An instruction B has a control dependency on a preceding instruction A if the outcome of A determines whether B should be executed or not. In the following example, the instruction <math>S_2</math> has a control dependency on instruction <math>S_1</math>. However, <math>S_3</math> does not depend on <math>S_1</math> because <math>S_3</math> is always executed irrespective of the outcome of <math>S_1</math>. ~~S1. if (a == b)~~ ~~S2. a = a + b~~ ~~S3. b = a + b~~ ==Relevance in computing== ~~Intuitively, there is control dependence between two statements A and B if~~ * B could be possibly executed after A * The outcome of the execution of A will determine whether B will be executed or not. Data dependencies are relevant in various areas of computing, particularly in processor design, compiler construction, parallel computing, and concurrent programming. ~~A typical example is that there are control dependences between the condition part of an if statement and the statements in its true/false bodies.~~ === Processor design === ~~A formal definition of control dependence can be presented as follows:~~ * [[Instruction pipelining]]: In pipelined processors, multiple instruction are executed in parallel in multiple pipeline stages. Thereby, data dependencies between [[Processor register\|registers]] must be respected and handled in the processor pipeline. Most relevant are true dependencies that are resolved e.g. by [[Pipeline stall\|stalling the pipeline]] or [[operand forwarding]]. ~~A statement <math>S_2</math> is said to be control dependent on another statement <math>S_1</math> [[If and only if\|iff]]~~ * [[Out-of-order execution]]: Modern processors often execute instructions out of their original order to improve performance. Thereby, name dependencies between registers must be respected (in addition to data dependencies) and are resolved e.g. by [[register renaming]] or [[scoreboarding]]. Data dependencies are also relevant for memory accesses and must be respected by [[memory disambiguation]] techniques that execute [[Primary storage\|memory]] access [[Instruction (computer science)\|instructions]] (loads and stores) out of program order. * there exists a path <math>P</math> from <math>S_1</math> to <math>S_2</math> such that every statement <math>S_i</math> ≠ <math>S_1</math> within <math>P</math> will be followed by <math>S_2</math> in each possible path to the end of the program and * <math>S_1</math> will not necessarily be followed by <math>S_2</math>, i.e. there is an execution path from <math>S_1</math> to the end of the program that does not go through <math>S_2</math>. === Compiler construction === ~~Expressed with the help of (post-)dominance the two conditions are equivalent to~~ Data dependencies are relevant for various [[compiler optimizations]], e.g. * <math>S_2</math> post-dominates all <math>S_i</math> * <math>S_2</math> does not post-dominate <math>S_1</math> * [[Instruction scheduling]]: Compilers must schedule instructions in a way that respects data dependencies. This is crucial in optimizing compilers that rearrange code for better performance. ~~=== Construction of control dependences ===~~ * [[Loop transformation\|Loop transformations]]: In optimizing loops, compilers need to consider data dependencies to apply transformations like loop unrolling, fusion, or tiling without changing the semantics of the program. Control dependences are essentially the [[Dominator (graph theory)\|dominance frontier]] in the reverse graph of the [[control-flow graph]] (CFG).<ref>{{Cite book\|publisher = ACM\|date = 1989-01-01\|___location = New York, NY, USA\|isbn = 0897912942\|pages = 25–35\|doi = 10.1145/75277.75280\|first1 = R.\|last1 = Cytron\|first2 = J.\|last2 = Ferrante\|first3 = B. K.\|last3 = Rosen\|first4 = M. N.\|last4 = Wegman\|first5 = F. K.\|last5 = Zadeck\| title=Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '89 \| chapter=An efficient method of computing static single assignment form \| s2cid=8301431 }}</ref> Thus, one way of constructing them, would be to construct the post-dominance frontier of the CFG, and then reversing it to obtain a control dependence graph. * [[Code motion]]: When a compiler considers moving a piece of code, it must ensure that data dependencies are not violated. ~~The following is a pseudo-code for constructing the post-dominance frontier:~~ ~~for each X in a bottom-up traversal of the post-dominator tree do:~~ ~~PostDominanceFrontier(X) ← ∅~~ ~~for each Y ∈ Predecessors(X) do:~~ ~~if immediatePostDominator(Y) ≠ X:~~ ~~then PostDominanceFrontier(X) ← PostDominanceFrontier(X) ∪ {Y}~~ ~~done~~ ~~for each Z ∈ Children(X) do:~~ ~~for each Y ∈ PostDominanceFrontier(Z) do:~~ ~~if immediatePostDominator(Y) ≠ X:~~ ~~then PostDominanceFrontier(X) ← PostDominanceFrontier(X) ∪ {Y}~~ ~~done~~ ~~done~~ ~~done~~ ~~Here, Children(X) is the set of nodes in the CFG that are immediately post-dominated by X, and Predecessors(X) are the set of nodes in the CFG that directly precede X in the CFG.~~ ~~Note that node X shall be processed only after all its Children have been processed.~~ ~~Once the post-dominance frontier map is computed, reversing it will result in a map from the nodes in the CFG to the nodes that have a control dependence on them.~~ ~~==Implications==~~ Conventional programs are written assuming the [[sequential execution model]]. Under this model, instructions execute one after the other, atomically (i.e., at any given point in time, only one instruction is executed) and in the order specified by the program. However, dependencies among statements or instructions may hinder parallelism — parallel execution of multiple instructions, either by a parallelizing compiler or by a processor exploiting [[instruction-level parallelism]]. Recklessly executing multiple instructions without considering related dependences may cause danger of getting wrong results, namely [[Hazard (computer architecture)\|hazards]]. ==See also== ~~==Data Dependency Management==~~ * [[Dependency analysis]] In 2016, Edgard Marx, a computer scientist from Leipzig University, coined the term Data Dependency Management<ref>{{Cite web\|title=Data Dependency Management\|url=https://github.com/AKSW/KBox\|access-date=2023-07-13\|website=github.com}}</ref> to refer to the systems that deal with the management of data. * [[Control dependency]] Data Dependency Management systems are designed to facilitate the deployment and management of data on the cloud, personal computers, or smart devices (edge). Data Dependency Management frameworks can be used to describe how the data was conceived, licensing as well as its dependencies. The concept of data dependency management comes from software package dependency management tools such as npm for JavaScript, [[RubyGems\|gem]] for Ruby, and [[NuGet]] for .NET. Their rationale is to allow users to manage the software dependency on data, such as machine learning models for data-driven applications. They are useful to publish, locate, and install data packages. A typical example of a data dependency management frameworks are Hugging Face, KBox,<ref>{{Cite journal\|title=KBox\|url=https://ieeexplore.ieee.org/document/7889519\|access-date=2023-07-13\|website=gieeexplore.ieee.org\|date=January 2017 \|pages=125–132 \|doi=10.1109/ICSC.2017.77 \|s2cid=14980310 }}</ref> among others. * [[Loop_dependence_analysis#Data_dependence\|Loop dependence analysis § Data dependence]] * [[Hazard (computer architecture)#Data hazards\|Hazard (computer architecture) § Data hazards]] ==References==