Predication (computer architecture): Difference between revisions

Content deleted Content added
Advantages: mention use of condition codes as predicates
SIMD, SIMT and vector predication: ILLIAC IV had masked Predicated SWAR! only 2 bits (2x32 or 1x64) but still!
Tags: Mobile edit Mobile web edit Advanced mobile edit
 
(38 intermediate revisions by 13 users not shown)
Line 1:
{{Short description|Form of conditionals in computer programming}}
{{More citations needed|date=March 2014}}
{{Distinguish|Branchbranch prediction}}
In [[computer science]], '''predication''' is an [[computer architecture|architectural]] feature that provides an alternative to conditional transfer of [[control flow|control]], implemented by [[instruction (computer science)|machine instructions]] such as conditional [[branch (computer science)|branch]], conditional [[subroutine|call]], conditional [[return statement|return]], and [[branch table]]s. Predication works by executing instructions from both paths of the branch and only permitting those instructions from the taken path to modify architectural state.<ref name="rvinyard">{{cite web
|url= https://www.cs.nmsu.edu/~rvinyard/itanium/predication.htm
|title= Predication
|date= 2000-04-26 |accessdate= 2014-04-22
|author= Rick Vinyard |website= cs.nmsu.edu
}}</ref> The instructions from the taken path are permitted to modify architectural state because they have been associated (''predicated'') with a ''predicate'', a [[Boolean data type|Boolean value]] used by the instruction to control whether the instruction is allowed to modify the architectural state or not.
 
In [[computer architecture]], '''predication''' is a feature that provides an alternative to [[Conditional (computer programming)|conditional]] transfer of [[control flow|control]], as implemented by conditional [[branch (computer science)|branch]] machine [[instruction (computer science)|instructions]]. Predication works by having conditional (''predicated'') non-branch instructions associated with a ''predicate'', a [[Boolean data type|Boolean value]] used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction (a conditional move) will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to [[Execution (computing)|execute]] based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.<ref name="rvinyard">{{cite web |author=Rick Vinyard |date=2000-04-26 |title=Predication |url=https://www.cs.nmsu.edu/~rvinyard/itanium/predication.htm |archive-url=https://web.archive.org/web/20150420152310/https://www.cs.nmsu.edu/~rvinyard/itanium/predication.htm |archive-date=20 Apr 2015 |url-status=dead |access-date=2014-04-22 |website=cs.nmsu.edu}}</ref>
Put simply: if the Register Condition bit is set, the instruction is executed; if the bit is clear, it is not. With Condition Register's contents being dynamic, predicated instruction execution is likewise dynamic, i.e. (conditionally) determined at runtime.
 
[[Vector processors]], some [[SIMD]] ISAs (such as [[AVX2]] and [[AVX-512]]]) and [[GPU]]s in general make heavy use of predication, applying one bit of a conditional ''mask Vectorvector'' to the corresponding elements in the Vectorvector registers being processed, whereas scalar predication in scalar instruction sets only need the one predicate bit. Where Predicatepredicate Masksmasks become particularly powerful in [[Vectorvector processing]] is if an ''array'' of [[Condition_code_registercondition_code_register|Conditioncondition Codescodes]], one per Vectorvector element, may feed back into Predicatepredicate Masksmasks that are then applied to subsequent Vectorvector instructions.
 
==Overview==
Line 19 ⟶ 14:
<syntaxhighlight lang="c">
if condition
{dosomethingdo_something}
else
{dosomethingelsedo_something_else};
</syntaxhighlight>
 
Line 27 ⟶ 22:
 
<syntaxhighlight lang="c">
branch_if_condition_to label1
branch-if-condition to label1
do_something_else
dosomethingelse
branch-to branch_always_to label2
label1:
do_something
dosomething
label2:
...
</syntaxhighlight>
 
Line 39 ⟶ 34:
 
<syntaxhighlight lang="c">
(condition) dosomething do_something
(not condition) dosomethingelsedo_something_else
</syntaxhighlight>
 
Besides eliminating branches, less code is needed in total, provided the architecture provides predicated instructions. While this does not guarantee faster execution in general, it will if the <ttcode>dosomethingdo_something</ttcode> and <ttcode>dosomethingelsedo_something_else</ttcode> blocks of code are short enough.
 
Predication's simplest form is ''partial predication'', where the architecture has ''conditional move'' or ''conditional select'' instructions. Conditional move instructions write the contents of one register over another only if the predicate's value is true, whereas conditional select instructions choose which of two registers has its contents written to a third based on the predicate's value. A more generalized and capable form is ''full predication''. Full predication has a set of predicate registers for storing predicates (which allows multiple nested or sequential branches to be simultaneously eliminated) and most instructions in the architecture have aan (optional) register specifier field to specify which predicate register supplies the predicate.<ref>{{cite journalconference |last1=Mahlke|first1=Scott A.|last2=Hank|first2=Richard E.|last3=McCormick|first3=James E.|last4=August|first4=David I.|last5=Hwn|first5=Wen-mei W.|title=A Comparison of Full and Partial Predicated Execution Support for ILP Processors |journalconference=The 22nd International Symposium on Computer Architecture|date=, 22–24 June 1995 |year=1995 |doi=10.1145/223982.225965 |isbn=0-89791-698-0 |citeseerx=10.1.1.19.3187}}</ref>
 
==Advantages==
Line 56 ⟶ 51:
 
==Disadvantages==
Predication's primary drawback is in increased encoding space. In typical implementations, every instruction reserves a bitfield for the predicate specifying under what conditions that instruction should have an effect. When available memory is limited, as on [[embedded system|embedded devices]], this space cost can be prohibitive. However, some architectures such as [[Thumb-2]] are able to avoid this issue (see below). Other detriments are the following:<ref name="Fisher04">{{cite book |first1=Joseph A. |last1=Fisher, |first2=Paolo |last2=Faraboschi, |first3=Cliff |last3=Young (|year=2004) ''|title=Embedded Computing - A VLIW Approach to Architecture, Compilers, and Tools''. Page |page=172 |chapter-url=https://books.google.com/books?id=DbOa0p1wRD4C&pg=PA172 |chapter=4.5.2 Predication § Predication in the Embedded Domain |isbn=9780080477541 |publisher=Elsevier}}</ref>
*Predication complicates the hardware by adding levels of [[control unit|logic]] to critical [[datapath|paths]] and potentially degrades clock speed.
*A predicated block includes cycles for all operations, so shorter [[control-flow graph|paths]] may take longer and be penalized.
* An extra register read is required. A non-predicated ADD would read two registers from a register file, where a Predicated ADD would need to also read the predicate register file. This increases Hazards in [[Out-of-order execution]].
*Predication is not usually speculated and causes a longer dependency chain. For ordered data this translates to a performance loss compared to a predictable branch.<ref>{{cite web |last1=Cordes |first1=Peter |title=assembly - How does Out of Order execution work with conditional instructions, Ex: CMOVcc in Intel or ADDNE (Add not equal) in ARM |url=https://stackoverflow.com/a/50960323 |website=Stack Overflow |quote=Unlike with control dependencies (branches), they don't predict or speculate what the flags will be, so a cmovcc instead of a jcc can create a loop-carried dependency chain and end up being worse than a predictable branch. [https://stackoverflow.com/questions/50959808 gcc optimization flag -O3 makes code slower than -O2] is an example of that.}}</ref>
 
Predication is most effective when paths are balanced or when the longest path is the most frequently executed,<ref name="Fisher04"/> but determining such a path is very difficult at compile time, even in the presence of [[profiling (computer programming)|profiling information]].
Line 74 ⟶ 70:
In the [[ARM architecture]], the original 32-bit instruction set provides a feature called ''conditional execution'' that allows most instructions to be predicated by one of 13 predicates that are based on some combination of the four condition codes set by the previous instruction. ARM's [[ARM architecture#Thumb|Thumb]] instruction set (1994) dropped conditional execution to reduce the size of instructions so they could fit in 16 bits, but its successor, [[ARM architecture#Thumb-2|Thumb-2]] (2003) overcame this problem by using a special instruction which has no effect other than to supply predicates for the following four instructions. The 64-bit instruction set introduced in ARMv8-A (2011) replaced conditional execution with conditional selection instructions.
 
== SIMD, SIMT and vector predication ==
Some [[SIMD]] instruction sets, like AVX2, have the ability to use a logical [[Mask (computing)|mask]] to conditionally load/store values to memory, a parallel form of the conditional move. This form of predication is also used in [[single instruction, multiple threads]] GPU computing, where a computing core executes the code of a branch with unmet condition but does not commit the results computed in that execution path.
{{see also|Single instruction, multiple threads}}
 
Some [[SIMD within a register]] instruction sets, like AVX2, have the ability to use a logical [[Mask (computing)|mask]] to conditionally load/store values to memory, in a parallel form of the conditional move. ThisThey form of predication ismay also usedapply inindividual [[singlemask instruction,bits multipleto threads]]individual GPUarithmetic computing,units whereexecuting a computingparallel coreoperation. executesOne thepredicate codemask ofbit ais branchavailable withfor unmeteach conditionsub-word but does not commitof the resultsSWAR computedregister inor thateach executionload/store pathvalue.
This form of multi-bit predication is also used in [[vector processors]] at the element level (synonymous with SWAR sub-words):
 
<syntaxhighlight lang="c">
for each (sub-word i) of SWAR (or Vector) register
(condition-maskbit i) do_something(sub-word i)
(not condition-maskbit i) do_something_else(sub-word i)
</syntaxhighlight>
 
Masking is an integral part of [[Flynn's taxonomy|Array Processors]] such as the [[ILLIAC IV]]. Array Processors are known today as [[single instruction, multiple threads]] (SIMT), and a predicate bit ''per PE'' used to activate or de-activate each Processing Element. When the PE has no [[SIMD within a register]] instructions, each PE may be individually Predicated:
 
<syntaxhighlight lang="c">
for each (PE j) // of non-SWAR synchronously-concurrent array
(active-maskbit j) broadcast_scalar_instruction_to(PE j)
</syntaxhighlight>
 
Modern SIMT [[GPUs]] use (or used, but ILLIAC IV documentation termed it [[ILLIAC IV#Branches|"branching"]]) predication to enable/disable individual Processing Elements ''and'', separately and furthermore, to ''also'' mask-out sub-words within any given PE's SWAR ALU.
 
<syntaxhighlight lang="c">
for each (PE j) of SIMT synchronously-concurrent array
(active-maskbit j) { // broadcast only to active SWAR PEs
for each (sub-word i) of SWAR register in (PE j)
(condition-maskbit i) do_something(sub-word i)
(not condition-maskbit i) do_something_else(sub-word i)
}
</syntaxhighlight>
 
All the techniques, advantages and disadvantages of single scalar predication apply just as well to the parallel processing case, where the issues associated with branching are made far more complex.<ref>https://gpgpuarch.org/en/basic/simt/</ref>
 
==See also==
Line 95 ⟶ 121:
 
==Further reading==
*{{cite book|first=Alan |last=Clements|title=Computer Organization & Architecture: Themes and Variations |chapter=8.3.7 Predication |chapter-url={{google books|id=ySILAAAAQBAJ&pg=PA532|plainurl=yes}}|year=2013|publisher=Cengage Learning|isbn=978-1-285-41542-60|pages=532-539532–9}}
 
{{DEFAULTSORT:Predication}}