Predication (computer architecture): Difference between revisions

Revision as of 19:08, 27 July 2025 by Lkcl (→Overview: optional predication!)
Latest revision as of 08:03, 7 August 2025 by Lkcl (→SIMD, SIMT and vector predication: ILLIAC IV had masked Predicated SWAR! only 2 bits (2x32 or 1x64) but still!)
(One intermediate revision by the same user not shown)
Line 54:
Predication complicates the hardware by adding levels of [[control unit\|logic]] to critical [[datapath\|paths]], which potentially degrades clock speed. A predicated block includes cycles for all operations, so shorter [[control-flow graph\|paths]] may take longer and be penalized.
* An extra register read is required. A non-predicated ADD reads two registers from a register file, whereas a predicated ADD must also read the predicate register file. This increases hazards in [[out-of-order execution]].
* Predication is not usually speculated and causes a longer dependency chain. For ordered data this translates to a performance loss compared to a predictable branch.<ref>{{cite web \|last1=Cordes \|first1=Peter \|title=assembly - How does Out of Order execution work with conditional instructions, Ex: CMOVcc in Intel or ADDNE (Add not equal) in ARM \|url=https://stackoverflow.com/a/50960323 \|website=Stack Overflow \|quote=Unlike with control dependencies (branches), they don't predict or speculate what the flags will be, so a cmovcc instead of a jcc can create a loop-carried dependency chain and end up being worse than a predictable branch. [https://stackoverflow.com/questions/50959808 gcc optimization flag -O3 makes code slower than -O2] is an example of that.}}</ref>
Line 81 ⟶ 82:
</syntaxhighlight>
Masking is an integral part of [[Flynn's taxonomy\|Array Processors]] such as the [[ILLIAC IV]]. Array Processors are known today as [[single instruction, multiple threads]] (SIMT), and a predicate bit ''per PE'' is used to activate or deactivate each Processing Element.
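The dependency-chain drawback listed above can be sketched in C. This is a minimal illustration rather than code from any cited source: the function names `sum_above_branch` and `sum_above_predicated` are hypothetical, and a compiler is free to turn either form into the other, so the actual machine code depends on the compiler and its flags. The first function expresses the condition as control flow, which hardware can predict and speculate past. The second is if-converted by hand, so the add is always performed and its operand depends on the comparison result as ordinary data.

```c
#include <stddef.h>
#include <stdint.h>

/* Branching form: the condition is a control dependency. A compiler
   may emit a conditional jump here, which a branch predictor can
   guess correctly most of the time when the data is ordered. */
int64_t sum_above_branch(const int32_t *v, size_t n, int32_t limit)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= limit)
            sum += v[i];
    }
    return sum;
}

/* Predicated form (hand-written if-conversion): the add executes on
   every iteration, and the predicate only selects whether the value
   added is v[i] or 0. The comparison result is now a data input to
   the add rather than something that can be predicted. */
int64_t sum_above_predicated(const int32_t *v, size_t n, int32_t limit)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* pred is all ones when the condition holds, 0 otherwise */
        int64_t pred = -(int64_t)(v[i] >= limit);
        sum += (int64_t)v[i] & pred;
    }
    return sum;
}
```

Both functions return the same result for the same input. Which one is faster depends on how predictable the condition is and on the target microarchitecture, so any comparison should be measured rather than assumed.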
When the PE has no [[SIMD within a register]] (SWAR) instructions, each PE may be individually predicated:
<syntaxhighlight lang="c">
for each (PE j) // of non-SWAR synchronously-concurrent array
    (active-maskbit j) broadcast_scalar_instruction_to(PE j)
</syntaxhighlight>
Modern SIMT [[GPUs]] use predication (as did the ILLIAC IV, although its documentation termed it [[ILLIAC IV#Branches\|"branching"]]) to enable or disable individual Processing Elements ''and'', separately, to ''also'' mask out sub-words within any given PE's SWAR ALU.
<syntaxhighlight lang="c">