Revision as of 19:43, 6 August 2025 by Lkcl. Edit summary: →Nyuzi GPGPU: add nyuzi performance analysis link
Revision as of 04:58, 7 August 2025 by Lkcl. Edit summary: →Description: ILLIAC IV having masked predication is a big damn deal as it predates NVIDIA and AMD by 30 years.
Line 31:
== Description ==
SIMT processors execute multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations") in lock-step, under the control of a single central unit. The model shares common features with [[SIMD lanes]].<ref>{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |page=52}}</ref> The [[ILLIAC IV]], the world's first known SIMT processor, had its [[ILLIAC_IV#Branches|"branching"]] mechanism extensively documented; in modern terminology, that mechanism is [[Predication_(computer_architecture)#SIMD,_SIMT_and_vector_predication|"predicate masking"]]. Because the access time of all widespread [[random-access memory|RAM]] types (e.g. [[DDR SDRAM]], [[GDDR SDRAM]], [[XDR DRAM]]) is still relatively high, engineers came up with the idea of hiding the latency that inevitably comes with each memory access. Strictly speaking, this latency hiding is a feature of the zero-overhead scheduling implemented by modern GPUs.
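
The lock-step execution and predicate masking described above can be illustrated with a small software sketch. This is only a toy model under stated assumptions: the function name <code>simt_select</code> and its parameters are hypothetical, the "lanes" are plain list elements, and it does not model the ILLIAC IV or any particular GPU. It shows one compare producing a per-lane mask, after which both sides of the branch are issued to all lanes while only the enabled lanes commit a result.

```python
from typing import Callable, List, Tuple


def simt_select(
    data: List[int],
    predicate: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> Tuple[List[int], List[bool]]:
    """Toy model of an if/else executed in lock-step under a predicate mask.

    Hypothetical helper for illustration only; not a real SIMT API.
    """
    # One "compare" is issued once and yields one mask bit per lane.
    mask = [predicate(x) for x in data]
    result = list(data)

    # "then" side: issued to every lane, but only lanes whose mask bit
    # is set commit the result.
    for lane, x in enumerate(data):
        if mask[lane]:
            result[lane] = then_op(x)

    # "else" side: issued to every lane again, now with the mask inverted.
    for lane, x in enumerate(data):
        if not mask[lane]:
            result[lane] = else_op(x)

    return result, mask


lanes = [3, -1, 4, -5]
# Negative lanes are negated; the remaining lanes are doubled.
out, mask = simt_select(lanes, lambda x: x < 0, lambda x: -x, lambda x: x * 2)
print(out, mask)
```

In this sketch both sides of the branch cost a full pass over all lanes regardless of how many lanes are enabled, which is the usual price of handling divergent branches by masking rather than by independent control flow.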

Single instruction, multiple threads: Difference between revisions