== Architecture ==
With the exception of the shader-core version implemented in ''Control'', DLSS is only available on [[GeForce 20 series|GeForce RTX 20]], [[GeForce 30 series|GeForce RTX 30]], [[GeForce 40 series|GeForce RTX 40]], [[GeForce 50 series|GeForce RTX 50]], and [[Quadro#Quadro RTX|Quadro RTX]] series of video cards, using dedicated [[AI accelerator]]s called '''Tensor Cores'''.<ref name="nvidia20"/>{{Failed verification|date=March 2024}} Tensor Cores are available since the Nvidia [[Volta (microarchitecture)|Volta]] [[graphics processing unit|GPU]] [[microarchitecture]], which was first used on the [[Nvidia Tesla|Tesla V100]] line of products.<ref>
{{cite web|url=https://www.tomshardware.com/news/nvidia-tensor-core-tesla-v100,34384.html|title=On Tensors, Tensorflow, And Nvidia's Latest 'Tensor Cores'|publisher=tomshardware.com|date=2017-04-11|access-date=2020-04-08}}</ref> Tensor Cores perform [[Multiply–accumulate operation|fused multiply–add]] (FMA) operations, which neural network calculations use extensively to apply a large series of multiplications to weights, followed by the addition of a bias. Tensor Cores can operate on the FP16, INT8, INT4, and INT1 data types. Each core processes 1024 bits of FMA operands per clock — that is, 1024 INT1, 256 INT4, 128 INT8, or 64 FP16 operations per clock per Tensor Core — and most Turing GPUs have a few hundred Tensor Cores.<ref>{{Cite web|title=Tensor Core DL Performance Guide|url=https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9926-tensor-core-performance-the-ultimate-guide.pdf|url-status=live|website=Nvidia|archive-url=https://web.archive.org/web/20201111223322/https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9926-tensor-core-performance-the-ultimate-guide.pdf |archive-date=2020-11-11 }}</ref> Tensor Cores are programmed through [[CUDA]] [[Warp (CUDA)|warp]]-level primitives, which operate on 32 parallel threads to take advantage of the parallel architecture.<ref>{{cite web|url=https://devblogs.nvidia.com/using-cuda-warp-level-primitives/|title=Using CUDA Warp-Level Primitives|publisher=[[Nvidia]]|date=2018-01-15|access-date=2020-04-08|quote=NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion.}}</ref> A warp is a set of 32 [[Thread (computing)|threads]] configured to execute the same instruction. Since [[Windows 10 version 1903]], Microsoft Windows has provided [[DirectML]] as part of [[DirectX]] to support Tensor Cores.
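The per-data-type throughput figures above follow directly from the 1024-bit-per-clock operand budget of a Tensor Core. A minimal sketch of this arithmetic (illustrative Python, not Nvidia code; the bit widths are the standard ones for each data type):

```python
# Each Tensor Core handles 1024 bits of FMA operands per clock,
# so narrower data types yield proportionally more operations.
TENSOR_CORE_BITS_PER_CLOCK = 1024

DATA_TYPE_BITS = {"FP16": 16, "INT8": 8, "INT4": 4, "INT1": 1}

def ops_per_clock(data_type: str) -> int:
    """FMA operations per clock per Tensor Core for a given data type."""
    return TENSOR_CORE_BITS_PER_CLOCK // DATA_TYPE_BITS[data_type]

# Matches the figures in the text: 64 FP16, 128 INT8, 256 INT4, 1024 INT1.
for dtype, bits in DATA_TYPE_BITS.items():
    print(f"{dtype} ({bits}-bit): {ops_per_clock(dtype)} ops/clock")
```

Aggregate throughput then scales with the Tensor Core count and clock rate of the particular GPU.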