Sorta... it's an insanely verbose instruction (though a warp primitive in CUDA compiles down to multiple instructions, as below) that's sent to the warp scheduler to be executed as a ton of individual instructions on 32 hopefully optimally scheduled and localized (no guarantees are made) tensor subunits, then reassembled into a result when they finish. This cuts down on silicon per multiply-add subunit of the tensor cores (no need to do anything but multiply and accumulate, plus a couple of simple bitwise ops on integers), but as far as I can tell you're SOL if you want a 4x4 matrix multiply-add; the smallest "shape" they list is 8x4. There is no way, as far as I know, to individually address these sub-processing units from anywhere ''but'' the warp scheduler on chip, except maybe by overriding data distribution. This is what the assembly looks like for a 16x16 matrix multiplication, which requires 32 tensor units, each scheduled internally to do the required series of 4x4 operations:
<syntaxhighlight lang="text">
.global .align 32 .f16 A[256], B[256];
.global .align 32 .f32 C[256], D[256];
// (tile loads via wmma.load and the wmma.mma.sync multiply-accumulate omitted here)
wmma.store.d.sync.aligned.m16n16k16.global.col.f32 [D], {d0, d1, d2, d3, d4, d5, d6, d7};
</syntaxhighlight>
So that's the level it's *designed* to be used at. Since most humans can't smoke enough meth without dying to want to write it like that, compiler support for the PTX instructions is constantly being worked on and a CUDA API has been created. I went through that pedantic mess because the article's wording implies that it was designed for use in C++ (it was either that or C) and goes on with "wow, you can even use it in ''compilers''", as if that's some super elite feature or even makes any sense. The miracle would be if it were ''supported'' by any compiler that nVidia employees didn't add the feature to. As a former compiler engineer, I'd rather mate with a garbage disposal than have to implement just the backend encoding for their instruction set, let alone an assembler / assembly printer or any kind of language support.
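For comparison, here is a minimal sketch of what the same m16n16k16 tile multiply-accumulate looks like through the CUDA C++ <code>nvcuda::wmma</code> API from <code>mma.h</code> (the "CUDA API" route mentioned above). The kernel name and pointer parameters are mine and this is an untested illustration rather than a drop-in implementation; the point is just that the whole warp cooperates on one opaque fragment and the compiler lowers <code>mma_sync</code> to the kind of wmma PTX shown above.
<syntaxhighlight lang="cuda">
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// One warp computes D = A*B + C for a single 16x16 tile (fp16 inputs,
// fp32 accumulator) -- the same m16n16k16 shape as the PTX example.
// All 32 threads of the warp must execute these calls together; each
// thread ends up holding an opaque slice ("fragment") of the tile, and
// the scheduling across the tensor sub-units is the hardware's business.
__global__ void wmma_tile_16x16x16(const half *a, const half *b,
                                   const float *c, float *d)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::load_matrix_sync(a_frag, a, 16);   // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::load_matrix_sync(acc_frag, c, 16, wmma::mem_row_major);

    // Lowers to a wmma.mma.sync instruction for the whole warp.
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Launched with exactly one warp, e.g.:
//   wmma_tile_16x16x16<<<1, 32>>>(a, b, c, d);
</syntaxhighlight>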
Is it really a family of upscaling technologies, or rather just one technology with several updated iterations? In my opinion, "family" implies that there are several different versions for corresponding compatible devices, possibly each with their own advantages and disadvantages, which is not the case. It is just the same technology updated with new, optimized iterations over time. Sure, Frame Generation serves a different purpose and is only compatible with RTX 40-series cards, but it's just an optional function of DLSS 3.5, which is also explained in the article. The upscaling technology of DLSS 3.5 itself hasn't branched off into different subversions depending on the compatibility of the graphics card used, as the term "family" suggests. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 16:33, 16 September 2024 (UTC)
:I have now [https://en.m.wikipedia.org/w/index.php?title=Deep_Learning_Super_Sampling&diff=1273058950&oldid=1273058758 changed the wording], as there hasn't been any opposition to it for quite a few months. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 13:38, 31 January 2025 (UTC)
== DLSS 4 exclusive to RTX 50 series ==
It seems like DLSS 4.0 will be exclusive to RTX 50 series GPUs, judging by the benchmarks on the [https://www.nvidia.com/en-me/geforce/graphics-cards/50-series/ official website of the series]: in the comparison, the 40 series GPUs use frame generation (FG), while the 50 series GPUs use multi frame generation (MFG), likely to highlight the new exclusive feature. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 04:56, 7 January 2025 (UTC)
:Okay, it's explained in [https://m.youtube.com/watch?v=qQn3bsPNTyI this video]. DLSS 4 is a set of multiple features, most of which are also available on 40 series GPUs and some even on older ones, but MFG will indeed only come to the 50 series GPUs. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 05:32, 7 January 2025 (UTC)
== Tensor Cores as separate page? ==
For some reason, Tensor Cores seems to redirect to this page. It seems like the Tensor Core as a technology has grown enough to be considered a separate thing from DLSS. [[User:Anothercat613|Anothercat613]] ([[User talk:Anothercat613|talk]]) 09:59, 3 April 2025 (UTC)
:[[Tensor Core]] redirects to [[Deep_Learning_Super_Sampling#Architecture]]. There is not enough material here at the moment to justify a [[WP:SPLIT]]. ~[[User:Kvng|Kvng]] ([[User talk:Kvng|talk]]) 15:55, 3 April 2025 (UTC)
::Yeah. The section needs a lot of work and is rather outdated. Perhaps a split can be done after more material is added. Or someone can just start a new page about tensor cores. [[User:Anothercat613|Anothercat613]] ([[User talk:Anothercat613|talk]]) 17:48, 3 April 2025 (UTC)