Talk:Deep Learning Super Sampling: Difference between revisions

{{Talk header}}
{{WikiProject banner shell|class=C|collapsed=yes|1=
{{WikiProject Computing|importance=Low|hardware=yes|hardware-importance=Low|software=yes|software-importance=Low|science=yes|science-importance=Low}}
{{WikiProject Computer graphics|class=C|importance=High}}
{{WikiProject Video games|importance=Low}}
{{WikiProject Technology}}
{{WikiProject Artificial Intelligence}}
}}
 
 
Sorta... it's an insanely verbose instruction (a warp primitive in CUDA compiles down to multiple instructions, as below) that's sent to the warp scheduler to be executed as a ton of individual instructions on 32 hopefully optimally scheduled and localized (no guarantees are made) tensor subunits, then reassembled into a result when they finish. This cuts down on silicon per multiply-add subunit of the tensor cores (no need to do anything but multiply and accumulate, plus a couple of simple bitwise ops on integers), but as far as I can tell you're SoL if you want a 4x4 matrix multiply-add; the smallest "shape" they list is 8x4. There is no way, AFAIK, to individually address these sub-processing units from anywhere ''but'' the warp scheduler on chip, except maybe by overriding data distribution. This is what the assembly looks like for a 16x16 matrix multiplication, which requires 32 tensor units, each scheduled internally to do the required series of 4x4 operations:
<syntaxhighlight lang="text">
.global .align 32 .f16 A[256], B[256];
.global .align 32 .f32 C[256], D[256];
// ... (wmma.load and wmma.mma instructions elided) ...
wmma.store.d.sync.aligned.m16n16k16.global.col.f32 [D], {d0, d1, d2, d3, d4, d5, d6, d7};
 
</syntaxhighlight><ref>{{cite web|url=https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions|title=PTX ISA v7.2: Section 9.7.13: Warp Level Matrix Multiply-Accumulate Instructions}}</ref>
 
So that's the level it's *designed* to be used at. Since most humans can't smoke enough meth without dying to want to write it like that, compiler support for the PTX instructions is constantly worked on, and a CUDA API was created. I went through that pedantic mess because the wording implies it was designed for use in C++ (it was either that or C), and goes on with "wow, you can even use it in ''compilers''", as if that's some super elite feature or even makes any sense. The miracle would be if it were ''supported'' by any compilers that nVidia employees didn't add the feature to. As a former compiler engineer, I'd rather mate with a garbage disposal than have to implement just the backend encoding for their instruction set, let alone an assembler / assembly printer or any kind of language support.
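For reference, the compiler-supported path mentioned above is the CUDA C++ WMMA API (<code>nvcuda::wmma</code> in <code><mma.h></code>). A minimal sketch of the same 16x16x16 half-precision multiply with float accumulate, executed cooperatively by one warp (the kernel name is illustrative, and this assumes sm_70+ hardware and a launch with at least 32 threads):
<syntaxhighlight lang="cuda">
// Sketch: warp-cooperative 16x16x16 matrix multiply-accumulate via nvcuda::wmma.
// Each thread in the warp holds a slice ("fragment") of the A/B/C tiles;
// the compiler lowers mma_sync to the wmma.* PTX instructions shown above.
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half *a, const half *b, float *c) {
    // Per-thread fragments of the 16x16 input and accumulator tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);          // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);      // 16 = leading dimension in elements
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C = A*B + C, one warp op
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
</syntaxhighlight>
As with the PTX, there is no addressing of individual tensor subunits here; the warp is the smallest unit the API exposes.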
 
--[[User:Mortense|Mortense]] ([[User talk:Mortense|talk]]) 13:31, 25 November 2022 (UTC)
 
== DLSS 1.9 to this day is complete conjecture ==
 
Neither Nvidia nor Remedy has ever confirmed that such a thing as DLSS 1.9 exists. When Remedy implemented DLSS 1.0 in the game Control, it surpassed the quality of similar implementations in other games.
 
After the release of DLSS 2.0, someone dubbed the previous implementation "DLSS 1.9", and the name spread like wildfire.
 
But it was only coined because of its qualitative similarity to 2.0, which never excluded the possibility that Remedy and Nvidia simply worked together to provide a better implementation (and better machine-learning training).
 
That does not imply it was a different implementation or version of DLSS 1.0 at all; only that Remedy's own TAA implementation and their implementation of DLSS 1.0 were an improvement.
 
But the same could have been said about the DLSS 1.0 implementations in Metro Exodus and Battlefield V, for example, which differed so vastly in quality that one could assume they were different versions of DLSS 1.0 when they were not.
 
This self-proclaimed fact of DLSS 1.9 being an actual iteration differing from DLSS 1.0 comes from the perception that DLSS 1.9 is/was the same (or almost the same) algorithm as DLSS 2.0, but run on shader cores instead of tensor cores.
 
This thinking is fueled by the presumption (and common disdain for Nvidia marketing and sales tactics) that Nvidia forced obsolescence by making DLSS 2.0 work only on RTX GPUs when it could have worked on all GPUs as a shader-based implementation.
 
That is completely unfounded, biased, and pure speculation.
 
I believe much of this is based on a wishful perception by opponents of Nvidia and an unfortunate miswording or misunderstanding by Techspot.com, which likely coined the term, or at the very least picked it up in a more official journalistic capacity and thereby helped its propagation. (see https://www.techspot.com/article/1992-nvidia-dlss-2020/)
 
The only facts that remain are that only DLSS 1.0 and DLSS 2.0 ever officially existed as upscaling techniques, and that among releases of games with DLSS 1.0 support we always had vast differences in image quality.
 
These differences have never been proven to be anything but a matter of implementation quality and of game-specific machine-learning training data.
 
Which, as we know, was one major factor in the differing quality of DLSS 1.0 implementations: the fact that DLSS 1.0 had to be trained on a per-game basis.
 
 
I therefore conclude that, until Nvidia ever officially states otherwise, DLSS 1.9 never existed.
 
Remedy's implementation of DLSS 1.0 simply outshone other implementations, whether through the effort put into their own engine and/or the amount of game-specific machine-learned training, which made people believe it was a different version entirely. [[User:Fnna509|Fnna509]] ([[User talk:Fnna509|talk]]) 11:49, 13 August 2023 (UTC)
 
== DLSS 3 is NOT Exclusive to Ada Lovelace NVIDIA GPUs. ==
 
DLSS 3 Has 2 main components:<br>
1. Upscaling<br>
2. Frame Generation<br>
While DLSS Frame Generation is indeed exclusive to Ada Lovelace NVIDIA GPUs, DLSS 3 ''(that is, the third generation of DLSS)'' Upscaling <u>isn't</u>. [[Special:Contributions/85.198.63.121|85.198.63.121]] ([[User talk:85.198.63.121|talk]]) 16:07, 30 October 2023 (UTC)
 
== Family of upscaling technologies? ==
 
Is it really a family of upscaling technologies, or rather just one technology with several updated iterations? In my opinion, "family" implies that there are several different versions for corresponding compatible devices, possibly each with their own advantages and disadvantages, which is not the case. It is just the same technology updated with new, optimized iterations over time. Sure, Frame Generation serves a different purpose and is only compatible with RTX 40-series cards, but it's just an optional function of DLSS 3.5, which is also explained in the article. The upscaling technology of DLSS 3.5 itself hasn't branched off into different subversions depending on the compatibility of the graphics card used, as the term "family" suggests. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 16:33, 16 September 2024 (UTC)
 
:I now [https://en.m.wikipedia.org/w/index.php?title=Deep_Learning_Super_Sampling&diff=1273058950&oldid=1273058758 changed the wording] as there hasn't been opposition to it for quite some months. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 13:38, 31 January 2025 (UTC)
 
== DLSS 4 exclusive to RTX 50 series ==
 
It seems like DLSS 4.0 will be exclusive to the GPUs of the RTX 50 series when looking at the benchmarks on the [https://www.nvidia.com/en-me/geforce/graphics-cards/50-series/ official website of the series], as the 40 series GPUs use FG, while the 50 series GPUs use MFG in the comparison, likely to highlight the new exclusive feature. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 04:56, 7 January 2025 (UTC)
 
:Okay, it's explained in [https://m.youtube.com/watch?v=qQn3bsPNTyI this video]. DLSS 4 is a set of multiple features, most of which are also available on the 40 series GPUs and some even on older ones, but MFG will indeed only come to the GPUs of the 50 series. [[User:Maxeto0910|Maxeto0910]] ([[User talk:Maxeto0910|talk]]) 05:32, 7 January 2025 (UTC)
 
== Tensor Cores as separate page? ==
 
For some reason, Tensor Cores seem to redirect to this page. It seems like Tensor Cores as a technology have grown enough to be considered a separate thing from DLSS. [[User:Anothercat613|Anothercat613]] ([[User talk:Anothercat613|talk]]) 09:59, 3 April 2025 (UTC)
 
:[[Tensor Core]] redirects to [[Deep_Learning_Super_Sampling#Architecture]]. There is not enough material here at the moment to justify a [[WP:SPLIT]]. ~[[User:Kvng|Kvng]] ([[User talk:Kvng|talk]]) 15:55, 3 April 2025 (UTC)
::Yeah. The section needs a lot of work and is rather outdated. Perhaps a split can be done after more material is added. Or someone can just start a new page about tensor cores. [[User:Anothercat613|Anothercat613]] ([[User talk:Anothercat613|talk]]) 17:48, 3 April 2025 (UTC)