Floating point operations per second: Difference between revisions

FrescoBot (talk | contribs)
m Bot: link syntax and minor changes
Line 77:
 
== Floating-point operations per clock cycle for various processors ==
{{Table alignment}}
{| class="wikitable sortable"
{{sort-under}}
{| class="wikitable sortable sort-under col3right col4right col5right"
|+ Floating-point operations per clock cycle per core<ref>{{Cite web | url=https://en.wikichip.org/wiki/flops | title=Floating-Point Operations Per Second (FLOPS)}}</ref>
! scope="col" | Microarchitecture
Line 89 ⟶ 91:
|[[Intel 80486]]
|[[x87]] (32-bit)
| {{dunno}}
|?
|0.128<ref name=":1" />
| {{dunno}}
|?
|-
|{{plainlist|
Line 98 ⟶ 100:
}}
|[[x87]] (32-bit)
| {{dunno}}
|?
|0.5<ref name=":1">{{Cite web|title=home.iae.nl |url=http://home.iae.nl/users/mhx/flops_4.tbl|access-date=|website=}}</ref>
| {{dunno}}
|?
|-
|{{plainlist|
Line 107 ⟶ 109:
}}
|[[MMX (instruction set)|MMX]] (64-bit)
| {{dunno}}
|?
|1<ref name=":0">{{Cite web|title=Computing Power throughout History|url=https://www.alternatewars.com/BBOW/Computing/Computing_Power.htm|access-date=2021-02-13|website=alternatewars.com}}</ref>
| {{dunno}}
|?
|-
|Intel [[P6 (microarchitecture)|P6]] [[Pentium III]]
|[[Streaming SIMD Extensions|SSE]] (64-bit)
| {{dunno}}
|?
|2<ref name=":0" />
| {{dunno}}
|?
|-
|Intel [[NetBurst]] [[Pentium 4]] (Willamette, Northwood)
Line 121 ⟶ 123:
|2
|4
| {{dunno}}
|?
|-
|Intel [[P6 (microarchitecture)|P6]] [[Pentium M]]
Line 127 ⟶ 129:
|1
|2
| {{dunno}}
|?
|-
|{{plainlist|
Line 137 ⟶ 139:
|2
|4
| {{dunno}}
|?
|-
|{{plainlist|
Line 147 ⟶ 149:
*[[SSE4]] (128-bit)
}}
| 4 || 8 || ?{{dunno}}
|-
| Intel [[Atom (system on chip)|Atom]] ([[Bonnell (microarchitecture)|Bonnell]], [[Saltwell (microarchitecture)|Saltwell]], [[Silvermont (microarchitecture)|Silvermont]] and [[Goldmont]]) || [[SSE3]] (128-bit)
| 2 || 4 || ?{{dunno}}
|-
| Intel [[Sandy Bridge]] ([[Sandy Bridge]], [[Ivy Bridge (microarchitecture)|Ivy Bridge]]) || [[Advanced Vector Extensions|AVX]] (256-bit)
| 8 || 16 || 0
|-
|{{plainlist|
Line 157 ⟶ 161:
*Intel [[Skylake (microarchitecture)|Skylake]] ([[Skylake (microarchitecture)|Skylake]], [[Kaby Lake]], [[Coffee Lake]], [[Comet Lake (microprocessor)|Comet Lake]], [[Whiskey Lake (microarchitecture)|Whiskey Lake]], [[Amber Lake (microarchitecture)|Amber Lake]])
}}
|[[Advanced Vector Extensions|AVX2]] & [[FMA instruction set|FMA]] (256-bit)
| 16 || 32 || 0
|-
| Intel [[Xeon Phi]] ([[Knights Corner]]) || [[Initial Many Core Instructions|IMCI]] (512-bit)
| 16 || 32 || 0
|-
|{{plainlist|
Line 166 ⟶ 172:
*Intel [[Ice Lake (microprocessor)|Ice Lake]], [[Tiger Lake (microprocessor)|Tiger Lake]] and [[Rocket Lake]]
}}
| [[Advanced Vector Extensions|AVX-512]] & [[FMA instruction set|FMA]] (512-bit) || 32 || 64 || 0
|-
! colspan="5" |AMD CPU
Line 185 ⟶ 191:
*[[Advanced Vector Extensions|AVX]] (128-bit) (Bulldozer, Steamroller)
*[[AVX2]] (128-bit) (Excavator)
*[[FMA instruction set|FMA3]] (Bulldozer)<ref>{{Cite web|url=https://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf|title=New instructions support for Bulldozer (FMA3) and Piledriver (FMA3+4 and CVT, BMI, TBM)}}</ref>
*[[FMA instruction set|FMA3/4]] (Piledriver, Excavator)
}}
Line 194 ⟶ 201:
*AMD [[Zen+]]<ref name="tpeak_jos"/><ref>{{Cite web | url=http://www.agner.org/optimize/blog/read.php?i=838 | title=Agner's CPU blog - Test results for AMD Ryzen}}</ref><ref>https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-finally-an-architecture-that-can-compete/2/ "each core now has a pair of 128-bit FMA units of its own"</ref><ref>{{cite conference |url=https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 |title=A New x86 Core Architecture for the Next Generation of Computing |author=Mike Clark |date=August 23, 2016 |publisher=AMD |conference=HotChips 28 |access-date=October 8, 2017 |archive-date=July 31, 2020 |archive-url=https://web.archive.org/web/20200731171730/https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 |url-status=dead }} [https://images.anandtech.com/doci/10591/HC28.AMD.Mike%20Clark.final-page-007.jpg page 7]</ref> (Ryzen 2000 series, Threadripper 2000 series)
}}
| [[Advanced Vector Extensions|AVX2]] & [[FMA instruction set|FMA]] (128-bit, 256-bit decoding)<ref>{{Cite web |title=The microarchitecture of Intel and AMD CPUs |url=https://www.agner.org/optimize/microarchitecture.pdf}}</ref>
| 8 || 16 || 0
|-
|{{plainlist|
Line 200 ⟶ 208:
*AMD [[Zen 3]] (Ryzen 5000 series, Epyc [[Epyc|Milan]])
}}
| [[Advanced Vector Extensions|AVX2]] & [[FMA instruction set|FMA]] (256-bit)
| 16 || 32 || 0
|-
! colspan="5" |ARM CPU
|-
| ARM Cortex-A7, A9, A15 || [[ARM architecture|ARMv7]]
| 1 || 8 || 0
|-
| ARM Cortex-A32, A35 || [[ARM architecture|ARMv8]]
| 2 || 8 || 0
|-
| [[ARM Cortex-A53]], [[ARM Cortex-A55|A55]], [[ARM Cortex-A57|A57]],<ref name="tpeak_jos"/> [[ARM Cortex-A72|A72]], [[ARM Cortex-A73|A73]], [[ARM Cortex-A75|A75]] || [[ARM architecture|ARMv8]]
| 4 || 8 || 0
|-
| [[ARM Cortex-A76]], [[ARM Cortex-A77|A77]], [[ARM Cortex-A78|A78]] || [[ARM architecture|ARMv8]]
| 8 || 16 || 0
|-
| [[ARM Cortex-X1]] || [[ARM architecture|ARMv8]]
| 16 || 32 || ?{{dunno}}
|-
| Qualcomm [[Krait (CPU)|Krait]] || [[ARM architecture|ARMv7]]
| 1 || 8 || 0
|-
| Qualcomm [[Kryo]] (1xx - 3xx) || [[ARM architecture|ARMv8]]
| 2 || 8 || 0
|-
| Qualcomm [[Kryo]] (4xx - 5xx) || [[ARM architecture|ARMv8]]
| 8 || 16 || 0
|-
| Samsung [[Exynos]] M1 and M2 || [[ARM architecture|ARMv8]]
| 2 || 8 || 0
|-
| Samsung [[Exynos]] M3 and M4 || [[ARM architecture|ARMv8]]
| 3 || 12 || 0
|-
| IBM PowerPC [[IBM A2|A2]] (Blue Gene/Q) || ?{{dunno}}
| 8 || 8 <br/>(as FP64) || 0
|-
| [[Hitachi SH-4]]<ref>{{cite journal |title=Entertainment Systems and High-Performance Processor SH-4 |journal=Hitachi Review |date=1999 |volume=48 |issue=2 |pages=58–63 |publisher=[[Hitachi]] |url=https://retrocdn.net/images/f/fa/Entertainment_Systems_and_High-Performance_Processor_SH-4.pdf |access-date=June 21, 2019}}</ref><ref>{{cite web |title=SH-4 Next-Generation DSP Architecture for VoIP |url=https://retrocdn.net/images/b/b3/SH-4_Next-Generation_DSP_Architecture.pdf |publisher=[[Hitachi]] |year=2000 |access-date=June 21, 2019}}</ref> || [[Hitachi SH-4|SH-4]]
| 1 || 7 || 0
|-
! colspan="5" |Nvidia GPU
|-
|Nvidia [[Curie (microarchitecture)|Curie]] ([[GeForce 6 series]] and [[GeForce 7 series]])
|[[Parallel Thread Execution|PTX]] || ?{{dunno}} || 8 || ?{{dunno}}
|-
|Nvidia [[Tesla (microarchitecture)|Tesla]] 2.0 (GeForce GTX 260–295)
|[[Parallel Thread Execution|PTX]] || ?{{dunno}} || 2 || ?{{dunno}}
|-
| Nvidia [[Fermi (microarchitecture)|Fermi]] (only GeForce GTX 465–480, 560 Ti, 570–590) || [[Parallel Thread Execution|PTX]]
| {{1/4}}<br/>(locked by driver, <br/>1 in hardware) || 2 || 0
|-
| Nvidia [[Fermi (microarchitecture)|Fermi]] (only Quadro 600–2000) || [[Parallel Thread Execution|PTX]]
| {{frac|1|8}} || 2 || 0
|-
| Nvidia [[Fermi (microarchitecture)|Fermi]] (only Quadro 4000–7000, Tesla) || [[Parallel Thread Execution|PTX]]
| 1 || 2 || 0
|-
| Nvidia [[Kepler (microarchitecture)|Kepler]] (GeForce (except Titan and Titan Black), Quadro (except K6000), Tesla K10) || [[Parallel Thread Execution|PTX]]
| {{frac|1|12}}<br/>(for [[GeForce 700 series|GK110]]: <br/>locked by driver, <br/>{{2/3}} in hardware) || 2 || 0
|-
| Nvidia [[Kepler (microarchitecture)|Kepler]] (GeForce GTX Titan and Titan Black, Quadro K6000, Tesla (except K10)) || [[Parallel Thread Execution|PTX]]
| {{2/3}} || 2 || 0
|-
|{{plainlist|
Line 250 ⟶ 276:
* Nvidia [[Pascal (microarchitecture)|Pascal]] (all except Quadro GP100 and Tesla P100)
}}
| [[Parallel Thread Execution|PTX]] || {{frac|1|16}} || 2 || {{frac|1|32}}
|-
| Nvidia [[Pascal (microarchitecture)|Pascal]] (only Quadro GP100 and Tesla P100) || [[Parallel Thread Execution|PTX]] || 1 || 2 || 4
|-
| Nvidia [[Volta (microarchitecture)|Volta]]<ref name="Nvidia Volta">{{cite web|title=Inside Volta: The World's Most Advanced Data Center GPU|date=May 10, 2017 |url=https://devblogs.nvidia.com/inside-volta/}}</ref> || [[Parallel Thread Execution|PTX]]
| 1 || 2&nbsp;([[FP32]]) + 2&nbsp;([[Int32|INT32]]) || 16
|-
| Nvidia [[Turing (microarchitecture)|Turing]] (only GeForce [[GeForce 16 series|16XX]]) || [[Parallel Thread Execution|PTX]]
| {{frac|1|16}} || 2&nbsp;(FP32) + 2&nbsp;(INT32) || 4
|-
| Nvidia [[Turing (microarchitecture)|Turing]] (all except GeForce [[GeForce 16 series|16XX]]) || [[Parallel Thread Execution|PTX]]
| {{frac|1|16}} || 2&nbsp;(FP32) + 2&nbsp;(INT32) || 16
|-
| Nvidia [[Ampere (microarchitecture)|Ampere]]<ref name="Nvidia Ampere 1">{{cite web|title=NVIDIA Ampere Architecture In-Depth |date=May 14, 2020 |url=https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/}}</ref><ref name="Nvidia Ampere 2">{{Cite web |title=NVIDIA A100 GPUs Power the Modern Data Center |website=NVIDIA |url=https://www.nvidia.com/en-us/data-center/a100/}}</ref> (only Tesla A100/A30) || [[Parallel Thread Execution|PTX]]
| 2 || 2&nbsp;(FP32) + 2&nbsp;(INT32) || 32
|-
|{{plainlist|
Line 266 ⟶ 296:
* Nvidia [[Ada Lovelace (microarchitecture)|Ada Lovelace]]
}}
|| [[Parallel Thread Execution|PTX]] || {{frac|1|32}} || 2&nbsp;(FP32) + 0&nbsp;(INT32) <br/>''or'' <br/>1&nbsp;(FP32) + 1&nbsp;(INT32) || 8
|-
! colspan="5" |AMD GPU
|-
| AMD [[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]] ([[Radeon HD 4000 series]])
|[[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]] || 0.4 || 2 || ?{{dunno}}
|-
| AMD [[TeraScale (microarchitecture)#TeraScale 2|TeraScale 2]] ([[Radeon HD 5000 series]])
|[[TeraScale (microarchitecture)#TeraScale 2|TeraScale 2]] || 1 || 2 || ?{{dunno}}
|-
| AMD [[TeraScale (microarchitecture)#TeraScale 3|TeraScale 3]] ([[Radeon HD 6000 series]])
|[[TeraScale (microarchitecture)#TeraScale 3|TeraScale 3]] || 1 || 4 || ?{{dunno}}
|-
| AMD [[Graphics Core Next|GCN]] (only Radeon Pro W 8100–9100) || [[Graphics Core Next|GCN]] || 1 || 2 || ?{{dunno}}
|-
| AMD [[Graphics Core Next|GCN]] (all except Radeon Pro W 8100–9100, Vega 10–20) || [[Graphics Core Next|GCN]] || {{frac|1|8}} || 2 || 4
|-
| AMD [[AMD RX Vega series|GCN Vega 10]] || [[Graphics Core Next|GCN]] || {{frac|1|8}} || 2 || 4
|-
| AMD [[AMD RX Vega series|GCN Vega 20]] (only Radeon VII) || [[Graphics Core Next|GCN]]
| {{1/2}}<br/>(locked by driver, <br/>1 in hardware) || 2 || 4
|-
| AMD [[AMD RX Vega series|GCN Vega 20]] (only Radeon Instinct MI50 / MI60 and Radeon Pro VII) || [[Graphics Core Next|GCN]]
| 1 || 2 || 4
|-
|{{plainlist|
Line 293 ⟶ 325:
* AMD [[RDNA 2]]
}}
| [[AMD RDNA Architecture|RDNA]]
| {{frac|1|8}} || 2 || 4
|-
| AMD RDNA3 || [[AMD RDNA Architecture|RDNA]]
| {{frac|1|8}}? || 4 || 8?
|-
| AMD [[AMD CDNA Architecture|CDNA]] || [[AMD CDNA Architecture|CDNA]]
| 1 || 4 <br/>(Tensor)<ref name="AMD">{{cite web|url=https://www.amd.com/en/products/server-accelerators/instinct-mi100|title=AMD Instinct MI100 Accelerator}}</ref> || 16
|-
| AMD [[AMD CDNA Architecture|CDNA 2]] || [[AMD CDNA Architecture|CDNA 2]]
| 4 <br/>(Tensor) || 4 <br/>(Tensor) || 16
|-
! colspan="5" |Intel GPU
|-
| Intel Xe-LP (Iris Xe MAX)<ref name="intel.com">{{cite web|url=https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-the-xe-hpg-architecture.html|title=Introduction to the Xe-HPG Architecture}}</ref> || Xe
| {{1/2}}? || 2 || 4
|-
| Intel Xe-HPG (Arc Alchemist)<ref name="intel.com"/> || Xe || 0 || 2 || 16
Line 309 ⟶ 346:
| Intel Xe-HPC (Ponte Vecchio)<ref name="allinfo.space">{{cite web|url=https://allinfo.space/2022/11/09/intel-data-center-gpu-max-ponte-vecchio-starts-in-3-variants-for-supercomputers/|title=Intel Data Center GPU Max|date=November 9, 2022 }}</ref> || Xe || 2 || 2 || 32
|-
| Intel Xe2 (Arc Battlemage) || Xe2 || {{frac|1|8}} || 2 || 16
|-
! colspan="5" |Qualcomm GPU
Line 321 ⟶ 358:
! colspan="5" |Graphcore
|-
| Graphcore Colossus GC2<ref name="Source 2">{{cite web|url=https://www.youtube.com/watch?v=2IOyQEIlN6Y&t=1361|title=250 TFLOPs/s for two chips with FP16 mixed precision|website=youtube.com|date=October 26, 2018 }}</ref><ref name="Source 3">Archived at [https://ghostarchive.org/varchive/youtube/20211211/7XtBZ4Hsi_M Ghostarchive]{{cbignore}} and the [https://web.archive.org/web/20180119094342/https://www.youtube.com/watch?v=7XtBZ4Hsi_M&gl=US&hl=en Wayback Machine]{{cbignore}}: {{cite web|url=https://www.youtube.com/watch?v=7XtBZ4Hsi_M&t=2208|title=Estimation via power consumption that FP32 is 1/4 of FP16 and that clock frequency is below 1.5GHz|website=youtube.com|date=October 25, 2017 }}{{cbignore}}</ref>
| ?{{dunno}} || 0 || 16 || 64
|-
|{{plainlist|
* Graphcore Colossus GC200 Mk2<ref name="Source 4">Archived at [https://ghostarchive.org/varchive/youtube/20211211/_zvU0uwIafQ Ghostarchive]{{cbignore}} and the [https://web.archive.org/web/20200716143430/https://www.youtube.com/watch?v=_zvU0uwIafQ Wayback Machine]{{cbignore}}: {{cite web|url=https://www.youtube.com/watch?v=_zvU0uwIafQ|title=Introducing Graphcore's Mk2 IPU systems|website=youtube.com|date=July 15, 2020 }}{{cbignore}}</ref>
* Graphcore Bow-2000<ref name="Source 5">{{cite web|url=https://docs.graphcore.ai/projects/bow-2000-datasheet/en/latest/product-description.html#technical-specifications|title=Bow-2000 IPU-Machine|website=docs.graphcore.ai/}}{{cbignore}}</ref>
}}
|| ?{{dunno}} || 0 || 32 || 128
|-
! colspan="5" |[[Supercomputer]]
Line 333 ⟶ 371:
|[[ENIAC]] @ 100&nbsp;kHz in 1945
|
|0.004<ref>[[ENIAC]] @ 100&nbsp;kHz with 385 FLOPS {{Cite web|title=Computers of Yore|url=https://www.clear.rice.edu/comp201/08-spring/lectures/lec02/computers.shtml|access-date=2021-02-26|website=clear.rice.edu}}</ref> <br/>(~0.00000003 FLOPS/[[Watt|W]])
|
|
Line 345 ⟶ 383:
|60-bit processor @ 10 MHz in [[CDC 6600]] in 1964
|
|0.3 <br/>(FP60)
|
|
Line 351 ⟶ 389:
|60-bit processor @ 10 MHz in [[CDC 7600]] in 1967
|
|1.0 <br/>(FP60)
|
|
Line 357 ⟶ 395:
|[[Cray-1]] @ 80 MHz in 1976
|
|2 <br/>(700 FLOPS/W)
|
|