Revision as of 22:29, 11 February 2025 by FrescoBot (minor edit: "Bot: link syntax and minor changes") compared with revision as of 00:13, 12 February 2025 by Cedar101 (minor edit: "→Floating-point operations per clock cycle for various processors: right align")
Line 77: == Floating-point operations per clock cycle for various processors == {{Table alignment}} {\| class="wikitable sortable"▼ {{sort-under}} ▲{\| class="wikitable sortable sort-under col3right col4right col5right" \|+ Floating-point operations per clock cycle per core<ref>{{Cite web \| url=https://en.wikichip.org/wiki/flops \| title=Floating-Point Operations Per Second (FLOPS)}}</ref> ! scope="col" \| Microarchitecture Line 89 ⟶ 91: \|[[Intel 80486]] \|[[x87]] (32-bit) \| {{dunno}} \|? \|0.128<ref name=":1" /> \| {{dunno}} \|? \|- \|{{plainlist\| Line 98 ⟶ 100: }} \|[[x87]] (32-bit) \| {{dunno}} \|? \|0.5<ref name=":1">{{Cite web\|title=home.iae.nl \|url=http://home.iae.nl/users/mhx/flops_4.tbl\|access-date=\|website=}}</ref> \| {{dunno}} \|? \|- \|{{plainlist\| Line 107 ⟶ 109: }} \|[[MMX (instruction set)\|MMX]] (64-bit) \| {{dunno}} \|? \|1<ref name=":0">{{Cite web\|title=Computing Power throughout History\|url=https://www.alternatewars.com/BBOW/Computing/Computing_Power.htm\|access-date=2021-02-13\|website=alternatewars.com}}</ref> \| {{dunno}} \|? \|- \|Intel [[P6 (microarchitecture)\|P6]] [[Pentium III]] \|[[Streaming SIMD Extensions\|SSE]] (64-bit) \| {{dunno}} \|? \|2<ref name=":0" /> \| {{dunno}} \|? \|- \|Intel [[NetBurst]] [[Pentium 4]] (Willamette, Northwood) Line 121 ⟶ 123: \|2 \|4 \| {{dunno}} \|? \|- \|Intel [[P6 (microarchitecture)\|P6]] [[Pentium M]] Line 127 ⟶ 129: \|1 \|2 \| {{dunno}} \|? \|- \|{{plainlist\| Line 137 ⟶ 139: \|2 \|4 \| {{dunno}} \|? 
\|- \|{{plainlist\| Line 147 ⟶ 149: [[SSE4]] (128-bit) }} \| 4 \|\| 8 \|\| ?{{dunno}} \|- \| Intel [[Atom (system on chip)\|Atom]] ([[Bonnell (microarchitecture)\|Bonnell]], [[Saltwell (microarchitecture)\|Saltwell]], [[Silvermont (microarchitecture)\|Silvermont]] and [[Goldmont]]) \|\| [[SSE3]] (128-bit) \| \| 2 \|\| 4 \|\| ?{{dunno}} \|- \| Intel [[Sandy Bridge]] ([[Sandy Bridge]], [[Ivy Bridge (microarchitecture)\|Ivy Bridge]]) \|\| [[Advanced Vector Extensions\|AVX]] (256-bit) \| \| 8 \|\| 16 \|\| 0 \|- \|{{plainlist\| Line 157 ⟶ 161: Intel [[Skylake (microarchitecture)\|Skylake]] ([[Skylake (microarchitecture)\|Skylake]], [[Kaby Lake]], [[Coffee Lake]], [[Comet Lake (microprocessor)\|Comet Lake]], [[Whiskey Lake (microarchitecture)\|Whiskey Lake]], [[Amber Lake (microarchitecture)\|Amber Lake]]) }} \|[[Advanced Vector Extensions\|AVX2]] & [[FMA instruction set\|FMA]] (256-bit) \| \| 16 \|\| 32 \|\| 0 \|- \| Intel [[Xeon Phi]] ([[Knights Corner]]) \|\| [[Initial Many Core Instructions\|IMCI]] (512-bit) \| \| 16 \|\| 32 \|\| 0 \|- \|{{plainlist\| Line 166 ⟶ 172: Intel [[Ice Lake (microprocessor)\|Ice Lake]], [[Tiger Lake (microprocessor)\|Tiger Lake]] and [[Rocket Lake]] }} \| [[Advanced Vector Extensions\|AVX-512]] & [[FMA instruction set\|FMA]] (512-bit) \|\| 32 \|\| 64 \|\| 0 \|- ! 
colspan="5" \|AMD CPU Line 185 ⟶ 191: [[Advanced Vector Extensions\|AVX]] (128-bit) (Bulldozer, Steamroller) [[AVX2]] (128-bit) (Excavator) [[FMA instruction set\|FMA3]] (Bulldozer)<ref>{{Cite web\|url=https://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf\|title=New instructions support for Bulldozer (FMA3) and Piledriver (FMA3+4 and CVT, BMI, TBM)}}</ref> [[FMA instruction set\|FMA3/4]] (Piledriver, Excavator) }} Line 194 ⟶ 201: AMD [[Zen+]]<ref name="tpeak_jos"/><ref>{{Cite web \| url=http://www.agner.org/optimize/blog/read.php?i=838 \| title=Agner's CPU blog - Test results for AMD Ryzen}}</ref><ref>https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-finally-an-architecture-that-can-compete/2/ "each core now has a pair of 128-bit FMA units of its own"</ref><ref>{{cite conference \|url=https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 \|title=A New x86 Core Architecture for the Next Generation of Computing \|author=Mike Clark \|date=August 23, 2016 \|publisher=AMD \|conference=HotChips 28 \|access-date=October 8, 2017 \|archive-date=July 31, 2020 \|archive-url=https://web.archive.org/web/20200731171730/https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 \|url-status=dead }} [https://images.anandtech.com/doci/10591/HC28.AMD.Mike%20Clark.final-page-007.jpg page 7]</ref> (Ryzen 2000 series, Threadripper 2000 series) }} \| [[Advanced Vector Extensions\|AVX2]] & [[FMA instruction set\|FMA]] (128-bit, 256-bit decoding)<ref>{{Cite web \|title=The microarchitecture of Intel and AMD CPUs \|url=https://www.agner.org/optimize/microarchitecture.pdf}}</ref> \| \| 8 \|\| 16 \|\| 0 \|- \|{{plainlist\| Line 200 ⟶ 208: AMD [[Zen 3]] (Ryzen 5000 series, Epyc [[Epyc\|Milan]]) }} \| [[Advanced 
Vector Extensions\|AVX2]] & [[FMA instruction set\|FMA]] (256-bit) \| \| 16 \|\| 32 \|\| 0 \|- ! colspan="5" \|ARM CPU \|- \| ARM Cortex-A7, A9, A15 \|\| [[ARM architecture\|ARMv7]] \| \| 1 \|\| 8 \|\| 0 \|- \| ARM Cortex-A32, A35 \|\| [[ARM architecture\|ARMv8]] \| \| 2 \|\| 8 \|\| 0 \|- \| [[ARM Cortex-A53]], [[ARM Cortex-A55\|A55]], [[ARM Cortex-A57\|A57]],<ref name="tpeak_jos"/> [[ARM Cortex-A72\|A72]], [[ARM Cortex-A73\|A73]], [[ARM Cortex-A75\|A75]] \|\| [[ARM architecture\|ARMv8]] \| \| 4 \|\| 8 \|\| 0 \|- \| [[ARM Cortex-A76]], [[ARM Cortex-A77\|A77]], [[ARM Cortex-A78\|A78]] \|\| [[ARM architecture\|ARMv8]] \| \| 8 \|\| 16 \|\| 0 \|- \| [[ARM Cortex-X1]] \|\| [[ARM architecture\|ARMv8]] \| \| 16 \|\| 32 \|\| ?{{dunno}} \|- \| Qualcomm [[Krait (CPU)\|Krait]] \|\| [[ARM architecture\|ARMv7]] \| \| 1 \|\| 8 \|\| 0 \|- \| Qualcomm [[Kryo]] (1xx - 3xx) \|\| [[ARM architecture\|ARMv8]] \| \| 2 \|\| 8 \|\| 0 \|- \| Qualcomm [[Kryo]] (4xx - 5xx) \|\| [[ARM architecture\|ARMv8]] \| \| 8 \|\| 16 \|\| 0 \|- \| Samsung [[Exynos]] M1 and M2 \|\| [[ARM architecture\|ARMv8]] \| \| 2 \|\| 8 \|\| 0 \|- \| Samsung [[Exynos]] M3 and M4 \|\| [[ARM architecture\|ARMv8]] \| \| 3 \|\| 12 \|\| 0 \|- \| IBM PowerPC [[IBM A2\|A2]] (Blue Gene/Q) \|\| ?{{dunno}} \| \| 8 \|\| 8 <br/>(as FP64) \|\| 0 \|- \| [[Hitachi SH-4]]<ref>{{cite journal \|title=Entertainment Systems and High-Performance Processor SH-4 \|journal=Hitachi Review \|date=1999 \|volume=48 \|issue=2 \|pages=58–63 \|publisher=[[Hitachi]] \|url=https://retrocdn.net/images/f/fa/Entertainment_Systems_and_High-Performance_Processor_SH-4.pdf \|access-date=June 21, 2019}}</ref><ref>{{cite web \|title=SH-4 Next-Generation DSP Architecture for VoIP \|url=https://retrocdn.net/images/b/b3/SH-4_Next-Generation_DSP_Architecture.pdf \|publisher=[[Hitachi]] \|year=2000 \|access-date=June 21, 2019}}</ref> \|\| [[Hitachi SH-4\|SH-4]] \| \| 1 \|\| 7 \|\| 0 \|- ! 
colspan="5" \|Nvidia GPU \|- \|Nvidia [[Curie (microarchitecture)\|Curie]] ([[GeForce 6 series]] and [[GeForce 7 series]]) \|[[Parallel Thread Execution\|PTX]] \|\| ?{{dunno}} \|\| 8 \|\| ?{{dunno}} \|- \|Nvidia [[Tesla (microarchitecture)\|Tesla]] 2.0 (GeForce GTX 260–295) \|[[Parallel Thread Execution\|PTX]] \|\| ?{{dunno}} \|\| 2 \|\| ?{{dunno}} \|- \| Nvidia [[Fermi (microarchitecture)\|Fermi]] (only GeForce GTX 465–480, 560 Ti, 570–590) \|\| [[Parallel Thread Execution\|PTX]] \| \| {{1/4 }}<br/>(locked by driver, <br/>1 in hardware) \|\| 2 \|\| 0 \|- \| Nvidia [[Fermi (microarchitecture)\|Fermi]] (only Quadro 600–2000) \|\| [[Parallel Thread Execution\|PTX]] \| \| {{frac\|1/\|8}} \|\| 2 \|\| 0 \|- \| Nvidia [[Fermi (microarchitecture)\|Fermi]] (only Quadro 4000–7000, Tesla) \|\| [[Parallel Thread Execution\|PTX]] \| \| 1 \|\| 2 \|\| 0 \|- \| Nvidia [[Kepler (microarchitecture)\|Kepler]] (GeForce (except Titan and Titan Black), Quadro (except K6000), Tesla K10) \|\| [[Parallel Thread Execution\|PTX]] \| {{frac\| 1/\|12 }}<br/>(for [[GeForce 700 series\|GK110]]: <br/>locked by driver, <br/>{{2/3}} in hardware) \|\| 2 \|\| 0 \|- \| Nvidia [[Kepler (microarchitecture)\|Kepler]] (GeForce GTX Titan and Titan Black, Quadro K6000, Tesla (except K10)) \|\| [[Parallel Thread Execution\|PTX]] \| \| {{2/3}} \|\| 2 \|\| 0 \|- \|{{plainlist\| Line 250 ⟶ 276: Nvidia [[Pascal (microarchitecture)\|Pascal]] (all except Quadro GP100 and Tesla P100) }} \| [[Parallel Thread Execution\|PTX]] \|\| {{frac\|1/\|16}} \|\| 2 \|\| {{frac\|1/\|32}} \|- \| Nvidia [[Pascal (microarchitecture)\|Pascal]] (only Quadro GP100 and Tesla P100) \|\| [[Parallel Thread Execution\|PTX]] \|\| 1 \|\| 2 \|\| 4 \|- \| Nvidia [[Volta (microarchitecture)\|Volta]]<ref name="Nvidia Volta">{{cite web\|title=Inside Volta: The World's Most Advanced Data Center GPU\|date=May 10, 2017 \|url=https://devblogs.nvidia.com/inside-volta/}}</ref> \|\| [[Parallel Thread Execution\|PTX]] \| \| 1 \|\| 2  ([[FP32]]) + 2  
([[Int32\|INT32]]) \|\| 16 \|- \| Nvidia [[Turing (microarchitecture)\|Turing]] (only GeForce [[GeForce 16 series\|16XX]]) \|\| [[Parallel Thread Execution\|PTX]] \| \| {{frac\|1/\|16}} \|\| 2 (FP32) + 2 (INT32) \|\| 4 \|- \| Nvidia [[Turing (microarchitecture)\|Turing]] (all except GeForce [[GeForce 16 series\|16XX]]) \|\| [[Parallel Thread Execution\|PTX]] \| \| {{frac\|1/\|16}} \|\| 2 (FP32) + 2 (INT32) \|\| 16 \|- \| Nvidia [[Ampere (microarchitecture)\|Ampere]]<ref name="Nvidia Ampere 1">{{cite web\|title=NVIDIA Ampere Architecture In-Depth \|date=May 14, 2020 \|url=https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/}}</ref><ref name="Nvidia Ampere 2">{{Cite web \|title=NVIDIA A100 GPUs Power the Modern Data Center \|website=NVIDIA \|url=https://www.nvidia.com/en-us/data-center/a100/}}</ref> (only Tesla A100/A30) \|\| [[Parallel Thread Execution\|PTX]] \| \| 2 \|\| 2 (FP32) + 2 (INT32) \|\| 32 \|- \|{{plainlist\| Line 266 ⟶ 296: * Nvidia [[Ada Lovelace (microarchitecture)\|Ada Lovelace]] }} \|\| [[Parallel Thread Execution\|PTX]] \|\| {{frac\|1/\|32}} \|\| 2 (FP32) + 0 (INT32) <br/>''or'' <br/>1 (FP32) + 1 (INT32) \|\| 8 \|- ! 
colspan="5" \|AMD GPU \|- \| AMD [[TeraScale (microarchitecture)#TeraScale 1\|TeraScale 1]] ([[Radeon HD 4000 series]]) \|[[TeraScale (microarchitecture)#TeraScale 1\|TeraScale 1]] \|\| 0.4 \|\| 2 \|\| ?{{dunno}} \|- \| AMD [[TeraScale (microarchitecture)#TeraScale 2\|TeraScale 2]] ([[Radeon HD 5000 series]]) \|[[TeraScale (microarchitecture)#TeraScale 2\|TeraScale 2]] \|\| 1 \|\| 2 \|\| ?{{dunno}} \|- \| AMD [[TeraScale (microarchitecture)#TeraScale 3\|TeraScale 3]] ([[Radeon HD 6000 series]]) \|[[TeraScale (microarchitecture)#TeraScale 3\|TeraScale 3]] \|\| 1 \|\| 4 \|\| ?{{dunno}} \|- \| AMD [[Graphics Core Next\|GCN]] (only Radeon Pro W 8100–9100) \|\| [[Graphics Core Next\|GCN]] \|\| 1 \|\| 2 \|\| ?{{dunno}} \|- \| AMD [[Graphics Core Next\|GCN]] (all except Radeon Pro W 8100–9100, Vega 10–20) \|\| [[Graphics Core Next\|GCN]] \|\| {{frac\|1/\|8}} \|\| 2 \|\| 4 \|- \| AMD [[AMD RX Vega series\|GCN Vega 10]] \|\| [[Graphics Core Next\|GCN]] \|\| {{frac\|1/\|8}} \|\| 2 \|\| 4 \|- \| AMD [[AMD RX Vega series\|GCN Vega 20]] (only Radeon VII) \|\| [[Graphics Core Next\|GCN]] \| \| {{1/2 }}<br/>(locked by driver, <br/>1 in hardware) \|\| 2 \|\| 4 \|- \| AMD [[AMD RX Vega series\|GCN Vega 20]] (only Radeon Instinct MI50 / MI60 and Radeon Pro VII) \|\| [[Graphics Core Next\|GCN]] \| \| 1 \|\| 2 \|\| 4 \|- \|{{plainlist\| Line 293 ⟶ 325: * AMD [[RDNA 2]] }} \| [[AMD RDNA Architecture\|RDNA]] \| \| {{frac\|1/\|8}} \|\| 2 \|\| 4 \|- \| AMD RDNA3 \|\| [[AMD RDNA Architecture\|RDNA]] \| \| {{frac\|1/\|8}}? \|\| 4 \|\| 8? \|- \| AMD [[AMD CDNA Architecture\|CDNA]] \|\| [[AMD CDNA Architecture\|CDNA]] \| \| 1 \|\| 4 <br/>(Tensor)<ref name="AMD">{{cite web\|url=https://www.amd.com/en/products/server-accelerators/instinct-mi100\|title=AMD Instinct MI100 Accelerator}}</ref> \|\| 16 \|- \| AMD [[AMD CDNA Architecture\|CDNA 2]] \|\| [[AMD CDNA Architecture\|CDNA 2]] \| \| 4 <br/>(Tensor) \|\| 4 <br/>(Tensor) \|\| 16 \|- ! 
colspan="5" \|Intel GPU \|- \| Intel Xe-LP (Iris Xe MAX)<ref name="intel.com">{{cite web\|url=https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-the-xe-hpg-architecture.html\|title=Introduction to the Xe-HPG Architecture}}</ref> \|\| Xe \| \| {{1/2}}? \|\| 2 \|\| 4 \|- \| Intel Xe-HPG (Arc Alchemist)<ref name="intel.com"/> \|\| Xe \|\| 0 \|\| 2 \|\| 16 Line 309 ⟶ 346: \| Intel Xe-HPC (Ponte Vecchio)<ref name="allinfo.space">{{cite web\|url=https://allinfo.space/2022/11/09/intel-data-center-gpu-max-ponte-vecchio-starts-in-3-variants-for-supercomputers/\|title=Intel Data Center GPU Max\|date=November 9, 2022 }}</ref> \|\| Xe \|\| 2 \|\| 2 \|\| 32 \|- \| Intel Xe2 (Arc Battlemage) \|\| Xe2 \|\| {{frac\|1/\|8}} \|\| 2 \|\| 16 \|- ! colspan="5" \|Qualcomm GPU Line 321 ⟶ 358: ! colspan="5" \|Graphcore \|- \| Graphcore Colossus GC2<ref name="Source 2">{{cite web\|url=https://www.youtube.com/watch?v=2IOyQEIlN6Y&t=1361\|title=250 TFLOPs/s for two chips with FP16 mixed precision\|website=youtube.com\|date=October 26, 2018 }}</ref><ref name="Source 3">Archived at [https://ghostarchive.org/varchive/youtube/20211211/7XtBZ4Hsi_M Ghostarchive]{{cbignore}} and the [https://web.archive.org/web/20180119094342/https://www.youtube.com/watch?v=7XtBZ4Hsi_M&gl=US&hl=en Wayback Machine]{{cbignore}}: {{cite web\|url=https://www.youtube.com/watch?v=7XtBZ4Hsi_M&t=2208\|title=Estimation via power consumption that FP32 is 1/4 of FP16 and that clock frequency is below 1.5GHz\|website=youtube.com\|date=October 25, 2017 }}{{cbignore}}</ref> \| \| ?{{dunno}} \|\| 0 \|\| 16 \|\| 64 \|- \|{{plainlist\| * Graphcore Colossus GC200 Mk2<ref name="Source 4">Archived at [https://ghostarchive.org/varchive/youtube/20211211/_zvU0uwIafQ Ghostarchive]{{cbignore}} and the [https://web.archive.org/web/20200716143430/https://www.youtube.com/watch?v=_zvU0uwIafQ Wayback Machine]{{cbignore}}: {{cite web\|url=https://www.youtube.com/watch?v=_zvU0uwIafQ\|title=Introducing Graphcore's 
Mk2 IPU systems\|website=youtube.com\|date=July 15, 2020 }}{{cbignore}}</ref> * Graphcore Bow-2000<ref name="Source 5">{{cite web\|url=https://docs.graphcore.ai/projects/bow-2000-datasheet/en/latest/product-description.html#technical-specifications\|title=Bow-2000 IPU-Machine\|website=docs.graphcore.ai}}{{cbignore}}</ref> }} \|\| ?{{dunno}} \|\| 0 \|\| 32 \|\| 128 \|- ! colspan="5" \|[[Supercomputer]] Line 333 ⟶ 371: \|[[ENIAC]] @ 100 kHz in 1945 \| \|0.004<ref>ENIAC @ 100 kHz with 385 FLOPS {{Cite web\|title=Computers of Yore\|url=https://www.clear.rice.edu/comp201/08-spring/lectures/lec02/computers.shtml\|access-date=2021-02-26\|website=clear.rice.edu}}</ref> <br/>(~0.00000003 FLOPS/[[Watt\|W]]) \| \| Line 345 ⟶ 383: \|60-bit processor @ 10 MHz in [[CDC 6600]] in 1964 \| \|0.3 <br/>(FP60) \| \| Line 351 ⟶ 389: \|60-bit processor @ 10 MHz in [[CDC 7600]] in 1967 \| \|1.0 <br/>(FP60) \| \| Line 357 ⟶ 395: \|[[Cray-1]] @ 80 MHz in 1976 \| \|2 <br/>(700 FLOPS/W) \| \|

Floating point operations per second: Difference between revisions
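The per-core, per-cycle figures in the table are usually turned into a theoretical peak as cores × clock rate × FLOPs per cycle. For the SIMD/FMA rows, the figures follow from vector width, element size and the number of FMA units. The Python sketch below illustrates both steps under stated assumptions:

* Each FMA unit is assumed to complete one full-width fused multiply-add per cycle, counted as 2 FLOPs.
* The FMA unit counts given for the Skylake-class and Zen+-class cores are assumptions made for illustration.
* The 8-core, 3.0 GHz part is hypothetical and not a real product.

```python
# Sketch: deriving per-core FLOPs/cycle and a theoretical peak rate.
# Assumption: each FMA unit completes one full-width fused multiply-add
# per clock cycle, and one FMA is counted as 2 floating-point operations.


def flops_per_cycle(vector_bits: int, element_bits: int, fma_units: int) -> int:
    """FLOPs per clock cycle per core for a core with `fma_units` FMA pipes."""
    lanes = vector_bits // element_bits  # elements held in one vector register
    return fma_units * lanes * 2  # multiply + add per lane per FMA


def peak_flops(cores: int, clock_hz: float, per_cycle: float) -> float:
    """Theoretical peak = cores x clock rate x FLOPs per cycle per core."""
    return cores * clock_hz * per_cycle


# Skylake-class core, assumed to have two 256-bit FMA units (AVX2 & FMA).
skylake_fp64 = flops_per_cycle(256, 64, 2)  # 16
skylake_fp32 = flops_per_cycle(256, 32, 2)  # 32

# Zen+-class core, assumed to have two 128-bit FMA units.
zen_plus_fp64 = flops_per_cycle(128, 64, 2)  # 8
zen_plus_fp32 = flops_per_cycle(128, 32, 2)  # 16

# Hypothetical 8-core part at 3.0 GHz (illustrative, not a real SKU).
print(peak_flops(8, 3.0e9, skylake_fp64) / 1e9, "GFLOPS FP64")  # 384.0 GFLOPS FP64

# Single-processor machine from the table: Cray-1 at 80 MHz, 2 FLOPs/cycle.
print(peak_flops(1, 80e6, 2) / 1e6, "MFLOPS")  # 160.0 MFLOPS

# Working backwards from a measured rate: ENIAC, 385 FLOPS at 100 kHz.
print(385 / 100e3, "FLOPs per cycle")  # 0.00385, listed as 0.004
```

Sustained rates are normally below this peak, because the formula assumes every unit issues an FMA on every cycle.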