Floating point operations per second

'''Floating point operations per second''' ('''FLOPS''', '''flops''' or '''flop/s''') is a measure of [[computer performance]] in [[computing]], useful in fields of scientific computations that require [[floating-point]] calculations.<ref>{{cite web |title=Understand measures of supercomputer performance and storage system capacity |url=https://kb.iu.edu/d/apeq |website=kb.iu.edu |access-date=23 March 2024}}</ref>
 
For such cases, it is a more accurate measure than measuring [[instructions per second]].{{cn|date=March 2024}}
 
==Floating-point arithmetic==
{{Anchor|multipliers}}
{| class="wikitable floatright sortable"
|+ Multipliers for flops
|-
| [[Giga-|giga]]FLOPS
| GFLOPS<ref>{{cite web | title = GPU GFLOPS Statistics 2007-2025: NVIDIA AMD Intel | url = https://gpus.axiomgaming.net/gflops-statistics | website = Axiom Gaming | publisher = Axiom Gaming | access-date = 14 August 2025}}</ref>
| 10<sup>9</sup>
|-
|-
|}
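As a quick, illustrative sketch of how the multipliers in the table are applied, the following Python snippet formats a raw FLOPS figure with the appropriate prefix. The cut-offs are the standard SI values; the helper function itself is purely illustrative and not from any particular library.

<syntaxhighlight lang="python">
# Sketch: format a raw FLOPS value using the SI multipliers from the table above.
SI_PREFIXES = [
    (1e18, "exaFLOPS"), (1e15, "petaFLOPS"), (1e12, "teraFLOPS"),
    (1e9, "gigaFLOPS"), (1e6, "megaFLOPS"), (1e3, "kiloFLOPS"),
]

def format_flops(flops: float) -> str:
    for factor, name in SI_PREFIXES:
        if flops >= factor:
            return f"{flops / factor:.3g} {name}"
    return f"{flops:.3g} FLOPS"

print(format_flops(1.102e18))  # "1.1 exaFLOPS"
print(format_flops(515e9))     # "515 gigaFLOPS"
</syntaxhighlight>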
[[Floating-point arithmetic]] is needed for very large or very small [[real number]]s, or computations that require a large dynamic range. Floating-point representation is similar to scientific notation, except computers use [[Binary number|base two]] (with rare exceptions), rather than [[Decimal|base ten]]. The encoding scheme stores the sign, the [[exponent]] (in base two for Cray and [[VAX]], base two or ten for [[IEEE floating point]] formats, and base 16 for [[IBM hexadecimal floating-point|IBM Floating Point Architecture]]) and the [[significand]] (number after the [[radix point]]). While several similar formats are in use, the most common is [[IEEE 754-1985|ANSI/IEEE Std. 754-1985]]. This standard defines the format for 32-bit numbers called ''single precision'', as well as 64-bit numbers called ''double precision'' and longer numbers called ''extended precision'' (used for intermediate results). Floating-point representations can support a much wider range of values than fixed-point, with the ability to represent very small numbers and very large numbers.<ref>[http://www.dspguide.com/ch4/3.htm Floating Point] Retrieved on December 25, 2009.</ref>
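To make the encoding concrete, the following Python sketch unpacks an IEEE 754 ''double precision'' (binary64) value into the sign, exponent and significand fields described above. The 1/11/52-bit field widths and the exponent bias of 1023 are those defined by the standard; the helper function is only an illustration.

<syntaxhighlight lang="python">
import struct

def decompose_binary64(x: float):
    """Split a Python float (IEEE 754 binary64) into its encoded fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # raw 64-bit pattern
    sign = bits >> 63                        # 1 sign bit
    biased_exponent = (bits >> 52) & 0x7FF   # 11 exponent bits, bias = 1023
    fraction = bits & ((1 << 52) - 1)        # 52-bit significand (fraction) field
    return sign, biased_exponent - 1023, fraction

# -6.25 is stored as sign 1, exponent 2 and fraction 0x9000000000000,
# i.e. -1.5625 x 2^2 (the leading 1 of the significand is implicit).
print(decompose_binary64(-6.25))
</syntaxhighlight>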
 
===Dynamic range and precision===
: <math>\text{FLOPS} = \text{cores} \times \frac{\text{cycles}}{ \text{second}} \times \frac{\text{FLOPs}}{\text{cycle}}.</math>
 
FLOPS can be recorded in different measures of precision; for example, the [[TOP500]] supercomputer list ranks computers by 64-bit ([[double-precision floating-point format]]) operations per second, abbreviated to ''FP64''.<ref name="top500faq">{{cite web |title=FREQUENTLY ASKED QUESTIONS |url=https://www.top500.org/resources/frequently-asked-questions/ |website=top500.org |access-date=June 23, 2020}}</ref> Similar measures are available for [[Single-precision floating-point format|32-bit]] (''FP32'') and [[Half-precision floating-point format|16-bit]] (''FP16'') operations.
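As a minimal sketch of the formula above, the theoretical peak can be computed directly; the core count, clock rate and per-cycle throughput used here are assumed figures for illustration, not measurements of any particular chip.

<syntaxhighlight lang="python">
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: float) -> float:
    """Theoretical peak: cores x (cycles/second) x (FLOPs/cycle)."""
    return cores * clock_hz * flops_per_cycle

# Hypothetical 8-core CPU at 3.5 GHz executing 16 FP64 FLOPs per cycle per core.
fp64_peak = peak_flops(cores=8, clock_hz=3.5e9, flops_per_cycle=16)
print(f"{fp64_peak / 1e9:.0f} GFLOPS (FP64)")  # 448 GFLOPS
</syntaxhighlight>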
 
{{anchor|FLOPSforProcessors}}
 
== Floating-point operations per clock cycle for various processors ==
{{Table alignment}}
{{sort-under}}
{| class="wikitable sortable sort-under col3right col4right col5right"
|+ Floating-point operations per clock cycle per core<ref>{{Cite web | url=https://en.wikichip.org/wiki/flops | title=Floating-Point Operations Per Second (FLOPS)}}</ref>
! scope="col" | Microarchitecture
|-
|[[Intel 80486]]
|[[x87]] (80-bit)
| {{dunno}}
|0.128<ref name=":1" />
| {{dunno}}
|-
|{{plainlist|
*Intel [[P6 (microarchitecture)|P6]] [[Pentium Pro]]
}}
|[[x87]] (80-bit)
| {{dunno}}
|0.5<ref name=":1">{{Cite web|title=home.iae.nl |url=http://home.iae.nl/users/mhx/flops_4.tbl|access-date=|website=}}</ref>
| {{dunno}}
|-
|{{plainlist|
*Intel [[P6 (microarchitecture)|P6]] [[Pentium II]]
}}
|[[x87]] (80-bit)
| {{dunno}}
|1<ref name=":0">{{Cite web|title=Computing Power throughout History|url=https://www.alternatewars.com/BBOW/Computing/Computing_Power.htm|access-date=2021-02-13|website=alternatewars.com}}</ref>
| {{dunno}}
|-
|Intel [[P6 (microarchitecture)|P6]] [[Pentium III]]
|[[Streaming SIMD Extensions|SSE]] (64-bit)
| {{dunno}}
|2<ref name=":0" />
| {{dunno}}
|-
|Intel [[NetBurst]] [[Pentium 4]] (Willamette, Northwood)
|2
|4
| {{dunno}}
|-
|Intel [[P6 (microarchitecture)|P6]] [[Pentium M]]
|1
|2
| {{dunno}}
|-
|{{plainlist|
|2
|4
| {{dunno}}
|-
|{{plainlist|
*[[SSE4]] (128-bit)
}}
| 4 || 8 || {{dunno}}
|-
| Intel [[Atom (system on chip)|Atom]] ([[Bonnell (microarchitecture)|Bonnell]], [[Saltwell (microarchitecture)|Saltwell]], [[Silvermont (microarchitecture)|Silvermont]] and [[Goldmont]]) || [[SSE3]] (128-bit)
| 2 || 4 || {{dunno}}
|-
| Intel [[Sandy Bridge]] ([[Sandy Bridge]], [[Ivy Bridge (microarchitecture)|Ivy Bridge]]) || [[Advanced Vector Extensions|AVX]] (256-bit)
| 8 || 16 || 0
|-
|{{ublist|
| Intel [[Haswell (microarchitecture)|Haswell]]<ref name="tpeak_jos"/> ([[Haswell (microarchitecture)|Haswell]], [[Haswell (microarchitecture)|Devil's Canyon]], [[Broadwell (microarchitecture)|Broadwell]])
| Intel [[Skylake (microarchitecture)|Skylake]] <br/>([[Skylake (microarchitecture)|Skylake]], [[Kaby Lake]], [[Coffee Lake]], [[Comet Lake (microprocessor)|Comet Lake]], [[Whiskey Lake (microarchitecture)|Whiskey Lake]], [[Amber Lake (microarchitecture)|Amber Lake]])
}}
|[[Advanced Vector Extensions|AVX2]] & [[FMA instruction set|FMA]] (256-bit)
| 16 || 32 || 0
|-
| Intel [[Xeon Phi]] ([[Knights Corner]]) || [[Initial Many Core Instructions|IMCI]] (512-bit)
| 16 || 32 || 0
|-
|{{plainlist|
* Intel [[Skylake (microarchitecture)|Skylake-X]] ([[Skylake (microarchitecture)|Skylake-X]], [[Cascade Lake (microarchitecture)|Cascade Lake]])
* Intel [[Xeon Phi]] ([[Knights Landing (microarchitecture)|Knights Landing]], [[Knights Mill]])
* Intel [[Ice Lake (microprocessor)|Ice Lake]], [[Tiger Lake (microprocessor)|Tiger Lake]] and [[Rocket Lake]]
}}
| [[Advanced Vector Extensions|AVX-512]] & [[FMA instruction set|FMA]] (512-bit)
| 32 || 64 || 0
|-
! colspan="5" |AMD CPU
*AMD [[Jaguar (microarchitecture)|Jaguar]]
*AMD [[Puma (microarchitecture)|Puma]]
}}
|[[Advanced Vector Extensions|AVX]] (128-bit)
| 4 || 8 || 0
|-
|AMD [[AMD 10h|K10]]
|[[SSE4|SSE4/4a]] (128-bit) || 4 || 8 || 0
|-
| AMD [[Bulldozer (microarchitecture)|Bulldozer]]<ref name="tpeak_jos" /> <br/>([[Piledriver (microarchitecture)|Piledriver]], [[Steamroller (microarchitecture)|Steamroller]], [[Excavator (microarchitecture)|Excavator]])
|{{ublist|
|[[Advanced Vector Extensions|AVX]] (128-bit) <br/>(Bulldozer, Steamroller)
|[[AVX2]] (128-bit) (Excavator)
|[[FMA instruction set|FMA3]] (Bulldozer)<ref>{{Cite web|url=https://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf|title=New instructions support for Bulldozer (FMA3) and Piledriver (FMA3+4 and CVT, BMI, TBM)}}</ref>
|[[FMA instruction set|FMA3/4]] (Piledriver, Excavator)
}}
| 4 || 8 || 0
|-
|{{ublist|
|AMD [[Zen (microarchitecture)|Zen]] <br/>(Ryzen 1000 series, Threadripper 1000 series, Epyc [[Epyc|Naples]])
|AMD [[Zen+]]<ref name="tpeak_jos"/><ref>{{Cite web | url=http://www.agner.org/optimize/blog/read.php?i=838 | title=Agner's CPU blog - Test results for AMD Ryzen}}</ref><ref>https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-finally-an-architecture-that-can-compete/2/ "each core now has a pair of 128-bit FMA units of its own"</ref><ref>{{cite conference |url=https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 |title=A New x86 Core Architecture for the Next Generation of Computing |author=Mike Clark |date=August 23, 2016 |publisher=AMD |conference=HotChips 28 |access-date=October 8, 2017 |archive-date=July 31, 2020 |archive-url=https://web.archive.org/web/20200731171730/https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 |url-status=dead }} [https://web.archive.org/web/20161209125020/http://images.anandtech.com/doci/10591/HC28.AMD.Mike%20Clark.final-page-007.jpg page 7]</ref> <br/>(Ryzen 2000 series, Threadripper 2000 series)
}}
| [[Advanced Vector Extensions|AVX2]] & [[FMA instruction set|FMA]]<br/>(128-bit, 256-bit decoding)<ref>{{Cite web |title=The microarchitecture of Intel and AMD CPUs |url=https://www.agner.org/optimize/microarchitecture.pdf}}</ref>
| 8 || 16 || 0
|-
|{{ublist|
|AMD [[Zen 2]]<ref name="www.youtube.com">{{cite web |url=https://www.youtube.com/watch?v=_96stDCb-mk&t=3299 |title=AMD CEO Lisa Su's COMPUTEX 2019 Keynote |archive-url=https://ghostarchive.org/varchive/youtube/20211211/_96stDCb-mk| archive-date=2021-12-11 |url-status=live |website=youtube.com|date=May 27, 2019 }}{{cbignore}}</ref><br/>(Ryzen 3000 series, Threadripper 3000 series, Epyc [[Epyc|Rome]])
|AMD [[Zen 3]]<br/>(Ryzen 5000 series, Epyc [[Epyc|Milan]])
}}
| [[Advanced Vector Extensions|AVX2]] & [[FMA instruction set|FMA]] (256-bit)
| 16 || 32 || 0
|-
|{{ublist|
|AMD [[Zen 4]]<br/>(Ryzen 7000 series, Threadripper 7000 series, Epyc [[Epyc|Genoa]], [[Epyc|Bergamo]], [[Epyc|Siena]])
}}
| [[Advanced Vector Extensions|AVX-512]] & [[FMA instruction set|FMA]] (256-bit)
| 16 || 32 || 0
|-
|{{ublist|
|AMD [[Zen 5]]<ref>{{Cite web | url=https://community.amd.com/t5/server-processors/leadership-hpc-performance-with-5th-generation-amd-epyc/ba-p/739498 | title=Leadership HPC Performance with 5th Generation AMD EPYC Processors}}</ref><br/>(Ryzen 9000 series, Threadripper 9000 series, Epyc [[Epyc|Turin]])
}}
| [[Advanced Vector Extensions|AVX-512]] & [[FMA instruction set|FMA]] (512-bit)
| 32 || 64 || 0
|-
! colspan="5" |ARM CPU
|-
| ARM Cortex-A7, A9, A15 || [[ARM architecture|ARMv7]]
| 1 || 8 || 0
|-
| ARM Cortex-A32, A35 || [[ARM architecture|ARMv8]]
| 2 || 8 || 0
|-
| [[ARM Cortex-A53]], [[ARM Cortex-A55|A55]], [[ARM Cortex-A57|A57]],<ref name="tpeak_jos"/> [[ARM Cortex-A72|A72]], [[ARM Cortex-A73|A73]], [[ARM Cortex-A75|A75]] || [[ARM architecture|ARMv8]]
| 4 || 8 || 0
|-
| [[ARM Cortex-A76]], [[ARM Cortex-A77|A77]], [[ARM Cortex-A78|A78]] || [[ARM architecture|ARMv8]]
| 8 || 16 || 0
|-
| [[ARM Cortex-X1]] || [[ARM architecture|ARMv8]]
| 16 || 32 || {{dunno}}
|-
| Qualcomm [[Krait (CPU)|Krait]] || [[ARM architecture|ARMv8]]
| 1 || 8 || 0
|-
| Qualcomm [[Kryo]] (1xx - 3xx) || [[ARM architecture|ARMv8]]
| 2 || 8 || 0
|-
| Qualcomm [[Kryo]] (4xx - 5xx) || [[ARM architecture|ARMv8]]
| 8 || 16 || 0
|-
| Samsung [[Exynos]] M1 and M2 || [[ARM architecture|ARMv8]]
| 2 || 8 || 0
|-
| Samsung [[Exynos]] M3 and M4 || [[ARM architecture|ARMv8]]
| 3 || 12 || 0
|-
| IBM PowerPC [[IBM A2|A2]] (Blue Gene/Q) || {{dunno}}
| 8 || 8 <br/>(as FP64) || 0
|-
| [[Hitachi SH-4]]<ref>{{cite journal |title=Entertainment Systems and High-Performance Processor SH-4 |journal=Hitachi Review |date=1999 |volume=48 |issue=2 |pages=58–63 |publisher=[[Hitachi]] |url=https://retrocdn.net/images/f/fa/Entertainment_Systems_and_High-Performance_Processor_SH-4.pdf |access-date=June 21, 2019}}</ref><ref>{{cite web |title=SH-4 Next-Generation DSP Architecture for VoIP |url=https://retrocdn.net/images/b/b3/SH-4_Next-Generation_DSP_Architecture.pdf |publisher=[[Hitachi]] |year=2000 |access-date=June 21, 2019}}</ref> || [[Hitachi SH-4|SH-4]]
| 1 || 7 || 0
|-
! colspan="5" |Nvidia GPU
|-
|Nvidia [[Curie (microarchitecture)|Curie]] ([[GeForce 6 series]] and [[GeForce 7 series]])
|[[Parallel Thread Execution|PTX]] || {{dunno}} || 8 || {{dunno}}
|-
|Nvidia [[Tesla (microarchitecture)|Tesla]] 2.0 (GeForce GTX 260–295)
|[[Parallel Thread Execution|PTX]] || {{dunno}} || 2 || {{dunno}}
|-
| Nvidia [[Fermi (microarchitecture)|Fermi]]
(only GeForce GTX 465–480, 560 Ti, 570–590)
| [[Parallel Thread Execution|PTX]]
| {{1/4 }}<br/>(locked by driver, <br/>1 in hardware) || 2 || 0
|-
| Nvidia [[Fermi (microarchitecture)|Fermi]]
(only Quadro 600–2000)
| [[Parallel Thread Execution|PTX]]
| {{frac|1|8}} || 2 || 0
|-
| Nvidia [[Fermi (microarchitecture)|Fermi]]
(only Quadro 4000–7000, Tesla)
| [[Parallel Thread Execution|PTX]]
| 1 || 2 || 0
|-
| Nvidia [[Kepler (microarchitecture)|Kepler]]
(GeForce (except Titan and Titan Black), Quadro (except K6000), Tesla K10)
| [[Parallel Thread Execution|PTX]]
| {{frac|1|12}}<br/>(for [[GeForce 700 series|GK110]]: <br/>locked by driver, <br/>{{2/3}} in hardware) || 2 || 0
|-
| Nvidia [[Kepler (microarchitecture)|Kepler]]
(GeForce GTX Titan and Titan Black, Quadro K6000, Tesla (except K10))
| [[Parallel Thread Execution|PTX]]
| {{2/3}} || 2 || 0
|-
|{{ublist|
| Nvidia [[Maxwell (microarchitecture)|Maxwell]]
| Nvidia [[Pascal (microarchitecture)|Pascal]] <br/>(all except Quadro GP100 and Tesla P100)
}}
| [[Parallel Thread Execution|PTX]] || {{frac|1|16}} || 2 || {{frac|1|32}}
|-
| Nvidia [[Pascal (microarchitecture)|Pascal]] (only Quadro GP100 and Tesla P100) || [[Parallel Thread Execution|PTX]] || 1 || 2 || 4
|-
| Nvidia [[Volta (microarchitecture)|Volta]]<ref name="Nvidia Volta">{{cite web|title=Inside Volta: The World's Most Advanced Data Center GPU|date=May 10, 2017 |url=https://devblogs.nvidia.com/inside-volta/}}</ref> || [[Parallel Thread Execution|PTX]]
| 1 || 2&nbsp;([[FP32]]) + 2&nbsp;([[Int32|INT32]]) || 16
|-
| Nvidia [[Turing (microarchitecture)|Turing]] (only GeForce [[GeForce 16 series|16XX]]) || [[Parallel Thread Execution|PTX]]
| {{frac|1|16}} || 2&nbsp;(FP32) + 2&nbsp;(INT32) || 4
|-
| Nvidia [[Turing (microarchitecture)|Turing]] (all except GeForce [[GeForce 16 series|16XX]]) || [[Parallel Thread Execution|PTX]]
| {{frac|1|16}} || 2&nbsp;(FP32) + 2&nbsp;(INT32) || 16
|-
| Nvidia [[Ampere (microarchitecture)|Ampere]]<ref name="Nvidia Ampere 1">{{cite web|title=NVIDIA Ampere Architecture In-Depth |date=May 14, 2020 |url=https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/}}</ref><ref name="Nvidia Ampere 2">{{Cite web |title=NVIDIA A100 GPUs Power the Modern Data Center |website=NVIDIA |url=https://www.nvidia.com/en-us/data-center/a100/}}</ref> (only Tesla A100/A30) || [[Parallel Thread Execution|PTX]]
| 2 || 2&nbsp;(FP32) + 2&nbsp;(INT32) || 32
|-
|{{plainlist|
* Nvidia [[Ampere (microarchitecture)|Ampere]] (all GeForce and Quadro, Tesla A40/A10)
* Nvidia [[Ada Lovelace (microarchitecture)|Ada Lovelace]]
}}
| [[Parallel Thread Execution|PTX]] || {{frac|1|32}} || {{nowrap|2&nbsp;(FP32) + 0&nbsp;(INT32)}}<br/>''or''<br/>{{nowrap|1&nbsp;(FP32) + 1&nbsp;(INT32)}} || 8
|-
| Nvidia [[Hopper (microarchitecture)|Hopper]] || [[Parallel Thread Execution|PTX]] || 2 || 2&nbsp;(FP32) + 1&nbsp;(INT32) || 32
|-
! colspan="5" |AMD GPU
|-
| AMD [[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]] ([[Radeon HD 4000 series]])
|[[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]] || 0.4 || 2 || {{dunno}}
|-
| AMD [[TeraScale (microarchitecture)#TeraScale 2|TeraScale 2]] ([[Radeon HD 5000 series]])
|[[TeraScale (microarchitecture)#TeraScale 2|TeraScale 2]] || 1 || 2 || {{dunno}}
|-
| AMD [[TeraScale (microarchitecture)#TeraScale 3|TeraScale 3]] ([[Radeon HD 6000 series]])
|[[TeraScale (microarchitecture)#TeraScale 3|TeraScale 3]] || 1 || 4 || {{dunno}}
|-
| AMD [[Graphics Core Next|GCN]] <br/>(only Radeon Pro W 8100–9100)
| [[Graphics Core Next|GCN]] || 1 || 2 || {{dunno}}
|-
| AMD [[Graphics Core Next|GCN]] <br/>(all except Radeon Pro W 8100–9100, Vega 10–20)
| [[Graphics Core Next|GCN]] || {{frac|1|8}} || 2 || 4
|-
| AMD [[AMD RX Vega series|GCN Vega 10]] || [[Graphics Core Next|GCN]] || {{frac|1|8}} || 2 || 4
|-
| AMD [[AMD RX Vega series|GCN Vega 20]] <br/>(only Radeon VII) || [[Graphics Core Next|GCN]]
| {{1/2 }}<br/>(locked by driver, <br/>1 in hardware) || 2 || 4
|-
| AMD [[AMD RX Vega series|GCN Vega 20]] <br/>(only Radeon Instinct MI50 / MI60 and Radeon Pro VII)
| [[Graphics Core Next|GCN]]
| 1 || 2 || 4
|-
|{{plainlist|
* AMD [[AMD Radeon RX 5000 series|RDNA]]<ref name="hardwareluxx">{{Cite web|url=https://www.hardwareluxx.de/index.php/artikel/hardware/grafikkarten/49892-alles-zu-navi-radeon-rx-5700-xt-ist-rdna-mit-gddr6.html|title=Die RDNA-Architektur - Seite 2|first=Andreas|last=Schilling|website=Hardwareluxx|date=June 10, 2019 }}</ref><ref name="techpowerup">{{Cite web|url=https://www.techpowerup.com/gpu-specs/radeon-rx-5700-xt.c3339|title=AMD Radeon RX 5700 XT Specs|website=TechPowerUp}}</ref>
* AMD [[RDNA 2]]
}}
| [[AMD RDNA Architecture|RDNA]]
| {{frac|1|8}} || 2 || 4
|-
| AMD RDNA3 || [[AMD RDNA Architecture|RDNA]]
| {{frac|1|8}}? || 4 || 8?
|-
| AMD [[AMD CDNA Architecture|CDNA]] || [[AMD CDNA Architecture|CDNA]]
| 1 || 4 <br/>(Tensor)<ref name="AMD">{{cite web|url=https://www.amd.com/en/products/server-accelerators/instinct-mi100|title=AMD Instinct MI100 Accelerator}}</ref> || 16
|-
| AMD [[AMD CDNA Architecture|CDNA 2]] || [[AMD CDNA Architecture|CDNA 2]]
| 4 <br/>(Tensor) || 4 <br/>(Tensor) || 16
|-
! colspan="5" |Intel GPU
|-
| Intel Xe-LP (Iris Xe MAX)<ref name="intel.com">{{cite web|url=https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-the-xe-hpg-architecture.html|title=Introduction to the Xe-HPG Architecture}}</ref> || Xe
| {{1/2}}? || 2 || 4
|-
| Intel Xe-HPG (Arc Alchemist)<ref name="intel.com"/> || Xe || 0 || 2 || 16
|-
| Intel Xe-HPC (Ponte Vecchio)<ref name="allinfo.space">{{cite web|url=https://allinfo.space/2022/11/09/intel-data-center-gpu-max-ponte-vecchio-starts-in-3-variants-for-supercomputers/|title=Intel Data Center GPU Max|date=November 9, 2022 }}</ref> || Xe || 2 || 2 || 32
|-
| Intel Xe2 (Arc Battlemage) || Xe2 || {{frac|1|8}} || 2 || 16
|-
! colspan="5" |Qualcomm GPU
|-
|Qualcomm [[Adreno]] 5x0
|[[Adreno]] 5xx || 1 || 2 || 4
|-
|Qualcomm [[Adreno]] 6x0
|[[Adreno]] 6xx || 1 || 2 || 4
|-
! colspan="5" |Graphcore
|-
| Graphcore Colossus GC2<ref name="Source 2">{{cite web|url=https://www.youtube.com/watch?v=2IOyQEIlN6Y&t=1361|title=250 TFLOPs/s for two chips with FP16 mixed precision|website=youtube.com|date=October 26, 2018 }}</ref><ref name="Source 3">Archived at [https://ghostarchive.org/varchive/youtube/20211211/7XtBZ4Hsi_M Ghostarchive]{{cbignore}} and the [https://web.archive.org/web/20180119094342/https://www.youtube.com/watch?v=7XtBZ4Hsi_M&gl=US&hl=en Wayback Machine]{{cbignore}}: {{cite web|url=https://www.youtube.com/watch?v=7XtBZ4Hsi_M&t=2208|title=Estimation via power consumption that FP32 is 1/4 of FP16 and that clock frequency is below 1.5GHz|website=youtube.com|date=October 25, 2017 }}{{cbignore}}</ref>
| {{dunno}} || 0 || 16 || 64
|-
|{{plainlist|
* Graphcore Colossus GC200 Mk2<ref name="Source 4">Archived at [https://ghostarchive.org/varchive/youtube/20211211/_zvU0uwIafQ Ghostarchive]{{cbignore}} and the [https://web.archive.org/web/20200716143430/https://www.youtube.com/watch?v=_zvU0uwIafQ Wayback Machine]{{cbignore}}: {{cite web|url=https://www.youtube.com/watch?v=_zvU0uwIafQ|title=Introducing Graphcore's Mk2 IPU systems|website=youtube.com|date=July 15, 2020 }}{{cbignore}}</ref>
 
*Graphcore Bow-2000<ref name="Source 5">{{cite web|url=https://docs.graphcore.ai/projects/bow-2000-datasheet/en/latest/product-description.html#technical-specifications|title=Bow-2000 IPU-Machine|website=docs.graphcore.ai/}}{{cbignore}}</ref>
}}
| {{dunno}} || 0 || 32 || 128
|-
! colspan="5" |[[Supercomputer]]
|[[ENIAC]] @ 100&nbsp;kHz in 1945
|
|0.00385<ref>ENIAC @ 100&nbsp;kHz with 385 Flops {{Cite web|title=Computers of Yore|url=https://www.clear.rice.edu/comp201/08-spring/lectures/lec02/computers.shtml|access-date=2021-02-26|website=clear.rice.edu}}</ref> <br/>({{val|2.6|e=-3|u=FLOPS/W}})<ref>consumed 150 kilowatts of power {{Cite web|title=National Museum of the United States Army|url=https://www.thenmusa.org/armyinnovations/innovationeniaccomputer/|access-date=2025-08-08}}</ref>
|
|
|60-bit processor @ 10 MHz in [[CDC 6600]] in 1964
|
|0.3 <br/>(FP60)
|
|
|60-bit processor @ 10 MHz in [[CDC 7600]] in 1967
|
|1.0 <br/>(FP60)
|
|
|[[Cray-1]] @ 80 MHz in 1976
|
|2 <br/>(700 FLOPS/W)
|
|
|[[Parallella]] E16 @ 1000 MHz in 2012
|
|2<ref name="Epiphany multi-core coprocessor E16G301 specs">[http://www.adapteva.com/products/silicon-devices/e16g301/ Epiphany-III 16-core 65nm Microprocessor (E16G301)] // [http://www.adapteva.com/author/admin/ admin] (August 19, 2012)</ref> <br/>(5.0&nbsp;GFLOPS/W)<ref name="FeldmanM_(2014)"/>
|
|
|[[Parallella]] E64 @ 800 MHz in 2012
|
|2<ref name="Epiphany multi-core coprocessor E64G401 specs">[http://www.adapteva.com/products/silicon-devices/e64g401/ Epiphany-IV 64-core 28nm Microprocessor (E64G401)] // [http://www.adapteva.com/author/admin/ admin] (August 19, 2012)</ref> <br/>(50.0&nbsp;GFLOPS/W)<ref name="FeldmanM_(2014)">{{cite web|url=http://www.hpcwire.com/2012/08/22/adapteva_unveils_64-core_chip/|title=Adapteva Unveils 64-Core Chip|last= Feldman|first=Michael|date=August 22, 2012|publisher=HPCWire|accessdate=September 3, 2014}}</ref>
|
|
==Performance records==
===Single computer records===
The [[NEC SX-2]], a [[supercomputer]] developed by [[NEC]] in 1983, achieved gigaFLOPS (GFLOPS) performance with 1.3 [[billion]] FLOPS.<ref>{{Cite web |title=【NEC】 SX-1, SX-2 |url=https://museum.ipsj.or.jp/en/computer/super/0008.html |access-date=2025-08-25 |website=IPSJ Computer Museum |publisher=[[Information Processing Society of Japan]]}}</ref>
 
In June 1997, [[Intel]]'s [[ASCI Red]] was the world's first computer to achieve one teraFLOPS and beyond. Sandia director Bill Camp said that ASCI Red had the best reliability of any supercomputer ever built, and "was supercomputing's high-water mark in longevity, price, and performance".<ref name="jacobsequity.com">{{cite web |title=Sandia's ASCI Red, world's first teraflop supercomputer, is decommissioned |url=http://www.jacobsequity.com/ASCI%20Red%20Supercomputer.pdf |access-date=November 17, 2011 |archive-url=https://web.archive.org/web/20101105131112/http://www.jacobsequity.com/ASCI%20Red%20Supercomputer.pdf |archive-date=November 5, 2010 }}</ref>
 
On October 25, 2007, [[NEC]] Corporation of Japan issued a press release announcing its SX series model [[SX-9]],<ref>{{cite news|url=http://www.nec.co.jp/press/en/0710/2501.html|title=NEC Launches World's Fastest Vector Supercomputer, SX-9|date=October 25, 2007|publisher=NEC|access-date=July 8, 2008}}</ref> claiming it to be the world's fastest vector supercomputer. The [[SX-9]] features the first CPU capable of a peak vector performance of 102.4 gigaFLOPS per single core.
 
On February 4, 2008, the [[National Science Foundation|NSF]] and the [[University of Texas at Austin]] opened full scale research runs on an [[AMD]], [[Sun Microsystems|Sun]] supercomputer named [[Texas Advanced Computing Center#Ranger|Ranger]],<ref>{{cite web
|url = http://www.tacc.utexas.edu/resources/hpcsystems/
|title = University of Texas at Austin, Texas Advanced Computing Center
In October 2010, China unveiled the [[Tianhe-1]], a supercomputer that operates at a peak computing rate of 2.5 petaFLOPS.<ref>{{cite news| url=https://www.bbc.co.uk/news/technology-11644252 | publisher=BBC News | title=China claims supercomputer crown | date=October 28, 2010}}</ref><ref>{{cite web|last=Dillow |first=Clay |url=http://www.popsci.com/technology/article/2010-10/china-unveils-2507-petaflop-supercomputer-worlds-fastest |title=China Unveils 2507 Petaflop Supercomputer, the World's Fastest |website=Popsci.com |date=October 28, 2010 |access-date=February 9, 2012 }}</ref>
 
{{As of|2010}} the fastest PC [[microprocessor|processor]] reached 109&nbsp;gigaFLOPS ([[Intel Core#Core i7|Intel Core i7]] [[Gulftown (microprocessor)|980 XE]])<ref>{{Cite web |url=http://techgage.com/article/intels_core_i7-980x_extreme_edition_-_ready_for_sick_scores/8 |title=Intel's Core i7-980X Extreme Edition – Ready for Sick Scores?: Mathematics: Sandra Arithmetic, Crypto, Microsoft Excel |website=Techgage |date=March 10, 2010 |access-date=February 9, 2012}}</ref> in double precision calculations. [[Graphics processing unit|GPU]]s are considerably more powerful. For example, [[Nvidia Tesla]] C2050 GPU computing processors perform around 515 gigaFLOPS<ref name="nvidia.com">{{cite web|url=http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html |title=NVIDIA Tesla Personal Supercomputer |publisher=Nvidia.com |access-date=February 9, 2012}}</ref> in double precision calculations, and the AMD FireStream 9270 peaks at 240 gigaFLOPS.<ref name="ati.amd.com">{{cite web|url=https://www.amd.com/us/products/workstation/firestream/firestream-9270/pages/firestream-9270.aspx |title=AMD FireStream 9270 GPU Compute Accelerator |publisher=Amd.com |access-date=February 9, 2012}}</ref>
 
In November 2011, it was announced that Japan had achieved 10.51 petaFLOPS with its [[K computer]].<ref name="Petaflops">{{cite web|url=http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html |title='K computer' Achieves Goal of 10 Petaflops |publisher=Fujitsu.com |access-date=February 9, 2012}}</ref> It has 88,128 [[SPARC64 VIIIfx]] [[central processing unit|processor]]s in 864 racks, with theoretical performance of 11.28 petaFLOPS. It is named after the Japanese word "[[wikt:京#Japanese|kei]]", which stands for 10 [[1,000,000,000,000,000|quadrillion]],<ref>See [[Japanese numerals#Large numbers|Japanese numbers]]</ref> corresponding to the target speed of 10 petaFLOPS.
 
In June 2022, the United States' [[Frontier (supercomputer)|Frontier]] was the most powerful supercomputer on TOP500, reaching 1,102 petaFLOPS (1.102 exaFLOPS) on the LINPACK benchmarks.
<ref>{{cite web | url=https://en.wikipedia.org/wiki/TOP500 | title=TOP500 }}</ref>{{Circular reference|date=February 2025}}
 
In November 2024, the United States’ [[El Capitan (supercomputer)|El Capitan]] [[Exascale computing|exascale]] [[supercomputer]], hosted at the [[Lawrence Livermore National Laboratory]] in [[Livermore, California|Livermore]], displaced Frontier as the [[TOP500|world's fastest supercomputer]] in the 64th edition of the [[TOP500|Top500 (Nov 2024)]].
|${{Inflation|US|1.265|1945|r=3|fmt=c}}T
|[[ENIAC]]: {{US$|long=no|487000}} in 1945 and ${{Inflation|US|487000|1945|fmt=c|r=-3}} in 2023.
|{{US$|long=no|487000}} / {{val|0.000000385|ul=GFLOPS}}. [[Vacuum-tube computer|First-generation]] ([[vacuum tube]]-based) electronic digital computer.
|-
| 1961
| ${{Inflation|US|18.672|1961|r=3|fmt=c}}B
| A basic installation of [[IBM 7030 Stretch]] had a cost at the time of {{US$|7.78 million}} each.
| The [[IBM 7030 Stretch]] performs one floating-point multiply every {{val|2.4|ul=microseconds}}.<ref>{{cite web|url=http://computer-history.info/Page4.dir/pages/IBM.7030.Stretch.dir/ |title=The IBM 7030 (STRETCH) |publisher=Norman Hardy |access-date=February 24, 2017}}</ref> [[Transistor computer|Second-generation]] (discrete [[Transistor computer|transistor]]-based) computer.
|-
| 1964
| $2.3B
| ${{Inflation|US|2.3|1964|r=3|fmt=c}}B
| Base model [[CDC 6600]] price: $6,891,300.
| The CDC 6600 is considered to be the first commercially successful [[supercomputer]].
|-
| 1984
|-
| {{sort|2012/08|August 2012}}
| 75.00¢
| {{Inflation|US|.75|2012|r=2|fmt=c}}¢
| Quad [[Radeon HD 7000 series|AMD Radeon 7970]] System
| A quad [[AMD]] [[Radeon HD 7000 series|Radeon 7970]] desktop computer reaching 16 TFLOPS of single-precision, 4 TFLOPS of double-precision computing performance. Total system cost was $3000; built using only commercially available hardware.<ref>{{cite web |url=http://www.overclock3d.net/reviews/gpu_displays/hd7970_quadfire_eyefinity_review/12 |title=HD7970 Quadfire Eyefinity Review |date=January 9, 2012 |website=OC3D.net |author=Tom Logan}}</ref>
|-
| {{sort|2017/07|June 2017}}
| 6.00¢
| {{Inflation|US|6.00|2017|r=2|fmt=c}}¢
| [[Zen (first generation)|AMD Ryzen 7 1700]] & [[Radeon Pro|AMD Radeon Vega Frontier Edition]] system
* [[Moore's law]]
* [[Multiply–accumulate operation]]
* [[Performance per watt#FLOPS per watt|Performance per watt § FLOPS per watt]]
* [[SPECfp]]
* [[SPECint]]