Floating point operations per second: Difference between revisions

Revision as of 00:16, 12 February 2025 by Cedar101 (m →Floating-point operations per clock cycle for various processors: <br/>; Tag: Visual edit). Latest revision as of 06:22, 26 August 2025 by Zac67 (Partial restore with corrected model).
(30 intermediate revisions by 18 users not shown)
Line 5: '''Floating point operations per second''' ('''FLOPS''', '''flops''' or '''flop/s''') is a measure of [[computer performance]] in [[computing]], useful in fields of scientific computations that require [[floating-point]] calculations.<ref>{{cite web \|title=Understand measures of supercomputer performance and storage system capacity \|url=https://kb.iu.edu/d/apeq \|website=kb.iu.edu \|access-date=23 March 2024}}</ref> For such cases, it is a more accurate measure than [[instructions per second]].{{cn\|date=March 2024}} ==Floating-point arithmetic== {{Anchor\|multipliers}} {\| class="wikitable floatright sortable" \|+ Multipliers for flops Line 23 ⟶ 24: \|- \| [[Giga-\|giga]]FLOPS \| GFLOPS<ref>{{cite web \| title = GPU GFLOPS Statistics 2007-2025: NVIDIA AMD Intel \| url = https://gpus.axiomgaming.net/gflops-statistics \| website = Axiom Gaming \| publisher = Axiom Gaming \| access-date = 14 August 2025}}</ref> \| 10<sup>9</sup> \|- Line 72 ⟶ 73: : <math>\text{FLOPS} = \text{cores} \times \frac{\text{cycles}}{ \text{second}} \times \frac{\text{FLOPs}}{\text{cycle}}.</math> FLOPS can be recorded in different measures of precision; for example, the [[TOP500]] supercomputer list ranks computers by 64-bit ([[double-precision floating-point format]]) operations per second, abbreviated to ''FP64''.<ref name="top500faq">{{cite web \|title=FREQUENTLY ASKED QUESTIONS \|url=https://www.top500.org/resources/frequently-asked-questions/ \|website=top500.org \|access-date=June 23, 2020}}</ref> Similar measures are available for [[Single-precision floating-point format\|32-bit]] (''FP32'') and [[Half-precision floating-point format\|16-bit]] (''FP16'') operations.
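As a minimal sketch of the formula above, the following Python snippet multiplies the three factors to get a theoretical peak. The function name `peak_flops`, the core count (8) and the clock rate (3.5 GHz) are hypothetical values chosen for illustration; the per-cycle figures of 16 (FP64) and 32 (FP32) are assumed values of the kind listed for a 256-bit AVX2 and FMA core. Sustained performance on real workloads is normally lower than this peak.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: float) -> float:
    """Theoretical peak: cores x cycles per second x FLOPs per cycle."""
    return cores * clock_hz * flops_per_cycle


# Hypothetical 8-core CPU clocked at 3.5 GHz (illustrative numbers only).
CORES = 8
CLOCK_HZ = 3.5e9

# Assumed per-core throughput for a 256-bit AVX2 + FMA design.
fp64_peak = peak_flops(CORES, CLOCK_HZ, 16)  # double precision
fp32_peak = peak_flops(CORES, CLOCK_HZ, 32)  # single precision

print(f"FP64 peak: {fp64_peak / 1e9:.0f} GFLOPS")
print(f"FP32 peak: {fp32_peak / 1e9:.0f} GFLOPS")
```

Halving the operand width doubles the number of values that fit in one vector register, which is why the FP32 figure in this sketch is twice the FP64 figure.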
{{anchor\|FLOPSforProcessors}} Line 90 ⟶ 91: \|- \|[[Intel 80486]] \|[[x87]] (80-bit) \| {{dunno}} \|0.128<ref name=":1" /> Line 99 ⟶ 100: Intel [[P6 (microarchitecture)\|P6]] [[Pentium Pro]] }} \|[[x87]] (80-bit) \| {{dunno}} \|0.5<ref name=":1">{{Cite web\|title=home.iae.nl \|url=http://home.iae.nl/users/mhx/flops_4.tbl\|access-date=\|website=}}</ref> Line 108 ⟶ 109: Intel [[P6 (microarchitecture)\|P6]] [[Pentium II]] }} \|[[x87]] (80-bit) \| {{dunno}} \|1<ref name=":0">{{Cite web\|title=Computing Power throughout History\|url=https://www.alternatewars.com/BBOW/Computing/Computing_Power.htm\|access-date=2021-02-13\|website=alternatewars.com}}</ref> Line 157 ⟶ 158: \| 8 \|\| 16 \|\| 0 \|- \|{{ublist\| \| Intel [[Haswell (microarchitecture)\|Haswell]]<ref name="tpeak_jos"/> ([[Haswell (microarchitecture)\|Haswell]], [[Haswell (microarchitecture)\|Devil's Canyon]], [[Broadwell (microarchitecture)\|Broadwell]]) \| Intel [[Skylake (microarchitecture)\|Skylake]] <br/>([[Skylake (microarchitecture)\|Skylake]], [[Kaby Lake]], [[Coffee Lake]], [[Comet Lake (microprocessor)\|Comet Lake]], [[Whiskey Lake (microarchitecture)\|Whiskey Lake]], [[Amber Lake (microarchitecture)\|Amber Lake]]) }} \|[[Advanced Vector Extensions\|AVX2]] & [[FMA instruction set\|FMA]] (256-bit) Line 172 ⟶ 173: * Intel [[Ice Lake (microprocessor)\|Ice Lake]], [[Tiger Lake (microprocessor)\|Tiger Lake]] and [[Rocket Lake]] }} \| [[Advanced Vector Extensions\|AVX-512]] & [[FMA instruction set\|FMA]] (512-bit) \| 32 \|\| 64 \|\| 0 \|- ! 
colspan="5" \|AMD CPU Line 181 ⟶ 183: AMD [[Jaguar (microarchitecture)\|Jaguar]] AMD [[Puma (microarchitecture)\|Puma]] }} \|[[Advanced Vector Extensions\|AVX]] (128-bit) \| 4 \|\| 8 \|\| 0 \|- \|AMD [[AMD 10h\|K10]] \|[[SSE4\|SSE4/4a]] (128-bit) \|\| 4 \|\| 8 \|\| 0 \|- \| AMD [[Bulldozer (microarchitecture)\|Bulldozer]]<ref name="tpeak_jos" /> <br/>([[Piledriver (microarchitecture)\|Piledriver]], [[Steamroller (microarchitecture)\|Steamroller]], [[Excavator (microarchitecture)\|Excavator]]) \|{{ublist\| \|[[Advanced Vector Extensions\|AVX]] (128-bit) <br/>(Bulldozer, Steamroller) \|[[AVX2]] (128-bit) (Excavator) \|[[FMA instruction set\|FMA3]] (Bulldozer)<ref>{{Cite web\|url=https://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf\|title=New instructions support for Bulldozer (FMA3) and Piledriver (FMA3+4 and CVT, BMI, TBM)}}</ref> \|[[FMA instruction set\|FMA3/4]] (Piledriver, Excavator) }} \| 4 \|\| 8 \|\| 0 \|- \|{{ublist\| \|AMD [[Zen (microarchitecture)\|Zen]] <br/>(Ryzen 1000 series, Threadripper 1000 series, Epyc [[Epyc\|Naples]]) \|AMD [[Zen+]]<ref name="tpeak_jos"/><ref>{{Cite web \| url=http://www.agner.org/optimize/blog/read.php?i=838 \| title=Agner's CPU blog - Test results for AMD Ryzen}}</ref><ref>https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-finally-an-architecture-that-can-compete/2/ "each core now has a pair of 128-bit FMA units of its own"</ref><ref>{{cite conference \|url=https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 \|title=A New x86 Core Architecture for the Next Generation of Computing \|author=Mike Clark \|date=August 23, 2016 \|publisher=AMD \|conference=HotChips 28 \|access-date=October 8, 2017 \|archive-date=July 31, 2020 
\|archive-url=https://web.archive.org/web/20200731171730/https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.23-Tuesday-Epub/HC28.23.90-High-Perform-Epub/HC28.23.930-X86-core-MikeClark-AMD-final_v2-28.pdf#page=7 \|url-status=dead }} [https://web.archive.org/web/20161209125020/http://images.anandtech.com/doci/10591/HC28.AMD.Mike%20Clark.final-page-007.jpg page 7]</ref> <br/>(Ryzen 2000 series, Threadripper 2000 series) }} \| [[Advanced Vector Extensions\|AVX2]] & [[FMA instruction set\|FMA]] <br/>(128-bit, 256-bit decoding)<ref>{{Cite web \|title=The microarchitecture of Intel and AMD CPUs \|url=https://www.agner.org/optimize/microarchitecture.pdf}}</ref> \| 8 \|\| 16 \|\| 0 \|- \|{{ublist\| \|AMD [[Zen 2]]<ref name="www.youtube.com">{{cite web \|url=https://www.youtube.com/watch?v=_96stDCb-mk&t=3299 \|title=AMD CEO Lisa Su's COMPUTEX 2019 Keynote \|archive-url=https://ghostarchive.org/varchive/youtube/20211211/_96stDCb-mk\| archive-date=2021-12-11 \|url-status=live \|website=youtube.com\|date=May 27, 2019 }}{{cbignore}}</ref> <br/>(Ryzen 3000 series, Threadripper 3000 series, Epyc [[Epyc\|Rome]]) \|AMD [[Zen 3]] <br/>(Ryzen 5000 series, Epyc [[Epyc\|Milan]]) }} \| [[Advanced Vector Extensions\|AVX2]] & [[FMA instruction set\|FMA]] (256-bit) \| 16 \|\| 32 \|\| 0 \|- \|{{ublist\| \|AMD [[Zen 4]]<br/>(Ryzen 7000 series, Threadripper 7000 series, Epyc [[Epyc\|Genoa]], [[Epyc\|Bergamo]], [[Epyc\|Siena]]) }} \| [[Advanced Vector Extensions\|AVX-512]] & [[FMA instruction set\|FMA]] (256-bit) \| 16 \|\| 32 \|\| 0 \|- \|{{ublist\| \|AMD [[Zen 5]]<ref>{{Cite web \| url=https://community.amd.com/t5/server-processors/leadership-hpc-performance-with-5th-generation-amd-epyc/ba-p/739498 \| title=Leadership HPC Performance with 5th Generation AMD EPYC Processors}}</ref><br/>(Ryzen 9000 series, Threadripper 9000 series, Epyc [[Epyc\|Turin]]) }} \| [[Advanced Vector Extensions\|AVX-512]] & [[FMA instruction set\|FMA]] (512-bit) \| 32 \|\| 64 \|\| 0 \|- 
! colspan="5" \|ARM CPU Line 257 ⟶ 272: \|[[Parallel Thread Execution\|PTX]] \|\| {{dunno}} \|\| 2 \|\| {{dunno}} \|- \| Nvidia [[Fermi (microarchitecture)\|Fermi]] (only GeForce GTX 465–480, 560 Ti, 570–590) \| \| [[Parallel Thread Execution\|PTX]] \| {{1/4}}<br/>(locked by driver,<br/>1 in hardware) \|\| 2 \|\| 0 \|- \| Nvidia [[Fermi (microarchitecture)\|Fermi]] (only Quadro 600–2000) \| \| [[Parallel Thread Execution\|PTX]] \| {{frac\|1\|8}} \|\| 2 \|\| 0 \|- \| Nvidia [[Fermi (microarchitecture)\|Fermi]] (only Quadro 4000–7000, Tesla) \| \| [[Parallel Thread Execution\|PTX]] \| 1 \|\| 2 \|\| 0 \|- Line 276 ⟶ 297: \| {{2/3}} \|\| 2 \|\| 0 \|- \|{{ublist\| \| Nvidia [[Maxwell (microarchitecture)\|Maxwell]] \| Nvidia [[Pascal (microarchitecture)\|Pascal]] <br/>(all except Quadro GP100 and Tesla P100) }} \| [[Parallel Thread Execution\|PTX]] \|\| {{frac\|1\|16}} \|\| 2 \|\| {{frac\|1\|32}} Line 300 ⟶ 321: Nvidia [[Ada Lovelace (microarchitecture)\|Ada Lovelace]] }} \| [[Parallel Thread Execution\|PTX]] \|\| {{frac\|1\|32}} \|\| {{nowrap\|2 (FP32) + 0 (INT32)}}<br/>''or''<br/>{{nowrap\|1 (FP32) + 1 (INT32)}} \|\| 8 \|- \| Nvidia [[Hopper (microarchitecture)\|Hopper]] \|\| [[Parallel Thread Execution\|PTX]] \|\| 2 \|\| 2 (FP32) + 1 (INT32) \|\| 32 \|- ! 
colspan="5" \|AMD GPU Line 313 ⟶ 336: \|[[TeraScale (microarchitecture)#TeraScale 3\|TeraScale 3]] \|\| 1 \|\| 4 \|\| {{dunno}} \|- \| AMD [[Graphics Core Next\|GCN]] <br/>(only Radeon Pro W 8100–9100) \| \| [[Graphics Core Next\|GCN]] \|\| 1 \|\| 2 \|\| {{dunno}} \|- \| AMD [[Graphics Core Next\|GCN]] <br/>(all except Radeon Pro W 8100–9100, Vega 10–20) \| \| [[Graphics Core Next\|GCN]] \|\| {{frac\|1\|8}} \|\| 2 \|\| 4 \|- \| AMD [[AMD RX Vega series\|GCN Vega 10]] \|\| [[Graphics Core Next\|GCN]] \|\| {{frac\|1\|8}} \|\| 2 \|\| 4 \|- \| AMD [[AMD RX Vega series\|GCN Vega 20]] <br/>(only Radeon VII) \|\| [[Graphics Core Next\|GCN]] \| {{1/2}}<br/>(locked by driver,<br/>1 in hardware) \|\| 2 \|\| 4 \|- \| AMD [[AMD RX Vega series\|GCN Vega 20]]<br/>(only Radeon Instinct MI50 / MI60 and Radeon Pro VII) \| [[Graphics Core Next\|GCN]] \| 1 \|\| 2 \|\| 4 Line 378 ⟶ 402: \|[[ENIAC]] @ 100 kHz in 1945 \| \|0.00385<ref>ENIAC @ 100 kHz with 385 Flops {{Cite web\|title=Computers of Yore\|url=https://www.clear.rice.edu/comp201/08-spring/lectures/lec02/computers.shtml\|access-date=2021-02-26\|website=clear.rice.edu}}</ref><br/>({{val\|2.6\|e=-3\|u=FLOPS\|upl=W}})<ref>consumed 150 kilowatts of power {{Cite web\|title=National Museum of the United States Army\|url=https://www.thenmusa.org/armyinnovations/innovationeniaccomputer/\|access-date=2025-08-08}}</ref> \| \| Line 422 ⟶ 446: \|[[Parallella]] E16 @ 1000 MHz in 2012 \| \|2<ref name="Epiphany multi-core coprocessor E16G301 specs">[http://www.adapteva.com/products/silicon-devices/e16g301/ Epiphany-III 16-core 65nm Microprocessor (E16G301)] // [http://www.adapteva.com/author/admin/ admin] (August 19, 2012)</ref> <br/>(5.0 GFLOPS/W)<ref name="FeldmanM_(2014)"/> \| \| Line 428 ⟶ 452: \|[[Parallella]] E64 @ 800 MHz in 2012 \| \|2<ref name="Epiphany multi-core coprocessor E64G401 
specs">[http://www.adapteva.com/products/silicon-devices/e64g401/ Epiphany-IV 64-core 28nm Microprocessor (E64G401)] // [http://www.adapteva.com/author/admin/ admin] (August 19, 2012)</ref> <br/>(50.0 GFLOPS/W)<ref name="FeldmanM_(2014)">{{cite web\|url=http://www.hpcwire.com/2012/08/22/adapteva_unveils_64-core_chip/\|title=Adapteva Unveils 64-Core Chip\|last= Feldman\|first=Michael\|date=August 22, 2012\|publisher=HPCWire\|accessdate=September 3, 2014}}</ref> \| \| Line 441 ⟶ 465: ==Performance records== ===Single computer records=== The [[NEC SX-2]], a [[supercomputer]] developed by [[NEC]] in 1983, achieved gigaFLOPS (GFLOPS) performance with 1.3 [[billion]] FLOPS.<ref>{{Cite web \|title=【NEC】 SX-1, SX-2 \|url=https://museum.ipsj.or.jp/en/computer/super/0008.html \|access-date=2025-08-25 \|website=IPSJ Computer Museum \|publisher=[[Information Processing Society of Japan]]}}</ref> In June 1997, [[Intel]]'s [[ASCI Red]] was the world's first computer to achieve one teraFLOPS and beyond. Sandia director Bill Camp said that ASCI Red had the best reliability of any supercomputer ever built, and "was supercomputing's high-water mark in longevity, price, and performance".<ref name="jacobsequity.com">{{cite web \|title=Sandia's ASCI Red, world's first teraflop supercomputer, is decommissioned \|url=http://www.jacobsequity.com/ASCI%20Red%20Supercomputer.pdf \|access-date=November 17, 2011 \|archive-url=https://web.archive.org/web/20101105131112/http://www.jacobsequity.com/ASCI%20Red%20Supercomputer.pdf \|archive-date=November 5, 2010 }}</ref> Line 455 ⟶ 481: On October 25, 2007, [[NEC]] Corporation of Japan issued a press release announcing its SX series model [[SX-9]],<ref>{{cite news\|url=http://www.nec.co.jp/press/en/0710/2501.html\|title=NEC Launches World's Fastest Vector Supercomputer, SX-9\|date=October 25, 2007\|publisher=NEC\|access-date=July 8, 2008}}</ref> claiming it to be the world's fastest vector supercomputer. 
The [[SX-9]] features the first CPU capable of a peak vector performance of 102.4 gigaFLOPS per single core. On February 4, 2008, the [[National Science Foundation\|NSF]] and the [[University of Texas at Austin]] opened full scale research runs on an [[AMD]], [[Sun Microsystems\|Sun]] supercomputer named Ranger,<ref>{{cite web \|url = http://www.tacc.utexas.edu/resources/hpcsystems/ \|title = University of Texas at Austin, Texas Advanced Computing Center Line 476 ⟶ 502: In October 2010, China unveiled the [[Tianhe-1]], a supercomputer that operates at a peak computing rate of 2.5 petaFLOPS.<ref>{{cite news\| url=https://www.bbc.co.uk/news/technology-11644252 \| publisher=BBC News \| title=China claims supercomputer crown \| date=October 28, 2010}}</ref><ref>{{cite web\|last=Dillow \|first=Clay \|url=http://www.popsci.com/technology/article/2010-10/china-unveils-2507-petaflop-supercomputer-worlds-fastest \|title=China Unveils 2507 Petaflop Supercomputer, the World's Fastest \|website=Popsci.com \|date=October 28, 2010 \|access-date=February 9, 2012 }}</ref> {{As of\|2010}} the fastest PC [[microprocessor\|processor]] reached 109 gigaFLOPS ([[Intel Core i7]] [[Gulftown (microprocessor)\|980 XE]])<ref>{{Cite web \|url=http://techgage.com/article/intels_core_i7-980x_extreme_edition_-_ready_for_sick_scores/8 \|title=Intel's Core i7-980X Extreme Edition – Ready for Sick Scores?: Mathematics: Sandra Arithmetic, Crypto, Microsoft Excel \|website=Techgage \|date=March 10, 2010 \|access-date=February 9, 2012}}</ref> in double precision calculations. [[Graphics processing unit\|GPU]]s are considerably more powerful. 
For example, [[Nvidia Tesla]] C2050 GPU computing processors perform around 515 gigaFLOPS<ref name="nvidia.com">{{cite web\|url=http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html \|title=NVIDIA Tesla Personal Supercomputer \|publisher=Nvidia.com \|access-date=February 9, 2012}}</ref> in double precision calculations, and the AMD FireStream 9270 peaks at 240 gigaFLOPS.<ref name="ati.amd.com">{{cite web\|url=https://www.amd.com/us/products/workstation/firestream/firestream-9270/pages/firestream-9270.aspx \|title=AMD FireStream 9270 GPU Compute Accelerator \|publisher=Amd.com \|access-date=February 9, 2012}}</ref> In November 2011, it was announced that Japan had achieved 10.51 petaFLOPS with its [[K computer]].<ref name="Petaflops">{{cite web\|url=http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html \|title='K computer' Achieves Goal of 10 Petaflops \|publisher=Fujitsu.com \|access-date=February 9, 2012}}</ref> It has 88,128 [[SPARC64 VIIIfx]] [[central processing unit\|processor]]s in 864 racks, with theoretical performance of 11.28 petaFLOPS. It is named after the Japanese word "[[wikt:京#Japanese\|kei]]", which stands for 10 [[1,000,000,000,000,000\|quadrillion]],<ref>See [[Japanese numerals#Large numbers\|Japanese numbers]]</ref> corresponding to the target speed of 10 petaFLOPS. Line 534 ⟶ 560: \|${{Inflation\|US\|1.265\|1945\|r=3\|fmt=c}}T \|[[ENIAC]]: {{US$\|long=no\|487000}} in 1945 and ${{Inflation\|US\|487000\|1945\|fmt=c\|r=-3}} in 2023. \|{{US$\|long=no\|487000}} / {{val\|0.000000385\|ul=GFLOPS}}. [[Vacuum-tube computer\|First-generation]] ([[vacuum tube]]-based) electronic digital computer. \|- \| 1961 Line 540 ⟶ 566: \| ${{Inflation\|US\|18.672\|1961\|r=3\|fmt=c}}B \| A basic installation of [[IBM 7030 Stretch]] had a cost at the time of {{US$\|7.78 million}} each. 
\| The [[IBM 7030 Stretch]] performs one floating-point multiply every {{val\|2.4 \|ul=microseconds}}.<ref>{{cite web\|url=http://computer-history.info/Page4.dir/pages/IBM.7030.Stretch.dir/ \|title=The IBM 7030 (STRETCH) \|publisher=Norman Hardy \|access-date=February 24, 2017}}</ref> [[Transistor computer\|Second-generation]] (discrete [[transistor]]-based) computer. \|- \| 1964 \| $2.3B \| ${{Inflation\|US\|2.3\|1964\|r=3\|fmt=c}}B \| Base model [[CDC 6600]] price: $6,891,300. Line 591 ⟶ 617: \|- \| {{sort\|2012/08\|August 2012}} \| 75¢ \| ${{Inflation\|US\|.75\|2012\|r=2\|fmt=c}}¢ \| Quad [[Radeon HD 7000 series\|AMD Radeon 7970]] System \| A quad [[AMD]] [[Radeon HD 7000 series\|Radeon 7970]] desktop computer reaching 16 TFLOPS of single-precision, 4 TFLOPS of double-precision computing performance. Total system cost was $3000; built using only commercially available hardware.<ref>{{cite web \|url=http://www.overclock3d.net/reviews/gpu_displays/hd7970_quadfire_eyefinity_review/12 \|title=HD7970 Quadfire Eyefinity Review \|date=January 9, 2012 \|website=OC3D.net \|author=Tom Logan}}</ref> Line 621 ⟶ 647: \|- \| {{sort\|2017/07\|June 2017}} \| 6¢ \| {{Inflation\|US\|6.00\|2017\|r=2\|fmt=c}}¢ \| [[Zen (first generation)\|AMD Ryzen 7 1700]] & [[Radeon Pro\|AMD Radeon Vega Frontier Edition]] system Line 720 ⟶ 746: * [[Moore's law]] * [[Multiply–accumulate operation]] * [[Performance per watt#FLOPS per watt\|Performance per watt § FLOPS per watt]] * [[SPECfp]] * [[SPECint]]