Instruction scheduling: Difference between revisions

Revision as of 00:29, 1 September 2020 by Artoria2e5 (edit summary: →Compiler examples), compared with the latest revision as of 23:01, 5 July 2025 by Avessa (edit summary: →Compiler examples: add uops.info)
(13 intermediate revisions by 6 users not shown)
Line 1:
{{Short description|Compiler optimization technique}}
{{Refimprove|date=April 2018}}
In [[computer science]], '''instruction scheduling''' is a [[compiler optimization]] used to improve [[instruction-level parallelism]], which improves performance on machines with [[instruction pipeline]]s. Put more simply, it tries to do the following without changing the meaning of the code:

Line 47 ⟶ 48:
== Compiler examples ==
The [[GNU Compiler Collection]] (GCC) is one compiler known to perform instruction scheduling. The target is selected with either the {{code|-march}} flag, which sets both the instruction set and the scheduling model, or the {{code|-mtune}} flag, which sets only the scheduling model. For each microarchitecture, GCC uses a description of instruction latencies and of which instructions can run in parallel (or equivalently, which "~~ALU~~ port" each one uses) to perform the task. This feature is available for almost all architectures that GCC supports.<ref>{{cite web |title=x86 Options |url=https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html |website=Using the GNU Compiler Collection (GCC)}}</ref>

Before version 12.0.0, [[LLVM]]/Clang accepted only a {{code|-march}} switch (called {{code|target-cpu}} in LLVM parlance), which selected both the instruction set and the scheduling model. Version 12 added support for {{code|-mtune}} ({{code|tune-cpu}}), for x86 only.<ref>{{cite web |title=⚙ D85384 [X86] Add basic support for -mtune command line option in clang |url=http://reviews.llvm.org/D85384 |website=reviews.llvm.org}}</ref>

Line 54 ⟶ 55:
* GCC and LLVM;
* [[Agner Fog]], who compiles extensive data for the [[x86 architecture]];<ref name="optimize">{{cite web |title=Software optimization resources. C++ and assembly. Windows, Linux, BSD, Mac OS X |url=https://www.agner.org/optimize/ |website=Agner Fog}}</ref>
* InstLatx64, which uses [[AIDA64]] to collect data on x86 CPUs;<ref>{{cite web |title=x86, x64 Instruction Latency, Memory Latency and CPUID dumps |url=http://instlatx64.atw.hu/ |website=instlatx64.atw.hu}} See also the "Comments" link on the page.</ref>
* uops.info, which provides latency, throughput, and port usage information for x86 microarchitectures.<ref>[https://uops.info/ uops.info]</ref><ref>{{Cite conference |title=uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures |last1=Abel |first1=Andreas |last2=Reineke |first2=Jan |date=2019 |conference=[[International Conference on Architectural Support for Programming Languages and Operating Systems]], Providence, RI, USA, April 13–17, 2019 |arxiv=1810.04610 |doi=10.1145/3297858.3304062 |isbn=978-1-4503-6240-5 |pages=673–686 |book-title=ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems |publication-date=April 2019 |publication-place=New York, NY, USA |publisher=[[Association for Computing Machinery|ACM]] }}</ref>

LLVM's {{code|llvm-exegesis}} should be usable on any machine, which makes it especially useful for gathering information on non-x86 ~~machines~~ ones.<ref>{{cite web |title=llvm-exegesis - LLVM Machine Instruction Benchmark |url=https://llvm.org/docs/CommandGuide/llvm-exegesis.html |website=LLVM 12 Documentation}}</ref>

==See also==

Line 62 ⟶ 64:
* [[Code generation (compiler)|Code generation]]
* [[Instruction unit]]
* [[Out-of-order execution]]

==References==

Line 69 ⟶ 72:
==Further reading==
* {{cite journal |author-first=Joseph A. |author-last=Fisher |author-link=Joseph A. Fisher |title=Trace Scheduling: A Technique for Global Microcode Compaction |journal=IEEE Transactions on Computers |volume=30 |number=7 |pages=478–490 |date=1981 |doi=10.1109/TC.1981.1675827 |s2cid=1650655 }} (''[[Trace scheduling]]'')
* {{cite journal |author-first1=Alexandru |author-last1=Nicolau |author-first2=Joseph A. |author-last2=Fisher |author-link2=Joseph A. Fisher |title=Measuring the Parallelism Available for Very Long Instruction Word Architectures |journal=IEEE Transactions on Computers |volume=33 |number=11 |date=1984}} (''Percolation scheduling'')
* {{cite journal |author-first1=David |author-last1=Bernstein |author-first2=Michael |author-last2=Rodeh |title=Global Instruction Scheduling for Superscalar Machines |journal=Proceedings of the ACM, SIGPLAN '91 Conference on Programming Language Design and Implementation |date=June 1991 |url=http://pages.cs.wisc.edu/~fischer/cs701.f06/berstein_rodeh.pdf}} (''Global scheduling'')
* {{cite web |last1=Cordes |first1=Peter |title=assembly - Instruction reordering in x86 / x64 asm - performance optimisation with latest CPUs |url=https://stackoverflow.com/a/45970664 |website=Stack Overflow}}
{{Compiler optimizations}}
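As an illustration of how a compiler might use a per-microarchitecture description of instruction latencies and ports, the following Python sketch implements greedy list scheduling (a common textbook technique) for a single basic block. It is not GCC's or LLVM's actual scheduler. The machine models, the <code>Instr</code> class, and the functions <code>priorities</code> and <code>list_schedule</code> are hypothetical, and the latencies are made up for the example. Real schedulers also weigh register pressure, issue-width rules, and more detailed pipeline hazards.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    kind: str      # port class the instruction needs, e.g. "alu" or "load"
    latency: int   # cycles until its result can be used by a dependent instruction
    deps: list[str] = field(default_factory=list)  # instructions whose results it reads


# Hypothetical machine model: how many instructions of each kind can issue per cycle.
DEFAULT_PORTS = {"alu": 2, "load": 1}


def priorities(instrs):
    """Latency-weighted height: longest path from each instruction to the end of the block."""
    users = {i.name: [] for i in instrs}
    for i in instrs:
        for d in i.deps:
            users[d].append(i)
    memo = {}

    def height(instr):
        if instr.name not in memo:
            memo[instr.name] = instr.latency + max(
                (height(u) for u in users[instr.name]), default=0
            )
        return memo[instr.name]

    return {i.name: height(i) for i in instrs}


def list_schedule(instrs, ports=DEFAULT_PORTS):
    """Greedy list scheduling of one basic block; returns {name: issue cycle}."""
    for i in instrs:
        if ports.get(i.kind, 0) < 1:
            raise ValueError(f"no port can execute {i.name} ({i.kind})")
    by_name = {i.name: i for i in instrs}
    prio = priorities(instrs)
    issued = {}
    cycle = 0
    while len(issued) < len(instrs):
        free = dict(ports)  # ports still unused in this cycle
        # Ready: not yet issued, and every operand's result is available by this cycle.
        ready = [
            i for i in instrs
            if i.name not in issued
            and all(d in issued and issued[d] + by_name[d].latency <= cycle
                    for d in i.deps)
        ]
        # Highest priority first; Python's sort is stable, so ties keep program order.
        ready.sort(key=lambda i: -prio[i.name])
        for i in ready:
            if free[i.kind] > 0:
                free[i.kind] -= 1
                issued[i.name] = cycle
        cycle += 1
    return issued


# Program order: a = load x; b = a + 1; c = load y; d = c + 2; e = b + d
prog = [
    Instr("a", "load", 3),
    Instr("b", "alu", 1, ["a"]),
    Instr("c", "load", 3),
    Instr("d", "alu", 1, ["c"]),
    Instr("e", "alu", 1, ["b", "d"]),
]

print(list_schedule(prog))                         # → {'a': 0, 'c': 1, 'b': 3, 'd': 4, 'e': 5}
print(list_schedule(prog, {"alu": 2, "load": 2}))  # → {'a': 0, 'c': 0, 'b': 3, 'd': 3, 'e': 4}
```

With one load port, the scheduler hoists the second load (<code>c</code>) above the add that consumes the first load (<code>b</code>), so the latency of <code>c</code> overlaps with <code>b</code>'s wait for <code>a</code>. Passing a different machine model, as a different {{code|-mtune}} target might describe, changes the schedule without changing what the program computes.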