{{Short description|Compiler optimization technique}}
{{Refimprove|date=April 2018}}
In [[computer science]], '''instruction scheduling''' is a [[compiler optimization]] used to improve [[instruction-level parallelism]], which improves performance on machines with [[instruction pipeline]]s. Put more simply, it tries to do the following without changing the meaning of the code:
 
== Compiler examples ==
The [[GNU Compiler Collection]] is one compiler known to perform instruction scheduling, using the {{code|-march}} (both instruction set and scheduling) or {{code|-mtune}} (only scheduling) flags. For each microarchitecture, it uses descriptions of instruction latencies and of which instructions can run in parallel (or, equivalently, which "ALU port" each one uses) to perform the task. This feature is available on almost all architectures that GCC supports.<ref>{{cite web |title=x86 Options |url=https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html |website=Using the GNU Compiler Collection (GCC)}}</ref>
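To illustrate how a table of instruction latencies can drive scheduling decisions, the following is a minimal sketch of greedy list scheduling with a critical-path priority. It is not GCC's implementation: it assumes a single-issue, in-order machine, ignores ALU ports entirely, and the opcode classes and latency values are hypothetical, chosen only for illustration. Swapping in a different latency table, loosely analogous to selecting a different {{code|-mtune}} target, changes the issue order of the same basic block.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str                                 # unique label for the instruction
    op: str                                   # opcode class, used to look up its latency
    deps: list = field(default_factory=list)  # names of instructions whose results it reads


def list_schedule(instrs, latency):
    """Greedy list scheduling for a hypothetical single-issue, in-order machine.

    Each cycle, at most one instruction whose operands are available is
    issued, preferring the one with the longest latency-weighted path to the
    end of the block (critical-path priority). Returns (cycle, name) pairs
    in issue order. Assumes the dependencies form a DAG.
    """
    by_name = {i.name: i for i in instrs}
    pos = {i.name: k for k, i in enumerate(instrs)}  # original program order
    succs = {i.name: [] for i in instrs}
    for i in instrs:
        for d in i.deps:
            succs[d].append(i.name)

    height = {}  # longest latency-weighted path from an instruction to a sink

    def path_height(n):
        if n not in height:
            own = latency[by_name[n].op]
            height[n] = own + max((path_height(s) for s in succs[n]), default=0)
        return height[n]

    for i in instrs:
        path_height(i.name)

    result_ready = {}  # name -> first cycle at which its result can be used
    remaining = [i.name for i in instrs]
    schedule = []
    cycle = 0
    while remaining:
        # An instruction is ready once every producer it reads has finished.
        ready = [n for n in remaining
                 if all(result_ready.get(d, float("inf")) <= cycle
                        for d in by_name[n].deps)]
        if ready:
            # Highest priority first; ties go to the earlier instruction.
            pick = max(ready, key=lambda n: (height[n], -pos[n]))
            schedule.append((cycle, pick))
            result_ready[pick] = cycle + latency[by_name[pick].op]
            remaining.remove(pick)
        cycle += 1  # advance one cycle (a stall if nothing was issued)
    return schedule


# Hypothetical basic block: b reads the load result, d reads the multiply result.
block = [
    Instr("a", "load"),
    Instr("b", "add", ["a"]),
    Instr("c", "mul"),
    Instr("d", "add", ["c"]),
]

if __name__ == "__main__":
    # Two made-up latency tables standing in for two microarchitectures.
    print(list_schedule(block, {"load": 4, "add": 1, "mul": 3}))
    print(list_schedule(block, {"load": 2, "add": 1, "mul": 5}))
```

With the first table the load lies on the longer path and is issued first; with the second, the multiply does and is issued first instead. In both cases the dependent additions are delayed until their operands are ready, instead of stalling the independent instruction behind them.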
 
Before version 12.0.0, [[LLVM]]/Clang accepted only a {{code|-march}} switch (called {{code|target-cpu}} in LLVM parlance), which selected both the instruction set and the scheduling model. Version 12 added support for {{code|-mtune}} ({{code|tune-cpu}}), for x86 only.<ref>{{cite web |title=D85384 [X86] Add basic support for -mtune command line option in clang |url=http://reviews.llvm.org/D85384 |website=reviews.llvm.org}}</ref>
* GCC and LLVM;
* [[Agner Fog]], who compiles extensive data for the [[x86 architecture]];<ref name="optimize">{{cite web |title=Software optimization resources. C++ and assembly. Windows, Linux, BSD, Mac OS X |url=https://www.agner.org/optimize/ |website=Agner Fog}}</ref>
* InstLatx64, which uses [[AIDA64]] to collect data on x86 CPUs;<ref>{{cite web |title=x86, x64 Instruction Latency, Memory Latency and CPUID dumps |url=http://instlatx64.atw.hu/ |website=instlatx64.atw.hu}} See also the "Comments" link on the page.</ref>
* uops.info, which provides latency, throughput, and port usage information for x86 microarchitectures.<ref>[https://uops.info/ uops.info]</ref><ref>{{Cite conference |title=uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures |last1=Abel |first1=Andreas |last2=Reineke |first2=Jan |date=2019 |conference=[[International Conference on Architectural Support for Programming Languages and Operating Systems]], Providence, RI, USA, April 13–17, 2019 |arxiv=1810.04610 |doi=10.1145/3297858.3304062 |isbn=978-1-4503-6240-5 |pages=673–686 |book-title=ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems |publication-date=April 2019 |publication-place=New York, NY, USA |publisher=[[Association for Computing Machinery|ACM]] }}</ref>
 
LLVM's {{code|llvm-exegesis}} should be usable on all machines and is especially useful for gathering information on non-x86 ones.<ref>{{cite web |title=llvm-exegesis - LLVM Machine Instruction Benchmark |url=https://llvm.org/docs/CommandGuide/llvm-exegesis.html |website=LLVM 12 Documentation}}</ref>
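One common use of latency and port-usage data of the kind collected by these sources is a quick lower bound on the cycles each loop iteration needs: either the latency of the loop-carried dependency chain or the contention on the busiest set of execution ports, whichever is larger. The following is a minimal sketch under simplifying assumptions: every instruction is a single, fully pipelined uop, each port accepts one uop per cycle, and front-end and memory effects are ignored. The opcode names, port numbers and latencies used below are hypothetical and are not taken from any of the listed sources.

```python
from itertools import combinations


def latency_bound(carried_chain, latency):
    """Cycles per iteration imposed by a loop-carried dependency chain:
    each iteration must wait for the whole chain of the previous one."""
    return sum(latency[op] for op in carried_chain)


def port_pressure_bound(body, ports):
    """Cycles per iteration imposed by execution-port contention.

    Assumes each op is a single uop that may run on any one of ports[op],
    and each port accepts one uop per cycle. For every set S of ports, the
    uops that can only run on ports in S need at least count / |S| cycles;
    the largest such ratio is a lower bound. Exponential in the number of
    ports, which is acceptable for the handful of ports a core has.
    """
    all_ports = sorted({p for op in body for p in ports[op]})
    best = 0.0
    for k in range(1, len(all_ports) + 1):
        for subset in combinations(all_ports, k):
            s = set(subset)
            n = sum(1 for op in body if set(ports[op]) <= s)
            best = max(best, n / k)
    return best


def cycles_per_iteration_bound(body, carried_chain, latency, ports):
    """The loop cannot run faster than either bound allows."""
    return max(latency_bound(carried_chain, latency),
               port_pressure_bound(body, ports))


if __name__ == "__main__":
    # Hypothetical loop body and machine description.
    body = ["load", "load", "load", "mul", "add"]
    ports = {"load": (2, 3), "mul": (1,), "add": (0, 1)}
    latency = {"load": 5, "mul": 3, "add": 1}
    print(port_pressure_bound(body, ports))
    print(cycles_per_iteration_bound(body, ["mul", "add"], latency, ports))
```

In this made-up example the three loads compete for two ports, giving a port bound of 1.5 cycles, but the loop-carried multiply-add chain needs 4 cycles, so the loop is latency-bound. A scheduler or programmer would then gain more by shortening or splitting the dependency chain than by moving instructions between ports.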
 
==See also==
* [[Code generation (compiler)|Code generation]]
* [[Instruction unit]]
* [[Out-of-order execution]]
 
==References==
 
==Further reading==
* {{cite journal |author-first=Joseph A. |author-last=Fisher |author-link=Joseph A. Fisher |title=Trace Scheduling: A Technique for Global Microcode Compaction |journal=IEEE Transactions on Computers |volume=30 |number=7 |pages=478–490 |date=1981 |doi=10.1109/TC.1981.1675827 |s2cid=1650655 }} (''[[Trace scheduling]]'')
* {{cite journal |author-first1=Alexandru |author-last1=Nicolau |author-first2=Joseph A. |author-last2=Fisher |author-link2=Joseph A. Fisher |title=Measuring the Parallelism Available for Very Long Instruction Word Architectures |journal=IEEE Transactions on Computers |volume=33 |number=11 |date=1984}} (''Percolation scheduling'')
* {{cite journal |author-first1=David |author-last1=Bernstein |author-first2=Michael |author-last2=Rodeh |title=Global Instruction Scheduling for Superscalar Machines |journal=Proceedings of the ACM, SIGPLAN '91 Conference on Programming Language Design and Implementation |date=June 1991|url=http://pages.cs.wisc.edu/~fischer/cs701.f06/berstein_rodeh.pdf}} (''Global scheduling'')
* {{cite web |last1=Cordes |first1=Peter |title=assembly - Instruction reordering in x86 / x64 asm - performance optimisation with latest CPUs |url=https://stackoverflow.com/a/45970664 |website=Stack Overflow}}
 
{{Compiler optimizations}}